
[Python 100-Day Advancement - Web Development - Peewee] Day 279 - SQLite Extensions (Part 4)

news 2026/5/12 3:46:45

Contents

- 12.2.10 class FTSModel

12.2.10 class FTSModel

class FTSModel

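A subclass of VirtualModel for use with the FTS3 and FTS4 full-text search extensions.

FTSModel subclasses should be defined normally, but there are a few caveats:

- Unique constraints, not-null constraints, check constraints and foreign keys are not supported.
- Indexes on fields and multi-column indexes are ignored completely.
- SQLite treats all column types as TEXT (you can store other data types, but SQLite will treat them as text).
- FTS models contain a rowid field that SQLite creates and manages automatically (unless you choose to set it explicitly during model creation). Lookups on this column are fast and efficient.

Given these constraints, it is strongly recommended that all fields declared on an FTSModel subclass be instances of SearchField (an explicitly declared RowIDField is the exception). Using SearchField helps prevent you from accidentally creating invalid column constraints. If you want to store metadata in the index but do not want it included in the full-text index, specify unindexed=True when instantiating the SearchField.

The only exception to the above is the rowid primary key, which can be declared using RowIDField. Lookups on the rowid are very efficient. If you are using FTS4 you can also use DocIDField, which is an alias for the rowid (though there is no benefit to doing so).

Because secondary indexes are not available, it usually makes sense to use the rowid primary key as a pointer to a row in a regular table. For example: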

class Document(Model):
    # Canonical source of data, stored in a regular table.
    author = ForeignKeyField(User, backref='documents')
    title = TextField(null=False, unique=True)
    content = TextField(null=False)
    timestamp = DateTimeField()

    class Meta:
        database = db

class DocumentIndex(FTSModel):
    # Full-text search index.
    rowid = RowIDField()
    title = SearchField()
    content = SearchField()

    class Meta:
        database = db
        # Use the porter stemming algorithm to tokenize content.
        options = {'tokenize': 'porter'}

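To store a document in the document index, we INSERT a row into the DocumentIndex table, manually setting the rowid so that it matches the primary key of the corresponding Document: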

def store_document(document):
    DocumentIndex.insert({
        DocumentIndex.rowid: document.id,
        DocumentIndex.title: document.title,
        DocumentIndex.content: document.content}).execute()

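To perform a search and return ranked results, we can query the Document table and join on DocumentIndex. This join is efficient because lookups on an FTSModel's rowid field are fast: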

def search(phrase):
    # Query the search index and join the corresponding Document
    # object on each search result.
    return (Document
            .select()
            .join(
                DocumentIndex,
                on=(Document.id == DocumentIndex.rowid))
            .where(DocumentIndex.match(phrase))
            .order_by(DocumentIndex.bm25()))
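
To make the rowid-as-pointer pattern concrete, here is a minimal sketch that uses only the standard-library sqlite3 module rather than peewee. The table names mirror the models above, but the SQL is an assumption about what such a setup looks like at the SQLite level, not the exact statements peewee emits. It also assumes your SQLite build has FTS4 enabled.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript("""
    -- Regular table: the canonical source of data.
    CREATE TABLE document (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL UNIQUE,
        content TEXT NOT NULL
    );
    -- FTS4 index with the porter stemmer; no types or constraints declared.
    CREATE VIRTUAL TABLE documentindex
        USING fts4(title, content, tokenize=porter);
""")

def store_document(title, content):
    # Insert the canonical row, then index it under the same rowid.
    cur = conn.execute(
        'INSERT INTO document (title, content) VALUES (?, ?)',
        (title, content))
    conn.execute(
        'INSERT INTO documentindex (rowid, title, content) VALUES (?, ?, ?)',
        (cur.lastrowid, title, content))
    return cur.lastrowid

def search(phrase):
    # Join each index hit back to its document through the rowid.
    rows = conn.execute(
        'SELECT document.id, document.title FROM document '
        'JOIN documentindex ON document.id = documentindex.rowid '
        'WHERE documentindex MATCH ? ORDER BY document.id', (phrase,))
    return rows.fetchall()

store_document('Running shoes', 'A guide to running faster.')
store_document('Cooking', 'How to bake bread.')
results = search('run')  # the porter stemmer reduces "running" to "run"
```

Because the index is tokenized with the porter stemmer, the query term "run" also finds the document that only contains "running".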

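Warning
All SQL queries on FTSModel classes will be full-table scans, except full-text searches and rowid lookups.

If the primary source of the content you are indexing lives in a separate table, you can save some disk space by instructing SQLite not to store an additional copy of the content in the search index. SQLite will still create the metadata and data structures needed to search the content, but the content itself will not be stored in the search index.

To do this, specify a table or column using the content option. The FTS4 documentation has more information.

Here is a short example showing how to implement this with peewee: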

import datetime

class Blog(Model):
    title = TextField()
    pub_date = DateTimeField(default=datetime.datetime.now)
    content = TextField()  # We want to search this.

    class Meta:
        database = db

class BlogIndex(FTSModel):
    content = SearchField()

    class Meta:
        database = db
        options = {'content': Blog.content}  # <-- specify data source.

db.create_tables([Blog, BlogIndex])

# Now, we can manage content in the BlogIndex. To populate the
# search index:
BlogIndex.rebuild()

# Optimize the index.
BlogIndex.optimize()

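The content option accepts either a single Field or a Model and can reduce the amount of storage used by the database file. However, content will need to be moved to and from the associated FTSModel manually.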
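
The following sketch shows an external-content index at the SQLite level, again with the standard-library sqlite3 module. The 'rebuild' and 'optimize' special commands are documented FTS4 features. That peewee's rebuild() and optimize() map onto them is an assumption made for illustration. The table names are chosen to mirror the Blog example.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript("""
    CREATE TABLE blog (
        id INTEGER PRIMARY KEY,
        title TEXT,
        content TEXT
    );
    -- External-content index: reads text from blog instead of copying it.
    CREATE VIRTUAL TABLE blogindex USING fts4(content="blog", content);
""")

conn.executemany(
    'INSERT INTO blog (title, content) VALUES (?, ?)',
    [('First', 'peewee makes sqlite pleasant'),
     ('Second', 'full-text search with fts4')])

query = "SELECT rowid FROM blogindex WHERE blogindex MATCH 'peewee'"

# The index is not kept in sync automatically, so nothing matches yet.
before = conn.execute(query).fetchall()

# Special commands: rebuild from the content table, then optimize.
conn.execute("INSERT INTO blogindex(blogindex) VALUES ('rebuild')")
conn.execute("INSERT INTO blogindex(blogindex) VALUES ('optimize')")

after = conn.execute(query).fetchall()
```

The search finds nothing before the rebuild and finds the first blog row afterwards, which illustrates why content has to be moved into the index by hand.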

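classmethod match(term)

Parameters: term - search term or expression.
Generate a SQL expression representing a search for the given term or expression in the table. SQLite uses the MATCH operator to indicate a full-text search.

Example: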

# Search index for "search phrase" and return results ranked
# by relevancy using the BM25 algorithm.
query = (DocumentIndex
         .select()
         .where(DocumentIndex.match('search phrase'))
         .order_by(DocumentIndex.bm25()))

for result in query:
    print('Result: %s' % result.title)
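
The term passed to match() ends up as the right-hand side of a MATCH operator, so it follows the FTS query syntax. The sketch below, written against the standard-library sqlite3 module with a hypothetical notes table, shows three common forms: multiple terms, a quoted phrase, and a prefix query.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE VIRTUAL TABLE notes USING fts4(body)')
conn.executemany('INSERT INTO notes (body) VALUES (?)', [
    ('a search phrase in context',),
    ('phrase before search',),
    ('searching is a different token',),
])

def match(expr):
    # Return the rowids of all rows matching the FTS query expression.
    rows = conn.execute(
        'SELECT rowid FROM notes WHERE notes MATCH ? ORDER BY rowid',
        (expr,))
    return [rowid for (rowid,) in rows]

both_terms = match('search phrase')   # both terms, in any order
exact = match('"search phrase"')      # the two terms adjacent, in this order
prefix = match('search*')             # any token starting with "search"
```

Only the first row contains the exact phrase, the first two rows contain both terms, and the prefix query also picks up "searching" in the third row.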

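classmethod search(term[, weights=None[, with_score=False[, score_alias='score'[, explicit_ordering=False]]]])

Parameters:

- term ( str ) - search term to use.
- weights - a list of weights for the columns, ordered by each column's position in the table. Alternatively, a dictionary keyed by field or field name and mapped to a weight value.
- with_score - whether the score should be returned as part of the SELECT statement.
- score_alias ( str ) - alias to use for the calculated rank score. This is the attribute you will use to access the score if with_score=True.
- explicit_ordering ( bool ) - order using the full SQL function that calculates the rank, instead of simply referencing the score alias in the ORDER BY clause.

Shorthand way of searching for a term and sorting the results by the quality of the match.

Note
This method uses a simplified algorithm for determining the relevance rank of results. For more sophisticated result ranking, use the search_bm25() method.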

# Simple search.
docs = DocumentIndex.search('search term')
for result in docs:
    print(result.title)

# More complete example.
docs = DocumentIndex.search(
    'search term',
    weights={'title': 2.0, 'content': 1.0},
    with_score=True,
    score_alias='search_score')
for result in docs:
    print(result.title, result.search_score)

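classmethod search_bm25(term[, weights=None[, with_score=False[, score_alias='score'[, explicit_ordering=False]]]])

Parameters:

- term ( str ) - search term to use.
- weights - a list of weights for the columns, ordered by each column's position in the table. Alternatively, a dictionary keyed by field or field name and mapped to a weight value.
- with_score - whether the score should be returned as part of the SELECT statement.
- score_alias ( str ) - alias to use for the calculated rank score. This is the attribute you will use to access the score if with_score=True.
- explicit_ordering ( bool ) - order using the full SQL function that calculates the rank, instead of simply referencing the score alias in the ORDER BY clause.

Shorthand way of searching for a term and sorting the results by the quality of the match, using the BM25 algorithm.

Note
The BM25 ranking algorithm is only available for FTS4. If you are using FTS3, use the search() method instead.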

classmethod search_bm25f(term[, weights=None[, with_score=False[, score_alias='score'[, explicit_ordering=False]]]])

Same as FTSModel.search_bm25(), but uses the BM25f variant of the BM25 ranking algorithm.

classmethod search_lucene(term[, weights=None[, with_score=False[, score_alias='score'[, explicit_ordering=False]]]])

Same as FTSModel.search_bm25(), but uses the result ranking algorithm from the Lucene search engine.

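classmethod rank([col1_weight, col2_weight...coln_weight])

Parameters: col_weight ( float ) - (optional) weight to give to the i-th column of the model. By default all columns have a weight of 1.0.
Generate an expression that will calculate and return the quality of the search match. This rank can be used to sort the search results. A higher rank score indicates a better match.

The rank function accepts optional parameters that allow you to specify weights for the individual columns. If no weights are specified, all columns are considered to be of equal importance.

Note

The algorithm used by rank() is simple and relatively quick. For more sophisticated result ranking, use:

- bm25()
- bm25f()
- lucene()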

query = (DocumentIndex
         .select(
             DocumentIndex,
             DocumentIndex.rank().alias('score'))
         .where(DocumentIndex.match('search phrase'))
         .order_by(DocumentIndex.rank()))

for search_result in query:
    print(search_result.title, search_result.score)
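
To illustrate what a "simple and relatively quick" ranking can look like, here is a sketch modeled on the example rank function in the SQLite FTS3 documentation. It is not peewee's implementation: simple_rank and the docs table are hypothetical names, and the sketch returns larger values for better matches, so it sorts in descending order. The library's own function may use a different sign convention.

```python
import sqlite3
import struct

def simple_rank(raw_match_info, *weights):
    # matchinfo() with its default 'pcx' format returns 32-bit unsigned ints:
    # phrase count, column count, then 3 counters per (phrase, column) pair.
    count = len(raw_match_info) // 4
    info = struct.unpack('@%dI' % count, raw_match_info)
    n_phrases, n_cols = info[0], info[1]
    # Columns without an explicit weight default to 1.0.
    col_weights = list(weights) + [1.0] * (n_cols - len(weights))
    score = 0.0
    for phrase in range(n_phrases):
        for col in range(n_cols):
            base = 2 + 3 * (phrase * n_cols + col)
            row_hits, all_hits = info[base], info[base + 1]
            if row_hits:
                # Hits in this row relative to hits across all rows.
                score += col_weights[col] * row_hits / all_hits
    return score

conn = sqlite3.connect(':memory:')
conn.create_function('simple_rank', -1, simple_rank)
conn.execute('CREATE VIRTUAL TABLE docs USING fts4(title, content)')
conn.executemany('INSERT INTO docs (title, content) VALUES (?, ?)', [
    ('other', 'one search here'),
    ('sqlite search', 'search search search'),
])

ranked = conn.execute(
    'SELECT rowid, simple_rank(matchinfo(docs)) AS score '
    'FROM docs WHERE docs MATCH ? ORDER BY score DESC',
    ('search',)).fetchall()
best_rowid = ranked[0][0]
```

The second row mentions the term in both columns and more often, so it ranks first. Passing extra arguments, for example simple_rank(matchinfo(docs), 2.0, 1.0), weights the title column more heavily.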

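classmethod bm25([col1_weight, col2_weight...coln_weight])

Parameters: col_weight ( float ) - (optional) weight to give to the i-th column of the model. By default all columns have a weight of 1.0.
Generate an expression that will calculate and return the quality of the search match using the BM25 algorithm. This value can be used to sort the search results; the higher the score, the better the match.

Like rank(), the bm25 function accepts optional parameters that allow you to specify weights for the individual columns. If no weights are specified, all columns are considered to be of equal importance.

Note
The BM25 result ranking algorithm requires FTS4. If you are using FTS3, use rank() instead.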

query = (DocumentIndex
         .select(
             DocumentIndex,
             DocumentIndex.bm25().alias('score'))
         .where(DocumentIndex.match('search phrase'))
         .order_by(DocumentIndex.bm25()))

for search_result in query:
    print(search_result.title, search_result.score)

Note
The above code example is equivalent to calling the search_bm25() method:

query = DocumentIndex.search_bm25('search phrase', with_score=True)
for search_result in query:
    print(search_result.title, search_result.score)
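
For readers curious why BM25 needs FTS4, the sketch below computes a BM25-style score from matchinfo() data using the standard-library sqlite3 module. The 'n', 'a' and 'l' matchinfo flags (row count, average column length, current row length) are only available on FTS4 tables. The function name bm25_sketch, the parameter values k1=1.2 and b=0.75, and the pages table are assumptions for illustration, not peewee's implementation. Larger scores mean better matches here, so results are sorted in descending order.

```python
import math
import sqlite3
import struct

def bm25_sketch(raw_match_info, k1=1.2, b=0.75):
    # Expects matchinfo(tbl, 'pcnalx'): phrase count, column count, row
    # count, per-column average lengths, per-column lengths of this row,
    # then 3 hit counters per (phrase, column) pair.
    count = len(raw_match_info) // 4
    info = struct.unpack('@%dI' % count, raw_match_info)
    n_phrases, n_cols, n_docs = info[0], info[1], info[2]
    avg_off = 3
    len_off = avg_off + n_cols
    hit_off = len_off + n_cols
    score = 0.0
    for phrase in range(n_phrases):
        for col in range(n_cols):
            x = hit_off + 3 * (phrase * n_cols + col)
            term_freq = info[x]           # hits in this row
            docs_with_term = info[x + 2]  # rows containing the phrase
            idf = math.log(
                (n_docs - docs_with_term + 0.5) / (docs_with_term + 0.5))
            idf = max(idf, 1e-6)  # keep very common terms from going negative
            ratio = info[len_off + col] / (info[avg_off + col] or 1)
            score += (idf * term_freq * (k1 + 1) /
                      (term_freq + k1 * (1 - b + b * ratio)))
    return score

conn = sqlite3.connect(':memory:')
conn.create_function('bm25_sketch', 1, bm25_sketch)
conn.execute('CREATE VIRTUAL TABLE pages USING fts4(body)')
conn.executemany('INSERT INTO pages (body) VALUES (?)', [
    ('peewee peewee orm',),
    ('sqlite database engine',),
    ('python web framework',),
    ('peewee and many other unrelated words in a long document',),
    ('full text search',),
])

ranked = conn.execute(
    "SELECT rowid, bm25_sketch(matchinfo(pages, 'pcnalx')) AS score "
    "FROM pages WHERE pages MATCH ? ORDER BY score DESC",
    ('peewee',)).fetchall()
ranked_rowids = [rowid for rowid, _ in ranked]
```

The short row that mentions the term twice outranks the long row that mentions it once, which is the length normalization BM25 adds over the simpler rank() approach.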

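classmethod bm25f([col1_weight, col2_weight...coln_weight])

Same as bm25(), except that it uses the BM25f variant of the BM25 ranking algorithm.

classmethod lucene([col1_weight, col2_weight...coln_weight])

Same as bm25(), except that it uses the Lucene search result ranking algorithm.

classmethod rebuild()

Rebuild the search index. This only works when the content option was specified during table creation.

classmethod optimize()

Optimize the search index.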
