`reading` module

This module contains classes that allow reading from an index.

Classes

class whoosh.reading.IndexReader

Do not instantiate this object directly. Instead, use Index.reader().

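all_doc_ids(): Returns an iterator of all (undeleted) document IDs in the reader.

all_stored_fields(): Yields the stored fields for all non-deleted documents.

abstract all_terms(): Yields (fieldname, text) tuples for every term in the index.

close(): Closes the open files associated with this reader.

codec(): Returns the whoosh.codec.base.Codec object used to read this reader's segment. If this reader is not atomic (reader.is_atomic() == False), returns None.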

column_reader(fieldname, column=None, reverse=False, translate=False)

Parameters

fieldname -- the name of the field for which to get a reader.
column -- if passed, use this Column object instead of the one associated with the field in the Schema.
reverse -- if passed, reverses the order of keys returned by the reader's sort_key() method. If the column type is not reversible, this will raise a NotImplementedError.
translate -- if True, wrap the reader to call the field's from_bytes() method.

Returns

A whoosh.columns.ColumnReader object.

corrector(fieldname): Returns a whoosh.spelling.Corrector object that suggests corrections based on the terms in the given field.

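abstract doc_count(): Returns the total number of undeleted documents in this reader.

abstract doc_count_all(): Returns the total number of documents in this reader, deleted or undeleted.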
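To make the difference between these counts concrete, here is a minimal sketch of a single segment with some deleted document slots. `ToySegment` is a hypothetical class written only for illustration: it mimics the contracts of doc_count_all(), doc_count(), has_deletions(), is_deleted() and all_doc_ids() as described in this reference, not Whoosh's internal implementation.

```python
# Hypothetical toy model (not Whoosh code): a segment with a fixed number of
# document slots, some of which are marked deleted.
class ToySegment:
    def __init__(self, doc_count_all, deleted=()):
        self._count_all = doc_count_all
        self._deleted = frozenset(deleted)

    def doc_count_all(self):
        # Every document slot, deleted or not.
        return self._count_all

    def doc_count(self):
        # Only documents that are not marked deleted.
        return self._count_all - len(self._deleted)

    def has_deletions(self):
        return bool(self._deleted)

    def is_deleted(self, docnum):
        return docnum in self._deleted

    def all_doc_ids(self):
        # Lazily iterate over undeleted document numbers in ascending order.
        return (d for d in range(self._count_all) if d not in self._deleted)


seg = ToySegment(5, deleted={1, 3})
print(seg.doc_count_all(), seg.doc_count(), list(seg.all_doc_ids()))  # 5 3 [0, 2, 4]
```

In this model deleted documents still occupy document numbers, which is why doc_count_all() can be larger than doc_count().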

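abstract doc_field_length(docnum, fieldname, default=0): Returns the number of terms in the given field in the given document. Some scoring algorithms use this.

abstract doc_frequency(fieldname, text): Returns the number of documents the given term appears in.

expand_prefix(fieldname, prefix): Yields terms in the given field that start with the given prefix.

abstract field_length(fieldname): Returns the total number of terms in the given field. Some scoring algorithms use this.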
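As an illustration of the length statistics (doc_field_length() and field_length() here, plus min_field_length() and max_field_length() further down), the sketch below computes them from a hypothetical dictionary of per-document field lengths. The data and the module-level functions are assumptions for illustration; the average-length line shows one way BM25-style scorers use these values, not necessarily how Whoosh computes it.

```python
# Hypothetical number of terms in the "content" field of each document.
# The numbers are made up; a real reader gets them from the index.
LENGTHS = {0: 12, 1: 7, 2: 30}


def doc_field_length(docnum, default=0):
    # Terms in the field for one document, or `default` if none are recorded.
    return LENGTHS.get(docnum, default)


def field_length():
    # Total number of terms in the field across all documents.
    return sum(LENGTHS.values())


def min_field_length():
    return min(LENGTHS.values())


def max_field_length():
    return max(LENGTHS.values())


# BM25-style scorers compare one document's field length to the average.
avg_len = field_length() / len(LENGTHS)
print(doc_field_length(1), field_length(), round(avg_len, 2))  # 7 49 16.33
print(min_field_length(), max_field_length())                   # 7 30
```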

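field_terms(fieldname): Yields all term values (converted from on-disk bytes) in the given field.

first_id(fieldname, text): Returns the first ID in the posting list for the given term. Some backends can optimize this.

abstract frequency(fieldname, text): Returns the total number of instances of the given term in the collection.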
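A small sketch of how doc_frequency(), frequency() and first_id() relate, assuming a hypothetical in-memory posting list of (docnum, weight) pairs in which weight is the number of times the term occurs in that document:

```python
# Hypothetical postings for one term: (docnum, weight) pairs sorted by docnum,
# where weight is the term's occurrence count in that document.
postings = [(0, 3), (4, 1), (9, 2)]

doc_frequency = len(postings)              # number of documents containing the term
frequency = sum(w for _, w in postings)    # total occurrences in the collection
first_id = postings[0][0]                  # first docnum in the posting list

print(doc_frequency, frequency, first_id)  # 3 6 0
```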

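generation(): Returns the generation of the index being read, or -1 if the backend is not versioned.

abstract has_deletions(): Returns True if the underlying index/segment has deleted documents.

abstract has_vector(docnum, fieldname): Returns True if the given document has a term vector for the given field.

abstract indexed_field_names(): Returns an iterable of strings representing the names of the indexed fields. If you use "glob" fields, this may include additional names not explicitly listed in the Schema.

abstract is_deleted(docnum): Returns True if the given document number is marked deleted.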

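iter_docs(): Yields a series of (docnum, stored_fields_dict) tuples for the undeleted documents in the reader.

iter_field(fieldname, prefix=''): Yields (text, terminfo) tuples for all terms in the given field.

iter_from(fieldname, text): Yields ((fieldname, text), terminfo) tuples for all terms in the reader, starting at the given term.

iter_postings(): Low-level method that yields all postings in the reader as (fieldname, text, docnum, weight, valuestring) tuples.

iter_prefix(fieldname, prefix): Yields (text, terminfo) tuples for all terms in the given field with a certain prefix.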
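When terms are kept in sorted order, both prefix iteration and "start at this term" iteration amount to a binary search followed by a scan. The sketch assumes a hypothetical, already-sorted list of terms for a single field and uses Python's bisect module. It illustrates the difference between iter_prefix() (stops once the prefix no longer matches) and iter_from() (continues to the end), and ignores terminfo values.

```python
import bisect

# Hypothetical, already-sorted lexicon for a single field.
TERMS = ["apple", "apply", "banana", "band", "bandit", "cab"]


def iter_prefix(terms, prefix):
    # Binary-search to the first term >= prefix, then yield only while
    # terms still start with the prefix (a filtered scan).
    i = bisect.bisect_left(terms, prefix)
    while i < len(terms) and terms[i].startswith(prefix):
        yield terms[i]
        i += 1


def iter_from(terms, start):
    # Binary-search to the first term >= start, then yield everything after it
    # (an unfiltered scan that runs to the end of the lexicon).
    i = bisect.bisect_left(terms, start)
    yield from terms[i:]


print(list(iter_prefix(TERMS, "ban")))  # ['banana', 'band', 'bandit']
print(list(iter_from(TERMS, "bana")))   # ['banana', 'band', 'bandit', 'cab']
```

Because the real iter_from() yields tuples that include the field name and covers all terms in the reader, it can continue into later fields; this single-field sketch does not model that.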

leaf_readers(): If this reader is a composite reader, returns a list of (IndexReader, docbase) pairs for its child readers. If this is not a composite reader, returns [(self, 0)].
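The docbase in each pair is the offset added to a child reader's local document numbers to obtain the composite reader's global numbers. A minimal sketch of the reverse mapping (global docnum to leaf and local docnum), assuming three hypothetical leaves whose names and docbases are made up for illustration:

```python
import bisect

# Hypothetical composite reader made of three leaves: (leaf name, docbase),
# where docbase is the global docnum of the leaf's first document.
LEAVES = [("seg_a", 0), ("seg_b", 100), ("seg_c", 250)]
BASES = [base for _, base in LEAVES]


def to_leaf(global_docnum):
    # Find the last leaf whose docbase <= global_docnum, then subtract that
    # docbase to get the document number local to the leaf.
    i = bisect.bisect_right(BASES, global_docnum) - 1
    name, base = LEAVES[i]
    return name, global_docnum - base


print(to_leaf(0), to_leaf(120), to_leaf(300))
# ('seg_a', 0) ('seg_b', 20) ('seg_c', 50)
```

For a non-composite reader the list is [(self, 0)], so the mapping is the identity.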

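lexicon(fieldname): Yields all bytestrings in the given field.

abstract max_field_length(fieldname): Returns the maximum length of the field across all documents. Some scoring algorithms use this.

abstract min_field_length(fieldname): Returns the minimum length of the field across all documents. Some scoring algorithms use this.

most_distinctive_terms(fieldname, number=5, prefix=''): Returns the top 'number' terms with the highest tf*idf scores as a list of (score, text) tuples.

most_frequent_terms(fieldname, number=5, prefix=''): Returns the top 'number' most frequently occurring terms in the given field as a list of (frequency, text) tuples.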
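The sketch below ranks terms the way these two methods are described, using heapq.nlargest over hypothetical per-term statistics. The idf formula log(N / df) and the sample numbers are assumptions chosen for illustration. The point is that a very common word ("the") wins on raw frequency but loses on tf*idf.

```python
import heapq
import math

# Hypothetical per-term statistics for one field:
# text -> (total frequency, document frequency).
N = 100  # assumed number of undeleted documents
STATS = {"the": (500, 98), "whoosh": (40, 5), "index": (60, 30), "rare": (3, 1)}


def most_frequent_terms(number=5, prefix=""):
    # (frequency, text) tuples, highest frequency first.
    gen = ((freq, text) for text, (freq, df) in STATS.items()
           if text.startswith(prefix))
    return heapq.nlargest(number, gen)


def most_distinctive_terms(number=5, prefix=""):
    # (score, text) tuples scored as tf * idf, with idf = log(N / df) assumed.
    gen = ((freq * math.log(N / df), text) for text, (freq, df) in STATS.items()
           if text.startswith(prefix))
    return heapq.nlargest(number, gen)


print(most_frequent_terms(2))                       # [(500, 'the'), (60, 'index')]
print([t for _, t in most_distinctive_terms(2)])    # ['whoosh', 'index']
```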

abstract postings(fieldname, text)

Returns a Matcher for the postings of the given term.

>>> pr = reader.postings("content", "render")
>>> pr.skip_to(10)
>>> pr.id()
12

Parameters

fieldname -- the field name or field number of the term.
text -- the text of the term.

Return type

whoosh.matching.Matcher
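To show what skip_to() does in the postings example, here is a hypothetical `ListMatcher` over a sorted list of document numbers. It implements only is_active(), id(), next() and skip_to() with the meanings used in this reference; real Matcher subclasses read postings from the index and provide much more.

```python
import bisect


class ListMatcher:
    """Hypothetical minimal matcher over a sorted list of docnums (not Whoosh code)."""

    def __init__(self, ids):
        self._ids = sorted(ids)
        self._i = 0

    def is_active(self):
        # True while there is a current posting.
        return self._i < len(self._ids)

    def id(self):
        # Document number of the current posting.
        return self._ids[self._i]

    def next(self):
        self._i += 1

    def skip_to(self, target):
        # Advance to the first docnum >= target; never move backward.
        self._i = max(self._i, bisect.bisect_left(self._ids, target))


pr = ListMatcher([3, 7, 12, 20])
pr.skip_to(10)
print(pr.id())  # 12, matching the doctest above
```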

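segment(): Returns the whoosh.index.Segment object used by this reader. If this reader is not atomic (reader.is_atomic() == False), returns None.

storage(): Returns the whoosh.filedb.filestore.Storage object this reader uses to read its files. If the reader is not atomic (reader.is_atomic() == False), returns None.

abstract stored_fields(docnum)

Returns the stored fields for the given document number.

Parameters: numerickeys -- use field numbers as the dictionary keys instead of field names.

abstract term_info(fieldname, text): Returns a TermInfo object allowing access to various statistics about the given term.

terms_from(fieldname, prefix): Yields (fieldname, text) tuples for every term in the index, starting at the given prefix.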

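terms_within(fieldname, text, maxdist, prefix=0)

Returns a generator of words in the given field within maxdist Damerau-Levenshtein edit distance of the given text.

Important: the terms are returned in no particular order. The only criterion is that they are within maxdist edits of text. You may want to run this method multiple times with increasing maxdist values to ensure you get the closest matches first. You may also have additional information (such as term frequency or an acoustic matching algorithm) that you can use to rank terms with the same edit distance.

Parameters

maxdist -- the maximum edit distance.
prefix -- require suggestions to share a prefix of this length with the given word. This is often justifiable since most misspellings do not involve the first letter of the word. Using a prefix dramatically decreases the time it takes to generate the list of words.
seen -- an optional set object. Words that appear in the set will not be yielded.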
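A brute-force sketch of the terms_within() contract, including the prefix and seen parameters and the "increasing maxdist" strategy recommended above. It uses the optimal string alignment variant of Damerau-Levenshtein distance (a swap of two adjacent characters counts as one edit) and scans a hypothetical in-memory term list; it is not claimed to be how Whoosh generates candidates.

```python
def osa_distance(a, b):
    """Optimal string alignment distance (restricted Damerau-Levenshtein):
    insertions, deletions, substitutions and adjacent swaps each cost 1."""
    prev2 = None                    # row i-2, needed for adjacent swaps
    prev = list(range(len(b) + 1))  # row i-1; row 0 is 0..len(b)
    for i in range(1, len(a) + 1):
        cur = [i] + [0] * len(b)
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # deletion
                         cur[j - 1] + 1,      # insertion
                         prev[j - 1] + cost)  # substitution or match
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                cur[j] = min(cur[j], prev2[j - 2] + 1)  # adjacent swap
        prev2, prev = prev, cur
    return prev[len(b)]


def terms_within(terms, text, maxdist, prefix=0, seen=None):
    # Yield terms sharing the first `prefix` characters of `text` that are
    # within `maxdist` edits of it, skipping anything already in `seen`.
    head = text[:prefix]
    for term in terms:
        if seen is not None and term in seen:
            continue
        if term.startswith(head) and osa_distance(term, text) <= maxdist:
            yield term


TERMS = ["random", "render", "renders", "rendre", "tender"]

# Results are unordered, so widen maxdist one step at a time and use `seen`
# to skip words already reported at a smaller distance.
seen = set()
by_distance = {}
for d in range(3):
    found = sorted(terms_within(TERMS, "redner", d, prefix=1, seen=seen))
    by_distance[d] = found
    seen.update(found)
print(by_distance)  # {0: [], 1: ['render'], 2: ['renders', 'rendre']}
```

With prefix=1, "tender" is never compared at all, which is the kind of saving the prefix parameter description refers to.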

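abstract vector(docnum, fieldname, format_=None)

Returns a Matcher object for the given term vector.

>>> docnum = searcher.document_number(path=u'/a/b/c')
>>> v = searcher.vector(docnum, "content")
>>> v.all_as("frequency")
[(u"apple", 3), (u"bear", 2), (u"cab", 2)]

Parameters

docnum -- the document number of the document for which you want the term vector.
fieldname -- the field name or field number of the field for which you want the term vector.

Return type

whoosh.matching.Matcher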

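vector_as(astype, docnum, fieldname)

Returns an iterator of (termtext, value) pairs for the terms in the given term vector. This is a convenient shortcut for calling vector() and using the Matcher object when all you want are the terms and/or values.

>>> docnum = searcher.document_number(path=u'/a/b/c')
>>> list(searcher.vector_as("frequency", docnum, "content"))
[(u"apple", 3), (u"bear", 2), (u"cab", 2)]

Parameters

docnum -- the document number of the document for which you want the term vector.
fieldname -- the field name or field number of the field for which you want the term vector.
astype -- a string containing the name of the format you want the term vector's data in, for example "frequency".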
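The astype argument selects how each stored value is decoded. The sketch models that with a hypothetical table of decoders keyed by name and values packed with struct. The encoding and the decoder table are assumptions for illustration; only the lazily produced (termtext, value) output shape follows this reference.

```python
import struct

# Hypothetical encoded term vector: (termtext, valuestring) pairs, where each
# valuestring is the in-document frequency packed as a big-endian unsigned int.
# Real Whoosh formats use their own encodings.
ENCODED = [("apple", struct.pack(">I", 3)),
           ("bear", struct.pack(">I", 2)),
           ("cab", struct.pack(">I", 2))]

# One decoder per astype name (an assumption for illustration).
DECODERS = {
    "frequency": lambda v: struct.unpack(">I", v)[0],
    "weight": lambda v: float(struct.unpack(">I", v)[0]),
}


def vector_as(astype, encoded):
    # Look up the decoder once, then lazily yield (termtext, value) pairs.
    decode = DECODERS[astype]
    for text, valuestring in encoded:
        yield text, decode(valuestring)


print(list(vector_as("frequency", ENCODED)))
# [('apple', 3), ('bear', 2), ('cab', 2)]
```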

class whoosh.reading.MultiReader(readers, generation=None): Do not instantiate this object directly. Instead, use Index.reader().

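class whoosh.reading.TermInfo(weight=0, df=0, minlength=None, maxlength=0, maxweight=0, minid=None, maxid=0)

Represents a set of statistics about a term. This object is returned by IndexReader.term_info(). These statistics may be useful for optimizations and scoring algorithms.

doc_frequency(): Returns the number of documents the term appears in.

max_id(): Returns the highest document ID this term appears in.

max_length(): Returns the length of the longest field value the term appears in.

max_weight(): Returns the number of times the term appears in the document in which it appears the most.

min_id(): Returns the lowest document ID this term appears in.

min_length(): Returns the length of the shortest field value the term appears in.

weight(): Returns the total frequency of the term across all documents.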
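All of these statistics can be derived from a term's postings plus the field lengths of the documents it appears in. A sketch, assuming hypothetical (docnum, weight) postings and a docnum-to-length mapping; `ToyTermInfo` is a hypothetical dataclass with plain attributes, whereas the real TermInfo exposes the methods listed above.

```python
from dataclasses import dataclass


@dataclass
class ToyTermInfo:
    """Hypothetical stand-in for the statistics TermInfo exposes."""
    weight: float
    doc_frequency: int
    min_id: int
    max_id: int
    min_length: int
    max_length: int
    max_weight: float


def term_info_from(postings, field_lengths):
    # postings: (docnum, weight) pairs for one term.
    # field_lengths: docnum -> length of the field in that document.
    ids = [d for d, _ in postings]
    weights = [w for _, w in postings]
    lengths = [field_lengths[d] for d in ids]
    return ToyTermInfo(
        weight=sum(weights),          # total frequency across all documents
        doc_frequency=len(postings),  # documents containing the term
        min_id=min(ids),
        max_id=max(ids),
        min_length=min(lengths),      # shortest field value containing the term
        max_length=max(lengths),      # longest field value containing the term
        max_weight=max(weights),      # most occurrences in a single document
    )


ti = term_info_from([(2, 1), (5, 4), (9, 2)], {2: 10, 5: 40, 9: 25})
print(ti.weight, ti.doc_frequency, ti.max_weight)  # 7 3 4
```

Because the values are computed once per term, a scorer can consult them without walking the posting list.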

Exceptions

exception whoosh.reading.TermNotFound
