Apache Lucene 6.4.0 Released: Java Search Engine

Published: 2017-01-25 06:23:20  Source: 红联  Author: cocotoo
Apache Lucene 6.4.0 has been released.

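Lucene is an open-source full-text search engine toolkit from the Apache Software Foundation. It is an architecture for full-text search that provides a complete query engine and indexing engine, along with a partial text analysis engine. Its purpose is to give software developers a simple, easy-to-use toolkit for adding full-text search to a target system, or for building a complete full-text search engine on top of it. Lucene was originally written by Doug Cutting, a veteran full-text indexing and retrieval expert who was a principal developer of the V-Twin search engine, later served as a senior system architect at Excite, and now works on research into Internet infrastructure. His goal in contributing Lucene was to bring full-text search to all kinds of small and medium-sized applications.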

Highlights of this release:

Lucene's best efforts to un-map memory-mapped files with "MMapDirectory" now work with the latest Java 9 early access builds

A new similarity "BooleanSimilarity" that gives terms a score that is equal to their query boost

A new axiomatic family of similarities (6 in total), based on https://www.eecis.udel.edu/~hfang/pubs/sigir05-axiom.pdf

A new token filter "SynonymGraphFilter" that outputs a correct graph structure for multi-token synonyms at query time

Graph token streams, such as those produced by the "SynonymGraphFilter", are now handled accurately by query parsers

A new collector "DocValuesStatsCollector" gives the ability to compute statistics on a DocValues field

It is now possible to filter "SortedDocValues" and "SortedSetDocValues" terms enum with a compiled automaton

The "UnifiedHighlighter" can now highlight fields with queries that don't necessarily refer to that field

DrillSideways can now run queries concurrently

Index sorting now supports sorting on multi-valued fields using MIN, MAX, etc. selectors

Points do not store the implicit split dimension in the 1-dimension case. This saves from 6% of memory for the largest types such as InetAddressPoint to 33% for the smaller types such as HalfFloatPoint.

The BKD in-memory index for dimensional points now uses a compressed format, using substantially less RAM in some cases

The BKD writing now buffers each leaf block in heap before writing to disk, giving a small speedup in points-heavy use cases

"TermAutomatonQuery" now rewrites to more efficient queries when possible
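
The 6% and 33% figures quoted above for 1-dimensional points can be reproduced with a little arithmetic. The sketch below is only an illustration, not Lucene code. It assumes that each index entry previously held one byte for the split dimension plus the split value itself, that InetAddressPoint values are encoded in 16 bytes, and that HalfFloatPoint values are encoded in 2 bytes. The class and method names are hypothetical.

```java
public class SplitDimSavings {
    // Assumption: before this change, a 1D index entry held 1 byte for the
    // split dimension plus bytesPerDim bytes for the split value.
    // Dropping that 1 byte saves this fraction of each entry.
    static double savedFraction(int bytesPerDim) {
        return 1.0 / (1 + bytesPerDim);
    }

    public static void main(String[] args) {
        // Assumed encoded sizes: InetAddressPoint = 16 bytes, HalfFloatPoint = 2 bytes.
        System.out.println("InetAddressPoint fraction saved: " + savedFraction(16));
        System.out.println("HalfFloatPoint fraction saved:   " + savedFraction(2));
    }
}
```

Under these assumptions the fractions are 1/17 (about 6%) and 1/3 (about 33%), which matches the range given in the release notes.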

More information and downloads: http://lucene.apache.org/

From: OSChina (开源中国社区)