Apache Nutch 1.9 Released: An Open-Source Search Engine

Published: 2014-08-19 10:30:32 | Source: 红联 | Author: empast
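Nutch is an open-source search engine implemented in Java. It provides all the tools needed to run your own search engine, including full-text search and a web crawler.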

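Nutch aims to let anyone set up a world-class web search engine easily and at low cost. To meet this ambitious goal, Nutch must be able to:

* Fetch several billion pages per month

* Maintain an index of those pages

* Serve over a thousand searches per second against that index

* Provide high-quality search results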

Apache Nutch 1.9 was released recently. The main changes include:

Improvement

[NUTCH-1502] - Test for CrawlDatum state transitions

[NUTCH-1561] - improve usability of parse-metatags and index-metadata

[NUTCH-1676] - Add rudimentary SSL support to protocol-http

[NUTCH-1745] - Upgrade to ElasticSearch 1.1.0

[NUTCH-1747] - Use AtomicInteger as semaphore in Fetcher

[NUTCH-1757] - ParserChecker to take custom metadata as input

[NUTCH-1758] - IndexChecker to send document to IndexWriters

[NUTCH-1772] - Injector does not need merging if no pre-existing crawldb

[NUTCH-1782] - NodeWalker to return current node

[NUTCH-1787] - update and complete API doc overview page

[NUTCH-1794] - IndexingFilterChecker to optionally dumpText

[NUTCH-1799] - ANT Eclipse task discovers all plugin jars automatically

New Feature

[NUTCH-207] - Bandwidth target for fetcher rather than a thread count

[NUTCH-1327] - QueryStringNormalizer

[NUTCH-1590] - [SECURITY] Frame injection vulnerability in published Javadoc

Project details: http://nutch.apache.org/

Download: http://www.apache.org/dyn/closer.cgi/nutch/

From: OSChina (开源中国社区)