Apache Tika 1.16 Released: A Content Extraction Toolkit

Published: 2017-07-13 08:50:08  Source: 红联  Author: baihuo
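Apache Tika 1.16 has been released. Tika is a toolkit for extracting content (text and metadata) from documents. It integrates POI and PDFBox and exposes a single, unified interface for text extraction. Tika also provides a convenient extension API for adding support for third-party file formats.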
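The "unified interface plus extension API" idea described above can be illustrated with a minimal, self-contained sketch. This is not Tika's API: the class `UnifiedExtractorSketch`, the `TextExtractor` interface, and the `register` method are hypothetical names invented for illustration, and dispatching on the file extension is a simplifying assumption. In Tika itself the corresponding roles are played by the `Tika` facade class and the `Parser` interface.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Illustrative only: these types are hypothetical and are NOT Tika's API.
public class UnifiedExtractorSketch {

    /** A format-specific extractor; each supported format would supply one. */
    public interface TextExtractor {
        String extract(byte[] data);
    }

    // Maps a lowercase file extension to the extractor that handles it.
    private final Map<String, TextExtractor> extractors = new LinkedHashMap<>();

    /** Extension point: register support for an additional format. */
    public void register(String extension, TextExtractor extractor) {
        extractors.put(extension.toLowerCase(Locale.ROOT), extractor);
    }

    /** Single entry point: callers never choose a format-specific library. */
    public String parseToString(String fileName, byte[] data) {
        int dot = fileName.lastIndexOf('.');
        String ext = dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        TextExtractor extractor = extractors.get(ext);
        if (extractor == null) {
            throw new IllegalArgumentException("No extractor registered for: " + fileName);
        }
        return extractor.extract(data);
    }

    public static void main(String[] args) {
        UnifiedExtractorSketch sketch = new UnifiedExtractorSketch();
        // Plain text: decode the bytes as UTF-8.
        sketch.register("txt", d -> new String(d, StandardCharsets.UTF_8));
        // Toy CSV handler: decode, then replace commas with spaces.
        sketch.register("csv", d -> new String(d, StandardCharsets.UTF_8).replace(',', ' '));

        byte[] csv = "x,y".getBytes(StandardCharsets.UTF_8);
        System.out.println(sketch.parseToString("a.csv", csv));
    }
}
```

The caller only ever sees `parseToString`; adding a format means registering one more extractor, without touching existing call sites.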

Selected changes in this release:

Exclude jj2000 from edu.ucar grib to avoid potential license conflicts with ASL 2.0.

Add Age recognition using Ensemble model for Linear regression and Apache OpenNLP Maximum Entropy. Tika can now detect age from text (TIKA-1988).

Add Tika Deep Learning support for the VGG16 model for Very Deep Convolutional Networks for Large-Scale Image Recognition. Now Tika supports both Inception v3/v4 and VGG16 based image recognition (TIKA-2298).

Extract macros from PPT (TIKA-2089).

Download:

http://www.apache.org/dyn/closer.cgi/tika/apache-tika-1.16-src.zip

Release details: http://www.apache.org/dist/tika/CHANGES-1.16.txt

Via: 开源中国社区 (OSChina)