
html5lib, an HTML parsing library

Published: 2013-03-11 08:59:25 | Source: 红联 | Author: empast
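html5lib is a library for parsing HTML documents from Ruby and Python. It supports HTML 5 and aims to be as compatible as possible with desktop browsers.

Key features include: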

Parses valid and invalid HTML documents to a tree
Support for minidom, ElementTree (including cElementTree and lxml.etree), BeautifulSoup and custom simpletree output formats
DOM to SAX converter
Reports parse errors
Character encoding detection
XML mode for working with ill-formed XML, e.g. feeds
Filtering and serializing of trees
HTML+CSS sanitizer
Many unit tests
Faster than before
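
As a rough illustration of two items from the list above (parsing invalid markup into a tree and reporting parse errors), here is a minimal Python sketch. It assumes the current html5lib package from PyPI (pip install html5lib), whose API may differ from the 2013 release described here. The helper name parse_with_errors and the sample markup are hypothetical, chosen only for this example.

```python
import html5lib


def parse_with_errors(markup):
    """Parse markup into an ElementTree element; return (root, error count).

    parse_with_errors is an illustrative helper, not part of html5lib.
    """
    parser = html5lib.HTMLParser(
        tree=html5lib.getTreeBuilder("etree"),  # build xml.etree elements
        namespaceHTMLElements=False,            # plain tag names, e.g. "p"
    )
    root = parser.parse(markup)
    # parser.errors collects the parse errors found in non-strict mode.
    return root, len(parser.errors)


# Invalid input: no doctype, and neither <p> nor <b> is closed.
broken = "<p>Unclosed paragraph<b>bold text"
root, error_count = parse_with_errors(broken)

# The parser still produces a complete html/head/body tree.
paragraph = root.find("body").find("p")
bold = paragraph.find("b")

print(root.tag, "|", paragraph.text, "|", bold.text)
print("parse errors reported:", error_count)
```

Passing "dom" or "lxml" to getTreeBuilder should select a different tree format, provided the corresponding library is installed.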

Project homepage: http://code.google.com/p/html5lib/

Downloads: http://code.google.com/p/html5lib/downloads/list

From: 开源中国社区 (the Open Source China community)