
jsoup 1.8.1 released with major performance improvements

Published: 2014-09-28 15:10:16  Source: 红联  Author: empast
jsoup 1.8.1 has been released!

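jsoup 1.8.1 significantly improves the performance of text and tree serialization, adds a choice between HTML and XML output, and includes a large number of feature improvements and bug fixes. The release is available for download now.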

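The changes are as follows:

Improvements

Output can be written as either HTML or XML; HTML is the default.

Performance improvements for Element.text()

Performance improvements for Element.html()

Reduced file read time and improved the file parser, giving roughly a 10% speed improvement.

Added Element.cssSelector()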

Tightened the scope of what characters are escaped in attributes and textnodes, to align with the spec.


If pretty-print is disabled, Element.html() no longer trims outer whitespace.

In the HTML Cleaner, the basic whitelist now allows span tags, and the relaxed whitelist allows span and div tags.

Relaxed doctype validation, so a doctype no longer has to specify a name.

CSS selectors support quoted attribute values.
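A few of the improvements listed above can be tried out with a short sketch. It assumes jsoup 1.8.1 or later is on the classpath; the class name JsoupReleaseSketch, its helper methods, and the sample markup are made up for illustration and are not part of jsoup.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Document.OutputSettings.Syntax;
import org.jsoup.nodes.Element;

// Illustrative sketch only; assumes jsoup 1.8.1+ on the classpath.
class JsoupReleaseSketch {

    // Serializes a body fragment with the chosen output syntax.
    // Pretty-printing is switched off so no whitespace is added.
    static String render(String bodyHtml, Syntax syntax) {
        Document doc = Jsoup.parseBodyFragment(bodyHtml);
        doc.outputSettings().syntax(syntax).prettyPrint(false);
        return doc.body().html();
    }

    // Returns a CSS selector for the first element matching the query,
    // or null when nothing matches.
    static String selectorOf(String html, String query) {
        Element el = Jsoup.parse(html).select(query).first();
        return el == null ? null : el.cssSelector();
    }

    // Counts paragraphs whose title attribute equals a quoted value
    // that contains a space.
    static int countByQuotedAttr(String html) {
        return Jsoup.parse(html).select("p[title=\"hello world\"]").size();
    }

    public static void main(String[] args) {
        String fragment = "<p>One<br>Two</p>";
        // HTML syntax (the default) leaves the void element as <br>
        System.out.println(render(fragment, Syntax.html));
        // XML syntax writes the same element as a self-closing tag
        System.out.println(render(fragment, Syntax.xml));

        // An element with an id is identified by that id
        System.out.println(selectorOf("<div id=\"main\"><p>Hi</p></div>", "div"));

        System.out.println(countByQuotedAttr(
                "<p title=\"hello world\">A</p><p title=\"other\">B</p>"));
    }
}
```

The output settings belong to the Document, so the syntax choice applies to everything serialized from that document rather than to a single element.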

Bug fixes

Fixed an issue where was parsed as

Fixed an issue where a UTF-8 BOM character was not detected if the HTTP response did not specify a charset, and the HTML body did, leading to the head contents incorrectly being parsed into the body. Changed the behavior so that when the UTF-8 BOM is detected, it will take precedence for determining the charset to decode with.

Fixed an issue in parsing a base URI when loading a URL containing a http-equiv element.

Fixed an issue for Java 1.5 / Android 2.2 compatibility, and added verification so that it doesn't regress.

Fixed an issue that would throw an NPE when trying to set invalid HTML into a title element.

Fixed support for nth-of-type selectors with unknown tags.

Added support for application/*+xml mimetypes.

Fixed support for allowing script tags in cleaner whitelists.
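The UTF-8 BOM fix in the list above comes down to a precedence rule: if the bytes start with a UTF-8 BOM, decode as UTF-8 regardless of what else is declared. The following is a rough standard-library sketch of that rule, not jsoup's actual implementation. The class BomCharsetSketch, its methods, and the UTF-8 fallback are assumptions made for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative sketch of BOM-first charset selection; not jsoup code.
class BomCharsetSketch {

    // True when the data starts with the UTF-8 byte order mark EF BB BF.
    static boolean hasUtf8Bom(byte[] data) {
        return data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF;
    }

    // Picks the charset to decode with. A detected BOM wins over the
    // declared charset (for example one taken from a header or meta tag).
    static Charset chooseCharset(byte[] data, String declaredCharset) {
        if (hasUtf8Bom(data)) {
            return StandardCharsets.UTF_8;
        }
        if (declaredCharset != null) {
            try {
                return Charset.forName(declaredCharset);
            } catch (IllegalArgumentException e) {
                // Illegal or unsupported name: fall through to the default.
            }
        }
        return StandardCharsets.UTF_8; // assumed fallback for this sketch
    }

    // Decodes the data, skipping the BOM bytes when they are present.
    static String decode(byte[] data, String declaredCharset) {
        Charset charset = chooseCharset(data, declaredCharset);
        int offset = hasUtf8Bom(data) ? 3 : 0;
        return new String(data, offset, data.length - offset, charset);
    }

    public static void main(String[] args) {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<', 'p', '>'};
        // The BOM overrides the conflicting ISO-8859-1 declaration.
        System.out.println(chooseCharset(withBom, "ISO-8859-1"));
        System.out.println(decode(withBom, "ISO-8859-1"));
    }
}
```

Skipping the three BOM bytes before decoding keeps the marker from ending up as a stray character at the start of the parsed text.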

For more details, see the release notes.

jsoup is a Java HTML parser that can directly parse a URL or HTML text. It provides a very convenient API for extracting and manipulating data using DOM, CSS, and jQuery-like methods.

The main features of jsoup are:

1. Parse HTML from a URL, file, or string;

2. Find and extract data using DOM traversal or CSS selectors;

3. Manipulate HTML elements, attributes, and text.
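The three features above can be sketched in a few lines. This assumes jsoup is on the classpath; the class JsoupFeaturesSketch, its methods, and the sample markup are invented for illustration. Parsing from a string is used so that the example needs no network access.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Illustrative sketch only; the sample markup is made up.
class JsoupFeaturesSketch {

    static final String HTML =
            "<html><head><title>Demo</title></head><body>"
            + "<div id=\"news\">"
            + "<a href=\"/download\">Download</a>"
            + "<a href=\"/news/release-1.8.1\">Release notes</a>"
            + "</div></body></html>";

    // 1. Parse HTML from a string. The base URI is used to resolve
    //    relative links. Jsoup.connect(url).get() fetches from a URL instead.
    static Document parse() {
        return Jsoup.parse(HTML, "http://jsoup.org/");
    }

    // 2. Find elements with a CSS selector and extract absolute link targets.
    static List<String> linkTargets(Document doc) {
        List<String> targets = new ArrayList<>();
        for (Element link : doc.select("div#news a[href]")) {
            targets.add(link.absUrl("href"));
        }
        return targets;
    }

    // 3. Manipulate the text and attributes of the first link.
    static String relabelFirstLink(Document doc) {
        Element first = doc.select("a").first();
        first.text("Get jsoup").attr("rel", "nofollow");
        return first.outerHtml();
    }

    public static void main(String[] args) {
        Document doc = parse();
        System.out.println(doc.title());
        System.out.println(linkTargets(doc));
        System.out.println(relabelFirstLink(doc));
    }
}
```

Setters such as text(...) and attr(...) return the element itself, which is what allows the jQuery-like chaining in the third step.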

jsoup is released under the MIT license and can safely be used in commercial projects.

Details: http://jsoup.org/news/release-1.8.1

Download: http://jsoup.org/download

From: 开源中国社区 (OSChina community)