Change log
v0.2.7
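The embedded HTTP interface, which previously accepted a single Request in JSON form, now also accepts multiple Requests submitted as a JSON array
Request objects support setting skipDuplicateFilter to tell the Seimi processor to bypass the deduplication mechanism; by default it is not bypassed
Added a demo of scheduled (timed) crawling
The type of custom parameter values passed to callback functions through a Request changed from Object to String, making handling more explicit
Fix: fixed a logging bug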
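The interplay between the system-level deduplication switch (the `useUnrepeated` attribute of @Crawler, on by default) and the per-request skipDuplicateFilter flag can be pictured with a small sketch. It is not SeimiCrawler's implementation: `DedupSketch`, `SimpleRequest` and `shouldProcess` are hypothetical names, and the sketch assumes duplicates are detected by URL alone, which the real framework may do differently.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of URL deduplication with a per-request bypass flag.
class DedupSketch {

    // Minimal stand-in for a crawl request (hypothetical, not Seimi's Request class).
    static final class SimpleRequest {
        final String url;
        final boolean skipDuplicateFilter; // mirrors the flag described in the changelog

        SimpleRequest(String url, boolean skipDuplicateFilter) {
            this.url = url;
            this.skipDuplicateFilter = skipDuplicateFilter;
        }
    }

    private final Set<String> seenUrls = new HashSet<>();
    private final boolean useUnrepeated; // system-level dedup switch

    DedupSketch(boolean useUnrepeated) {
        this.useUnrepeated = useUnrepeated;
    }

    DedupSketch() {
        this(true); // dedup is enabled by default
    }

    /** Returns true if the request should be processed. */
    boolean shouldProcess(SimpleRequest request) {
        if (!useUnrepeated || request.skipDuplicateFilter) {
            return true; // dedup disabled globally or bypassed for this request
        }
        // Set.add returns false when the URL has already been recorded
        return seenUrls.add(request.url);
    }
}
```

A request flagged with skipDuplicateFilter is always let through, which is useful for pages such as list indexes that must be re-fetched on every scheduled run.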
v0.2.6
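Added a unified startup entry class, to be used together with the upcoming SeimiCrawler Maven build plugin
Optimized meta refresh redirects: at most 3 refresh hops are followed, so pages that keep refreshing cannot trap the crawler
Bug fix: custom data set in a Request could not be passed on to the Response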
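The 3-hop cap on meta refresh redirects can be sketched as a bounded loop. This is an illustration only: `MetaRefreshSketch` and `resolve` are hypothetical, the `fetch` function stands in for the real HTTP download, and the regex assumes the common `<meta http-equiv="refresh" content="N; url=...">` layout rather than every variant a browser accepts.

```java
import java.net.URI;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of following meta refresh redirects with a hop limit.
class MetaRefreshSketch {
    static final int MAX_META_REFRESH = 3; // the cap described in v0.2.6

    // Matches e.g. <meta http-equiv="refresh" content="0; url=/next"> (assumed attribute order)
    private static final Pattern META_REFRESH = Pattern.compile(
            "<meta\\s+http-equiv=[\"']?refresh[\"']?\\s+content=[\"']?\\s*\\d+\\s*;\\s*url=([^\"'>\\s]+)",
            Pattern.CASE_INSENSITIVE);

    /** Follows at most MAX_META_REFRESH refresh hops and returns the URL reached. */
    static String resolve(String startUrl, Function<String, String> fetch) {
        String url = startUrl;
        for (int hop = 0; hop < MAX_META_REFRESH; hop++) {
            String html = fetch.apply(url);
            Matcher m = META_REFRESH.matcher(html == null ? "" : html);
            if (!m.find()) {
                return url; // no refresh tag: this is the real page
            }
            // Resolve relative targets such as "/next" against the current URL
            url = URI.create(url).resolve(m.group(1)).toString();
        }
        return url; // cap reached: stop following even if the page keeps refreshing
    }
}
```

The URL returned here plays the same role as the "real URL" that v0.2.4 exposes through Response.getRealUrl().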
v0.2.5
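Added a mechanism that sends a request back to the queue when it hits a serious exception
When a request still fails with an unexpected exception after going through the retry mechanism for network errors, it is sent back to the queue to wait to be processed again, as long as it has not exceeded the maximum reprocessing count (set by the developer, or the default). Once that limit is reached, Seimi calls BaseSeimiCrawler.handleErrorRequest(Request request), which the developer overrides, to handle and record the failed request. Combined with the delay feature, this requeue mechanism largely prevents requests from failing because of a target site's anti-crawler policy, and prevents those requests from being lost.
Optimized duplicate detection
Improved how the encoding of non-standard pages is detected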
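The requeue flow described for v0.2.5 can be sketched with a plain in-memory queue. All names here (`RequeueSketch`, `Req`, `requeueCount`, `maxRequeues`) are hypothetical. The local `handleErrorRequest` only mimics the role of the real BaseSeimiCrawler hook, and the sketch assumes the limit counts how many times a request may be sent back.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch of requeue-on-error with a maximum count; not Seimi's implementation.
class RequeueSketch {

    // Minimal stand-in for a Request (hypothetical).
    static final class Req {
        final String url;
        int requeueCount = 0; // how many times this request has been sent back

        Req(String url) {
            this.url = url;
        }
    }

    private final Deque<Req> queue = new ArrayDeque<>();
    private final int maxRequeues; // developer-set or default limit (assumed semantics)
    final List<Req> errorRequests = new ArrayList<>();

    RequeueSketch(int maxRequeues) {
        this.maxRequeues = maxRequeues;
    }

    void submit(Req request) {
        queue.addLast(request);
    }

    // Plays the role of BaseSeimiCrawler.handleErrorRequest(Request): here it only records the request.
    void handleErrorRequest(Req request) {
        errorRequests.add(request);
    }

    /**
     * Drains the queue. {@code process} returns true on success and false (or throws)
     * when the request hits an unexpected error; network-level retries are assumed to
     * have happened already and are not modeled.
     */
    void run(Predicate<Req> process) {
        while (!queue.isEmpty()) {
            Req request = queue.pollFirst();
            boolean ok;
            try {
                ok = process.test(request);
            } catch (RuntimeException e) {
                ok = false;
            }
            if (ok) {
                continue;
            }
            if (request.requeueCount < maxRequeues) {
                request.requeueCount++;
                queue.addLast(request); // back to the end of the queue to wait its turn again
            } else {
                handleErrorRequest(request); // limit reached: hand over to the error hook
            }
        }
    }
}
```

Putting the failed request at the back of the queue, rather than retrying it immediately, means other requests (and any configured delay) run in between, which is what gives a rate-limited site time to recover.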
v0.2.4
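Enhanced automatic redirects: in addition to 301 and 302, page redirects done via meta refresh are now recognized
Response now provides getRealUrl() to obtain the actual URL of the content after redirects and refresh hops
The 'useUnrepeated' attribute of the @Crawler annotation controls whether the system-level deduplication mechanism is enabled; it is enabled by default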
v0.2.3
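Support for custom dynamic proxies
Developers can override BaseSeimiCrawler.proxy() to decide which proxy each request uses. If this method is overridden and returns a valid proxy address, the proxy attribute of @Crawler is ignored.
Added demos for dynamic proxies and dynamic User-Agent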
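The override rule for proxies can be sketched as follows. `ProxyCrawlerSketch`, `RotatingProxyCrawler` and `effectiveProxy` are hypothetical names. The `scheme://host:port` check is an assumed definition of a "valid" address, not the library's actual validation, and round-robin rotation is just one possible strategy.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.regex.Pattern;

// Hypothetical sketch: a dynamic proxy hook that overrides a static configured proxy.
class ProxyCrawlerSketch {
    // Assumed format for a valid proxy address: scheme://host:port
    private static final Pattern PROXY_FORMAT =
            Pattern.compile("^(https?|socket)://[^:/\\s]+:\\d{1,5}$");

    private final String annotationProxy; // stands in for the proxy attribute of @Crawler

    ProxyCrawlerSketch(String annotationProxy) {
        this.annotationProxy = annotationProxy;
    }

    /** Overridable hook, analogous in role to BaseSeimiCrawler.proxy(); null means no opinion. */
    String proxy() {
        return null;
    }

    /** Proxy actually used for the next request. */
    final String effectiveProxy() {
        String dynamic = proxy();
        if (dynamic != null && PROXY_FORMAT.matcher(dynamic).matches()) {
            return dynamic; // a valid dynamic proxy wins over the static setting
        }
        return annotationProxy; // otherwise fall back to the static setting
    }
}

/** Example override: rotate round-robin through a fixed pool of (hypothetical) proxies. */
class RotatingProxyCrawler extends ProxyCrawlerSketch {
    private final List<String> pool;
    private final AtomicInteger next = new AtomicInteger();

    RotatingProxyCrawler(String annotationProxy, List<String> pool) {
        super(annotationProxy);
        this.pool = new ArrayList<>(pool);
    }

    @Override
    String proxy() {
        if (pool.isEmpty()) {
            return null;
        }
        // floorMod keeps the index valid even after the counter overflows
        return pool.get(Math.floorMod(next.getAndIncrement(), pool.size()));
    }
}
```

The AtomicInteger keeps the rotation safe when several crawler threads ask for a proxy concurrently.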
v0.2.2
Improved encoding detection and compatibility for non-standard web pages
v0.2.1
Optimized the regex-based blacklist/whitelist filtering mechanism
v0.2.0
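Added support for submitting Requests in JSON format through the embedded HTTP service API
Added customizable allowRules and denyRules for validating request URLs, i.e. whitelist and blacklist rules, both written as regular expressions. Both default to null, in which case no check is performed
Added unified validation of Request objects
Added support for setting a delay between requests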
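URL checking with allowRules and denyRules can be sketched with java.util.regex. `UrlRuleFilter` and `accept` are hypothetical names. Two behaviors are assumptions rather than documented SeimiCrawler semantics: a deny match takes precedence over an allow match, and a rule matches if it is found anywhere in the URL (`find()` rather than a full `matches()`).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of whitelist/blacklist URL filtering with regular expressions.
class UrlRuleFilter {
    private final List<Pattern> allowRules; // null = no whitelist check
    private final List<Pattern> denyRules;  // null = no blacklist check

    UrlRuleFilter(List<String> allow, List<String> deny) {
        this.allowRules = compile(allow);
        this.denyRules = compile(deny);
    }

    // Compile once up front so each URL check only runs the matchers.
    private static List<Pattern> compile(List<String> regexes) {
        if (regexes == null) {
            return null;
        }
        List<Pattern> out = new ArrayList<>();
        for (String r : regexes) {
            out.add(Pattern.compile(r));
        }
        return out;
    }

    /** Returns true if the URL may be crawled. */
    boolean accept(String url) {
        if (denyRules != null) {
            for (Pattern p : denyRules) {
                if (p.matcher(url).find()) {
                    return false; // blacklist hit (assumed to win over the whitelist)
                }
            }
        }
        if (allowRules != null) {
            for (Pattern p : allowRules) {
                if (p.matcher(url).find()) {
                    return true;
                }
            }
            return false; // a whitelist exists but nothing matched
        }
        return true; // both rule sets null: no check, as in the default
    }
}
```

Anchoring whitelist patterns with `^` avoids accidentally accepting URLs that merely mention the allowed domain in a query string.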
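Introduction
SeimiCrawler is an agile, independently deployable Java crawler framework that supports distributed crawling. It aims to lower, as far as possible, the barrier for newcomers to build a crawler system that is highly available and performs well, and to make developing crawler systems more efficient. In the world of SeimiCrawler, most people only need to write the business logic for scraping; Seimi takes care of the rest. Its design was heavily inspired by Scrapy, the Python crawler framework, while incorporating the characteristics of the Java language and the features of Spring. It also aims to make the more efficient XPath approach to parsing HTML easier to use and more widespread in China, so SeimiCrawler's default HTML parser is JsoupXpath (a separate extension project, not part of jsoup), and by default all HTML data extraction is done with XPath (you can of course choose another parser for data processing).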
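Community discussion
If you have questions or suggestions, you can now discuss them on the mailing list below. Before posting for the first time, you need to subscribe and wait for approval (this mainly serves to block advertising and spam).
Subscribe: send an email to seimicrawler+subscribe@googlegroups.com
Post: send an email to seimicrawler@googlegroups.com
Unsubscribe: send an email to seimicrawler+unsubscribe@googlegroups.com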
Project source code
If you find this project useful, feel free to give it a star on GitHub; I certainly wouldn't mind.
Project details: https://github.com/zhegexiaohuozi/SeimiCrawler
Source: OSChina community

