
SeimiCrawler v1.2.0 Released, a Java Crawler Framework

Published: 2016-07-22 09:26:27  Source: Honglian  Author: baihuo
SeimiCrawler v1.2.0 has been released.

Changelog

v1.2.0

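OkhttpDownloader now handles Chinese pages whose Content-Type header does not specify a character encoding.

The HTTP request timeout can now be customized through the httpTimeOut attribute of the @Crawler annotation; the default is 15000 ms.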
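
The first v1.2.0 item concerns pages whose Content-Type header carries no charset. A rough sketch of one way such a fallback can work is shown below. It is not OkhttpDownloader's actual code: the class name CharsetFallback, the methods detect and decode, and the fallback order (header, then the page's meta tag, then UTF-8) are all assumptions made for illustration, and only JDK classes are used.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of SeimiCrawler: picks a charset for a response body.
class CharsetFallback {
    // Matches "charset=xxx" inside a Content-Type header value.
    private static final Pattern HEADER_CHARSET = Pattern.compile(
            "charset\\s*=\\s*[\"']?([A-Za-z0-9][\\w.:+-]*)", Pattern.CASE_INSENSITIVE);
    // Matches both <meta charset="xxx"> and <meta ... content="text/html; charset=xxx">.
    private static final Pattern META_CHARSET = Pattern.compile(
            "<meta[^>]*?charset\\s*=\\s*[\"']?([A-Za-z0-9][\\w.:+-]*)", Pattern.CASE_INSENSITIVE);

    static Charset detect(String contentType, byte[] body) {
        // 1. Trust the header when it names a charset this JVM supports.
        if (contentType != null) {
            Matcher header = HEADER_CHARSET.matcher(contentType);
            if (header.find() && Charset.isSupported(header.group(1))) {
                return Charset.forName(header.group(1));
            }
        }
        // 2. Otherwise scan the first bytes of the page for a meta declaration.
        //    ISO-8859-1 maps every byte to one char, so the ASCII markup stays readable.
        int prefixLength = Math.min(body.length, 4096);
        String head = new String(body, 0, prefixLength, StandardCharsets.ISO_8859_1);
        Matcher meta = META_CHARSET.matcher(head);
        if (meta.find() && Charset.isSupported(meta.group(1))) {
            return Charset.forName(meta.group(1));
        }
        // 3. Assumed last resort: UTF-8.
        return StandardCharsets.UTF_8;
    }

    static String decode(String contentType, byte[] body) {
        return new String(body, detect(contentType, body));
    }

    public static void main(String[] args) {
        // A GBK page served with a bare "text/html" header (requires GBK support in the JDK).
        String html = "<html><head><meta charset=\"GBK\"><title>\u722c\u866b</title></head></html>";
        byte[] page = html.getBytes(Charset.forName("GBK"));
        System.out.println(detect("text/html", page));
        System.out.println(decode("text/html", page).equals(html));
    }
}
```

Decoding the prefix as ISO-8859-1 is only a trick for reading the ASCII markup before the real encoding is known; the full body is decoded again once a charset has been chosen.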

v1.1.0

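More complex initial requests can now be set up by implementing SeimiCrawler's List startRequests(); method.

SemiQueue implementations are loaded on demand.

Fixed a problem that occurred when trying to match meta refresh in responses that contain file-type data.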

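Introduction

SeimiCrawler is an agile, independently deployable Java crawler framework with support for distributed operation. It aims to lower, as far as possible, the barrier for newcomers to build a crawler system that is highly available and performs reasonably well, and to make crawler development more efficient. In the world of SeimiCrawler, most people only need to write the business logic of the crawl; Seimi takes care of the rest. Its design is inspired by Scrapy, the Python crawler framework, while also drawing on the characteristics of the Java language and the features of Spring. The project also hopes to make the more efficient XPath approach to parsing HTML more convenient and more widely used in China, so SeimiCrawler's default HTML parser is JsoupXpath (an independent extension project, not bundled with jsoup), and by default all HTML data extraction is done with XPath (other parsers can of course be chosen for data processing). Combined with SeimiAgent, it also addresses the rendering and crawling of complex dynamic pages.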
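
To give a feel for XPath-based extraction, here is a small sketch that pulls link targets and link text out of a page. It deliberately uses the JDK's built-in javax.xml.xpath rather than JsoupXpath, so it only accepts well-formed XHTML, whereas JsoupXpath is meant for real-world HTML. The class XPathExtractSketch, the method select, and the sample markup are made up for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper using the JDK XPath engine as a stand-in for JsoupXpath.
class XPathExtractSketch {
    // Evaluates an XPath expression and returns the text value of every matched node.
    static List<String> select(String xhtml, String expression) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xhtml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                // For attribute nodes this is the attribute value, for text nodes the text.
                values.add(nodes.item(i).getTextContent());
            }
            return values;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not evaluate: " + expression, e);
        }
    }

    public static void main(String[] args) {
        String page = "<html><body>"
                + "<a class=\"title\" href=\"/post/1\">First post</a>"
                + "<a class=\"title\" href=\"/post/2\">Second post</a>"
                + "<a href=\"/about\">About</a>"
                + "</body></html>";
        System.out.println(select(page, "//a[@class='title']/@href"));  // [/post/1, /post/2]
        System.out.println(select(page, "//a[@class='title']/text()")); // [First post, Second post]
    }
}
```

The two expressions show the usual pattern in a crawl callback: one query collects the URLs to follow, another collects the data to keep.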

Software details: http://seimicrawler.org/

Download: https://github.com/zhegexiaohuozi/SeimiCrawler

From: OSChina community