Scrapy 1.2.2 Released: A Web Crawling Framework

Published: 2016-12-07 09:13:30 | Source: 红联 | Author: baihuo
Scrapy 1.2.2 has been released.

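Scrapy is a web crawling framework written in pure Python and built on Twisted, an asynchronous networking framework. Users only need to customize a few modules to build a crawler that scrapes web page content and images.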

Changes:

Bug fixes

Fix a cryptic traceback when a pipeline fails on open_spider()

Fix embedded IPython shell variables (fixing issue 396 that re-appeared in 1.2.0)

A couple of patches when dealing with robots.txt:

handle (non-standard) relative sitemap URLs

handle non-ASCII URLs and User-Agents in Python 2
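
As an illustration of the relative sitemap URL fix above, the sketch below resolves Sitemap entries from a robots.txt body against the URL of the robots.txt file itself. It is a minimal sketch using only the Python 3 standard library, not Scrapy's own code; the function name resolve_sitemap_urls and the example.com URLs are hypothetical.

```python
from urllib.parse import urljoin


def resolve_sitemap_urls(robots_text, robots_url):
    """Return absolute sitemap URLs listed in a robots.txt body.

    Hypothetical helper for illustration; not part of Scrapy's API.
    """
    urls = []
    for line in robots_text.splitlines():
        stripped = line.strip()
        # Directive names in robots.txt are matched case-insensitively here.
        if stripped.lower().startswith("sitemap:"):
            value = stripped.split(":", 1)[1].strip()
            if value:
                # urljoin keeps absolute URLs unchanged and resolves
                # relative ones against the robots.txt location.
                urls.append(urljoin(robots_url, value))
    return urls
```

Calling it with a body that contains "Sitemap: /sitemap.xml" and the robots URL https://example.com/robots.txt yields https://example.com/sitemap.xml, while an entry that is already absolute passes through unchanged.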

Documentation

Document "download_latency" key in Request's meta dict

Remove page on (deprecated & unsupported) Ubuntu packages from ToC

A few fixed typos (issue 2346, issue 2369, issue 2380) and clarifications
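
To show how the newly documented "download_latency" meta key might be read, here is a minimal sketch of a helper that a spider callback could call. It assumes only that the response object exposes url and meta attributes, as Scrapy responses do; the helper name describe_latency and the message format are hypothetical.

```python
def describe_latency(response):
    """Summarize how long a response took to download.

    Hypothetical helper: expects an object with `url` and `meta`
    attributes, such as a Scrapy Response inside a spider callback.
    """
    # The key is only present once the response has been downloaded,
    # so fall back gracefully when it is missing.
    latency = response.meta.get("download_latency")
    if latency is None:
        return "download latency unavailable for {}".format(response.url)
    return "fetched {} in {:.3f}s".format(response.url, latency)
```

Inside a callback such as parse(self, response), the returned string could be passed to self.logger.info() to spot slow pages.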

Other changes

Advertise conda-forge as Scrapy’s official conda channel

More helpful error messages when trying to use .css() or .xpath() on non-Text Responses

startproject command now generates a sample middlewares.py file

Add version info for more dependencies to scrapy version -v (verbose) output

Remove all *.pyc files from source distribution

Release notes: https://doc.scrapy.org/en/1.2/news.html

Download: https://github.com/scrapy/scrapy/archive/1.2.2.zip

Via: OSChina (开源中国社区)