Scrapy
2.5
Index
_ | A | B | C | D | E | F | G | H | I | J | L | M | N | O | P | Q | R | S | T | U | V | W | X | Modules
_
__bool__() (scrapy.selector.Selector method)
__init__()
A
adapt_response() (scrapy.spiders.XMLFeedSpider method)
adjust_request_args() (scrapy.contracts.Contract method)
AJAXCRAWL_ENABLED
setting
AjaxCrawlMiddleware (class in scrapy.downloadermiddlewares.ajaxcrawl)
allowed() (scrapy.robotstxt.RobotParser method)
allowed_domains (scrapy.spiders.scrapy.Spider attribute)
ASYNCIO_EVENT_LOOP
setting
attrib (scrapy.selector.Selector attribute)
(scrapy.selector.SelectorList attribute)
attributes (scrapy.http.JsonRequest attribute)
(scrapy.http.Request attribute)
(scrapy.http.Response attribute)
(scrapy.http.TextResponse attribute)
AUTOTHROTTLE_DEBUG
setting
AUTOTHROTTLE_ENABLED
setting
AUTOTHROTTLE_MAX_DELAY
setting
AUTOTHROTTLE_START_DELAY
setting
AUTOTHROTTLE_TARGET_CONCURRENCY
setting
AWS_ACCESS_KEY_ID
setting
AWS_ENDPOINT_URL
setting
AWS_REGION_NAME
setting
AWS_SECRET_ACCESS_KEY
setting
AWS_SESSION_TOKEN
setting
AWS_USE_SSL
setting
AWS_VERIFY
setting
B
BaseItemExporter (class in scrapy.exporters)
BaseSettings (class in scrapy.settings)
bench
command
bindaddress
reqmeta
body (scrapy.http.Request attribute)
(scrapy.http.Response attribute)
BOT_NAME
setting
bytes_received
signal
bytes_received() (in module scrapy.signals)
Bz2Plugin (class in scrapy.extensions.postprocessing)
C
CacheStorage (class in scrapy.extensions.httpcache)
CallbackKeywordArgumentsContract (class in scrapy.contracts.default)
cb_kwargs (scrapy.http.Request attribute)
(scrapy.http.Response attribute)
certificate (scrapy.http.Response attribute)
check
command
clear_stats() (scrapy.statscollectors.StatsCollector method)
close()
close_spider()
(scrapy.extensions.httpcache.CacheStorage method)
(scrapy.statscollectors.StatsCollector method)
closed() (scrapy.spiders.scrapy.Spider method)
CloseSpider
(class in scrapy.extensions.closespider)
CLOSESPIDER_ERRORCOUNT
setting
CLOSESPIDER_ITEMCOUNT
setting
CLOSESPIDER_PAGECOUNT
setting
CLOSESPIDER_TIMEOUT
setting
command
bench
check
crawl
edit
fetch
genspider
list
parse
runspider
settings
shell
startproject
version
view
COMMANDS_MODULE
setting
COMPRESSION_ENABLED
setting
CONCURRENT_ITEMS
setting
CONCURRENT_REQUESTS
setting
CONCURRENT_REQUESTS_PER_DOMAIN
setting
CONCURRENT_REQUESTS_PER_IP
setting
Contract (class in scrapy.contracts)
ContractFail (class in scrapy.exceptions)
cookiejar
reqmeta
COOKIES_DEBUG
setting
COOKIES_ENABLED
setting
CookiesMiddleware (class in scrapy.downloadermiddlewares.cookies)
copy() (scrapy.http.Request method)
(scrapy.http.Response method)
(scrapy.scrapy.Item.Item method)
(scrapy.settings.BaseSettings method)
copy_to_dict() (scrapy.settings.BaseSettings method)
CoreStats (class in scrapy.extensions.corestats)
crawl
command
crawl() (scrapy.crawler.Crawler method)
crawled() (scrapy.logformatter.LogFormatter method)
Crawler (class in scrapy.crawler)
crawler (scrapy.spiders.scrapy.Spider attribute)
CrawlSpider (class in scrapy.spiders)
css() (scrapy.http.TextResponse method)
(scrapy.selector.Selector method)
(scrapy.selector.SelectorList method)
CSVFeedSpider (class in scrapy.spiders)
CsvItemExporter (class in scrapy.exporters)
curl_to_request_kwargs() (in module scrapy.utils.curl)
custom_settings (scrapy.spiders.scrapy.Spider attribute)
D
DbmCacheStorage (class in scrapy.extensions.httpcache)
Debugger (class in scrapy.extensions.debug)
deepcopy() (scrapy.scrapy.Item.Item method)
DEFAULT_ITEM_CLASS
setting
DEFAULT_REQUEST_HEADERS
setting
DefaultHeadersMiddleware (class in scrapy.downloadermiddlewares.defaultheaders)
DefaultReferrerPolicy (class in scrapy.spidermiddlewares.referer)
deferred_to_future() (in module scrapy.utils.defer)
delimiter (scrapy.spiders.CSVFeedSpider attribute)
DEPTH_LIMIT
setting
DEPTH_PRIORITY
setting
DEPTH_STATS_VERBOSE
setting
DepthMiddleware (class in scrapy.spidermiddlewares.depth)
DNS_RESOLVER
setting
DNS_TIMEOUT
setting
DNSCACHE_ENABLED
setting
DNSCACHE_SIZE
setting
dont_cache
reqmeta
dont_merge_cookies
reqmeta
dont_obey_robotstxt
reqmeta
dont_redirect
reqmeta
dont_retry
reqmeta
DontCloseSpider
DOWNLOAD_DELAY
setting
download_error() (scrapy.logformatter.LogFormatter method)
DOWNLOAD_FAIL_ON_DATALOSS
setting
download_fail_on_dataloss
reqmeta
DOWNLOAD_HANDLERS
setting
DOWNLOAD_HANDLERS_BASE
setting
download_latency
reqmeta
DOWNLOAD_MAXSIZE
setting
download_maxsize
reqmeta
DOWNLOAD_TIMEOUT
setting
download_timeout
reqmeta
DOWNLOAD_WARNSIZE
setting
DOWNLOADER
setting
DOWNLOADER_CLIENT_TLS_CIPHERS
setting
DOWNLOADER_CLIENT_TLS_METHOD
setting
DOWNLOADER_CLIENT_TLS_VERBOSE_LOGGING
setting
DOWNLOADER_CLIENTCONTEXTFACTORY
setting
DOWNLOADER_HTTPCLIENTFACTORY
setting
DOWNLOADER_MIDDLEWARES
setting
DOWNLOADER_MIDDLEWARES_BASE
setting
DOWNLOADER_STATS
setting
DownloaderMiddleware (class in scrapy.downloadermiddlewares)
DownloaderStats (class in scrapy.downloadermiddlewares.stats)
DownloadTimeoutMiddleware (class in scrapy.downloadermiddlewares.downloadtimeout)
DropItem
dropped() (scrapy.logformatter.LogFormatter method)
DummyPolicy (class in scrapy.extensions.httpcache)
DummyStatsCollector (class in scrapy.statscollectors)
DUPEFILTER_CLASS
setting
DUPEFILTER_DEBUG
setting
E
edit
command
EDITOR
setting
encoding (scrapy.exporters.BaseItemExporter attribute)
(scrapy.http.TextResponse attribute)
engine (scrapy.crawler.Crawler attribute)
engine_started
signal
engine_started() (in module scrapy.signals)
engine_stopped
signal
engine_stopped() (in module scrapy.signals)
export_empty_fields (scrapy.exporters.BaseItemExporter attribute)
export_item() (scrapy.exporters.BaseItemExporter method)
EXTENSIONS
setting
extensions (scrapy.crawler.Crawler attribute)
EXTENSIONS_BASE
setting
extract_links() (scrapy.linkextractors.lxmlhtml.LxmlLinkExtractor method)
F
FEED_EXPORT_BATCH_ITEM_COUNT
setting
FEED_EXPORT_ENCODING
setting
FEED_EXPORT_FIELDS
setting
FEED_EXPORT_INDENT
setting
FEED_EXPORTERS
setting
FEED_EXPORTERS_BASE
setting
FEED_STORAGE_FTP_ACTIVE
setting
FEED_STORAGE_GCS_ACL
setting
FEED_STORAGE_S3_ACL
setting
FEED_STORAGES
setting
FEED_STORAGES_BASE
setting
FEED_STORE_EMPTY
setting
FEED_TEMPDIR
setting
FEED_URI_PARAMS
setting
FEEDS
setting
fetch
command
fields (scrapy.item.scrapy.Item attribute)
fields_to_export (scrapy.exporters.BaseItemExporter attribute)
file_path() (scrapy.pipelines.files.FilesPipeline method)
(scrapy.pipelines.images.ImagesPipeline method)
FILES_EXPIRES
setting
FILES_RESULT_FIELD
setting
FILES_STORE
setting
FILES_STORE_GCS_ACL
setting
FILES_STORE_S3_ACL
setting
FILES_URLS_FIELD
setting
FilesPipeline (class in scrapy.pipelines.files)
FilesystemCacheStorage (class in scrapy.extensions.httpcache)
find_by_request() (scrapy.spiderloader.SpiderLoader method)
finish_exporting() (scrapy.exporters.BaseItemExporter method)
flags (scrapy.http.Response attribute)
follow() (scrapy.http.Response method)
(scrapy.http.TextResponse method)
follow_all() (scrapy.http.Response method)
(scrapy.http.TextResponse method)
freeze() (scrapy.settings.BaseSettings method)
from_crawler()
(scrapy.downloadermiddlewares.DownloaderMiddleware method)
(scrapy.robotstxt.RobotParser class method)
(scrapy.spidermiddlewares.SpiderMiddleware method)
(scrapy.spiders.scrapy.Spider method)
from_curl() (scrapy.http.Request class method)
from_response() (scrapy.http.scrapy.FormRequest.FormRequest class method)
from_settings() (scrapy.mail.MailSender class method)
(scrapy.spiderloader.SpiderLoader method)
frozencopy() (scrapy.settings.BaseSettings method)
FTP_PASSIVE_MODE
setting
FTP_PASSWORD
setting
ftp_password
reqmeta
FTP_USER
setting
ftp_user
reqmeta
G
GCS_PROJECT_ID
setting
genspider
command
get() (scrapy.selector.Selector method)
(scrapy.selector.SelectorList method)
(scrapy.settings.BaseSettings method)
get_media_requests() (scrapy.pipelines.files.FilesPipeline method)
(scrapy.pipelines.images.ImagesPipeline method)
get_oldest() (in module scrapy.utils.trackref)
get_settings_priority() (in module scrapy.settings)
get_stats() (scrapy.statscollectors.StatsCollector method)
get_value() (scrapy.statscollectors.StatsCollector method)
getall() (scrapy.selector.Selector method)
(scrapy.selector.SelectorList method)
getbool() (scrapy.settings.BaseSettings method)
getdict() (scrapy.settings.BaseSettings method)
getfloat() (scrapy.settings.BaseSettings method)
getint() (scrapy.settings.BaseSettings method)
getlist() (scrapy.settings.BaseSettings method)
getpriority() (scrapy.settings.BaseSettings method)
getwithbase() (scrapy.settings.BaseSettings method)
GzipPlugin (class in scrapy.extensions.postprocessing)
H
handle_httpstatus_all
reqmeta
handle_httpstatus_list
reqmeta
headers (scrapy.http.Request attribute)
(scrapy.http.Response attribute)
(scrapy.spiders.CSVFeedSpider attribute)
headers_received
signal
headers_received() (in module scrapy.signals)
HtmlResponse (class in scrapy.http)
HttpAuthMiddleware (class in scrapy.downloadermiddlewares.httpauth)
HTTPCACHE_ALWAYS_STORE
setting
HTTPCACHE_DBM_MODULE
setting
HTTPCACHE_DIR
setting
HTTPCACHE_ENABLED
setting
HTTPCACHE_EXPIRATION_SECS
setting
HTTPCACHE_GZIP
setting
HTTPCACHE_IGNORE_HTTP_CODES
setting
HTTPCACHE_IGNORE_MISSING
setting
HTTPCACHE_IGNORE_RESPONSE_CACHE_CONTROLS
setting
HTTPCACHE_IGNORE_SCHEMES
setting
HTTPCACHE_POLICY
setting
HTTPCACHE_STORAGE
setting
HttpCacheMiddleware (class in scrapy.downloadermiddlewares.httpcache)
HttpCompressionMiddleware (class in scrapy.downloadermiddlewares.httpcompression)
HTTPERROR_ALLOW_ALL
setting
HTTPERROR_ALLOWED_CODES
setting
HttpErrorMiddleware (class in scrapy.spidermiddlewares.httperror)
HTTPPROXY_AUTH_ENCODING
setting
HTTPPROXY_ENABLED
setting
HttpProxyMiddleware (class in scrapy.downloadermiddlewares.httpproxy)
I
IgnoreRequest
IMAGES_EXPIRES
setting
IMAGES_MIN_HEIGHT
setting
IMAGES_MIN_WIDTH
setting
IMAGES_RESULT_FIELD
setting
IMAGES_STORE
setting
IMAGES_STORE_GCS_ACL
setting
IMAGES_STORE_S3_ACL
setting
IMAGES_THUMBS
setting
IMAGES_URLS_FIELD
setting
ImagesPipeline (class in scrapy.pipelines.images)
inc_value() (scrapy.statscollectors.StatsCollector method)
indent (scrapy.exporters.BaseItemExporter attribute)
install_reactor() (in module scrapy.utils.reactor)
ip_address (scrapy.http.Response attribute)
item_completed() (scrapy.pipelines.files.FilesPipeline method)
(scrapy.pipelines.images.ImagesPipeline method)
item_dropped
signal
item_dropped() (in module scrapy.signals)
item_error
signal
item_error() (scrapy.logformatter.LogFormatter method)
(in module scrapy.signals)
ITEM_PIPELINES
setting
ITEM_PIPELINES_BASE
setting
item_scraped
signal
item_scraped() (in module scrapy.signals)
ItemMeta (class in scrapy.item)
iter_all() (in module scrapy.utils.trackref)
iterator (scrapy.spiders.XMLFeedSpider attribute)
itertag (scrapy.spiders.XMLFeedSpider attribute)
J
JOBDIR
setting
json() (scrapy.http.TextResponse method)
JsonItemExporter (class in scrapy.exporters)
JsonLinesItemExporter (class in scrapy.exporters)
JsonRequest (class in scrapy.http)
L
Link (class in scrapy.link)
list
command
list() (scrapy.spiderloader.SpiderLoader method)
load() (scrapy.spiderloader.SpiderLoader method)
log() (scrapy.spiders.scrapy.Spider method)
LOG_DATEFORMAT
setting
LOG_ENABLED
setting
LOG_ENCODING
setting
LOG_FILE
setting
LOG_FILE_APPEND
setting
LOG_FORMAT
setting
LOG_FORMATTER
setting
LOG_LEVEL
setting
LOG_SHORT_NAMES
setting
LOG_STDOUT
setting
LogFormatter (class in scrapy.logformatter)
logger (scrapy.spiders.scrapy.Spider attribute)
LogStats (class in scrapy.extensions.logstats)
LOGSTATS_INTERVAL
setting
LxmlLinkExtractor (class in scrapy.linkextractors.lxmlhtml)
LZMAPlugin (class in scrapy.extensions.postprocessing)
M
MAIL_FROM
setting
MAIL_HOST
setting
MAIL_PASS
setting
MAIL_PORT
setting
MAIL_SSL
setting
MAIL_TLS
setting
MAIL_USER
setting
MailSender (class in scrapy.mail)
max_retry_times
reqmeta
max_value() (scrapy.statscollectors.StatsCollector method)
maxpriority() (scrapy.settings.BaseSettings method)
maybe_deferred_to_future() (in module scrapy.utils.defer)
MEDIA_ALLOW_REDIRECTS
setting
MEMDEBUG_ENABLED
setting
MEMDEBUG_NOTIFY
setting
MemoryDebugger (class in scrapy.extensions.memdebug)
MemoryStatsCollector (class in scrapy.statscollectors)
MemoryUsage (class in scrapy.extensions.memusage)
MEMUSAGE_CHECK_INTERVAL_SECONDS
setting
MEMUSAGE_ENABLED
setting
MEMUSAGE_LIMIT_MB
setting
MEMUSAGE_NOTIFY_MAIL
setting
MEMUSAGE_WARNING_MB
setting
meta (scrapy.http.Request attribute)
(scrapy.http.Response attribute)
METAREFRESH_ENABLED
setting
METAREFRESH_IGNORE_TAGS
setting
METAREFRESH_MAXDELAY
setting
MetaRefreshMiddleware (class in scrapy.downloadermiddlewares.redirect)
method (scrapy.http.Request attribute)
min_value() (scrapy.statscollectors.StatsCollector method)
N
name (scrapy.spiders.scrapy.Spider attribute)
namespaces (scrapy.spiders.XMLFeedSpider attribute)
NEWSPIDER_MODULE
setting
NoReferrerPolicy (class in scrapy.spidermiddlewares.referer)
NoReferrerWhenDowngradePolicy (class in scrapy.spidermiddlewares.referer)
NotConfigured
NotSupported
O
object_ref (class in scrapy.utils.trackref)
OffsiteMiddleware (class in scrapy.spidermiddlewares.offsite)
open_spider()
(scrapy.extensions.httpcache.CacheStorage method)
(scrapy.statscollectors.StatsCollector method)
OriginPolicy (class in scrapy.spidermiddlewares.referer)
OriginWhenCrossOriginPolicy (class in scrapy.spidermiddlewares.referer)
P
parse
command
parse() (scrapy.spiders.scrapy.Spider method)
parse_node() (scrapy.spiders.XMLFeedSpider method)
parse_row() (scrapy.spiders.CSVFeedSpider method)
parse_start_url() (scrapy.spiders.CrawlSpider method)
PickleItemExporter (class in scrapy.exporters)
post_process() (scrapy.contracts.Contract method)
PprintItemExporter (class in scrapy.exporters)
pre_process() (scrapy.contracts.Contract method)
print_live_refs() (in module scrapy.utils.trackref)
process_exception() (scrapy.downloadermiddlewares.DownloaderMiddleware method)
process_item()
process_request() (scrapy.downloadermiddlewares.DownloaderMiddleware method)
process_response() (scrapy.downloadermiddlewares.DownloaderMiddleware method)
process_results() (scrapy.spiders.XMLFeedSpider method)
process_spider_exception() (scrapy.spidermiddlewares.SpiderMiddleware method)
process_spider_input() (scrapy.spidermiddlewares.SpiderMiddleware method)
process_spider_output() (scrapy.spidermiddlewares.SpiderMiddleware method)
process_start_requests() (scrapy.spidermiddlewares.SpiderMiddleware method)
protocol (scrapy.http.Response attribute)
proxy
reqmeta
Python Enhancement Proposals
PEP 8, [1]
Q
quotechar (scrapy.spiders.CSVFeedSpider attribute)
R
RANDOMIZE_DOWNLOAD_DELAY
setting
re() (scrapy.selector.Selector method)
(scrapy.selector.SelectorList method)
re_first() (scrapy.selector.Selector method)
(scrapy.selector.SelectorList method)
REACTOR_THREADPOOL_MAXSIZE
setting
REDIRECT_ENABLED
setting
REDIRECT_MAX_TIMES
setting
REDIRECT_PRIORITY_ADJUST
setting
redirect_reasons
reqmeta
redirect_urls
reqmeta
RedirectMiddleware (class in scrapy.downloadermiddlewares.redirect)
REFERER_ENABLED
setting
RefererMiddleware (class in scrapy.spidermiddlewares.referer)
REFERRER_POLICY
setting
referrer_policy
reqmeta
register_namespace() (scrapy.selector.Selector method)
remove_namespaces() (scrapy.selector.Selector method)
replace() (scrapy.http.Request method)
(scrapy.http.Response method)
reqmeta
bindaddress
cookiejar
dont_cache
dont_merge_cookies
dont_obey_robotstxt
dont_redirect
dont_retry
download_fail_on_dataloss
download_latency
download_maxsize
download_timeout
ftp_password
ftp_user
handle_httpstatus_all
handle_httpstatus_list
max_retry_times
proxy
redirect_reasons
redirect_urls
referrer_policy
Request (class in scrapy.http)
request (scrapy.http.Response attribute)
request_dropped
signal
request_dropped() (in module scrapy.signals)
request_from_dict() (in module scrapy.utils.request)
request_left_downloader
signal
request_left_downloader() (in module scrapy.signals)
request_reached_downloader
signal
request_reached_downloader() (in module scrapy.signals)
request_scheduled
signal
request_scheduled() (in module scrapy.signals)
Response (class in scrapy.http)
response_downloaded
signal
response_downloaded() (in module scrapy.signals)
response_received
signal
response_received() (in module scrapy.signals)
retrieve_response() (scrapy.extensions.httpcache.CacheStorage method)
RETRY_ENABLED
setting
RETRY_HTTP_CODES
setting
RETRY_PRIORITY_ADJUST
setting
RETRY_TIMES
setting
RetryMiddleware (class in scrapy.downloadermiddlewares.retry)
ReturnsContract (class in scrapy.contracts.default)
RFC2616Policy (class in scrapy.extensions.httpcache)
RobotParser (class in scrapy.robotstxt)
ROBOTSTXT_OBEY
setting
ROBOTSTXT_PARSER
setting
ROBOTSTXT_USER_AGENT
setting
RobotsTxtMiddleware (class in scrapy.downloadermiddlewares.robotstxt)
Rule (class in scrapy.spiders)
rules (scrapy.spiders.CrawlSpider attribute)
runspider
command
S
SameOriginPolicy (class in scrapy.spidermiddlewares.referer)
SCHEDULER
setting
SCHEDULER_DEBUG
setting
SCHEDULER_DISK_QUEUE
setting
SCHEDULER_MEMORY_QUEUE
setting
SCHEDULER_PRIORITY_QUEUE
setting
scraped() (scrapy.logformatter.LogFormatter method)
SCRAPER_SLOT_MAX_ACTIVE_SIZE
setting
ScrapesContract (class in scrapy.contracts.default)
scrapy.contracts
module
scrapy.contracts.default
module
scrapy.core.scheduler
module
scrapy.crawler
module
scrapy.downloadermiddlewares
module
scrapy.downloadermiddlewares.ajaxcrawl
module
scrapy.downloadermiddlewares.cookies
module
scrapy.downloadermiddlewares.defaultheaders
module
scrapy.downloadermiddlewares.downloadtimeout
module
scrapy.downloadermiddlewares.httpauth
module
scrapy.downloadermiddlewares.httpcache
module
scrapy.downloadermiddlewares.httpcompression
module
scrapy.downloadermiddlewares.httpproxy
module
scrapy.downloadermiddlewares.redirect
module
scrapy.downloadermiddlewares.retry
module
scrapy.downloadermiddlewares.robotstxt
module
scrapy.downloadermiddlewares.stats
module
scrapy.downloadermiddlewares.useragent
module
scrapy.exceptions
module
scrapy.exporters
module
scrapy.extensions.closespider
module
scrapy.extensions.corestats
module
scrapy.extensions.debug
module
scrapy.extensions.httpcache
module
scrapy.extensions.logstats
module
scrapy.extensions.memdebug
module
scrapy.extensions.memusage
module
scrapy.extensions.statsmailer
module
scrapy.extensions.telnet
module
scrapy.Field (class in scrapy.item)
scrapy.FormRequest (class in scrapy.http)
scrapy.http
module
scrapy.http.FormRequest (class in scrapy.http)
scrapy.http.request.form.FormRequest (class in scrapy.http)
scrapy.item
module
scrapy.Item (class in scrapy.item)
scrapy.item.Field (class in scrapy.item)
scrapy.item.Item (class in scrapy.item)
scrapy.link
module
scrapy.linkextractors
module
scrapy.linkextractors.lxmlhtml
module
scrapy.loader
module
scrapy.mail
module
scrapy.pipelines.files
module
scrapy.pipelines.images
module
scrapy.robotstxt
module
scrapy.selector
module
scrapy.settings
module
scrapy.signals
module
scrapy.Spider (class in scrapy.spiders)
scrapy.spiderloader
module
scrapy.spidermiddlewares
module
scrapy.spidermiddlewares.depth
module
scrapy.spidermiddlewares.httperror
module
scrapy.spidermiddlewares.offsite
module
scrapy.spidermiddlewares.referer
module
scrapy.spidermiddlewares.urllength
module
scrapy.spiders
module
scrapy.spiders.Spider (class in scrapy.spiders)
scrapy.statscollectors
module
scrapy.utils.log
module
scrapy.utils.trackref
module
selector (scrapy.http.TextResponse attribute)
Selector (class in scrapy.selector)
SelectorList (class in scrapy.selector)
send() (scrapy.mail.MailSender method)
serialize_field() (scrapy.exporters.BaseItemExporter method)
set() (scrapy.settings.BaseSettings method)
set_stats() (scrapy.statscollectors.StatsCollector method)
set_value() (scrapy.statscollectors.StatsCollector method)
set_xpathfunc() (in module parsel.xpathfuncs)
setmodule() (scrapy.settings.BaseSettings method)
setting
AJAXCRAWL_ENABLED
ASYNCIO_EVENT_LOOP
AUTOTHROTTLE_DEBUG
AUTOTHROTTLE_ENABLED
AUTOTHROTTLE_MAX_DELAY
AUTOTHROTTLE_START_DELAY
AUTOTHROTTLE_TARGET_CONCURRENCY
AWS_ACCESS_KEY_ID
AWS_ENDPOINT_URL
AWS_REGION_NAME
AWS_SECRET_ACCESS_KEY
AWS_SESSION_TOKEN
AWS_USE_SSL
AWS_VERIFY
BOT_NAME
CLOSESPIDER_ERRORCOUNT
CLOSESPIDER_ITEMCOUNT
CLOSESPIDER_PAGECOUNT
CLOSESPIDER_TIMEOUT
COMMANDS_MODULE
COMPRESSION_ENABLED
CONCURRENT_ITEMS
CONCURRENT_REQUESTS
CONCURRENT_REQUESTS_PER_DOMAIN
CONCURRENT_REQUESTS_PER_IP
COOKIES_DEBUG
COOKIES_ENABLED
DEFAULT_ITEM_CLASS
DEFAULT_REQUEST_HEADERS
DEPTH_LIMIT
DEPTH_PRIORITY
DEPTH_STATS_VERBOSE
DNS_RESOLVER
DNS_TIMEOUT
DNSCACHE_ENABLED
DNSCACHE_SIZE
DOWNLOAD_DELAY
DOWNLOAD_FAIL_ON_DATALOSS
DOWNLOAD_HANDLERS
DOWNLOAD_HANDLERS_BASE
DOWNLOAD_MAXSIZE
DOWNLOAD_TIMEOUT
DOWNLOAD_WARNSIZE
DOWNLOADER
DOWNLOADER_CLIENT_TLS_CIPHERS
DOWNLOADER_CLIENT_TLS_METHOD
DOWNLOADER_CLIENT_TLS_VERBOSE_LOGGING
DOWNLOADER_CLIENTCONTEXTFACTORY
DOWNLOADER_HTTPCLIENTFACTORY
DOWNLOADER_MIDDLEWARES
DOWNLOADER_MIDDLEWARES_BASE
DOWNLOADER_STATS
DUPEFILTER_CLASS
DUPEFILTER_DEBUG
EDITOR
EXTENSIONS
EXTENSIONS_BASE
FEED_EXPORT_BATCH_ITEM_COUNT
FEED_EXPORT_ENCODING
FEED_EXPORT_FIELDS
FEED_EXPORT_INDENT
FEED_EXPORTERS
FEED_EXPORTERS_BASE
FEED_STORAGE_FTP_ACTIVE
FEED_STORAGE_GCS_ACL
FEED_STORAGE_S3_ACL
FEED_STORAGES
FEED_STORAGES_BASE
FEED_STORE_EMPTY
FEED_TEMPDIR
FEED_URI_PARAMS
FEEDS
FILES_EXPIRES
FILES_RESULT_FIELD
FILES_STORE
FILES_STORE_GCS_ACL
FILES_STORE_S3_ACL
FILES_URLS_FIELD
FTP_PASSIVE_MODE
FTP_PASSWORD
FTP_USER
GCS_PROJECT_ID
HTTPCACHE_ALWAYS_STORE
HTTPCACHE_DBM_MODULE
HTTPCACHE_DIR
HTTPCACHE_ENABLED
HTTPCACHE_EXPIRATION_SECS
HTTPCACHE_GZIP
HTTPCACHE_IGNORE_HTTP_CODES
HTTPCACHE_IGNORE_MISSING
HTTPCACHE_IGNORE_RESPONSE_CACHE_CONTROLS
HTTPCACHE_IGNORE_SCHEMES
HTTPCACHE_POLICY
HTTPCACHE_STORAGE
HTTPERROR_ALLOW_ALL
HTTPERROR_ALLOWED_CODES
HTTPPROXY_AUTH_ENCODING
HTTPPROXY_ENABLED
IMAGES_EXPIRES
IMAGES_MIN_HEIGHT
IMAGES_MIN_WIDTH
IMAGES_RESULT_FIELD
IMAGES_STORE
IMAGES_STORE_GCS_ACL
IMAGES_STORE_S3_ACL
IMAGES_THUMBS
IMAGES_URLS_FIELD
ITEM_PIPELINES
ITEM_PIPELINES_BASE
JOBDIR
LOG_DATEFORMAT
LOG_ENABLED
LOG_ENCODING
LOG_FILE
LOG_FILE_APPEND
LOG_FORMAT
LOG_FORMATTER
LOG_LEVEL
LOG_SHORT_NAMES
LOG_STDOUT
LOGSTATS_INTERVAL
MAIL_FROM
MAIL_HOST
MAIL_PASS
MAIL_PORT
MAIL_SSL
MAIL_TLS
MAIL_USER
MEDIA_ALLOW_REDIRECTS
MEMDEBUG_ENABLED
MEMDEBUG_NOTIFY
MEMUSAGE_CHECK_INTERVAL_SECONDS
MEMUSAGE_ENABLED
MEMUSAGE_LIMIT_MB
MEMUSAGE_NOTIFY_MAIL
MEMUSAGE_WARNING_MB
METAREFRESH_ENABLED
METAREFRESH_IGNORE_TAGS
METAREFRESH_MAXDELAY
NEWSPIDER_MODULE
RANDOMIZE_DOWNLOAD_DELAY
REACTOR_THREADPOOL_MAXSIZE
REDIRECT_ENABLED
REDIRECT_MAX_TIMES
REDIRECT_PRIORITY_ADJUST
REFERER_ENABLED
REFERRER_POLICY
RETRY_ENABLED
RETRY_HTTP_CODES
RETRY_PRIORITY_ADJUST
RETRY_TIMES
ROBOTSTXT_OBEY
ROBOTSTXT_PARSER
ROBOTSTXT_USER_AGENT
SCHEDULER
SCHEDULER_DEBUG
SCHEDULER_DISK_QUEUE
SCHEDULER_MEMORY_QUEUE
SCHEDULER_PRIORITY_QUEUE
SCRAPER_SLOT_MAX_ACTIVE_SIZE
SPIDER_CONTRACTS
SPIDER_CONTRACTS_BASE
SPIDER_LOADER_CLASS
SPIDER_LOADER_WARN_ONLY
SPIDER_MIDDLEWARES
SPIDER_MIDDLEWARES_BASE
SPIDER_MODULES
STATS_CLASS
STATS_DUMP
STATSMAILER_RCPTS
TELNETCONSOLE_ENABLED
TELNETCONSOLE_HOST
TELNETCONSOLE_PASSWORD
TELNETCONSOLE_PORT
TELNETCONSOLE_USERNAME
TEMPLATES_DIR
TWISTED_REACTOR
URLLENGTH_LIMIT
USER_AGENT
settings
command
settings (scrapy.crawler.Crawler attribute)
Settings (class in scrapy.settings)
settings (scrapy.spiders.scrapy.Spider attribute)
SETTINGS_PRIORITIES() (in module scrapy.settings)
shell
command
signal
bytes_received
engine_started
engine_stopped
headers_received
item_dropped
item_error
item_scraped
request_dropped
request_left_downloader
request_reached_downloader
request_scheduled
response_downloaded
response_received
spider_closed
spider_error
spider_idle
spider_opened
update_telnet_vars
signals (scrapy.crawler.Crawler attribute)
sitemap_alternate_links (scrapy.spiders.SitemapSpider attribute)
sitemap_filter() (scrapy.spiders.SitemapSpider method)
sitemap_follow (scrapy.spiders.SitemapSpider attribute)
sitemap_rules (scrapy.spiders.SitemapSpider attribute)
sitemap_urls (scrapy.spiders.SitemapSpider attribute)
SitemapSpider (class in scrapy.spiders)
spider (scrapy.crawler.Crawler attribute)
spider_closed
signal
spider_closed() (in module scrapy.signals)
SPIDER_CONTRACTS
setting
SPIDER_CONTRACTS_BASE
setting
spider_error
signal
spider_error() (scrapy.logformatter.LogFormatter method)
(in module scrapy.signals)
spider_idle
signal
spider_idle() (in module scrapy.signals)
SPIDER_LOADER_CLASS
setting
SPIDER_LOADER_WARN_ONLY
setting
SPIDER_MIDDLEWARES
setting
SPIDER_MIDDLEWARES_BASE
setting
SPIDER_MODULES
setting
spider_opened
signal
spider_opened() (in module scrapy.signals)
spider_stats (scrapy.statscollectors.MemoryStatsCollector attribute)
SpiderLoader (class in scrapy.spiderloader)
SpiderMiddleware (class in scrapy.spidermiddlewares)
StackTraceDump (class in scrapy.extensions.debug)
start_exporting() (scrapy.exporters.BaseItemExporter method)
start_requests() (scrapy.spiders.scrapy.Spider method)
start_urls (scrapy.spiders.scrapy.Spider attribute)
startproject
command
state (scrapy.spiders.scrapy.Spider attribute)
stats (scrapy.crawler.Crawler attribute)
STATS_CLASS
setting
STATS_DUMP
setting
StatsCollector (class in scrapy.statscollectors)
StatsMailer (class in scrapy.extensions.statsmailer)
STATSMAILER_RCPTS
setting
status (scrapy.http.Response attribute)
StopDownload
store_response() (scrapy.extensions.httpcache.CacheStorage method)
StrictOriginPolicy (class in scrapy.spidermiddlewares.referer)
StrictOriginWhenCrossOriginPolicy (class in scrapy.spidermiddlewares.referer)
T
TelnetConsole (class in scrapy.extensions.telnet)
TELNETCONSOLE_ENABLED
setting
TELNETCONSOLE_HOST
setting
TELNETCONSOLE_PASSWORD
setting
TELNETCONSOLE_PORT
setting
TELNETCONSOLE_USERNAME
setting
TEMPLATES_DIR
setting
text (scrapy.http.TextResponse attribute)
TextResponse (class in scrapy.http)
to_dict() (scrapy.http.Request method)
TWISTED_REACTOR
setting
U
UnsafeUrlPolicy (class in scrapy.spidermiddlewares.referer)
update() (scrapy.settings.BaseSettings method)
update_telnet_vars
signal
update_telnet_vars() (in module scrapy.extensions.telnet)
uri_params() (in module scrapy.extensions.feedexport)
url (scrapy.http.Request attribute)
(scrapy.http.Response attribute)
UrlContract (class in scrapy.contracts.default)
urljoin() (scrapy.http.Response method)
URLLENGTH_LIMIT
setting
UrlLengthMiddleware (class in scrapy.spidermiddlewares.urllength)
USER_AGENT
setting
UserAgentMiddleware (class in scrapy.downloadermiddlewares.useragent)
V
version
command
view
command
W
write()
X
XMLFeedSpider (class in scrapy.spiders)
XmlItemExporter (class in scrapy.exporters)
XmlResponse (class in scrapy.http)
xpath() (scrapy.http.TextResponse method)
(scrapy.selector.Selector method)
(scrapy.selector.SelectorList method)
Modules
module
scrapy.contracts
scrapy.contracts.default
scrapy.core.scheduler
scrapy.crawler
scrapy.downloadermiddlewares
scrapy.downloadermiddlewares.ajaxcrawl
scrapy.downloadermiddlewares.cookies
scrapy.downloadermiddlewares.defaultheaders
scrapy.downloadermiddlewares.downloadtimeout
scrapy.downloadermiddlewares.httpauth
scrapy.downloadermiddlewares.httpcache
scrapy.downloadermiddlewares.httpcompression
scrapy.downloadermiddlewares.httpproxy
scrapy.downloadermiddlewares.redirect
scrapy.downloadermiddlewares.retry
scrapy.downloadermiddlewares.robotstxt
scrapy.downloadermiddlewares.stats
scrapy.downloadermiddlewares.useragent
scrapy.exceptions
scrapy.exporters
scrapy.extensions.closespider
scrapy.extensions.corestats
scrapy.extensions.debug
scrapy.extensions.httpcache
scrapy.extensions.logstats
scrapy.extensions.memdebug
scrapy.extensions.memusage
scrapy.extensions.statsmailer
scrapy.extensions.telnet
scrapy.http
scrapy.item
scrapy.link
scrapy.linkextractors
scrapy.linkextractors.lxmlhtml
scrapy.loader
scrapy.mail
scrapy.pipelines.files
scrapy.pipelines.images
scrapy.robotstxt
scrapy.selector
scrapy.settings
scrapy.signals
scrapy.spiderloader
scrapy.spidermiddlewares
scrapy.spidermiddlewares.depth
scrapy.spidermiddlewares.httperror
scrapy.spidermiddlewares.offsite
scrapy.spidermiddlewares.referer
scrapy.spidermiddlewares.urllength
scrapy.spiders
scrapy.statscollectors
scrapy.utils.log
scrapy.utils.trackref