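PycURL Quick Start
Retrieving A Network Resource
Once PycURL is installed, we can perform network operations. The simplest one is retrieving a resource by its URL. To issue a network request with PycURL, the following steps are required:
1. Create a pycurl.Curl instance.
2. Use setopt to set options.
3. Call perform to perform the operation.
Here is how we can retrieve a network resource in Python 3: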
import pycurl
import certifi
from io import BytesIO
buffer = BytesIO()
c = pycurl.Curl()
c.setopt(c.URL, 'http://pycurl.io/')
c.setopt(c.WRITEDATA, buffer)
c.setopt(c.CAINFO, certifi.where())
c.perform()
c.close()
body = buffer.getvalue()
# Body is a byte string.
# We have to know the encoding in order to print it to a text file
# such as standard output.
print(body.decode('iso-8859-1'))
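This code is available as examples/quickstart/get_python3.py. For a Python 2 only example, see examples/quickstart/get_python2.py. For an example targeting both Python 2 and 3, see examples/quickstart/get.py.
PycURL does not provide storage for the network response; that is the application's job. Therefore we must set up a buffer (a BytesIO object here) and instruct PycURL to write to that buffer.
Most existing PycURL code uses WRITEFUNCTION instead of WRITEDATA, as follows: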
c.setopt(c.WRITEFUNCTION, buffer.write)
While the WRITEFUNCTION idiom continues to work, it is no longer necessary. As of PycURL 7.19.3, WRITEDATA accepts any Python object with a write method.
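Because WRITEDATA only needs an object with a write method, the target does not have to be a buffer at all. The following is a minimal sketch, not one of the PycURL examples: the HashingWriter class and its attribute names are hypothetical, chosen for illustration. It computes a SHA-256 digest of the response body as it arrives instead of storing it.

```python
import hashlib


class HashingWriter(object):
    """Hypothetical write target: hashes the body instead of storing it."""

    def __init__(self):
        self._digest = hashlib.sha256()
        self.size = 0

    def write(self, data):
        # Each chunk of the response body is expected as a byte string.
        self._digest.update(data)
        self.size += len(data)
        # Report the whole chunk as consumed.
        return len(data)

    def hexdigest(self):
        return self._digest.hexdigest()


# The object can be exercised without any network access:
writer = HashingWriter()
writer.write(b'hello ')
writer.write(b'world')
print(writer.size, writer.hexdigest())

# Assumed usage with a Curl object:
#     c.setopt(c.WRITEDATA, writer)
```

Since nothing is buffered, memory use stays constant regardless of the size of the response.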
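Working With HTTPS
Most web sites today use HTTPS, which is HTTP over TLS/SSL. To take advantage of the security that HTTPS provides, PycURL needs a certificate bundle. Because certificates change over time, PycURL does not ship such a bundle; one may be supplied by your operating system, but if not, consider using the certifi Python package: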
import pycurl
import certifi
from io import BytesIO
buffer = BytesIO()
c = pycurl.Curl()
c.setopt(c.URL, 'https://python.org/')
c.setopt(c.WRITEDATA, buffer)
c.setopt(c.CAINFO, certifi.where())
c.perform()
c.close()
body = buffer.getvalue()
# Body is a byte string.
# We have to know the encoding in order to print it to a text file
# such as standard output.
print(body.decode('iso-8859-1'))
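This code is available as examples/quickstart/get_python3_https.py. For a Python 2 example, see examples/quickstart/get_python2_https.py.
Troubleshooting
When things don't work as expected, use libcurl's VERBOSE option to receive lots of debugging output pertaining to the request: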
c.setopt(c.VERBOSE, True)
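It is often helpful to compare the verbose output from the program using PycURL with that of the curl command line tool when the latter is invoked with the -v option: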
curl -v http://pycurl.io/
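When the verbose output needs to be captured rather than printed to standard error, PycURL also exposes libcurl's debug callback through the DEBUGFUNCTION option, which is called with a numeric message type and the message itself while VERBOSE is enabled. The sketch below is an illustration only: make_debug_collector and the label table are hypothetical names, and the numeric values are assumed to follow libcurl's curl_infotype enumeration.

```python
# Labels for the first five values of libcurl's curl_infotype enumeration.
INFOTYPE_LABELS = {
    0: 'text',
    1: 'header-in',
    2: 'header-out',
    3: 'data-in',
    4: 'data-out',
}


def make_debug_collector(log):
    """Return a callback that appends (label, message) tuples to log."""
    def debug_function(debug_type, debug_msg):
        label = INFOTYPE_LABELS.get(debug_type, 'other')
        # On Python 3 the message arrives as bytes; decode it leniently.
        if isinstance(debug_msg, bytes):
            debug_msg = debug_msg.decode('iso-8859-1')
        log.append((label, debug_msg.rstrip('\r\n')))
    return debug_function


log = []
callback = make_debug_collector(log)
# Simulate two messages the way the callback would receive them.
callback(0, b'Connected to pycurl.io\n')
callback(2, b'GET / HTTP/1.1\r\n')
print(log)

# Assumed usage with a Curl object:
#     c.setopt(c.VERBOSE, True)
#     c.setopt(c.DEBUGFUNCTION, callback)
```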
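Examining Response Headers
In practice, we want to decode the response using the encoding specified by the server rather than an assumed one. To do this, we need to examine the response headers: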
import pycurl
import re
try:
    from io import BytesIO
except ImportError:
    from StringIO import StringIO as BytesIO
headers = {}
def header_function(header_line):
    # HTTP standard specifies that headers are encoded in iso-8859-1.
    # On Python 2, decoding step can be skipped.
    # On Python 3, decoding step is required.
    header_line = header_line.decode('iso-8859-1')
    # Header lines include the first status line (HTTP/1.x ...).
    # We are going to ignore all lines that don't have a colon in them.
    # This will botch headers that are split on multiple lines...
    if ':' not in header_line:
        return
    # Break the header line into header name and value.
    name, value = header_line.split(':', 1)
    # Remove whitespace that may be present.
    # Header lines include the trailing newline, and there may be whitespace
    # around the colon.
    name = name.strip()
    value = value.strip()
    # Header names are case insensitive.
    # Lowercase name here.
    name = name.lower()
    # Now we can actually record the header name and value.
    # Note: this only works when headers are not duplicated, see below.
    headers[name] = value
buffer = BytesIO()
c = pycurl.Curl()
c.setopt(c.URL, 'http://pycurl.io')
c.setopt(c.WRITEFUNCTION, buffer.write)
# Set our header function.
c.setopt(c.HEADERFUNCTION, header_function)
c.perform()
c.close()
# Figure out what encoding was sent with the response, if any.
# Check against lowercased header name.
encoding = None
if 'content-type' in headers:
    content_type = headers['content-type'].lower()
    # Raw string so that \S is passed to the regex engine unchanged.
    match = re.search(r'charset=(\S+)', content_type)
    if match:
        encoding = match.group(1)
        print('Decoding using %s' % encoding)
if encoding is None:
    # Default encoding for HTML is iso-8859-1.
    # Other content types may have different default encoding,
    # or in case of binary data, may have no encoding at all.
    encoding = 'iso-8859-1'
    print('Assuming encoding is %s' % encoding)
body = buffer.getvalue()
# Decode using the encoding we figured out.
print(body.decode(encoding))
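This code is available as examples/quickstart/response_headers.py.
That was a lot of code for something very straightforward. Unfortunately, because libcurl refrains from allocating memory for response data, it falls to our application to do this grunt work.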
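Part of that work can be delegated to the standard library. As a hedged alternative to the regular expression used in the example, the sketch below parses a Content-Type value with email.message.Message, which also copes with quoted charset values. The helper name charset_from_content_type is hypothetical, and the iso-8859-1 fallback simply mirrors the assumption made in the example.

```python
from email.message import Message


def charset_from_content_type(value, default='iso-8859-1'):
    """Return the lowercased charset parameter of a Content-Type value."""
    msg = Message()
    msg['Content-Type'] = value
    # get_content_charset returns `default` when no charset is present.
    return msg.get_content_charset(default)


print(charset_from_content_type('text/html; charset=UTF-8'))
print(charset_from_content_type('text/html; charset="utf-8"'))
print(charset_from_content_type('application/octet-stream'))
```

The result could then be passed to body.decode() in place of the value extracted with the regular expression.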
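One caveat with the above code is that if there are multiple headers with the same name, such as Set-Cookie, only the last header value is stored. To record all values of a multi-valued header as a list, the following code can be used in place of the headers[name] = value line: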
if name in headers:
    if isinstance(headers[name], list):
        headers[name].append(value)
    else:
        headers[name] = [headers[name], value]
else:
    headers[name] = value
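An alternative that keeps the value type uniform is to store every header as a list, so callers never have to check whether a value is a string or a list. The following sketch assumes the same HEADERFUNCTION calling convention as above, where each header line arrives as bytes; make_header_collector is a hypothetical name.

```python
from collections import defaultdict


def make_header_collector():
    """Return (headers, callback); headers maps lowercased names to lists."""
    headers = defaultdict(list)

    def header_function(header_line):
        header_line = header_line.decode('iso-8859-1')
        # Skip the status line and the blank line that ends the headers.
        if ':' not in header_line:
            return
        name, value = header_line.split(':', 1)
        headers[name.strip().lower()].append(value.strip())

    return headers, header_function


headers, header_function = make_header_collector()
# Simulate the lines a server response would produce.
for line in (b'HTTP/1.1 200 OK\r\n',
             b'Set-Cookie: a=1\r\n',
             b'Set-Cookie: b=2\r\n',
             b'Content-Type: text/html\r\n',
             b'\r\n'):
    header_function(line)
print(headers['set-cookie'])

# Assumed usage with a Curl object:
#     c.setopt(c.HEADERFUNCTION, header_function)
```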
Writing To A File
Suppose we want to save the response body to a file. This is actually easy, for a change:
import pycurl
# As long as the file is opened in binary mode, both Python 2 and Python 3
# can write response body to it without decoding.
with open('out.html', 'wb') as f:
    c = pycurl.Curl()
    c.setopt(c.URL, 'http://pycurl.io/')
    c.setopt(c.WRITEDATA, f)
    c.perform()
    c.close()
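This code is available as examples/quickstart/write_file.py.
The important part is opening the file in binary mode; the response body can then be written bytewise without decoding or encoding steps.
Following Redirects
By default, libcurl and PycURL do not follow redirects. Changing this behavior involves using setopt like so: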
import pycurl
c = pycurl.Curl()
# Redirects to https://www.python.org/.
c.setopt(c.URL, 'http://www.python.org/')
# Follow redirect.
c.setopt(c.FOLLOWLOCATION, True)
c.perform()
c.close()
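This code is available as examples/quickstart/follow_redirect.py.
As we did not set a write callback, the default libcurl and PycURL behavior of writing the response body to standard output takes effect.
Setting Options
Following redirects is one option that libcurl provides. There are many more such options, and they are documented on the curl_easy_setopt page. With very few exceptions, PycURL option names are derived from libcurl option names by removing the CURLOPT_ prefix. Thus, CURLOPT_URL becomes simply URL.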
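The naming rule can be expressed as a one-line transformation. This is only an illustration of the convention: pycurl_option_name is a hypothetical helper, and the few exceptions to the rule are not handled.

```python
def pycurl_option_name(libcurl_name):
    """Map a libcurl option name such as CURLOPT_URL to its PycURL name."""
    prefix = 'CURLOPT_'
    if not libcurl_name.startswith(prefix):
        raise ValueError('not a CURLOPT_ name: %r' % libcurl_name)
    return libcurl_name[len(prefix):]


for name in ('CURLOPT_URL', 'CURLOPT_FOLLOWLOCATION', 'CURLOPT_VERBOSE'):
    print('%s -> %s' % (name, pycurl_option_name(name)))

# Assumed usage with a Curl object:
#     c.setopt(getattr(c, pycurl_option_name('CURLOPT_URL')), 'http://pycurl.io/')
```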
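Examining Response
We have already covered examining response headers. Other response information is accessible via the getinfo call, as follows: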
import pycurl
try:
    from io import BytesIO
except ImportError:
    from StringIO import StringIO as BytesIO
buffer = BytesIO()
c = pycurl.Curl()
c.setopt(c.URL, 'http://pycurl.io/')
c.setopt(c.WRITEDATA, buffer)
c.perform()
# HTTP response code, e.g. 200.
print('Status: %d' % c.getinfo(c.RESPONSE_CODE))
# Elapsed time for the transfer.
print('Time: %f' % c.getinfo(c.TOTAL_TIME))
# getinfo must be called before close.
c.close()
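This code is available as examples/quickstart/response_info.py.
Here we write the body to a buffer to avoid printing uninteresting output to standard output.
Response information that libcurl exposes is documented on the curl_easy_getinfo page. With very few exceptions, PycURL constants are derived from libcurl constants by removing the CURLINFO_ prefix. Thus, CURLINFO_RESPONSE_CODE becomes simply RESPONSE_CODE.
Sending Form Data
To send form data, use the POSTFIELDS option. Form data must be URL-encoded beforehand: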
import pycurl
try:
    # python 3
    from urllib.parse import urlencode
except ImportError:
    # python 2
    from urllib import urlencode
c = pycurl.Curl()
c.setopt(c.URL, 'https://httpbin.org/post')
post_data = {'field': 'value'}
# Form data must be provided already urlencoded.
postfields = urlencode(post_data)
# Sets request method to POST,
# Content-Type header to application/x-www-form-urlencoded
# and data to send in request body.
c.setopt(c.POSTFIELDS, postfields)
c.perform()
c.close()
This code is available as examples/quickstart/form_post.py.
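To see what URL-encoding the form data beforehand means in practice, the sketch below prints the strings that urlencode produces for a few inputs. It uses Python 3's urllib.parse only; the field names and values are made up for illustration.

```python
from urllib.parse import urlencode

# Spaces become '+', and reserved characters are percent-encoded.
print(urlencode({'field': 'value'}))
print(urlencode({'q': 'a b&c'}))

# A sequence of pairs preserves order and allows repeated field names.
print(urlencode([('tag', 'x'), ('tag', 'y')]))

# doseq=True expands a list value into one pair per element.
print(urlencode({'tag': ['x', 'y']}, doseq=True))
```

Any of these strings can be passed to POSTFIELDS as shown in the example above.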
POSTFIELDS automatically sets the HTTP request method to POST. Other request methods can be specified via the CUSTOMREQUEST option:
c.setopt(c.CUSTOMREQUEST, 'PATCH')
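File Upload - Multipart POST
To replicate the behavior of a file upload in an HTML form (specifically, a multipart form), use the HTTPPOST option. Such an upload is performed with a POST request. See the PUT examples later in this document for how to upload a file with a PUT request.
If the data to be uploaded is located in a physical file, use FORM_FILE: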
import pycurl
c = pycurl.Curl()
c.setopt(c.URL, 'https://httpbin.org/post')
c.setopt(c.HTTPPOST, [
    ('fileupload', (
        # upload the contents of this file
        c.FORM_FILE, __file__,
    )),
])
c.perform()
c.close()
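This code is available as examples/quickstart/file_upload_real.py.
libcurl provides a number of options to tweak file uploads and multipart form submissions in general. These are documented on the curl_formadd page. For example, to set a different file name and content type: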
import pycurl
c = pycurl.Curl()
c.setopt(c.URL, 'https://httpbin.org/post')
c.setopt(c.HTTPPOST, [
    ('fileupload', (
        # upload the contents of this file
        c.FORM_FILE, __file__,
        # specify a different file name for the upload
        c.FORM_FILENAME, 'helloworld.py',
        # specify a different content type
        c.FORM_CONTENTTYPE, 'application/x-python',
    )),
])
c.perform()
c.close()
This code is available as examples/quickstart/file_upload_real_fancy.py.
If the file data is in memory, use FORM_BUFFER/FORM_BUFFERPTR as follows:
import pycurl
c = pycurl.Curl()
c.setopt(c.URL, 'https://httpbin.org/post')
c.setopt(c.HTTPPOST, [
    ('fileupload', (
        c.FORM_BUFFER, 'readme.txt',
        c.FORM_BUFFERPTR, 'This is a fancy readme file',
    )),
])
c.perform()
c.close()
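This code is available as examples/quickstart/file_upload_buffer.py.
File Upload - PUT
A file can also be uploaded in the request body via a PUT request. Here is how this can be arranged with a physical file: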
import pycurl
c = pycurl.Curl()
c.setopt(c.URL, 'https://httpbin.org/put')
c.setopt(c.UPLOAD, 1)
# Open in binary mode so that the file's bytes are sent unchanged.
file = open('body.json', 'rb')
c.setopt(c.READDATA, file)
c.perform()
c.close()
# File must be kept open while Curl object is using it
file.close()
This code is available as examples/quickstart/put_file.py.
If the data is stored in a buffer:
import pycurl
try:
    from io import BytesIO
except ImportError:
    from StringIO import StringIO as BytesIO
c = pycurl.Curl()
c.setopt(c.URL, 'https://httpbin.org/put')
c.setopt(c.UPLOAD, 1)
data = '{"json":true}'
# READDATA requires an IO-like object; a string is not accepted
# encode() is necessary for Python 3
buffer = BytesIO(data.encode('utf-8'))
c.setopt(c.READDATA, buffer)
c.perform()
c.close()
This code is available as examples/quickstart/put_buffer.py.
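Building on the buffer example, the sketch below prepares a JSON body and also tells libcurl its size and content type. It is an illustration under stated assumptions: configure_json_put is a hypothetical helper that expects an already created pycurl.Curl object, and it relies on the INFILESIZE and HTTPHEADER options in addition to the ones shown above.

```python
import json
from io import BytesIO


def configure_json_put(curl, url, payload):
    """Set up `curl` to PUT `payload` as JSON and return the body buffer."""
    body = json.dumps(payload).encode('utf-8')
    buffer = BytesIO(body)
    curl.setopt(curl.URL, url)
    curl.setopt(curl.UPLOAD, 1)
    curl.setopt(curl.READDATA, buffer)
    # Announce the body size so the request can carry a Content-Length.
    curl.setopt(curl.INFILESIZE, len(body))
    curl.setopt(curl.HTTPHEADER, ['Content-Type: application/json'])
    # The caller should keep the buffer around until perform() has returned.
    return buffer


# Assumed usage (requires pycurl and network access):
#     c = pycurl.Curl()
#     buffer = configure_json_put(c, 'https://httpbin.org/put', {'json': True})
#     c.perform()
#     c.close()
```

Taking the Curl object as a parameter keeps the helper free of any import of pycurl, so the option setup can be checked with a stand-in object.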