PycURL Quick Start

Retrieving A Network Resource

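Once PycURL is installed, we can perform network operations. The simplest one is retrieving a resource by its URL. To issue a network request with PycURL, the following steps are required: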

  1. Create a pycurl.Curl instance.

  2. Use setopt to set options.

  3. Call perform to perform the operation.

Here is how we can retrieve a network resource in Python 2:

import pycurl
from StringIO import StringIO

buffer = StringIO()
c = pycurl.Curl()
c.setopt(c.URL, 'http://pycurl.io/')
c.setopt(c.WRITEDATA, buffer)
c.perform()
c.close()

body = buffer.getvalue()
# Body is a string in some encoding.
# In Python 2, we can print it without knowing what the encoding is.
print(body)

This code is available as examples/quickstart/get_python2.py.

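PycURL does not provide storage for the network response; that is the application's job. Therefore we must set up a buffer (in the form of a StringIO object) and instruct PycURL to write to that buffer.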

Most existing PycURL code uses WRITEFUNCTION instead of WRITEDATA, as follows:

c.setopt(c.WRITEFUNCTION, buffer.write)

While the WRITEFUNCTION idiom continues to work, it is no longer necessary. As of PycURL 7.19.3, WRITEDATA accepts any Python object that has a write method.
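Because WRITEDATA only needs an object with a write method, the target does not have to be a buffer at all. The following is a minimal sketch, assuming Python 3 and a hypothetical HashingSink class and fetch_digest helper (neither is part of PycURL), that hashes the response as it arrives instead of storing it:

```python
import hashlib


class HashingSink:
    """Hypothetical write target: hashes chunks instead of storing them."""

    def __init__(self):
        self._sha = hashlib.sha256()
        self.size = 0

    def write(self, chunk):
        # PycURL passes each chunk of the response body as bytes.
        self._sha.update(chunk)
        self.size += len(chunk)
        # Report that the whole chunk was consumed.
        return len(chunk)

    def hexdigest(self):
        return self._sha.hexdigest()


def fetch_digest(url):
    """Illustrative helper: return (sha256 hex digest, byte count) for url."""
    import pycurl  # imported here so the class above is usable on its own

    sink = HashingSink()
    c = pycurl.Curl()
    try:
        c.setopt(c.URL, url)
        c.setopt(c.WRITEDATA, sink)
        c.perform()
    finally:
        c.close()
    return sink.hexdigest(), sink.size
```

A call such as fetch_digest('http://pycurl.io/') would then perform the transfer without holding the whole body in memory.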

The Python 3 version is slightly more complicated:

import pycurl
from io import BytesIO

buffer = BytesIO()
c = pycurl.Curl()
c.setopt(c.URL, 'http://pycurl.io/')
c.setopt(c.WRITEDATA, buffer)
c.perform()
c.close()

body = buffer.getvalue()
# Body is a byte string.
# We have to know the encoding in order to print it to a text file
# such as standard output.
print(body.decode('iso-8859-1'))

This code is available as examples/quickstart/get_python3.py.

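In Python 3, PycURL returns the response body as a byte string. This is convenient when downloading a binary file, but for text documents we must decode the byte string. In the example above, we assume that the body is encoded in ISO-8859-1.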
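To see why the assumed encoding matters, the sketch below decodes the same bytes two ways. It uses only the standard library; the sample text is made up for illustration and no network request is involved:

```python
# A body as a server might send it: the word "cafe" with an accented e,
# encoded as UTF-8.
body = 'caf\u00e9'.encode('utf-8')

# Decoding with the encoding that was actually used recovers the text.
as_utf8 = body.decode('utf-8')

# ISO-8859-1 maps every byte to a character, so decoding never fails,
# but a wrong guess silently produces mojibake (two characters for the e).
as_latin1 = body.decode('iso-8859-1')

# A stricter codec raises UnicodeDecodeError on invalid input unless an
# error handler is given; 'replace' substitutes U+FFFD.
lenient = b'\xff'.decode('utf-8', errors='replace')

print(repr(as_utf8), repr(as_latin1), repr(lenient))
```

The ISO-8859-1 result is one character longer than the UTF-8 result, which is the kind of damage the later section on response headers is meant to avoid.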

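The Python 2 and Python 3 versions can be combined. Doing so requires decoding the response body as in the Python 3 version. The code for the combined example can be found in examples/quickstart/get.py.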

Working With HTTPS

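Most web sites today use HTTPS, which is HTTP over TLS/SSL. To take advantage of the security that HTTPS provides, PycURL needs to use a certificate bundle. Because certificates change over time, PycURL does not provide such a bundle; one may be supplied by your operating system, but if not, consider using the certifi Python package: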

import pycurl
import certifi
from StringIO import StringIO

buffer = StringIO()
c = pycurl.Curl()
c.setopt(c.URL, 'https://python.org/')
c.setopt(c.WRITEDATA, buffer)
c.setopt(c.CAINFO, certifi.where())
c.perform()
c.close()

body = buffer.getvalue()
print(body)

This code is available as examples/quickstart/get_python2_https.py and examples/quickstart/get_python3_https.py.
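Since certifi is an optional third-party package, a program may want to use it only when it is installed and otherwise leave libcurl's default certificate lookup untouched. A minimal sketch of that choice, assuming a hypothetical helper named find_ca_bundle (not part of PycURL or certifi):

```python
import os


def find_ca_bundle():
    """Return the path to certifi's CA bundle, or None if certifi is absent.

    None is used here to mean "do not set CAINFO", so that libcurl falls
    back to whatever bundle it was built to use.
    """
    try:
        import certifi
    except ImportError:
        return None
    path = certifi.where()
    return path if os.path.isfile(path) else None


ca_bundle = find_ca_bundle()
# With a Curl object c, the option would only be set when a path was found:
#     if ca_bundle is not None:
#         c.setopt(c.CAINFO, ca_bundle)
print(ca_bundle)
```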

Troubleshooting

When things don't work as expected, use libcurl's VERBOSE option to receive lots of debugging output pertaining to the request:

c.setopt(c.VERBOSE, True)

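It may be helpful to compare the verbose output of the program using PycURL with that of the curl command line tool when the latter is invoked with the -v option: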

curl -v http://pycurl.io/

Examining Response Headers

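In reality, we want to decode the response using the encoding specified by the server rather than an assumed one. To do this, we need to examine the response headers: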

import pycurl
import re
try:
    from io import BytesIO
except ImportError:
    from StringIO import StringIO as BytesIO

headers = {}
def header_function(header_line):
    # HTTP standard specifies that headers are encoded in iso-8859-1.
    # On Python 2, decoding step can be skipped.
    # On Python 3, decoding step is required.
    header_line = header_line.decode('iso-8859-1')

    # Header lines include the first status line (HTTP/1.x ...).
    # We are going to ignore all lines that don't have a colon in them.
    # This will botch headers that are split on multiple lines...
    if ':' not in header_line:
        return

    # Break the header line into header name and value.
    name, value = header_line.split(':', 1)

    # Remove whitespace that may be present.
    # Header lines include the trailing newline, and there may be whitespace
    # around the colon.
    name = name.strip()
    value = value.strip()

    # Header names are case insensitive.
    # Lowercase name here.
    name = name.lower()

    # Now we can actually record the header name and value.
    # Note: this only works when headers are not duplicated, see below.
    headers[name] = value

buffer = BytesIO()
c = pycurl.Curl()
c.setopt(c.URL, 'http://pycurl.io')
c.setopt(c.WRITEFUNCTION, buffer.write)
# Set our header function.
c.setopt(c.HEADERFUNCTION, header_function)
c.perform()
c.close()

# Figure out what encoding was sent with the response, if any.
# Check against lowercased header name.
encoding = None
if 'content-type' in headers:
    content_type = headers['content-type'].lower()
    # Raw string so that \S reaches the regex engine unescaped.
    match = re.search(r'charset=(\S+)', content_type)
    if match:
        encoding = match.group(1)
        print('Decoding using %s' % encoding)
if encoding is None:
    # Default encoding for HTML is iso-8859-1.
    # Other content types may have different default encoding,
    # or in case of binary data, may have no encoding at all.
    encoding = 'iso-8859-1'
    print('Assuming encoding is %s' % encoding)

body = buffer.getvalue()
# Decode using the encoding we figured out.
print(body.decode(encoding))

This code is available as examples/quickstart/response_headers.py.

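That was a lot of code for something very straightforward. Unfortunately, because libcurl refrains from allocating memory for response data, this grunt work falls to our application.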
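The charset=(\S+) pattern in the example above also captures surrounding quotes or a trailing semicolon when the Content-Type value has them. One possible alternative, sketched here with the standard library's email.message module and a hypothetical helper named charset_from_content_type, lets an existing parser handle the parameter syntax:

```python
from email.message import Message


def charset_from_content_type(value, default='iso-8859-1'):
    """Extract the charset parameter from a Content-Type header value.

    Returns the charset in lower case, or `default` when none is given.
    """
    msg = Message()
    msg['Content-Type'] = value
    return msg.get_content_charset(default)


print(charset_from_content_type('text/html; charset="UTF-8"'))
print(charset_from_content_type('text/html'))
```

The returned name can be passed to body.decode() in place of the encoding variable computed in the example.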

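One caveat with the above code is that if there are multiple headers with the same name (such as Set-Cookie), only the last header value is stored. To record all values of a multi-valued header as a list, the following code can be used instead of the headers[name] = value line: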

if name in headers:
    if isinstance(headers[name], list):
        headers[name].append(value)
    else:
        headers[name] = [headers[name], value]
else:
    headers[name] = value
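The snippet above stores a string for single headers and a list for repeated ones, so callers must check the type. A variant that always stores a list avoids that branch. The sketch below assumes Python 3 and a hypothetical HeaderCollector class; it is exercised with hand-written header lines rather than a live transfer:

```python
class HeaderCollector:
    """Hypothetical collector: maps lowercased header names to value lists."""

    def __init__(self):
        self.headers = {}

    def add_line(self, header_line):
        # Header lines arrive as bytes, including the trailing newline.
        line = header_line.decode('iso-8859-1')
        # Skip the status line and the blank line that ends the headers.
        if ':' not in line:
            return
        name, value = line.split(':', 1)
        self.headers.setdefault(name.strip().lower(), []).append(value.strip())


collector = HeaderCollector()
for raw in (b'HTTP/1.1 200 OK\r\n',
            b'Content-Type: text/html\r\n',
            b'Set-Cookie: a=1\r\n',
            b'Set-Cookie: b=2\r\n',
            b'\r\n'):
    collector.add_line(raw)

print(collector.headers)
```

With a Curl object c, the bound method would be registered as c.setopt(c.HEADERFUNCTION, collector.add_line).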

Writing To A File

Suppose we want to save the response body to a file. This is actually an easy change:

import pycurl

# As long as the file is opened in binary mode, both Python 2 and Python 3
# can write response body to it without decoding.
with open('out.html', 'wb') as f:
    c = pycurl.Curl()
    c.setopt(c.URL, 'http://pycurl.io/')
    c.setopt(c.WRITEDATA, f)
    c.perform()
    c.close()

This code is available as examples/quickstart/write_file.py.

The important part is opening the file in binary mode; the response body can then be written bytewise, with no decoding or encoding step.

Following Redirects

By default, libcurl and PycURL do not follow redirects. Changing this behavior involves using setopt like so:

import pycurl

c = pycurl.Curl()
# Redirects to https://www.python.org/.
c.setopt(c.URL, 'http://www.python.org/')
# Follow redirect.
c.setopt(c.FOLLOWLOCATION, True)
c.perform()
c.close()

This code is available as examples/quickstart/follow_redirect.py.

Because we did not set a write callback, the default libcurl and PycURL behavior of writing the response body to standard output takes effect.

Setting Options

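Following redirects is just one of the options that libcurl provides. There are many more, and they are documented on the curl_easy_setopt page. With very few exceptions, PycURL option names are derived from libcurl option names by removing the CURLOPT_ prefix. Thus, CURLOPT_URL becomes simply URL.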
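The naming rule can be written down as a small function. This is only an illustration of the prefix rule described above, using a hypothetical helper named pycurl_option_name; it does not know about the few exceptions and does not check that the option exists:

```python
def pycurl_option_name(libcurl_name):
    """Map a libcurl option name such as CURLOPT_URL to its PycURL name."""
    prefix = 'CURLOPT_'
    if not libcurl_name.startswith(prefix):
        raise ValueError('not a CURLOPT_ name: %r' % libcurl_name)
    return libcurl_name[len(prefix):]


print(pycurl_option_name('CURLOPT_URL'))
print(pycurl_option_name('CURLOPT_FOLLOWLOCATION'))
```

The resulting name is the attribute to look up on the Curl object or the pycurl module, for example c.FOLLOWLOCATION.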

Examining Response

We have already covered examining response headers. Other response information is accessible via the getinfo call, as follows:

import pycurl
try:
    from io import BytesIO
except ImportError:
    from StringIO import StringIO as BytesIO

buffer = BytesIO()
c = pycurl.Curl()
c.setopt(c.URL, 'http://pycurl.io/')
c.setopt(c.WRITEDATA, buffer)
c.perform()

# HTTP response code, e.g. 200.
print('Status: %d' % c.getinfo(c.RESPONSE_CODE))
# Elapsed time for the transfer.
print('Time: %f' % c.getinfo(c.TOTAL_TIME))

# getinfo must be called before close.
c.close()

This code is available as examples/quickstart/response_info.py.

Here we write the body to a buffer to avoid printing uninteresting output to standard output.

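The response information that libcurl exposes is documented on the curl_easy_getinfo page. With very few exceptions, PycURL constants are derived from libcurl constants by removing the CURLINFO_ prefix. Thus, CURLINFO_RESPONSE_CODE becomes simply RESPONSE_CODE.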

Sending Form Data

To send form data, use the POSTFIELDS option. Form data must be URL-encoded beforehand:

import pycurl
try:
    # python 3
    from urllib.parse import urlencode
except ImportError:
    # python 2
    from urllib import urlencode

c = pycurl.Curl()
c.setopt(c.URL, 'https://httpbin.org/post')

post_data = {'field': 'value'}
# Form data must be provided already urlencoded.
postfields = urlencode(post_data)
# Sets request method to POST,
# Content-Type header to application/x-www-form-urlencoded
# and data to send in request body.
c.setopt(c.POSTFIELDS, postfields)

c.perform()
c.close()

This code is available as examples/quickstart/form_post.py.

POSTFIELDS automatically sets the HTTP request method to POST. Other request methods can be specified via the CUSTOMREQUEST option:

c.setopt(c.CUSTOMREQUEST, 'PATCH')
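The URL-encoding step that POSTFIELDS relies on can be checked in isolation, without sending anything. A short sketch, assuming Python 3; the field names and values are made up for illustration:

```python
from urllib.parse import urlencode

# Spaces and reserved characters in values are escaped by urlencode.
post_data = {'field': 'value', 'note': 'a b&c=d'}
postfields = urlencode(post_data)
print(postfields)

# A field with several values needs doseq=True to produce one pair per value.
multi = urlencode({'tag': ['x', 'y']}, doseq=True)
print(multi)
```

The resulting strings are what would be passed to c.setopt(c.POSTFIELDS, ...).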

File Upload - Multipart POST

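To replicate the behavior of file upload in an HTML form (specifically, a multipart form), use the HTTPPOST option. Such an upload is performed with a POST request. See the next section for how to upload a file with a PUT request.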

If the data to be uploaded is located in a physical file, use FORM_FILE:

import pycurl

c = pycurl.Curl()
c.setopt(c.URL, 'https://httpbin.org/post')

c.setopt(c.HTTPPOST, [
    ('fileupload', (
        # upload the contents of this file
        c.FORM_FILE, __file__,
    )),
])

c.perform()
c.close()

This code is available as examples/quickstart/file_upload_real.py.

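libcurl provides a number of options to tweak file uploads and multipart form submissions in general. These are documented on the curl_formadd page. For example, to set a different file name and content type: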

import pycurl

c = pycurl.Curl()
c.setopt(c.URL, 'https://httpbin.org/post')

c.setopt(c.HTTPPOST, [
    ('fileupload', (
        # upload the contents of this file
        c.FORM_FILE, __file__,
        # specify a different file name for the upload
        c.FORM_FILENAME, 'helloworld.py',
        # specify a different content type
        c.FORM_CONTENTTYPE, 'application/x-python',
    )),
])

c.perform()
c.close()

This code is available as examples/quickstart/file_upload_real_fancy.py.

If the file data is in memory, use BUFFER/BUFFERPTR as follows:

import pycurl

c = pycurl.Curl()
c.setopt(c.URL, 'https://httpbin.org/post')

c.setopt(c.HTTPPOST, [
    ('fileupload', (
        c.FORM_BUFFER, 'readme.txt',
        c.FORM_BUFFERPTR, 'This is a fancy readme file',
    )),
])

c.perform()
c.close()

This code is available as examples/quickstart/file_upload_buffer.py.

File Upload - PUT

A file can also be uploaded in the body of a PUT request. Here is how this can be arranged with a physical file:

import pycurl

c = pycurl.Curl()
c.setopt(c.URL, 'https://httpbin.org/put')

c.setopt(c.UPLOAD, 1)
# Open in binary mode so the file's bytes are sent as they are.
file = open('body.json', 'rb')
c.setopt(c.READDATA, file)

c.perform()
c.close()
# File must be kept open while Curl object is using it
file.close()

This code is available as examples/quickstart/put_file.py.

If the data is stored in a buffer:

import pycurl
try:
    from io import BytesIO
except ImportError:
    from StringIO import StringIO as BytesIO

c = pycurl.Curl()
c.setopt(c.URL, 'https://httpbin.org/put')

c.setopt(c.UPLOAD, 1)
data = '{"json":true}'
# READDATA requires an IO-like object; a string is not accepted
# encode() is necessary for Python 3
buffer = BytesIO(data.encode('utf-8'))
c.setopt(c.READDATA, buffer)

c.perform()
c.close()

This code is available as examples/quickstart/put_buffer.py.
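The encode-then-wrap step can be factored into a helper that accepts either text or bytes and also reports the payload size. This is a sketch assuming Python 3 and a hypothetical function named make_upload_buffer. Passing the size to libcurl's INFILESIZE option is a suggestion of this sketch, not something the example above does:

```python
from io import BytesIO


def make_upload_buffer(data, encoding='utf-8'):
    """Return (file-like object, size in bytes) for str or bytes input."""
    if isinstance(data, str):
        # READDATA needs bytes on Python 3, so text is encoded first.
        data = data.encode(encoding)
    return BytesIO(data), len(data)


buffer, size = make_upload_buffer('{"json":true}')
# With a Curl object c, the upload would then be configured as:
#     c.setopt(c.UPLOAD, 1)
#     c.setopt(c.READDATA, buffer)
#     c.setopt(c.INFILESIZE, size)  # optional; assumption of this sketch
print(size)
```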