Decoding Raw URL Path

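This recipe demonstrates how to access the "raw" request path using non-standard (WSGI) or optional (ASGI) application server extensions. This is useful when, for instance, a URI field has been percent-encoded in order to distinguish forward slashes within the field's value from forward slashes used to separate fields. See also: Why is my URL with percent-encoded forward slashes (%2F) routed incorrectly?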
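As a brief illustration of the encoding in question, the sketch below uses Python's standard urllib.parse module (which is not part of the recipe itself) to percent-encode a whole URL so that it fits into a single path field. The /cache/ prefix mirrors the routes used later in this recipe; the variable names are only illustrative.

```python
from urllib.parse import quote, unquote

target = 'http://falconframework.org'

# safe='' makes quote() encode "/" (and ":") as well, so the value
# cannot be mistaken for several path fields.
field = quote(target, safe='')
path = '/cache/' + field

print(path)
# Decoding the field restores the original URL.
print(unquote(field) == target)
```

The only forward slashes left in path are the ones that separate fields, which is exactly the information that is lost once a server percent-decodes the whole path.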

WSGI

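In the WSGI flavor of the framework, req.path is based on the PATH_INFO CGI variable, which is already presented in percent-decoded form. Some application servers expose the raw URL under another, non-standard CGI variable name. Let us implement a middleware component that understands two such extensions, RAW_URI (Gunicorn, Werkzeug's dev server) and REQUEST_URI (uWSGI, Waitress, Werkzeug's dev server), and replaces req.path with a value extracted from the raw URL: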

import falcon
import falcon.uri


class RawPathComponent:
    def process_request(self, req, resp):
        raw_uri = req.env.get('RAW_URI') or req.env.get('REQUEST_URI')

        # NOTE: Reconstruct the percent-encoded path from the raw URI.
        if raw_uri:
            req.path, _, _ = raw_uri.partition('?')


class URLResource:
    def on_get(self, req, resp, url):
        # NOTE: url here is potentially percent-encoded.
        url = falcon.uri.decode(url)

        resp.media = {'url': url}

    def on_get_status(self, req, resp, url):
        # NOTE: url here is potentially percent-encoded.
        url = falcon.uri.decode(url)

        resp.media = {'cached': True}


app = falcon.App(middleware=[RawPathComponent()])
app.add_route('/cache/{url}', URLResource())
app.add_route('/cache/{url}/status', URLResource(), suffix='status')

When the above app is run with a supported server such as Gunicorn or uWSGI, the following response is rendered to a GET /cache/http%3A%2F%2Ffalconframework.org request:

{
    "url": "http://falconframework.org"
}

We can also check the status of this URI in our imaginary web cache system by accessing /cache/http%3A%2F%2Ffalconframework.org/status:

{
    "cached": true
}
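To see what the middleware does to the path without starting a server, the same lookup can be tried on hand-written WSGI environ dictionaries. This is only a sketch: the helper name raw_path_from_environ, the sample dictionaries, and the refresh=1 query string are made up for illustration, and a real server populates many more keys.

```python
def raw_path_from_environ(environ):
    """Return the undecoded path from a WSGI environ, or None if unavailable."""
    raw_uri = environ.get('RAW_URI') or environ.get('REQUEST_URI')
    if not raw_uri:
        return None

    # Everything before the first "?" is the still percent-encoded path.
    path, _, _ = raw_uri.partition('?')
    return path


# Hypothetical environ from a server that provides RAW_URI.
with_extension = {
    'PATH_INFO': '/cache/http://falconframework.org',
    'RAW_URI': '/cache/http%3A%2F%2Ffalconframework.org?refresh=1',
}
# Hypothetical environ from a server that provides neither extension.
without_extension = {'PATH_INFO': '/cache/http://falconframework.org'}

print(raw_path_from_environ(with_extension))
print(raw_path_from_environ(without_extension))
```

When neither variable is present the helper returns None, which corresponds to the middleware leaving req.path untouched.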

If we remove RawPathComponent() from the app's middleware list, the request will be routed as /cache/http://falconframework.org, and no matching resource will be found:

{
    "title": "404 Not Found"
}

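What is more, even if we could implement a flexible router capable of matching these complex URI patterns, the app would still be unable to distinguish between /cache/http%3A%2F%2Ffalconframework.org%2Fstatus and /cache/http%3A%2F%2Ffalconframework.org/status if both were presented only in percent-decoded form.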
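The ambiguity between these two paths can be checked with a few lines of standard-library Python. This is an illustrative sketch that uses urllib.parse.unquote in place of whatever decoding a particular server applies; the variable names are assumptions.

```python
from urllib.parse import unquote

encoded_slash = '/cache/http%3A%2F%2Ffalconframework.org%2Fstatus'
literal_slash = '/cache/http%3A%2F%2Ffalconframework.org/status'

# In raw form the two paths differ, and they split into a different
# number of "/"-separated parts.
print(encoded_slash == literal_slash)
print(len(encoded_slash.split('/')), len(literal_slash.split('/')))

# After percent-decoding they are the same string, so no router could
# tell them apart.
print(unquote(encoded_slash) == unquote(literal_slash))
```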

ASGI

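The ASGI version of req.path uses the path key from the ASGI scope, where percent-encoded sequences have already been decoded into characters, just like in WSGI's PATH_INFO. Similar to the WSGI snippet from the previous chapter, let us create a middleware component that replaces req.path with the value of raw_path (provided the latter is present in the ASGI HTTP scope):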

import falcon.asgi
import falcon.uri


class RawPathComponent:
    async def process_request(self, req, resp):
        raw_path = req.scope.get('raw_path')

        # NOTE: Decode the raw path from the raw_path bytestring, disallowing
        #   non-ASCII characters, assuming they are correctly percent-encoded.
        if raw_path:
            req.path = raw_path.decode('ascii')


class URLResource:
    async def on_get(self, req, resp, url):
        # NOTE: url here is potentially percent-encoded.
        url = falcon.uri.decode(url)

        resp.media = {'url': url}

    async def on_get_status(self, req, resp, url):
        # NOTE: url here is potentially percent-encoded.
        url = falcon.uri.decode(url)

        resp.media = {'cached': True}


app = falcon.asgi.App(middleware=[RawPathComponent()])
app.add_route('/cache/{url}', URLResource())
app.add_route('/cache/{url}/status', URLResource(), suffix='status')

When the above snippet is run with uvicorn (which supports raw_path), the percent-encoded url field is now handled correctly for a GET /cache/http%3A%2F%2Ffalconframework.org%2Fstatus request:

{
    "url": "http://falconframework.org/status"
}

Again, as with the WSGI version, removing RawPathComponent() no longer lets the app route the above request as expected:

{
    "title": "404 Not Found"
}
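The raw_path handling can likewise be tried on plain dictionaries standing in for ASGI HTTP scopes. In this sketch the helper name request_path and the sample scopes are hypothetical; it falls back to the decoded path key when raw_path is missing, and it shows that decoding as ASCII rejects bytes that were not percent-encoded.

```python
def request_path(scope):
    """Prefer the undecoded raw_path bytestring, else the decoded path."""
    raw_path = scope.get('raw_path')
    if raw_path:
        # Strict ASCII decoding: correctly percent-encoded paths are pure ASCII.
        return raw_path.decode('ascii')
    return scope['path']


# Hypothetical scope from a server that provides raw_path.
with_raw_path = {
    'path': '/cache/http://falconframework.org/status',
    'raw_path': b'/cache/http%3A%2F%2Ffalconframework.org%2Fstatus',
}
# Hypothetical scope from a server that omits raw_path.
without_raw_path = {'path': '/cache/http://falconframework.org/status'}

print(request_path(with_raw_path))
print(request_path(without_raw_path))

# A raw path containing unencoded non-ASCII bytes is rejected.
try:
    request_path({'path': '/caf\u00e9', 'raw_path': b'/caf\xc3\xa9'})
    rejected = False
except UnicodeDecodeError:
    rejected = True
print(rejected)
```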