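Handling UUID Data#

PyMongo ships with built-in support for dealing with UUID types. It is straightforward to store native uuid.UUID objects in MongoDB and retrieve them as native uuid.UUID objects: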

from pymongo import MongoClient
from bson.binary import UuidRepresentation
from uuid import uuid4

# use the 'standard' representation for cross-language compatibility.
client = MongoClient(uuidRepresentation='standard')
collection = client.get_database('uuid_db').get_collection('uuid_coll')

# remove all documents from collection
collection.delete_many({})

# create a native uuid object
uuid_obj = uuid4()

# save the native uuid object to MongoDB
collection.insert_one({'uuid': uuid_obj})

# retrieve the stored uuid object from MongoDB
document = collection.find_one({})

# check that the retrieved UUID matches the inserted UUID
assert document['uuid'] == uuid_obj

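Native uuid.UUID objects can also be used as part of MongoDB queries:

# find_one() returns a single document; find() would return a cursor
document = collection.find_one({'uuid': uuid_obj})
assert document['uuid'] == uuid_obj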

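The above examples illustrate the simplest use case: one where the UUID is generated by, and used in, the same application. However, the situation can be significantly more complex when dealing with a MongoDB deployment that contains UUIDs created by other drivers, because the Java and C# drivers have historically encoded UUIDs using a byte order that is different from the one used by PyMongo. Applications that require interoperability across these drivers must specify the appropriate UuidRepresentation.

In the following sections, we describe how drivers have historically differed in their encoding of UUIDs, and how applications can use the UuidRepresentation configuration option to maintain cross-language compatibility.

Attention

New applications that do not share a MongoDB deployment with any other application and that have never stored UUIDs in MongoDB should use the standard UUID representation for cross-language compatibility. See Configuring a UUID Representation for details on how to configure the UuidRepresentation.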

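Legacy Handling of UUID Data#

Historically, MongoDB drivers have used different byte orders when serializing UUID types to Binary. Consider, for instance, a UUID with the following canonical textual representation: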

00112233-4455-6677-8899-aabbccddeeff

This UUID would historically be serialized by the Python driver as:

00112233-4455-6677-8899-aabbccddeeff

The same UUID would historically be serialized by the C# driver as:

33221100-5544-7766-8899-aabbccddeeff

Finally, the same UUID would historically be serialized by the Java driver as:

77665544-3322-1100-ffee-ddccbbaa9988

Note

For in-depth information about the byte order historically used by different drivers, see the Handling of Native UUID Types Specification.
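The three layouts above can be reproduced with the standard library alone. The sketch below is illustrative only: `legacy_layouts` is a hypothetical helper, not part of PyMongo, and it assumes that the C# legacy layout matches Python's `UUID.bytes_le` and that the Java legacy layout reverses each 8-byte half, as described on this page.

```python
import uuid


def legacy_layouts(value: uuid.UUID) -> dict:
    """Return the 16 bytes each legacy driver would have stored, as hex."""
    raw = value.bytes
    return {
        # Python legacy: the RFC 4122 byte order, unchanged.
        "python": raw.hex(),
        # C# legacy: the first three fields (4, 2 and 2 bytes) are byte-swapped.
        "csharp": value.bytes_le.hex(),
        # Java legacy: each 8-byte half is reversed.
        "java": (raw[:8][::-1] + raw[8:][::-1]).hex(),
    }


layouts = legacy_layouts(uuid.UUID("00112233-4455-6677-8899-aabbccddeeff"))
for driver, hex_bytes in layouts.items():
    print(f"{driver:>6}: {hex_bytes}")
```

Grouping the printed hex digits as 8-4-4-4-12 gives the same three strings listed above.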

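This difference in the byte order of UUIDs encoded by different drivers can result in highly unintuitive behavior in some scenarios. We detail two such scenarios in the next sections.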

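Scenario 2: Round-Tripping UUIDs#

In the following examples, we see how using a misconfigured UuidRepresentation can cause an application to inadvertently change the Binary subtype and, in some cases, the bytes of the Binary field itself when round-tripping documents containing UUIDs.

Consider the following situation: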

from bson.codec_options import CodecOptions, DEFAULT_CODEC_OPTIONS
from bson.binary import Binary, UuidRepresentation
from uuid import uuid4

# Using UuidRepresentation.PYTHON_LEGACY stores a Binary subtype-3 UUID
python_opts = CodecOptions(uuid_representation=UuidRepresentation.PYTHON_LEGACY)
input_uuid = uuid4()
collection = client.testdb.get_collection('test', codec_options=python_opts)
collection.insert_one({'_id': 'foo', 'uuid': input_uuid})
assert collection.find_one({'uuid': Binary(input_uuid.bytes, 3)})['_id'] == 'foo'

# Retrieving this document using UuidRepresentation.STANDARD returns a Binary instance
std_opts = CodecOptions(uuid_representation=UuidRepresentation.STANDARD)
std_collection = client.testdb.get_collection('test', codec_options=std_opts)
doc = std_collection.find_one({'_id': 'foo'})
assert isinstance(doc['uuid'], Binary)

# Round-tripping the retrieved document yields the exact same document
std_collection.replace_one({'_id': 'foo'}, doc)
round_tripped_doc = collection.find_one({'uuid': Binary(input_uuid.bytes, 3)})
assert doc == round_tripped_doc

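In this example, the document is round-tripped using the incorrect UuidRepresentation (STANDARD instead of PYTHON_LEGACY). Because STANDARD decodes the subtype-3 field as a Binary rather than a uuid.UUID, the document is written back unchanged, as the final assertion shows. In PyMongo versions before 4.0 the field was decoded to a native uuid.UUID, and the same round trip changed the Binary subtype as a side-effect. This could also happen when the situation was reversed, i.e. when the original document was written using the STANDARD representation and then round-tripped using the PYTHON_LEGACY representation.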

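In the next example, we see the consequences of incorrectly using a representation that modifies byte order (CSHARP_LEGACY or JAVA_LEGACY) when round-tripping documents: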

from bson.codec_options import CodecOptions, DEFAULT_CODEC_OPTIONS
from bson.binary import Binary, UuidRepresentation
from uuid import uuid4

# Using UuidRepresentation.STANDARD stores a Binary subtype-4 UUID
std_opts = CodecOptions(uuid_representation=UuidRepresentation.STANDARD)
input_uuid = uuid4()
collection = client.testdb.get_collection('test', codec_options=std_opts)
collection.insert_one({'_id': 'baz', 'uuid': input_uuid})
assert collection.find_one({'uuid': Binary(input_uuid.bytes, 4)})['_id'] == 'baz'

# Retrieving this document using UuidRepresentation.JAVA_LEGACY returns a native UUID
# without modifying the UUID byte-order
java_opts = CodecOptions(uuid_representation=UuidRepresentation.JAVA_LEGACY)
java_collection = client.testdb.get_collection('test', codec_options=java_opts)
doc = java_collection.find_one({'_id': 'baz'})
assert doc['uuid'] == input_uuid

# Round-tripping the retrieved document silently changes the Binary bytes and subtype
java_collection.replace_one({'_id': 'baz'}, doc)
assert collection.find_one({'uuid': Binary(input_uuid.bytes, 3)}) is None
assert collection.find_one({'uuid': Binary(input_uuid.bytes, 4)}) is None
round_tripped_doc = collection.find_one({'_id': 'baz'})
assert round_tripped_doc['uuid'] == Binary(input_uuid.bytes, 3).as_uuid(UuidRepresentation.JAVA_LEGACY)

In this case, using the incorrect UuidRepresentation (JAVA_LEGACY instead of STANDARD) changes the Binary bytes and subtype as a side-effect. Note that this happens when any representation that manipulates byte order (CSHARP_LEGACY or JAVA_LEGACY) is incorrectly used to round-trip UUIDs written with STANDARD. When the situation is reversed, i.e. when the original document is written using CSHARP_LEGACY or JAVA_LEGACY and then round-tripped using STANDARD, only the Binary subtype is changed. This example reflects decoding behavior before PyMongo 4.0: as the following note explains, later versions decode the subtype-4 field as a Binary under JAVA_LEGACY, so the equality assertion against the native UUID no longer holds.

Note

Starting in PyMongo 4.0, these issues are resolved: the STANDARD representation decodes Binary subtype 3 fields as Binary objects of subtype 3 (instead of uuid.UUID), and each of the LEGACY_* representations decodes Binary subtype 4 fields as Binary objects of subtype 4 (instead of uuid.UUID).

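Configuring a UUID Representation#

Users can work around the problems described above by configuring their applications with the appropriate UuidRepresentation. Configuring the representation modifies PyMongo's behavior when encoding uuid.UUID objects to BSON and when decoding Binary subtype 3 and 4 fields from BSON.

Applications can set the UUID representation in one of the following ways:

At the MongoClient level using the uuidRepresentation URI option, e.g.: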
```
client = MongoClient("mongodb://a:27107/?uuidRepresentation=standard")
```
Valid values are:

Value	UUID Representation
`unspecified`	UNSPECIFIED
`standard`	STANDARD
`pythonLegacy`	PYTHON_LEGACY
`javaLegacy`	JAVA_LEGACY
`csharpLegacy`	CSHARP_LEGACY

At the MongoClient level using the uuidRepresentation kwarg option, e.g.:

from bson.binary import UuidRepresentation
client = MongoClient(uuidRepresentation=UuidRepresentation.STANDARD)

At the Database or Collection level by supplying a suitable CodecOptions instance, e.g.:

from bson.codec_options import CodecOptions
csharp_opts = CodecOptions(uuid_representation=UuidRepresentation.CSHARP_LEGACY)
java_opts = CodecOptions(uuid_representation=UuidRepresentation.JAVA_LEGACY)

# Get database/collection from client with csharpLegacy UUID representation
csharp_database = client.get_database('csharp_db', codec_options=csharp_opts)
csharp_collection = client.testdb.get_collection('csharp_coll', codec_options=csharp_opts)

# Get database/collection from existing database/collection with javaLegacy UUID representation
java_database = csharp_database.with_options(codec_options=java_opts)
java_collection = csharp_collection.with_options(codec_options=java_opts)

Supported UUID Representations#

UUID Representation	Default?	Encode `uuid.UUID` to	Decode `Binary` subtype 4 to	Decode `Binary` subtype 3 to
STANDARD	No	`Binary` subtype 4	`uuid.UUID`	`Binary` subtype 3
UNSPECIFIED	Yes, in PyMongo>=4	Raise `ValueError`	`Binary` subtype 4	`Binary` subtype 3
PYTHON_LEGACY	No	`Binary` subtype 3 with standard byte order	`Binary` subtype 4	`uuid.UUID`
JAVA_LEGACY	No	`Binary` subtype 3 with Java legacy byte order	`Binary` subtype 4	`uuid.UUID`
CSHARP_LEGACY	No	`Binary` subtype 3 with C# legacy byte order	`Binary` subtype 4	`uuid.UUID`

We now detail the behavior and use case for each supported UUID representation.

`UNSPECIFIED`#

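Attention

Starting in PyMongo 4.0, UNSPECIFIED is the default UUID representation used by PyMongo.

The UNSPECIFIED representation prevents the incorrect interpretation of UUID bytes by stopping short of automatically converting UUID fields in BSON to native UUID types. Decoding a UUID when using this representation returns a Binary object instead. If required, users can coerce the decoded Binary objects into native UUIDs using the as_uuid() method and specifying the appropriate representation format. The following example shows what this might look like for a UUID stored by the C# driver: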

from bson.codec_options import CodecOptions, DEFAULT_CODEC_OPTIONS
from bson.binary import Binary, UuidRepresentation
from uuid import uuid4

# Using UuidRepresentation.CSHARP_LEGACY
csharp_opts = CodecOptions(uuid_representation=UuidRepresentation.CSHARP_LEGACY)

# Store a legacy C#-formatted UUID
input_uuid = uuid4()
collection = client.testdb.get_collection('test', codec_options=csharp_opts)
collection.insert_one({'_id': 'foo', 'uuid': input_uuid})

# Using UuidRepresentation.UNSPECIFIED
unspec_opts = CodecOptions(uuid_representation=UuidRepresentation.UNSPECIFIED)
unspec_collection = client.testdb.get_collection('test', codec_options=unspec_opts)

# UUID fields are decoded as Binary when UuidRepresentation.UNSPECIFIED is configured
document = unspec_collection.find_one({'_id': 'foo'})
decoded_field = document['uuid']
assert isinstance(decoded_field, Binary)

# Binary.as_uuid() can be used to coerce the decoded value to a native UUID
decoded_uuid = decoded_field.as_uuid(UuidRepresentation.CSHARP_LEGACY)
assert decoded_uuid == input_uuid

Native uuid.UUID objects cannot be encoded directly to Binary when the UUID representation is UNSPECIFIED, and attempting to do so results in an exception:

unspec_collection.insert_one({'_id': 'bar', 'uuid': uuid4()})
Traceback (most recent call last):
...
ValueError: cannot encode native uuid.UUID with UuidRepresentation.UNSPECIFIED. UUIDs can be manually converted to bson.Binary instances using bson.Binary.from_uuid() or a different UuidRepresentation can be configured. See the documentation for UuidRepresentation for more information.

Instead, applications using UNSPECIFIED must explicitly coerce a native UUID using the from_uuid() method:

explicit_binary = Binary.from_uuid(uuid4(), UuidRepresentation.STANDARD)
unspec_collection.insert_one({'_id': 'bar', 'uuid': explicit_binary})

`STANDARD`#

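Attention

This UUID representation should be used by new applications or applications that are encoding and/or decoding UUIDs in MongoDB for the first time.

The STANDARD representation enables cross-language compatibility by ensuring the same byte order when encoding UUIDs from all drivers. UUIDs written by a driver configured with this representation are handled correctly by every other driver, provided that driver is also configured with the STANDARD representation.

STANDARD encodes native uuid.UUID objects to Binary subtype 4 objects.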

`PYTHON_LEGACY`#

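Attention

This UUID representation should be used when reading UUIDs generated by existing applications that use the Python driver but do not explicitly set a UUID representation.

Attention

PYTHON_LEGACY was the default UUID representation in PyMongo 3.

The PYTHON_LEGACY representation corresponds to the legacy representation of UUIDs used by PyMongo. This representation conforms with RFC 4122 Section 4.1.2.

The following example illustrates the use of this representation: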

from bson.codec_options import CodecOptions, DEFAULT_CODEC_OPTIONS
from bson.binary import Binary, UuidRepresentation

# No configured UUID representation
collection = client.python_legacy.get_collection('test', codec_options=DEFAULT_CODEC_OPTIONS)

# Using UuidRepresentation.PYTHON_LEGACY
pylegacy_opts = CodecOptions(uuid_representation=UuidRepresentation.PYTHON_LEGACY)
pylegacy_collection = client.python_legacy.get_collection('test', codec_options=pylegacy_opts)

# UUIDs written by PyMongo 3 with no UuidRepresentation configured
# (or PyMongo 4.0 with PYTHON_LEGACY) can be queried using PYTHON_LEGACY
uuid_1 = uuid4()
pylegacy_collection.insert_one({'uuid': uuid_1})
document = pylegacy_collection.find_one({'uuid': uuid_1})

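PYTHON_LEGACY encodes native uuid.UUID objects to Binary subtype 3 objects, preserving the same byte order as bytes:

from bson.binary import Binary

# Query for the UUID inserted above by its raw subtype-3 bytes
document = pylegacy_collection.find_one({'uuid': Binary(uuid_1.bytes, subtype=3)})
assert document['uuid'] == uuid_1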

`JAVA_LEGACY`#

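Attention

This UUID representation should be used when reading UUIDs written to MongoDB by legacy applications (i.e. applications that do not use the STANDARD representation) using the Java driver.

The JAVA_LEGACY representation corresponds to the legacy representation of UUIDs used by the MongoDB Java driver.

Note

The JAVA_LEGACY representation reverses the order of bytes 0-7 and of bytes 8-15.

As an example, consider the same UUID described in Legacy Handling of UUID Data. Let us assume that an application used the Java driver without an explicitly specified UUID representation to insert the example UUID 00112233-4455-6677-8899-aabbccddeeff into MongoDB. If we try to read this value using PYTHON_LEGACY, we end up with an entirely different UUID: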

UUID('77665544-3322-1100-ffee-ddccbbaa9988')

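However, if we explicitly set the representation to JAVA_LEGACY, we get the correct result:

UUID('00112233-4455-6677-8899-aabbccddeeff')

PyMongo uses the specified UUID representation to reorder the BSON bytes and load them correctly. JAVA_LEGACY encodes native uuid.UUID objects to Binary subtype 3 objects, while performing the same byte reordering as the legacy Java driver's UUID to BSON encoder.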
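The misread and the corrected read can be mimicked with the standard library. This is only a sketch of the byte reordering described in the note above, not PyMongo's implementation; `read_java_legacy` is a hypothetical helper.

```python
import uuid


def read_java_legacy(stored: bytes) -> uuid.UUID:
    """Undo the Java legacy layout: reverse each 8-byte half, then parse."""
    return uuid.UUID(bytes=stored[:8][::-1] + stored[8:][::-1])


# Bytes as the legacy Java driver would have stored the example UUID.
stored = bytes.fromhex("7766554433221100ffeeddccbbaa9988")

misread = uuid.UUID(bytes=stored)   # bytes taken as-is, as PYTHON_LEGACY does
correct = read_java_legacy(stored)  # halves reversed, as JAVA_LEGACY does
print(misread)
print(correct)
```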

`CSHARP_LEGACY`#

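Attention

This UUID representation should be used when reading UUIDs written to MongoDB by legacy applications (i.e. applications that do not use the STANDARD representation) using the C# driver.

The CSHARP_LEGACY representation corresponds to the legacy representation of UUIDs used by the MongoDB C# driver.

Note

The CSHARP_LEGACY representation reverses the order of bytes 0-3, bytes 4-5, and bytes 6-7.

As an example, consider the same UUID described in Legacy Handling of UUID Data. Let us assume that an application used the C# driver without an explicitly specified UUID representation to insert the example UUID 00112233-4455-6677-8899-aabbccddeeff into MongoDB. If we try to read this value using PYTHON_LEGACY, we end up with an entirely different UUID: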

UUID('33221100-5544-7766-8899-aabbccddeeff')

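However, if we explicitly set the representation to CSHARP_LEGACY, we get the correct result:

UUID('00112233-4455-6677-8899-aabbccddeeff')

PyMongo uses the specified UUID representation to reorder the BSON bytes and load them correctly. CSHARP_LEGACY encodes native uuid.UUID objects to Binary subtype 3 objects, while performing the same byte reordering as the legacy C# driver's UUID to BSON encoder.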
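The standard library's `uuid.UUID` accepts a `bytes_le` argument that swaps the first three fields in the way the note above describes, so the C# case can be sketched without PyMongo. Treating `bytes_le` as equivalent to the C# legacy layout is an assumption of this sketch, based on the byte ranges given on this page.

```python
import uuid

# Bytes as the legacy C# driver would have stored the example UUID.
stored = bytes.fromhex("33221100554477668899aabbccddeeff")

misread = uuid.UUID(bytes=stored)     # bytes taken as-is, as PYTHON_LEGACY does
correct = uuid.UUID(bytes_le=stored)  # first three fields byte-swapped back
print(misread)
print(correct)
```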
