Hacks

Collected hacks using peewee. Have a cool hack you'd like to share? Open an issue on GitHub or contact me.

Optimistic Locking

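Optimistic locking is useful in situations where you might ordinarily use SELECT FOR UPDATE (or, in SQLite, BEGIN IMMEDIATE). For example, you might fetch a user record from the database, make some modifications, then save the modified user record. Typically this scenario would require us to lock the user record for the duration of the transaction, from the moment we select it to the moment we save our changes.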

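In optimistic locking, on the other hand, we do not acquire any lock and instead rely on an internal version column in the row we're modifying. At read time, we see what version the row is currently at, and on save, we ensure that the update takes place only if the version is the same as the one we initially read. If the version is higher, then some other process must have snuck in and changed the row; saving our modified version could result in the loss of important changes.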
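To see the mechanism without peewee, here is a minimal sketch using the standard-library sqlite3 module. The table layout is a hypothetical stand-in for the User model introduced below. The important part is the `WHERE ... AND version = ?` predicate on the UPDATE, which turns the save into a compare-and-set: an update based on a stale read matches zero rows.

```python
import sqlite3

# In-memory database with a hypothetical "user" table carrying a version column.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE user (id INTEGER PRIMARY KEY, '
             'favorite_animal TEXT, version INTEGER NOT NULL DEFAULT 1)')
conn.execute("INSERT INTO user (id, favorite_animal) VALUES (1, 'cat')")

def read_version(user_id):
    """Return the version observed at read time."""
    return conn.execute('SELECT version FROM user WHERE id = ?',
                        (user_id,)).fetchone()[0]

def save_if_unchanged(user_id, animal, seen_version):
    """Update only if the row still carries the version we read.

    Returns True if the update was applied, False on a conflict."""
    cur = conn.execute(
        'UPDATE user SET favorite_animal = ?, version = version + 1 '
        'WHERE id = ? AND version = ?',
        (animal, user_id, seen_version))
    return cur.rowcount == 1

# Two "processes" read the same row before either one writes.
v_a = read_version(1)
v_b = read_version(1)

first = save_if_unchanged(1, 'macaw', v_a)           # applied, version -> 2
second = save_if_unchanged(1, 'little parrot', v_b)  # stale version, rejected
```

No lock is held between the read and the write. The database evaluates the version check and the increment atomically within the single UPDATE statement.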

It's quite simple to implement optimistic locking in Peewee. Here is a base class that you can use as a starting point:

from peewee import *

class ConflictDetectedException(Exception): pass

class BaseVersionedModel(Model):
    version = IntegerField(default=1, index=True)

    def save_optimistic(self):
        if not self.id:
            # This is a new record, so the default logic is to perform an
            # INSERT. Ideally your model would also have a unique
            # constraint that made it impossible for two INSERTs to happen
            # at the same time.
            self.save()
            return True

        # Update any data that has changed and bump the version counter.
        field_data = dict(self.__data__)
        current_version = field_data.pop('version', 1)
        self._populate_unsaved_relations(field_data)
        field_data = self._prune_fields(field_data, self.dirty_fields)
        if not field_data:
            raise ValueError('No changes have been made.')

        ModelClass = type(self)
        field_data['version'] = ModelClass.version + 1  # Atomic increment.

        query = ModelClass.update(**field_data).where(
            (ModelClass.version == current_version) &
            (ModelClass.id == self.id))
        if query.execute() == 0:
            # No rows were updated, indicating another process has saved
            # a new version. How you handle this situation is up to you,
            # but for simplicity I'm just raising an exception.
            raise ConflictDetectedException('current version is out of sync')
        else:
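            # Increment local version to match what is now in the db, then
            # reset dirty-field tracking (as Model.save() does) so these
            # changes are not re-sent by the next call.
            self.version += 1
            self._dirty.clear()
            return True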

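Here is an example of how this works. Let's assume we have the following model definition. Note that there's a unique constraint on the username. This is important, as it provides a way to prevent double-inserts.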

class User(BaseVersionedModel):
    username = CharField(unique=True)
    favorite_animal = CharField()

Example:

>>> u = User(username='charlie', favorite_animal='cat')
>>> u.save_optimistic()
True

>>> u.version
1

>>> u.save_optimistic()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "x.py", line 18, in save_optimistic
    raise ValueError('No changes have been made.')
ValueError: No changes have been made.

>>> u.favorite_animal = 'kitten'
>>> u.save_optimistic()
True

# Simulate a separate thread coming in and updating the model.
>>> u2 = User.get(User.username == 'charlie')
>>> u2.favorite_animal = 'macaw'
>>> u2.save_optimistic()
True

# Now, attempt to change and re-save the original instance:
>>> u.favorite_animal = 'little parrot'
>>> u.save_optimistic()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "x.py", line 30, in save_optimistic
    raise ConflictDetectedException()
ConflictDetectedException: current version is out of sync
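The unique constraint on username matters because save_optimistic() falls back to a plain INSERT for new records. The version check cannot help in that case, because there is no existing row to compare against. Here is a minimal sqlite3 sketch, using a hypothetical schema that mirrors User, in which the database itself rejects the second of two competing inserts:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
# UNIQUE here plays the role of CharField(unique=True) on User.username.
conn.execute('CREATE TABLE user (id INTEGER PRIMARY KEY, '
             'username TEXT NOT NULL UNIQUE, '
             'favorite_animal TEXT, version INTEGER NOT NULL DEFAULT 1)')

def insert_user(username, animal):
    """INSERT a new user; return False if another insert already won."""
    try:
        conn.execute(
            'INSERT INTO user (username, favorite_animal) VALUES (?, ?)',
            (username, animal))
        return True
    except sqlite3.IntegrityError:
        # Raised by the UNIQUE constraint on username.
        return False

first = insert_user('charlie', 'cat')
duplicate = insert_user('charlie', 'dog')
```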

Top object per group

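These examples describe several ways to query the single top item per group. For a thorough discussion of various techniques, check out my blog post Querying the top item by group with Peewee ORM. If you are interested in the more general problem of querying the top N items, see the section below, Top N objects per group.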

In these examples we will use the User and Tweet models to find each user and their most-recent tweet.

The most efficient method I found in my testing uses the MAX() aggregate function.

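We will perform the aggregation in a non-correlated subquery, so we can be confident this method will be performant. The idea is that we select the tweets, grouped by their author, whose timestamp is equal to the maximum timestamp observed for that user.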

# When referencing a table multiple times, we'll call Model.alias() to create
# a secondary reference to the table.
TweetAlias = Tweet.alias()

# Create a subquery that will calculate the maximum Tweet created_date for each
# user.
subquery = (TweetAlias
            .select(
                TweetAlias.user,
                fn.MAX(TweetAlias.created_date).alias('max_ts'))
            .group_by(TweetAlias.user)
            .alias('tweet_max_subquery'))

# Query for tweets and join using the subquery to match the tweet's user
# and created_date.
query = (Tweet
         .select(Tweet, User)
         .join(User)
         .switch(Tweet)
         .join(subquery, on=(
             (Tweet.created_date == subquery.c.max_ts) &
             (Tweet.user == subquery.c.user_id))))
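The following sketch runs roughly the SQL this query builds, using the standard-library sqlite3 module. The table and column names (user, tweet, user_id, created_date) are assumed to follow peewee's defaults, and the sample data is made up. The subquery is evaluated once to produce each user's newest timestamp. Joining on the pair (user_id, created_date) then picks out the matching tweet.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript('''
    CREATE TABLE user (id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE tweet (id INTEGER PRIMARY KEY, user_id INTEGER,
                        message TEXT, created_date TEXT);
    INSERT INTO user VALUES (1, 'huey'), (2, 'mickey');
    INSERT INTO tweet (user_id, message, created_date) VALUES
        (1, 'meow', '2024-01-01 10:00'),
        (1, 'purr', '2024-01-02 09:00'),
        (2, 'woof', '2024-01-01 12:00');
''')

# Non-correlated subquery "m" yields (user_id, max_ts); the join keeps only
# the tweet(s) whose timestamp equals their author's maximum.
rows = conn.execute('''
    SELECT t.message, u.username
    FROM tweet AS t
    JOIN user AS u ON t.user_id = u.id
    JOIN (SELECT user_id, MAX(created_date) AS max_ts
          FROM tweet GROUP BY user_id) AS m
      ON t.created_date = m.max_ts AND t.user_id = m.user_id
    ORDER BY u.username
''').fetchall()
```

If two tweets by the same user share the maximum timestamp, both are returned.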

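SQLite and MySQL are a bit more lax and permit grouping by a subset of the columns that are selected. This means we can do away with the subquery and express it quite concisely: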

query = (Tweet
         .select(Tweet, User)
         .join(User)
         .group_by(Tweet.user)
         .having(Tweet.created_date == fn.MAX(Tweet.created_date)))
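The following sqlite3 sketch illustrates the leniency this relies on. It uses the related SQLite "bare column" rule rather than the exact HAVING form above. SQLite documents that when a query contains a single min() or max() aggregate, non-aggregated columns in the result take their values from the row holding that minimum or maximum. The sample data is made up. As far as I know, MySQL 5.7.5 and later enable ONLY_FULL_GROUP_BY by default, and that mode rejects queries like this unless it is switched off.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript('''
    CREATE TABLE user (id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE tweet (id INTEGER PRIMARY KEY, user_id INTEGER,
                        message TEXT, created_date TEXT);
    INSERT INTO user VALUES (1, 'huey'), (2, 'mickey');
    INSERT INTO tweet (user_id, message, created_date) VALUES
        (1, 'meow', '2024-01-01 10:00'),
        (1, 'purr', '2024-01-02 09:00'),
        (2, 'woof', '2024-01-01 12:00');
''')

# username and message are "bare" (not aggregated, not grouped); SQLite
# fills them in from the row that supplied MAX(created_date).
rows = conn.execute('''
    SELECT u.username, t.message, MAX(t.created_date)
    FROM tweet AS t JOIN user AS u ON t.user_id = u.id
    GROUP BY t.user_id
    ORDER BY u.username
''').fetchall()
```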

Top N objects per group

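These examples describe several ways to query the top N items per group reasonably efficiently. For a thorough discussion of various techniques, check out my blog post Querying the top N objects per group with Peewee ORM.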

In these examples we will use the User and Tweet models to find each user and their three most-recent tweets.

Postgres lateral joins

Lateral joins are a neat Postgres feature that allow reasonably efficient correlated subqueries. They are often described as SQL "for each" loops.

The desired SQL is:

SELECT * FROM
  (SELECT t2.id, t2.username FROM user AS t2) AS uq
   LEFT JOIN LATERAL
  (SELECT t2.message, t2.created_date
   FROM tweet AS t2
   WHERE (t2.user_id = uq.id)
   ORDER BY t2.created_date DESC LIMIT 3)
  AS pq ON true

To accomplish this with peewee, we'll need to express the lateral join as a NodeList, which gives us greater flexibility than the join() method.

# We'll reference `Tweet` twice, so keep an alias handy.
TweetAlias = Tweet.alias()

# The "outer loop" will be iterating over the users whose
# tweets we are trying to find.
user_query = User.select(User.id, User.username).alias('uq')

# The inner loop will select tweets and is correlated to the
# outer loop via the WHERE clause. Note that we are using a
# LIMIT clause.
tweet_query = (TweetAlias
               .select(TweetAlias.message, TweetAlias.created_date)
               .where(TweetAlias.user == user_query.c.id)
               .order_by(TweetAlias.created_date.desc())
               .limit(3)
               .alias('pq'))

# Now we join the outer and inner queries using the LEFT LATERAL
# JOIN. The join predicate is *ON TRUE*, since we're effectively
# joining in the tweet subquery's WHERE clause.
join_clause = NodeList((
    user_query,
    SQL('LEFT JOIN LATERAL'),
    tweet_query,
    SQL('ON %s', [True])))

# Finally, we'll wrap these up and SELECT from the result.
query = (Tweet
         .select(user_query.c.username, tweet_query.c.message,
                 tweet_query.c.created_date)
         .from_(join_clause))
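Lateral joins are Postgres-only, so the sketch below illustrates the "for each" description in plain Python instead of running SQL. It is not how Postgres executes the query. The in-memory rows are hypothetical stand-ins for the user and tweet tables. For each outer user, the inner step is correlated to that user, ordered newest first, and limited to three rows. Because the join is a LEFT join with ON true, a user with no tweets still appears once, with NULLs for the tweet columns.

```python
from collections import namedtuple

# Hypothetical in-memory rows standing in for the user and tweet tables.
TweetRow = namedtuple('TweetRow', 'user_id message created_date')

users = [(1, 'huey'), (2, 'mickey'), (3, 'zaizee')]
tweets = [
    TweetRow(1, 'a', '2024-01-01'),
    TweetRow(1, 'b', '2024-01-02'),
    TweetRow(1, 'c', '2024-01-03'),
    TweetRow(1, 'd', '2024-01-04'),
    TweetRow(2, 'woof', '2024-01-02'),
]

def lateral_top_n(users, tweets, n=3):
    """Emulate: users LEFT JOIN LATERAL (newest n tweets per user) ON true."""
    results = []
    for user_id, username in users:  # outer query, one row at a time
        # Inner query: correlated to the current outer row via user_id,
        # ordered newest first and limited to n rows.
        inner = sorted((t for t in tweets if t.user_id == user_id),
                       key=lambda t: t.created_date, reverse=True)[:n]
        if not inner:
            # LEFT join: keep the user, padding tweet columns with None.
            results.append((username, None, None))
        for t in inner:
            results.append((username, t.message, t.created_date))
    return results

rows = lateral_top_n(users, tweets)
```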

Window functions

Window functions, which are supported by peewee, provide scalable, efficient performance.

The desired SQL is:

SELECT subq.message, subq.username
FROM (
    SELECT
        t2.message,
        t3.username,
        RANK() OVER (
            PARTITION BY t2.user_id
            ORDER BY t2.created_date DESC
        ) AS rnk
    FROM tweet AS t2
    INNER JOIN user AS t3 ON (t2.user_id = t3.id)
) AS subq
WHERE (subq.rnk <= 3)

To accomplish this with peewee, we will wrap the ranked tweets in an outer query that performs the filtering.

TweetAlias = Tweet.alias()

# The subquery will select the relevant data from the Tweet and
# User table, as well as ranking the tweets by user from newest
# to oldest.
subquery = (TweetAlias
            .select(
                TweetAlias.message,
                User.username,
                fn.RANK().over(
                    partition_by=[TweetAlias.user],
                    order_by=[TweetAlias.created_date.desc()]).alias('rnk'))
            .join(User, on=(TweetAlias.user == User.id))
            .alias('subq'))

# Since we can't filter on the rank, we are wrapping it in a query
# and performing the filtering in the outer query.
query = (Tweet
         .select(subquery.c.message, subquery.c.username)
         .from_(subquery)
         .where(subquery.c.rnk <= 3))
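The following sketch runs the desired SQL above through the standard-library sqlite3 module. SQLite has supported window functions since 3.25.0. The sample data is made up, and the ORDER BY is added only to make the output deterministic. Note that RANK() gives tied rows the same rank, so a user with ties around third place can get more than three rows. ROW_NUMBER() caps the result at N rows per user.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript('''
    CREATE TABLE user (id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE tweet (id INTEGER PRIMARY KEY, user_id INTEGER,
                        message TEXT, created_date TEXT);
    INSERT INTO user VALUES (1, 'huey'), (2, 'mickey');
    INSERT INTO tweet (user_id, message, created_date) VALUES
        (1, 'a', '2024-01-01'), (1, 'b', '2024-01-02'),
        (1, 'c', '2024-01-03'), (1, 'd', '2024-01-04'),
        (2, 'woof', '2024-01-02');
''')

# The inner query ranks each user's tweets newest-first; the outer query
# filters on the rank, which cannot be referenced in the inner WHERE.
rows = conn.execute('''
    SELECT subq.message, subq.username
    FROM (
        SELECT t2.message, t3.username,
               RANK() OVER (PARTITION BY t2.user_id
                            ORDER BY t2.created_date DESC) AS rnk
        FROM tweet AS t2
        INNER JOIN user AS t3 ON (t2.user_id = t3.id)
    ) AS subq
    WHERE subq.rnk <= 3
    ORDER BY subq.username, subq.message
''').fetchall()
```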

Other methods

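If you're not using Postgres, then unfortunately you're left with options that exhibit less-than-ideal performance. For a more complete overview of common methods, check out this blog post. Below I will summarize the approaches and the corresponding queries.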

Using COUNT, we can get all tweets where there exist fewer than N tweets with more-recent timestamps:

TweetAlias = Tweet.alias()

# Create a correlated subquery that calculates the number of
# tweets with a higher (newer) timestamp than the tweet we're
# looking at in the outer query.
subquery = (TweetAlias
            .select(fn.COUNT(TweetAlias.id))
            .where(
                (TweetAlias.created_date >= Tweet.created_date) &
                (TweetAlias.user == Tweet.user)))

# Wrap the subquery and filter on the count.
query = (Tweet
         .select(Tweet, User)
         .join(User)
         .where(subquery <= 3))
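Here is the same idea in raw SQL, run through sqlite3 with made-up data. Because the inner predicate uses >=, each tweet counts itself. The condition <= 3 therefore keeps a tweet only when at most two newer tweets exist, which yields the three newest tweets per user. The subquery is conceptually evaluated once per outer row, and that is why this approach scales poorly.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript('''
    CREATE TABLE user (id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE tweet (id INTEGER PRIMARY KEY, user_id INTEGER,
                        message TEXT, created_date TEXT);
    INSERT INTO user VALUES (1, 'huey'), (2, 'mickey');
    INSERT INTO tweet (user_id, message, created_date) VALUES
        (1, 'a', '2024-01-01'), (1, 'b', '2024-01-02'),
        (1, 'c', '2024-01-03'), (1, 'd', '2024-01-04'),
        (2, 'woof', '2024-01-02');
''')

# Correlated subquery: for the outer tweet t, count the same user's
# tweets that are at least as new (t itself included).
rows = conn.execute('''
    SELECT t.message, u.username
    FROM tweet AS t JOIN user AS u ON t.user_id = u.id
    WHERE (SELECT COUNT(t2.id) FROM tweet AS t2
           WHERE t2.created_date >= t.created_date
             AND t2.user_id = t.user_id) <= 3
    ORDER BY u.username, t.message
''').fetchall()
```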

We can achieve similar results by doing a self-join and performing the filtering in the HAVING clause:

TweetAlias = Tweet.alias()

# Use a self-join and join predicates to count the number of
# newer tweets.
query = (Tweet
         .select(Tweet.id, Tweet.message, Tweet.user, User.username)
         .join(User)
         .switch(Tweet)
         .join(TweetAlias, on=(
             (TweetAlias.user == Tweet.user) &
             (TweetAlias.created_date >= Tweet.created_date)))
         .group_by(Tweet.id, Tweet.message, Tweet.user, User.username)
         .having(fn.COUNT(Tweet.id) <= 3))
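Here is the self-join version in raw SQL, run through sqlite3 with made-up data. Each tweet is joined to every tweet by the same user that is at least as new, including itself. The size of each group is therefore the same count that the correlated subquery in the previous approach computes.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript('''
    CREATE TABLE user (id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE tweet (id INTEGER PRIMARY KEY, user_id INTEGER,
                        message TEXT, created_date TEXT);
    INSERT INTO user VALUES (1, 'huey'), (2, 'mickey');
    INSERT INTO tweet (user_id, message, created_date) VALUES
        (1, 'a', '2024-01-01'), (1, 'b', '2024-01-02'),
        (1, 'c', '2024-01-03'), (1, 'd', '2024-01-04'),
        (2, 'woof', '2024-01-02');
''')

# One output group per tweet t; COUNT(t.id) is the number of joined rows,
# i.e. the number of same-user tweets at least as new as t.
rows = conn.execute('''
    SELECT t.message, u.username
    FROM tweet AS t
    JOIN user AS u ON t.user_id = u.id
    JOIN tweet AS t2
      ON t2.user_id = t.user_id AND t2.created_date >= t.created_date
    GROUP BY t.id, t.message, u.username
    HAVING COUNT(t.id) <= 3
    ORDER BY u.username, t.message
''').fetchall()
```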

The last example uses a LIMIT clause in a correlated subquery.

TweetAlias = Tweet.alias()

# The subquery here will calculate, for the user who created the
# tweet in the outer loop, the three newest tweets. The expression
# will evaluate to `True` if the outer-loop tweet is in the set of
# tweets represented by the inner query.
query = (Tweet
         .select(Tweet, User)
         .join(User)
         .where(Tweet.id << (
             TweetAlias
             .select(TweetAlias.id)
             .where(TweetAlias.user == Tweet.user)
             .order_by(TweetAlias.created_date.desc())
             .limit(3))))
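Here is the LIMIT-subquery version in raw SQL, again run through sqlite3 with made-up data. SQLite accepts LIMIT inside a correlated IN subquery. As far as I know, MySQL does not support LIMIT in IN subqueries, so this variant is not portable there.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript('''
    CREATE TABLE user (id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE tweet (id INTEGER PRIMARY KEY, user_id INTEGER,
                        message TEXT, created_date TEXT);
    INSERT INTO user VALUES (1, 'huey'), (2, 'mickey');
    INSERT INTO tweet (user_id, message, created_date) VALUES
        (1, 'a', '2024-01-01'), (1, 'b', '2024-01-02'),
        (1, 'c', '2024-01-03'), (1, 'd', '2024-01-04'),
        (2, 'woof', '2024-01-02');
''')

# For the outer tweet t, the subquery lists the ids of its author's three
# newest tweets; t is kept only if its id is among them.
rows = conn.execute('''
    SELECT t.message, u.username
    FROM tweet AS t JOIN user AS u ON t.user_id = u.id
    WHERE t.id IN (SELECT t2.id FROM tweet AS t2
                   WHERE t2.user_id = t.user_id
                   ORDER BY t2.created_date DESC
                   LIMIT 3)
    ORDER BY u.username, t.message
''').fetchall()
```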

Writing custom functions with SQLite

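SQLite is very easy to extend with custom functions written in Python, which are then callable from your SQL statements. By using the SqliteExtDatabase and the func() decorator, you can very easily define your own functions.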

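Here is an example function that generates a hashed version of a user-supplied password. We can also use this to implement login functionality for matching a user and password.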

from hashlib import sha1
from random import random
from playhouse.sqlite_ext import SqliteExtDatabase

db = SqliteExtDatabase('my-blog.db')

def get_hexdigest(salt, raw_password):
    data = salt + raw_password
    return sha1(data.encode('utf8')).hexdigest()

@db.func()
def make_password(raw_password):
    salt = get_hexdigest(str(random()), str(random()))[:5]
    hsh = get_hexdigest(salt, raw_password)
    return '%s$%s' % (salt, hsh)

@db.func()
def check_password(raw_password, enc_password):
    salt, hsh = enc_password.split('$', 1)
    return hsh == get_hexdigest(salt, raw_password)

Here is how you can use the function to add a new user, storing a hashed password:

query = User.insert(
    username='charlie',
    password=fn.make_password('testing')).execute()

If we retrieve the user from the database, the password that's stored is hashed and salted:

>>> user = User.get(User.username == 'charlie')
>>> print(user.password)
b76fa$88be1adcde66a1ac16054bc17c8a297523170949

For login-type functionality, you could write something like this:

def login(username, password):
    try:
        return (User
                .select()
                .where(
                    (User.username == username) &
                    (fn.check_password(password, User.password) == True))
                .get())
    except User.DoesNotExist:
        # Incorrect username and/or password.
        return False
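The func() decorator is a convenience. As I understand peewee's implementation, it records the function and registers it on each connection through the standard-library sqlite3.Connection.create_function. The following sketch does the same thing with no peewee at all. It registers a copy of the check_password helper directly on a plain connection and calls it from a WHERE clause. The fixed salt "ab12c" is hypothetical and only keeps the example deterministic.

```python
import sqlite3
from hashlib import sha1

def get_hexdigest(salt, raw_password):
    return sha1((salt + raw_password).encode('utf8')).hexdigest()

def check_password(raw_password, enc_password):
    salt, hsh = enc_password.split('$', 1)
    return hsh == get_hexdigest(salt, raw_password)

conn = sqlite3.connect(':memory:')
# Expose the Python callable to SQL as check_password(raw, encoded).
conn.create_function('check_password', 2, check_password)

conn.execute('CREATE TABLE user (username TEXT UNIQUE, password TEXT)')
conn.execute('INSERT INTO user VALUES (?, ?)',
             ('charlie', 'ab12c$' + get_hexdigest('ab12c', 'testing')))

def login(username, password):
    """Return True if the username exists and the password matches."""
    row = conn.execute(
        'SELECT username FROM user '
        'WHERE username = ? AND check_password(?, password)',
        (username, password)).fetchone()
    return row is not None
```

The Python function returns a bool, which sqlite3 hands back to SQLite as the integer 1 or 0. That value is what the WHERE clause tests.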

Date math

Each of the databases supported by Peewee implements its own set of functions and semantics for date/time arithmetic.

This section will provide a short scenario and example code demonstrating how you might utilize Peewee to do dynamic date manipulation in SQL.

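Scenario: we need to run certain tasks every X seconds, and both the task intervals and the tasks themselves are defined in the database. We need to write some code that will tell us which tasks we should run at a given time: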

class Schedule(Model):
    interval = IntegerField()  # Run this schedule every X seconds.


class Task(Model):
    schedule = ForeignKeyField(Schedule, backref='tasks')
    command = TextField()  # Run this command.
    last_run = DateTimeField()  # When was this run last?

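Our logic essentially boils down to:

# e.g., if the task was last run at 12:00:05, and the associated interval
# is 10 seconds, the next occurrence should be 12:00:15. So we check
# whether the current time (now) is 12:00:15 or later.
now >= task.last_run + schedule.interval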
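Evaluated in Python rather than SQL, the rule is a single timedelta addition. The is_due() helper below is hypothetical and only shows the arithmetic. The rest of this section pushes the same comparison into the database, so that the database can filter the rows.

```python
import datetime

def is_due(now, last_run, interval_seconds):
    """Return True if a task last run at `last_run` should run again at `now`."""
    next_occurrence = last_run + datetime.timedelta(seconds=interval_seconds)
    return now >= next_occurrence

last_run = datetime.datetime(2024, 1, 1, 12, 0, 5)  # last run at 12:00:05
```

With a 10-second interval, is_due() is False at 12:00:14 and becomes True at 12:00:15.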

So we can write the following code:

next_occurrence = something  # ??? how do we define this ???

# We can express the current time as a Python datetime value, or we could
# alternatively use the appropriate SQL function/name.
now = Value(datetime.datetime.now())  # Or SQL('current_timestamp'), e.g.

query = (Task
         .select(Task, Schedule)
         .join(Schedule)
         .where(now >= next_occurrence))

For PostgreSQL, we will multiply a static 1-second interval to calculate the offsets dynamically:

second = SQL("INTERVAL '1 second'")
next_occurrence = Task.last_run + (Schedule.interval * second)

For MySQL, we can reference the schedule's interval directly:

from peewee import NodeList  # Needed to construct sql entity.

interval = NodeList((SQL('INTERVAL'), Schedule.interval, SQL('SECOND')))
next_occurrence = fn.date_add(Task.last_run, interval)

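For SQLite, things are slightly tricky because SQLite does not have a dedicated datetime type. So for SQLite, we convert to a unix timestamp, add the schedule's seconds, then convert back to a comparable datetime representation: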

next_ts = fn.strftime('%s', Task.last_run) + Schedule.interval
next_occurrence = fn.datetime(next_ts, 'unixepoch')
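To check the SQLite expression, here is a sketch that runs equivalent raw SQL through the standard-library sqlite3 module. The table and column names (schedule, task, schedule_id, last_run) are assumptions modeled on the Schedule and Task models above. last_run is stored as 'YYYY-MM-DD HH:MM:SS' text, the format SQLite's date functions expect. strftime('%s', ...) and datetime(..., 'unixepoch') both treat values as UTC, so the round trip is consistent.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript('''
    CREATE TABLE schedule (id INTEGER PRIMARY KEY, "interval" INTEGER);
    CREATE TABLE task (id INTEGER PRIMARY KEY, schedule_id INTEGER,
                       command TEXT, last_run TEXT);
    INSERT INTO schedule VALUES (1, 10), (2, 60);
    INSERT INTO task (schedule_id, command, last_run) VALUES
        (1, 'every-10s', '2024-01-01 12:00:05'),
        (2, 'every-60s', '2024-01-01 12:00:05');
''')

def due_tasks(now):
    """Return commands whose next occurrence is at or before `now`.

    `now` is a 'YYYY-MM-DD HH:MM:SS' string, so a plain text comparison
    against the output of SQLite's datetime() orders correctly."""
    rows = conn.execute('''
        SELECT task.command
        FROM task JOIN schedule ON task.schedule_id = schedule.id
        WHERE ? >= datetime(strftime('%s', task.last_run) + schedule."interval",
                            'unixepoch')
        ORDER BY task.command
    ''', (now,)).fetchall()
    return [command for (command,) in rows]
```

strftime('%s', ...) returns text, and SQLite's + operator converts that text to an integer before adding the interval.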