RFC 54:数据集事务
作者:连鲁奥
联系人:spatialys.com上的even dot rouault
状态:采用,在GDAL 2.0中实现
总结
这个RFC引入了一个API来提供数据集级别的事务机制,并在PostgreSQL、SQLite和GPKG驱动程序中使用它。它还显著地重写了PostgreSQL驱动程序中事务的处理方式。它还引入了一种通用机制来实现对本机不支持它的数据源事务的模拟,并在FileGDB驱动程序中使用它。
理论基础
当前的抽象在层级别提供了一个事务API。然而,这通常是误导性的,因为当它用BEGIN/COMMIT/ROLLBACK sql语句(PostgreSQL、SQLite、GPKG、PGDump、mssqlspace)在DBMS中实现时,语义实际上是一个跨所有层/表的数据库级事务。因此,即使在层上调用StartTransaction(),它也会扩展到在其他层上所做的更改。在极少数驱动程序中,StartTransaction()/CommitTransaction()有时用作执行大容量插入的机制。这就是WFS、CartoDB、GFT、GME的例子。对其中一些人来说,它可能更倾向于在数据集级别,因为潜在的多个层修改可能会堆叠在一起。
进一步的模式一些用例需要一致地更新几个层,因此需要真正的数据库级事务抽象。
各种驱动程序的当前情况如下(分析得出的以下一些观察结果主要是为了需要在驱动程序中工作的开发人员的利益而保留的):
《PostgreSQL》
关于用于运行GetNextFeature()请求的游标的一些事实:
在不受内存限制的情况下,检索大量数据需要游标。
游标需要运行事务。
默认游标(不带HOLD)不能在创建tem的事务之外使用
当事务(是的,事务,而不是游标)仍处于活动状态时,不能修改表的结构,如果在另一个连接上执行此操作,则它将挂起,直到另一个连接提交或回滚)
在事务中,删除/修改的行只有在声明游标之前完成时才可见。
带有HOLD:的游标可以在事务外部使用,但会导致表的副本被复制-->不利于性能
目前的缺陷有:
由于为读取层A而创建的底层隐式事务在读取层B开始时关闭,因此无法进行交错层读取(除了获取的前500个功能外,使用OGR_PG_CURSOR_PAGE=1可以很容易地看到)。
GetFeature()刷新当前事务并启动一个新事务来执行光标选择。这是不必要的,因为我们只检索一条记录
SetAttributeFilter()发出一个ResetReading()命令,该命令当前刷新正在进行的事务。在需要事务性保证的长更新场景中可能很烦人
什么有效:
转发到数据源的层级别的事务支持。
交错书写作品(即使使用复制模式)
SQLite/GPKG
用于读取表内容的机制(sqlite3_prepare()/sqlite3_step())不需要事务。
如果在准备好的语句之后但在第一步之前运行,步骤将看到结构修改(例如列添加)。
步骤一发生行修改/添加,就会立即看到它们。
转发到数据源的层级别的事务支持。
MySQL
由于使用了mysql_use_result(),一次只能处理一个请求,因此无法执行交错层读取(在一个层中读取会重置另一个读取)。mysql_store_result()可能是一个解决方案,但需要将整个结果集摄取到内存中,这对于大型层来说是切实可行的。
步骤不会在查询启动后设置行更改(如果通过另一个连接完成,因为如果通过ExecuteSQL()完成,则长事务将被中断)
无事务支持
OCI
交错层阅读作品
选择后所做的更改似乎不可见。
无事务支持
文件数据库
交错层阅读作品
选择后所做的更改似乎不可见。
无事务支持
提议的变更
GDALDataset更改
将以下方法添加到GDALDataset(并可由继承自GDALDataset的OGRDataSource使用)。
/************************************************************************/
/* StartTransaction() */
/************************************************************************/
/**
\brief For datasources which support transactions, StartTransaction creates a transaction.
If starting the transaction fails, will return
OGRERR_FAILURE. Datasources which do not support transactions will
always return OGRERR_UNSUPPORTED_OPERATION.
Nested transactions are not supported.
All changes done after the start of the transaction are definitely applied in the
datasource if CommitTransaction() is called. They may be canceled by calling
RollbackTransaction() instead.
At the time of writing, transactions only apply on vector layers.
Datasets that support transactions will advertise the ODsCTransactions capability.
Use of transactions at dataset level is generally preferred to transactions at
layer level, whose scope is rarely limited to the layer from which it was started.
In case StartTransaction() fails, neither CommitTransaction() or RollbackTransaction()
should be called.
If an error occurs after a successful StartTransaction(), the whole
transaction may or may not be implicitly canceled, depending on drivers. (e.g.
the PG driver will cancel it, SQLite/GPKG not). In any case, in the event of an
error, an explicit call to RollbackTransaction() should be done to keep things balanced.
By default, when bForce is set to FALSE, only "efficient" transactions will be
attempted. Some drivers may offer an emulation of transactions, but sometimes
with significant overhead, in which case the user must explicitly allow for such
an emulation by setting bForce to TRUE. Drivers that offer emulated transactions
should advertise the ODsCEmulatedTransactions capability (and not ODsCTransactions).
This function is the same as the C function GDALDatasetStartTransaction().
@param bForce can be set to TRUE if an emulation, possibly slow, of a transaction
mechanism is acceptable.
@return OGRERR_NONE on success.
@since GDAL 2.0
*/
OGRErr GDALDataset::StartTransaction(CPL_UNUSED int bForce);
/************************************************************************/
/* CommitTransaction() */
/************************************************************************/
/**
\brief For datasources which support transactions, CommitTransaction commits a transaction.
If no transaction is active, or the commit fails, will return
OGRERR_FAILURE. Datasources which do not support transactions will
always return OGRERR_UNSUPPORTED_OPERATION.
Depending on drivers, this may or may not abort layer sequential readings that
are active.
This function is the same as the C function GDALDatasetCommitTransaction().
@return OGRERR_NONE on success.
@since GDAL 2.0
*/
OGRErr GDALDataset::CommitTransaction();
/************************************************************************/
/* RollbackTransaction() */
/************************************************************************/
/**
\brief For datasources which support transactions, RollbackTransaction will roll
back a datasource to its state before the start of the current transaction.
If no transaction is active, or the rollback fails, will return
OGRERR_FAILURE. Datasources which do not support transactions will
always return OGRERR_UNSUPPORTED_OPERATION.
This function is the same as the C function GDALDatasetRollbackTransaction().
@return OGRERR_NONE on success.
@since GDAL 2.0
*/
OGRErr GDALDataset::RollbackTransaction();
注意:在GDALDataset类本身中,这些方法有一个空的实现,该实现返回OGRERR_UNSUPPORTED_操作。
这3种方法在C级别映射为:
OGRErr CPL_DLL GDALDatasetStartTransaction(GDALDatasetH hDS, int bForce);
OGRErr CPL_DLL GDALDatasetCommitTransaction(GDALDatasetH hDS);
OGRErr CPL_DLL GDALDatasetRollbackTransaction(GDALDatasetH hDS);
添加了两个新闻数据集功能:
ODsCTransactions:如果此数据源支持(高效)事务,则为True。
ODsCEmulatedTransactions:如果此数据源通过模拟支持事务,则为True。
模拟事务
添加了一个新函数OGRCreateEmulatedTransactionDataSourceWrapper(),供本机不支持事务但希望模拟事务的驱动程序使用。它可能被文件/目录支持其数据的任何数据源采用。
/** Returns a new datasource object that adds transactional behavior to an existing datasource.
*
* The provided poTransactionBehaviour object should implement driver-specific
* behavior for transactions.
*
* The generic mechanisms offered by the wrapper class do not cover concurrent
* updates (though different datasource connections) to the same datasource files.
*
* There are restrictions on what can be accomplished. For example it is not
* allowed to have a unreleased layer returned by ExecuteSQL() before calling
* StartTransaction(), CommitTransaction() or RollbackTransaction().
*
* Layer structural changes are not allowed after StartTransaction() if the
* layer definition object has been returned previously with GetLayerDefn().
*
* @param poBaseDataSource the datasource to which to add transactional behavior.
* @param poTransactionBehaviour an implementation of the IOGRTransactionBehaviour interface.
* @param bTakeOwnershipDataSource whether the returned object should own the
* passed poBaseDataSource (and thus destroy it
* when it is destroyed itself).
* @param bTakeOwnershipTransactionBehavior whether the returned object should own
* the passed poTransactionBehaviour
* (and thus destroy it when
* it is destroyed itself).
* @return a new datasource handle
* @since GDAL 2.0
*/
OGRDataSource CPL_DLL* OGRCreateEmulatedTransactionDataSourceWrapper(
OGRDataSource* poBaseDataSource,
IOGRTransactionBehaviour* poTransactionBehaviour,
int bTakeOwnershipDataSource,
int bTakeOwnershipTransactionBehavior);
iogrtransactionbehavior接口的定义如下:
/** IOGRTransactionBehaviour is an interface that a driver must implement
* to provide emulation of transactions.
*
* @since GDAL 2.0
*/
class CPL_DLL IOGRTransactionBehaviour
{
public:
/** Start a transaction.
*
* The implementation may update the poDSInOut reference by closing
* and reopening the datasource (or assigning it to NULL in case of error).
* In which case bOutHasReopenedDS must be set to TRUE.
*
* The implementation can for example backup the existing files/directories
* that compose the current datasource.
*
* @param poDSInOut datasource handle that may be modified
* @param bOutHasReopenedDS output boolean to indicate if datasource has been closed
* @return OGRERR_NONE in case of success
*/
virtual OGRErr StartTransaction(OGRDataSource*& poDSInOut,
int& bOutHasReopenedDS) = 0;
/** Commit a transaction.
*
* The implementation may update the poDSInOut reference by closing
* and reopening the datasource (or assigning it to NULL in case of error).
* In which case bOutHasReopenedDS must be set to TRUE.
*
* The implementation can for example remove the backup it may have done
* at StartTransaction() time.
*
* @param poDSInOut datasource handle that may be modified
* @param bOutHasReopenedDS output boolean to indicate if datasource has been closed
* @return OGRERR_NONE in case of success
*/
virtual OGRErr CommitTransaction(OGRDataSource*& poDSInOut,
int& bOutHasReopenedDS) = 0;
/** Rollback a transaction.
*
* The implementation may update the poDSInOut reference by closing
* and reopening the datasource (or assigning it to NULL in case of error).
* In which case bOutHasReopenedDS must be set to TRUE.
*
* The implementation can for example restore the backup it may have done
* at StartTransaction() time.
*
* @param poDSInOut datasource handle that may be modified
* @param bOutHasReopenedDS output boolean to indicate if datasource has been closed
* @return OGRERR_NONE in case of success
*/
virtual OGRErr RollbackTransaction(OGRDataSource*& poDSInOut,
int& bOutHasReopenedDS) = 0;
};
OPGRLayer更改
在OGRLayer级别,GetNextFeature()的文档接收以下附加信息以澄清其语义:
Features returned by GetNextFeature() may or may not be affected by concurrent
modifications depending on drivers. A guaranteed way of seeing modifications in
effect is to call ResetReading() on layers where GetNextFeature() has been called,
before reading again. Structural changes in layers (field addition, deletion, ...)
when a read is in progress may or may not be possible depending on drivers.
If a transaction is committed/aborted, the current sequential reading may or may
not be valid after that operation and a call to ResetReading() might be needed.
PG驱动程序更改
数据集级别的事务已经实现,隐式创建的事务的使用也被重写。
交错层读取现在是可能的。
GetFeature()已被修改为在没有游标或事务的情况下运行,对事务的所有其他调用都已被选中/修改为不会意外重置用户启动的事务。
以下是更新后的drvu pg中描述的新行为_高级.html帮助页:
Efficient sequential reading in PostgreSQL requires to be done within a transaction
(technically this is a CURSOR WITHOUT HOLD).
So the PG driver will implicitly open such a transaction if none is currently
opened as soon as a feature is retrieved. This transaction will be released if
ResetReading() is called (provided that no other layer is still being read).
If within such an implicit transaction, an explicit dataset level StartTransaction()
is issued, the PG driver will use a SAVEPOINT to emulate properly the transaction
behavior while making the active cursor on the read layer still opened.
If an explicit transaction is opened with dataset level StartTransaction()
before reading a layer, this transaction will be used for the cursor that iterates
over the layer. When explicitly committing or rolling back the transaction, the
cursor will become invalid, and ResetReading() should be issued again to restart
reading from the beginning.
As calling SetAttributeFilter() or SetSpatialFilter() implies an implicit
ResetReading(), they have the same effect as ResetReading(). That is to say,
while an implicit transaction is in progress, the transaction will be committed
(if no other layer is being read), and a new one will be started again at the next
GetNextFeature() call. On the contrary, if they are called within an explicit
transaction, the transaction is maintained.
With the above rules, the below examples show the SQL instructions that are
run when using the OGR API in different scenarios.
lyr1->GetNextFeature() BEGIN (implicit)
DECLARE cur1 CURSOR FOR SELECT * FROM lyr1
FETCH 1 IN cur1
lyr1->SetAttributeFilter('xxx')
--> lyr1->ResetReading() CLOSE cur1
COMMIT (implicit)
lyr1->GetNextFeature() BEGIN (implicit)
DECLARE cur1 CURSOR FOR SELECT * FROM lyr1 WHERE xxx
FETCH 1 IN cur1
lyr2->GetNextFeature() DECLARE cur2 CURSOR FOR SELECT * FROM lyr2
FETCH 1 IN cur2
lyr1->GetNextFeature() FETCH 1 IN cur1
lyr2->GetNextFeature() FETCH 1 IN cur2
lyr1->CreateFeature(f) INSERT INTO cur1 ...
lyr1->SetAttributeFilter('xxx')
--> lyr1->ResetReading() CLOSE cur1
COMMIT (implicit)
lyr1->GetNextFeature() DECLARE cur1 CURSOR FOR SELECT * FROM lyr1 WHERE xxx
FETCH 1 IN cur1
lyr1->ResetReading() CLOSE cur1
lyr2->ResetReading() CLOSE cur2
COMMIT (implicit)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ds->StartTransaction() BEGIN
lyr1->GetNextFeature() DECLARE cur1 CURSOR FOR SELECT * FROM lyr1
FETCH 1 IN cur1
lyr2->GetNextFeature() DECLARE cur2 CURSOR FOR SELECT * FROM lyr2
FETCH 1 IN cur2
lyr1->CreateFeature(f) INSERT INTO cur1 ...
lyr1->SetAttributeFilter('xxx')
--> lyr1->ResetReading() CLOSE cur1
COMMIT (implicit)
lyr1->GetNextFeature() DECLARE cur1 CURSOR FOR SELECT * FROM lyr1 WHERE xxx
FETCH 1 IN cur1
lyr1->ResetReading() CLOSE cur1
lyr2->ResetReading() CLOSE cur2
ds->CommitTransaction() COMMIT
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ds->StartTransaction() BEGIN
lyr1->GetNextFeature() DECLARE cur1 CURSOR FOR SELECT * FROM lyr1
FETCH 1 IN cur1
lyr1->CreateFeature(f) INSERT INTO cur1 ...
ds->CommitTransaction() CLOSE cur1 (implicit)
COMMIT
lyr1->GetNextFeature() FETCH 1 IN cur1 ==> Error since the cursor was closed with the commit. Explicit ResetReading() required before
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
lyr1->GetNextFeature() BEGIN (implicit)
DECLARE cur1 CURSOR FOR SELECT * FROM lyr1
FETCH 1 IN cur1
ds->StartTransaction() SAVEPOINT savepoint
lyr1->CreateFeature(f) INSERT INTO cur1 ...
ds->CommitTransaction() RELEASE SAVEPOINT savepoint
lyr1->ResetReading() CLOSE cur1
COMMIT (implicit)
Note: in reality, the PG drivers fetches 500 features at once. The FETCH 1
is for clarity of the explanation.
建议在显式事务中执行操作,以方便使用(修复ogr_pg.py时遇到一些问题,但这确实是一些很奇怪的事情,例如重新打开连接,它在“隐式”事务中运行得不太好)
GPKG和SQLite驱动程序更改
已实现数据集级事务。这里和那里做了一些修复,以避免意外重置由用户启动的事务。
文件gdb驱动程序更改
FileGDB驱动程序使用上述仿真来提供事务机制。这是通过在调用StartTransaction(force=TRUE)时备份地理数据库的当前状态来实现的。如果事务已提交,则备份副本将被销毁。如果事务被回滚,则备份副本将被还原。因此,在大型地理数据库上进行操作时,这可能代价高昂。请注意,在并发更新的情况下(在同一个或另一个进程中具有不同的连接),此仿真具有未指定的行为。
SWIG绑定(Python/Java/C#/Perl)更改
已完成以下添加:
Dataset.StartTransaction(int force=FALSE)
Dataset.CommitTransaction()
Dataset.RollbackTransaction()
ogr.ODsCTransactions常量
ogr.ODsCEmulatedTransactions常量
公用事业
如果ODsCTransactions被广告,ogr2ogr现在使用数据集事务(而不是层事务)。
文档
新的/修改过的API被记录在案。更新了MIGRATION_GUIDE.TXT,并提到以下兼容性问题。
测试套件
测试套件被扩展到
更新的驱动程序:PG、GPKG、SQLite、FileGDB
ogr2ogr对数据库事务的使用
兼容性问题
如上所述,通过PG驱动程序可以观察到细微的行为变化,这些变化与以前刷新过的隐式事务有关,现在不再是了,但是这应该被希望限制在非典型用例中。因此,一些以前“有效”的案例可能不再有效,但新的行为应该更容易理解。
PG和SQLite驱动程序可以接受对StartTransaction()的嵌套调用(在层级别)。这已经不可能了,因为它们现在被重定向到数据集事务,而这些事务显式地不支持数据集事务。
超出范围
以下实现BEGIN/COMMIT/ROLLBACK的驱动程序可以在以后得到增强,以支持数据集事务:OCI、MySQL、mssqlspace。
GFT、CartoDB、WFS也可以为数据集事务带来好处。
VRT当前支持层事务(如果底层数据集支持它,则不包括联合层)。如果要实现数据集事务,是否包括将数据集事务转发到源数据集?如果同一个数据集被多个源使用,实现可能会很复杂,但更根本的是,不能保证在多个数据集上使用ACID。
实施
实施将由甚至鲁奥完成 (Spatialys _),并由 LINZ (Land Information New Zealand) .
建议的实现位于 https://github.com/rouault/gdal2/tree/rfc54_dataset_transactions 储存库。
更改列表: https://github.com/rouault/gdal2/compare/rfc54_dataset_transactions
投票历史
+1名来自JukkaR,HowardB和Ever