Using Fiona

Reading vector data

Reading a GIS vector file begins by opening it in mode 'r' with Fiona's open function. The returned object is a fiona.collection.Collection.

In [25]:

import fiona
c = fiona.open('/gdata/prov_capital.shp', 'r')
c.closed

Out[25]:

False

Mode 'r' is the default.

Fiona's Collection is like a Python file, but iterating over it yields records rather than lines of text.

In [26]:

next(c)

Out[26]:

{'geometry': {'coordinates': (87.57610577000008, 43.78176563200003),
  'type': 'Point'},
 'id': '0',
 'properties': OrderedDict([('name', '乌鲁木齐'),
              ('lat', 43.7818),
              ('lon', 87.5761)]),
 'type': 'Feature'}

In [21]:

len(list(c))

Out[21]:

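Note: list() iterates over the entire collection, effectively emptying it, just as with a Python file object. Seeking back to earlier records is not supported for a consumed collection; you must reopen the collection to get back to the beginning.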
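The file-object analogy can be tried with the standard library alone. The sketch below uses io.StringIO as a stand-in for an opened collection (it is not Fiona, and the record strings are made up) to show that a second pass over a consumed iterator yields nothing until a fresh object is created.

```python
import io

# io.StringIO stands in for an opened collection here; it is not Fiona.
stream = io.StringIO("rec0\nrec1\nrec2\n")

first_pass = list(stream)    # consumes every remaining line
second_pass = list(stream)   # nothing left: the iterator is exhausted

# "Reopening" means building a fresh object over the same data.
stream = io.StringIO("rec0\nrec1\nrec2\n")
third_pass = list(stream)

print(len(first_pass), len(second_pass), len(third_pass))
```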

In [27]:

c = fiona.open('/gdata/prov_capital.shp')
len(list(c))

Out[27]:

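Note: file encoding

The format drivers will attempt to detect the encoding of your data, but may fail. For example, GDAL 1.7.2 fails to detect that the encoding of the Natural Earth dataset is Windows-1252. In such a case, the proper encoding can be given explicitly with the encoding keyword argument of fiona.open: encoding='Windows-1252'. New in version 0.9.1.

Collection indexing

New in version 1.1.6: features of a collection may also be accessed by index.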

In [28]:

from pprint import pprint
with fiona.open('/gdata/prov_capital.shp') as src:
    pprint(src[1])

{'geometry': {'coordinates': (91.16312837700008, 29.710352750000027),
              'type': 'Point'},
 'id': '1',
 'properties': OrderedDict([('name', '拉萨'),
                            ('lat', 29.7104),
                            ('lon', 91.1631)]),
 'type': 'Feature'}

Closing files

A Collection involves external resources. There is no guarantee that these will be released unless you explicitly call close() on the object or use a with statement. When a Collection is used as a context manager, it is closed no matter what happens within the block, whether or not an exception is raised.

In [29]:

try:
    with fiona.open('/gdata/prov_capital.shp') as c:
        print(len(list(c)))
        # assert True is False
except:
    print(c.closed)
    raise

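If an exception is raised inside the with block (uncomment the assert line to trigger one), the print statement in the except clause shows that c.__exit__() (and thereby c.close()) has already been called.

Important: always call close() or use with, and you will never stumble over tied-up external resources, locked files, and the like.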
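The guarantee described above comes from the context manager protocol itself. As a minimal sketch, the hypothetical Resource class below (not part of Fiona) mimics a collection's closed attribute and shows that __exit__ runs, and the object is closed, even when the block raises.

```python
class Resource:
    """Hypothetical stand-in for a collection holding an external resource."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()
        return False  # do not swallow the exception


res = Resource()
try:
    with res:
        raise RuntimeError("simulated failure inside the block")
except RuntimeError:
    # By the time the except clause runs, __exit__ has already closed res.
    closed_after_error = res.closed

print(closed_after_error)
```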

Format drivers, CRS, bounds, and schema

Format driver

In addition to attributes like those of a Python file (name, mode, closed), a Collection has a read-only driver attribute that names the OGR format driver used to open the vector file.

In [30]:

import fiona
c = fiona.open('/gdata/prov_capital.shp')
c.driver

Out[30]:

'ESRI Shapefile'

Coordinate reference system and conversion functions

The coordinate reference system (CRS) of a vector dataset is accessed via the read-only crs attribute. The result varies with the parameters of the dataset.

In [ ]:

c.crs

{'no_defs': True, 'ellps': 'WGS84', 'datum': 'WGS84', 'proj': 'longlat'}

The CRS is represented as a mapping of PROJ.4 parameters.

The fiona.crs module provides 3 functions to assist with these mappings. The to_string function converts a mapping to a PROJ.4 string:

In [ ]:

from fiona.crs import to_string
print(to_string(c.crs))

The from_string function does the inverse.

In [ ]:

from fiona.crs import from_string
from_string("+datum=WGS84 +ellps=WGS84 +no_defs +proj=longlat")
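To make the relationship between the mapping and the PROJ.4 string concrete, here is a rough standard-library sketch. The helpers mapping_to_proj4 and proj4_to_mapping are hypothetical names written for illustration; they are not the fiona.crs functions and, unlike a real parser, keep every value as a string.

```python
def mapping_to_proj4(crs):
    """Join a parameter mapping into a '+key=value' string (sorted by key)."""
    parts = []
    for key, value in sorted(crs.items()):
        if value is True:
            parts.append("+" + key)          # flag parameter, e.g. +no_defs
        else:
            parts.append("+{}={}".format(key, value))
    return " ".join(parts)


def proj4_to_mapping(text):
    """Split a '+key=value' string back into a mapping."""
    crs = {}
    for token in text.split():
        key, sep, value = token.lstrip("+").partition("=")
        crs[key] = value if sep else True
    return crs


crs = {'no_defs': True, 'ellps': 'WGS84', 'datum': 'WGS84', 'proj': 'longlat'}
text = mapping_to_proj4(crs)
print(text)
print(proj4_to_mapping(text) == crs)
```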

The from_epsg function is a shortcut to a CRS mapping from an EPSG code.

In [ ]:

from fiona.crs import from_epsg
from_epsg(3857)

Record count and bounds of a dataset

The number of records in the collection is obtained with Python's built-in len function.

In [ ]:

len(c)

The minimum bounding rectangle (MBR), or bounds, of the collection's records is obtained via the read-only bounds attribute.

In [ ]:

c.bounds
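For point data, the bounds are simply the extremes of the coordinates, returned in (minx, miny, maxx, maxy) order. The sketch below computes them by hand for the two points shown earlier in this section (rounded); bounds_of is a hypothetical helper, not a Fiona function.

```python
def bounds_of(points):
    """Return (minx, miny, maxx, maxy) for a list of (x, y) tuples."""
    xs = [x for x, y in points]
    ys = [y for x, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


# (lon, lat) of the first two records shown earlier, rounded.
points = [(87.5761, 43.7818), (91.1631, 29.7104)]
mbr = bounds_of(points)
print(mbr)
```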

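Schema

Finally, the schema of the record type (a vector file has a single type of record) is accessed via the read-only schema attribute. It has 'geometry' and 'properties' items. The former is a string and the latter is an ordered dict whose items are in the same order as the fields in the data file.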

In [31]:

from pprint import pprint
import fiona
c = fiona.open('/gdata/prov_capital.shp')
pprint(c.schema)

{'geometry': 'Point',
 'properties': OrderedDict([('name', 'str:100'),
                            ('lat', 'float:13.11'),
                            ('lon', 'float:13.11')])}

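Keeping schemas simple

Fiona takes a less-is-more approach to record types and schemas. Data about record types is structured as closely as possible to data about records. Apart from a record's 'id' key, the keys of a schema mapping are the same as the keys of the collection's record mappings.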

In [ ]:

rec = next(c)
set(rec.keys()) - set(c.schema.keys())

In [ ]:

set(rec['properties'].keys()) == set(c.schema['properties'].keys())
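The same comparison can be reproduced without opening a file. The sketch below copies the record and schema printed earlier in this section into plain dicts (an assumption made so it runs anywhere) and shows which record keys have no counterpart in the schema.

```python
rec = {
    "geometry": {"type": "Point",
                 "coordinates": (87.57610577000008, 43.78176563200003)},
    "id": "0",
    "properties": {"name": "乌鲁木齐", "lat": 43.7818, "lon": 87.5761},
    "type": "Feature",
}
schema = {
    "geometry": "Point",
    "properties": {"name": "str:100", "lat": "float:13.11", "lon": "float:13.11"},
}

# Record keys that the schema does not describe.
extra_keys = set(rec) - set(schema)
# The property names agree exactly.
same_fields = set(rec["properties"]) == set(schema["properties"])

print(sorted(extra_keys), same_fields)
```

For records shaped like the one printed above, the leftover keys are 'id' and 'type'.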

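The values of the schema mapping are either further mappings or field type names such as 'Polygon', 'float', and 'str'. The corresponding Python types can be found in the dictionary fiona.FIELD_TYPES_MAP.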

In [ ]:

pprint(fiona.FIELD_TYPES_MAP)

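Field types

In a nutshell, the types and their names are as close as possible to what you would expect in Python (or JavaScript). The 'str' versus 'unicode' muddle exists only before Python 3.0. Fiona records hold Unicode strings, and their field type is always named 'str'.

In [ ]:

type(rec['properties']['name'])

In [ ]:

c.schema['properties']['name']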

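String fields may also indicate a maximum width. A value of 'str:25' means that no value will be longer than 25 characters. If this value is used in the schema of a file opened for writing, values of that property are truncated at 25 characters. The default width is 80 characters, so 'str' and 'str:80' are more or less equivalent.

Fiona provides a function to get the width of a string property.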

In [ ]:

from fiona import prop_width
prop_width('str:25')

In [ ]:

prop_width('str')
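The width rule is easy to mirror in plain Python. The helper width_of below is a hypothetical re-implementation for illustration only, based on the 80-character default stated above; it is not Fiona's own code.

```python
DEFAULT_WIDTH = 80  # default width stated in the text above


def width_of(field_type):
    """Return the width of a 'str' or 'str:N' field type, else None."""
    name, sep, width = field_type.partition(":")
    if name != "str":
        return None
    return int(width) if sep else DEFAULT_WIDTH


value = "x" * 30
truncated = value[:width_of("str:25")]  # what truncation at 25 characters means

print(width_of("str:25"), width_of("str"), len(truncated))
```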

Another function gets the Python type of a property.

In [ ]:

from fiona import prop_type
prop_type('int')

In [ ]:

prop_type('float')

In [ ]:

prop_type('str:25')

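The examples above are for Python 3. With Python 2, the type of 'str' properties is 'unicode'.

Geometry types

Fiona supports the geometry types of GeoJSON and their 3D variants. The value of a schema's geometry item will therefore be one of the following: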

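Point
LineString
Polygon
MultiPoint
MultiLineString
MultiPolygon
GeometryCollection
3D Point
3D LineString
3D Polygon
3D MultiPoint
3D MultiLineString
3D MultiPolygon
3D GeometryCollection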

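The last seven of these, the 3D types, apply only to collection schemas. The geometry types of features are always one of the first seven. A '3D Point' collection, for example, always has features with geometry type 'Point'. The coordinates of those geometries are (x, y, z) tuples.

Note that one of the most common vector data formats, the ESRI Shapefile, has no 'MultiLineString' or 'MultiPolygon' schema geometries. A shapefile that declares 'Polygon' in its schema may therefore yield either 'Polygon' or 'MultiPolygon' features.

Records

A record you get from a collection is a Python dict structured exactly like a GeoJSON Feature. Fiona records are self-describing: the names of the fields are contained within the data structure, and the values are typed properly for the type of record. Numeric field values are instances of int and float, for example, not strings.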

In [32]:

import fiona
from pprint import pprint
c = fiona.open('/gdata/prov_capital.shp')
rec = next(c)
pprint(rec)

{'geometry': {'coordinates': (87.57610577000008, 43.78176563200003),
              'type': 'Point'},
 'id': '0',
 'properties': OrderedDict([('name', '乌鲁木齐'),
                            ('lat', 43.7818),
                            ('lon', 87.5761)]),
 'type': 'Feature'}

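This record holds no reference to the Collection it came from or to any other external resource. It is entirely independent and safe to use in any way. Closing the collection does not affect the record.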

In [ ]:

c.close()
rec['id']

Record id

A record has an id key. As in the GeoJSON specification, its corresponding value is a string that is unique within the data file.

In [33]:

c = fiona.open('/gdata/prov_capital.shp')
rec = next(c)
rec['id']

Out[33]:

'0'

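In the OGR model, feature ids are long integers. Fiona record ids are therefore usually string representations of integer record indexes.

Record properties

A record has a properties key. Its corresponding value is a mapping, an ordered dict to be precise. The keys of the properties mapping are the same as the keys of the properties mapping in the schema of the collection the record comes from (see above).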

In [ ]:

pprint(rec['properties'])

Record geometry

A record has a geometry key. Its corresponding value is a mapping with type and coordinates keys.

In [ ]:

pprint(rec['geometry'])

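Type	Coordinates
Point	A single (x, y) tuple
LineString	A list of (x, y) tuple vertices
Polygon	A list of rings (each a list of (x, y) tuples)
MultiPoint	A list of points (each a single (x, y) tuple)
MultiLineString	A list of lines (each a list of (x, y) tuples)
MultiPolygon	A list of polygons (each structured as a Polygon above)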
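The nesting in the table can be checked by walking a geometry mapping. The sketch below counts vertices for each of the six types; count_vertices and the sample geometries are hypothetical and serve only to illustrate how deep the coordinate lists go.

```python
def count_vertices(geometry):
    """Count (x, y) tuples in a GeoJSON-like geometry mapping."""
    kind = geometry["type"]
    coords = geometry["coordinates"]
    if kind == "Point":
        return 1                                   # a single tuple
    if kind in ("LineString", "MultiPoint"):
        return len(coords)                         # a flat list of tuples
    if kind in ("Polygon", "MultiLineString"):
        return sum(len(part) for part in coords)   # a list of lists
    if kind == "MultiPolygon":
        return sum(len(ring) for polygon in coords for ring in polygon)
    raise ValueError("unsupported geometry type: " + kind)


point = {"type": "Point", "coordinates": (87.5761, 43.7818)}
square = {"type": "Polygon",
          "coordinates": [[(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]]}
two_triangles = {"type": "MultiPolygon",
                 "coordinates": [[[(0, 0), (1, 0), (0, 1), (0, 0)]],
                                 [[(2, 2), (3, 2), (2, 3), (2, 2)]]]}

print(count_vertices(point), count_vertices(square), count_vertices(two_triangles))
```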

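Fiona, like the GeoJSON format, has both a Northern Hemisphere "north is up" bias and a Cartesian "x-y" bias. The values written above as (x, y) are either (longitude east of the prime meridian, latitude north of the equator) or, for projected coordinate systems, (easting, northing).

The order is long-lat, not lat-long. Even though most of us say "latitude, longitude" out loud, Fiona's x, y is always easting, northing, which means (long, lat): longitude first and latitude second, consistent with the GeoJSON specification.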
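A small sketch of the ordering rule, using the lat and lon property values of the first record shown earlier. The helper latlon_to_xy is a hypothetical name; it only swaps a spoken-order (lat, lon) pair into the (x, y) order that a geometry mapping expects.

```python
def latlon_to_xy(lat, lon):
    """Convert a spoken-order (lat, lon) pair to GeoJSON (x, y) order."""
    return (lon, lat)


# lat/lon property values of the first record shown earlier
lat, lon = 43.7818, 87.5761
point = {"type": "Point", "coordinates": latlon_to_xy(lat, lon)}

print(point["coordinates"])
```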

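Point set theory and simple features

In a proper, well-scrubbed vector data file, the geometry mappings explained above represent geometric objects made up of point sets. Two geometric objects are equal if their point sets are equal, whether or not their coordinate lists are equal in the Python sense. With Shapely installed, this can be checked as follows.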

In [ ]:

from shapely.geometry import shape
l1 = shape({'type': 'LineString', 'coordinates': [(0, 0), (2, 2)]})
l2 = shape({'type': 'LineString', 'coordinates': [(0, 0), (1, 1), (2, 2)]})
l1 == l2

In [ ]:

l1.equals(l2)

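Note: dirty data. Some files may contain vectors that are invalid from a simple features standpoint, either by accident (inadequate quality control on the producer's end) or by intention ("dirty" vectors saved to a file for special treatment). Fiona does not sniff for or attempt to clean dirty data, so make sure you are getting yours from a clean source.

Writing vector data

A vector file can be opened for writing in mode 'a' (append) or mode 'w' (write).

Note: the in-place "update" mode of OGR is quite format dependent and is therefore not supported by Fiona.

Appending data to existing files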

In [34]:

import fiona
with fiona.open('/gdata/prov_capital.shp') as c:
    rec = next(c)
rec['id'] = '-1'
rec['properties']['CNTRY_NAME'] = 'Gondor'

In [35]:

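import os
os.system("cp /gdata/prov_capital.* /tmp")

Out[35]:

Note: copy the files as a backup before modifying them.

The coordinate reference system, format, and schema of the file are already defined, so it is opened with just two arguments, as for reading, but in 'a' mode. A new record is written to the end of the file with the write method; the length of the file then grows accordingly, from 48 to 49.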

In [36]:

import os, stat
shp_file = '/tmp/prov_capital'
os.chmod(shp_file + '.shp', stat.S_IRUSR + stat.S_IWUSR)
os.chmod(shp_file + '.dbf', stat.S_IRUSR + stat.S_IWUSR)
os.chmod(shp_file + '.shx', stat.S_IRUSR + stat.S_IWUSR)
with fiona.open(shp_file + '.shp', 'a') as c:
    print(len(c))

Records you write must match the file's schema (because a file contains one type of record, remember). You will get a ValueError if they do not.

with fiona.open(shp_file + '.shp', 'a') as c:
    c.write({'properties': {'foo': 'bar'}})
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
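The schema check can be pictured roughly as a comparison of property names. The function matches_schema below is a hypothetical sketch, not Fiona's actual validation; the schema is the one printed earlier in this section and the records are made up.

```python
def matches_schema(record, schema):
    """Return True when the record's property names equal the schema's."""
    props = record.get("properties", {})
    return set(props) == set(schema["properties"])


schema = {
    "geometry": "Point",
    "properties": {"name": "str:100", "lat": "float:13.11", "lon": "float:13.11"},
}
good = {
    "geometry": {"type": "Point", "coordinates": (0.0, 0.0)},
    "properties": {"name": "Gondor", "lat": 0.0, "lon": 0.0},
}
bad = {"properties": {"foo": "bar"}}

print(matches_schema(good, schema), matches_schema(bad, schema))
```

Running a check like this before calling write lets a script skip or repair mismatched records instead of failing partway through.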

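Now, what about record ids? The id of a record written to a file is ignored and replaced by the next value appropriate for the file. If you read back the file appended to above: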

In [39]:

shp_file = '/tmp/prov_capital'
os.chmod(shp_file + '.shp', stat.S_IRUSR + stat.S_IWUSR)
os.chmod(shp_file + '.dbf', stat.S_IRUSR + stat.S_IWUSR)
os.chmod(shp_file + '.shx', stat.S_IRUSR + stat.S_IWUSR)
with fiona.open('/tmp/prov_capital.shp') as c:
    records = list(c)   # records = c.next()

In [ ]:

records[-1]['id']

In [ ]:

records[-1]['properties']['name']

You may notice that the id of '-1' that the record had when written has been replaced, for example by '48'.

The write method writes a single record to the collection's file. Its sibling writerecords writes a sequence (or iterator) of records.

In [45]:

with fiona.open('/tmp/prov_capital.shp', 'a') as c:
    c.writerecords([rec, rec,rec])
    print(len(c))

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-45-26937f503ada> in <module>()
      1 with fiona.open('/tmp/prov_capital.shp', 'a') as c:
----> 2     c.writerecords([rec, rec,rec])
      3     print(len(c))

/usr/lib/python3/dist-packages/fiona/collection.py in writerecords(self, records)
    327         if self.mode not in ('a', 'w'):
    328             raise IOError("collection not open for writing")
--> 329         self.session.writerecs(records, self)
    330         self._len = self.session.get_length()
    331         self._bounds = self.session.get_extent()

fiona/ogrext.pyx in fiona.ogrext.WritingSession.writerecs (fiona/ogrext.c:17237)()

ValueError: Record does not match collection schema: odict_keys(['name', 'lat', 'lon', 'CNTRY_NAME']) != ['lat', 'lon', 'name']

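Duplication. Fiona collections do not guard against duplication. When the records match the schema, the code above writes 3 duplicate records to the file, each given a unique sequential id. In the run shown, the call fails with ValueError instead, because rec carries a CNTRY_NAME property that is not in the file's schema (name, lat, lon).
Buffering. Fiona's output is buffered. Records passed to write and writerecords are flushed to disk when the collection is closed. You may also call flush periodically to write the buffer contents to disk.

Writing new files

Writing a new file is more complex than appending to an existing one because the CRS, format, and schema are not yet defined and must be supplied by the programmer. Still, it is not very complicated. A schema is just a mapping, as described above. A CRS is also just a mapping, and the possible formats are enumerated in fiona.supported_drivers.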

In [46]:

with fiona.open('/gdata/prov_capital.shp') as source:
    source_driver = source.driver
    source_crs = source.crs
    source_schema = source.schema
print(source_driver)

ESRI Shapefile

In [ ]:

source_crs

In [ ]:

from pprint import pprint
pprint(source_schema)

Now create a new file.

In [47]:

with fiona.open( '/tmp/foo.shp', 'w', driver=source_driver, crs=source_crs, schema=source_schema) as c:
    print(len(c))

In [48]:

print(len(c))

In [49]:

c.closed

Out[49]:

True

In [50]:

len(c)

Out[50]:

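Because the properties of the source schema are ordered and are passed in the same order to the write-mode collection, the fields of the written file have the same order as those of the source file.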

ogrinfo '/tmp/foo.shp' foo -so
INFO: Open of `/tmp/foo.shp'
using driver `ESRI Shapefile' successful.

The meta attribute of a Collection makes duplicating a file's meta properties even easier.

In [51]:

source = fiona.open('/gdata/prov_capital.shp')
sink = fiona.open('/tmp/foo.shp', 'w', **source.meta)

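Ordering record fields

Beginning with Fiona 1.0.1, the 'schema' keyword argument of fiona.open can be an ordered dict or a list of (key, value) pairs, specifying an ordering that carries over into written files. If an ordinary dict is given, the ordering is determined by the output of that dict's items method. For example,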

In [52]:

{'bar': 'int', 'foo': 'str'}.keys()

Out[52]:

dict_keys(['foo', 'bar'])

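In the run shown above, a schema of {'properties': {'bar': 'int', 'foo': 'str'}} produces a shapefile whose first field is 'foo' and whose second field is 'bar'. (From Python 3.7 on, plain dicts preserve insertion order, so the keys would come back as 'bar', 'foo'.) If you want 'bar' to be the first field regardless of the Python version, use a list of property items.

Also note that the schema must declare a geometry, here of type Polygon; the type name is case-sensitive.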

In [53]:

c = fiona.open( '/tmp/foo2.shp', 'w',schema={'properties': [('bar', 'int'), ('foo', 'str')], 'geometry': 'Polygon'}, driver = 'ESRI Shapefile')

Or use an ordered dict.

In [54]:

from collections import OrderedDict
schema_props = OrderedDict([('bar', 'int'), ('foo', 'str')])

In [55]:

c = fiona.open('/tmp/foo.shp','w',schema={'properties': schema_props, 'geometry': 'Polygon'}, driver = 'ESRI Shapefile')
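Both forms of the properties item pin the field order explicitly. The short standard-library sketch below shows that a list of pairs and an OrderedDict built from it agree on the order, and that the order can be changed deliberately before the schema is handed to fiona.open.

```python
from collections import OrderedDict

# A list of (name, type) pairs fixes the order explicitly.
pairs = [("bar", "int"), ("foo", "str")]
ordered = OrderedDict(pairs)
field_order = list(ordered)

# Reordering is explicit too: move 'bar' to the end.
reordered = OrderedDict(pairs)
reordered.move_to_end("bar")
new_order = list(reordered)

print(field_order, new_order)
```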

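Coordinates and geometry types

If you write 3D coordinates, that is (x, y, z) tuples, to a 2D file ('Point' schema geometry, for example), the z values are lost.

If you write 2D coordinates, that is (x, y) tuples, to a 3D file ('3D Point' schema geometry, for example), a default z value of 0 is provided.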
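The two rules can be written out as a pair of small functions. These helpers, to_2d and to_3d, are hypothetical and only restate the behaviour described above; they are not how Fiona or OGR implement it.

```python
def to_2d(coord):
    """Drop the z value, as happens when 3D coordinates go to a 2D file."""
    return tuple(coord[:2])


def to_3d(coord, default_z=0.0):
    """Add a default z of 0, as happens when 2D coordinates go to a 3D file."""
    if len(coord) == 3:
        return tuple(coord)
    return (coord[0], coord[1], default_z)


print(to_2d((1.0, 2.0, 3.0)), to_3d((1.0, 2.0)))
```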

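Working with multilayer data

Reading multilayer data

Up to this point, only simple datasets with one thematic layer or feature type per file have been shown, with the ESRI Shapefile as the primary example. Other GIS data formats can encode multiple layers or feature types within a single file or directory. The ESRI File Geodatabase <http://www.gdal.org/ogr/drv_filegdb.html> is one example of such a format. A more useful example here is a directory containing multiple shapefiles. The following commands create such a two-layer data source.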

In [56]:

import subprocess
import os
if not os.path.exists('/tmp/data'):
    os.mkdir('/tmp/data')     

cmd1 = 'ogr2ogr /tmp/data/ /gdata/world_borders.shp world_borders -nln foo'
cmd2 = 'ogr2ogr /tmp/data/ /gdata/world_borders.shp world_borders -nln bar'
subprocess.call(cmd1, shell=True)
subprocess.call(cmd2, shell=True)

Out[56]:

The layers of a data source can be listed with the function fiona.listlayers. In the shapefile case, layer names match the base names of the files.

In [57]:

import fiona
fiona.listlayers('/tmp/data')

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-57-ccfcbf688402> in <module>()
      1 import fiona
----> 2 fiona.listlayers('/tmp/data')

/usr/lib/python3/dist-packages/fiona/__init__.py in listlayers(path, vfs)
    237 
    238     with drivers():
--> 239         return _listlayers(vsi_path(path, vsi, archive))
    240 
    241 def parse_paths(path, vfs=None):

fiona/ogrext.pyx in fiona.ogrext._listlayers (fiona/ogrext.c:20414)()

ValueError: No data available at path '/tmp/data'

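Unlike OGR, Fiona has no classes representing layers or data sources. To access the features of a layer, open a collection using the path to the data source and specify the layer by name with the layer keyword.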

In [ ]:

import pprint
datasrc_path = '/tmp/data'
for name in fiona.listlayers(datasrc_path):
    with fiona.open(datasrc_path, layer=name) as c:
        pprint.pprint(c.schema)

Layers may also be specified by index.

In [ ]:

for i, name in enumerate(fiona.listlayers(datasrc_path)):
    with fiona.open(datasrc_path, layer=i) as c:
        print(len(c))

If no layer is specified, fiona.open returns an open collection for the first layer.

In [ ]:

with fiona.open(datasrc_path) as c:
    c.name == fiona.listlayers(datasrc_path)[0]

The most general way to open a shapefile for reading, using all the parameters of fiona.open, is to treat it as a data source with a named layer.

In [ ]:

fiona.open('/tmp/data/foo.shp', 'r', layer='foo')

In practice, it is fine to rely on the implicit first layer and the default 'r' mode and open a shapefile like this:

In [ ]:

fiona.open('/tmp/data/foo.shp')

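Writing multilayer data

To write an entirely new layer to a multilayer data source, simply provide a unique name to the layer keyword argument.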

In [ ]:

'wah' not in fiona.listlayers(datasrc_path)

In [ ]:

with fiona.open(datasrc_path, layer='bar') as c:
    with fiona.open(datasrc_path, 'w', layer='wah', **c.meta) as d:
        d.write(next(c))


fiona.listlayers(datasrc_path)

In 'w' mode, an existing layer is overwritten if it is specified, just as normal files are overwritten by Python's open function.

In [ ]:

'wah' in fiona.listlayers(datasrc_path)

In [ ]:

with fiona.open(datasrc_path, layer='bar') as c:
    with fiona.open(datasrc_path, 'w', layer='wah', **c.meta) as d:
        pass

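Advanced topics

Slicing and masking iterators

With some vector data formats, a spatial index accompanies the data file, allowing efficient bounding box searches. A collection's items method returns an iterator over pairs of FIDs and records that intersect a given (minx, miny, maxx, maxy) bounding box or geometry object. The collection's own coordinate reference system is used to interpret the box's values. If you want a list of the iterator's items, pass it to Python's builtin list function as shown below.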

In [58]:

import fiona
c = fiona.open('/gdata/prov_capital.shp')
hits = list(c.items(bbox=(-5.0, 55.0, 0.0, 60.0)))
len(hits)

Out[58]:

The iterator method takes the same stop or start, stop[, step] slicing arguments as itertools.islice. To get just the first two items from that iterator, pass a stop index.

In [ ]:

hits = c.items(2, bbox=(-5.0, 55.0, 0.0, 60.0))
len(list(hits))

To get the third through fifth items from that iterator, pass start and stop indexes.

In [ ]:

hits = c.items(2, 5, bbox=(-5.0, 55.0, 0.0, 60.0))
len(list(hits))
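Since the slicing arguments follow itertools.islice, their meaning can be checked on any iterator. The sketch below uses a made-up list of (FID, record) pairs in place of the result of items, so it runs without a data file.

```python
from itertools import islice

# Made-up (FID, record) pairs standing in for what items() would yield.
items = [(fid, {"id": str(fid)}) for fid in range(10)]

# stop=2: the first two items.
first_two = list(islice(iter(items), 2))

# start=2, stop=5: the third through fifth items.
third_to_fifth = list(islice(iter(items), 2, 5))

print([fid for fid, rec in first_two], [fid for fid, rec in third_to_fifth])
```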

To filter features by property values, use Python's builtin filter function with a lambda, or with your own filter function that takes a single feature record and returns True or False.

In [ ]:

def pass_positive_area(rec):
    return rec['properties'].get('AREA', 0.0)>0.0

In [59]:

c = fiona.open('/gdata/prov_capital.shp')
hits = filter(pass_positive_area, c)
len(list(hits))

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-59-9c8decdce416> in <module>()
      1 c = fiona.open('/gdata/prov_capital.shp')
----> 2 hits = filter(pass_positive_area, c)
      3 len(list(hits))

NameError: name 'pass_positive_area' is not defined
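The NameError above means the cell defining pass_positive_area had not been run in that session. As a self-contained sketch, the example below defines the function first and applies it to a small made-up list of records (the AREA values are invented, since the schema shown earlier has no such field), with an equivalent lambda alongside.

```python
# Made-up records; only the structure mirrors what a collection yields.
records = [
    {"id": "0", "properties": {"name": "a", "AREA": 12.5}},
    {"id": "1", "properties": {"name": "b", "AREA": 0.0}},
    {"id": "2", "properties": {"name": "c"}},  # no AREA field at all
]


def pass_positive_area(rec):
    """Keep records whose AREA property exists and is positive."""
    return rec["properties"].get("AREA", 0.0) > 0.0


hits = list(filter(pass_positive_area, records))

# The same test written inline with a lambda.
lambda_hits = list(filter(lambda r: r["properties"].get("AREA", 0.0) > 0.0, records))

print(len(hits), hits[0]["id"])
```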
