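Merging Data

There are two ways to combine datasets in geopandas: attribute joins and spatial joins.

In an attribute join, a GeoSeries or GeoDataFrame is combined with a regular pandas.Series or pandas.DataFrame based on a common variable. This is analogous to normal merging or joining in pandas.

In a spatial join, observations from two GeoSeries or GeoDataFrame objects are combined based on their spatial relationship to one another.

In the following examples, we use these datasets: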

In [1]: world = geopandas.read_file(geopandas.datasets.get_path('naturalearth_lowres'))

In [2]: cities = geopandas.read_file(geopandas.datasets.get_path('naturalearth_cities'))

# For attribute join
In [3]: country_shapes = world[['geometry', 'iso_a3']]

In [4]: country_names = world[['name', 'iso_a3']]

# For spatial join
In [5]: countries = world[['geometry', 'name']]

In [6]: countries = countries.rename(columns={'name':'country'})

Appending

Appending a GeoDataFrame or GeoSeries uses the pandas append() method. Keep in mind that appended geometry columns need to have the same CRS.

# Appending GeoSeries
In [7]: joined = world.geometry.append(cities.geometry)

# Appending GeoDataFrames
In [8]: europe = world[world.continent == 'Europe']

In [9]: asia = world[world.continent == 'Asia']

In [10]: eurasia = europe.append(asia)
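DataFrame.append() and Series.append() were deprecated in pandas 1.4 and removed in pandas 2.0, so the calls above fail on current pandas. Below is a minimal sketch of the same operation using pandas.concat(). The small point layers are hypothetical stand-ins for the europe and asia subsets. As before, the inputs should share a CRS. If they don't, reproject one with to_crs() first.

```python
import pandas as pd
import geopandas
from shapely.geometry import Point

# Hypothetical toy layers in the same CRS (stand-ins for europe / asia)
europe = geopandas.GeoDataFrame(
    {"name": ["Paris", "Berlin"]},
    geometry=[Point(2.35, 48.86), Point(13.40, 52.52)],
    crs="EPSG:4326",
)
asia = geopandas.GeoDataFrame(
    {"name": ["Tokyo"]},
    geometry=[Point(139.69, 35.69)],
    crs="EPSG:4326",
)

# pandas.concat replaces the removed append(); ignore_index renumbers rows 0..n-1
eurasia = pd.concat([europe, asia], ignore_index=True)

# GeoSeries can be concatenated the same way
joined = pd.concat([europe.geometry, asia.geometry], ignore_index=True)

print(type(eurasia).__name__, eurasia.crs)
```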

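Attribute Joins

Attribute joins are done with the merge() method. In general, you should call merge() from the spatial dataset. The stand-alone pandas.merge() function also works if the GeoDataFrame is the left argument. However, if a DataFrame is the left argument and a GeoDataFrame is in the right position, the result is no longer a GeoDataFrame.

For example, consider the following merge. It adds full country names to a GeoDataFrame that initially has only each country's ISO code by merging it with a DataFrame.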

# `country_shapes` is GeoDataFrame with country shapes and iso codes
In [11]: country_shapes.head()
Out[11]: 
                                            geometry iso_a3
0  MULTIPOLYGON (((180.000000000 -16.067132664, 1...    FJI
1  POLYGON ((33.903711197 -0.950000000, 34.072620...    TZA
2  POLYGON ((-8.665589565 27.656425890, -8.665124...    ESH
3  MULTIPOLYGON (((-122.840000000 49.000000000, -...    CAN
4  MULTIPOLYGON (((-122.840000000 49.000000000, -...    USA

# `country_names` is DataFrame with country names and iso codes
In [12]: country_names.head()
Out[12]: 
                       name iso_a3
0                      Fiji    FJI
1                  Tanzania    TZA
2                 W. Sahara    ESH
3                    Canada    CAN
4  United States of America    USA

# Merge with `merge` method on shared variable (iso codes):
In [13]: country_shapes = country_shapes.merge(country_names, on='iso_a3')

In [14]: country_shapes.head()
Out[14]: 
                                            geometry iso_a3                      name
0  MULTIPOLYGON (((180.000000000 -16.067132664, 1...    FJI                      Fiji
1  POLYGON ((33.903711197 -0.950000000, 34.072620...    TZA                  Tanzania
2  POLYGON ((-8.665589565 27.656425890, -8.665124...    ESH                 W. Sahara
3  MULTIPOLYGON (((-122.840000000 49.000000000, -...    CAN                    Canada
4  MULTIPOLYGON (((-122.840000000 49.000000000, -...    USA  United States of America
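The following sketch shows the argument-order rule for the stand-alone pandas.merge() function. The two tiny tables are hypothetical stand-ins for country_shapes and country_names. If you end up with a plain DataFrame, wrapping it in geopandas.GeoDataFrame(..., geometry="geometry") is one way to get a GeoDataFrame back.

```python
import pandas as pd
import geopandas
from shapely.geometry import Point

# Hypothetical toy data: geometries keyed by ISO code, plus a plain DataFrame of names
shapes = geopandas.GeoDataFrame(
    {"iso_a3": ["FJI", "TZA"]},
    geometry=[Point(178.0, -17.8), Point(34.9, -6.4)],
    crs="EPSG:4326",
)
names = pd.DataFrame({"iso_a3": ["FJI", "TZA"], "name": ["Fiji", "Tanzania"]})

# GeoDataFrame on the left: the result stays a GeoDataFrame
left_geo = pd.merge(shapes, names, on="iso_a3")

# DataFrame on the left: the result is a plain pandas DataFrame
left_plain = pd.merge(names, shapes, on="iso_a3")

# Promote the plain result back to a GeoDataFrame using its geometry column
restored = geopandas.GeoDataFrame(left_plain, geometry="geometry")

print(type(left_geo).__name__, type(left_plain).__name__, type(restored).__name__)
```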

Spatial Joins

In a spatial join, two geometry objects are merged based on their spatial relationship to one another.

# One GeoDataFrame of countries, one of cities.
# Want to merge so we can get each city's country.
In [15]: countries.head()
Out[15]: 
                                            geometry                   country
0  MULTIPOLYGON (((180.000000000 -16.067132664, 1...                      Fiji
1  POLYGON ((33.903711197 -0.950000000, 34.072620...                  Tanzania
2  POLYGON ((-8.665589565 27.656425890, -8.665124...                 W. Sahara
3  MULTIPOLYGON (((-122.840000000 49.000000000, -...                    Canada
4  MULTIPOLYGON (((-122.840000000 49.000000000, -...  United States of America

In [16]: cities.head()
Out[16]: 
           name                           geometry
0  Vatican City  POINT (12.453386545 41.903282180)
1    San Marino  POINT (12.441770158 43.936095835)
2         Vaduz   POINT (9.516669473 47.133723774)
3    Luxembourg   POINT (6.130002806 49.611660379)
4       Palikir  POINT (158.149974324 6.916643696)

# Execute spatial join
In [17]: cities_with_country = cities.sjoin(countries, how="inner", predicate='intersects')

In [18]: cities_with_country.head()

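GeoPandas provides two spatial-join functions:

  • GeoDataFrame.sjoin(): joins based on binary predicates (intersects, contains, etc.)

  • GeoDataFrame.sjoin_nearest(): joins based on proximity, with the ability to set a maximum search radius.

Note

For historical reasons, both methods are also available as the top-level functions sjoin() and sjoin_nearest(). Use the methods, because the functions may be deprecated in the future.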
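As a quick check of the note above, this sketch runs the same binary predicate join twice on hypothetical toy layers. The first run uses the GeoDataFrame.sjoin() method and the second uses the top-level geopandas.sjoin() function. The sketch assumes a spatial-index backend is installed: shapely 2, or rtree/pygeos on older geopandas releases.

```python
import geopandas
from shapely.geometry import Point, Polygon

# Hypothetical toy data: two unit squares and three points (the third is in neither)
polys = geopandas.GeoDataFrame(
    {"region": ["A", "B"]},
    geometry=[
        Polygon([(0, 0), (1, 0), (1, 1), (0, 1)]),
        Polygon([(2, 0), (3, 0), (3, 1), (2, 1)]),
    ],
)
pts = geopandas.GeoDataFrame(
    {"pid": [1, 2, 3]},
    geometry=[Point(0.5, 0.5), Point(2.5, 0.5), Point(5, 5)],
)

# Method form (recommended) and top-level function form of the same join
via_method = pts.sjoin(polys, how="inner", predicate="intersects")
via_function = geopandas.sjoin(pts, polys, how="inner", predicate="intersects")

# Compare the attribute columns of both results
same = via_method.drop(columns="geometry").equals(via_function.drop(columns="geometry"))
print(same)
print(via_method[["pid", "region"]])
```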

Binary Predicate Joins

Binary predicate joins are available via GeoDataFrame.sjoin().

GeoDataFrame.sjoin() has two core arguments: how and predicate.

predicate

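The predicate argument specifies how geopandas decides whether to join the attributes of one object to another, based on their geometric relationship.

The values for predicate correspond to the names of geometric binary predicates. Which values are accepted depends on the spatial index implementation.

The default spatial index in geopandas currently supports the following values for predicate, which are defined in the Shapely documentation: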

  • intersects

  • contains

  • within

  • touches

  • crosses

  • overlaps
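The sketch below shows how the choice of predicate changes which rows are joined. Three toy points lie inside a square, on its edge, and outside it, and each is joined against the square (all data hypothetical). A point on the boundary intersects and touches the polygon, but it is not within it. The sindex.valid_query_predicates property lists the predicates the active spatial index accepts.

```python
import geopandas
from shapely.geometry import Point, Polygon

# Hypothetical toy data: a 2x2 square and three probe points
square = geopandas.GeoDataFrame(
    {"name": ["square"]},
    geometry=[Polygon([(0, 0), (2, 0), (2, 2), (0, 2)])],
)
points = geopandas.GeoDataFrame(
    {"location": ["inside", "edge", "outside"]},
    geometry=[Point(1, 1), Point(2, 1), Point(5, 5)],
)

# Predicates the current spatial index implementation supports
print(square.sindex.valid_query_predicates)

# Join with several predicates and record which points matched
results = {}
for predicate in ["intersects", "within", "touches"]:
    joined = points.sjoin(square, how="inner", predicate=predicate)
    results[predicate] = sorted(joined["location"])
    print(predicate, results[predicate])
```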

how

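The how argument specifies the type of join to perform and which geometry is retained in the resulting GeoDataFrame. It accepts the following options:

  • left: use the index from the first (or left_df) GeoDataFrame that you provide to GeoDataFrame.sjoin(); retain only the left_df geometry column

  • right: use the index from the second (or right_df) GeoDataFrame; retain only the right_df geometry column

  • inner: use the intersection of the index values from both GeoDataFrames; retain only the left_df geometry column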
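The following is a small sketch of the three how options on hypothetical toy data: three points, one of which falls in neither of two squares. For each result it records the row count and the geometry type, which shows which rows are kept and whose geometry survives.

```python
import geopandas
from shapely.geometry import Point, Polygon

# Hypothetical toy data: two unit squares; point 3 lies in neither square
polys = geopandas.GeoDataFrame(
    {"region": ["A", "B"]},
    geometry=[
        Polygon([(0, 0), (1, 0), (1, 1), (0, 1)]),
        Polygon([(2, 0), (3, 0), (3, 1), (2, 1)]),
    ],
)
pts = geopandas.GeoDataFrame(
    {"pid": [1, 2, 3]},
    geometry=[Point(0.5, 0.5), Point(2.5, 0.5), Point(5, 5)],
)

summary = {}
for how in ["left", "right", "inner"]:
    joined = pts.sjoin(polys, how=how, predicate="intersects")
    # (number of rows, geometry types present in the result)
    summary[how] = (len(joined), sorted(set(joined.geom_type)))
    print(how, summary[how])
```

With how="left", the unmatched point is kept and its region is missing. With how="right", the polygons' geometry is kept instead of the points'.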

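Note: More complex spatial relationships can be studied by combining geometric operations with a spatial join. For example, to find all polygons within a given distance of a point, first use the buffer() method to expand each point into a circle of the appropriate radius. Then intersect those buffered circles with the polygons in question.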
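A minimal sketch of the buffer-then-join approach from the note. It uses toy planar coordinates without a CRS, and all names and data are hypothetical. With longitude/latitude data, reproject to a projected CRS with to_crs() first, because buffer() distances are expressed in the units of the CRS.

```python
import geopandas
from shapely.geometry import Point, Polygon

# Hypothetical planar data (arbitrary units): one point and two squares
points = geopandas.GeoDataFrame({"pid": [1]}, geometry=[Point(0, 0)])
polygons = geopandas.GeoDataFrame(
    {"poly": ["near", "far"]},
    geometry=[
        Polygon([(3, -1), (5, -1), (5, 1), (3, 1)]),      # nearest edge 3 units away
        Polygon([(10, -1), (12, -1), (12, 1), (10, 1)]),  # nearest edge 10 units away
    ],
)

radius = 5
# Replace each point with a circle of the given radius, keeping its attributes
buffered = points.copy()
buffered["geometry"] = points.buffer(radius)

# Polygons that intersect a buffer lie within `radius` of the original point
within_radius = buffered.sjoin(polygons, how="inner", predicate="intersects")
print(within_radius[["pid", "poly"]])
```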

Nearest Joins

Proximity-based joins are available via GeoDataFrame.sjoin_nearest().

GeoDataFrame.sjoin_nearest() shares the how argument with GeoDataFrame.sjoin() and adds two arguments: max_distance and distance_col.

max_distance

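The max_distance argument specifies a maximum search radius for matching geometries. In some cases it has a considerable impact on performance, so use this parameter whenever you can.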

distance_col

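If set, the resulting GeoDataFrame includes a column with this name that contains the computed distance between each input geometry and its nearest geometry.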
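This sketch uses GeoDataFrame.sjoin_nearest() with both extra arguments on hypothetical planar points. With how="left", a row that has no match within max_distance is kept. Its joined columns and its distance column are left missing. The sketch assumes a geopandas release that provides sjoin_nearest() (0.10 or later). Early releases of it required PyGEOS.

```python
import geopandas
from shapely.geometry import Point

# Hypothetical planar data; distances are in the (unitless) coordinate space
stations = geopandas.GeoDataFrame(
    {"station": ["A", "B"]},
    geometry=[Point(0, 0), Point(10, 0)],
)
homes = geopandas.GeoDataFrame(
    {"home": ["h1", "h2", "h3"]},
    geometry=[Point(1, 0), Point(8, 0), Point(5, 20)],
)

# Match each home to its nearest station, but only within 5 units,
# and store the computed distance in a column named "dist"
nearest = homes.sjoin_nearest(
    stations, how="left", max_distance=5, distance_col="dist"
)
print(nearest[["home", "station", "dist"]])
```

Here h3 is more than 5 units from both stations. It therefore keeps its row but gets no station and no distance.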