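Merging Data#
There are two ways to combine datasets in geopandas: attribute joins and spatial joins.
In an attribute join, a GeoSeries or GeoDataFrame is combined with a regular pandas.Series or pandas.DataFrame based on a common variable. This is analogous to normal merging or joining in pandas.
In a spatial join, observations from two GeoSeries or GeoDataFrame objects are combined based on their spatial relationship to one another.
The following examples use these datasets: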
In [1]: world = geopandas.read_file(geopandas.datasets.get_path('naturalearth_lowres'))
In [2]: cities = geopandas.read_file(geopandas.datasets.get_path('naturalearth_cities'))
# For attribute join
In [3]: country_shapes = world[['geometry', 'iso_a3']]
In [4]: country_names = world[['name', 'iso_a3']]
# For spatial join
In [5]: countries = world[['geometry', 'name']]
In [6]: countries = countries.rename(columns={'name':'country'})
Appending#
Appending GeoDataFrame and GeoSeries objects uses the pandas append() method. Keep in mind that the appended geometry columns need to have the same CRS.
# Appending GeoSeries
In [7]: joined = world.geometry.append(cities.geometry)
# Appending GeoDataFrames
In [8]: europe = world[world.continent == 'Europe']
In [9]: asia = world[world.continent == 'Asia']
In [10]: eurasia = europe.append(asia)
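The append() method was removed in pandas 2.0, so on newer installations the same row-wise combination is usually written with pandas.concat(). The following is a minimal sketch under that assumption; the two small frames and their city names are hypothetical stand-ins for the europe and asia subsets, and both are given the same CRS.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point

# Two small illustrative frames (hypothetical data) sharing one CRS.
europe = gpd.GeoDataFrame(
    {"name": ["Paris", "Rome"]},
    geometry=[Point(2.35, 48.86), Point(12.50, 41.90)],
    crs="EPSG:4326",
)
asia = gpd.GeoDataFrame(
    {"name": ["Tokyo"]},
    geometry=[Point(139.69, 35.69)],
    crs="EPSG:4326",
)

# Concatenate row-wise; ignore_index=True renumbers the rows from 0.
eurasia = pd.concat([europe, asia], ignore_index=True)

# Concatenating the geometry columns combines two GeoSeries.
all_points = pd.concat([europe.geometry, asia.geometry], ignore_index=True)

print(type(eurasia).__name__, len(eurasia))
print(type(all_points).__name__, len(all_points))
```

If the inputs use different CRSs, reproject one of them with to_crs() before concatenating.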
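Attribute Joins#
Attribute joins are accomplished using the merge() method. In general, it is recommended to use the merge() method called from the spatial dataset. That said, the stand-alone pandas.merge() function will work if the GeoDataFrame is in the left argument; if a DataFrame is in the left argument and a GeoDataFrame is in the right position, the result will no longer be a GeoDataFrame.
For example, consider the following merge, which adds full names to a GeoDataFrame that initially has only the ISO code for each country by merging it with a DataFrame.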
# `country_shapes` is GeoDataFrame with country shapes and iso codes
In [11]: country_shapes.head()
Out[11]:
geometry iso_a3
0 MULTIPOLYGON (((180.000000000 -16.067132664, 1... FJI
1 POLYGON ((33.903711197 -0.950000000, 34.072620... TZA
2 POLYGON ((-8.665589565 27.656425890, -8.665124... ESH
3 MULTIPOLYGON (((-122.840000000 49.000000000, -... CAN
4 MULTIPOLYGON (((-122.840000000 49.000000000, -... USA
# `country_names` is DataFrame with country names and iso codes
In [12]: country_names.head()
Out[12]:
name iso_a3
0 Fiji FJI
1 Tanzania TZA
2 W. Sahara ESH
3 Canada CAN
4 United States of America USA
# Merge with `merge` method on shared variable (iso codes):
In [13]: country_shapes = country_shapes.merge(country_names, on='iso_a3')
In [14]: country_shapes.head()
Out[14]:
geometry iso_a3 name
0 MULTIPOLYGON (((180.000000000 -16.067132664, 1... FJI Fiji
1 POLYGON ((33.903711197 -0.950000000, 34.072620... TZA Tanzania
2 POLYGON ((-8.665589565 27.656425890, -8.665124... ESH W. Sahara
3 MULTIPOLYGON (((-122.840000000 49.000000000, -... CAN Canada
4 MULTIPOLYGON (((-122.840000000 49.000000000, -... USA United States of America
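The effect of argument order described above can be checked on a tiny example. This is a sketch with hypothetical two-row inputs (the point coordinates are illustrative only), not a reproduction of the Natural Earth data.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point

# Hypothetical inputs: a GeoDataFrame with ISO codes and a plain DataFrame with names.
shapes = gpd.GeoDataFrame(
    {"iso_a3": ["FJI", "TZA"]},
    geometry=[Point(178.0, -17.7), Point(35.0, -6.4)],
)
names = pd.DataFrame({"iso_a3": ["FJI", "TZA"], "name": ["Fiji", "Tanzania"]})

# Method called from the spatial dataset: the result stays a GeoDataFrame.
from_method = shapes.merge(names, on="iso_a3")

# Stand-alone function with the GeoDataFrame on the left: still a GeoDataFrame.
geo_left = pd.merge(shapes, names, on="iso_a3")

# Stand-alone function with the DataFrame on the left: a plain DataFrame.
plain_left = pd.merge(names, shapes, on="iso_a3")

# The geometry column is still present, so the result can be converted back.
restored = gpd.GeoDataFrame(plain_left, geometry="geometry")

for label, obj in [("method", from_method), ("geo left", geo_left),
                   ("plain left", plain_left), ("restored", restored)]:
    print(label, type(obj).__name__)
```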
Spatial Joins#
In a spatial join, two geometry objects are merged based on their spatial relationship to one another.
# One GeoDataFrame of countries, one of Cities.
# Want to merge so we can get each city's country.
In [15]: countries.head()
Out[15]:
geometry country
0 MULTIPOLYGON (((180.000000000 -16.067132664, 1... Fiji
1 POLYGON ((33.903711197 -0.950000000, 34.072620... Tanzania
2 POLYGON ((-8.665589565 27.656425890, -8.665124... W. Sahara
3 MULTIPOLYGON (((-122.840000000 49.000000000, -... Canada
4 MULTIPOLYGON (((-122.840000000 49.000000000, -... United States of America
In [16]: cities.head()
Out[16]:
name geometry
0 Vatican City POINT (12.453386545 41.903282180)
1 San Marino POINT (12.441770158 43.936095835)
2 Vaduz POINT (9.516669473 47.133723774)
3 Luxembourg POINT (6.130002806 49.611660379)
4 Palikir POINT (158.149974324 6.916643696)
# Execute spatial join
In [17]: cities_with_country = cities.sjoin(countries, how="inner", predicate='intersects')
In [18]: cities_with_country.head()
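GeoPandas provides two spatial-join functions:
GeoDataFrame.sjoin(): joins based on binary predicates (intersects, contains, etc.)
GeoDataFrame.sjoin_nearest(): joins based on proximity, with the ability to set a maximum search radius.
Note
For historical reasons, both methods are also available as the top-level functions sjoin() and sjoin_nearest(). It is recommended to use the methods, as the functions may be deprecated in the future.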
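Binary Predicate Joins#
Binary predicate joins are available via GeoDataFrame.sjoin().
GeoDataFrame.sjoin() has two core arguments: how and predicate.
predicate
The predicate argument specifies how geopandas decides whether or not to join the attributes of one object to another, based on their geometric relationship.
The values for predicate correspond to the names of geometric binary predicates and depend on the spatial index implementation.
The default spatial index in geopandas currently supports the following values for predicate, which are defined in the Shapely documentation: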
intersects
contains
within
touches
crosses
overlaps
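how
The how argument specifies the type of join that will occur and which geometry is retained in the resultant GeoDataFrame. It accepts the following options:
left: use the index from the first (or left_df) GeoDataFrame that you provide to GeoDataFrame.sjoin(); retain only the left_df geometry column
right: use the index from the second (or right_df); retain only the right_df geometry column
inner: use the intersection of index values from both GeoDataFrame objects; retain only the left_df geometry column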
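The interaction of how and predicate can be seen on a small synthetic case. The sketch below assumes a geopandas installation with a working spatial index backend; the two square "countries" and three point "cities" are hypothetical data chosen so that one point falls outside every polygon.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Hypothetical polygons standing in for countries.
countries = gpd.GeoDataFrame(
    {"country": ["A", "B"]},
    geometry=[box(0, 0, 1, 1), box(2, 0, 3, 1)],
)
# Hypothetical points standing in for cities; p3 lies outside both squares.
cities = gpd.GeoDataFrame(
    {"name": ["p1", "p2", "p3"]},
    geometry=[Point(0.5, 0.5), Point(2.5, 0.5), Point(5, 5)],
)

# inner: only points that fall within a polygon; point geometry is kept.
inner = cities.sjoin(countries, how="inner", predicate="within")

# left: every point is kept; unmatched points get missing values for "country".
left = cities.sjoin(countries, how="left", predicate="within")

# right: indexed by the polygons; the polygon geometry is kept instead.
right = cities.sjoin(countries, how="right", predicate="within")

print(inner[["name", "country"]])
print(left[["name", "country"]])
print(right.geometry.geom_type.unique())
```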
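Note
More complicated spatial relationships can be studied by combining geometric operations with spatial joins. To find all polygons within a given distance of a point, for example, one can first use the buffer() method to expand each point into a circle of appropriate radius, then intersect those buffered circles with the polygons in question.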
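The buffer-then-join approach might look like the following sketch. The site and parcel names, coordinates, and the search radius of 1.5 are all hypothetical; note that buffer() measures distance in the units of the CRS, so a projected CRS is normally needed for real data.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Hypothetical point and polygons in arbitrary planar units.
points = gpd.GeoDataFrame({"site": ["s1"]}, geometry=[Point(0, 0)])
polygons = gpd.GeoDataFrame(
    {"parcel": ["near", "far"]},
    geometry=[box(1, 0, 2, 1), box(5, 5, 6, 6)],
)

search_radius = 1.5  # assumed distance threshold, in CRS units

# Expand each point into an (approximate) circle of the chosen radius.
buffered = points.copy()
buffered["geometry"] = points.geometry.buffer(search_radius)

# Polygons that intersect a circle lie within the radius of its point.
matches = buffered.sjoin(polygons, how="inner", predicate="intersects")

print(matches[["site", "parcel"]])
```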
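Nearest Joins#
Proximity-based joins can be done via GeoDataFrame.sjoin_nearest().
GeoDataFrame.sjoin_nearest() shares the how argument with GeoDataFrame.sjoin(), and includes two additional arguments: max_distance and distance_col.
max_distance
The max_distance argument specifies a maximum search radius for matching geometries. This can have a considerable performance impact in some cases. If you can, it is highly recommended that you use this parameter.
distance_col
If set, the resultant GeoDataFrame will include a column with this name containing the computed distances between an input geometry and the nearest geometry.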
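A minimal sketch of both arguments, using hypothetical points laid out along the x-axis so the distances are easy to verify by hand. It assumes a geopandas installation whose geometry backend supports sjoin_nearest(); the column names point_id, target, and dist are illustrative choices.

```python
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical query points and candidate targets on a line.
points = gpd.GeoDataFrame(
    {"point_id": ["p0", "p1"]},
    geometry=[Point(0, 0), Point(10, 0)],
)
targets = gpd.GeoDataFrame(
    {"target": ["a", "b"]},
    geometry=[Point(1, 0), Point(20, 0)],
)

# Join each point to its nearest target and record the distance in "dist".
nearest = points.sjoin_nearest(targets, distance_col="dist")

# Restrict the search to 5 units; p1 has no target that close and is dropped
# by the inner join.
limited = points.sjoin_nearest(
    targets, how="inner", max_distance=5, distance_col="dist"
)

print(nearest[["point_id", "target", "dist"]])
print(limited[["point_id", "target", "dist"]])
```

As with buffer(), distances are expressed in the units of the CRS, so a projected CRS is generally preferable for meaningful values.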