备注
空间连接#
A 空间连接 用途 binary predicates 比如 intersects
和 crosses
将两个组合在一起 GeoDataFrames
基于它们几何图形之间的空间关系。
一个常见的用例可能是点图层和多边形图层之间的空间连接,其中您希望保留点几何图形并获取相交多边形的属性。
空间连接的类型#
我们目前支持以下空间连接方法。我们是指 left_df 和 right_df 它们对应于作为args传入的两个数据帧。
左向外连接#
在左外部连接中 (how='left'
),我们保持 all 行,并在必要时复制它们,以表示两个数据帧之间的多个命中。如果它们相交并丢失不相交的右行,我们将保留右行的属性。Left Out Join意味着我们对保留左侧的几何图形感兴趣。
这相当于PostGIS查询:
SELECT pts.geom, pts.id as ptid, polys.id as polyid
FROM pts
LEFT OUTER JOIN polys
ON ST_Intersects(pts.geom, polys.geom);
geom | ptid | polyid
--------------------------------------------+------+--------
010100000040A9FBF2D88AD03F349CD47D796CE9BF | 4 | 10
010100000048EABE3CB622D8BFA8FBF2D88AA0E9BF | 3 | 10
010100000048EABE3CB622D8BFA8FBF2D88AA0E9BF | 3 | 20
0101000000F0D88AA0E1A4EEBF7052F7E5B115E9BF | 2 | 20
0101000000818693BA2F8FF7BF4ADD97C75604E9BF | 1 |
(5 rows)
右外连接#
在右外部连接中 (how='right'
),我们保持 all 行,并在必要时复制它们,以表示两个数据帧之间的多个命中。如果左行相交并丢失不相交的左行,我们将保留它们的属性。右侧的外部连接意味着我们对保留右侧的几何图形感兴趣。
这相当于PostGIS查询:
SELECT polys.geom, pts.id as ptid, polys.id as polyid
FROM pts
RIGHT OUTER JOIN polys
ON ST_Intersects(pts.geom, polys.geom);
geom | ptid | polyid
----------+------+--------
01...9BF | 4 | 10
01...9BF | 3 | 10
02...7BF | 3 | 20
02...7BF | 2 | 20
00...5BF | | 30
(5 rows)
内连接#
在内部连接中 (how='inner'
),我们只在它们的二元谓词所在的位置保留从右到左的行 True
。如果有必要,我们会复制它们以表示两个数据帧之间的多个命中。只有当右和左的属性相交并丢弃所有不相交的行时,我们才保留它们的属性。内部联接意味着我们对保留左侧的几何图形感兴趣。
这相当于PostGIS查询:
SELECT pts.geom, pts.id as ptid, polys.id as polyid
FROM pts
INNER JOIN polys
ON ST_Intersects(pts.geom, polys.geom);
geom | ptid | polyid
--------------------------------------------+------+--------
010100000040A9FBF2D88AD03F349CD47D796CE9BF | 4 | 10
010100000048EABE3CB622D8BFA8FBF2D88AA0E9BF | 3 | 10
010100000048EABE3CB622D8BFA8FBF2D88AA0E9BF | 3 | 20
0101000000F0D88AA0E1A4EEBF7052F7E5B115E9BF | 2 | 20
(4 rows)
两个GeoDataFrame之间的空间连接#
让我们来看看我们将如何使用 GeoPandas
。首先,将NYC测试数据加载到 GeoDataFrames
:
[1]:
%matplotlib inline
from shapely.geometry import Point
from geopandas import datasets, GeoDataFrame, read_file
# NYC Boros
zippath = datasets.get_path('nybb')
polydf = read_file(zippath)
# Generate some points
b = [int(x) for x in polydf.total_bounds]
N = 8
pointdf = GeoDataFrame([
{'geometry': Point(x, y), 'value1': x + y, 'value2': x - y}
for x, y in zip(range(b[0], b[2], int((b[2] - b[0]) / N)),
range(b[1], b[3], int((b[3] - b[1]) / N)))])
# Make sure they're using the same projection reference
pointdf.crs = polydf.crs
[2]:
pointdf
[2]:
geometry | value1 | value2 | |
---|---|---|---|
0 | POINT (913175.000 120121.000) | 1033296 | 793054 |
1 | POINT (932450.000 139211.000) | 1071661 | 793239 |
2 | POINT (951725.000 158301.000) | 1110026 | 793424 |
3 | POINT (971000.000 177391.000) | 1148391 | 793609 |
4 | POINT (990275.000 196481.000) | 1186756 | 793794 |
5 | POINT (1009550.000 215571.000) | 1225121 | 793979 |
6 | POINT (1028825.000 234661.000) | 1263486 | 794164 |
7 | POINT (1048100.000 253751.000) | 1301851 | 794349 |
8 | POINT (1067375.000 272841.000) | 1340216 | 794534 |
[3]:
polydf
[3]:
BoroCode | BoroName | Shape_Leng | Shape_Area | geometry | |
---|---|---|---|---|---|
0 | 5 | Staten Island | 330470.010332 | 1.623820e+09 | MULTIPOLYGON (((970217.022 145643.332, 970227.... |
1 | 4 | Queens | 896344.047763 | 3.045213e+09 | MULTIPOLYGON (((1029606.077 156073.814, 102957... |
2 | 3 | Brooklyn | 741080.523166 | 1.937479e+09 | MULTIPOLYGON (((1021176.479 151374.797, 102100... |
3 | 1 | Manhattan | 359299.096471 | 6.364715e+08 | MULTIPOLYGON (((981219.056 188655.316, 980940.... |
4 | 2 | Bronx | 464392.991824 | 1.186925e+09 | MULTIPOLYGON (((1012821.806 229228.265, 101278... |
[4]:
pointdf.plot()
[4]:
<AxesSubplot:>

[5]:
polydf.plot()
[5]:
<AxesSubplot:>

加入#
[6]:
join_left_df = pointdf.sjoin(polydf, how="left")
join_left_df
# Note the NaNs where the point did not intersect a boro
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
Input In [6], in <cell line: 1>()
----> 1 join_left_df = pointdf.sjoin(polydf, how="left")
2 join_left_df
File /usr/local/lib/python3.10/dist-packages/geopandas-0.10.2+79.g3abc6a7-py3.10.egg/geopandas/geodataframe.py:1983, in GeoDataFrame.sjoin(self, df, *args, **kwargs)
1905 def sjoin(self, df, *args, **kwargs):
1906 """Spatial join of two GeoDataFrames.
1907
1908 See the User Guide page :doc:`../../user_guide/mergingdata` for details.
(...)
1981 sjoin : equivalent top-level function
1982 """
-> 1983 return geopandas.sjoin(left_df=self, right_df=df, *args, **kwargs)
File /usr/local/lib/python3.10/dist-packages/geopandas-0.10.2+79.g3abc6a7-py3.10.egg/geopandas/tools/sjoin.py:124, in sjoin(left_df, right_df, how, predicate, lsuffix, rsuffix, **kwargs)
120 raise TypeError(f"sjoin() got an unexpected keyword argument '{first}'")
122 _basic_checks(left_df, right_df, how, lsuffix, rsuffix)
--> 124 indices = _geom_predicate_query(left_df, right_df, predicate)
126 joined = _frame_join(indices, left_df, right_df, how, lsuffix, rsuffix)
128 return joined
File /usr/local/lib/python3.10/dist-packages/geopandas-0.10.2+79.g3abc6a7-py3.10.egg/geopandas/tools/sjoin.py:216, in _geom_predicate_query(left_df, right_df, predicate)
212 input_geoms = right_df.geometry
213 else:
214 # all other predicates are symmetric
215 # keep them the same
--> 216 sindex = right_df.sindex
217 input_geoms = left_df.geometry
219 if sindex:
File /usr/local/lib/python3.10/dist-packages/geopandas-0.10.2+79.g3abc6a7-py3.10.egg/geopandas/base.py:2706, in GeoPandasBase.sindex(self)
2655 @property
2656 def sindex(self):
2657 """Generate the spatial index
2658
2659 Creates R-tree spatial index based on ``pygeos.STRtree`` or
(...)
2704 [2]])
2705 """
-> 2706 return self.geometry.values.sindex
File /usr/local/lib/python3.10/dist-packages/geopandas-0.10.2+79.g3abc6a7-py3.10.egg/geopandas/array.py:291, in GeometryArray.sindex(self)
288 @property
289 def sindex(self):
290 if self._sindex is None:
--> 291 self._sindex = _get_sindex_class()(self.data)
292 return self._sindex
File /usr/local/lib/python3.10/dist-packages/geopandas-0.10.2+79.g3abc6a7-py3.10.egg/geopandas/sindex.py:21, in _get_sindex_class()
19 if compat.HAS_RTREE:
20 return RTreeIndex
---> 21 raise ImportError(
22 "Spatial indexes require either `rtree` or `pygeos`. "
23 "See installation instructions at https://geopandas.org/install.html"
24 )
ImportError: Spatial indexes require either `rtree` or `pygeos`. See installation instructions at https://geopandas.org/install.html
[7]:
join_right_df = pointdf.sjoin(polydf, how="right")
join_right_df
# Note Staten Island is repeated
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
Input In [7], in <cell line: 1>()
----> 1 join_right_df = pointdf.sjoin(polydf, how="right")
2 join_right_df
File /usr/local/lib/python3.10/dist-packages/geopandas-0.10.2+79.g3abc6a7-py3.10.egg/geopandas/geodataframe.py:1983, in GeoDataFrame.sjoin(self, df, *args, **kwargs)
1905 def sjoin(self, df, *args, **kwargs):
1906 """Spatial join of two GeoDataFrames.
1907
1908 See the User Guide page :doc:`../../user_guide/mergingdata` for details.
(...)
1981 sjoin : equivalent top-level function
1982 """
-> 1983 return geopandas.sjoin(left_df=self, right_df=df, *args, **kwargs)
File /usr/local/lib/python3.10/dist-packages/geopandas-0.10.2+79.g3abc6a7-py3.10.egg/geopandas/tools/sjoin.py:124, in sjoin(left_df, right_df, how, predicate, lsuffix, rsuffix, **kwargs)
120 raise TypeError(f"sjoin() got an unexpected keyword argument '{first}'")
122 _basic_checks(left_df, right_df, how, lsuffix, rsuffix)
--> 124 indices = _geom_predicate_query(left_df, right_df, predicate)
126 joined = _frame_join(indices, left_df, right_df, how, lsuffix, rsuffix)
128 return joined
File /usr/local/lib/python3.10/dist-packages/geopandas-0.10.2+79.g3abc6a7-py3.10.egg/geopandas/tools/sjoin.py:216, in _geom_predicate_query(left_df, right_df, predicate)
212 input_geoms = right_df.geometry
213 else:
214 # all other predicates are symmetric
215 # keep them the same
--> 216 sindex = right_df.sindex
217 input_geoms = left_df.geometry
219 if sindex:
File /usr/local/lib/python3.10/dist-packages/geopandas-0.10.2+79.g3abc6a7-py3.10.egg/geopandas/base.py:2706, in GeoPandasBase.sindex(self)
2655 @property
2656 def sindex(self):
2657 """Generate the spatial index
2658
2659 Creates R-tree spatial index based on ``pygeos.STRtree`` or
(...)
2704 [2]])
2705 """
-> 2706 return self.geometry.values.sindex
File /usr/local/lib/python3.10/dist-packages/geopandas-0.10.2+79.g3abc6a7-py3.10.egg/geopandas/array.py:291, in GeometryArray.sindex(self)
288 @property
289 def sindex(self):
290 if self._sindex is None:
--> 291 self._sindex = _get_sindex_class()(self.data)
292 return self._sindex
File /usr/local/lib/python3.10/dist-packages/geopandas-0.10.2+79.g3abc6a7-py3.10.egg/geopandas/sindex.py:21, in _get_sindex_class()
19 if compat.HAS_RTREE:
20 return RTreeIndex
---> 21 raise ImportError(
22 "Spatial indexes require either `rtree` or `pygeos`. "
23 "See installation instructions at https://geopandas.org/install.html"
24 )
ImportError: Spatial indexes require either `rtree` or `pygeos`. See installation instructions at https://geopandas.org/install.html
[8]:
join_inner_df = pointdf.sjoin(polydf, how="inner")
join_inner_df
# Note the lack of NaNs; dropped anything that didn't intersect
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
Input In [8], in <cell line: 1>()
----> 1 join_inner_df = pointdf.sjoin(polydf, how="inner")
2 join_inner_df
File /usr/local/lib/python3.10/dist-packages/geopandas-0.10.2+79.g3abc6a7-py3.10.egg/geopandas/geodataframe.py:1983, in GeoDataFrame.sjoin(self, df, *args, **kwargs)
1905 def sjoin(self, df, *args, **kwargs):
1906 """Spatial join of two GeoDataFrames.
1907
1908 See the User Guide page :doc:`../../user_guide/mergingdata` for details.
(...)
1981 sjoin : equivalent top-level function
1982 """
-> 1983 return geopandas.sjoin(left_df=self, right_df=df, *args, **kwargs)
File /usr/local/lib/python3.10/dist-packages/geopandas-0.10.2+79.g3abc6a7-py3.10.egg/geopandas/tools/sjoin.py:124, in sjoin(left_df, right_df, how, predicate, lsuffix, rsuffix, **kwargs)
120 raise TypeError(f"sjoin() got an unexpected keyword argument '{first}'")
122 _basic_checks(left_df, right_df, how, lsuffix, rsuffix)
--> 124 indices = _geom_predicate_query(left_df, right_df, predicate)
126 joined = _frame_join(indices, left_df, right_df, how, lsuffix, rsuffix)
128 return joined
File /usr/local/lib/python3.10/dist-packages/geopandas-0.10.2+79.g3abc6a7-py3.10.egg/geopandas/tools/sjoin.py:216, in _geom_predicate_query(left_df, right_df, predicate)
212 input_geoms = right_df.geometry
213 else:
214 # all other predicates are symmetric
215 # keep them the same
--> 216 sindex = right_df.sindex
217 input_geoms = left_df.geometry
219 if sindex:
File /usr/local/lib/python3.10/dist-packages/geopandas-0.10.2+79.g3abc6a7-py3.10.egg/geopandas/base.py:2706, in GeoPandasBase.sindex(self)
2655 @property
2656 def sindex(self):
2657 """Generate the spatial index
2658
2659 Creates R-tree spatial index based on ``pygeos.STRtree`` or
(...)
2704 [2]])
2705 """
-> 2706 return self.geometry.values.sindex
File /usr/local/lib/python3.10/dist-packages/geopandas-0.10.2+79.g3abc6a7-py3.10.egg/geopandas/array.py:291, in GeometryArray.sindex(self)
288 @property
289 def sindex(self):
290 if self._sindex is None:
--> 291 self._sindex = _get_sindex_class()(self.data)
292 return self._sindex
File /usr/local/lib/python3.10/dist-packages/geopandas-0.10.2+79.g3abc6a7-py3.10.egg/geopandas/sindex.py:21, in _get_sindex_class()
19 if compat.HAS_RTREE:
20 return RTreeIndex
---> 21 raise ImportError(
22 "Spatial indexes require either `rtree` or `pygeos`. "
23 "See installation instructions at https://geopandas.org/install.html"
24 )
ImportError: Spatial indexes require either `rtree` or `pygeos`. See installation instructions at https://geopandas.org/install.html
我们并不局限于使用 intersection
二元谓词。任何一种 Shapely
可以通过将返回布尔值的 op
科瓦格。
[9]:
pointdf.sjoin(polydf, how="left", predicate="within")
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
Input In [9], in <cell line: 1>()
----> 1 pointdf.sjoin(polydf, how="left", predicate="within")
File /usr/local/lib/python3.10/dist-packages/geopandas-0.10.2+79.g3abc6a7-py3.10.egg/geopandas/geodataframe.py:1983, in GeoDataFrame.sjoin(self, df, *args, **kwargs)
1905 def sjoin(self, df, *args, **kwargs):
1906 """Spatial join of two GeoDataFrames.
1907
1908 See the User Guide page :doc:`../../user_guide/mergingdata` for details.
(...)
1981 sjoin : equivalent top-level function
1982 """
-> 1983 return geopandas.sjoin(left_df=self, right_df=df, *args, **kwargs)
File /usr/local/lib/python3.10/dist-packages/geopandas-0.10.2+79.g3abc6a7-py3.10.egg/geopandas/tools/sjoin.py:124, in sjoin(left_df, right_df, how, predicate, lsuffix, rsuffix, **kwargs)
120 raise TypeError(f"sjoin() got an unexpected keyword argument '{first}'")
122 _basic_checks(left_df, right_df, how, lsuffix, rsuffix)
--> 124 indices = _geom_predicate_query(left_df, right_df, predicate)
126 joined = _frame_join(indices, left_df, right_df, how, lsuffix, rsuffix)
128 return joined
File /usr/local/lib/python3.10/dist-packages/geopandas-0.10.2+79.g3abc6a7-py3.10.egg/geopandas/tools/sjoin.py:211, in _geom_predicate_query(left_df, right_df, predicate)
206 if predicate == "within":
207 # within is implemented as the inverse of contains
208 # contains is a faster predicate
209 # see discussion at https://github.com/geopandas/geopandas/pull/1421
210 predicate = "contains"
--> 211 sindex = left_df.sindex
212 input_geoms = right_df.geometry
213 else:
214 # all other predicates are symmetric
215 # keep them the same
File /usr/local/lib/python3.10/dist-packages/geopandas-0.10.2+79.g3abc6a7-py3.10.egg/geopandas/base.py:2706, in GeoPandasBase.sindex(self)
2655 @property
2656 def sindex(self):
2657 """Generate the spatial index
2658
2659 Creates R-tree spatial index based on ``pygeos.STRtree`` or
(...)
2704 [2]])
2705 """
-> 2706 return self.geometry.values.sindex
File /usr/local/lib/python3.10/dist-packages/geopandas-0.10.2+79.g3abc6a7-py3.10.egg/geopandas/array.py:291, in GeometryArray.sindex(self)
288 @property
289 def sindex(self):
290 if self._sindex is None:
--> 291 self._sindex = _get_sindex_class()(self.data)
292 return self._sindex
File /usr/local/lib/python3.10/dist-packages/geopandas-0.10.2+79.g3abc6a7-py3.10.egg/geopandas/sindex.py:21, in _get_sindex_class()
19 if compat.HAS_RTREE:
20 return RTreeIndex
---> 21 raise ImportError(
22 "Spatial indexes require either `rtree` or `pygeos`. "
23 "See installation instructions at https://geopandas.org/install.html"
24 )
ImportError: Spatial indexes require either `rtree` or `pygeos`. See installation instructions at https://geopandas.org/install.html
我们还可以与最近的邻居进行连接 sjoin_nearest
。
[10]:
pointdf.sjoin_nearest(polydf, how="left", distance_col="Distances")
# Note the optional Distances column with computed distances between each point
# and the nearest polydf geometry.
---------------------------------------------------------------------------
NotImplementedError Traceback (most recent call last)
Input In [10], in <cell line: 1>()
----> 1 pointdf.sjoin_nearest(polydf, how="left", distance_col="Distances")
File /usr/local/lib/python3.10/dist-packages/geopandas-0.10.2+79.g3abc6a7-py3.10.egg/geopandas/geodataframe.py:2095, in GeoDataFrame.sjoin_nearest(self, right, how, max_distance, lsuffix, rsuffix, distance_col)
1985 def sjoin_nearest(
1986 self,
1987 right,
(...)
1992 distance_col=None,
1993 ):
1994 """
1995 Spatial join of two GeoDataFrames based on the distance between their
1996 geometries.
(...)
2093 dimension is not taken into account.
2094 """
-> 2095 return geopandas.sjoin_nearest(
2096 self,
2097 right,
2098 how=how,
2099 max_distance=max_distance,
2100 lsuffix=lsuffix,
2101 rsuffix=rsuffix,
2102 distance_col=distance_col,
2103 )
File /usr/local/lib/python3.10/dist-packages/geopandas-0.10.2+79.g3abc6a7-py3.10.egg/geopandas/tools/sjoin.py:523, in sjoin_nearest(left_df, right_df, how, max_distance, lsuffix, rsuffix, distance_col)
519 right_df.geometry.values.check_geographic_crs(stacklevel=1)
521 return_distance = distance_col is not None
--> 523 join_df = _nearest_query(left_df, right_df, max_distance, how, return_distance)
525 if return_distance:
526 join_df = join_df.rename(columns={"distances": distance_col})
File /usr/local/lib/python3.10/dist-packages/geopandas-0.10.2+79.g3abc6a7-py3.10.egg/geopandas/tools/sjoin.py:364, in _nearest_query(left_df, right_df, max_distance, how, return_distance)
356 def _nearest_query(
357 left_df: GeoDataFrame,
358 right_df: GeoDataFrame,
(...)
361 return_distance: bool,
362 ):
363 if not (compat.PYGEOS_GE_010 and compat.USE_PYGEOS):
--> 364 raise NotImplementedError(
365 "Currently, only PyGEOS >= 0.10.0 supports `nearest_all`. "
366 + compat.INSTALL_PYGEOS_ERROR
367 )
368 # use the opposite of the join direction for the index
369 use_left_as_sindex = how == "right"
NotImplementedError: Currently, only PyGEOS >= 0.10.0 supports `nearest_all`. To use PyGEOS within GeoPandas, you need to install PyGEOS: 'conda install pygeos' or 'pip install pygeos'