备注

互动在线版: Binder badge

空间连接#

A 空间连接 用途 binary predicates 比如 intersectscrosses 将两个组合在一起 GeoDataFrames 基于它们几何图形之间的空间关系。

一个常见的用例可能是点图层和多边形图层之间的空间连接,其中您希望保留点几何图形并获取相交多边形的属性。

图解

空间连接的类型#

我们目前支持以下空间连接方法。我们是指 left_dfright_df 它们对应于作为args传入的两个数据帧。

左向外连接#

在左外部连接中 (how='left' ),我们保持 all 行,并在必要时复制它们,以表示两个数据帧之间的多个命中。如果它们相交并丢失不相交的右行,我们将保留右行的属性。Left Out Join意味着我们对保留左侧的几何图形感兴趣。

这相当于PostGIS查询:

SELECT pts.geom, pts.id as ptid, polys.id as polyid
FROM pts
LEFT OUTER JOIN polys
ON ST_Intersects(pts.geom, polys.geom);

                    geom                    | ptid | polyid
--------------------------------------------+------+--------
 010100000040A9FBF2D88AD03F349CD47D796CE9BF |    4 |     10
 010100000048EABE3CB622D8BFA8FBF2D88AA0E9BF |    3 |     10
 010100000048EABE3CB622D8BFA8FBF2D88AA0E9BF |    3 |     20
 0101000000F0D88AA0E1A4EEBF7052F7E5B115E9BF |    2 |     20
 0101000000818693BA2F8FF7BF4ADD97C75604E9BF |    1 |
(5 rows)

右外连接#

在右外部连接中 (how='right' ),我们保持 all 行,并在必要时复制它们,以表示两个数据帧之间的多个命中。如果左行相交并丢失不相交的左行,我们将保留它们的属性。右侧的外部连接意味着我们对保留右侧的几何图形感兴趣。

这相当于PostGIS查询:

SELECT polys.geom, pts.id as ptid, polys.id as polyid
FROM pts
RIGHT OUTER JOIN polys
ON ST_Intersects(pts.geom, polys.geom);

  geom    | ptid | polyid
----------+------+--------
 01...9BF |    4 |     10
 01...9BF |    3 |     10
 02...7BF |    3 |     20
 02...7BF |    2 |     20
 00...5BF |      |     30
(5 rows)

内连接#

在内部连接中 (how='inner' ),我们只在它们的二元谓词所在的位置保留从右到左的行 True 。如果有必要,我们会复制它们以表示两个数据帧之间的多个命中。只有当右和左的属性相交并丢弃所有不相交的行时,我们才保留它们的属性。内部联接意味着我们对保留左侧的几何图形感兴趣。

这相当于PostGIS查询:

SELECT pts.geom, pts.id as ptid, polys.id as polyid
FROM pts
INNER JOIN polys
ON ST_Intersects(pts.geom, polys.geom);

                    geom                    | ptid | polyid
--------------------------------------------+------+--------
 010100000040A9FBF2D88AD03F349CD47D796CE9BF |    4 |     10
 010100000048EABE3CB622D8BFA8FBF2D88AA0E9BF |    3 |     10
 010100000048EABE3CB622D8BFA8FBF2D88AA0E9BF |    3 |     20
 0101000000F0D88AA0E1A4EEBF7052F7E5B115E9BF |    2 |     20
(4 rows)

两个GeoDataFrame之间的空间连接#

让我们来看看我们将如何使用 GeoPandas 。首先,将NYC测试数据加载到 GeoDataFrames

[1]:
%matplotlib inline
from shapely.geometry import Point
from geopandas import datasets, GeoDataFrame, read_file

# NYC Boros
zippath = datasets.get_path('nybb')
polydf = read_file(zippath)

# Generate some points
b = [int(x) for x in polydf.total_bounds]
N = 8
pointdf = GeoDataFrame([
    {'geometry': Point(x, y), 'value1': x + y, 'value2': x - y}
    for x, y in zip(range(b[0], b[2], int((b[2] - b[0]) / N)),
                    range(b[1], b[3], int((b[3] - b[1]) / N)))])

# Make sure they're using the same projection reference
pointdf.crs = polydf.crs
[2]:
pointdf
[2]:
geometry value1 value2
0 POINT (913175.000 120121.000) 1033296 793054
1 POINT (932450.000 139211.000) 1071661 793239
2 POINT (951725.000 158301.000) 1110026 793424
3 POINT (971000.000 177391.000) 1148391 793609
4 POINT (990275.000 196481.000) 1186756 793794
5 POINT (1009550.000 215571.000) 1225121 793979
6 POINT (1028825.000 234661.000) 1263486 794164
7 POINT (1048100.000 253751.000) 1301851 794349
8 POINT (1067375.000 272841.000) 1340216 794534
[3]:
polydf
[3]:
BoroCode BoroName Shape_Leng Shape_Area geometry
0 5 Staten Island 330470.010332 1.623820e+09 MULTIPOLYGON (((970217.022 145643.332, 970227....
1 4 Queens 896344.047763 3.045213e+09 MULTIPOLYGON (((1029606.077 156073.814, 102957...
2 3 Brooklyn 741080.523166 1.937479e+09 MULTIPOLYGON (((1021176.479 151374.797, 102100...
3 1 Manhattan 359299.096471 6.364715e+08 MULTIPOLYGON (((981219.056 188655.316, 980940....
4 2 Bronx 464392.991824 1.186925e+09 MULTIPOLYGON (((1012821.806 229228.265, 101278...
[4]:
pointdf.plot()
[4]:
<AxesSubplot:>
../_images/gallery_spatial_joins_6_1.png
[5]:
polydf.plot()
[5]:
<AxesSubplot:>
../_images/gallery_spatial_joins_7_1.png

加入#

[6]:
join_left_df = pointdf.sjoin(polydf, how="left")
join_left_df
# Note the NaNs where the point did not intersect a boro
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
Input In [6], in <cell line: 1>()
----> 1 join_left_df = pointdf.sjoin(polydf, how="left")
      2 join_left_df

File /usr/local/lib/python3.10/dist-packages/geopandas-0.10.2+79.g3abc6a7-py3.10.egg/geopandas/geodataframe.py:1983, in GeoDataFrame.sjoin(self, df, *args, **kwargs)
   1905 def sjoin(self, df, *args, **kwargs):
   1906     """Spatial join of two GeoDataFrames.
   1907
   1908     See the User Guide page :doc:`../../user_guide/mergingdata` for details.
   (...)
   1981     sjoin : equivalent top-level function
   1982     """
-> 1983     return geopandas.sjoin(left_df=self, right_df=df, *args, **kwargs)

File /usr/local/lib/python3.10/dist-packages/geopandas-0.10.2+79.g3abc6a7-py3.10.egg/geopandas/tools/sjoin.py:124, in sjoin(left_df, right_df, how, predicate, lsuffix, rsuffix, **kwargs)
    120     raise TypeError(f"sjoin() got an unexpected keyword argument '{first}'")
    122 _basic_checks(left_df, right_df, how, lsuffix, rsuffix)
--> 124 indices = _geom_predicate_query(left_df, right_df, predicate)
    126 joined = _frame_join(indices, left_df, right_df, how, lsuffix, rsuffix)
    128 return joined

File /usr/local/lib/python3.10/dist-packages/geopandas-0.10.2+79.g3abc6a7-py3.10.egg/geopandas/tools/sjoin.py:216, in _geom_predicate_query(left_df, right_df, predicate)
    212         input_geoms = right_df.geometry
    213     else:
    214         # all other predicates are symmetric
    215         # keep them the same
--> 216         sindex = right_df.sindex
    217         input_geoms = left_df.geometry
    219 if sindex:

File /usr/local/lib/python3.10/dist-packages/geopandas-0.10.2+79.g3abc6a7-py3.10.egg/geopandas/base.py:2706, in GeoPandasBase.sindex(self)
   2655 @property
   2656 def sindex(self):
   2657     """Generate the spatial index
   2658
   2659     Creates R-tree spatial index based on ``pygeos.STRtree`` or
   (...)
   2704            [2]])
   2705     """
-> 2706     return self.geometry.values.sindex

File /usr/local/lib/python3.10/dist-packages/geopandas-0.10.2+79.g3abc6a7-py3.10.egg/geopandas/array.py:291, in GeometryArray.sindex(self)
    288 @property
    289 def sindex(self):
    290     if self._sindex is None:
--> 291         self._sindex = _get_sindex_class()(self.data)
    292     return self._sindex

File /usr/local/lib/python3.10/dist-packages/geopandas-0.10.2+79.g3abc6a7-py3.10.egg/geopandas/sindex.py:21, in _get_sindex_class()
     19 if compat.HAS_RTREE:
     20     return RTreeIndex
---> 21 raise ImportError(
     22     "Spatial indexes require either `rtree` or `pygeos`. "
     23     "See installation instructions at https://geopandas.org/install.html"
     24 )

ImportError: Spatial indexes require either `rtree` or `pygeos`. See installation instructions at https://geopandas.org/install.html
[7]:
join_right_df = pointdf.sjoin(polydf, how="right")
join_right_df
# Note Staten Island is repeated
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
Input In [7], in <cell line: 1>()
----> 1 join_right_df = pointdf.sjoin(polydf, how="right")
      2 join_right_df

File /usr/local/lib/python3.10/dist-packages/geopandas-0.10.2+79.g3abc6a7-py3.10.egg/geopandas/geodataframe.py:1983, in GeoDataFrame.sjoin(self, df, *args, **kwargs)
   1905 def sjoin(self, df, *args, **kwargs):
   1906     """Spatial join of two GeoDataFrames.
   1907
   1908     See the User Guide page :doc:`../../user_guide/mergingdata` for details.
   (...)
   1981     sjoin : equivalent top-level function
   1982     """
-> 1983     return geopandas.sjoin(left_df=self, right_df=df, *args, **kwargs)

File /usr/local/lib/python3.10/dist-packages/geopandas-0.10.2+79.g3abc6a7-py3.10.egg/geopandas/tools/sjoin.py:124, in sjoin(left_df, right_df, how, predicate, lsuffix, rsuffix, **kwargs)
    120     raise TypeError(f"sjoin() got an unexpected keyword argument '{first}'")
    122 _basic_checks(left_df, right_df, how, lsuffix, rsuffix)
--> 124 indices = _geom_predicate_query(left_df, right_df, predicate)
    126 joined = _frame_join(indices, left_df, right_df, how, lsuffix, rsuffix)
    128 return joined

File /usr/local/lib/python3.10/dist-packages/geopandas-0.10.2+79.g3abc6a7-py3.10.egg/geopandas/tools/sjoin.py:216, in _geom_predicate_query(left_df, right_df, predicate)
    212         input_geoms = right_df.geometry
    213     else:
    214         # all other predicates are symmetric
    215         # keep them the same
--> 216         sindex = right_df.sindex
    217         input_geoms = left_df.geometry
    219 if sindex:

File /usr/local/lib/python3.10/dist-packages/geopandas-0.10.2+79.g3abc6a7-py3.10.egg/geopandas/base.py:2706, in GeoPandasBase.sindex(self)
   2655 @property
   2656 def sindex(self):
   2657     """Generate the spatial index
   2658
   2659     Creates R-tree spatial index based on ``pygeos.STRtree`` or
   (...)
   2704            [2]])
   2705     """
-> 2706     return self.geometry.values.sindex

File /usr/local/lib/python3.10/dist-packages/geopandas-0.10.2+79.g3abc6a7-py3.10.egg/geopandas/array.py:291, in GeometryArray.sindex(self)
    288 @property
    289 def sindex(self):
    290     if self._sindex is None:
--> 291         self._sindex = _get_sindex_class()(self.data)
    292     return self._sindex

File /usr/local/lib/python3.10/dist-packages/geopandas-0.10.2+79.g3abc6a7-py3.10.egg/geopandas/sindex.py:21, in _get_sindex_class()
     19 if compat.HAS_RTREE:
     20     return RTreeIndex
---> 21 raise ImportError(
     22     "Spatial indexes require either `rtree` or `pygeos`. "
     23     "See installation instructions at https://geopandas.org/install.html"
     24 )

ImportError: Spatial indexes require either `rtree` or `pygeos`. See installation instructions at https://geopandas.org/install.html
[8]:
join_inner_df = pointdf.sjoin(polydf, how="inner")
join_inner_df
# Note the lack of NaNs; dropped anything that didn't intersect
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
Input In [8], in <cell line: 1>()
----> 1 join_inner_df = pointdf.sjoin(polydf, how="inner")
      2 join_inner_df

File /usr/local/lib/python3.10/dist-packages/geopandas-0.10.2+79.g3abc6a7-py3.10.egg/geopandas/geodataframe.py:1983, in GeoDataFrame.sjoin(self, df, *args, **kwargs)
   1905 def sjoin(self, df, *args, **kwargs):
   1906     """Spatial join of two GeoDataFrames.
   1907
   1908     See the User Guide page :doc:`../../user_guide/mergingdata` for details.
   (...)
   1981     sjoin : equivalent top-level function
   1982     """
-> 1983     return geopandas.sjoin(left_df=self, right_df=df, *args, **kwargs)

File /usr/local/lib/python3.10/dist-packages/geopandas-0.10.2+79.g3abc6a7-py3.10.egg/geopandas/tools/sjoin.py:124, in sjoin(left_df, right_df, how, predicate, lsuffix, rsuffix, **kwargs)
    120     raise TypeError(f"sjoin() got an unexpected keyword argument '{first}'")
    122 _basic_checks(left_df, right_df, how, lsuffix, rsuffix)
--> 124 indices = _geom_predicate_query(left_df, right_df, predicate)
    126 joined = _frame_join(indices, left_df, right_df, how, lsuffix, rsuffix)
    128 return joined

File /usr/local/lib/python3.10/dist-packages/geopandas-0.10.2+79.g3abc6a7-py3.10.egg/geopandas/tools/sjoin.py:216, in _geom_predicate_query(left_df, right_df, predicate)
    212         input_geoms = right_df.geometry
    213     else:
    214         # all other predicates are symmetric
    215         # keep them the same
--> 216         sindex = right_df.sindex
    217         input_geoms = left_df.geometry
    219 if sindex:

File /usr/local/lib/python3.10/dist-packages/geopandas-0.10.2+79.g3abc6a7-py3.10.egg/geopandas/base.py:2706, in GeoPandasBase.sindex(self)
   2655 @property
   2656 def sindex(self):
   2657     """Generate the spatial index
   2658
   2659     Creates R-tree spatial index based on ``pygeos.STRtree`` or
   (...)
   2704            [2]])
   2705     """
-> 2706     return self.geometry.values.sindex

File /usr/local/lib/python3.10/dist-packages/geopandas-0.10.2+79.g3abc6a7-py3.10.egg/geopandas/array.py:291, in GeometryArray.sindex(self)
    288 @property
    289 def sindex(self):
    290     if self._sindex is None:
--> 291         self._sindex = _get_sindex_class()(self.data)
    292     return self._sindex

File /usr/local/lib/python3.10/dist-packages/geopandas-0.10.2+79.g3abc6a7-py3.10.egg/geopandas/sindex.py:21, in _get_sindex_class()
     19 if compat.HAS_RTREE:
     20     return RTreeIndex
---> 21 raise ImportError(
     22     "Spatial indexes require either `rtree` or `pygeos`. "
     23     "See installation instructions at https://geopandas.org/install.html"
     24 )

ImportError: Spatial indexes require either `rtree` or `pygeos`. See installation instructions at https://geopandas.org/install.html

我们并不局限于使用 intersection 二元谓词。任何一种 Shapely 可以通过将返回布尔值的 op 科瓦格。

[9]:
pointdf.sjoin(polydf, how="left", predicate="within")
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
Input In [9], in <cell line: 1>()
----> 1 pointdf.sjoin(polydf, how="left", predicate="within")

File /usr/local/lib/python3.10/dist-packages/geopandas-0.10.2+79.g3abc6a7-py3.10.egg/geopandas/geodataframe.py:1983, in GeoDataFrame.sjoin(self, df, *args, **kwargs)
   1905 def sjoin(self, df, *args, **kwargs):
   1906     """Spatial join of two GeoDataFrames.
   1907
   1908     See the User Guide page :doc:`../../user_guide/mergingdata` for details.
   (...)
   1981     sjoin : equivalent top-level function
   1982     """
-> 1983     return geopandas.sjoin(left_df=self, right_df=df, *args, **kwargs)

File /usr/local/lib/python3.10/dist-packages/geopandas-0.10.2+79.g3abc6a7-py3.10.egg/geopandas/tools/sjoin.py:124, in sjoin(left_df, right_df, how, predicate, lsuffix, rsuffix, **kwargs)
    120     raise TypeError(f"sjoin() got an unexpected keyword argument '{first}'")
    122 _basic_checks(left_df, right_df, how, lsuffix, rsuffix)
--> 124 indices = _geom_predicate_query(left_df, right_df, predicate)
    126 joined = _frame_join(indices, left_df, right_df, how, lsuffix, rsuffix)
    128 return joined

File /usr/local/lib/python3.10/dist-packages/geopandas-0.10.2+79.g3abc6a7-py3.10.egg/geopandas/tools/sjoin.py:211, in _geom_predicate_query(left_df, right_df, predicate)
    206 if predicate == "within":
    207     # within is implemented as the inverse of contains
    208     # contains is a faster predicate
    209     # see discussion at https://github.com/geopandas/geopandas/pull/1421
    210     predicate = "contains"
--> 211     sindex = left_df.sindex
    212     input_geoms = right_df.geometry
    213 else:
    214     # all other predicates are symmetric
    215     # keep them the same

File /usr/local/lib/python3.10/dist-packages/geopandas-0.10.2+79.g3abc6a7-py3.10.egg/geopandas/base.py:2706, in GeoPandasBase.sindex(self)
   2655 @property
   2656 def sindex(self):
   2657     """Generate the spatial index
   2658
   2659     Creates R-tree spatial index based on ``pygeos.STRtree`` or
   (...)
   2704            [2]])
   2705     """
-> 2706     return self.geometry.values.sindex

File /usr/local/lib/python3.10/dist-packages/geopandas-0.10.2+79.g3abc6a7-py3.10.egg/geopandas/array.py:291, in GeometryArray.sindex(self)
    288 @property
    289 def sindex(self):
    290     if self._sindex is None:
--> 291         self._sindex = _get_sindex_class()(self.data)
    292     return self._sindex

File /usr/local/lib/python3.10/dist-packages/geopandas-0.10.2+79.g3abc6a7-py3.10.egg/geopandas/sindex.py:21, in _get_sindex_class()
     19 if compat.HAS_RTREE:
     20     return RTreeIndex
---> 21 raise ImportError(
     22     "Spatial indexes require either `rtree` or `pygeos`. "
     23     "See installation instructions at https://geopandas.org/install.html"
     24 )

ImportError: Spatial indexes require either `rtree` or `pygeos`. See installation instructions at https://geopandas.org/install.html

我们还可以与最近的邻居进行连接 sjoin_nearest

[10]:
pointdf.sjoin_nearest(polydf, how="left", distance_col="Distances")
# Note the optional Distances column with computed distances between each point
# and the nearest polydf geometry.
---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
Input In [10], in <cell line: 1>()
----> 1 pointdf.sjoin_nearest(polydf, how="left", distance_col="Distances")

File /usr/local/lib/python3.10/dist-packages/geopandas-0.10.2+79.g3abc6a7-py3.10.egg/geopandas/geodataframe.py:2095, in GeoDataFrame.sjoin_nearest(self, right, how, max_distance, lsuffix, rsuffix, distance_col)
   1985 def sjoin_nearest(
   1986     self,
   1987     right,
   (...)
   1992     distance_col=None,
   1993 ):
   1994     """
   1995     Spatial join of two GeoDataFrames based on the distance between their
   1996     geometries.
   (...)
   2093     dimension is not taken into account.
   2094     """
-> 2095     return geopandas.sjoin_nearest(
   2096         self,
   2097         right,
   2098         how=how,
   2099         max_distance=max_distance,
   2100         lsuffix=lsuffix,
   2101         rsuffix=rsuffix,
   2102         distance_col=distance_col,
   2103     )

File /usr/local/lib/python3.10/dist-packages/geopandas-0.10.2+79.g3abc6a7-py3.10.egg/geopandas/tools/sjoin.py:523, in sjoin_nearest(left_df, right_df, how, max_distance, lsuffix, rsuffix, distance_col)
    519 right_df.geometry.values.check_geographic_crs(stacklevel=1)
    521 return_distance = distance_col is not None
--> 523 join_df = _nearest_query(left_df, right_df, max_distance, how, return_distance)
    525 if return_distance:
    526     join_df = join_df.rename(columns={"distances": distance_col})

File /usr/local/lib/python3.10/dist-packages/geopandas-0.10.2+79.g3abc6a7-py3.10.egg/geopandas/tools/sjoin.py:364, in _nearest_query(left_df, right_df, max_distance, how, return_distance)
    356 def _nearest_query(
    357     left_df: GeoDataFrame,
    358     right_df: GeoDataFrame,
   (...)
    361     return_distance: bool,
    362 ):
    363     if not (compat.PYGEOS_GE_010 and compat.USE_PYGEOS):
--> 364         raise NotImplementedError(
    365             "Currently, only PyGEOS >= 0.10.0 supports `nearest_all`. "
    366             + compat.INSTALL_PYGEOS_ERROR
    367         )
    368     # use the opposite of the join direction for the index
    369     use_left_as_sindex = how == "right"

NotImplementedError: Currently, only PyGEOS >= 0.10.0 supports `nearest_all`. To use PyGEOS within GeoPandas, you need to install PyGEOS: 'conda install pygeos' or 'pip install pygeos'