A spatial join uses binary predicates such as intersects and crosses to combine two GeoDataFrames based on the spatial relationship between their geometries.




We currently support the following methods of spatial joins. Below, left_df and right_df refer to the two dataframes passed in as args.


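In a LEFT OUTER JOIN (how='left'), we keep all rows from the left and duplicate them if necessary to represent multiple hits between the two dataframes. We retain the attributes of the right rows that intersect and lose the right rows that don't. A left outer join implies that we are interested in retaining the geometries of the left.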


SELECT pts.geom, pts.id as ptid, polys.id as polyid
FROM pts
LEFT OUTER JOIN polys
ON ST_Intersects(pts.geom, polys.geom);

                    geom                    | ptid | polyid
 010100000040A9FBF2D88AD03F349CD47D796CE9BF |    4 |     10
 010100000048EABE3CB622D8BFA8FBF2D88AA0E9BF |    3 |     10
 010100000048EABE3CB622D8BFA8FBF2D88AA0E9BF |    3 |     20
 0101000000F0D88AA0E1A4EEBF7052F7E5B115E9BF |    2 |     20
 0101000000818693BA2F8FF7BF4ADD97C75604E9BF |    1 |
(5 rows)


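In a RIGHT OUTER JOIN (how='right'), we keep all rows from the right and duplicate them if necessary to represent multiple hits between the two dataframes. We retain the attributes of the left rows that intersect and lose the left rows that don't. A right outer join implies that we are interested in retaining the geometries of the right.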


SELECT polys.geom, pts.id as ptid, polys.id as polyid
FROM pts
RIGHT OUTER JOIN polys
ON ST_Intersects(pts.geom, polys.geom);

  geom    | ptid | polyid
 01...9BF |    4 |     10
 01...9BF |    3 |     10
 02...7BF |    3 |     20
 02...7BF |    2 |     20
 00...5BF |      |     30
(5 rows)


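In an INNER JOIN (how='inner'), we keep rows from the right and left only where their binary predicate is True. We duplicate them if necessary to represent multiple hits between the two dataframes. We retain the attributes of the right and left only if they intersect and lose all rows that do not. An inner join implies that we are interested in retaining the geometries of the left.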


SELECT pts.geom, pts.id as ptid, polys.id as polyid
FROM pts
INNER JOIN polys
ON ST_Intersects(pts.geom, polys.geom);

                    geom                    | ptid | polyid
 010100000040A9FBF2D88AD03F349CD47D796CE9BF |    4 |     10
 010100000048EABE3CB622D8BFA8FBF2D88AA0E9BF |    3 |     10
 010100000048EABE3CB622D8BFA8FBF2D88AA0E9BF |    3 |     20
 0101000000F0D88AA0E1A4EEBF7052F7E5B115E9BF |    2 |     20
(4 rows)


Let's take a look at how we'd do this using GeoPandas. First, load the NYC test data into GeoDataFrames:

%matplotlib inline
from shapely.geometry import Point
from geopandas import datasets, GeoDataFrame, read_file

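# NYC Boros
# (geopandas.datasets was removed in GeoPandas 1.0; on newer versions the
# separate geodatasets package provides get_path('nybb') instead)
zippath = datasets.get_path('nybb')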
polydf = read_file(zippath)

# Generate some points
b = [int(x) for x in polydf.total_bounds]
N = 8
pointdf = GeoDataFrame([
    {'geometry': Point(x, y), 'value1': x + y, 'value2': x - y}
    for x, y in zip(range(b[0], b[2], int((b[2] - b[0]) / N)),
                    range(b[1], b[3], int((b[3] - b[1]) / N)))])

# Make sure they're using the same projection reference
pointdf.crs = polydf.crs

pointdf
                         geometry   value1  value2
0   POINT (913175.000 120121.000)  1033296  793054
1   POINT (932450.000 139211.000)  1071661  793239
2   POINT (951725.000 158301.000)  1110026  793424
3   POINT (971000.000 177391.000)  1148391  793609
4   POINT (990275.000 196481.000)  1186756  793794
5  POINT (1009550.000 215571.000)  1225121  793979
6  POINT (1028825.000 234661.000)  1263486  794164
7  POINT (1048100.000 253751.000)  1301851  794349
8  POINT (1067375.000 272841.000)  1340216  794534

polydf
   BoroCode       BoroName     Shape_Leng    Shape_Area  geometry
0         5  Staten Island  330470.010332  1.623820e+09  MULTIPOLYGON (((970217.022 145643.332, 970227....
1         4         Queens  896344.047763  3.045213e+09  MULTIPOLYGON (((1029606.077 156073.814, 102957...
2         3       Brooklyn  741080.523166  1.937479e+09  MULTIPOLYGON (((1021176.479 151374.797, 102100...
3         1      Manhattan  359299.096471  6.364715e+08  MULTIPOLYGON (((981219.056 188655.316, 980940....
4         2          Bronx  464392.991824  1.186925e+09  MULTIPOLYGON (((1012821.806 229228.265, 101278...


join_left_df = pointdf.sjoin(polydf, how="left")
# Note the NaNs where the point did not intersect a boro
join_right_df = pointdf.sjoin(polydf, how="right")
# Note Staten Island is repeated
join_inner_df = pointdf.sjoin(polydf, how="inner")
# Note the lack of NaNs; dropped anything that didn't intersect
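
The three join types can also be checked on a toy dataset small enough to verify by hand. The sketch below is illustrative only: the names pts, polys, ptid and polyid are borrowed from the SQL examples, and the coordinates are made up so that one point falls in two polygons, one point falls in none, and one polygon contains no points. It assumes a GeoPandas version that provides the GeoDataFrame.sjoin method used above.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Hypothetical toy data mirroring the shape of the SQL examples.
pts = gpd.GeoDataFrame(
    {"ptid": [1, 2, 3, 4]},
    geometry=[Point(5, 5), Point(2.5, 1), Point(1.5, 1), Point(0.5, 1)],
)
polys = gpd.GeoDataFrame(
    {"polyid": [10, 20, 30]},
    geometry=[box(0, 0, 2, 2), box(1, 0, 3, 2), box(10, 10, 11, 11)],
)

# Point 3 lies in the overlap of polygons 10 and 20, so it is duplicated.
left = pts.sjoin(polys, how="left")    # keeps point 1 with a missing polyid
right = pts.sjoin(polys, how="right")  # keeps polygon 30 with a missing ptid
inner = pts.sjoin(polys, how="inner")  # only the matching pairs

print(len(left), len(right), len(inner))
print(left[["ptid", "polyid"]])
print(right[["ptid", "polyid"]])
```

The left and inner results carry point geometries, while the right result carries polygon geometries, matching the descriptions of each join type.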
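We're not limited to using the intersects binary predicate. Any of the Shapely geometry methods that return a Boolean can be used by specifying the predicate kwarg.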

pointdf.sjoin(polydf, how="left", predicate="within")
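
The choice of predicate matters most for geometries that touch a boundary. As a minimal sketch, assuming made-up data (the names square, points, poly_name and label are hypothetical), a point lying exactly on a polygon's edge intersects the polygon but is not within it:

```python
import geopandas as gpd
from shapely.geometry import Point, box

# One square and two points: one strictly inside, one on the left edge.
square = gpd.GeoDataFrame({"poly_name": ["square"]}, geometry=[box(0, 0, 2, 2)])
points = gpd.GeoDataFrame(
    {"label": ["interior", "boundary"]},
    geometry=[Point(1, 1), Point(0, 1)],
)

# "intersects" matches both points; "within" drops the one on the edge.
hits_intersects = points.sjoin(square, how="inner", predicate="intersects")
hits_within = points.sjoin(square, how="inner", predicate="within")

print(hits_intersects["label"].tolist())
print(hits_within["label"].tolist())
```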
We can also conduct a nearest neighbour join with sjoin_nearest.

pointdf.sjoin_nearest(polydf, how="left", distance_col="Distances")
# Note the optional Distances column with computed distances between each point
# and the nearest polydf geometry.
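
To make the distance column concrete, here is a small sketch on hypothetical data where the expected distances are easy to work out by hand. The names sites, zones, site and zone are assumptions for illustration, and no CRS is set, so distances are in the plain coordinate units of the toy geometries.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Two points on the x-axis and two boxes to the right of them.
sites = gpd.GeoDataFrame(
    {"site": ["s0", "s1"]},
    geometry=[Point(0, 0), Point(10, 0)],
)
zones = gpd.GeoDataFrame(
    {"zone": ["A", "B"]},
    geometry=[box(1, -1, 2, 1), box(12, -1, 13, 1)],
)

# Each site is paired with its closest zone; the gap is stored in "Distances".
nearest = sites.sjoin_nearest(zones, how="left", distance_col="Distances")
nearest = nearest.sort_index()

print(nearest[["site", "zone", "Distances"]])
```

Site s0 is one unit from the near edge of zone A, and site s1 is two units from zone B, so those are the values that should appear in the Distances column.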
