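Elasticsearch: Geographically Encoded Objects for Elasticsearch
Driver short name
Elasticsearch
Build dependencies
libcurl
Elasticsearch is an enterprise-level search engine for a variety of data sources. It supports full-text indexing and geospatial querying of that data in a fast and efficient manner through a predefined REST API.
Driver capabilities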
Supports Create()
This driver supports the GDALDriver::Create() operation
Supports Georeferencing
This driver supports georeferencing
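Opening dataset name syntax
Starting with GDAL 2.1, the driver supports reading existing indices from an Elasticsearch host. There are two main possible syntaxes to open a dataset:
Using ES:http://hostname:port (where port is typically 9200)
Using ES: with the open options to specify HOST and PORT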
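The two syntaxes can be illustrated with a small helper that builds the dataset name. This is only a sketch: the function name `es_dataset_name` and its parameters are hypothetical and are not part of GDAL.

```python
def es_dataset_name(host=None, port=9200, scheme="http"):
    """Build a dataset name for the Elasticsearch driver.

    With no host, return the bare "ES:" prefix; the host and port are then
    expected to come from the HOST and PORT open options (or their defaults).
    """
    if host is None:
        return "ES:"
    return f"ES:{scheme}://{host}:{port}"


print(es_dataset_name())               # second syntax: rely on open options
print(es_dataset_name("example.com"))  # first syntax: full URL in the name
```

The resulting string is what would be passed as the dataset name to a tool such as ogrinfo.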
Layer open options
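HOST=hostname: Server hostname. Defaults to localhost.
PORT=port: Server port. Defaults to 9200.
USERPWD=user:password: (GDAL >= 2.4) Basic authentication credentials, given as username:password.
LAYER=name: (GDAL >= 2.4) Index name or index_mapping name used to restrict the layer list.
BATCH_SIZE=number: Number of features to retrieve per batch. Defaults to 100.
FEATURE_COUNT_TO_ESTABLISH_FEATURE_DEFN=number: Number of features to retrieve to establish the feature definition. -1 = unlimited. Defaults to 100.
SINGLE_QUERY_TIMEOUT=number: (GDAL >= 3.2.1) Timeout in seconds for requests such as GetFeatureCount() or GetExtent(). Defaults to unlimited.
SINGLE_QUERY_TERMINATE_AFTER=number: (GDAL >= 3.2.1) Maximum number of documents to collect for requests such as GetFeatureCount() or GetExtent(). Defaults to unlimited.
FEATURE_ITERATION_TIMEOUT=number: (GDAL >= 3.2.1) Timeout in seconds for feature iteration, starting from the time of ResetReading(). Defaults to unlimited.
FEATURE_ITERATION_TERMINATE_AFTER=number: (GDAL >= 3.2.1) Maximum number of documents to collect for feature iteration. Defaults to unlimited.
JSON_FIELD=YES/NO: Whether to include a field called "_json" containing the full document as JSON. Defaults to NO.
FLATTEN_NESTED_ATTRIBUTE=YES/NO: Whether to recursively explore nested objects and produce flattened OGR attributes. Defaults to YES.
FID=string: Name of the field, with integer values, to use as FID. Defaults to "ogc_fid".
FORWARD_HTTP_HEADERS_FROM_ENV=string: (GDAL >= 3.1) Can be used to specify HTTP headers that must be passed to Elasticsearch (typically for authentication purposes). The value is a comma-separated list of http_header_name=env_variable_name pairs, where http_header_name is the name of an HTTP header and env_variable_name is the name of the environment variable / configuration option from which the value of the HTTP header should be retrieved. This is intended for use cases where the OGR Elasticsearch driver is invoked from a web server that stores the HTTP headers of incoming requests into environment variables. The ES_FORWARD_HTTP_HEADERS_FROM_ENV configuration option can also be used.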
AGGREGATION= string (GDAL >= 3.5). JSON-serialized definition of an aggregation.
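As an illustration of how these open options might be passed on the command line, the sketch below turns a dictionary into `-oo KEY=VALUE` arguments and formats a FORWARD_HTTP_HEADERS_FROM_ENV value. The helper names and the `HTTP_AUTHORIZATION` environment variable name are assumptions made for the example; only the option names come from the list above.

```python
def open_option_args(options):
    """Turn a dict of open options into '-oo KEY=VALUE' arguments."""
    args = []
    for key, value in options.items():
        if isinstance(value, bool):
            # YES/NO options are written as YES or NO
            value = "YES" if value else "NO"
        args += ["-oo", f"{key}={value}"]
    return args


def forward_headers_value(header_to_env):
    """Format the comma-separated http_header_name=env_variable_name list."""
    return ",".join(f"{header}={env}" for header, env in header_to_env.items())


options = {
    "HOST": "localhost",
    "PORT": 9200,
    "JSON_FIELD": True,
    # hypothetical environment variable name, for illustration only
    "FORWARD_HTTP_HEADERS_FROM_ENV": forward_headers_value(
        {"Authorization": "HTTP_AUTHORIZATION"}
    ),
}
print(["ogrinfo", "-ro", "ES:"] + open_option_args(options))
```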
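Elasticsearch vs OGR concepts
Each mapping type inside an Elasticsearch index is considered as an OGR layer. An Elasticsearch document is considered as an OGR feature.
Field definitions
Fields are dynamically mapped from the input OGR data source. However, the driver will take advantage of advanced options within Elasticsearch as defined in a field mapping file.
The mapping file allows you to modify the mapping according to the Elasticsearch field-specific types. There are many options to choose from; most of the functionality is based on the different things you are able to do with text fields within Elasticsearch.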
ogr2ogr -progress --config ES_WRITEMAP /path/to/file/map.txt -f "Elasticsearch" http://localhost:9200 my_shapefile.shp
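Geometry types
In GDAL 2.0 and earlier, the driver was limited in the geometry it handled: even if polygons were provided as input, they were stored as geo_point, and the "center" of the polygon was used as the value of the point. Starting with GDAL 2.1, geo_shape is used to store all geometry types (except curve geometries, which are not handled by Elasticsearch and are approximated by their linear equivalents).
Filtering
The driver will forward any spatial filter set with SetSpatialFilter() to the server.
Starting with GDAL 2.2, SQL attribute filters set with SetAttributeFilter() are converted to Elasticsearch filter syntax. They will be combined with the potentially defined spatial filter.
It is also possible to use an Elasticsearch filter directly by setting the string passed to SetAttributeFilter() to a JSON serialized object, e.g.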
{ "post_filter": { "term": { "properties.EAS_ID": 169 } } }
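Note: if an Elasticsearch JSON filter is defined directly, the spatial filter specified through SetSpatialFilter() will be ignored, and must thus be included in the JSON filter if needed.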
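Since the spatial filter has to be folded into the JSON filter by hand in that case, a sketch of one way to build such a string is given here. It assumes a geo_point geometry field and uses the Elasticsearch `geo_bounding_box` query; the helper name `term_filter` and the default field path `geometry.coordinates` are illustrative assumptions, not something the driver prescribes.

```python
import json


def term_filter(field, value, bbox=None, geom_field="geometry.coordinates"):
    """Serialize a post_filter with a term clause and an optional bounding box.

    bbox is (min_lon, min_lat, max_lon, max_lat) in WGS84 degrees.
    """
    clauses = [{"term": {field: value}}]
    if bbox is not None:
        min_lon, min_lat, max_lon, max_lat = bbox
        clauses.append({
            "geo_bounding_box": {
                geom_field: {
                    "top_left": {"lat": max_lat, "lon": min_lon},
                    "bottom_right": {"lat": min_lat, "lon": max_lon},
                }
            }
        })
    # A single clause is used as is; several clauses are AND-ed with bool/must
    query = clauses[0] if len(clauses) == 1 else {"bool": {"must": clauses}}
    return json.dumps({"post_filter": query})


# String that could be passed to SetAttributeFilter()
print(term_filter("properties.EAS_ID", 169))
print(term_filter("properties.EAS_ID", 169, bbox=(2.0, 48.0, 3.0, 49.0)))
```

The exact query needed depends on the Elasticsearch version and on the mapping type of the geometry field, so the bounding box part should be checked against the target server.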
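Paging
Features are retrieved from the server in chunks of 100. This can be altered with the BATCH_SIZE open option.
Schema
When reading an Elasticsearch index/type, OGR must establish the schema of attribute and geometry fields, since OGR has a fixed schema concept.
In the general case, OGR will read the mapping definition and the first 100 documents of the index/type (this number can be altered with the FEATURE_COUNT_TO_ESTABLISH_FEATURE_DEFN open option), and build the schema that best fits the fields and values found.
It is also possible to set the JSON_FIELD=YES open option so that a _json special field is added to the OGR schema. When reading Elasticsearch documents as OGR features, the full JSON version of the document is stored in the _json field. This can be useful for complex documents or for data types that do not translate well into OGR data types. On creation/update of documents, if the _json field is present and set, its content is used directly (other fields are ignored).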
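The idea of building a fixed schema from a sample of documents, with nested objects flattened into attributes, can be sketched in plain Python. This is a simplified illustration and not the driver's algorithm: the "." separator, the use of Python type names, and the "first type seen wins" rule are assumptions, and the real driver also consults the mapping definition.

```python
def flatten(doc, prefix="", sep="."):
    """Recursively flatten nested dicts into a single-level dict."""
    flat = {}
    for key, value in doc.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


def infer_schema(docs, limit=100):
    """Map each flattened field name to the type name first seen for it.

    Only the first `limit` documents are inspected; a negative limit means
    all documents, mirroring the "-1 = unlimited" convention above.
    """
    sample = docs if limit < 0 else docs[:limit]
    schema = {}
    for doc in sample:
        for name, value in flatten(doc).items():
            schema.setdefault(name, type(value).__name__)
    return schema


docs = [
    {"properties": {"EAS_ID": 168, "name": "a"}},
    {"properties": {"EAS_ID": 169, "area": 12.5}},
]
print(infer_schema(docs))
```

A field that only appears after the sampled documents would be missing from such a schema, which is why the sample size is configurable.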
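Feature ID
Elasticsearch has a special _id field that contains the unique ID of the document. This field is returned as an OGR field, but cannot be used as the OGR special FeatureID field, which must be of integer type. By default, OGR will try to read a potential "ogc_fid" field to set the OGR FeatureID. The name of the field to look up can be set with the FID open option. If the field is not found, the FID returned by OGR will be a sequential number starting at 1, but it is not guaranteed to be stable at all.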
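The fallback rule described above can be mimicked with a few lines of Python. This is only a sketch of the behaviour as described, not the driver's code; in particular, handling the fallback per document (rather than per layer) is a simplifying assumption.

```python
def assign_fids(docs, fid_field="ogc_fid"):
    """Return one FID per document.

    Use the integer value of `fid_field` when present; otherwise fall back
    to a sequence number starting at 1 (which is not stable across reads).
    """
    fids = []
    for seq, doc in enumerate(docs, start=1):
        value = doc.get(fid_field)
        if isinstance(value, int) and not isinstance(value, bool):
            fids.append(value)
        else:
            fids.append(seq)
    return fids


print(assign_fids([{"ogc_fid": 10, "_id": "abc"}, {"_id": "def"}]))
```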
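ExecuteSQL() interface
Starting with GDAL 2.2, SQL requests involving a single layer, with WHERE and ORDER BY statements, are translated into Elasticsearch queries.
Otherwise, if "ES" is specified as the dialect of ExecuteSQL(), a JSON string with a serialized Elasticsearch filter can be passed. The search will be done on all indices and types, unless the filter itself restricts the search. The returned layer will be a union of the types returned by the first FEATURE_COUNT_TO_ESTABLISH_FEATURE_DEFN documents. It will also contain the _index and _type special fields to indicate the provenance of the features.
The following filter can be used to restrict the search to the "poly" index and its "FeatureCollection" type mapping (Elasticsearch 1.X and 2.X):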
{ "filter": {
"indices" : {
"no_match_filter": "none",
"index": "poly",
"filter": {
"and" : [
{ "type": { "value": "FeatureCollection" } },
{ "term" : { "properties.EAS_ID" : 158.0 } }
]
}
}
}
}
For Elasticsearch 5.X (also works with 2.X):
{ "post_filter": {
"indices" : {
"no_match_query": "none",
"index": "poly",
"query": {
"bool": {
"must" : [
{ "type": { "value": "FeatureCollection" } },
{ "term" : { "properties.EAS_ID" : 158.0 } }
]
}
}
}
}
}
Aggregations are not supported through the ExecuteSQL() interface, but through the below described mechanism.
Aggregations
New in version 3.5.0.
The driver can issue aggregation requests to an index. Elasticsearch aggregations can potentially be rather complex, so the driver currently limits itself to geohash grid based spatial aggregations, with additional fields holding statistical indicators (min, max, average, etc.), which can be used for example to generate heatmaps. The aggregation is specified through the AGGREGATION open option, whose value is a JSON serialized object with the following members:
index (required): the name of the index to query.
geometry_field (optional): the path to the geometry field on which to do geohash grid aggregation. For documents with points encoded as GeoJSON, this will be for example geometry.coordinates. When this property is not specified, the driver will analyze the mapping and use the geometry field definition found in it (provided there is a single one). Note that aggregation on geo_shape geometries is only supported since Elasticsearch 7 and may require a non-free license.
geohash_grid (optional): a JSON object describing a few characteristics of the geohash grid, which can have the following members:
  size (optional): maximum number of geohash buckets to return per query. The default is 10,000. If precision is specified and the number of results would exceed size, then the server will trim the results, by sorting by decreasing number of documents matched.
  precision (optional): string length of the geohashes used to define cells/buckets in the results, in the [1,12] range. A geohash of size 1 can generate up to 32 buckets, of size 2 up to 32*32 buckets, etc. When it is not specified, the driver will automatically compute a value, taking into account the size parameter and the spatial filter, so that the theoretical number of buckets returned does not exceed size.
fields (optional): a JSON object describing which additional statistical fields should be added, which can have the following members:
  min (optional): array with the paths to index properties on which to compute the minimum during aggregation.
  max (optional): array with the paths to index properties on which to compute the maximum during aggregation.
  avg (optional): array with the paths to index properties on which to compute the average during aggregation.
  sum (optional): array with the paths to index properties on which to compute the sum during aggregation.
  count (optional): array with the paths to index properties on which to compute the value_count during aggregation.
  stats (optional): array with the paths to index properties on which to compute all the above indicators during aggregation.
When using a GeoJSON mapping, the path to an index property is typically property.some_name.
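The relation between the geohash_grid precision and size members can be made concrete with a little arithmetic: a precision of p gives at most 32^p buckets. The sketch below picks the largest precision whose theoretical bucket count stays within size. It ignores the spatial filter, which the driver also takes into account, so it is an illustration of the principle rather than the driver's actual computation; the function names are hypothetical.

```python
def max_buckets(precision):
    """Theoretical maximum number of geohash buckets for a given precision."""
    if not 1 <= precision <= 12:
        raise ValueError("precision must be in the [1,12] range")
    return 32 ** precision


def precision_for_size(size=10000):
    """Largest precision with max_buckets(precision) <= size (at least 1)."""
    best = 1
    for precision in range(1, 13):
        if max_buckets(precision) <= size:
            best = precision
    return best


print(max_buckets(1), max_buckets(2), max_buckets(3))
print(precision_for_size(10000))
```

With the default size of 10,000, a precision of 2 (1,024 buckets) fits while a precision of 3 (32,768 buckets) does not, when the whole world is considered.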
When specifying the AGGREGATION open option, a single read-only layer called aggregation will be returned. A spatial filter can be set on it using the standard OGR SetSpatialFilter() API: it is applied prior to aggregation.
An example of a potential value for the AGGREGATION open option can be:
{
"index": "my_points",
"geometry_field": "geometry.coordinates",
"geohash_grid": {
"size": 1000,
"precision": 3
},
"fields": {
"min": [ "field_a", "field_b"],
"stats": [ "field_c" ]
}
}
It will return a layer with a Point geometry field and the following fields:
key of type String: the value of the geohash of the corresponding bucket
doc_count of type Integer64: the number of matching documents in the bucket
field_a_min of type Real
field_b_min of type Real
field_c_min of type Real
field_c_max of type Real
field_c_avg of type Real
field_c_sum of type Real
field_c_count of type Integer64
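The naming pattern of the resulting fields can be reproduced with a short function that expands the "fields" member of the AGGREGATION value. The field names and types for the example above follow the list just given; the ordering for other combinations, and the behaviour for dotted paths such as property.some_name, are assumptions of this sketch.

```python
INDICATORS = ["min", "max", "avg", "sum", "count"]


def field_type(indicator):
    """count yields Integer64; the other indicators yield Real."""
    return "Integer64" if indicator == "count" else "Real"


def aggregation_layer_fields(fields_spec):
    """List (name, type) pairs expected on the aggregation layer."""
    result = [("key", "String"), ("doc_count", "Integer64")]
    for indicator in INDICATORS:
        for path in fields_spec.get(indicator, []):
            result.append((f"{path}_{indicator}", field_type(indicator)))
    # "stats" expands to all indicators for each listed property
    for path in fields_spec.get("stats", []):
        for indicator in INDICATORS:
            result.append((f"{path}_{indicator}", field_type(indicator)))
    return result


spec = {"min": ["field_a", "field_b"], "stats": ["field_c"]}
for name, type_name in aggregation_layer_fields(spec):
    print(name, type_name)
```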
Multi-target layers
New in version 3.5.0.
The GetLayerByName() method accepts a layer name that can be a comma-separated list of indices, potentially combined with the '*' wildcard character. See https://www.elastic.co/guide/en/elasticsearch/reference/current/multi-index.html. Note that in the current implementation, the field definition is established from one of the matching layers, not from all of them, so this functionality is appropriate when the multiple matching layers share the same schema.
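To give an idea of which indices a multi-target name would select, the following sketch matches a comma-separated list of patterns against known index names. The actual resolution is performed by the Elasticsearch server; this local approximation uses Python's `fnmatch`, which also gives special meaning to `?` and `[]`, so it is only faithful for names using `*` alone. The index names are made up for the example.

```python
from fnmatch import fnmatchcase


def match_indices(layer_name, available):
    """Return the available index names matched by a multi-target name."""
    patterns = [p.strip() for p in layer_name.split(",") if p.strip()]
    return [
        name for name in available
        if any(fnmatchcase(name, pattern) for pattern in patterns)
    ]


indices = ["poly_2020", "poly_2021", "roads", "rivers"]
print(match_indices("poly_*,roads", indices))
```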
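Getting metadata
Getting the feature count is efficient.
Getting the extent is efficient only on geometry columns mapped to the Elasticsearch type geo_point. On geo_shape fields, the features of the whole layer are retrieved, which might be slow.
Write support
Index/type creation and deletion is possible.
Write support is only enabled when the datasource is opened in update mode.
When inserting a new feature with CreateFeature() in non-bulk mode, and if the command is successful, OGR will fetch the returned _id and use it for the SetFeature() operation.
Spatial reference system
Geometries stored in Elasticsearch are supposed to be referenced as longitude/latitude over the WGS84 datum (EPSG:4326). On creation, the driver will automatically reproject from the layer (or geometry field) SRS to EPSG:4326, provided that the input SRS is set and that it is not already EPSG:4326.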
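Layer creation options
Starting with GDAL 2.1, the driver supports the following layer creation options:
INDEX_NAME=name: Name of the index to create (or reuse). By default, the index name is the layer name.
INDEX_DEFINITION=filename or JSON: (GDAL >= 2.4) Filename from which to read a user-defined index definition, or inlined index definition as serialized JSON.
MAPPING_NAME=name: (Elasticsearch < 7) Name of the mapping type within the index. By default, the mapping name is "FeatureCollection" and the documents will be written as GeoJSON Feature objects. If another mapping name is chosen, a more "flat" structure will be used. This option is ignored when converting to Elasticsearch >= 7 (see Removal of mapping types). With Elasticsearch 7 or later, a "flat" structure is always used.
MAPPING=filename or JSON: Filename from which to read a user-defined mapping, or mapping as serialized JSON.
WRITE_MAPPING=filename: Creates a mapping file that can be modified by the user prior to insertion into the index. No feature will be written. This option is exclusive with MAPPING.
OVERWRITE=YES/NO: Whether to overwrite an existing type mapping with the layer name to be created. Defaults to NO.
OVERWRITE_INDEX=YES/NO: (GDAL >= 2.2) Whether to overwrite the whole index to which the layer belongs. Defaults to NO. This option is stronger than OVERWRITE. OVERWRITE will only proceed if the type mapping corresponding to the layer is the single type mapping of the index. If there are several type mappings, the whole index needs to be destroyed (it is unsafe to destroy a mapping and the documents that use it, since they might be used by other mappings; this was possible in Elasticsearch 1.X, but no longer in later versions).
GEOMETRY_NAME=name: Name of the geometry column. Defaults to "geometry".
GEOM_MAPPING_TYPE=AUTO/GEO_POINT/GEO_SHAPE: Mapping type for geometry fields. Defaults to AUTO. GEO_POINT uses the geo_point mapping type. If used, the "centroid" of the geometry is used. This is the behavior of GDAL < 2.1. GEO_SHAPE uses the geo_shape mapping type, compatible with all geometry types. When using AUTO, geo_point is used for geometry fields of type Point, and geo_shape in other cases.
GEO_SHAPE_ENCODING=GeoJSON/WKT: (GDAL >= 3.2.1) Encoding for geo_shape geometry fields. Defaults to GeoJSON. WKT is possible since Elasticsearch 6.2.
GEOM_PRECISION={value}{unit}: Desired geometry precision. Number followed by unit. For example 1m. For a geo_point geometry field, this causes a compressed geometry format to be used. This option is without effect if MAPPING is specified.
STORE_FIELDS=YES/NO: Whether fields should be stored in the index. Setting to YES sets the "store" property of the field mapping to "true" for all fields. Defaults to NO. (Note: prior to GDAL 2.1, the default behavior was to store fields.) This option is without effect if MAPPING is specified.
STORED_FIELDS=List of comma separated field names that should be stored in the index. Those fields will have their "store" property of the field mapping set to "true". If all fields must be stored, then using STORE_FIELDS=YES is a shortcut. This option is without effect if MAPPING is specified.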
NOT_ANALYZED_FIELDS=List of comma separated field names that should not be analyzed during indexing. Those fields will have their "index" property of the field mapping set to "not_analyzed" (the default in Elasticsearch is "analyzed"). A same field should not be specified both in NOT_ANALYZED_FIELDS and NOT_INDEXED_FIELDS. Starting with GDAL 2.2, the {ALL} value can be used to designate all fields. This option is without effect if MAPPING is specified.
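NOT_INDEXED_FIELDS=List of comma separated field names that should not be indexed. Those fields will have their "index" property of the field mapping set to "no" (the default in Elasticsearch is "analyzed"). A same field should not be specified both in NOT_ANALYZED_FIELDS and NOT_INDEXED_FIELDS. This option is without effect if MAPPING is specified.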
FIELDS_WITH_RAW_VALUE=(GDAL > 2.2) List of comma separated field names (of type string) that should be created with an additional raw/not_analyzed sub-field, or {ALL} to designate all string analyzed fields. This is needed for sorting on those columns, and can improve performance when filtering with SQL operators. This option is without effect if MAPPING is specified.
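BULK_INSERT=YES/NO: Whether to use bulk insert for feature creation. Defaults to YES.
BULK_SIZE=value: Size in bytes of the buffer for bulk upload. Defaults to 1000000 (1 million).
FID=string: Name of the field, with integer values, to use as FID. Can be set to empty to disable the writing of the FID value. Defaults to "ogc_fid".
DOT_AS_NESTED_FIELD=YES/NO: Whether to consider a dot character in a field name as denoting a sub-document. Defaults to YES.
IGNORE_SOURCE_ID=YES/NO: Whether to ignore the _id field in features passed to CreateFeature(). Defaults to NO.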
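Layer creation options are passed to ogr2ogr with `-lco NAME=VALUE`. The sketch below assembles such a command line from a dictionary; the helper name, the index name `my_index` and the chosen option values are illustrative assumptions, and the command is only printed, not executed.

```python
import shlex


def ogr2ogr_to_es(source, es_url="http://localhost:9200", layer_options=None):
    """Build an ogr2ogr argument list writing `source` to Elasticsearch."""
    cmd = ["ogr2ogr", "-f", "Elasticsearch"]
    for key, value in (layer_options or {}).items():
        cmd += ["-lco", f"{key}={value}"]
    cmd += [es_url, source]
    return cmd


cmd = ogr2ogr_to_es(
    "my_shapefile.shp",
    layer_options={
        "INDEX_NAME": "my_index",        # hypothetical index name
        "GEOM_MAPPING_TYPE": "GEO_SHAPE",
        "BULK_SIZE": 5000000,
    },
)
# shlex.join quotes arguments for a POSIX shell (Python 3.8+)
print(shlex.join(cmd))
```

Keeping the command as a list also makes it suitable for `subprocess.run(cmd)` without any shell quoting concerns.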
Configuration options
The following (deprecated) configuration options are available. Starting with GDAL 2.1, layer creation options are also available and should be preferred (see above):
ES_WRITEMAP=/path/to/mapfile.txt. Creates a mapping file that can be modified by the user prior to insertion into the index. No feature will be written. Note that this will work properly only if a single layer is created. Starting with GDAL 2.1, the WRITE_MAPPING layer creation option should rather be used.
ES_META=/path/to/mapfile.txt. Tells the driver to use the user-defined field mappings. Starting with GDAL 2.1, the MAPPING layer creation option should rather be used.
ES_BULK=5000000. Identifies the maximum size in bytes of the buffer to store documents to be inserted at a time. Lower record counts help with memory consumption within Elasticsearch but take longer to insert. Starting with GDAL 2.1, the BULK_SIZE layer creation option should rather be used.
ES_OVERWRITE=1. Overwrites the current index by deleting an existing one. Starting with GDAL 2.1, the OVERWRITE layer creation option should rather be used.
Examples
Open the local store:
ogrinfo ES:
Open a remote store:
ogrinfo ES:http://example.com:9200
Filtering on an Elasticsearch field:
ogrinfo -ro ES: my_type -where '{ "post_filter": { "term": { "properties.EAS_ID": 168 } } }'
Using "match" query on Windows: On Windows the query must be between double quotes and double quotes inside the query must be escaped.
C:\GDAL_on_Windows>ogrinfo ES: my_type -where "{\"query\": { \"match\": { \"properties.NAME\": \"Helsinki\" } } }"
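The Windows quoting rule can be automated when the query is generated by a script. The sketch below serializes a query and escapes it for use between double quotes; the function name is hypothetical, and it assumes the query values contain no backslashes or other characters that are special to the Windows command interpreter.

```python
import json


def windows_where_arg(query):
    """Serialize a query dict and escape it for a double-quoted cmd argument."""
    text = json.dumps(query)
    # Escape the inner double quotes, then wrap the whole string in quotes
    return '"' + text.replace('"', '\\"') + '"'


query = {"query": {"match": {"properties.NAME": "Helsinki"}}}
print("ogrinfo ES: my_type -where " + windows_where_arg(query))
```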
Basic aggregation:
ogrinfo -ro ES: my_type -oo "AGGREGATION={\"index\":\"my_points\"}"
Load an Elasticsearch index with a shapefile:
ogr2ogr -f "Elasticsearch" http://localhost:9200 my_shapefile.shp
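Create a mapping file: The mapping file allows you to modify the mapping according to the Elasticsearch field-specific types. There are many options to choose from; most of the functionality is based on the different things you are able to do with text fields.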
ogr2ogr -progress --config ES_WRITEMAP /path/to/file/map.txt -f "Elasticsearch" http://localhost:9200 my_shapefile.shp
or (GDAL >= 2.1):
ogr2ogr -progress -lco WRITE_MAPPING=/path/to/file/map.txt -f "Elasticsearch" http://localhost:9200 my_shapefile.shp
Read the mapping file: Reads the mapping file during the transformation.
ogr2ogr -progress --config ES_META /path/to/file/map.txt -f "Elasticsearch" http://localhost:9200 my_shapefile.shp
or (GDAL >= 2.1):
ogr2ogr -progress -lco MAPPING=/path/to/file/map.txt -f "Elasticsearch" http://localhost:9200 my_shapefile.shp
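Bulk uploading (for larger datasets): Bulk loading helps when uploading a lot of data. The integer value is the number of bytes that are collected before being inserted. See also: Bulk size considerations.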
ogr2ogr -progress --config ES_BULK 5000000 -f "Elasticsearch" http://localhost:9200 PG:"host=localhost user=postgres dbname=my_db password=password" "my_table" -nln thetable
or (GDAL >= 2.1):
ogr2ogr -progress -lco BULK_SIZE=5000000 -f "Elasticsearch" http://localhost:9200 my_shapefile.shp
Overwrite the current index: If specified, this will overwrite the current index. Otherwise, the data will be appended.
ogr2ogr -progress --config ES_OVERWRITE 1 -f "Elasticsearch" http://localhost:9200 PG:"host=localhost user=postgres dbname=my_db password=password" "my_table" -nln thetable
or (GDAL >= 2.1):
ogr2ogr -progress -overwrite ES:http://localhost:9200 PG:"host=localhost user=postgres dbname=my_db password=password" "my_table" -nln thetable