15.7. GeoMesa Jobs

This project (geomesa-accumulo/geomesa-accumulo-jobs in the source distribution) contains map/reduce jobs for maintaining GeoMesa Accumulo.

15.7.1. Building Instructions

If you wish to build geomesa-accumulo-jobs separately, you can use Maven:

$ mvn clean install -pl geomesa-accumulo/geomesa-accumulo-jobs -am

15.7.2. GeoMesa Input and Output Formats

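GeoMesa provides input and output formats that can be used in Hadoop map/reduce jobs. The input/output formats can be used directly in Scala, or through the classes in the interop package.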

Sample jobs are provided that can be used as templates for more complex operations. These are:

org.locationtech.geomesa.accumulo.jobs.mapreduce.interop.FeatureCountJob
org.locationtech.geomesa.accumulo.jobs.mapreduce.interop.FeatureWriterJob

15.7.2.1. GeoMesaAccumuloInputFormat

The GeoMesaAccumuloInputFormat can be used to read SimpleFeatures from GeoMesa directly into your job.

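Use the static configure method to set up your job. You need to provide it with a map of connection parameters, which will be used to retrieve the GeoTools DataStore. You also need to provide a feature type name. Optionally, you can provide a CQL filter, which will be used to select a subset of the features in your store.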

The key provided to your mapper will be a Text containing the SimpleFeature ID. The value will be the SimpleFeature.

15.7.2.2. GeoMesaOutputFormat

The GeoMesaOutputFormat can be used to write SimpleFeatures back into GeoMesa.

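Use the static setOutput method to set up your job. You need to provide it with a map of connection parameters, which will be used to retrieve the GeoTools DataStore.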

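The key you output does not matter and will be ignored. The value should be the SimpleFeature that you wish to write. If the SimpleFeatureType associated with a SimpleFeature does not yet exist in GeoMesa, it will be created for you. You may write several different SimpleFeatureTypes in a single job, if desired.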

15.7.3. Map/Reduce Jobs

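The following instructions require that you use the -libjars argument to ensure that the correct JARs are available on the distributed classpath.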

Note

In the following examples, replace ${VERSION} with the appropriate Scala plus GeoMesa versions (e.g. 2.12-4.0.2).
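As a minimal sketch of the substitution described in the note, the JAR path can be assembled from shell variables before invoking a job. The variable names SCALA_VERSION, GEOMESA_VERSION and JOBS_JAR are illustrative assumptions, not part of GeoMesa, and the version numbers are simply the ones from the example above; adjust them to match your build.

```shell
# Assumed values: set these to the Scala and GeoMesa versions you built against.
SCALA_VERSION="2.12"
GEOMESA_VERSION="4.0.2"

# ${VERSION} is the Scala version and the GeoMesa version joined by a hyphen.
VERSION="${SCALA_VERSION}-${GEOMESA_VERSION}"

# Path of the jobs JAR, relative to the geomesa-accumulo directory.
JOBS_JAR="geomesa-accumulo-jobs/target/geomesa-accumulo-jobs_${VERSION}.jar"

echo "${JOBS_JAR}"
```

With these variables set, the commands in the following sections could use "${JOBS_JAR}" in place of the literal JAR path.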

15.7.3.1. Attribute Indexing

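GeoMesa provides indexing on attributes to improve certain queries. You can indicate which attributes should be indexed when you create your schema (simple feature type). If you later decide that you would like to index additional attributes, you can use the attribute indexing job. You only need to run this job once; the job will create attribute indices for each attribute listed in --geomesa.index.attributes.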

The job can be invoked through Yarn as follows:

geomesa-accumulo$ yarn jar geomesa-accumulo-jobs/target/geomesa-accumulo-jobs_${VERSION}.jar \
    org.locationtech.geomesa.accumulo.jobs.index.AttributeIndexJob \
    --geomesa.input.instanceId <instance> \
    --geomesa.input.zookeepers <zookeepers> \
    --geomesa.input.user <user> \
    --geomesa.input.password <pwd> \
    --geomesa.input.tableName <catalog-table> \
    --geomesa.input.feature <feature> \
    --geomesa.index.attributes <attributes to index - space separated> \
    --geomesa.index.coverage <full|join> # optional attribute

Note

You will also need to include an extensive -libjars argument with all dependent JARs.
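Hadoop's -libjars option takes a comma-separated list of JAR paths, so the list is usually generated rather than typed by hand. The following is a sketch of one way to build it, not a GeoMesa-provided script: join_jars is a hypothetical helper, and LIB_DIR is an assumed directory into which you have collected the dependent JARs.

```shell
# join_jars DIR: print every *.jar file in DIR as one comma-separated list.
join_jars() {
  local dir="$1" list="" jar
  for jar in "${dir}"/*.jar; do
    # When no JAR matches, the glob stays literal; skip that non-existent entry.
    [ -e "${jar}" ] || continue
    # Prepend a comma only when the list already has an entry.
    list="${list:+${list},}${jar}"
  done
  printf '%s\n' "${list}"
}

# LIB_DIR is a hypothetical location for the dependent JARs; defaults to ./lib.
LIB_DIR="${LIB_DIR:-./lib}"
LIBJARS="$(join_jars "${LIB_DIR}")"

echo "-libjars ${LIBJARS}"
```

Assuming the job is launched through Hadoop's generic option parsing, the resulting value would be passed as -libjars "${LIBJARS}" immediately after the job class name and before the --geomesa.* arguments.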

15.7.3.2. Updating Existing Data to the Latest Index Format

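The indexing in GeoMesa is constantly being improved. We strive to maintain backwards compatibility, but old data cannot always take advantage of the improvements we make. However, old data can be updated through the SchemaCopyJob. This job copies the data to a new table (or feature name), rewriting all of it using the latest codebase. Once the data has been updated, you can drop the old tables and rename the new tables back to the original names.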

The job can be invoked through Yarn as follows (the JAR version may vary slightly):

geomesa-accumulo$ yarn jar geomesa-accumulo-jobs/target/geomesa-accumulo-jobs_${VERSION}.jar \
    org.locationtech.geomesa.accumulo.jobs.index.SchemaCopyJob \
    --geomesa.input.instanceId <instance> \
    --geomesa.output.instanceId <instance> \
    --geomesa.input.zookeepers <zookeepers> \
    --geomesa.output.zookeepers <zookeepers> \
    --geomesa.input.user <user> \
    --geomesa.output.user <user> \
    --geomesa.input.password <pwd> \
    --geomesa.output.password <pwd> \
    --geomesa.input.tableName <catalog-table> \
    --geomesa.output.tableName <new-catalog-table> \
    --geomesa.input.feature <feature> \
    --geomesa.output.feature <feature> \
    --geomesa.input.cql <optional cql filter for input features>

Note

You will also need to include an extensive -libjars argument with all dependent JARs.