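Monitoring: Notifications

Notifications are the part of monitoring that runs after each data collection cycle. They are a configurable mechanism that checks whether metric values fall within the allowed range and, if they do not, sends a notification to the designated receivers (registered users or external email addresses).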

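Data model

The notification mechanism consists of several classes, each responsible for a different aspect:

  • High-level configuration: NotificationCheck

    Holds the general description, the list of metric check definitions, the send grace period configuration and the last-send marker, and the list of users to whom notifications should be delivered (in a helper table, the NotificationReceiver class).

  • Per-metric definition: MetricNotificationDefinition

    Keeps the configuration of each metric for each check: the metric name, the minimum and maximum values a user is allowed to set, the type of check (whether the value should be below or above a given threshold, or whether the last reading should be no older than a specific period at the time of the check), and the additional scope of the check (resource, label, OWS service; this part is only partially implemented). Definition objects are created from the NotificationCheck.user_tresholds data and are used to generate the validation form. Note that one NotificationCheck can have several definition items, for a set of different metrics. Definition rows are created when a NotificationCheck is created or updated.

  • Per-metric check configuration: MetricNotificationCheck

    Keeps the per-metric, per-check configuration: the metric and its thresholds. It is created after a user submits the configuration form for a specific notification.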
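The relationship between the three classes can be pictured with plain Python dataclasses. This is an illustrative sketch only: the attribute names are simplified from the descriptions above and are assumptions, not the actual model definitions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class MetricNotificationDefinition:
    # What a user is allowed to configure for one metric in one check.
    metric: str
    field_option: str  # "min_value", "max_value" or "max_timeout"
    min_value: Optional[float] = None
    max_value: Optional[float] = None
    steps: Optional[int] = None

    @property
    def field_name(self) -> str:
        # Matches the "request.count.min_value" style of form field names.
        return f"{self.metric}.{self.field_option}"


@dataclass
class MetricNotificationCheck:
    # The thresholds a user actually submitted for one metric.
    metric: str
    min_value: Optional[float] = None
    max_value: Optional[float] = None
    max_timeout: Optional[timedelta] = None


@dataclass
class NotificationCheck:
    # High-level configuration that owns definitions, checks and receivers.
    name: str
    description: str
    grace_period: timedelta = timedelta(minutes=10)
    last_send: Optional[datetime] = None
    receivers: List[str] = field(default_factory=list)
    definitions: List[MetricNotificationDefinition] = field(default_factory=list)
    checks: List[MetricNotificationCheck] = field(default_factory=list)


definition = MetricNotificationDefinition("request.count", "min_value", 0, 10, 3)
notification = NotificationCheck("geonode is not working",
                                 "detects when requests are not handled",
                                 definitions=[definition])
```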

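Workflow

Notifications are checked after each collection/processing cycle in the collection script, by calling CollectorAPI.emit_notifications(for_timestamp). This does the following:

  • gets all notifications,

  • for each notification, gets all of its notification checks,

  • for each notification check, gets the valid metric for the given timestamp and checks whether the value matches the given condition,

  • a failing check raises an exception, which is caught in the caller; for each notification, a list of errors is returned,

  • based on the notification and its list of errors, an alert is generated and sent to users, unless the grace period since the last send has not yet elapsed.

Additionally, notifications expose the /monitoring/api/status/ status endpoint, which shows the errors detected at the moment of the request.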
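The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the CollectorAPI code: CheckFailed, check_min_value, run_checks and should_send are hypothetical names, and only a single "minimum value" check type is modelled.

```python
from datetime import datetime, timedelta


class CheckFailed(Exception):
    """Raised by a single metric check whose condition is not met."""


def check_min_value(name, value, minimum):
    # One example check type: the value must not fall below the threshold.
    if value < minimum:
        raise CheckFailed(f"{name} should be at least {minimum}, got {value}")


def run_checks(checks, values):
    # Run every check, catching each exception in the caller so that one
    # failure does not hide the others; return the collected messages.
    errors = []
    for name, minimum in checks:
        try:
            check_min_value(name, values[name], minimum)
        except CheckFailed as exc:
            errors.append(str(exc))
    return errors


def should_send(errors, last_send, grace_period, now):
    # Alert only when there are errors and the grace period has elapsed.
    if not errors:
        return False
    return last_send is None or now - last_send >= grace_period


now = datetime(2017, 9, 4, 13, 0)
errors = run_checks([("request.count", 10)], {"request.count": 4})
send_first_time = should_send(errors, None, timedelta(minutes=10), now)
send_too_soon = should_send(errors, now - timedelta(minutes=5),
                            timedelta(minutes=10), now)
```

Collecting errors per notification rather than stopping at the first failure is what lets a single alert describe every failed check.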

Web API

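Status endpoint

The status endpoint shows the current state of the error checks performed by notifications. The frontend can poll this endpoint periodically. There is currently no historical view of the status. The status response is wrapped in the standard response envelope. An error-free response has status set to ok and success set to true; otherwise errors will not be empty.

Error-free response: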

GET /monitoring/api/status/

{
    "status": "ok",
    "data": [],
    "success": true
}

Response with errors reported:

{
  "status": "ok",
  "data": [
    {
      "problems": [
        {
          "threshold_value": "2017-08-29T10:45:26.142",
          "message": "Value collected too far in the past",
          "name": "request.count",
          "severity": "warning",
          "offending_value": "2017-08-25T16:41:00"
        }
      ],
      "check": {
        "grace_period": {
          "seconds": 600,
          "class": "datetime.timedelta"
        },
        "last_send": null,
        "description": "detects when requests are not handled",
        "severity": "warning",
        "user_threshold": {
          "3": {
            "max": 10,
            "metric": "request.count",
            "steps": null,
            "description": "Number of handled requests is lower than",
            "min": 0
          },
          "4": {
            "max": null,
            "metric": "request.count",
            "steps": null,
            "description": "No response for at least",
            "min": 60
          },
          "5": {
            "max": null,
            "metric": "response.time",
            "steps": null,
            "description": "Response time is higher than",
            "min": 500
          }
        },
        "id": 2,
        "name": "geonode is not working"
      }
    }
  ],
  "success": true
}

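A response that reports errors contains a list of check elements in the data element. Each check element contains:

  • check - the serialized NotificationCheck object that was used

  • problems - the list of metric checks that failed. Each element contains the metric name, severity, error message, measured value and threshold value.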
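A client could flatten this structure into one row per problem. The following is a minimal sketch using only the standard library; summarize_status is a hypothetical helper, and SAMPLE is a trimmed copy of the response shown above.

```python
import json

# Trimmed copy of the error-reporting response shown above.
SAMPLE = """
{"status": "ok", "success": true, "data": [
  {"check": {"id": 2, "name": "geonode is not working"},
   "problems": [{"name": "request.count", "severity": "warning",
                 "message": "Value collected too far in the past",
                 "threshold_value": "2017-08-29T10:45:26.142",
                 "offending_value": "2017-08-25T16:41:00"}]}
]}
"""


def summarize_status(body):
    """Return (check name, metric name, severity, message) per problem."""
    payload = json.loads(body)
    rows = []
    for item in payload.get("data", []):
        check_name = item["check"]["name"]
        for problem in item["problems"]:
            rows.append((check_name, problem["name"],
                         problem["severity"], problem["message"]))
    return rows


rows = summarize_status(SAMPLE)
```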

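Severity

Severity is a literal description of an error's potential impact. There are three values: warning, error, fatal.

Notifications list

This call returns the list of available notifications: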

GET /monitoring/api/notifications/

{
  "status": "ok",
  "data": {
    "problems": [
      {
        "threshold_value": "10.0000",
        "check_url": "/monitoring/api/notifications/config/2/",
        "name": "request.count",
        "check_id": 2,
        "description": "Metric value for request.count should be at least 10, got 4 instead",
        "offending_value": "4.0000",
        "message": "Number of handled requests is lower than 4",
        "severity": "error"
      }
    ],
    "health_level": "error"
  },
  "success": true
}

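The response contains a list of notification summaries under the data key. Each element has:

  • name is the name of the metric that was checked

  • message is the error message generated by the notification. It describes what the problem is.

  • description gives more detail on which check failed.

  • offending_value and threshold_value are the values that were compared (offending_value is the actual value from the metric data)

  • check_url is the URL of the notification details

  • severity is the severity of the error

In addition, data carries the highest severity among the reported problems in health_level.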
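How such a highest-severity value can be derived is sketched below. The ranking warning < error < fatal is an assumption (the text lists the values without stating their order), highest_severity is a hypothetical helper, and returning None for an empty list is a placeholder choice, since the value reported when there are no problems is not described here.

```python
# Assumed ordering, least to most severe.
SEVERITY_ORDER = ["warning", "error", "fatal"]


def highest_severity(problems):
    """Return the most severe `severity` among problems, or None if empty."""
    if not problems:
        return None
    return max((p["severity"] for p in problems), key=SEVERITY_ORDER.index)


problems = [{"name": "request.count", "severity": "error"},
            {"name": "response.time", "severity": "warning"}]
level = highest_severity(problems)
```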

Notification details

This returns the details of a notification, including the form and the list of allowed fields:

GET /monitoring/api/notifications/config/{{notification_id}}/

{
  "status": "ok",
  "errors": {},
  "data": {
    "fields": [
      {
        "is_enabled": true,
        "use_resource": false,
        "description": "Number of handled requests is lower than",
        "max_value": "10.0000",
        "metric": {
          "class": "geonode.contrib.monitoring.models.Metric",
          "name": "request.count",
          "id": 2
        },
        "min_value": "0.0000",
        "use_label": false,
        "use_ows_service": false,
        "field_option": "min_value",
        "use_service": false,
        "steps_calculated": [
          "0.0000",
          "3.33",
          "6.67",
          "10.0"
        ],
        "current_value": "30.0000",
        "steps": 3,
        "notification_check": {
          "class": "geonode.contrib.monitoring.models.NotificationCheck",
          "name": "geonode is not working",
          "id": 2
        },
        "field_name": "request.count.min_value",
        "id": 3,
        "unit": ""
      },
      {
        "is_enabled": true,
        "use_resource": false,
        "description": "No response for at least",
        "max_value": null,
        "metric": {
          "class": "geonode.contrib.monitoring.models.Metric",
          "name": "request.count",
          "id": 2
        },
        "min_value": "60.0000",
        "use_label": false,
        "use_ows_service": false,
        "field_option": "max_timeout",
        "use_service": false,
        "steps_calculated": null,
        "current_value": {
          "seconds": 120,
          "class": "datetime.timedelta"
        },
        "steps": null,
        "notification_check": {
          "class": "geonode.contrib.monitoring.models.NotificationCheck",
          "name": "geonode is not working",
          "id": 2
        },
        "field_name": "request.count.max_timeout",
        "id": 4,
        "unit": ""
      },
      {
        "is_enabled": false,
        "use_resource": false,
        "description": "Response time is higher than",
        "max_value": null,
        "metric": {
          "class": "geonode.contrib.monitoring.models.Metric",
          "name": "response.time",
          "id": 11
        },
        "min_value": "500.0000",
        "use_label": false,
        "use_ows_service": false,
        "field_option": "max_value",
        "use_service": false,
        "steps_calculated": null,
        "current_value": null,
        "steps": null,
        "notification_check": {
          "class": "geonode.contrib.monitoring.models.NotificationCheck",
          "name": "geonode is not working",
          "id": 2
        },
        "field_name": "response.time.max_value",
        "id": 5,
        "unit": "s"
      },
      {
        "is_enabled": false,
        "use_resource": false,
        "description": "dsfdsf",
        "max_value": null,
        "metric": {
          "class": "geonode.contrib.monitoring.models.Metric",
          "name": "response.time",
          "id": 11
        },
        "min_value": null,
        "use_label": false,
        "use_ows_service": false,
        "field_option": "min_value",
        "use_service": false,
        "steps_calculated": null,
        "current_value": null,
        "steps": null,
        "notification_check": {
          "class": "geonode.contrib.monitoring.models.NotificationCheck",
          "name": "geonode is not working",
          "id": 2
        },
        "field_name": "response.time.min_value",
        "id": 6,
        "unit": "s"
      },
      {
        "is_enabled": true,
        "use_resource": false,
        "description": "Incoming traffic should be higher than",
        "max_value": null,
        "metric": {
          "class": "geonode.contrib.monitoring.models.Metric",
          "name": "network.in.rate",
          "id": 34
        },
        "min_value": null,
        "use_label": false,
        "use_ows_service": false,
        "field_option": "min_value",
        "use_service": false,
        "steps_calculated": null,
        "current_value": "10000000.0000",
        "steps": null,
        "notification_check": {
          "class": "geonode.contrib.monitoring.models.NotificationCheck",
          "name": "geonode is not working",
          "id": 2
        },
        "field_name": "network.in.rate.min_value",
        "id": 7,
        "unit": "B/s"
      }
    ],
    "form": "<tr><th><label for=\"id_emails\">Emails:</label></th><td><textarea cols=\"40\" id=\"id_emails\" name=\"emails\" rows=\"10\">\r\n\nad@m.in</textarea></td></tr>\n<tr><th><label for=\"id_severity\">Severity:</label></th><td><select id=\"id_severity\" name=\"severity\">\n<option value=\"warning\">Warning</option>\n<option value=\"error\" selected=\"selected\">Error</option>\n<option value=\"fatal\">Fatal</option>\n</select></td></tr>\n<tr><th><label for=\"id_active\">Active:</label></th><td><input checked=\"checked\" id=\"id_active\" name=\"active\" type=\"checkbox\" /></td></tr>\n<tr><th><label for=\"id_grace_period\">Grace period:</label></th><td><input id=\"id_grace_period\" name=\"grace_period\" type=\"text\" value=\"00:01:00\" /></td></tr>\n<tr><th><label for=\"id_request.count.min_value\">Request.count.min value:</label></th><td><select id=\"id_request.count.min_value\" name=\"request.count.min_value\">\n<option value=\"0.0000\">0.0000</option>\n<option value=\"3.33\">3.33</option>\n<option value=\"6.67\">6.67</option>\n<option value=\"10.0\">10.0</option>\n</select></td></tr>\n<tr><th><label for=\"id_request.count.max_timeout\">Request.count.max timeout:</label></th><td><input id=\"id_request.count.max_timeout\" min=\"60.0000\" name=\"request.count.max_timeout\" step=\"0.01\" type=\"number\" /></td></tr>\n<tr><th><label for=\"id_response.time.max_value\">Response.time.max value:</label></th><td><input id=\"id_response.time.max_value\" min=\"500.0000\" name=\"response.time.max_value\" step=\"0.01\" type=\"number\" /></td></tr>\n<tr><th><label for=\"id_response.time.min_value\">Response.time.min value:</label></th><td><input id=\"id_response.time.min_value\" name=\"response.time.min_value\" step=\"0.01\" type=\"number\" /></td></tr>\n<tr><th><label for=\"id_network.in.rate.min_value\">Network.in.rate.min value:</label></th><td><input id=\"id_network.in.rate.min_value\" name=\"network.in.rate.min_value\" step=\"0.01\" type=\"number\" /></td></tr>",
    "notification": {
      "grace_period": {
        "seconds": 60,
        "class": "datetime.timedelta"
      },
      "last_send": "2017-09-04T13:13:15.203",
      "description": "detects when requests are not handled",
      "severity": "error",
      "user_threshold": {
        "request.count.max_timeout": {
          "max": null,
          "metric": "request.count",
          "steps": null,
          "description": "No response for at least",
          "min": 60
        },
        "response.time.max_value": {
          "max": null,
          "metric": "response.time",
          "steps": null,
          "description": "Response time is higher than",
          "min": 500
        },
        "request.count.min_value": {
          "max": 10,
          "metric": "request.count",
          "steps": 3,
          "description": "Number of handled requests is lower than",
          "min": 0
        }
      },
      "active": true,
      "id": 2,
      "name": "geonode is not working"
    }
  },
  "success": true
}

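Keys returned in the data element:

  • fields - the list of form fields, including detailed per-resource configuration flags

  • form - the rendered user form, which can be displayed as is

  • notification - the serialized notification object, with the user_thresholds list (this is the basis from which the fields objects are created)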
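In the sample response, a threshold with min 0, max 10 and steps 3 appears in fields with steps_calculated of 0.0000, 3.33, 6.67 and 10.0. One way to reproduce those values is evenly spaced points between the minimum and the maximum. The sketch below is inferred from the sample values only and is not taken from the server code; calculate_steps is a hypothetical name, and it returns floats rather than the formatted strings seen in the response.

```python
def calculate_steps(min_value, max_value, steps):
    """Return steps + 1 evenly spaced values from min_value to max_value."""
    # Without both bounds and a step count there is nothing to generate.
    if min_value is None or max_value is None or not steps:
        return None
    width = (max_value - min_value) / steps
    return [round(min_value + i * width, 2) for i in range(steps + 1)]


options = calculate_steps(0, 10, 3)
no_options = calculate_steps(60, None, None)
```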

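The frontend should use the fields list to build the whole form on the client side:

  • the field name is stored in field_name

  • the field label can be taken from description

  • the unit can be taken from the unit field

  • if the field definition provides a list in steps_calculated, it should be used to construct a select/dropdown; otherwise a text input should be shown. Where possible, validation should take min_value and max_value into account

  • the currently set value is available in the current_value field

  • each field has an is_enabled property, which indicates whether the field is enabled. Currently it is computed as follows: the field is enabled if current_value is not None. This may change in the future.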
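The rules above can be expressed as a small mapping from one fields entry to a widget description. This is a sketch of client-side logic written in Python for consistency with the other examples; widget_for and the keys of the dictionary it returns are hypothetical, and the two sample entries are trimmed from the response shown earlier.

```python
def widget_for(field):
    """Describe the input a client could render for one entry of `fields`."""
    widget = {
        "name": field["field_name"],
        "label": field["description"],
        "unit": field["unit"],
        "value": field["current_value"],
        "enabled": field["is_enabled"],
    }
    if field["steps_calculated"]:
        # A list of precomputed steps means a select/dropdown.
        widget["type"] = "select"
        widget["options"] = field["steps_calculated"]
    else:
        # Otherwise a text input, validated against the bounds if present.
        widget["type"] = "text"
        widget["min"] = field["min_value"]
        widget["max"] = field["max_value"]
    return widget


select_field = {
    "field_name": "request.count.min_value", "is_enabled": True,
    "description": "Number of handled requests is lower than", "unit": "",
    "min_value": "0.0000", "max_value": "10.0000", "current_value": "30.0000",
    "steps_calculated": ["0.0000", "3.33", "6.67", "10.0"],
}
text_field = {
    "field_name": "response.time.max_value", "is_enabled": False,
    "description": "Response time is higher than", "unit": "s",
    "min_value": "500.0000", "max_value": None, "current_value": None,
    "steps_calculated": None,
}
select_widget = widget_for(select_field)
text_widget = widget_for(text_field)
```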

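Additionally, each notification configuration accepts a list of emails in the emails field. This field should be sent as a list of emails separated by newline characters (\n).

The form should be submitted to the same URL the configuration was fetched from (/monitoring/api/notifications/config/{{id}}/), see below.

Notification editing (by user)

The following API call lets a user configure a notification by setting its receivers and adjusting the check thresholds: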

POST /monitoring/api/notifications/config/{{notification_check_id}}/

request.count.max_value=val
request.count.min_value=1
emails=list of emails
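A form body of this shape can be assembled with the standard library. The sketch below only builds the encoded string and does not send it; build_config_body is a hypothetical helper, and the threshold value and addresses are example data.

```python
from urllib.parse import parse_qs, urlencode


def build_config_body(thresholds, emails):
    """Form-encode threshold values plus a newline-separated email list."""
    form = dict(thresholds)
    # The emails field is a single string with one address per line.
    form["emails"] = "\n".join(emails)
    return urlencode(form)


body = build_config_body({"request.count.min_value": 1},
                         ["email1@test.com", "email2@test.com"])
decoded = parse_qs(body)
```

urlencode percent-encodes the newline, so the email list survives transport as one field value.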

The response contains the serialized NotificationCheck in the data element, if no errors were caught during form processing:

{
  "status": "ok",
  "errors": {},
  "data": {
    "grace_period": {
      "seconds": 600,
      "class": "datetime.timedelta"
    },
    "last_send": null,
    "description": "more test",
    "severity": "error",
    "user_threshold": {
      "request.count.max_value": {
        "max": null,
        "metric": "request.count",
        "steps": null,
        "description": "Max number of request",
        "min": 1000
      },
      "request.count.min_value": {
        "max": 100,
        "metric": "request.count",
        "steps": null,
        "description": "Min number of request",
        "min": 0
      }
    },
    "id": 293,
    "name": "test"
  },
  "success": true
}

An error (non-200) response will have the errors key populated:

{
  "status": "error",
  "errors": {
    "user_threshold": [
      "This field is required."
    ],
    "name": [
      "This field is required."
    ],
    "description": [
      "This field is required."
    ]
  },
  "data": [],
  "success": false
}
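A caller might turn such an envelope into a flat list of messages to show next to the form. This is a minimal sketch; flatten_errors is a hypothetical helper, and ERROR_BODY is a shortened copy of the response above.

```python
import json

# Shortened copy of the error response shown above.
ERROR_BODY = """
{"status": "error", "success": false, "data": [],
 "errors": {"user_threshold": ["This field is required."],
            "name": ["This field is required."]}}
"""


def flatten_errors(body):
    """Return "field: message" strings from an envelope, or [] on success."""
    payload = json.loads(body)
    if payload.get("success"):
        return []
    return [f"{name}: {message}"
            for name, messages in sorted(payload.get("errors", {}).items())
            for message in messages]


messages = flatten_errors(ERROR_BODY)
```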

Notification creation

This API call creates a new notification. The form layout differs from the one used for editing:

POST /monitoring/api/notifications/

name=Name of notification (geonode doesn't work)
description=This will check if geonode is serving any data
emails=
user_thresholds=
severity=

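Payload elements:

  • name and description are the values visible to the user

  • severity is the severity value

  • emails is a list of emails; however, it is encoded as a string in which each email is on a new line: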

    email1@test.com
    email2@test.com
    
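  • user_thresholds is a JSON-encoded list of per-metric, per-check configurations. Each element of the list should itself be a 10-element list containing:

    • the metric name

    • the field check option (one of three values: min_value, max_value, max_timeout)

    • a flag indicating whether the metric check can use a service

    • a flag indicating whether the metric check can use a resource

    • a flag indicating whether the metric check can use a label

    • a flag indicating whether the metric check can use an OWS service

    • the minimum value for user input (None if there is no minimum check)

    • the maximum value for user input (None if there is no maximum check)

    • the steps count: the number of steps to generate for user input, so that the user can pick a value from a select list instead of typing it. This takes effect only if both the minimum and maximum values are also provided

    • the description of the check (the last element of each entry in the sample payload below)

    Sample payload for user_thresholds: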

      [
          ('request.count', 'min_value', False, False, False, False, 0, 100, None, "Min number of request"),
          ('request.count', 'max_value', False, False, False, False, 1000, None, None, "Max number of request"),
      ]
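Since the field is described as JSON-encoded, a client would serialize these entries before posting them. The sketch below assumes the element order listed above; threshold_entry is a hypothetical helper, and the four scope flags are simply left off. Note that json.dumps writes None and False as null and false.

```python
import json


def threshold_entry(metric, option, min_value=None, max_value=None,
                    steps=None, description=""):
    # Order follows the element list above; the four scope flags
    # (service, resource, label, OWS service) are all disabled here.
    return [metric, option, False, False, False, False,
            min_value, max_value, steps, description]


entries = [
    threshold_entry("request.count", "min_value", 0, 100,
                    description="Min number of request"),
    threshold_entry("request.count", "max_value", 1000,
                    description="Max number of request"),
]
user_thresholds = json.dumps(entries)
```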
      

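The response is the serialized NotificationCheck wrapped in the standard response envelope (status, errors, etc.). The actual data is in the data key. If processing fails (for example, because of form validation errors), the response will not be 200 OK and the errors key will be populated.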

{
  "status": "ok",
  "errors": {},
  "data": {
    "grace_period": {
      "seconds": 600,
      "class": "datetime.timedelta"
    },
    "last_send": null,
    "description": "more test",
    "user_threshold": {
      "request.count.max_value": {
        "max": 100,
        "metric": "request.count",
        "steps": null,
        "description": "Min number of request",
        "min": 0
      },
      "request.count.min_value": {
        "max": null,
        "metric": "request.count",
        "steps": null,
        "description": "Max number of request",
        "min": 1000
      }
    },
    "id": 257,
    "name": "test"
  },
  "success": true
}