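Monitoring: Notifications
Notifications are the part of monitoring that runs after each data collection cycle. They provide a configurable mechanism that checks whether metric values fall within an allowed range and, if they do not, sends a notification to the designated receivers (registered users or external email addresses).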
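Data model
The notification mechanism is composed of several classes, each responsible for a different aspect:
High-level configuration: NotificationCheck:
Keeps the general description, the list of metric check definitions, the send grace period configuration and the last-send marker, and the list of users to whom the notification should be delivered (in a helper table, the NotificationReceiver class).
Per-metric definition: MetricNotificationDefinition:
Keeps the per-metric, per-check configuration: the metric name, the minimum and maximum values the user is allowed to enter, the type of check (whether the value should be below or above a given threshold, or whether the last reading should be no older than a specific period at the time of the metric check), and the additional scope of the check (resource, label, OWS service; this part is only partially implemented). Definition objects are created from the NotificationCheck.user_tresholds data and are used to generate the validation form. Note that one NotificationCheck can have several definition items, for a set of different metrics. Definition rows are created when a NotificationCheck is created or updated.
Per-metric check configuration: MetricNotificationCheck:
Keeps the per-metric, per-check values: the metric and its thresholds. It is created after the user submits the configuration form for a specific notification.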
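Workflow
Notifications are checked after each collection/processing period in the collect script, by calling CollectorAPI.emit_notifications(for_timestamp). This call does the following:
gets all notifications,
for each notification, gets all of its notification checks,
for each notification check, gets the valid metric for the given timestamp and checks whether the value matches the given criteria,
each check may raise an exception, which is caught in the caller, so that a list of errors is returned for each notification,
based on the notification and its list of errors, an alert is generated and sent to the users, unless the previous alert was sent and the grace period has not yet elapsed.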
In addition, notifications expose the /monitoring/api/status/ status endpoint, which shows the errors detected at the moment of the request.
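The workflow steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the GeoNode implementation: the Notification dataclass, CheckFailed, make_min_check and the send callback are hypothetical stand-ins, and treating the grace period as elapsed when exactly one period has passed is also an assumption.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Callable, List, Optional


class CheckFailed(Exception):
    """Hypothetical error raised by a metric check whose value is out of range."""


@dataclass
class Notification:
    # Hypothetical stand-in for NotificationCheck; not the real model.
    name: str
    grace_period: timedelta
    checks: List[Callable[[datetime], None]] = field(default_factory=list)
    last_send: Optional[datetime] = None

    def can_send(self, now: datetime) -> bool:
        # Send unless the previous alert is still inside the grace period.
        return self.last_send is None or now - self.last_send >= self.grace_period


def make_min_check(metric, value, minimum):
    """Build a check that fails when `value` is below `minimum`."""
    def check(for_timestamp):
        if value < minimum:
            raise CheckFailed(f"{metric} should be at least {minimum}, got {value}")
    return check


def emit_notifications(notifications, for_timestamp, send):
    """Run every check, collect errors per notification, then alert."""
    sent = {}
    for notification in notifications:
        errors = []
        for check in notification.checks:
            try:
                check(for_timestamp)
            except CheckFailed as exc:
                errors.append(str(exc))
        if errors and notification.can_send(for_timestamp):
            send(notification, errors)
            notification.last_send = for_timestamp
            sent[notification.name] = errors
    return sent


outbox = []
check = make_min_check("request.count", value=4, minimum=10)
notification = Notification("geonode is not working", timedelta(minutes=10), [check])
start = datetime(2017, 9, 4, 13, 0)
for offset in (0, 5, 10):
    emit_notifications([notification], start + timedelta(minutes=offset),
                       lambda n, errs: outbox.append((n.name, errs)))
# Only the runs at offsets 0 and 10 send an alert; offset 5 is inside the grace period.
```

Keeping the error collection separate from the sending decision means a failing check is still recorded on every cycle, while the grace period only throttles delivery.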
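Web API
Status endpoint
The status endpoint shows the current state of the error checks performed by notifications. The frontend can poll this endpoint periodically. At the moment there is no history view for the status. The status response is wrapped in the standard response envelope. A no-error response has the status key set to ok and success set to true; otherwise errors will not be empty.
No-error response: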
GET /monitoring/api/status/
{
"status": "ok",
"data": [],
"success": true
}
Response with errors reported:
{
"status": "ok",
"data": [
{
"problems": [
{
"threshold_value": "2017-08-29T10:45:26.142",
"message": "Value collected too far in the past",
"name": "request.count",
"severity": "warning",
"offending_value": "2017-08-25T16:41:00"
}
],
"check": {
"grace_period": {
"seconds": 600,
"class": "datetime.timedelta"
},
"last_send": null,
"description": "detects when requests are not handled",
"severity": "warning",
"user_threshold": {
"3": {
"max": 10,
"metric": "request.count",
"steps": null,
"description": "Number of handled requests is lower than",
"min": 0
},
"4": {
"max": null,
"metric": "request.count",
"steps": null,
"description": "No response for at least",
"min": 60
},
"5": {
"max": null,
"metric": "response.time",
"steps": null,
"description": "Response time is higher than",
"min": 500
}
},
"id": 2,
"name": "geonode is not working"
}
}
],
"success": true
}
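A response that reports errors contains a list of check elements in the data element. Each check element contains:
check - the serialized NotificationCheck object that was used,
problems - the list of metric checks that failed. Each element contains the metric name, severity, error message, measured value and threshold value.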
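As an illustration of how a client might walk this structure, the sketch below flattens a parsed status response into one row per failed metric check. The function name iter_status_problems is hypothetical, and the sample dictionary is a trimmed copy of the response shown above.

```python
def iter_status_problems(response):
    """Yield (check name, metric, severity, message) for each failed metric check."""
    if not response.get("success"):
        raise ValueError("status request failed: %r" % (response.get("errors"),))
    for element in response.get("data", []):
        check = element["check"]
        for problem in element["problems"]:
            yield (check["name"], problem["name"],
                   problem["severity"], problem["message"])


sample = {
    "status": "ok",
    "success": True,
    "data": [{
        "check": {"id": 2, "name": "geonode is not working"},
        "problems": [{
            "name": "request.count",
            "severity": "warning",
            "message": "Value collected too far in the past",
            "offending_value": "2017-08-25T16:41:00",
            "threshold_value": "2017-08-29T10:45:26.142",
        }],
    }],
}
rows = list(iter_status_problems(sample))
```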
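Severity
Severity is a textual description of the potential impact of an error. There are three values: warning, error and fatal.
Notifications list
This call returns the list of available notifications: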
GET /monitoring/api/notifications/
{
"status": "ok",
"data": {
"problems": [
{
"threshold_value": "10.0000",
"check_url": "/monitoring/api/notifications/config/2/",
"name": "request.count",
"check_id": 2,
"description": "Metric value for request.count should be at least 10, got 4 instead",
"offending_value": "4.0000",
"message": "Number of handled requests is lower than 4",
"severity": "error"
}
],
"health_level": "error"
},
"success": true
}
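The response contains a list of notification summaries in the data key (in the sample above, under problems). Each element has:
name - the name of the metric that was checked,
message - the error message generated by the notification; it describes what the problem is,
description - more detailed information about which check failed,
offending_value and threshold_value - the values that were compared (offending_value is the actual value from the metric data),
check_url - the URL of the notification details,
severity - the severity of the error.
In addition, data carries the highest severity value present, in health_level.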
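A client that wants to recompute the highest severity from a list of problems could do something like the sketch below. The ranking warning < error < fatal is an assumption based on the order in which the three values are listed, and returning None when there are no problems is a choice made for this example only.

```python
# Assumed ranking, lowest impact first; the document lists the values in this order.
SEVERITY_ORDER = ("warning", "error", "fatal")


def health_level(problems):
    """Return the highest severity among `problems`, or None if there are none."""
    severities = [problem["severity"] for problem in problems]
    if not severities:
        return None
    return max(severities, key=SEVERITY_ORDER.index)


level = health_level([{"severity": "warning"}, {"severity": "error"}])
```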
Notification details
This call returns the details of a notification, including the form and the list of allowed fields:
GET /monitoring/api/notifications/config/{{notification_id}}/
{
"status": "ok",
"errors": {},
"data": {
"fields": [
{
"is_enabled": true,
"use_resource": false,
"description": "Number of handled requests is lower than",
"max_value": "10.0000",
"metric": {
"class": "geonode.contrib.monitoring.models.Metric",
"name": "request.count",
"id": 2
},
"min_value": "0.0000",
"use_label": false,
"use_ows_service": false,
"field_option": "min_value",
"use_service": false,
"steps_calculated": [
"0.0000",
"3.33",
"6.67",
"10.0"
],
"current_value": "30.0000",
"steps": 3,
"notification_check": {
"class": "geonode.contrib.monitoring.models.NotificationCheck",
"name": "geonode is not working",
"id": 2
},
"field_name": "request.count.min_value",
"id": 3,
"unit": ""
},
{
"is_enabled": true,
"use_resource": false,
"description": "No response for at least",
"max_value": null,
"metric": {
"class": "geonode.contrib.monitoring.models.Metric",
"name": "request.count",
"id": 2
},
"min_value": "60.0000",
"use_label": false,
"use_ows_service": false,
"field_option": "max_timeout",
"use_service": false,
"steps_calculated": null,
"current_value": {
"seconds": 120,
"class": "datetime.timedelta"
},
"steps": null,
"notification_check": {
"class": "geonode.contrib.monitoring.models.NotificationCheck",
"name": "geonode is not working",
"id": 2
},
"field_name": "request.count.max_timeout",
"id": 4,
"unit": ""
},
{
"is_enabled": false,
"use_resource": false,
"description": "Response time is higher than",
"max_value": null,
"metric": {
"class": "geonode.contrib.monitoring.models.Metric",
"name": "response.time",
"id": 11
},
"min_value": "500.0000",
"use_label": false,
"use_ows_service": false,
"field_option": "max_value",
"use_service": false,
"steps_calculated": null,
"current_value": null,
"steps": null,
"notification_check": {
"class": "geonode.contrib.monitoring.models.NotificationCheck",
"name": "geonode is not working",
"id": 2
},
"field_name": "response.time.max_value",
"id": 5,
"unit": "s"
},
{
"is_enabled": false,
"use_resource": false,
"description": "dsfdsf",
"max_value": null,
"metric": {
"class": "geonode.contrib.monitoring.models.Metric",
"name": "response.time",
"id": 11
},
"min_value": null,
"use_label": false,
"use_ows_service": false,
"field_option": "min_value",
"use_service": false,
"steps_calculated": null,
"current_value": null,
"steps": null,
"notification_check": {
"class": "geonode.contrib.monitoring.models.NotificationCheck",
"name": "geonode is not working",
"id": 2
},
"field_name": "response.time.min_value",
"id": 6,
"unit": "s"
},
{
"is_enabled": true,
"use_resource": false,
"description": "Incoming traffic should be higher than",
"max_value": null,
"metric": {
"class": "geonode.contrib.monitoring.models.Metric",
"name": "network.in.rate",
"id": 34
},
"min_value": null,
"use_label": false,
"use_ows_service": false,
"field_option": "min_value",
"use_service": false,
"steps_calculated": null,
"current_value": "10000000.0000",
"steps": null,
"notification_check": {
"class": "geonode.contrib.monitoring.models.NotificationCheck",
"name": "geonode is not working",
"id": 2
},
"field_name": "network.in.rate.min_value",
"id": 7,
"unit": "B/s"
}
],
"form": "<tr><th><label for=\"id_emails\">Emails:</label></th><td><textarea cols=\"40\" id=\"id_emails\" name=\"emails\" rows=\"10\">\r\n\nad@m.in</textarea></td></tr>\n<tr><th><label for=\"id_severity\">Severity:</label></th><td><select id=\"id_severity\" name=\"severity\">\n<option value=\"warning\">Warning</option>\n<option value=\"error\" selected=\"selected\">Error</option>\n<option value=\"fatal\">Fatal</option>\n</select></td></tr>\n<tr><th><label for=\"id_active\">Active:</label></th><td><input checked=\"checked\" id=\"id_active\" name=\"active\" type=\"checkbox\" /></td></tr>\n<tr><th><label for=\"id_grace_period\">Grace period:</label></th><td><input id=\"id_grace_period\" name=\"grace_period\" type=\"text\" value=\"00:01:00\" /></td></tr>\n<tr><th><label for=\"id_request.count.min_value\">Request.count.min value:</label></th><td><select id=\"id_request.count.min_value\" name=\"request.count.min_value\">\n<option value=\"0.0000\">0.0000</option>\n<option value=\"3.33\">3.33</option>\n<option value=\"6.67\">6.67</option>\n<option value=\"10.0\">10.0</option>\n</select></td></tr>\n<tr><th><label for=\"id_request.count.max_timeout\">Request.count.max timeout:</label></th><td><input id=\"id_request.count.max_timeout\" min=\"60.0000\" name=\"request.count.max_timeout\" step=\"0.01\" type=\"number\" /></td></tr>\n<tr><th><label for=\"id_response.time.max_value\">Response.time.max value:</label></th><td><input id=\"id_response.time.max_value\" min=\"500.0000\" name=\"response.time.max_value\" step=\"0.01\" type=\"number\" /></td></tr>\n<tr><th><label for=\"id_response.time.min_value\">Response.time.min value:</label></th><td><input id=\"id_response.time.min_value\" name=\"response.time.min_value\" step=\"0.01\" type=\"number\" /></td></tr>\n<tr><th><label for=\"id_network.in.rate.min_value\">Network.in.rate.min value:</label></th><td><input id=\"id_network.in.rate.min_value\" name=\"network.in.rate.min_value\" step=\"0.01\" type=\"number\" /></td></tr>",
"notification": {
"grace_period": {
"seconds": 60,
"class": "datetime.timedelta"
},
"last_send": "2017-09-04T13:13:15.203",
"description": "detects when requests are not handled",
"severity": "error",
"user_threshold": {
"request.count.max_timeout": {
"max": null,
"metric": "request.count",
"steps": null,
"description": "No response for at least",
"min": 60
},
"response.time.max_value": {
"max": null,
"metric": "response.time",
"steps": null,
"description": "Response time is higher than",
"min": 500
},
"request.count.min_value": {
"max": 10,
"metric": "request.count",
"steps": 3,
"description": "Number of handled requests is lower than",
"min": 0
}
},
"active": true,
"id": 2,
"name": "geonode is not working"
}
},
"success": true
}
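Keys returned in the data element:
fields - the list of form fields, including detailed per-resource configuration flags,
form - the rendered user form, which can be displayed,
notification - the serialized notification object, with the user_thresholds list (this is the basis for creating the fields objects).
The frontend should use the fields list to build the whole form on the client side:
The field name is stored in field_name.
The field label can be constructed from description.
The unit can be taken from the unit field.
If the field definition provides a list in steps_calculated, it should be used to build a select/dropdown; otherwise a text input should be displayed. Where possible, validation should take min_value and max_value into account.
The currently set value is available in the current_value field.
Each field has an is_enabled property, which tells whether the field is enabled. Currently this value is calculated as follows: the field is enabled if current_value is not None. This may change in the future.
In addition, each notification configuration accepts a list of emails in the emails field. This field should be sent as a list of emails separated by the newline character (\n).
The form should be submitted to the same URL the configuration was fetched from (/monitoring/api/notifications/config/{{id}}/), see below.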
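The rules above for building the form from fields can be sketched as a small mapping function. The shape of the returned widget dictionary and the name describe_field are assumptions made for this example; only the input keys come from the response shown earlier.

```python
def describe_field(field):
    """Turn one entry of `fields` into a widget description for a client-side form."""
    widget = {
        "name": field["field_name"],
        "label": field["description"],
        "unit": field.get("unit", ""),
        "enabled": field["is_enabled"],
        "value": field.get("current_value"),
    }
    steps = field.get("steps_calculated")
    if steps:
        # A precomputed list of steps means a select/dropdown.
        widget["type"] = "select"
        widget["choices"] = list(steps)
    else:
        # Otherwise a text input, validated against min/max where given.
        widget["type"] = "text"
        widget["min"] = field.get("min_value")
        widget["max"] = field.get("max_value")
    return widget


fields = [
    {"field_name": "request.count.min_value", "is_enabled": True, "unit": "",
     "description": "Number of handled requests is lower than",
     "min_value": "0.0000", "max_value": "10.0000", "current_value": "30.0000",
     "steps_calculated": ["0.0000", "3.33", "6.67", "10.0"]},
    {"field_name": "response.time.max_value", "is_enabled": False, "unit": "s",
     "description": "Response time is higher than",
     "min_value": "500.0000", "max_value": None, "current_value": None,
     "steps_calculated": None},
]
widgets = [describe_field(f) for f in fields]
```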
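Notification editing (by user)
The following API call lets a user configure a notification by setting its receivers and adjusting the threshold values of its checks: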
POST /monitoring/api/notifications/config/{{notification_check_id}}/
request.count.max_value=val
request.count.min_value=1
emails=list of emails
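A request body like the one above could be assembled as in the sketch below. It assumes the endpoint accepts a standard URL-encoded form body, which the key=value layout suggests but the document does not state; build_config_body is a hypothetical helper name.

```python
from urllib.parse import parse_qs, urlencode


def build_config_body(thresholds, emails):
    """Encode threshold values and receivers as a URL-encoded form body."""
    data = dict(thresholds)
    # The emails field is a single string with one address per line.
    data["emails"] = "\n".join(emails)
    return urlencode(data)


body = build_config_body(
    {"request.count.min_value": "1"},
    ["email1@test.com", "email2@test.com"],
)
decoded = parse_qs(body)
```

Decoding the body again with parse_qs is a quick way to confirm that the newline separator survives the encoding.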
The response contains the serialized NotificationCheck in the data element, provided no errors were caught during form processing:
{
"status": "ok",
"errors": {},
"data": {
"grace_period": {
"seconds": 600,
"class": "datetime.timedelta"
},
"last_send": null,
"description": "more test",
"severity": "error",
"user_threshold": {
"request.count.max_value": {
"max": null,
"metric": "request.count",
"steps": null,
"description": "Max number of request",
"min": 1000
},
"request.count.min_value": {
"max": 100,
"metric": "request.count",
"steps": null,
"description": "Min number of request",
"min": 0
}
},
"id": 293,
"name": "test"
},
"success": true
}
An error (non-200) response has the errors key populated:
{
"status": "error",
"errors": {
"user_threshold": [
"This field is required."
],
"name": [
"This field is required."
],
"description": [
"This field is required."
]
},
"data": [],
"success": false
}
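Client code may want to handle both response shapes in one place. The sketch below unwraps the envelope and raises when errors is populated; NotificationApiError and unwrap are hypothetical names, and HTTP status handling is left out.

```python
class NotificationApiError(Exception):
    """Hypothetical error carrying the `errors` mapping from the envelope."""

    def __init__(self, errors):
        summary = "; ".join(
            f"{name}: {' '.join(messages)}" for name, messages in sorted(errors.items())
        )
        super().__init__(summary)
        self.errors = errors


def unwrap(envelope):
    """Return `data` from a successful envelope, otherwise raise with its errors."""
    if envelope.get("success") and not envelope.get("errors"):
        return envelope["data"]
    raise NotificationApiError(envelope.get("errors") or {})


ok = unwrap({"status": "ok", "errors": {}, "data": {"id": 293, "name": "test"}, "success": True})
try:
    unwrap({"status": "error", "success": False, "data": [],
            "errors": {"name": ["This field is required."]}})
except NotificationApiError as exc:
    failure = exc
```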
Notification creation
This API call creates a new notification; its form layout differs from the one used for editing:
POST /monitoring/api/notifications/
name=Name of notification (geonode doesn't work)
description=This will check if geonode is serving any data
emails=
user_thresholds=
severity=
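Payload elements:
name, description - values visible to the user,
severity - the severity value,
emails - a list of emails, encoded as a string in which each email is on its own line:
email1@test.com
email2@test.com
user_thresholds - a JSON-encoded list of per-metric, per-check configurations. Each element of the list should be a 10-element list containing:
the metric name,
the field check option (one of three values: min_value, max_value or max_timeout),
a flag telling whether the metric check can use a service,
a flag telling whether the metric check can use a resource,
a flag telling whether the metric check can use a label,
a flag telling whether the metric check can use an OWS service,
the minimum value for user input (None if no minimum check should be done),
the maximum value for user input (None if no maximum check should be done),
the steps count: the number of steps generated for user input, so that the user can pick a value from a select list instead of typing it. This works only if both the minimum and maximum values are also provided,
the description of the check (the tenth element of each row in the sample payload).
Sample payload for user_thresholds:
[
    ('request.count', 'min_value', False, False, False, False, 0, 100, None, "Min number of request"),
    ('request.count', 'max_value', False, False, False, False, 1000, None, None, "Max number of request"),
]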
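Since the sample payload is written with Python literals while the field itself is described as JSON-encoded, the rows need to be serialized before they are sent. A minimal sketch, assuming the standard json module is acceptable for this; encode_thresholds and its validation rules are illustrative, not part of the API.

```python
import json

FIELD_OPTIONS = ("min_value", "max_value", "max_timeout")


def encode_thresholds(rows):
    """Validate the row layout and return the JSON string for user_thresholds."""
    for row in rows:
        if len(row) != 10:
            raise ValueError("each row needs 10 elements, got %d" % len(row))
        if row[1] not in FIELD_OPTIONS:
            raise ValueError("unknown field check option: %r" % (row[1],))
    # False/None become false/null in the encoded string.
    return json.dumps([list(row) for row in rows])


user_thresholds = [
    ("request.count", "min_value", False, False, False, False, 0, 100, None, "Min number of request"),
    ("request.count", "max_value", False, False, False, False, 1000, None, None, "Max number of request"),
]
encoded = encode_thresholds(user_thresholds)
```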
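The response is the serialized NotificationCheck wrapped in the standard response envelope (status, errors, and so on). The actual data is in the data key. If processing fails (for example, because of form validation errors), the response will not be 200 OK and the errors key will be populated.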
{
"status": "ok",
"errors": {},
"data": {
"grace_period": {
"seconds": 600,
"class": "datetime.timedelta"
},
"last_send": null,
"description": "more test",
"user_threshold": {
"request.count.max_value": {
"max": 100,
"metric": "request.count",
"steps": null,
"description": "Min number of request",
"min": 0
},
"request.count.min_value": {
"max": null,
"metric": "request.count",
"steps": null,
"description": "Max number of request",
"min": 1000
}
},
"id": 257,
"name": "test"
},
"success": true
}