Monitoring: Notifications

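Notifications are the part of monitoring that runs after each data collection cycle. This configurable mechanism checks whether metric values fall within their allowed ranges and, if they do not, sends a notification to the designated receivers (registered users or external email addresses).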

Data model

The notification mechanism consists of several classes, each responsible for a different aspect:

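  • High-level configuration: NotificationCheck

    Keeps the general description, the list of metric check definitions, the send grace period configuration and last-send marker, and the list of users to whom notifications should be delivered (in a helper table, the NotificationReceiver class).

  • Per-metric definition: MetricNotificationDefinition

    Keeps the per-metric configuration for each check: the metric name, the minimum and maximum values allowed for the user, the check type (whether the value should be below or above a given threshold, or whether the last reading should be no older than a specific period measured from the metric check), and the additional check scope (resource, label, OWS service; this part is only partially implemented). Definition objects are created from the NotificationCheck.user_tresholds data and are used to generate the validation form. Note that one NotificationCheck can have several definition items for different sets of metrics. Definition rows are created when the NotificationCheck is created or updated.

  • Per-metric check configuration: MetricNotificationCheck

    Keeps the per-metric configuration for each check: the metric and the threshold values. It is created after a user submits the configuration form for a specific notification.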
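The relationship between the three classes can be pictured with a minimal sketch. This is not the actual model code: the dataclasses below are hypothetical stand-ins, and every field name other than those mentioned in the text (grace period, last send, min/max values) is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class MetricNotificationDefinition:
    """Per-metric definition: what a user is allowed to configure."""
    metric: str                        # metric name, e.g. "request.count"
    field_option: str                  # "min_value", "max_value" or "max_timeout"
    min_value: Optional[float] = None  # lowest threshold a user may enter
    max_value: Optional[float] = None  # highest threshold a user may enter

    @property
    def field_name(self) -> str:
        # Mirrors the "<metric>.<option>" naming seen in the API responses.
        return f"{self.metric}.{self.field_option}"


@dataclass
class MetricNotificationCheck:
    """Per-metric check: the threshold a user actually submitted."""
    definition: MetricNotificationDefinition
    threshold: float


@dataclass
class NotificationCheck:
    """High-level configuration owning definitions, checks and receivers."""
    name: str
    description: str
    grace_period: timedelta = timedelta(minutes=10)
    last_send: Optional[datetime] = None
    receivers: List[str] = field(default_factory=list)
    definitions: List[MetricNotificationDefinition] = field(default_factory=list)
    checks: List[MetricNotificationCheck] = field(default_factory=list)


# Illustrative usage with values borrowed from the examples in this document.
definition = MetricNotificationDefinition("request.count", "min_value", 0, 10)
notification = NotificationCheck(
    "geonode is not working",
    "detects when requests are not handled",
    definitions=[definition],
)
notification.checks.append(MetricNotificationCheck(definition, threshold=10))
```

The point of the split is that definitions describe what may be configured, while checks hold what a user actually configured.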

Workflow

Notifications are checked by calling CollectorAPI.emit_notifications(for_timestamp). This does the following:

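  • It fetches all notifications.

  • For each notification, it fetches all notification checks.

  • For each notification check, it fetches the metric values valid for the given timestamp and checks whether they match the given criteria.

  • A failing check raises an exception, which is caught by the caller; for each notification, a list of errors is returned.

  • Based on the notification and its list of errors, alerts are generated and sent to users, unless the last delivery happened before the grace period has elapsed.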
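The steps above can be sketched as follows. This is a simplified illustration, not the code behind CollectorAPI.emit_notifications: the names CheckFailed, check_min, collect_errors and should_send are hypothetical, and only the overall pattern (checks raise, the caller collects errors, the grace period gates delivery) is taken from the description.

```python
from datetime import datetime, timedelta


class CheckFailed(Exception):
    """Raised by a metric check when a value violates its threshold."""


def check_min(name, value, minimum):
    # Hypothetical "min_value" check: the value must not drop below minimum.
    if value < minimum:
        raise CheckFailed(f"{name} should be at least {minimum}, got {value}")


def collect_errors(checks):
    """Run every check callable and capture raised failures as messages."""
    errors = []
    for check in checks:
        try:
            check()
        except CheckFailed as exc:
            errors.append(str(exc))
    return errors


def should_send(errors, last_send, grace_period, now):
    """Send only when there are errors and the grace period has elapsed."""
    if not errors:
        return False
    return last_send is None or now - last_send >= grace_period


errors = collect_errors([
    lambda: check_min("request.count", 4, 10),   # fails
    lambda: check_min("request.count", 20, 10),  # passes
])
now = datetime(2017, 9, 4, 13, 20)
grace = timedelta(minutes=10)
send_first_time = should_send(errors, None, grace, now)
send_too_soon = should_send(errors, now - timedelta(minutes=5), grace, now)
```

Here send_first_time is true because nothing was sent before, while send_too_soon is false because the previous delivery is still inside the grace period.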

In addition, notifications expose the /monitoring/api/status/ status API, which shows the errors detected at the time of the request.

Web API

Status API

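The status endpoint shows the current state of the error checks performed by notifications. The frontend can make requests to this endpoint periodically. There is currently no historical view of statuses. The status response is wrapped in the standard response envelope. An error-free response will have the status key set to ok and success set to true; otherwise errors will not be empty.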

Response with no errors:

GET /monitoring/api/status/

{
    "status": "ok",
    "data": [],
    "success": true
}

Response that reports errors:

{
  "status": "ok",
  "data": [
    {
      "problems": [
        {
          "threshold_value": "2017-08-29T10:45:26.142",
          "message": "Value collected too far in the past",
          "name": "request.count",
          "severity": "warning",
          "offending_value": "2017-08-25T16:41:00"
        }
      ],
      "check": {
        "grace_period": {
          "seconds": 600,
          "class": "datetime.timedelta"
        },
        "last_send": null,
        "description": "detects when requests are not handled",
        "severity": "warning",
        "user_threshold": {
          "3": {
            "max": 10,
            "metric": "request.count",
            "steps": null,
            "description": "Number of handled requests is lower than",
            "min": 0
          },
          "4": {
            "max": null,
            "metric": "request.count",
            "steps": null,
            "description": "No response for at least",
            "min": 60
          },
          "5": {
            "max": null,
            "metric": "response.time",
            "steps": null,
            "description": "Response time is higher than",
            "min": 500
          }
        },
        "id": 2,
        "name": "geonode is not working"
      }
    }
  ],
  "success": true
}

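A response that reports errors contains a list of check elements in the data element. Each check element contains:

  • check - the serialized NotificationCheck object that was used

  • problems - the list of failed metric checks. Each element contains the metric name, severity, error message, measured value, and threshold value.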
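As an illustration of how a client might consume this structure, the following sketch flattens a status response into one line per failed metric check. The function name summarize_status is hypothetical, and the sample body is a shortened version of the response shown above.

```python
import json


def summarize_status(body):
    """Return one human-readable line per problem in a status response."""
    payload = json.loads(body)
    lines = []
    for item in payload.get("data", []):
        check_name = item["check"]["name"]
        for problem in item["problems"]:
            lines.append(
                f'{check_name}: [{problem["severity"]}] '
                f'{problem["name"]}: {problem["message"]}'
            )
    return lines


# Shortened sample taken from the error-reporting response above.
sample_body = json.dumps({
    "status": "ok",
    "data": [{
        "problems": [{
            "threshold_value": "2017-08-29T10:45:26.142",
            "message": "Value collected too far in the past",
            "name": "request.count",
            "severity": "warning",
            "offending_value": "2017-08-25T16:41:00",
        }],
        "check": {"id": 2, "name": "geonode is not working"},
    }],
    "success": True,
})
summary = summarize_status(sample_body)
```

An empty data list, as in the error-free response, simply yields an empty summary.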

Severity

Severity is a textual description of the potential impact of an error. There are three values: warning, error, and fatal.

Notification list

This call returns the list of available notifications:

GET /monitoring/api/notifications/

{
  "status": "ok",
  "data": {
    "problems": [
      {
        "threshold_value": "10.0000",
        "check_url": "/monitoring/api/notifications/config/2/",
        "name": "request.count",
        "check_id": 2,
        "description": "Metric value for request.count should be at least 10, got 4 instead",
        "offending_value": "4.0000",
        "message": "Number of handled requests is lower than 4",
        "severity": "error"
      }
    ],
    "health_level": "error"
  },
  "success": true
}

The response contains a list of notification summaries under the data key. Each element has:

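  • name is the name of the checked metric.

  • message is the error message generated by the notification. It describes what the problem is.

  • description gives more detailed information about which check failed.

  • offending_value and threshold_value are the values that were compared (offending_value is the actual value from the metric data).

  • check_url is the URL of the notification details.

  • severity is the severity of the error.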

Also, data carries the highest severity present among the problems in health_level.
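A client that wants to recompute or verify health_level could do so along these lines. This is only a sketch: the ranking warning < error < fatal is assumed from the order in which the severities are listed, and returning None when there are no problems is an assumption, since the text does not say what the server reports in that case.

```python
# Assumed ranking, lowest impact first.
SEVERITY_ORDER = ["warning", "error", "fatal"]


def health_level(problems):
    """Return the highest severity among problems, or None if there are none."""
    severities = [problem["severity"] for problem in problems]
    if not severities:
        return None  # assumption: the no-problem value is not documented
    return max(severities, key=SEVERITY_ORDER.index)


level = health_level([
    {"name": "request.count", "severity": "error"},
    {"name": "response.time", "severity": "warning"},
])
```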

Notification details

This returns the details of a notification, including the form and the list of allowed fields:

GET /monitoring/api/notifications/config/{{notification_id}}/

{
  "status": "ok",
  "errors": {},
  "data": {
    "fields": [
      {
        "is_enabled": true,
        "use_resource": false,
        "description": "Number of handled requests is lower than",
        "max_value": "10.0000",
        "metric": {
          "class": "geonode.contrib.monitoring.models.Metric",
          "name": "request.count",
          "id": 2
        },
        "min_value": "0.0000",
        "use_label": false,
        "use_ows_service": false,
        "field_option": "min_value",
        "use_service": false,
        "steps_calculated": [
          "0.0000",
          "3.33",
          "6.67",
          "10.0"
        ],
        "current_value": "30.0000",
        "steps": 3,
        "notification_check": {
          "class": "geonode.contrib.monitoring.models.NotificationCheck",
          "name": "geonode is not working",
          "id": 2
        },
        "field_name": "request.count.min_value",
        "id": 3,
        "unit": ""
      },
      {
        "is_enabled": true,
        "use_resource": false,
        "description": "No response for at least",
        "max_value": null,
        "metric": {
          "class": "geonode.contrib.monitoring.models.Metric",
          "name": "request.count",
          "id": 2
        },
        "min_value": "60.0000",
        "use_label": false,
        "use_ows_service": false,
        "field_option": "max_timeout",
        "use_service": false,
        "steps_calculated": null,
        "current_value": {
          "seconds": 120,
          "class": "datetime.timedelta"
        },
        "steps": null,
        "notification_check": {
          "class": "geonode.contrib.monitoring.models.NotificationCheck",
          "name": "geonode is not working",
          "id": 2
        },
        "field_name": "request.count.max_timeout",
        "id": 4,
        "unit": ""
      },
      {
        "is_enabled": false,
        "use_resource": false,
        "description": "Response time is higher than",
        "max_value": null,
        "metric": {
          "class": "geonode.contrib.monitoring.models.Metric",
          "name": "response.time",
          "id": 11
        },
        "min_value": "500.0000",
        "use_label": false,
        "use_ows_service": false,
        "field_option": "max_value",
        "use_service": false,
        "steps_calculated": null,
        "current_value": null,
        "steps": null,
        "notification_check": {
          "class": "geonode.contrib.monitoring.models.NotificationCheck",
          "name": "geonode is not working",
          "id": 2
        },
        "field_name": "response.time.max_value",
        "id": 5,
        "unit": "s"
      },
      {
        "is_enabled": false,
        "use_resource": false,
        "description": "dsfdsf",
        "max_value": null,
        "metric": {
          "class": "geonode.contrib.monitoring.models.Metric",
          "name": "response.time",
          "id": 11
        },
        "min_value": null,
        "use_label": false,
        "use_ows_service": false,
        "field_option": "min_value",
        "use_service": false,
        "steps_calculated": null,
        "current_value": null,
        "steps": null,
        "notification_check": {
          "class": "geonode.contrib.monitoring.models.NotificationCheck",
          "name": "geonode is not working",
          "id": 2
        },
        "field_name": "response.time.min_value",
        "id": 6,
        "unit": "s"
      },
      {
        "is_enabled": true,
        "use_resource": false,
        "description": "Incoming traffic should be higher than",
        "max_value": null,
        "metric": {
          "class": "geonode.contrib.monitoring.models.Metric",
          "name": "network.in.rate",
          "id": 34
        },
        "min_value": null,
        "use_label": false,
        "use_ows_service": false,
        "field_option": "min_value",
        "use_service": false,
        "steps_calculated": null,
        "current_value": "10000000.0000",
        "steps": null,
        "notification_check": {
          "class": "geonode.contrib.monitoring.models.NotificationCheck",
          "name": "geonode is not working",
          "id": 2
        },
        "field_name": "network.in.rate.min_value",
        "id": 7,
        "unit": "B/s"
      }
    ],
    "form": "<tr><th><label for=\"id_emails\">Emails:</label></th><td><textarea cols=\"40\" id=\"id_emails\" name=\"emails\" rows=\"10\">\r\n\nad@m.in</textarea></td></tr>\n<tr><th><label for=\"id_severity\">Severity:</label></th><td><select id=\"id_severity\" name=\"severity\">\n<option value=\"warning\">Warning</option>\n<option value=\"error\" selected=\"selected\">Error</option>\n<option value=\"fatal\">Fatal</option>\n</select></td></tr>\n<tr><th><label for=\"id_active\">Active:</label></th><td><input checked=\"checked\" id=\"id_active\" name=\"active\" type=\"checkbox\" /></td></tr>\n<tr><th><label for=\"id_grace_period\">Grace period:</label></th><td><input id=\"id_grace_period\" name=\"grace_period\" type=\"text\" value=\"00:01:00\" /></td></tr>\n<tr><th><label for=\"id_request.count.min_value\">Request.count.min value:</label></th><td><select id=\"id_request.count.min_value\" name=\"request.count.min_value\">\n<option value=\"0.0000\">0.0000</option>\n<option value=\"3.33\">3.33</option>\n<option value=\"6.67\">6.67</option>\n<option value=\"10.0\">10.0</option>\n</select></td></tr>\n<tr><th><label for=\"id_request.count.max_timeout\">Request.count.max timeout:</label></th><td><input id=\"id_request.count.max_timeout\" min=\"60.0000\" name=\"request.count.max_timeout\" step=\"0.01\" type=\"number\" /></td></tr>\n<tr><th><label for=\"id_response.time.max_value\">Response.time.max value:</label></th><td><input id=\"id_response.time.max_value\" min=\"500.0000\" name=\"response.time.max_value\" step=\"0.01\" type=\"number\" /></td></tr>\n<tr><th><label for=\"id_response.time.min_value\">Response.time.min value:</label></th><td><input id=\"id_response.time.min_value\" name=\"response.time.min_value\" step=\"0.01\" type=\"number\" /></td></tr>\n<tr><th><label for=\"id_network.in.rate.min_value\">Network.in.rate.min value:</label></th><td><input id=\"id_network.in.rate.min_value\" name=\"network.in.rate.min_value\" step=\"0.01\" type=\"number\" /></td></tr>",
    "notification": {
      "grace_period": {
        "seconds": 60,
        "class": "datetime.timedelta"
      },
      "last_send": "2017-09-04T13:13:15.203",
      "description": "detects when requests are not handled",
      "severity": "error",
      "user_threshold": {
        "request.count.max_timeout": {
          "max": null,
          "metric": "request.count",
          "steps": null,
          "description": "No response for at least",
          "min": 60
        },
        "response.time.max_value": {
          "max": null,
          "metric": "response.time",
          "steps": null,
          "description": "Response time is higher than",
          "min": 500
        },
        "request.count.min_value": {
          "max": 10,
          "metric": "request.count",
          "steps": 3,
          "description": "Number of handled requests is lower than",
          "min": 0
        }
      },
      "active": true,
      "id": 2,
      "name": "geonode is not working"
    }
  },
  "success": true
}

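Keys returned in the data element:

  • fields - the list of form fields, including detailed per-resource configuration flags

  • form - the rendered user form, which can be displayed as is

  • notification - the serialized notification object with its user_thresholds list (the basis from which the fields objects are created)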

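The frontend should use the fields list to build the whole form on the client side:

  • The field name is stored in field_name.

  • The field label can be built from description.

  • The unit can be taken from the unit field.

  • If a field defines steps_calculated, it should be used to build a select/drop-down list; otherwise a text input should be shown. If possible, validation should take min_value and max_value into account.

  • The currently set value is available in the current_value field.

  • Each field has an is_enabled property, which indicates whether the field is enabled. Currently this value is computed as follows: the field is enabled if current_value is not None. This may change in the future.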

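In addition, each notification configuration accepts an emails field. This field should be sent as a list of emails joined with the newline character (\n).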

The form should be submitted to the same URL the configuration was fetched from (/monitoring/api/notifications/config/{{id}}/); see below.
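The field-to-widget rules above can be sketched as a small mapping function. The returned dictionary layout and the name widget_for are hypothetical; only the input keys (field_name, description, unit, steps_calculated, min_value, max_value, current_value, is_enabled) come from the response shown earlier.

```python
def widget_for(field):
    """Describe the form widget a frontend could render for one field."""
    widget = {
        "name": field["field_name"],
        "label": field["description"],
        "unit": field["unit"],
        "value": field["current_value"],
        "enabled": field["is_enabled"],
    }
    if field["steps_calculated"]:
        # Pre-computed steps: offer a select/drop-down list.
        widget["type"] = "select"
        widget["options"] = list(field["steps_calculated"])
    else:
        # No steps: plain text input, validated against min/max when present.
        widget["type"] = "text"
        widget["min"] = field["min_value"]
        widget["max"] = field["max_value"]
    return widget


# Two fields shortened from the notification details response.
select_widget = widget_for({
    "field_name": "request.count.min_value",
    "description": "Number of handled requests is lower than",
    "unit": "",
    "steps_calculated": ["0.0000", "3.33", "6.67", "10.0"],
    "min_value": "0.0000",
    "max_value": "10.0000",
    "current_value": "30.0000",
    "is_enabled": True,
})
text_widget = widget_for({
    "field_name": "response.time.max_value",
    "description": "Response time is higher than",
    "unit": "s",
    "steps_calculated": None,
    "min_value": "500.0000",
    "max_value": None,
    "current_value": None,
    "is_enabled": False,
})
```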

Notification editing (by user)

The following API call allows a user to configure a notification by setting receivers and adjusting check thresholds:

POST /monitoring/api/notifications/config/{{notification_check_id}}/

request.count.max_value=val
request.count.min_value=1
emails=list of emails

The response contains the serialized NotificationCheck in the data element if no errors were caught during form processing:

{
  "status": "ok",
  "errors": {},
  "data": {
    "grace_period": {
      "seconds": 600,
      "class": "datetime.timedelta"
    },
    "last_send": null,
    "description": "more test",
    "severity": "error",
    "user_threshold": {
      "request.count.max_value": {
        "max": null,
        "metric": "request.count",
        "steps": null,
        "description": "Max number of request",
        "min": 1000
      },
      "request.count.min_value": {
        "max": 100,
        "metric": "request.count",
        "steps": null,
        "description": "Min number of request",
        "min": 0
      }
    },
    "id": 293,
    "name": "test"
  },
  "success": true
}

An error (non-200) response will have the errors key populated:

{
  "status": "error",
  "errors": {
    "user_threshold": [
      "This field is required."
    ],
    "name": [
      "This field is required."
    ],
    "description": [
      "This field is required."
    ]
  },
  "data": [],
  "success": false
}
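A client-side sketch of this exchange, using only the Python standard library, might look like the following. It builds the form-encoded request body and extracts validation errors from a response; the helper names are hypothetical, and sending the request (including any authentication the server requires) is left out.

```python
import json
from urllib.parse import parse_qs, urlencode


def build_config_body(thresholds, emails):
    """Form-encode threshold fields plus the newline-joined emails field."""
    fields = dict(thresholds)
    fields["emails"] = "\n".join(emails)
    return urlencode(fields)


def form_errors(response_body):
    """Return the errors mapping of a response, or {} when it succeeded."""
    payload = json.loads(response_body)
    if payload.get("success"):
        return {}
    return payload.get("errors", {})


body = build_config_body(
    {"request.count.min_value": 1},
    ["email1@test.com", "email2@test.com"],
)
decoded = parse_qs(body)  # round-trip to show what the server would receive

failed = form_errors(json.dumps({
    "status": "error",
    "errors": {"name": ["This field is required."]},
    "data": [],
    "success": False,
}))
```

The body would be posted to /monitoring/api/notifications/config/{{notification_check_id}}/ with a form-encoded content type.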

Notification creation

This API call allows a new notification to be created. Its form layout differs from the one used for editing:

POST /monitoring/api/notifications/

name=Name of notification (geonode doesn't work)
description=This will check if geonode is serving any data
emails=
user_thresholds=
severity=

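Payload elements:

  • name and description are values visible to the user.

  • severity is the severity value.

  • emails is a list of emails; however, it is encoded as a single string in which each email is on its own line: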

    email1@test.com
    email2@test.com
    
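  • user_thresholds is a JSON-encoded list of per-metric configurations for each check. Each element of the list should be a 10-element list containing:

    • the metric name

    • the field check option (one of three values: min_value, max_value, max_timeout)

    • a flag indicating whether the metric check can use a service

    • a flag indicating whether the metric check can use a resource

    • a flag indicating whether the metric check can use a label

    • a flag indicating whether the metric check can use an OWS service

    • the minimum value for user input (if None, no minimum check is made)

    • the maximum value for user input (if None, no maximum check is made)

    • the steps count, which is the number of steps to generate for user input, so that the user can pick a value from a select list instead of typing it. This works only when both the minimum and maximum values are provided. In the user_thresholds example below, the tenth element of each row is the description text: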

      [
          ('request.count', 'min_value', False, False, False, False, 0, 100, None, "Min number of request"),
          ('request.count', 'max_value', False, False, False, False, 1000, None, None, "Max number of request"),
      ]
      

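The response is a serialized NotificationCheck wrapped in the standard response envelope (status, errors, and so on). The actual data is under the data key. If processing fails, for example because of form validation errors, the response will be non-200 OK and the errors key will be populated.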

{
  "status": "ok",
  "errors": {},
  "data": {
    "grace_period": {
      "seconds": 600,
      "class": "datetime.timedelta"
    },
    "last_send": null,
    "description": "more test",
    "user_threshold": {
      "request.count.max_value": {
        "max": 100,
        "metric": "request.count",
        "steps": null,
        "description": "Min number of request",
        "min": 0
      },
      "request.count.min_value": {
        "max": null,
        "metric": "request.count",
        "steps": null,
        "description": "Max number of request",
        "min": 1000
      }
    },
    "id": 257,
    "name": "test"
  },
  "success": true
}
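Putting the payload elements together, a creation request body could be assembled roughly as below. This is a sketch under assumptions: the helper name build_creation_body is hypothetical, and it assumes the server accepts standard JSON literals (false, null) in user_thresholds where the example above shows the Python forms False and None.

```python
import json
from urllib.parse import parse_qs, urlencode


def build_creation_body(name, description, severity, emails, thresholds):
    """Form-encode the payload for POST /monitoring/api/notifications/."""
    for row in thresholds:
        if len(row) != 10:
            raise ValueError("each user_thresholds row needs 10 elements")
    return urlencode({
        "name": name,
        "description": description,
        "severity": severity,
        "emails": "\n".join(emails),                # one email per line
        "user_thresholds": json.dumps(thresholds),  # JSON-encoded list
    })


rows = [
    ("request.count", "min_value", False, False, False, False,
     0, 100, None, "Min number of request"),
    ("request.count", "max_value", False, False, False, False,
     1000, None, None, "Max number of request"),
]
body = build_creation_body(
    "test", "more test", "error",
    ["email1@test.com", "email2@test.com"], rows,
)
# Decode again to inspect what would be sent.
sent = parse_qs(body)
sent_thresholds = json.loads(sent["user_thresholds"][0])
```

Validating the row length on the client avoids a round trip for the most obvious mistake, although the server remains the authority on what it accepts.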