4.7. Troubleshooting CouchDB 3 with WeatherReport

4.7.1. Overview

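WeatherReport is an OTP application and a set of tools that diagnose common problems which can affect a CouchDB version 3 node or cluster (version 4 and later are not supported). It is run through the weatherreport command-line script.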

Here is a basic example of running the weatherreport command, followed immediately by the command's output:

$ weatherreport --etc /path/to/etc
[warning] Cluster member node3@127.0.0.1 is not connected to this node. Please check whether it is down.

4.7.2. Usage

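In most cases, you can just run the weatherreport command as shown above. Sometimes, however, you may want extra detail or want to run only specific checks. Command-line options cover both cases. Run weatherreport --help to learn more about them: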

$ weatherreport --help
Usage: weatherreport [-c <path>] [-d <level>] [-e] [-h] [-l] [check_name ...]

  -c, --etc                 Path to the CouchDB configuration directory
  -d, --level               Minimum message severity level (default: notice)
  -l, --list                Describe available diagnostic tasks
  -e, --expert              Perform more detailed diagnostics
  -h, --help                Display help/usage
  check_name                A specific check to run

To see which checks will be run, use the --list option:

$ weatherreport --list
Available diagnostic checks:

  custodian            Shard safety/liveness checks
  disk                 Data directory permissions and atime
  internal_replication Check the number of pending internal replication jobs
  ioq                  Check the total number of active IOQ requests
  mem3_sync            Check there is a registered mem3_sync process
  membership           Cluster membership validity
  memory_use           Measure memory usage
  message_queues       Check for processes with large mailboxes
  node_stats           Check useful erlang statistics for diagnostics
  nodes_connected      Cluster node liveness
  process_calls        Check for large numbers of processes with the same current/initial call
  process_memory       Check for processes with high memory usage
  safe_to_rebuild      Check whether the node can safely be taken out of service
  search               Check the local search node is responsive
  tcp_queues           Measure the length of tcp queues in the kernel
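
The check names in this listing can be pulled out for use in scripts. The following is a minimal sketch, not part of WeatherReport itself: the helper name list_check_names is hypothetical, and it assumes the layout shown above (two leading spaces, the check name, then a description).

```shell
#!/bin/sh
# Hypothetical helper: print only the check names found in
# `weatherreport --list` output supplied on standard input.
# Assumes each check line starts with two spaces followed by the name.
list_check_names() {
    awk '/^  [a-z0-9_]+ / { print $1 }'
}
```

Assuming the listing is written to standard output, it could be used as: weatherreport --list | list_check_names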

If you want every detail about what WeatherReport is doing, run the checks at a more verbose logging level with the --level option:

$ weatherreport --etc /path/to/etc --level debug
[debug] Not connected to the local cluster node, trying to connect. alive:false connect_failed:undefined
[debug] Starting distributed Erlang.
[debug] Connected to local cluster node 'node1@127.0.0.1'.
[debug] Local RPC: mem3:nodes([]) [5000]
[debug] Local RPC: os:getpid([]) [5000]
[debug] Running shell command: ps -o pmem,rss -p 73905
[debug] Shell command output:
%MEM    RSS
0.3  25116

[debug] Local RPC: erlang:nodes([]) [5000]
[debug] Local RPC: mem3:nodes([]) [5000]
[warning] Cluster member node3@127.0.0.1 is not connected to this node. Please check whether it is down.
[info] Process is using 0.3% of available RAM, totalling 25116 KB of real memory.

Most of the time you will want the default, but any syslog severity name is accepted (listed from most to least verbose): debug, info, notice, warning, error, critical, alert, emergency.
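
The idea of a minimum severity level can be illustrated with a small filter over captured output. This is a sketch only: the function names severity_rank and filter_min_level are hypothetical, the ordering is taken from the list above, and it is not a description of how WeatherReport filters messages internally.

```shell
#!/bin/sh
# Hypothetical helper: map a syslog severity name to a rank.
# A lower rank means a more verbose level.
severity_rank() {
    case "$1" in
        debug) echo 0 ;;
        info) echo 1 ;;
        notice) echo 2 ;;
        warning) echo 3 ;;
        error) echo 4 ;;
        critical) echo 5 ;;
        alert) echo 6 ;;
        emergency) echo 7 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: read "[level] message" lines on standard input and
# print only those at or above the minimum level given as the first argument.
# Lines without a recognised "[level]" prefix are dropped.
filter_min_level() {
    min=$(severity_rank "$1") || return 1
    while IFS= read -r line; do
        level=${line#\[}        # strip the leading "["
        level=${level%%\]*}     # strip everything from the first "]"
        rank=$(severity_rank "$level") || continue
        if [ "$rank" -ge "$min" ]; then
            printf '%s\n' "$line"
        fi
    done
    return 0
}
```

For instance, a debug-level run saved to a file (here a hypothetical report.log) could be narrowed afterwards with: filter_min_level warning < report.log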

Finally, if you want to run just a single diagnostic, or a list of specific ones, pass their names:

$ weatherreport --etc /path/to/etc nodes_connected
[warning] Cluster member node3@127.0.0.1 is not connected to this node. Please check whether it is down.
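
When a specific check is run from a cron job or a monitoring script, its output can be turned into a pass/fail result. The sketch below makes two assumptions: messages arrive on standard output in the "[level] message" form shown above, and anything at warning level or higher counts as a problem. The function name no_problems is hypothetical, and the sketch deliberately does not rely on the exit status of weatherreport, which this section does not document.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit status 0) only if standard input
# contains no line at warning level or above.
no_problems() {
    ! grep -Eq '^\[(warning|error|critical|alert|emergency)\] '
}
```

A possible use: weatherreport --etc /path/to/etc nodes_connected | no_problems || echo "nodes_connected reported a problem"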