4.7. Troubleshooting CouchDB 3 with WeatherReport
4.7.1. Overview
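WeatherReport is an OTP application and a set of tools that diagnose common problems that can affect a CouchDB version 3 node or cluster (version 4 and later are not supported). It is accessed via the weatherreport command-line script.
Here is a basic example of running weatherreport, followed immediately by the command's output: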
$ weatherreport --etc /path/to/etc
[warning] Cluster member node3@127.0.0.1 is not connected to this node. Please check whether it is down.
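If you collect weatherreport output from several nodes, it can help to tally messages by severity. The following is a minimal sketch, assuming each message starts with a bracketed severity name as in the sample output above. The names LINE_RE and summarize are hypothetical helpers, not part of WeatherReport.

```python
import re
from collections import Counter

# Assumed line shape, taken from the sample output: "[severity] message".
LINE_RE = re.compile(r"^\[(\w+)\]\s+(.*)$")


def summarize(output: str) -> Counter:
    """Count weatherreport messages per severity level."""
    counts = Counter()
    for line in output.splitlines():
        match = LINE_RE.match(line.strip())
        if match:  # lines without a "[severity]" prefix are skipped
            counts[match.group(1)] += 1
    return counts


sample = (
    "[warning] Cluster member node3@127.0.0.1 is not connected to this node. "
    "Please check whether it is down.\n"
)
print(summarize(sample))
```

Lines that do not start with a bracketed severity, such as captured shell output, are ignored rather than counted.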
4.7.2. Usage
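In most cases, you can just run the weatherreport command as shown above. Sometimes, however, you may want extra detail or want to run only specific checks. Command-line options are available for that. Run weatherreport --help to learn more about these options: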
$ weatherreport --help
Usage: weatherreport [-c <path>] [-d <level>] [-e] [-h] [-l] [check_name ...]
  -c, --etc     Path to the CouchDB configuration directory
  -d, --level   Minimum message severity level (default: notice)
  -l, --list    Describe available diagnostic tasks
  -e, --expert  Perform more detailed diagnostics
  -h, --help    Display help/usage
  check_name    A specific check to run
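When weatherreport is called from a script, the options in the help text can be assembled into an argument list. The following is a minimal sketch that uses only the flags listed in the help output above. The build_command function and its parameter names are hypothetical, and weatherreport is assumed to be on the PATH.

```python
from typing import Iterable, List, Optional


def build_command(etc: Optional[str] = None,
                  level: Optional[str] = None,
                  expert: bool = False,
                  checks: Iterable[str] = ()) -> List[str]:
    """Assemble a weatherreport argument list (options first, then check names)."""
    argv = ["weatherreport"]
    if etc is not None:
        argv += ["--etc", etc]      # path to the CouchDB configuration directory
    if level is not None:
        argv += ["--level", level]  # minimum message severity level
    if expert:
        argv.append("--expert")     # more detailed diagnostics
    argv.extend(checks)             # optional names of specific checks
    return argv


print(build_command(etc="/path/to/etc", level="debug"))
```

The resulting list can be passed to subprocess.run as is, which avoids shell quoting problems with paths that contain spaces.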
To see which checks will be run, use the --list option:
$ weatherreport --list
Available diagnostic checks:
  custodian             Shard safety/liveness checks
  disk                  Data directory permissions and atime
  internal_replication  Check the number of pending internal replication jobs
  ioq                   Check the total number of active IOQ requests
  mem3_sync             Check there is a registered mem3_sync process
  membership            Cluster membership validity
  memory_use            Measure memory usage
  message_queues        Check for processes with large mailboxes
  node_stats            Check useful erlang statistics for diagnostics
  nodes_connected       Cluster node liveness
  process_calls         Check for large numbers of processes with the same current/initial call
  process_memory        Check for processes with high memory usage
  safe_to_rebuild       Check whether the node can safely be taken out of service
  search                Check the local search node is responsive
  tcp_queues            Measure the length of tcp queues in the kernel
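The listing above can be turned into a lookup table, for example to validate check names before passing them to weatherreport. The following is a minimal sketch, assuming each entry consists of a check name, whitespace, and a description, as shown above. The parse_check_list function is a hypothetical helper.

```python
def parse_check_list(output: str) -> dict:
    """Map each check name in `weatherreport --list` output to its description."""
    checks = {}
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the "Available diagnostic checks:" header.
        if not line or line.endswith(":"):
            continue
        parts = line.split(None, 1)  # name, then the rest of the line
        checks[parts[0]] = parts[1] if len(parts) > 1 else ""
    return checks


listing = """Available diagnostic checks:
  custodian             Shard safety/liveness checks
  disk                  Data directory permissions and atime
  nodes_connected       Cluster node liveness
"""
available = parse_check_list(listing)
print(sorted(available))
```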
If you want all the details about what WeatherReport is doing, you can run the checks at a more verbose logging level with the --level option:
$ weatherreport --etc /path/to/etc --level debug
[debug] Not connected to the local cluster node, trying to connect. alive:false connect_failed:undefined
[debug] Starting distributed Erlang.
[debug] Connected to local cluster node 'node1@127.0.0.1'.
[debug] Local RPC: mem3:nodes([]) [5000]
[debug] Local RPC: os:getpid([]) [5000]
[debug] Running shell command: ps -o pmem,rss -p 73905
[debug] Shell command output:
%MEM RSS
0.3 25116
[debug] Local RPC: erlang:nodes([]) [5000]
[debug] Local RPC: mem3:nodes([]) [5000]
[warning] Cluster member node3@127.0.0.1 is not connected to this node. Please check whether it is down.
[info] Process is using 0.3% of available RAM, totalling 25116 KB of real memory.
Most of the time you will want to use the default, but any syslog severity name will do (from most to least verbose): debug, info, notice, warning, error, critical, alert, emergency.
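The severity ordering can be sketched as a simple threshold test. This assumes --level acts as an inclusive minimum, which is how the help text describes it ("Minimum message severity level"). The names SEVERITIES and is_reported are hypothetical and only illustrate the ordering.

```python
# Syslog severity names, from most to least verbose.
SEVERITIES = ["debug", "info", "notice", "warning",
              "error", "critical", "alert", "emergency"]


def is_reported(severity: str, minimum: str = "notice") -> bool:
    """Return True if a message of `severity` passes the `minimum` level."""
    return SEVERITIES.index(severity) >= SEVERITIES.index(minimum)


print(is_reported("warning"))        # True: warning is above the default, notice
print(is_reported("info"))           # False: info is below notice
print(is_reported("info", "debug"))  # True once the level is lowered to debug
```

Under this assumption, the [info] and [debug] lines in the example above appear only because the run used --level debug.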
Finally, if you want to run just a single diagnostic or a list of specific ones, you can pass their names:
$ weatherreport --etc /path/to/etc nodes_connected
[warning] Cluster member node3@127.0.0.1 is not connected to this node. Please check whether it is down.
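For monitoring, the output of the nodes_connected check can be reduced to a list of unreachable nodes. The following is a minimal sketch that matches the warning wording shown in the sample output above. That wording may differ between releases, so treat the pattern as an assumption. The disconnected_nodes function is a hypothetical helper.

```python
import re

# Pattern based on the sample warning text; the exact wording is an assumption.
NOT_CONNECTED_RE = re.compile(
    r"Cluster member (\S+) is not connected to this node"
)


def disconnected_nodes(output: str) -> list:
    """Return the node names reported as not connected."""
    return NOT_CONNECTED_RE.findall(output)


sample_output = (
    "[warning] Cluster member node3@127.0.0.1 is not connected to this node. "
    "Please check whether it is down.\n"
)
print(disconnected_nodes(sample_output))
```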