adjusted_mutual_info_score

sklearn.metrics.adjusted_mutual_info_score(labels_true, labels_pred, *, average_method='arithmetic')

Adjusted Mutual Information between two clusterings.

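Adjusted Mutual Information (AMI) is an adjustment of the Mutual Information (MI) score to account for chance. It accounts for the fact that the MI is generally higher for two clusterings with a larger number of clusters, regardless of whether there is actually more information shared. For two clusterings \(U\) and \(V\), the AMI is given as: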

\[
\mathrm{AMI}(U, V) = \frac{\mathrm{MI}(U, V) - E[\mathrm{MI}(U, V)]}{\operatorname{avg}(H(U), H(V)) - E[\mathrm{MI}(U, V)]}
\]
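To make the formula concrete, here is a minimal pure-Python sketch that evaluates each term directly. It computes the observed MI from the contingency table. It computes the expected MI, E(MI(U, V)), under the permutation (hypergeometric) model with the cluster sizes held fixed. It normalizes with the arithmetic mean of the two entropies. The function name `ami_from_formula` is hypothetical and is not part of scikit-learn. The sketch also omits the library's optimizations and its special handling of degenerate inputs: for two single-cluster labelings the denominator is zero, and this sketch raises `ZeroDivisionError` where scikit-learn returns 1.0.

```python
from collections import Counter
from math import exp, lgamma, log


def _entropy(counts, n):
    # Shannon entropy (in nats) of a labeling, given its cluster sizes
    return -sum(c / n * log(c / n) for c in counts if c > 0)


def ami_from_formula(labels_true, labels_pred):
    """Hypothetical helper: arithmetic-mean AMI evaluated term by term (sketch)."""
    n = len(labels_true)
    a = Counter(labels_true)  # cluster sizes a_i of U
    b = Counter(labels_pred)  # cluster sizes b_j of V
    joint = Counter(zip(labels_true, labels_pred))  # contingency table n_ij

    # Observed mutual information MI(U, V)
    mi = sum(nij / n * log(n * nij / (a[u] * b[v])) for (u, v), nij in joint.items())

    # Expected mutual information E[MI(U, V)]: sum over every cell (i, j) and every
    # feasible overlap n_ij, weighted by its hypergeometric probability
    emi = 0.0
    for ai in a.values():
        for bj in b.values():
            for nij in range(max(1, ai + bj - n), min(ai, bj) + 1):
                log_p = (
                    lgamma(ai + 1) + lgamma(bj + 1)
                    + lgamma(n - ai + 1) + lgamma(n - bj + 1)
                    - lgamma(n + 1) - lgamma(nij + 1)
                    - lgamma(ai - nij + 1) - lgamma(bj - nij + 1)
                    - lgamma(n - ai - bj + nij + 1)
                )
                emi += nij / n * log(n * nij / (ai * bj)) * exp(log_p)

    # Arithmetic mean of the entropies, i.e. average_method='arithmetic'
    normalizer = (_entropy(a.values(), n) + _entropy(b.values(), n)) / 2
    return (mi - emi) / (normalizer - emi)


print(ami_from_formula([0, 0, 0, 0, 1, 1, 1, 1], [0, 0, 1, 1, 2, 2, 3, 3]))
```

Take the final example, where each of two classes of four samples is split into two clusters. Here MI = ln 2, E[MI] = (3/7) ln 2, and the arithmetic normalizer is 1.5 ln 2, so the sketch should return 8/15 ≈ 0.533. On non-degenerate inputs it is expected to agree with `adjusted_mutual_info_score` at its default settings, up to floating-point error. The triple loop needed for the expected-MI term also hints at why the adjusted score is more expensive to compute than plain MI.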

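This metric is independent of the absolute values of the labels: a permutation of the class or cluster label values won't change the score value in any way.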
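A quick check of this invariance: rename the predicted cluster IDs and the score stays the same, up to floating-point rounding from the reordered sums. The labels and the renaming map below are arbitrary examples.

```python
from sklearn.metrics import adjusted_mutual_info_score

labels_true = [0, 0, 1, 1, 2, 2, 2]
labels_pred = [0, 0, 1, 2, 2, 2, 1]

# Rename the predicted clusters (0 -> 7, 1 -> 3, 2 -> 5); the partition itself is unchanged
renamed = {0: 7, 1: 3, 2: 5}
labels_pred_renamed = [renamed[label] for label in labels_pred]

original = adjusted_mutual_info_score(labels_true, labels_pred)
relabeled = adjusted_mutual_info_score(labels_true, labels_pred_renamed)
print(original, relabeled)
```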

This metric is furthermore symmetric: switching \(U\) (labels_true) with \(V\) (labels_pred) will return the same score value. This can be useful to measure the agreement of two independent label assignment strategies on the same dataset when the real ground truth is not known.
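A minimal illustration of the symmetry. `strategy_a` and `strategy_b` are hypothetical outputs of two clustering methods on the same eight samples, with no ground truth available, and swapping the argument order should not change the score.

```python
from sklearn.metrics import adjusted_mutual_info_score

# Hypothetical label assignments from two clustering strategies on the same samples
strategy_a = [0, 0, 0, 1, 1, 1, 2, 2]
strategy_b = [1, 1, 0, 0, 0, 2, 2, 2]

a_vs_b = adjusted_mutual_info_score(strategy_a, strategy_b)
b_vs_a = adjusted_mutual_info_score(strategy_b, strategy_a)
print(a_vs_b, b_vs_a)
```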

Be mindful that this function is an order of magnitude slower than other metrics, such as the Adjusted Rand Index.

Read more in the User Guide.
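One way to gauge the cost gap on your own data is to time both scores on the same pair of labelings. This is only a rough sketch: the sample size, the number of clusters and the repeat count are arbitrary, and the measured ratio will depend on them as well as on the hardware and library version.

```python
import timeit

import numpy as np
from sklearn.metrics import adjusted_mutual_info_score, adjusted_rand_score

# Two random labelings of 5,000 samples into 50 clusters each (arbitrary sizes)
rng = np.random.default_rng(0)
labels_a = rng.integers(0, 50, size=5_000)
labels_b = rng.integers(0, 50, size=5_000)

n_repeats = 3
# Average wall-clock seconds per call
t_ami = timeit.timeit(
    lambda: adjusted_mutual_info_score(labels_a, labels_b), number=n_repeats
) / n_repeats
t_ari = timeit.timeit(
    lambda: adjusted_rand_score(labels_a, labels_b), number=n_repeats
) / n_repeats
print(f"AMI: {t_ami * 1e3:.2f} ms/call, ARI: {t_ari * 1e3:.2f} ms/call")
```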

Parameters:
labels_true : int array-like of shape (n_samples,)

A clustering of the data into disjoint subsets, called \(U\) in the above formula.

labels_pred : int array-like of shape (n_samples,)

A clustering of the data into disjoint subsets, called \(V\) in the above formula.

average_method : {'min', 'geometric', 'arithmetic', 'max'}, default='arithmetic'

How to compute the normalizer in the denominator.

Added in version 0.20.

Changed in version 0.22: The default value of average_method changed from 'max' to 'arithmetic'.
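The effect of `average_method` is easiest to see on a prediction that refines the ground truth, splitting each of two true classes into two clusters. There MI(U, V) equals H(U), so the `'min'` normalizer gives an AMI of 1.0. The larger normalizers (`'geometric'`, `'arithmetic'`, `'max'`) penalize the extra clusters progressively more. Passing `average_method='max'` explicitly is one way to reproduce scores computed with the pre-0.22 default. The labels are illustrative only.

```python
from sklearn.metrics import adjusted_mutual_info_score

labels_true = [0, 0, 0, 0, 1, 1, 1, 1]
# labels_pred splits each true class into two clusters (a refinement of labels_true)
labels_pred = [0, 0, 1, 1, 2, 2, 3, 3]

scores = {
    method: adjusted_mutual_info_score(labels_true, labels_pred, average_method=method)
    for method in ("min", "geometric", "arithmetic", "max")
}
for method, score in scores.items():
    print(f"{method:>10}: {score:.4f}")

# Omitting average_method uses the current default, 'arithmetic';
# average_method='max' corresponds to the pre-0.22 default.
default_score = adjusted_mutual_info_score(labels_true, labels_pred)
```

Working the formula by hand for these labels gives E[MI] = (3/7) ln 2. That yields 1.0 for `'min'`, roughly 0.580 for `'geometric'`, 8/15 ≈ 0.533 for `'arithmetic'` and 4/11 ≈ 0.364 for `'max'`.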

Returns:
ami : float (upper-bounded by 1.0)

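The AMI returns a value of 1 when the two partitions are identical (i.e., perfectly matched). Random partitions (independent labelings) have an expected AMI of around 0 on average, and the score can therefore be negative. The value is in adjusted nats (based on the natural logarithm).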
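A small simulation can illustrate the behavior on random partitions. Scoring many pairs of independently drawn labelings should give AMI values scattered around 0, some of them negative. The sample size, cluster count, number of trials and seed are arbitrary choices for this sketch.

```python
import numpy as np
from sklearn.metrics import adjusted_mutual_info_score

rng = np.random.default_rng(42)
n_samples, n_clusters, n_trials = 200, 5, 100

# Score pairs of independent, uniformly random labelings
scores = np.array([
    adjusted_mutual_info_score(
        rng.integers(0, n_clusters, size=n_samples),
        rng.integers(0, n_clusters, size=n_samples),
    )
    for _ in range(n_trials)
])
print(f"mean={scores.mean():+.4f}  min={scores.min():+.4f}  max={scores.max():+.4f}")
```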

See also

adjusted_rand_score

Adjusted Rand Index.

mutual_info_score

Mutual Information (not adjusted for chance).
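The following sketch contrasts the three scores on labelings that share no information by construction. As `labels_pred` gets more clusters, the unadjusted MI tends to grow. AMI and the Adjusted Rand Index, which both correct for chance, should stay near 0. The data sizes and seed are arbitrary.

```python
import numpy as np
from sklearn.metrics import (
    adjusted_mutual_info_score,
    adjusted_rand_score,
    mutual_info_score,
)

rng = np.random.default_rng(0)
n_samples = 500
labels_true = rng.integers(0, 5, size=n_samples)

results = {}
for n_clusters in (2, 10, 50):
    # labels_pred is drawn independently of labels_true, so no information is shared
    labels_pred = rng.integers(0, n_clusters, size=n_samples)
    results[n_clusters] = (
        mutual_info_score(labels_true, labels_pred),
        adjusted_mutual_info_score(labels_true, labels_pred),
        adjusted_rand_score(labels_true, labels_pred),
    )
    mi, ami, ari = results[n_clusters]
    print(f"{n_clusters:>2} clusters: MI={mi:.3f}  AMI={ami:+.3f}  ARI={ari:+.3f}")
```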


Examples

Perfect labelings are both homogeneous and complete, hence have score 1.0:

>>> from sklearn.metrics.cluster import adjusted_mutual_info_score
>>> adjusted_mutual_info_score([0, 0, 1, 1], [0, 0, 1, 1])
1.0
>>> adjusted_mutual_info_score([0, 0, 1, 1], [1, 1, 0, 0])
1.0

If class members are completely split across different clusters, the assignment is totally incomplete, hence the AMI is zero:

>>> adjusted_mutual_info_score([0, 0, 0, 0], [0, 1, 2, 3])
0.0