fowlkes_mallows_score

sklearn.metrics.fowlkes_mallows_score(labels_true, labels_pred, *, sparse=False)

Measure the similarity of two clusterings of a set of points.

Added in version 0.18.

The Fowlkes-Mallows index (FMI) is defined as the geometric mean of the pairwise precision and recall:

FMI = TP / sqrt((TP + FP) * (TP + FN))

where TP is the number of true positives (i.e. the number of pairs of points that belong to the same cluster in both labels_true and labels_pred), FP is the number of false positives (i.e. the number of pairs of points that belong to the same cluster in labels_pred but not in labels_true), and FN is the number of false negatives (i.e. the number of pairs of points that belong to the same cluster in labels_true but not in labels_pred).
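The pair-counting definition above can be checked by hand on small inputs. The following is a minimal brute-force sketch, not scikit-learn's implementation; the function name pairwise_fmi is hypothetical, and returning 0.0 when there are no true positives is an assumed convention that also avoids a division by zero.

```python
from itertools import combinations
from math import sqrt


def pairwise_fmi(labels_true, labels_pred):
    """Brute-force FMI: classify every pair of points (O(n^2) pairs)."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        if same_true and same_pred:
            tp += 1  # pair grouped together in both labelings
        elif same_pred:
            fp += 1  # pair grouped together in labels_pred only
        elif same_true:
            fn += 1  # pair grouped together in labels_true only
    # Assumed convention: no true-positive pair means a score of 0.0.
    if tp == 0:
        return 0.0
    return tp / sqrt((tp + fp) * (tp + fn))


# Partial agreement: TP = 1, FP = 2, FN = 1
print(pairwise_fmi([0, 0, 1, 1], [0, 0, 0, 1]))
```

For this input the six pairs give TP = 1, FP = 2 and FN = 1, so the score is 1 / sqrt(3 * 2), roughly 0.408.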

The score ranges from 0 to 1. A high value indicates a good similarity between the two clusterings.

Read more in the User Guide.

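Parameters:
labels_true : array-like of shape (n_samples,), dtype=int

A clustering of the data into disjoint subsets.

labels_pred : array-like of shape (n_samples,), dtype=int

A clustering of the data into disjoint subsets.

sparse : bool, default=False

Compute the contingency matrix internally with a sparse matrix.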

Returns:
score : float

The resulting Fowlkes-Mallows score.
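The sparse parameter refers to a contingency matrix, whose entry (i, j) counts the points with true label i and predicted label j. As an illustration of why such a table is enough, the sketch below derives the same pair counts from cell, row, and column totals instead of enumerating pairs. It is an assumption-laden sketch rather than the library's code: contingency_fmi is a hypothetical name, plain Counter objects stand in for the matrix, and the 0.0 fallback is an assumed convention.

```python
from collections import Counter
from math import comb, sqrt


def contingency_fmi(labels_true, labels_pred):
    """FMI from contingency counts, without enumerating pairs."""
    # Cell counts n_ij: points with true label i and predicted label j.
    cells = Counter(zip(labels_true, labels_pred))
    rows = Counter(labels_true)  # sizes of the true clusters
    cols = Counter(labels_pred)  # sizes of the predicted clusters

    # A group of n points contributes comb(n, 2) co-clustered pairs.
    tp = sum(comb(n, 2) for n in cells.values())     # TP
    tp_fn = sum(comb(n, 2) for n in rows.values())   # TP + FN
    tp_fp = sum(comb(n, 2) for n in cols.values())   # TP + FP

    # Assumed convention: no true-positive pair means a score of 0.0.
    if tp == 0:
        return 0.0
    return tp / sqrt(tp_fp * tp_fn)


print(contingency_fmi([0, 0, 1, 1], [0, 0, 0, 1]))
```

Because only the nonzero cells are visited, the work scales with the number of distinct (true, predicted) label combinations rather than with the number of pairs of points.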

References

[1]

E. B. Fowlkes and C. L. Mallows, 1983. "A method for comparing two hierarchical clusterings". Journal of the American Statistical Association <https://www.tandfonline.com/doi/abs/10.1080/01621459.1983.10478008>

[2]

Wikipedia entry for the Fowlkes-Mallows Index <https://en.wikipedia.org/wiki/Fowlkes-Mallows_index>

Examples

Perfect labelings are both homogeneous and complete, hence have a score of 1.0::

>>> from sklearn.metrics.cluster import fowlkes_mallows_score
>>> fowlkes_mallows_score([0, 0, 1, 1], [0, 0, 1, 1])
np.float64(1.0)
>>> fowlkes_mallows_score([0, 0, 1, 1], [1, 1, 0, 0])
np.float64(1.0)

If class members are completely split across different clusters, the assignment is totally random, hence the FMI is 0::

>>> fowlkes_mallows_score([0, 0, 0, 0], [0, 1, 2, 3])
0.0