kmeans_plusplus

sklearn.cluster.kmeans_plusplus(X, n_clusters, *, sample_weight=None, x_squared_norms=None, random_state=None, n_local_trials=None)

Initialize n_clusters seeds according to k-means++.

Added in version 0.24.

Parameters:

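X {array-like, sparse matrix} of shape (n_samples, n_features): The data to pick seeds from.
n_clusters int: The number of centroids to initialize.
sample_weight array-like of shape (n_samples,), default=None: The weight of each observation in X. If None, all observations are assigned equal weight. sample_weight is ignored if init is a callable or a user-provided array.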

Added in version 1.3.
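x_squared_norms array-like of shape (n_samples,), default=None: Squared Euclidean norm of each data point.
random_state int or RandomState instance, default=None: Determines random number generation for centroid initialization. Pass an int for reproducible output across multiple function calls. See Glossary.
n_local_trials int, default=None: The number of seeding trials for each center (except the first). Among the candidates tried, the one that reduces inertia the most is chosen greedily. Set to None to make the number of trials depend logarithmically on the number of seeds (2+log(k)), which is the recommended setting. Setting it to 1 disables the greedy cluster selection and recovers the vanilla k-means++ algorithm, which was empirically shown to work less well than its greedy variant.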
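
The optional parameters above can be combined as in the following sketch. The toy data, the variable names, and the truncation of 2+log(k) to an integer are assumptions made for illustration; only the keyword arguments themselves come from the signature above.

```python
import numpy as np
from sklearn.cluster import kmeans_plusplus

rng = np.random.RandomState(0)
# Toy data (illustrative only): three well-separated 2-D blobs.
X = np.vstack([
    rng.normal(loc=(0.0, 0.0), scale=0.5, size=(50, 2)),
    rng.normal(loc=(5.0, 5.0), scale=0.5, size=(50, 2)),
    rng.normal(loc=(0.0, 5.0), scale=0.5, size=(50, 2)),
])
k = 3

# Squared Euclidean norm of each row, computed once and passed in.
x_squared_norms = (X ** 2).sum(axis=1)

# Trial count suggested by the 2 + log(k) rule.
# Assumption: the value is truncated to an integer.
default_trials = 2 + int(np.log(k))

# Greedy variant: n_local_trials is left at None (the recommended setting).
centers_greedy, idx_greedy = kmeans_plusplus(
    X, n_clusters=k, x_squared_norms=x_squared_norms, random_state=0
)

# Vanilla k-means++: a single trial per center disables the greedy step.
centers_vanilla, idx_vanilla = kmeans_plusplus(
    X, n_clusters=k, random_state=0, n_local_trials=1
)

print(default_trials, centers_greedy.shape, centers_vanilla.shape)
```

Passing x_squared_norms is only a convenience when the norms are already available; leaving it at None lets the function compute them.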

Returns:

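centers ndarray of shape (n_clusters, n_features): The initial centers for k-means.
indices ndarray of shape (n_clusters,): The index location of the chosen centers in the data array X. For a given index and center, X[index] = center.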
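
The relationship between the two return values can be checked directly. The following is a small sketch on made-up data; the variable rows_match is introduced here purely for illustration.

```python
import numpy as np
from sklearn.cluster import kmeans_plusplus

X = np.array([[1.0, 2.0], [1.0, 4.0], [1.0, 0.0],
              [10.0, 2.0], [10.0, 4.0], [10.0, 0.0]])
centers, indices = kmeans_plusplus(X, n_clusters=2, random_state=0)

# Each returned center is a row of X, located at the matching index.
rows_match = np.array_equal(X[indices], centers)
print(centers.shape, indices.shape, rows_match)
```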

Notes

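Selects initial cluster centers for k-means clustering in a smart way to speed up convergence. See: Arthur, D. and Vassilvitskii, S. "k-means++: the advantages of careful seeding". ACM-SIAM Symposium on Discrete Algorithms. 2007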
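
To give a rough idea of the seeding rule, the sketch below implements the plain (non-greedy) variant with NumPy: the first center is drawn uniformly, and each further center is drawn with probability proportional to its squared distance from the nearest center chosen so far. The helper d2_seeding is hypothetical and is not scikit-learn's implementation; it ignores sample weights, sparse input, and multiple local trials.

```python
import numpy as np

def d2_seeding(X, k, seed=0):
    """Hypothetical helper: plain k-means++ seeding, one trial per center.

    Assumes X holds at least k distinct rows, so the sampling
    probabilities never sum to zero.
    """
    rng = np.random.default_rng(seed)
    n_samples = X.shape[0]
    # First center: chosen uniformly at random.
    chosen = [int(rng.integers(n_samples))]
    # Squared distance from every point to its nearest chosen center.
    closest_sq = ((X - X[chosen[0]]) ** 2).sum(axis=1)
    for _ in range(1, k):
        # Sample the next center with probability proportional to D(x)^2.
        probs = closest_sq / closest_sq.sum()
        idx = int(rng.choice(n_samples, p=probs))
        chosen.append(idx)
        # Update the nearest-center distances with the new center.
        new_sq = ((X - X[idx]) ** 2).sum(axis=1)
        closest_sq = np.minimum(closest_sq, new_sq)
    indices = np.array(chosen)
    return X[indices], indices

X = np.array([[1.0, 2.0], [1.0, 4.0], [1.0, 0.0],
              [10.0, 2.0], [10.0, 4.0], [10.0, 0.0]])
centers, indices = d2_seeding(X, k=3, seed=0)
print(indices)
```

Because an already chosen point has zero distance to itself, it has zero probability of being drawn again, so the selected indices are distinct.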

Examples

>>> from sklearn.cluster import kmeans_plusplus
>>> import numpy as np
>>> X = np.array([[1, 2], [1, 4], [1, 0],
...               [10, 2], [10, 4], [10, 0]])
>>> centers, indices = kmeans_plusplus(X, n_clusters=2, random_state=0)
>>> centers
array([[10,  2],
       [ 1,  0]])
>>> indices
array([3, 2])
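
One possible follow-up, shown here as a sketch rather than a prescribed workflow, is to pass the seeds to KMeans as explicit initial centers. The choice of n_init=1 is an assumption that fits a fixed initialization, since repeated runs from the same starting centers would give the same result.

```python
import numpy as np
from sklearn.cluster import KMeans, kmeans_plusplus

X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]], dtype=float)
centers, indices = kmeans_plusplus(X, n_clusters=2, random_state=0)

# Use the seeds as the starting centers; a single run is enough
# because the initialization is fixed.
km = KMeans(n_clusters=2, init=centers, n_init=1).fit(X)
print(km.cluster_centers_)
print(km.labels_)
```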