randomized_svd

sklearn.utils.extmath.randomized_svd(M, n_components, *, n_oversamples=10, n_iter='auto', power_iteration_normalizer='auto', transpose='auto', flip_sign=True, random_state=None, svd_lapack_driver='gesdd')

Compute a truncated randomized SVD.

This method solves the fixed-rank approximation problem described in [1] (problem (1.5), p. 5).

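Refer to the Wikipedia principal eigenvector example for a typical case where the power iteration algorithm is used to rank web pages. This algorithm is also known to be used as a building block in Google's PageRank algorithm.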

Parameters:

M {ndarray, sparse matrix}: Matrix to decompose.
n_components int: Number of singular values and vectors to extract.
n_oversamples int, default=10: Additional number of random vectors used to sample the range of M so as to ensure proper conditioning. The total number of random vectors used to find the range of M is n_components + n_oversamples. A smaller number can improve speed but can degrade the approximation of the singular vectors and singular values. Users might wish to increase this parameter up to 2*k - n_components, where k is the effective rank, for large matrices, noisy problems, matrices with slowly decaying spectra, or to increase accuracy. See [1] (pages 5, 23 and 26).
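n_iter int or 'auto', default='auto': Number of power iterations. It can be used to deal with very noisy problems. When 'auto', it is set to 4, unless n_components is small (< .1 * min(M.shape)), in which case n_iter is set to 7. This improves precision with few components. Note that in general users should increase n_oversamples before increasing n_iter, as the principle of the randomized method is to avoid these more costly power iteration steps. When n_components is equal to or greater than the effective matrix rank and the spectrum does not decay slowly, n_iter=0 or 1 should even work fine in theory (see [1], page 9).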

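Changed in version 0.18.
power_iteration_normalizer {'auto', 'QR', 'LU', 'none'}, default='auto': Whether the power iterations are normalized with step-by-step QR factorization (the slowest but most accurate), 'none' (the fastest but numerically unstable when n_iter is large, e.g. typically 5 or larger), or 'LU' factorization (numerically stable but can lose slightly in accuracy). The 'auto' mode applies no normalization if n_iter <= 2 and switches to LU otherwise.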

Added in version 0.18.
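transpose bool or 'auto', default='auto': Whether the algorithm should be applied to M.T instead of M. The result should be approximately the same. The 'auto' mode triggers the transposition if M.shape[1] > M.shape[0], since this implementation of randomized SVD tends to be a little faster in that case.

Changed in version 0.18.
flip_sign bool, default=True: The output of a singular value decomposition is only unique up to a permutation of the signs of the singular vectors. If flip_sign is set to True, the sign ambiguity is resolved by making the largest loadings for each component in the left singular vectors positive.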
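random_state int, RandomState instance or None, default=None: The seed of the pseudo-random number generator used when shuffling the data, i.e. when drawing the random vectors that initialize the algorithm. Pass an int for reproducible results across multiple function calls. See Glossary.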

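Changed in version 1.2: The default value changed from 0 to None.
svd_lapack_driver {"gesdd", "gesvd"}, default="gesdd": Whether to use the more efficient divide-and-conquer approach ("gesdd") or the more general rectangular approach ("gesvd") to compute the SVD of the matrix B, which is the projection of M into a low-dimensional subspace, as described in [1].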

Added in version 1.2.
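
As a rough illustration of how n_iter trades speed for accuracy, the sketch below compares approximate singular values with those of an exact SVD. The input is a dense Gaussian matrix, chosen here as an assumption because its spectrum decays slowly; the sizes, seeds and variable names are illustrative only.

```python
import numpy as np
from sklearn.utils.extmath import randomized_svd

# Illustrative data: a dense Gaussian matrix has a slowly decaying spectrum,
# which is a hard case for randomized methods.
rng = np.random.RandomState(0)
M = rng.normal(size=(200, 100))

k = 5
s_exact = np.linalg.svd(M, compute_uv=False)[:k]

# No power iterations: fastest, but least accurate on this kind of input.
_, s_fast, _ = randomized_svd(M, n_components=k, n_iter=0, random_state=0)

# Seven power iterations: more work, closer to the exact values.
_, s_iter, _ = randomized_svd(M, n_components=k, n_iter=7, random_state=0)

err_fast = np.max(np.abs(s_fast - s_exact) / s_exact)
err_iter = np.max(np.abs(s_iter - s_exact) / s_exact)
print(f"max relative error, n_iter=0: {err_fast:.3f}")
print(f"max relative error, n_iter=7: {err_iter:.3f}")
```

In line with the guidance for n_iter, raising n_oversamples is usually the cheaper first thing to try before adding power iterations.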

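Returns:

u ndarray of shape (n_samples, n_components): Unitary matrix having left singular vectors with signs flipped as columns.
s ndarray of shape (n_components,): The singular values, sorted in non-increasing order.
vh ndarray of shape (n_components, n_features): Unitary matrix having right singular vectors with signs flipped as rows.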
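
The properties of the returned arrays can be checked directly. The following sketch uses an arbitrary tall random matrix (an assumption made for illustration) and checks orthonormality, the ordering of the singular values, and the sign convention applied when flip_sign=True.

```python
import numpy as np
from sklearn.utils.extmath import randomized_svd

rng = np.random.RandomState(42)
M = rng.normal(size=(60, 20))  # illustrative input: n_samples=60, n_features=20

u, s, vh = randomized_svd(M, n_components=5, random_state=0)

# Columns of u and rows of vh are orthonormal.
cols_orthonormal = np.allclose(u.T @ u, np.eye(5))
rows_orthonormal = np.allclose(vh @ vh.T, np.eye(5))

# Singular values come back in non-increasing order.
sorted_desc = bool(np.all(np.diff(s) <= 0))

# With flip_sign=True, the largest-magnitude entry of each column of u is positive.
largest = u[np.argmax(np.abs(u), axis=0), np.arange(u.shape[1])]
signs_positive = bool(np.all(largest > 0))

print(cols_orthonormal, rows_orthonormal, sorted_desc, signs_positive)
```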

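Notes

This algorithm finds a (usually very good) approximate truncated singular value decomposition, using randomization to speed up the computations. It is particularly fast on large matrices from which you wish to extract only a small number of components. To obtain a further speed-up, n_iter can be set to <= 2 (at the cost of some precision). To increase the precision, it is recommended to increase n_oversamples, up to 2*k - n_components, where k is the effective rank. Usually, n_components is chosen to be greater than k, so increasing n_oversamples up to n_components should be enough.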
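
To make the idea concrete, here is a minimal NumPy sketch of the general randomized scheme described in [1]: sample the range of M with random vectors, optionally refine the basis with power iterations, then run an exact SVD on the small projected matrix B. It is a simplified illustration, not scikit-learn's implementation. The function name `sketch_randomized_svd` and its defaults are hypothetical, and the sketch always normalizes with QR and omits sign flipping and the transpose heuristic.

```python
import numpy as np


def sketch_randomized_svd(M, n_components, n_oversamples=10, n_iter=2, seed=0):
    """Hypothetical helper: simplified randomized SVD, for illustration only."""
    rng = np.random.default_rng(seed)
    n_random = n_components + n_oversamples

    # Sample the range of M with Gaussian random vectors.
    Q = rng.standard_normal((M.shape[1], n_random))
    Q, _ = np.linalg.qr(M @ Q)

    # Power iterations, re-orthonormalized with QR at every step.
    for _ in range(n_iter):
        Q, _ = np.linalg.qr(M.T @ Q)
        Q, _ = np.linalg.qr(M @ Q)

    # Project M onto the small subspace and decompose the projection exactly.
    B = Q.T @ M
    Uhat, s, Vh = np.linalg.svd(B, full_matrices=False)
    U = Q @ Uhat
    return U[:, :n_components], s[:n_components], Vh[:n_components]


# Illustrative input: an exactly rank-6 matrix of shape (80, 40).
rng = np.random.default_rng(1)
M = rng.standard_normal((80, 6)) @ rng.standard_normal((6, 40))

U, s, Vh = sketch_randomized_svd(M, n_components=6)
s_exact = np.linalg.svd(M, compute_uv=False)[:6]

print(np.allclose(s, s_exact))       # singular values agree with the exact SVD
print(np.allclose((U * s) @ Vh, M))  # the rank-6 factors reproduce M
```

Because the test matrix has exact rank 6 and the sketch draws 16 random vectors, the sampled subspace covers the whole range of M, which is why the agreement is essentially exact here. On noisy or full-rank inputs the result is only an approximation.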

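References

[1]

"Finding structure with randomness: Stochastic algorithms for constructing approximate matrix decompositions", Halko et al. (2009)

[2]

"A randomized algorithm for the decomposition of matrices", Per-Gunnar Martinsson, Vladimir Rokhlin and Mark Tygert

[3]

"An implementation of a randomized algorithm for principal component analysis", A. Szlam et al. (2014)

Examples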

>>> import numpy as np
>>> from sklearn.utils.extmath import randomized_svd
>>> a = np.array([[1, 2, 3, 5],
...               [3, 4, 5, 6],
...               [7, 8, 9, 10]])
>>> U, s, Vh = randomized_svd(a, n_components=2, random_state=0)
>>> U.shape, s.shape, Vh.shape
((3, 2), (2,), (2, 4))
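
As a follow-up sketch on the same small matrix, the result can be compared with NumPy's exact SVD and used to build a rank-2 approximation. Because this matrix is tiny, the random projection is expected to span its whole range, so the leading singular values should agree up to floating-point error; on large matrices the match is only approximate. The variable names below are illustrative.

```python
import numpy as np
from sklearn.utils.extmath import randomized_svd

a = np.array([[1, 2, 3, 5],
              [3, 4, 5, 6],
              [7, 8, 9, 10]])
U, s, Vh = randomized_svd(a, n_components=2, random_state=0)

# Exact singular values for comparison.
s_exact = np.linalg.svd(a, compute_uv=False)

# Rank-2 approximation built from the truncated factors.
a_rank2 = (U * s) @ Vh

# For an exact truncated SVD, the spectral-norm residual equals the
# first discarded singular value.
residual = np.linalg.norm(a - a_rank2, ord=2)

print(np.allclose(s, s_exact[:2]))
print(np.isclose(residual, s_exact[2]))
```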