Incremental PCA

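Incremental principal component analysis (IPCA) is typically used as a replacement for principal component analysis (PCA) when the dataset to be decomposed is too large to fit in memory. IPCA builds a low-rank approximation of the input data using an amount of memory that is independent of the number of input samples. Memory use still depends on the number of input features, but changing the batch size gives control over it.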
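The batch-wise behaviour described above can be sketched with `IncrementalPCA.partial_fit`, which updates the model one chunk at a time. This is a minimal sketch, not part of the original example: the iris data stands in for a dataset that would not fit in memory, and the chunk size `chunk_size = 25` is a hypothetical value chosen for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import IncrementalPCA

# Stand-in for a large dataset; in practice each chunk would be read from disk.
X = load_iris().data

chunk_size = 25  # hypothetical value; each chunk must hold at least n_components samples
ipca = IncrementalPCA(n_components=2)

# Feed the estimator one chunk at a time. Only the current chunk and the
# running statistics (mean, variance, components) are needed at each step.
for start in range(0, X.shape[0], chunk_size):
    ipca.partial_fit(X[start:start + chunk_size])

# Project a few samples with the incrementally fitted model.
X_reduced = ipca.transform(X[:5])
print(X_reduced.shape)
print(ipca.n_samples_seen_)
```

A smaller chunk lowers peak memory at the cost of more update steps. By contrast, calling `fit` with `batch_size` set, as the script below does, performs the same chunking internally on an array that is already loaded.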

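This example serves as a visual check that IPCA finds a projection of the data similar to that of PCA (up to a sign flip) while processing only a few samples at a time. It can be considered a "toy example", since IPCA is intended for large datasets that do not fit in main memory and therefore require an incremental approach.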
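The "up to a sign flip" caveat can also be checked numerically on the fitted components rather than visually. The following is a rough sketch under the same settings as this example (two components, `batch_size=10`); the variable names `dots`, `signs`, and `aligned` are illustrative and not part of the original script.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA, IncrementalPCA

X = load_iris().data

pca = PCA(n_components=2).fit(X)
ipca = IncrementalPCA(n_components=2, batch_size=10).fit(X)

# Dot product between matching components (rows are unit vectors, so this is
# the cosine of the angle between them). A negative value means the axis is flipped.
dots = np.sum(pca.components_ * ipca.components_, axis=1)
signs = np.sign(dots)

# Flip the IPCA components where needed so they point the same way as PCA's.
aligned = ipca.components_ * signs[:, np.newaxis]

# Largest remaining coordinate-wise difference between the two sets of axes.
max_diff = np.abs(pca.components_ - aligned).max()
print(dots)
print(max_diff)
```

Absolute cosines close to 1 indicate that both methods recover nearly the same axes, which is why the two scatter plots look alike apart from possible mirroring.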

[Figures: "Incremental PCA of iris dataset" (mean absolute unsigned error 0.002201) and "PCA of iris dataset"]

# Authors: The scikit-learn developers
# SPDX-License-Identifier: BSD-3-Clause

import matplotlib.pyplot as plt
import numpy as np

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA, IncrementalPCA

iris = load_iris()
X = iris.data
y = iris.target

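n_components = 2

# Incremental PCA: the data is processed in mini-batches of 10 samples.
ipca = IncrementalPCA(n_components=n_components, batch_size=10)
X_ipca = ipca.fit_transform(X)

# Standard PCA on the full dataset, used as the reference projection.
pca = PCA(n_components=n_components)
X_pca = pca.fit_transform(X)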

colors = ["navy", "turquoise", "darkorange"]

for X_transformed, title in [(X_ipca, "Incremental PCA"), (X_pca, "PCA")]:
    plt.figure(figsize=(8, 8))
    for color, i, target_name in zip(colors, [0, 1, 2], iris.target_names):
        plt.scatter(
            X_transformed[y == i, 0],
            X_transformed[y == i, 1],
            color=color,
            lw=2,
            label=target_name,
        )

    if "Incremental" in title:
        # Compare absolute values so that a sign flip of a component
        # is not counted as an error.
        err = np.abs(np.abs(X_pca) - np.abs(X_ipca)).mean()
        plt.title(title + " of iris dataset\nMean absolute unsigned error %.6f" % err)
    else:
        plt.title(title + " of iris dataset")
    plt.legend(loc="best", shadow=False, scatterpoints=1)
    plt.axis([-4, 4, -1.5, 1.5])

plt.show()

Total running time of the script: (0 minutes 0.166 seconds)

Related examples

Principal Component Analysis (PCA) on Iris Dataset

Comparison of LDA and PCA 2D projection of Iris dataset

Plot the decision surface of decision trees trained on the iris dataset

Selecting dimensionality reduction with Pipeline and GridSearchCV

Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>