Machine Learning¶

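Python has a vast number of libraries for data analysis, statistics, and machine learning, which makes it the language of choice for many data scientists.

Some widely used packages for machine learning and other data science applications are listed below.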

SciPy Stack¶

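The SciPy stack consists of a set of core helper packages used in data science for statistical analysis and data visualization. Because of its breadth of functionality and ease of use, the stack is considered a must-have for most data science applications.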

The core packages of the stack include NumPy, the SciPy library, Matplotlib, IPython, pandas, and SymPy.

The stack also comes bundled with Python, but Python has been excluded from the list above.
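As a rough illustration of the kind of statistical analysis the stack supports, the sketch below draws a synthetic sample with NumPy and runs a one-sample t-test with the SciPy library. The sample parameters (mean 5.0, standard deviation 2.0, 1000 points) and the seed are arbitrary assumptions chosen for the example, not values from this guide.

```python
import numpy as np
from scipy import stats

# Draw a reproducible synthetic sample (parameters are illustrative assumptions)
rng = np.random.default_rng(0)
sample = rng.normal(loc=5.0, scale=2.0, size=1000)

# Basic descriptive statistics with NumPy
mean = sample.mean()
std = sample.std(ddof=1)  # sample standard deviation

# One-sample t-test with SciPy: is the population mean plausibly 5.0?
result = stats.ttest_1samp(sample, popmean=5.0)

print("mean:", mean)
print("std:", std)
print("p-value:", result.pvalue)
```

Plotting the same sample with Matplotlib would be the natural next step for visualization; it is left out here to keep the sketch short.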

Installation¶

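To install the full stack or individual packages, refer to the instructions given here.

NB: Anaconda is highly recommended, because it installs and maintains data science packages seamlessly.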

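scikit-learn¶

Scikit-learn is a free and open source machine learning library for Python. It offers ready-made functions that implement many algorithms, such as linear regression, classifiers, support vector machines, k-means, and neural networks. It also ships with a few sample datasets that can be used directly for training and testing.

Because of its speed, robustness, and ease of use, it is one of the most widely used libraries for machine learning applications.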
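To give a feel for these ready-made algorithms and bundled datasets, here is a minimal sketch that clusters the bundled Iris data with k-means. The choice of three clusters and the `n_init` and `random_state` values are assumptions made for illustration only.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

# Load one of the sample datasets that ships with scikit-learn
iris = load_iris()

# Group the samples into three clusters (an illustrative choice)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(iris.data)

print(cluster_ids[:10])               # cluster assignment of the first 10 samples
print(kmeans.cluster_centers_.shape)  # one center per cluster, one column per feature
```

Most scikit-learn estimators follow the same pattern: construct the object, then call `fit` (or `fit_predict`) on the data.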

Installation¶

Through PyPI:

pip install -U scikit-learn

Through conda:

conda install scikit-learn

Scikit-learn also ships with Anaconda (mentioned above). For more installation instructions, refer to this link.
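One simple way to check that the installation worked, whichever route was used, is to import the package and print its version. This is only a quick sanity check, not part of the official installation steps.

```python
import sklearn

# If the import succeeds, scikit-learn is installed in the current environment
installed_version = sklearn.__version__
print("scikit-learn version:", installed_version)
```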

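Example¶

For this example, we train a simple classifier on the Iris dataset, which comes bundled with scikit-learn.

The dataset records four features of each flower: sepal length, sepal width, petal length, and petal width. Each flower belongs to one of three species (labels): setosa, versicolor, or virginica. The labels are represented as numbers in the dataset: 0 (setosa), 1 (versicolor), and 2 (virginica).

We shuffle the Iris dataset and divide it into separate training and testing sets, keeping the last 10 data points for testing and the rest for training. We then train the classifier on the training set and make predictions on the testing set.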
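The features and label encoding described above can be inspected directly on the loaded dataset. The sketch below is illustrative only; the `label_to_name` dictionary is a helper introduced here, not something provided by scikit-learn.

```python
from sklearn.datasets import load_iris

iris = load_iris()

print(iris.feature_names)  # names of the four measured features
print(iris.target_names)   # names of the three species
print(iris.data.shape)     # (number of samples, number of features)

# Hypothetical helper: map each numeric label to its species name
label_to_name = {i: str(name) for i, name in enumerate(iris.target_names)}
print(label_to_name)
```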

from sklearn.datasets import load_iris
from sklearn import tree
from sklearn.metrics import accuracy_score
import numpy as np

# load the Iris dataset
iris = load_iris()

x = iris.data    # array of feature values
y = iris.target  # array of labels (i.e. answers) for each data entry

# label names, i.e. the three flower species
y_names = iris.target_names

# shuffle the indices so that the train/test split is random
shuffled_ids = np.random.permutation(len(x))

# split data and labels into train and test sets,
# keeping the last 10 shuffled entries for testing and the rest for training
x_train = x[shuffled_ids[:-10]]
x_test = x[shuffled_ids[-10:]]

y_train = y[shuffled_ids[:-10]]
y_test = y[shuffled_ids[-10:]]

# classify using a decision tree
clf = tree.DecisionTreeClassifier()

# train (fit) the classifier on the training set
clf.fit(x_train, y_train)

# predict labels for the test set
pred = clf.predict(x_test)

print(pred)    # predicted labels, i.e. flower species
print(y_test)  # actual labels
print(accuracy_score(y_test, pred) * 100)  # prediction accuracy in percent

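Because the split is random and the classifier is retrained on every run, the accuracy may vary from run to run. Running the code above gives, for example: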

[0 1 1 1 0 2 0 2 2 2]
[0 1 1 1 0 2 0 2 2 2]
100.0

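The first line contains the labels (i.e. flower species) that our classifier predicted for the testing data, and the second line contains the actual flower species given in the dataset. This time we therefore get an accuracy of 100%.

For more on scikit-learn, read the documentation.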
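Since the result depends on a random split, a fixed seed makes runs repeatable. The following is a minimal sketch of the same experiment using scikit-learn's `train_test_split` helper instead of manual index shuffling; the seed value 42 is an arbitrary assumption, and this variant is an alternative to the approach shown above, not a replacement prescribed by this guide.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()

# Hold out 10 samples for testing; a fixed random_state makes the split repeatable
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=10, shuffle=True, random_state=42
)

# Fix the tree's own randomness as well so repeated runs give the same model
clf = DecisionTreeClassifier(random_state=42)
clf.fit(x_train, y_train)

pred = clf.predict(x_test)
accuracy = accuracy_score(y_test, pred) * 100
print(pred)
print(y_test)
print(accuracy)
```

With only 10 test points, each misclassification changes the accuracy by 10 percentage points, so a single run is a coarse estimate of how well the classifier generalizes.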
