Decision Tree Regression#

In this example, we demonstrate the effect of changing the maximum depth of a decision tree on how it fits the data. We perform this once on a 1D regression task and once on a multi-output regression task.

# Authors: The scikit-learn developers
# SPDX-License-Identifier: BSD-3-Clause

Decision Tree on a 1D Regression Task#

Here we fit a tree on a 1D regression task.

The decision tree is used to fit a sine curve with additional noisy observations. As a result, it learns local linear regressions approximating the sine curve.

We can see that if the maximum depth of the tree (controlled by the max_depth parameter) is set too high, the decision tree learns overly fine details of the training data and learns from the noise, i.e. it overfits.
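A quick way to see the mechanics: a binary tree with max_depth=d has at most 2**d leaves, and each leaf predicts the mean of the training targets that fall into it. The following sketch is not part of the original example (X_demo and y_demo are made-up illustration data); it counts the distinct values a depth-2 tree can output:

import numpy as np
from sklearn.tree import DecisionTreeRegressor

# made-up illustration data: 50 noiseless points on a sine curve
rng_demo = np.random.RandomState(0)
X_demo = rng_demo.rand(50, 1)
y_demo = np.sin(4 * X_demo).ravel()

# a depth-2 tree has at most 2**2 = 4 leaves, hence at most 4 distinct outputs
tree_demo = DecisionTreeRegressor(max_depth=2).fit(X_demo, y_demo)
print(np.unique(tree_demo.predict(X_demo)).size)  # at most 4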

Create a random 1D dataset#

import numpy as np

# 80 samples drawn uniformly from [0, 5), sorted so the curve plots cleanly
rng = np.random.RandomState(1)
X = np.sort(5 * rng.rand(80, 1), axis=0)
# noiseless sine targets
y = np.sin(X).ravel()
# add noise to every 5th sample (16 of the 80 points)
y[::5] += 3 * (0.5 - rng.rand(16))

Fit regression model#

Here we fit two models with different maximum depths.

from sklearn.tree import DecisionTreeRegressor

regr_1 = DecisionTreeRegressor(max_depth=2)
regr_2 = DecisionTreeRegressor(max_depth=5)
regr_1.fit(X, y)
regr_2.fit(X, y)


Predict#

Get predictions on the test set.

# evaluate on a dense grid spanning the training range [0, 5)
X_test = np.arange(0.0, 5.0, 0.01)[:, np.newaxis]
y_1 = regr_1.predict(X_test)
y_2 = regr_2.predict(X_test)

Plot the results#

import matplotlib.pyplot as plt

plt.figure()
plt.scatter(X, y, s=20, edgecolor="black", c="darkorange", label="data")
plt.plot(X_test, y_1, color="cornflowerblue", label="max_depth=2", linewidth=2)
plt.plot(X_test, y_2, color="yellowgreen", label="max_depth=5", linewidth=2)
plt.xlabel("data")
plt.ylabel("target")
plt.title("Decision Tree Regression")
plt.legend()
plt.show()

As you can see, the model with a depth of 5 (yellow) learns the details of the training data to the point that it overfits to the noise. On the other hand, the model with a depth of 2 (blue) learns the major tendencies in the data well and does not overfit. In real use cases, you need to make sure that the tree is not overfitting the training data, which can be done with cross-validation, as sketched below.
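As a minimal sketch of that cross-validation step (not part of the original example; the depth range and fold count below are arbitrary choices), one could grid-search max_depth on the 1D data above:

from sklearn.model_selection import GridSearchCV

# try depths 2 through 10 with 5-fold cross-validation
param_grid = {"max_depth": list(range(2, 11))}
search = GridSearchCV(DecisionTreeRegressor(random_state=0), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_)  # depth with the best mean cross-validated score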

Decision Tree Regression with Multi-Output Targets#

Here the decision trees are used to simultaneously predict the noisy x and y observations of a circle given a single underlying feature. As a result, they learn local linear regressions approximating the circle.

We can see that if the maximum depth of the tree (controlled by the max_depth parameter) is set too high, the decision trees learn overly fine details of the training data and learn from the noise, i.e. they overfit.

Create a random dataset#

# 100 samples drawn uniformly from [-100, 100), sorted along the single feature
rng = np.random.RandomState(1)
X = np.sort(200 * rng.rand(100, 1) - 100, axis=0)
# two targets per sample: the x and y coordinates of points on a circle of radius pi
y = np.array([np.pi * np.sin(X).ravel(), np.pi * np.cos(X).ravel()]).T
# add noise to every 5th row (20 of the 100 samples)
y[::5, :] += 0.5 - rng.rand(20, 2)
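Note that DecisionTreeRegressor handles multi-output targets natively: fitting on a two-column y makes predict return a two-column array with one row per query sample. A minimal sketch (the demo_tree name is ours, not part of the example):

# fit on the (100, 2) target array created above and check the prediction shape
demo_tree = DecisionTreeRegressor(max_depth=3).fit(X, y)
print(demo_tree.predict(X[:3]).shape)  # (3, 2)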

Fit regression model#

regr_1 = DecisionTreeRegressor(max_depth=2)
regr_2 = DecisionTreeRegressor(max_depth=5)
regr_3 = DecisionTreeRegressor(max_depth=8)
regr_1.fit(X, y)
regr_2.fit(X, y)
regr_3.fit(X, y)


Predict#

Get predictions on the test set.

# evaluate on a dense grid spanning the training range [-100, 100)
X_test = np.arange(-100.0, 100.0, 0.01)[:, np.newaxis]
y_1 = regr_1.predict(X_test)
y_2 = regr_2.predict(X_test)
y_3 = regr_3.predict(X_test)

Plot the results#

plt.figure()
s = 25
plt.scatter(y[:, 0], y[:, 1], c="yellow", s=s, edgecolor="black", label="data")
plt.scatter(
    y_1[:, 0],
    y_1[:, 1],
    c="cornflowerblue",
    s=s,
    edgecolor="black",
    label="max_depth=2",
)
plt.scatter(y_2[:, 0], y_2[:, 1], c="red", s=s, edgecolor="black", label="max_depth=5")
plt.scatter(y_3[:, 0], y_3[:, 1], c="blue", s=s, edgecolor="black", label="max_depth=8")
plt.xlim([-6, 6])
plt.ylim([-6, 6])
plt.xlabel("target 1")
plt.ylabel("target 2")
plt.title("Multi-output Decision Tree Regression")
plt.legend(loc="best")
plt.show()

As you can see, the higher the value of max_depth, the more details of the data the model captures. However, the model also overfits the data and is influenced by the noise; a held-out validation split makes this visible, as sketched below.
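As a rough sketch of how one could quantify this (not part of the original example; the split and random_state are arbitrary choices), hold out a validation set and compare R² scores per depth; a widening train/validation gap at larger depths signals overfitting:

from sklearn.model_selection import train_test_split

# hold out 25% of the multi-output data for validation (the default test_size)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)
for depth in (2, 5, 8):
    tree = DecisionTreeRegressor(max_depth=depth).fit(X_train, y_train)
    print(depth, tree.score(X_train, y_train), tree.score(X_val, y_val))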

Total running time of the script: (0 minutes 0.227 seconds)

Related examples

Decision Tree Regression with AdaBoost

Comparing random forests and the multi-output meta estimator

Plot the decision surface of decision trees trained on the iris dataset

Plot the decision surfaces of ensembles of trees on the iris dataset

Gallery generated by Sphinx-Gallery (https://sphinx-gallery.github.io)