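Examples of Using FrozenEstimator

This example showcases some use cases of FrozenEstimator.

FrozenEstimator is a utility class that freezes an already fitted estimator. This is useful, for instance, when we want to pass a fitted estimator to a meta-estimator such as FixedThresholdClassifier without letting the meta-estimator refit it.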

# Authors: The scikit-learn developers
# SPDX-License-Identifier: BSD-3-Clause

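Setting a decision threshold for a pre-fitted classifier

Fitted classifiers in scikit-learn use an arbitrary decision threshold to decide which class a given sample belongs to. The decision threshold is either 0.0 on the value returned by decision_function, or 0.5 on the probability returned by predict_proba.

However, one might want to set a custom decision threshold. We can do this by using FixedThresholdClassifier and wrapping the classifier with FrozenEstimator.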
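As a quick check of these default thresholds, the sketch below compares predict with manual thresholding for a binary LogisticRegression. The variable names are illustrative, and the strict `>` comparison is an assumption about how ties are handled; samples lying exactly on the boundary could differ, although that is very unlikely with this synthetic data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X_demo, y_demo = make_classification(n_samples=200, random_state=0)
clf = LogisticRegression().fit(X_demo, y_demo)

default_pred = clf.predict(X_demo)

# Threshold of 0.0 on the decision function (assumed strict comparison).
pred_from_scores = clf.classes_[(clf.decision_function(X_demo) > 0.0).astype(int)]

# Threshold of 0.5 on the probability of the positive class.
pred_from_proba = clf.classes_[(clf.predict_proba(X_demo)[:, 1] > 0.5).astype(int)]

scores_match = np.array_equal(default_pred, pred_from_scores)
proba_match = np.array_equal(default_pred, pred_from_proba)
print(f"Matches decision_function threshold: {scores_match}")
print(f"Matches predict_proba threshold: {proba_match}")
```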

from sklearn.datasets import make_classification
from sklearn.frozen import FrozenEstimator
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import FixedThresholdClassifier, train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
classifier = LogisticRegression().fit(X_train, y_train)

print(
    "Probability estimates for three data points:\n"
    f"{classifier.predict_proba(X_test[-3:]).round(3)}"
)
print(
    "Predicted class for the same three data points:\n"
    f"{classifier.predict(X_test[-3:])}"
)

Probability estimates for three data points:
[[0.18 0.82]
 [0.29 0.71]
 [0.   1.  ]]
Predicted class for the same three data points:
[1 1 1]

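Now imagine you want to set a different decision threshold on the probability estimates. We can do this by wrapping the classifier with FrozenEstimator and passing it to FixedThresholdClassifier.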

threshold_classifier = FixedThresholdClassifier(
    estimator=FrozenEstimator(classifier), threshold=0.9
)

Note that calling fit on this FixedThresholdClassifier does not refit the underlying classifier, because the classifier is wrapped in FrozenEstimator.

Now, let's see how the predictions change with respect to the probability threshold.

print(
    "Probability estimates for three data points with FixedThresholdClassifier:\n"
    f"{threshold_classifier.predict_proba(X_test[-3:]).round(3)}"
)
print(
    "Predicted class for the same three data points with FixedThresholdClassifier:\n"
    f"{threshold_classifier.predict(X_test[-3:])}"
)

Probability estimates for three data points with FixedThresholdClassifier:
[[0.18 0.82]
 [0.29 0.71]
 [0.   1.  ]]
Predicted class for the same three data points with FixedThresholdClassifier:
[0 0 1]

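We see that the probability estimates stay the same, but since a different decision threshold is used, the predicted classes are different.

Please refer to sphx_glr_auto_examples_model_selection_plot_cost_sensitive_learning.py to learn about cost-sensitive learning and decision threshold tuning.

Calibration of a pre-fitted classifier

You can use FrozenEstimator to calibrate a pre-fitted classifier using CalibratedClassifierCV.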
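The effect of the 0.9 threshold can be reproduced by hand. The sketch below applies it to the rounded probabilities printed above; the `>=` comparison is an assumption about how ties are resolved and does not matter for these three points.

```python
import numpy as np

# Rounded probability estimates copied from the output above.
proba = np.array([[0.18, 0.82], [0.29, 0.71], [0.0, 1.0]])
threshold = 0.9

# Predict the positive class only when its probability reaches the threshold.
manual_pred = (proba[:, 1] >= threshold).astype(int)
print(manual_pred)  # [0 0 1]
```

Only the third sample, whose positive-class probability is about 1.0, clears the threshold, which matches the classes predicted by FixedThresholdClassifier.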

from sklearn.calibration import CalibratedClassifierCV
from sklearn.metrics import brier_score_loss

calibrated_classifier = CalibratedClassifierCV(
    estimator=FrozenEstimator(classifier)
).fit(X_train, y_train)

prob_pos_clf = classifier.predict_proba(X_test)[:, 1]
clf_score = brier_score_loss(y_test, prob_pos_clf)
print(f"No calibration: {clf_score:.3f}")

prob_pos_calibrated = calibrated_classifier.predict_proba(X_test)[:, 1]
calibrated_score = brier_score_loss(y_test, prob_pos_calibrated)
print(f"With calibration: {calibrated_score:.3f}")

No calibration: 0.033
With calibration: 0.032
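For reference, the Brier score used above is, in the binary case, the mean squared difference between the predicted probability of the positive class and the 0/1 outcome, so lower is better. The sketch below computes it by hand on a tiny hypothetical set of labels and probabilities that are not taken from the example.

```python
import numpy as np

# Hypothetical labels and predicted positive-class probabilities.
y_true = np.array([0, 1, 1, 0])
prob_pos = np.array([0.1, 0.9, 0.6, 0.3])

# Mean squared difference between predicted probability and actual outcome.
brier = np.mean((prob_pos - y_true) ** 2)
print(f"Brier score: {brier:.4f}")  # Brier score: 0.0675
```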

Total running time of the script: (0 minutes 0.032 seconds)

Related examples

Release Highlights for scikit-learn 1.5

Probability calibration of classifiers

Post-hoc tuning the cut-off point of decision function

Probability Calibration curves
