CalibratedClassifierCV

class sklearn.calibration.CalibratedClassifierCV(estimator=None, *, method='sigmoid', cv=None, n_jobs=None, ensemble='auto')[source]

Probability calibration with isotonic regression or logistic regression.

This class uses cross-validation to both estimate the parameters of a classifier and subsequently calibrate a classifier. With ensemble=True (which the default ensemble='auto' selects unless the estimator is a FrozenEstimator), for each cv split it fits a copy of the base estimator to the training subset, and calibrates it using the testing subset. For prediction, predicted probabilities are averaged across these individual calibrated classifiers. When ensemble=False, cross-validation is used to obtain unbiased predictions, via cross_val_predict, which are then used for calibration. For prediction, the base estimator, trained using all the data, is used. This is the prediction method implemented when probability=True for SVC and NuSVC estimators (see User Guide for details).

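Already fitted classifiers can be calibrated by wrapping the model in a FrozenEstimator. In this case all provided data is used for calibration. The user has to take care manually that the data used for model fitting and for calibration are disjoint.

The calibration is based on the decision_function method of the estimator if it exists, else on predict_proba.

Read more in the User Guide. To learn more about the CalibratedClassifierCV class, see the following calibration examples: Probability calibration of classifiers, Probability Calibration curves, and Probability Calibration for 3-class classification.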

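Parameters:

estimator : estimator instance, default=None

The classifier whose output needs to be calibrated to provide more accurate predict_proba outputs. The default classifier is a LinearSVC.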

Added in version 1.2.

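method : {'sigmoid', 'isotonic'}, default='sigmoid'

The method to use for calibration. Can be 'sigmoid', which corresponds to Platt's method (i.e. a logistic regression model), or 'isotonic', which is a non-parametric approach. Isotonic calibration is not advised when there are too few calibration samples (<<1000), since it tends to overfit.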
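The two calibration methods can be compared on held-out data. The following is a minimal sketch, not a benchmark: the synthetic dataset, its size, the random seeds, the GaussianNB base classifier, and the use of the Brier score as the comparison metric are all assumptions chosen for illustration.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic data; sizes and seeds are arbitrary illustrative choices.
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

brier = {}
for method in ("sigmoid", "isotonic"):
    clf = CalibratedClassifierCV(GaussianNB(), method=method, cv=5)
    clf.fit(X_train, y_train)
    # Probability of the positive class on data not seen during fitting.
    proba_pos = clf.predict_proba(X_test)[:, 1]
    brier[method] = brier_score_loss(y_test, proba_pos)

# Lower Brier score means better calibrated (and sharper) probabilities.
print(brier)
```

Which method scores better depends on the data and the number of calibration samples, so the printed values should be read as one data point rather than a general ranking.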

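cv : int, cross-validation generator, or iterable, default=None

Determines the cross-validation splitting strategy. Possible inputs for cv are:

None, to use the default 5-fold cross-validation,
integer, to specify the number of folds.
CV splitter,
An iterable yielding (train, test) splits as arrays of indices.

For integer/None inputs, if y is binary or multiclass, StratifiedKFold is used. If y is neither binary nor multiclass, KFold is used.

Refer to the User Guide for the various cross-validation strategies that can be used here.

Changed in version 0.22: cv default value if None changed from 3-fold to 5-fold.

Changed in version 1.6: "prefit" is deprecated. Use FrozenEstimator instead.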
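The accepted forms of cv can be tried side by side. This is a small sketch under assumed settings: the dataset, the GaussianNB base classifier, and the fold counts are illustrative choices, and ensemble=True is set explicitly so that one classifier and calibrator pair is kept per fold.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=200, n_features=4, random_state=0)

splitter = StratifiedKFold(n_splits=4, shuffle=True, random_state=0)
cv_options = {
    "default": None,                         # 5-fold cross-validation
    "int": 3,                                # number of folds
    "splitter": splitter,                    # CV splitter object
    "iterable": list(splitter.split(X, y)),  # explicit (train, test) index arrays
}

n_pairs = {}
for name, cv in cv_options.items():
    clf = CalibratedClassifierCV(GaussianNB(), cv=cv, ensemble=True).fit(X, y)
    # One fitted (classifier, calibrator) pair per cross-validation split.
    n_pairs[name] = len(clf.calibrated_classifiers_)

print(n_pairs)
```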

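n_jobs : int, default=None

Number of jobs to run in parallel. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors.

Base estimator clones are fitted in parallel across cross-validation iterations. Therefore parallelism happens only when cv != "prefit".

See Glossary for more details.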

Added in version 0.24.

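ensemble : bool or "auto", default="auto"

Determines how the calibrator is fitted.

"auto" will use False if the estimator is a FrozenEstimator, and True otherwise.

If True, the estimator is fitted using training data and calibrated using testing data, for each cv fold. The final estimator is an ensemble of n_cv fitted classifier and calibrator pairs, where n_cv is the number of cross-validation folds. The output is the average predicted probabilities of all pairs.

If False, cv is used to compute unbiased predictions, via cross_val_predict, which are then used for calibration. At prediction time, the classifier used is the estimator trained on all the data. Note that this method is also internally implemented in sklearn.svm estimators with the probability=True parameter.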

Added in version 0.24.

Changed in version 1.6: "auto" option is added and is the default.
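The difference between the two ensemble settings can be checked directly. The sketch below assumes a synthetic dataset and a GaussianNB base classifier; it also assumes that each element of calibrated_classifiers_ exposes predict_proba, which is how the averaging is reproduced by hand here and should be treated as an implementation detail rather than a documented guarantee.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=200, n_features=4, random_state=0)

ensemble_clf = CalibratedClassifierCV(GaussianNB(), cv=4, ensemble=True).fit(X, y)
single_clf = CalibratedClassifierCV(GaussianNB(), cv=4, ensemble=False).fit(X, y)

# ensemble=True keeps one pair per fold; ensemble=False keeps a single pair.
n_ensemble = len(ensemble_clf.calibrated_classifiers_)
n_single = len(single_clf.calibrated_classifiers_)

# Reproduce the ensemble output as the mean of the per-pair probabilities.
per_pair = [pair.predict_proba(X) for pair in ensemble_clf.calibrated_classifiers_]
manual_mean = np.mean(per_pair, axis=0)
matches = np.allclose(manual_mean, ensemble_clf.predict_proba(X))

print(n_ensemble, n_single, matches)
```

ensemble=True costs n_cv fitted models at prediction time, while ensemble=False predicts with a single model trained on all the data.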

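Attributes:

classes_ : ndarray of shape (n_classes,)

The class labels.

n_features_in_ : int

Number of features seen during fit. Only defined if the underlying estimator exposes such an attribute when fit.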

Added in version 0.24.

feature_names_in_ : ndarray of shape (n_features_in_,)

Names of features seen during fit. Only defined if the underlying estimator exposes such an attribute when fit.

Added in version 1.0.

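calibrated_classifiers_ : list (len() equal to cv or 1 if ensemble=False)

The list of classifier and calibrator pairs.

When ensemble=True, n_cv fitted estimator and calibrator pairs. n_cv is the number of cross-validation folds.
When ensemble=False, the estimator, fitted on all the data, and the fitted calibrator.

Changed in version 0.24: Single calibrated classifier case when ensemble=False.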

See also

calibration_curve: Compute true and predicted probabilities for a calibration curve.

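References

[1]

Obtaining calibrated probability estimates from decision trees and naive Bayesian classifiers, B. Zadrozny & C. Elkan, ICML 2001

[2]

Transforming Classifier Scores into Accurate Multiclass Probability Estimates, B. Zadrozny & C. Elkan, KDD 2002

[3]

Probabilistic Outputs for Support Vector Machines and Comparisons to Regularized Likelihood Methods, J. Platt, 1999

[4]

Predicting Good Probabilities with Supervised Learning, A. Niculescu-Mizil & R. Caruana, ICML 2005

Examples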

>>> from sklearn.datasets import make_classification
>>> from sklearn.naive_bayes import GaussianNB
>>> from sklearn.calibration import CalibratedClassifierCV
>>> X, y = make_classification(n_samples=100, n_features=2,
...                            n_redundant=0, random_state=42)
>>> base_clf = GaussianNB()
>>> calibrated_clf = CalibratedClassifierCV(base_clf, cv=3)
>>> calibrated_clf.fit(X, y)
CalibratedClassifierCV(...)
>>> len(calibrated_clf.calibrated_classifiers_)
3
>>> calibrated_clf.predict_proba(X)[:5, :]
array([[0.110..., 0.889...],
       [0.072..., 0.927...],
       [0.928..., 0.071...],
       [0.928..., 0.071...],
       [0.071..., 0.928...]])
>>> from sklearn.model_selection import train_test_split
>>> X, y = make_classification(n_samples=100, n_features=2,
...                            n_redundant=0, random_state=42)
>>> X_train, X_calib, y_train, y_calib = train_test_split(
...        X, y, random_state=42
... )
>>> base_clf = GaussianNB()
>>> base_clf.fit(X_train, y_train)
GaussianNB()
>>> from sklearn.frozen import FrozenEstimator
>>> calibrated_clf = CalibratedClassifierCV(FrozenEstimator(base_clf))
>>> calibrated_clf.fit(X_calib, y_calib)
CalibratedClassifierCV(...)
>>> len(calibrated_clf.calibrated_classifiers_)
1
>>> calibrated_clf.predict_proba([[-0.5, 0.5]])
array([[0.936..., 0.063...]])

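fit(X, y, sample_weight=None, **fit_params)[source]

Fit the calibrated model.

Parameters:

X : array-like of shape (n_samples, n_features): Training data.
y : array-like of shape (n_samples,): Target values.
sample_weight : array-like of shape (n_samples,), default=None: Sample weights. If None, then samples are equally weighted.
**fit_params : dict: Parameters to pass to the fit method of the underlying classifier.

Returns:

self : object: Returns an instance of self.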
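A call to fit with per-sample weights might look as follows. This is a sketch only: the random weights, the dataset, and the GaussianNB base classifier (chosen because its fit accepts sample_weight) are assumptions for illustration.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=200, n_features=4, random_state=0)

# Hypothetical per-sample weights; in practice they come from the problem at hand.
rng = np.random.default_rng(0)
sample_weight = rng.uniform(0.5, 2.0, size=y.shape[0])

clf = CalibratedClassifierCV(GaussianNB(), cv=3)
fitted = clf.fit(X, y, sample_weight=sample_weight)

# fit returns the estimator itself, so calls can be chained.
print(fitted is clf, len(clf.calibrated_classifiers_))
```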

get_metadata_routing()[source]

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:

routing : MetadataRouter: A MetadataRouter encapsulating routing information.

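get_params(deep=True)[source]

Get parameters for this estimator.

Parameters:

deep : bool, default=True: If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params : dict: Parameter names mapped to their values.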

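predict(X)[source]

Predict the target of new samples.

The predicted class is the class that has the highest probability, and can thus be different from the prediction of the uncalibrated classifier.

Parameters:

X : array-like of shape (n_samples, n_features): The samples, as accepted by estimator.predict.

Returns:

C : ndarray of shape (n_samples,): The predicted class.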
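The relationship between predict and the calibrated probabilities can be verified with a short check. The sketch assumes a synthetic binary dataset and a GaussianNB base classifier; the count of disagreements with the uncalibrated model is printed for information only and may well be zero.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=300, n_features=6, random_state=0)

clf = CalibratedClassifierCV(GaussianNB(), cv=3).fit(X, y)

pred = clf.predict(X)
proba = clf.predict_proba(X)

# The predicted class is the one with the highest calibrated probability.
argmax_pred = clf.classes_[np.argmax(proba, axis=1)]
same_as_argmax = np.array_equal(pred, argmax_pred)

# Predictions of an uncalibrated model fitted on the same data may differ.
uncalibrated_pred = GaussianNB().fit(X, y).predict(X)
n_disagreements = int(np.sum(pred != uncalibrated_pred))

print(same_as_argmax, n_disagreements)
```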

predict_proba(X)[source]

Calibrated probabilities of classification.

This function returns calibrated probabilities of classification according to each class on an array of test vectors X.

Parameters:

X : array-like of shape (n_samples, n_features): The samples, as accepted by estimator.predict_proba.

Returns:

C : ndarray of shape (n_samples, n_classes): The predicted probabilities.
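For a multiclass problem, the returned array has one column per class. The sketch below uses the iris dataset and a GaussianNB base classifier, both chosen here purely for illustration.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)  # 150 samples, 3 classes

clf = CalibratedClassifierCV(GaussianNB(), cv=5).fit(X, y)
proba = clf.predict_proba(X)

# Columns follow the order of clf.classes_; each row is a probability vector.
row_sums = proba.sum(axis=1)
print(proba.shape, clf.classes_)
```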

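score(X, y, sample_weight=None)[source]

Return the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy, which is a harsh metric since it requires that each label set be correctly predicted for each sample.

Parameters:

X : array-like of shape (n_samples, n_features): Test samples.
y : array-like of shape (n_samples,) or (n_samples, n_outputs): True labels for X.
sample_weight : array-like of shape (n_samples,), default=None: Sample weights.

Returns:

score : float: Mean accuracy of self.predict(X) w.r.t. y.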
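As a quick sanity check, score can be compared with accuracy_score applied to the output of predict. The dataset, split, and GaussianNB base classifier below are illustrative assumptions.

```python
import math
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=400, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = CalibratedClassifierCV(GaussianNB(), cv=3).fit(X_train, y_train)

# score reports accuracy of the calibrated predictions, not calibration quality.
mean_accuracy = clf.score(X_test, y_test)
reference = accuracy_score(y_test, clf.predict(X_test))
print(mean_accuracy, reference)
```

Because score measures accuracy only, a metric such as the Brier score or a calibration curve is better suited to judging the quality of the probabilities themselves.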

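set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → CalibratedClassifierCV[source]

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Notes

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED: Metadata routing for sample_weight parameter in fit.

Returns:

self : object: The updated object.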

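set_params(**params)[source]

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.

Parameters:

**params : dict: Estimator parameters.

Returns:

self : estimator instance: Estimator instance.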

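set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → CalibratedClassifierCV[source]

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Notes

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED: Metadata routing for sample_weight parameter in score.

Returns:

self : object: The updated object.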