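Version 0.23#

For a short description of the main highlights of the release, please refer to Release Highlights for scikit-learn 0.23.

Legend for changelogs

Major Feature Something big that you couldn't do before.
Feature Something that you couldn't do before.
Efficiency An existing feature now may not require as much computation or memory.
Enhancement A miscellaneous minor improvement.
Fix Something that previously didn't work as documented, or according to reasonable expectations, should now work.
API Change You will need to change your code to have the same effect in the future; or a feature will be removed in the future.

Version 0.23.2#

Changed models#

The following estimators and functions, when fit with the same data and parameters, may produce different models from the previous version. This often occurs due to changes in the modelling logic (bug fixes or enhancements), or in random sampling procedures.

Fix inertia_ attribute of cluster.KMeans and cluster.MiniBatchKMeans.

Details are listed in the changelog below.

(While we are trying to better inform users by providing this information, we cannot assure that this list is complete.)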

Changelog#

`sklearn.cluster`#

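Fix Fixed a bug in cluster.KMeans where rounding errors could prevent convergence from being declared when tol=0. #17959 by Jérémie du Boisberranger.
Fix Fixed a bug in cluster.KMeans and cluster.MiniBatchKMeans where the reported inertia was incorrectly weighted by the sample weights. #17848 by Jérémie du Boisberranger.
Fix Fixed a bug in cluster.MeanShift with bin_seeding=True. When the estimated bandwidth is 0, the behavior is equivalent to bin_seeding=False. #17742 by Jeremie du Boisberranger.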
Fix Fixed a bug in cluster.AffinityPropagation, that gives incorrect clusters when the array dtype is float32. #17995 by Thomaz Santana and Amanda Dsouza.
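
To make the inertia entry above concrete, here is a minimal plain-Python sketch of what a sample-weighted inertia means: the sum of squared distances from each sample to its assigned center, each scaled by that sample's weight. The function name `weighted_inertia` and the toy data are hypothetical, and this is not scikit-learn's implementation.

```python
def weighted_inertia(points, centers, labels, sample_weight):
    """Sum of weight * squared distance to the assigned center (illustrative only)."""
    total = 0.0
    for x, label, w in zip(points, labels, sample_weight):
        center = centers[label]
        sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
        total += w * sq_dist
    return total


# Toy data: two samples around the first center, one exactly on the second.
points = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
centers = [(1.0, 0.0), (10.0, 0.0)]
labels = [0, 0, 1]

unweighted = weighted_inertia(points, centers, labels, [1.0, 1.0, 1.0])
weighted = weighted_inertia(points, centers, labels, [3.0, 1.0, 1.0])
print(unweighted, weighted)
```

With unit weights the value reduces to the ordinary inertia, so a change in `inertia_` between versions should only be visible when non-uniform sample weights are passed.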

`sklearn.decomposition`#

Fix Fixed a bug in decomposition.MiniBatchDictionaryLearning.partial_fit which should update the dictionary by iterating only once over a mini-batch. #17433 by Chiara Marmo.
Fix Avoid overflows on Windows in decomposition.IncrementalPCA.partial_fit for large batch_size and n_samples values. #17985 by Alan Butler and Amanda Dsouza.

`sklearn.ensemble`#

Fix Fixed a bug in ensemble.MultinomialDeviance where the average of logloss was incorrectly calculated as the sum of logloss. #17694 by Markus Rempfler and Tsutomu Kusanagi.
Fix Fixes ensemble.StackingClassifier and ensemble.StackingRegressor compatibility with estimators that do not define n_features_in_. #17357 by Thomas Fan.

`sklearn.feature_extraction`#

Fix Fixes a bug in feature_extraction.text.CountVectorizer where sample order invariance was broken when max_features was set and features had the same count. #18016 by Thomas Fan, Roman Yurchak, and Joel Nothman.

`sklearn.linear_model`#

Fix linear_model.lars_path does not overwrite X when X_copy=True and Gram='auto'. #17914 by Thomas Fan.

`sklearn.manifold`#

Fix Fixed a bug where metrics.pairwise_distances would raise an error if metric='seuclidean' and X is not of type np.float64. #15730 by Forrest Koch.

`sklearn.metrics`#

Fix Fixed a bug in metrics.mean_squared_error where the average of multiple RMSE values was incorrectly calculated as the root of the average of multiple MSE values. #17309 by Swier Heeres.
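
The distinction in this entry is easy to miss, so here is a small plain-Python sketch of the two aggregations for a two-output problem. It is illustrative only (the helper `mse_per_output` is hypothetical and not part of scikit-learn): averaging the per-output RMSE values is not the same as taking the root of the averaged MSE.

```python
import math


def mse_per_output(y_true, y_pred):
    """Mean squared error computed separately for each output column."""
    n_samples = len(y_true)
    n_outputs = len(y_true[0])
    return [
        sum((t[j] - p[j]) ** 2 for t, p in zip(y_true, y_pred)) / n_samples
        for j in range(n_outputs)
    ]


y_true = [[0.0, 0.0], [0.0, 0.0]]
y_pred = [[1.0, 3.0], [1.0, 3.0]]

mse = mse_per_output(y_true, y_pred)                       # per-output MSE
mean_of_rmse = sum(math.sqrt(m) for m in mse) / len(mse)   # average of the RMSE values
root_of_mean_mse = math.sqrt(sum(mse) / len(mse))          # root of the averaged MSE

print(mse, mean_of_rmse, root_of_mean_mse)
```

In this toy case the per-output MSE values are 1 and 9, so the average RMSE is 2 while the root of the average MSE is the square root of 5; the two only coincide when all outputs have the same MSE.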

`sklearn.pipeline`#

Fix pipeline.FeatureUnion raises a deprecation warning when None is included in transformer_list. #17360 by Thomas Fan.

`sklearn.utils`#

Fix Fix utils.estimator_checks.check_estimator so that all test cases support the binary_only estimator tag. #17812 by Bruno Charron.

Version 0.23.1#

May 18 2020

Changelog#

`sklearn.cluster`#

Efficiency cluster.KMeans efficiency has been improved for very small datasets. In particular it cannot spawn idle threads any more. #17210 and #17235 by Jeremie du Boisberranger.
Fix Fixed a bug in cluster.KMeans where the sample weights provided by the user were modified in place. #17204 by Jeremie du Boisberranger.

Miscellaneous#

Fix Fixed a bug in the repr of third-party estimators that use a **kwargs parameter in their constructor, when changed_only is True which is now the default. #17205 by Nicolas Hug.

Version 0.23.0#

May 12 2020

Enforcing keyword-only arguments#

In an effort to promote clear and non-ambiguous use of the library, most constructor and function parameters are now expected to be passed as keyword arguments (i.e. using the param=value syntax) instead of positional. To ease the transition, a FutureWarning is raised if a keyword-only parameter is used as positional. In version 1.0 (renaming of 0.25), these parameters will be strictly keyword-only, and a TypeError will be raised. #15005 by Joel Nothman, Adrin Jalali, Thomas Fan, and Nicolas Hug. See SLEP009 for more details.
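
The transition mechanism described above can be sketched with a small decorator. This is a hedged illustration in plain Python, not scikit-learn's own helper: the names `warn_on_positional` and `make_model` are hypothetical, and the sketch only covers the simple case where extra positional arguments map onto keyword-only parameters in declaration order.

```python
import functools
import inspect
import warnings


def warn_on_positional(func):
    """Hypothetical helper: accept keyword-only parameters positionally, with a FutureWarning."""
    params = list(inspect.signature(func).parameters.values())
    positional_ok = [p.name for p in params if p.kind == p.POSITIONAL_OR_KEYWORD]
    kw_only = [p.name for p in params if p.kind == p.KEYWORD_ONLY]

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        extra = args[len(positional_ok):]
        if extra:
            # Map the surplus positional values onto keyword-only names, in order.
            names = kw_only[:len(extra)]
            warnings.warn(
                "Pass {} as keyword args.".format(", ".join(names)),
                FutureWarning,
            )
            kwargs.update(zip(names, extra))
            args = args[:len(positional_ok)]
        return func(*args, **kwargs)

    return wrapper


@warn_on_positional
def make_model(data, *, n_clusters=8, tol=1e-4):
    # Stand-in for a constructor whose options are keyword-only.
    return {"data": data, "n_clusters": n_clusters, "tol": tol}


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    legacy = make_model([1, 2, 3], 3)             # positional: warns
    modern = make_model([1, 2, 3], n_clusters=3)  # keyword: silent
```

Both calls build the same object, but only the positional one emits a FutureWarning. Once the decorator is removed, the positional call would fail with a TypeError, which mirrors the planned behavior change.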

Changed models#

The following estimators and functions, when fit with the same data and parameters, may produce different models from the previous version. This often occurs due to changes in the modelling logic (bug fixes or enhancements), or in random sampling procedures.

Fix ensemble.BaggingClassifier, ensemble.BaggingRegressor, and ensemble.IsolationForest.
Fix cluster.KMeans with algorithm="elkan" and algorithm="full".
Fix cluster.Birch
Fix compose.ColumnTransformer.get_feature_names
Fix compose.ColumnTransformer.fit
Fix datasets.make_multilabel_classification
Fix decomposition.PCA with n_components='mle'
Enhancement decomposition.NMF and decomposition.non_negative_factorization with float32 dtype input.
Fix decomposition.KernelPCA.inverse_transform
API Change ensemble.HistGradientBoostingClassifier and ensemble.HistGradientBoostingRegressor
Fix estimators_samples_ in ensemble.BaggingClassifier, ensemble.BaggingRegressor and ensemble.IsolationForest
Fix ensemble.StackingClassifier and ensemble.StackingRegressor with sample_weight
Fix gaussian_process.GaussianProcessRegressor
Fix linear_model.RANSACRegressor with sample_weight.
Fix linear_model.RidgeClassifierCV
Fix metrics.mean_squared_error with squared and multioutput='raw_values'.
Fix metrics.mutual_info_score with negative scores.
Fix metrics.confusion_matrix with zero-length y_true and y_pred
Fix neural_network.MLPClassifier
Fix preprocessing.StandardScaler with partial_fit and sparse input.
Fix preprocessing.Normalizer with norm='max'
Fix Any model using the svm.libsvm or the svm.liblinear solver, including svm.LinearSVC, svm.LinearSVR, svm.NuSVC, svm.NuSVR, svm.OneClassSVM, svm.SVC, svm.SVR, linear_model.LogisticRegression.
Fix tree.DecisionTreeClassifier, tree.ExtraTreeClassifier and ensemble.GradientBoostingClassifier as well as the predict method of tree.DecisionTreeRegressor, tree.ExtraTreeRegressor, and ensemble.GradientBoostingRegressor, with read-only float32 input in predict, decision_path and predict_proba.

Details are listed in the changelog below.

(While we are trying to better inform users by providing this information, we cannot assure that this list is complete.)

Changelog#

`sklearn.cluster`#

Efficiency cluster.Birch implementation of the predict method avoids high memory footprint by calculating the distances matrix using a chunked scheme. #16149 by Jeremie du Boisberranger and Alex Shacked.
Efficiency Major Feature The critical parts of cluster.KMeans have a more optimized implementation. Parallelism is now over the data instead of over initializations allowing better scalability. #11950 by Jeremie du Boisberranger.
Enhancement cluster.KMeans now supports sparse data when algorithm="elkan". #11950 by Jeremie du Boisberranger.
Enhancement cluster.AgglomerativeClustering has a faster and more memory-efficient implementation of single linkage clustering. #11514 by Leland McInnes.
Fix cluster.KMeans with algorithm="elkan" now converges with tol=0 as with the default algorithm="full". #16075 by Erich Schubert.
Fix Fixed a bug in cluster.Birch where the n_clusters parameter could not have a np.int64 type. #16484 by Jeremie du Boisberranger.
Fix cluster.AgglomerativeClustering add specific error when distance matrix is not square and affinity=precomputed. #16257 by Simona Maggio.
API Change The n_jobs parameter of cluster.KMeans, cluster.SpectralCoclustering and cluster.SpectralBiclustering is deprecated. They now use OpenMP-based parallelism. For more details on how to control the number of threads, please refer to our Parallelism notes. #11950 by Jeremie du Boisberranger.
API Change The precompute_distances parameter of cluster.KMeans is deprecated. It has no effect. #11950 by Jeremie du Boisberranger.
API Change The random_state parameter has been added to cluster.AffinityPropagation. #16801 by @rcwoolston and Chiara Marmo.

`sklearn.compose`#

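Efficiency compose.ColumnTransformer is now faster when working with dataframes and strings are used to specify subsets of data for transformers. #16431 by Thomas Fan.
Enhancement compose.ColumnTransformer method get_feature_names now supports 'passthrough' columns, with the feature name being either the column name for a dataframe, or 'xi' for column index i. #14048 by Lewis Ball.
Fix compose.ColumnTransformer method get_feature_names now returns correct results when one of the transformer steps applies on an empty list of columns. #15963 by Roman Yurchak.
Fix compose.ColumnTransformer.fit will error when selecting a column name that is not unique in the dataframe. #16431 by Thomas Fan.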

`sklearn.datasets`#

Efficiency datasets.fetch_openml has reduced memory usage because it no longer stores the full dataset text stream in memory. #16084 by Joel Nothman.
Feature datasets.fetch_california_housing now supports heterogeneous data using pandas by setting as_frame=True. #15950 by Stephanie Andrews and Reshama Shaikh.
Feature Embedded dataset loaders datasets.load_breast_cancer, datasets.load_diabetes, datasets.load_digits, datasets.load_iris, datasets.load_linnerud and datasets.load_wine now support loading as a pandas DataFrame by setting as_frame=True. #15980 by @wconnell and Reshama Shaikh.
Enhancement Added return_centers parameter in datasets.make_blobs, which can be used to return centers for each cluster. #15709 by @shivamgargsya and Venkatachalam N.
Enhancement Functions datasets.make_circles and datasets.make_moons now accept a two-element tuple. #15707 by Maciej J Mikulski.
Fix datasets.make_multilabel_classification now generates ValueError for arguments n_classes < 1 OR length < 1. #16006 by Rushabh Vasani.
API Change The StreamHandler was removed from sklearn.logger to avoid double logging of messages in common cases where a handler is attached to the root logger, and to follow the Python logging documentation recommendation for libraries to leave the log message handling to users and application code. #16451 by Christoph Deil.

`sklearn.decomposition`#

Enhancement decomposition.NMF and decomposition.non_negative_factorization now preserve float32 dtype. #16280 by Jeremie du Boisberranger.
Enhancement decomposition.TruncatedSVD.transform is now faster on given sparse csc matrices. #16837 by @wornbb.
Fix decomposition.PCA with a float n_components parameter will exclusively choose the components that explain the variance greater than n_components. #15669 by Krishna Chaitanya
Fix decomposition.PCA with n_components='mle' now correctly handles small eigenvalues, and does not infer 0 as the correct number of components. #16224 by Lisa Schwetlick, and Gelavizh Ahmadi and Marija Vlajic Wheeler and #16841 by Nicolas Hug.
Fix decomposition.KernelPCA method inverse_transform now applies the correct inverse transform to the transformed data. #16655 by Lewis Ball.
Fix Fixed a bug that was causing decomposition.KernelPCA to sometimes raise invalid value encountered in multiply during fit. #16718 by Gui Miotto.
Feature Added n_components_ attribute to decomposition.SparsePCA and decomposition.MiniBatchSparsePCA. #16981 by Mateusz Górski.
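
As a rough illustration of the float n_components entry for decomposition.PCA above, the selection rule can be read as "keep the smallest number of components whose cumulative explained variance ratio is strictly greater than the requested fraction". The plain-Python sketch below follows that reading; the function name `n_components_for_variance` is hypothetical and the code is not scikit-learn's implementation.

```python
def n_components_for_variance(explained_variance_ratio, threshold):
    """Smallest k whose cumulative explained variance ratio strictly exceeds threshold."""
    cumulative = 0.0
    for k, ratio in enumerate(explained_variance_ratio, start=1):
        cumulative += ratio
        if cumulative > threshold:
            return k
    # Threshold never exceeded: fall back to keeping every component.
    return len(explained_variance_ratio)


# Ratios sorted in decreasing order; cumulative sums are 0.5, 0.75, 0.875, 1.0.
ratios = [0.5, 0.25, 0.125, 0.125]

print(n_components_for_variance(ratios, 0.7))   # two components reach 0.75 > 0.7
print(n_components_for_variance(ratios, 0.75))  # 0.75 is not strictly greater, so a third is kept
```

The second call shows the edge case the entry is about: when the cumulative ratio lands exactly on the requested value, one more component is kept.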

`sklearn.ensemble`#

Major Feature ensemble.HistGradientBoostingClassifier and ensemble.HistGradientBoostingRegressor now support sample_weight. #14696 by Adrin Jalali and Nicolas Hug.
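Feature Early stopping in ensemble.HistGradientBoostingClassifier and ensemble.HistGradientBoostingRegressor is now determined with a new early_stopping parameter instead of n_iter_no_change. The default value is 'auto', which enables early stopping if there are at least 10,000 samples in the training set. #14516 by Johann Faouzi.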
Major Feature ensemble.HistGradientBoostingClassifier and ensemble.HistGradientBoostingRegressor now support monotonic constraints, useful when features are supposed to have a positive/negative effect on the target. #15582 by Nicolas Hug.
API Change Added boolean verbose flag to classes: ensemble.VotingClassifier and ensemble.VotingRegressor. #16069 by Sam Bail, Hanna Bruce MacDonald, Reshama Shaikh, and Chiara Marmo.
API Change Fixed a bug in ensemble.HistGradientBoostingClassifier and ensemble.HistGradientBoostingRegressor that would not respect the max_leaf_nodes parameter if the criteria was reached at the same time as the max_depth criteria. #16183 by Nicolas Hug.
Fix Changed the convention for the max_depth parameter of ensemble.HistGradientBoostingClassifier and ensemble.HistGradientBoostingRegressor. The depth now corresponds to the number of edges to go from the root to the deepest leaf. Stumps (trees with one split) are now allowed. #16182 by Santhosh B
Fix Fixed a bug in ensemble.BaggingClassifier, ensemble.BaggingRegressor and ensemble.IsolationForest where the attribute estimators_samples_ did not generate the proper indices used during fit. #16437 by Jin-Hwan CHO.
Fix Fixed a bug in ensemble.StackingClassifier and ensemble.StackingRegressor where the sample_weight argument was not being passed to cross_val_predict when evaluating the base estimators on cross-validation folds to obtain the input to the meta estimator. #16539 by Bill DeRose.
Feature Added additional option loss="poisson" to ensemble.HistGradientBoostingRegressor, which adds Poisson deviance with log-link useful for modeling count data. #16692 by Christian Lorentzen
Fix Fixed a bug where ensemble.HistGradientBoostingRegressor and ensemble.HistGradientBoostingClassifier would fail with multiple calls to fit when warm_start=True, early_stopping=True, and there is no validation set. #16663 by Thomas Fan.
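
The early_stopping entry in this list describes a three-valued setting ('auto', True, False). A minimal sketch of how such a setting could be resolved is shown below. The helper `resolve_early_stopping` is hypothetical, and the exact comparison at the 10,000-sample boundary is an assumption taken from the "at least 10,000" wording rather than from the library's source.

```python
def resolve_early_stopping(early_stopping, n_samples, threshold=10_000):
    """Decide whether to early-stop, given 'auto', True or False (illustrative only)."""
    if early_stopping == "auto":
        # Assumption: 'auto' switches early stopping on for large training sets.
        return n_samples >= threshold
    return bool(early_stopping)


print(resolve_early_stopping("auto", 50_000))  # large training set
print(resolve_early_stopping("auto", 500))     # small training set
print(resolve_early_stopping(False, 50_000))   # explicit opt-out always wins
```

An explicit True or False overrides the size-based rule, which is why code that relied on n_iter_no_change to toggle early stopping may need updating.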

`sklearn.feature_extraction`#

Efficiency feature_extraction.text.CountVectorizer now sorts features after pruning them by document frequency. This improves performance for datasets with large vocabularies combined with min_df or max_df. #15834 by Santiago M. Mola.

`sklearn.feature_selection`#

Enhancement Added support for multioutput data in feature_selection.RFE and feature_selection.RFECV. #16103 by Divyaprabha M.
API Change Adds feature_selection.SelectorMixin back to public API. #16132 by @trimeta.

`sklearn.gaussian_process`#

Enhancement gaussian_process.kernels.Matern returns the RBF kernel when nu=np.inf. #15503 by Sam Dixon.
Fix Fixed a bug in gaussian_process.GaussianProcessRegressor that caused predicted standard deviations to only be between 0 and 1 when WhiteKernel is not used. #15782 by @plgreenLIRU.

`sklearn.impute`#

Enhancement impute.IterativeImputer accepts both scalar and array-like inputs for max_value and min_value. Array-like inputs allow a different max and min to be specified for each feature. #16403 by Narendra Mukherjee.
Enhancement impute.SimpleImputer, impute.KNNImputer, and impute.IterativeImputer accept pandas' nullable integer dtype with missing values. #16508 by Thomas Fan.

`sklearn.inspection`#

Feature inspection.partial_dependence and inspection.plot_partial_dependence now support the fast 'recursion' method for ensemble.RandomForestRegressor and tree.DecisionTreeRegressor. #15864 by Nicolas Hug.

`sklearn.linear_model`#

Major Feature Added generalized linear models (GLM) with non-normal error distributions, including linear_model.PoissonRegressor, linear_model.GammaRegressor and linear_model.TweedieRegressor which use Poisson, Gamma and Tweedie distributions respectively. #14300 by Christian Lorentzen, Roman Yurchak, and Olivier Grisel.
Major Feature Support of sample_weight in linear_model.ElasticNet and linear_model.Lasso for dense feature matrix X. #15436 by Christian Lorentzen.
Efficiency linear_model.RidgeCV and linear_model.RidgeClassifierCV no longer allocate a potentially large array to store dual coefficients for all hyperparameters during fit, nor an array to store all error or LOO predictions unless store_cv_values is True. #15652 by Jérôme Dockès.
Enhancement linear_model.LassoLars and linear_model.Lars now support a jitter parameter that adds random noise to the target. This might help with stability in some edge cases. #15179 by @angelaambroz.
Fix Fixed a bug where if a sample_weight parameter was passed to the fit method of linear_model.RANSACRegressor, it would not be passed to the wrapped base_estimator during the fitting of the final model. #15773 by Jeremy Alexandre.
Fix Add best_score_ attribute to linear_model.RidgeCV and linear_model.RidgeClassifierCV. #15655 by Jérôme Dockès.
Fix Fixed a bug in linear_model.RidgeClassifierCV to pass a specific scoring strategy. Previously, the internal estimator output scores instead of predictions. #14848 by Venkatachalam N.
Fix linear_model.LogisticRegression will now avoid an unnecessary iteration when solver='newton-cg' by checking whether the maximum of absgrad is less than or equal to tol, rather than strictly less than, in utils.optimize._newton_cg. #16266 by Rushabh Vasani.
API Change Deprecated public attributes standard_coef_, standard_intercept_, average_coef_, and average_intercept_ in linear_model.SGDClassifier, linear_model.SGDRegressor, linear_model.PassiveAggressiveClassifier, linear_model.PassiveAggressiveRegressor. #16261 by Carlos Brandt.
Fix Efficiency linear_model.ARDRegression is more stable and much faster when n_samples > n_features. It can now scale to hundreds of thousands of samples. The stability fix might imply changes in the number of non-zero coefficients and in the predicted output. #16849 by Nicolas Hug.
Fix Fixed a bug in linear_model.ElasticNetCV, linear_model.MultiTaskElasticNetCV, linear_model.LassoCV and linear_model.MultiTaskLassoCV where fitting would fail when using the joblib loky backend. #14264 by Jérémie du Boisberranger.
Efficiency Speed up linear_model.MultiTaskLasso, linear_model.MultiTaskLassoCV, linear_model.MultiTaskElasticNet, linear_model.MultiTaskElasticNetCV by avoiding slower BLAS Level 2 calls on small arrays. #17021 by Alex Gramfort and Mathurin Massias.
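
For readers new to the Poisson GLM mentioned in this list, the quantity such a model is usually judged by is the mean Poisson deviance. The plain-Python sketch below uses the standard textbook definition, 2 * (y * log(y / mu) - y + mu) averaged over samples, with the log term taken as 0 when y is 0. The function name is chosen for illustration and the code is not scikit-learn's implementation.

```python
import math


def mean_poisson_deviance(y_true, y_pred):
    """Average Poisson deviance between observed counts and predicted means."""
    total = 0.0
    for y, mu in zip(y_true, y_pred):
        if mu <= 0:
            raise ValueError("predicted means must be strictly positive")
        # By convention y * log(y / mu) is 0 when the observed count is 0.
        log_term = y * math.log(y / mu) if y > 0 else 0.0
        total += 2.0 * (log_term - y + mu)
    return total / len(y_true)


print(mean_poisson_deviance([1, 3], [1.0, 3.0]))  # predictions equal to the counts
print(mean_poisson_deviance([0], [0.5]))          # zero count, positive predicted mean
print(mean_poisson_deviance([2], [1.0]))          # under-prediction is penalized
```

The deviance is zero only when every predicted mean equals the observed count, and predicted means must stay strictly positive, which is one reason a log link is a natural choice for count data.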

`sklearn.metrics`#

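Enhancement metrics.pairwise_distances_chunked now allows its reduce_func to not have a return value, enabling in-place operations. #16397 by Joel Nothman.
Fix Fixed a bug in metrics.mean_squared_error to not ignore argument squared when argument multioutput='raw_values'. #16323 by Rushabh Vasani
Fix Fixed a bug in metrics.mutual_info_score where negative scores could be returned. #16362 by Thomas Fan.
Fix Fixed a bug in metrics.confusion_matrix that would raise an error when y_true and y_pred were length zero and labels was not None. In addition, an error is now raised when an empty list is given to the labels parameter. #16442 by Kyle Parsons.
API Change Changed the formatting of values in metrics.ConfusionMatrixDisplay.plot and metrics.plot_confusion_matrix to pick the shorter format (either '2g' or 'd'). #16159 by Rick Mackenbach and Thomas Fan.
API Change From version 0.25, metrics.pairwise_distances will no longer automatically compute the VI parameter for Mahalanobis distance and the V parameter for seuclidean distance if Y is passed. The user will be expected to compute this parameter on the training data of their choice and pass it to pairwise_distances. #16993 by Joel Nothman.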

`sklearn.model_selection`#

Enhancement model_selection.GridSearchCV and model_selection.RandomizedSearchCV yield stack trace information in fit failed warning messages in addition to the previously emitted type and details. #15622 by Gregory Morse.
Fix model_selection.cross_val_predict supports method="predict_proba" when y=None. #15918 by Luca Kubin.
Fix model_selection.fit_grid_point is deprecated in 0.23 and will be removed in 0.25. #16401 by Arie Pratama Sutiono

`sklearn.multioutput`#

Feature multioutput.MultiOutputRegressor.fit and multioutput.MultiOutputClassifier.fit can now accept fit_params to pass to the estimator.fit method of each step. #15953 #15959 by Ke Huang.
Enhancement multioutput.RegressorChain now supports fit_params for base_estimator during fit. #16111 by Venkatachalam N.

`sklearn.naive_bayes`#

Fix A correctly formatted error message is shown in naive_bayes.CategoricalNB when the number of features in the input differs between predict and fit. #16090 by Madhura Jayaratne.

`sklearn.neural_network`#

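Efficiency neural_network.MLPClassifier and neural_network.MLPRegressor have a reduced memory footprint when using stochastic solvers, 'sgd' or 'adam', and shuffle=True. #14075 by @meyer89.
Fix Increases the numerical stability of the logistic loss function in neural_network.MLPClassifier by clipping the probabilities. #16117 by Thomas Fan.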

`sklearn.inspection`#

Enhancement inspection.PartialDependenceDisplay now exposes the deciles lines as attributes so they can be hidden or customized. #15785 by Nicolas Hug

`sklearn.preprocessing`#

Feature Argument drop of preprocessing.OneHotEncoder will now accept the value 'if_binary' and will drop the first category of each feature with two categories. #16245 by Rushabh Vasani.
Enhancement preprocessing.OneHotEncoder's drop_idx_ ndarray can now contain None, where drop_idx_[i] = None means that no category is dropped for index i. #16585 by Chiara Marmo.
Enhancement preprocessing.MaxAbsScaler, preprocessing.MinMaxScaler, preprocessing.StandardScaler, preprocessing.PowerTransformer, preprocessing.QuantileTransformer, preprocessing.RobustScaler now support pandas' nullable integer dtype with missing values. #16508 by Thomas Fan.
Efficiency preprocessing.OneHotEncoder is now faster at transforming. #15762 by Thomas Fan.
Fix Fix a bug in preprocessing.StandardScaler which was incorrectly computing statistics when calling partial_fit on sparse inputs. #16466 by Guillaume Lemaitre.
Fix Fix a bug in preprocessing.Normalizer with norm='max', which was not taking the absolute value of the maximum values before normalizing the vectors. #16632 by Maura Pintor and Battista Biggio.
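
The effect of the norm='max' fix is easiest to see on a row whose largest-magnitude entry is negative. The sketch below is plain Python for illustration only (both function names are hypothetical and neither is scikit-learn code): one version scales by the maximum absolute value, the other by the raw maximum as described in the bug report.

```python
def normalize_max(row):
    """Scale a row by its maximum absolute value (the behavior after the fix)."""
    scale = max(abs(v) for v in row)
    if scale == 0:
        return list(row)
    return [v / scale for v in row]


def normalize_max_without_abs(row):
    """Scale by the raw maximum, ignoring sign (mimics the behavior before the fix)."""
    scale = max(row)
    if scale == 0:
        return list(row)
    return [v / scale for v in row]


row = [-4.0, 2.0]
print(normalize_max(row))              # every entry ends up within [-1, 1]
print(normalize_max_without_abs(row))  # the -4.0 entry escapes that range
```

Rows containing only non-negative values give the same result either way, so models trained on such data should be unaffected by the change.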

`sklearn.semi_supervised`#

Fix semi_supervised.LabelSpreading and semi_supervised.LabelPropagation avoid divide by zero warnings when normalizing label_distributions_. #15946 by @ngshya.

`sklearn.svm`#

Fix Efficiency Improved libsvm and liblinear random number generators used to randomly select coordinates in the coordinate descent algorithms. Platform-dependent C rand() was used, which is only able to generate numbers up to 32767 on windows platform (see this blog post) and also has poor randomization power as suggested by this presentation. It was replaced with C++11 mt19937, a Mersenne Twister that correctly generates 31bits/63bits random numbers on all platforms. In addition, the crude "modulo" postprocessor used to get a random number in a bounded interval was replaced by the tweaked Lemire method as suggested by this blog post. Any model using the svm.libsvm or the svm.liblinear solver, including svm.LinearSVC, svm.LinearSVR, svm.NuSVC, svm.NuSVR, svm.OneClassSVM, svm.SVC, svm.SVR, linear_model.LogisticRegression, is affected. In particular users can expect a better convergence when the number of samples (LibSVM) or the number of features (LibLinear) is large. #13511 by Sylvain Marié.
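Fix Fix use of custom kernels not taking float entries, such as string kernels, in svm.SVC and svm.SVR. Note that custom kernels are now expected to validate their input where they previously received valid numeric arrays. #11296 by Alexandre Gramfort and Georgi Peev.
API Change svm.SVR and svm.OneClassSVM attributes, probA_ and probB_, are now deprecated as they were not useful. #15558 by Thomas Fan.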
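
The random number generator entry in this list mentions replacing a "modulo" postprocessor with a Lemire-style bounded method. As a hedged illustration, the sketch below implements the textbook multiply-and-shift rejection method from Lemire's work for a 32-bit generator. It is not the tweaked variant used in the library, and the function name `lemire_bounded` is hypothetical.

```python
import random


def lemire_bounded(next_u32, s):
    """Return an unbiased integer in [0, s) from a 32-bit generator, for 0 < s < 2**32."""
    x = next_u32()
    m = x * s                  # 64-bit product in the C version
    low = m & 0xFFFFFFFF       # low 32 bits decide whether a retry is needed
    if low < s:
        threshold = (2**32 - s) % s
        while low < threshold:
            x = next_u32()
            m = x * s
            low = m & 0xFFFFFFFF
    return m >> 32             # high 32 bits are the bounded result


# Python's random module is itself a Mersenne Twister, used here as the 32-bit source.
rng = random.Random(0)
draws = [lemire_bounded(lambda: rng.getrandbits(32), 6) for _ in range(1000)]
print(min(draws), max(draws))
```

Unlike `x % s`, which slightly favors small results whenever the bound does not divide 2**32, this scheme rejects the few raw values that would cause that bias, at the cost of an occasional extra draw.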

`sklearn.tree`#

Fix tree.plot_tree rotate parameter was unused and has been deprecated. #15806 by Chiara Marmo.
Fix Fix support of read-only float32 array input in predict, decision_path and predict_proba methods of tree.DecisionTreeClassifier, tree.ExtraTreeClassifier and ensemble.GradientBoostingClassifier as well as the predict method of tree.DecisionTreeRegressor, tree.ExtraTreeRegressor, and ensemble.GradientBoostingRegressor. #16331 by Alexandre Batisse.

`sklearn.utils`#

Major Feature Estimators can now be displayed with a rich html representation. This can be enabled in Jupyter notebooks by setting display='diagram' in set_config. The raw html can be returned by using utils.estimator_html_repr. #14180 by Thomas Fan.
Enhancement Improve error message in utils.validation.column_or_1d. #15926 by Loïc Estève.
Enhancement Add warning in utils.check_array for pandas sparse DataFrame. #16021 by Rushabh Vasani.
Enhancement utils.check_array now constructs a sparse matrix from a pandas DataFrame that contains only SparseArray columns. #16728 by Thomas Fan.
Enhancement utils.check_array supports pandas' nullable integer dtype with missing values when force_all_finite is set to False or 'allow-nan', in which case the data is converted to floating point values where pd.NA values are replaced by np.nan. As a consequence, all sklearn.preprocessing transformers that accept numeric inputs with missing values represented as np.nan now also accept being directly fed pandas dataframes with pd.Int* or pd.UInt* typed columns that use pd.NA as a missing value marker. #16508 by Thomas Fan.
API Change Passing classes to utils.estimator_checks.check_estimator and utils.estimator_checks.parametrize_with_checks is now deprecated, and support for classes will be removed in 0.24. Pass instances instead. #17032 by Nicolas Hug.
API Change The private utility _safe_tags in utils.estimator_checks was removed, hence all tags should be obtained through estimator._get_tags(). Note that Mixins like RegressorMixin must come before base classes in the MRO for _get_tags() to work properly. #16950 by Nicolas Hug.
Fix utils.all_estimators now only returns public estimators. #15380 by Thomas Fan .

Miscellaneous#

Major Feature Adds an HTML representation of estimators to be shown in a jupyter notebook or lab. This visualization is activated by setting the display option in sklearn.set_config. #14180 by Thomas Fan.
Enhancement scikit-learn now works with mypy without errors. #16726 by Roman Yurchak.
API Change Most estimators now expose a n_features_in_ attribute. This attribute is equal to the number of features passed to the fit method. See SLEP010 for details. #16112 by Nicolas Hug.
API Change Estimators now have a requires_y tag, which is False by default except for estimators that inherit from sklearn.base.RegressorMixin or sklearn.base.ClassifierMixin. This tag is used to ensure that a proper error message is raised when y was expected but None was passed. #16622 by Nicolas Hug.
API Change The default setting print_changed_only has been changed from False to True. This means that the repr of estimators is now more concise and only shows the parameters whose default value has been changed when printing an estimator. You can restore the previous behaviour by using sklearn.set_config(print_changed_only=False). Also, note that it is always possible to quickly inspect the parameters of any estimator using est.get_params(deep=False). #17061 by Nicolas Hug.
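
To illustrate what a "changed only" repr amounts to, the sketch below compares each constructor argument with its default and prints only those that differ. It is a plain-Python approximation under stated assumptions: `TinyEstimator` and `format_estimator` are hypothetical stand-ins, not scikit-learn classes or functions, and the library's real repr handles nesting and line wrapping that this sketch ignores.

```python
import inspect


class TinyEstimator:
    """Hypothetical stand-in for an estimator; not a scikit-learn class."""

    def __init__(self, alpha=1.0, fit_intercept=True, max_iter=100):
        self.alpha = alpha
        self.fit_intercept = fit_intercept
        self.max_iter = max_iter

    def get_params(self):
        # Read back every constructor argument by name.
        names = inspect.signature(type(self).__init__).parameters
        return {name: getattr(self, name) for name in names if name != "self"}


def format_estimator(est, changed_only=True):
    """Render Name(param=value, ...), optionally hiding parameters left at their default."""
    signature = inspect.signature(type(est).__init__)
    defaults = {
        name: p.default for name, p in signature.parameters.items() if name != "self"
    }
    shown = {
        name: value
        for name, value in est.get_params().items()
        if not changed_only or value != defaults[name]
    }
    args = ", ".join("{}={!r}".format(k, v) for k, v in shown.items())
    return "{}({})".format(type(est).__name__, args)


est = TinyEstimator(alpha=0.5)
short = format_estimator(est)                     # only non-default parameters
full = format_estimator(est, changed_only=False)  # every parameter
print(short)
print(full)
```

The short form keeps logs and notebooks readable, while the full form remains useful when the defaults themselves may differ between library versions.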

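Code and Documentation Contributors

Thanks to everyone who has contributed to the maintenance and improvement of the project since version 0.22, including: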

Abbie Popa, Adrin Jalali, Aleksandra Kocot, Alexandre Batisse, Alexandre Gramfort, Alex Henrie, Alex Itkes, Alex Liang, alexshacked, Alonso Silva Allende, Ana Casado, Andreas Mueller, Angela Ambroz, Ankit810, Arie Pratama Sutiono, Arunav Konwar, Baptiste Maingret, Benjamin Beier Liu, bernie gray, Bharathi Srinivasan, Bharat Raghunathan, Bibhash Chandra Mitra, Brian Wignall, brigi, Brigitta Sipőcz, Carlos H Brandt, CastaChick, castor, cgsavard, Chiara Marmo, Chris Gregory, Christian Kastner, Christian Lorentzen, Corrie Bartelheimer, Daniël van Gelder, Daphne, David Breuer, david-cortes, dbauer9, Divyaprabha M, Edward Qian, Ekaterina Borovikova, ELNS, Emily Taylor, Erich Schubert, Eric Leung, Evgeni Chasnovski, Fabiana, Facundo Ferrín, Fan, Franziska Boenisch, Gael Varoquaux, Gaurav Sharma, Geoffrey Bolmier, Georgi Peev, gholdman1, Gonthier Nicolas, Gregory Morse, Gregory R. Lee, Guillaume Lemaitre, Gui Miotto, Hailey Nguyen, Hanmin Qin, Hao Chun Chang, HaoYin, Hélion du Mas des Bourboux, Himanshu Garg, Hirofumi Suzuki, huangk10, Hugo van Kemenade, Hye Sung Jung, indecisiveuser, inderjeet, J-A16, Jérémie du Boisberranger, Jin-Hwan CHO, JJmistry, Joel Nothman, Johann Faouzi, Jon Haitz Legarreta Gorroño, Juan Carlos Alfaro Jiménez, judithabk6, jumon, Kathryn Poole, Katrina Ni, Kesshi Jordan, Kevin Loftis, Kevin Markham, krishnachaitanya9, Lam Gia Thuan, Leland McInnes, Lisa Schwetlick, lkubin, Loic Esteve, lopusz, lrjball, lucgiffon, lucyleeow, Lucy Liu, Lukas Kemkes, Maciej J Mikulski, Madhura Jayaratne, Magda Zielinska, maikia, Mandy Gu, Manimaran, Manish Aradwad, Maren Westermann, Maria, Mariana Meireles, Marie Douriez, Marielle, Mateusz Górski, mathurinm, Matt Hall, Maura Pintor, mc4229, meyer89, m.fab, Michael Shoemaker, Michał Słapek, Mina Naghshhnejad, mo, Mohamed Maskani, Mojca Bertoncelj, narendramukherjee, ngshya, Nicholas Won, Nicolas Hug, nicolasservel, Niklas, @nkish, Noa Tamir, Oleksandr Pavlyk, olicairns, Oliver Urs Lenz, Olivier Grisel, parsons-kyle-89, 
Paula, Pete Green, Pierre Delanoue, pspachtholz, Pulkit Mehta, Qizhi Jiang, Quang Nguyen, rachelcjordan, raduspaimoc, Reshama Shaikh, Riccardo Folloni, Rick Mackenbach, Ritchie Ng, Roman Feldbauer, Roman Yurchak, Rory Hartong-Redden, Rüdiger Busche, Rushabh Vasani, Sambhav Kothari, Samesh Lakhotia, Samuel Duan, SanthoshBala18, Santiago M. Mola, Sarat Addepalli, scibol, Sebastian Kießling, SergioDSR, Sergul Aydore, Shiki-H, shivamgargsya, SHUBH CHATTERJEE, Siddharth Gupta, simonamaggio, smarie, Snowhite, stareh, Stephen Blystone, Stephen Marsh, Sunmi Yoon, SylvainLan, talgatomarov, tamirlan1, th0rwas, theoptips, Thomas J Fan, Thomas Li, Thomas Schmitt, Tim Nonner, Tim Vink, Tiphaine Viard, Tirth Patel, Titus Christian, Tom Dupré la Tour, trimeta, Vachan D A, Vandana Iyer, Venkatachalam N, waelbenamara, wconnell, wderose, wenliwyan, Windber, wornbb, Yu-Hang "Maxin" Tang