PowerTransformer

class sklearn.preprocessing.PowerTransformer(method='yeo-johnson', *, standardize=True, copy=True)

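Apply a power transform featurewise to make data more Gaussian-like.

Power transforms are a family of parametric, monotonic transformations that are applied to make data more Gaussian-like. This is useful for modeling issues related to heteroscedasticity (non-constant variance), or other situations where normality is desired.

Currently, PowerTransformer supports the Box-Cox transform and the Yeo-Johnson transform. The optimal parameter for stabilizing variance and minimizing skewness is estimated through maximum likelihood.

Box-Cox requires input data to be strictly positive, while Yeo-Johnson supports both positive and negative data.

By default, zero-mean, unit-variance normalization is applied to the transformed data.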
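The difference in input requirements between the two methods can be sketched as follows. The toy array is an illustrative assumption; the only claim is that Box-Cox rejects data that is not strictly positive, while Yeo-Johnson accepts it.

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

# Illustrative toy data containing a negative value and a zero.
X = np.array([[-1.0], [0.0], [2.0], [5.0]])

# Yeo-Johnson accepts any real-valued input.
yj = PowerTransformer(method="yeo-johnson").fit(X)

# Box-Cox requires strictly positive input and raises ValueError otherwise.
try:
    PowerTransformer(method="box-cox").fit(X)
    box_cox_failed = False
except ValueError:
    box_cox_failed = True

print("Yeo-Johnson lambda:", yj.lambdas_)
print("Box-Cox rejected the data:", box_cox_failed)
```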

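For an example visualization, refer to Compare PowerTransformer with other scalers. To see the effect of Box-Cox and Yeo-Johnson transformations on different distributions, see Map data to a normal distribution.

Read more in the User Guide.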

Added in version 0.20.

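Parameters:

method : {'yeo-johnson', 'box-cox'}, default='yeo-johnson'

The power transform method. Available methods are:

'yeo-johnson' [1], works with positive and negative values
'box-cox' [2], only works with strictly positive values

standardize : bool, default=True

Set to True to apply zero-mean, unit-variance normalization to the transformed output.

copy : bool, default=True

Set to False to perform inplace computation during transformation.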
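A minimal sketch of what the standardize parameter changes, assuming synthetic log-normal input (the data and the variable names are illustrative, not part of the API):

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

rng = np.random.default_rng(0)
X = rng.lognormal(size=(200, 2))  # right-skewed, strictly positive toy data

# With standardize=True (the default) the output is centered and scaled.
X_std = PowerTransformer(standardize=True).fit_transform(X)

# With standardize=False only the power transform itself is applied.
X_raw = PowerTransformer(standardize=False).fit_transform(X)

print("standardized mean:", X_std.mean(axis=0))
print("standardized std: ", X_std.std(axis=0))
print("raw mean:         ", X_raw.mean(axis=0))
```

Under these assumptions, standardizing the raw output by hand (subtracting the per-feature mean and dividing by the per-feature population standard deviation) should reproduce X_std.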

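Attributes:

lambdas_ : ndarray of float of shape (n_features,): The parameters of the power transformation for the selected features.
n_features_in_ : int: Number of features seen during fit.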

Added in version 0.24.
feature_names_in_ : ndarray of shape (n_features_in_,): Names of features seen during fit. Defined only when X has feature names that are all strings.

Added in version 1.0.

See also

power_transform: Equivalent function without the estimator API.
QuantileTransformer: Maps data to a standard normal distribution with the parameter output_distribution='normal'.
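As a rough illustration of the function form, the sketch below compares power_transform with the estimator on the same synthetic data (the exponential sample is an assumption chosen only for illustration):

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer, power_transform

rng = np.random.default_rng(0)
X = rng.exponential(size=(100, 3))  # illustrative skewed data

# Function form: fits and transforms in one call, keeps no fitted state.
X_func = power_transform(X, method="yeo-johnson")

# Estimator form: the fitted lambdas_ can be reused on new data later.
X_est = PowerTransformer(method="yeo-johnson").fit_transform(X)

print(np.allclose(X_func, X_est))
```

The estimator form is usually the safer choice when the transform has to be fitted on training data and then applied to held-out data.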

Notes

NaNs are treated as missing values: disregarded in fit, and maintained in transform.
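The NaN handling described above can be sketched like this, assuming a small array with one missing entry (the values are illustrative):

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

# One missing value in the first feature.
X = np.array([[1.0, 2.0],
              [np.nan, 3.0],
              [4.0, 5.0],
              [6.0, 1.0]])

pt = PowerTransformer().fit(X)   # NaN is disregarded when estimating lambdas_
X_trans = pt.transform(X)        # NaN stays NaN in the output

print(pt.lambdas_)
print(X_trans)
```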

References

[1]

I.K. Yeo and R.A. Johnson, "A new family of power transformations to improve normality or symmetry." Biometrika, 87(4), pp.954-959, (2000).

[2]

G.E.P. Box and D.R. Cox, "An Analysis of Transformations", Journal of the Royal Statistical Society B, 26, 211-252 (1964).

Examples

>>> import numpy as np
>>> from sklearn.preprocessing import PowerTransformer
>>> pt = PowerTransformer()
>>> data = [[1, 2], [3, 2], [4, 5]]
>>> print(pt.fit(data))
PowerTransformer()
>>> print(pt.lambdas_)
[ 1.386... -3.100...]
>>> print(pt.transform(data))
[[-1.316... -0.707...]
 [ 0.209... -0.707...]
 [ 1.106...  1.414...]]

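fit(X, y=None)

Estimate the optimal parameter lambda for each feature.

The optimal lambda parameter for minimizing skewness is estimated on each feature independently using maximum likelihood.

Parameters:

X : array-like of shape (n_samples, n_features): The data used to estimate the optimal transformation parameters.
y : None: Ignored.

Returns:

self : object: Fitted transformer.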
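Because each feature is fitted independently, the lambda found for a column should not depend on which other columns are present. A small sketch, assuming synthetic data with one skewed and one roughly normal feature:

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

rng = np.random.default_rng(42)
X = np.column_stack([
    rng.lognormal(size=300),  # skewed feature
    rng.normal(size=300),     # roughly Gaussian feature
])

pt_joint = PowerTransformer().fit(X)          # both features together
pt_first = PowerTransformer().fit(X[:, [0]])  # first feature alone

print("joint lambdas:      ", pt_joint.lambdas_)
print("first feature alone:", pt_first.lambdas_)
```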

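fit_transform(X, y=None)

Fit PowerTransformer to X, then transform X.

Parameters:

X : array-like of shape (n_samples, n_features): The data used to estimate the optimal transformation parameters and to be transformed using a power transformation.
y : Ignored: Not used, present for API consistency by convention.

Returns:

X_new : ndarray of shape (n_samples, n_features): Transformed data.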
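A typical usage pattern, sketched under the assumption of a synthetic train/test split (the array names are illustrative): fit_transform is called on the training data, and the fitted transformer is then reused on held-out data.

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

rng = np.random.default_rng(0)
X_train = rng.lognormal(size=(150, 2))
X_test = rng.lognormal(size=(50, 2))

pt = PowerTransformer()
X_train_trans = pt.fit_transform(X_train)  # estimate lambdas_ and transform
X_test_trans = pt.transform(X_test)        # reuse the fitted lambdas_

# fit_transform should agree with calling fit and transform separately.
X_two_step = PowerTransformer().fit(X_train).transform(X_train)
print(np.allclose(X_train_trans, X_two_step))
```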

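get_feature_names_out(input_features=None)

Get output feature names for transformation.

Parameters:

input_features : array-like of str or None, default=None

Input features.

If input_features is None, then feature_names_in_ is used as the input feature names. If feature_names_in_ is not defined, then the following input feature names are generated: ["x0", "x1", ..., "x(n_features_in_ - 1)"].
If input_features is an array-like, then input_features must match feature_names_in_ if feature_names_in_ is defined.

Returns:

feature_names_out : ndarray of str objects: Same as input features.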

get_metadata_routing()

Get metadata routing of this object.

Please check the User Guide on how the routing mechanism works.

Returns:

routing : MetadataRequest: A MetadataRequest encapsulating routing information.

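get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep : bool, default=True: If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params : dict: Parameter names mapped to their values.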
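For illustration, the returned dictionary mirrors the constructor arguments, so it can be used to build an equivalent unfitted transformer (the variable names here are assumptions):

```python
from sklearn.preprocessing import PowerTransformer

pt = PowerTransformer(method="box-cox", standardize=False)
params = pt.get_params()
print(params)

# The dict can be unpacked to create a new, unfitted copy with the same settings.
pt_copy = PowerTransformer(**params)
```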

inverse_transform(X)

Apply the inverse power transformation using the fitted lambdas.

The inverse of the Box-Cox transformation is given by:

if lambda_ == 0:
    X = exp(X_trans)
else:
    X = (X_trans * lambda_ + 1) ** (1 / lambda_)

The inverse of the Yeo-Johnson transformation is given by:

if X >= 0 and lambda_ == 0:
    X = exp(X_trans) - 1
elif X >= 0 and lambda_ != 0:
    X = (X_trans * lambda_ + 1) ** (1 / lambda_) - 1
elif X < 0 and lambda_ != 2:
    X = 1 - (-(2 - lambda_) * X_trans + 1) ** (1 / (2 - lambda_))
elif X < 0 and lambda_ == 2:
    X = 1 - exp(-X_trans)

Parameters:

X : array-like of shape (n_samples, n_features): The transformed data.

Returns:

X : ndarray of shape (n_samples, n_features): The original data.
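The Box-Cox inverse formula above can be checked against inverse_transform with a short round-trip sketch. The helper box_cox_inverse is hypothetical (written here for illustration, not part of scikit-learn), and standardize=False is used so that only the power transform itself needs to be undone.

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer


def box_cox_inverse(x_trans, lambda_):
    """Hypothetical helper: invert Box-Cox for one feature, per the formula above."""
    if lambda_ == 0:
        return np.exp(x_trans)
    return (x_trans * lambda_ + 1) ** (1 / lambda_)


rng = np.random.default_rng(0)
X = rng.gamma(shape=2.0, scale=1.0, size=(200, 2))  # strictly positive toy data

pt = PowerTransformer(method="box-cox", standardize=False).fit(X)
X_trans = pt.transform(X)

# Round trip through the estimator.
X_back = pt.inverse_transform(X_trans)

# Round trip through the hand-written formula, one feature at a time.
X_manual = np.column_stack(
    [box_cox_inverse(X_trans[:, j], lam) for j, lam in enumerate(pt.lambdas_)]
)

print(np.allclose(X_back, X), np.allclose(X_manual, X))
```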

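set_output(*, transform=None)

Set output container.

See Introducing the set_output API for an example on how to use the API.

Parameters:

transform : {"default", "pandas", "polars"}, default=None

Configure output of transform and fit_transform.

"default": Default output format of a transformer
"pandas": DataFrame output
"polars": Polars output
None: Transform configuration is unchanged

Added in version 1.4: "polars" option was added.

Returns:

self : estimator instance: Estimator instance.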

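set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.

Parameters:

**params : dict: Estimator parameters.

Returns:

self : estimator instance: Estimator instance.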
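A short sketch of both forms, assuming a pipeline built with make_pipeline (which names each step after its lowercased class name); the choice of LinearRegression as the final step is only illustrative:

```python
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PowerTransformer

# Simple estimator: parameters are set by name.
pt = PowerTransformer().set_params(method="box-cox")

# Nested object: the <component>__<parameter> form reaches into the pipeline step.
pipe = make_pipeline(PowerTransformer(), LinearRegression())
pipe.set_params(
    powertransformer__method="box-cox",
    powertransformer__standardize=False,
)

print(pt.method)
print(pipe.named_steps["powertransformer"])
```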

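transform(X)

Apply the power transform to each feature using the fitted lambdas.

Parameters:

X : array-like of shape (n_samples, n_features): The data to be transformed using a power transformation.

Returns:

X_trans : ndarray of shape (n_samples, n_features): The transformed data.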
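To make the forward transform concrete, the sketch below applies the standard Yeo-Johnson formula by hand and compares it with transform. The helper yeo_johnson is hypothetical (it is not the library's internal implementation), and standardize=False is assumed so that no scaling step is involved.

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer


def yeo_johnson(x, lambda_):
    """Hypothetical helper: forward Yeo-Johnson transform of one feature."""
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    pos = x >= 0
    # Non-negative inputs.
    if lambda_ != 0:
        out[pos] = ((x[pos] + 1) ** lambda_ - 1) / lambda_
    else:
        out[pos] = np.log1p(x[pos])
    # Negative inputs.
    if lambda_ != 2:
        out[~pos] = -((1 - x[~pos]) ** (2 - lambda_) - 1) / (2 - lambda_)
    else:
        out[~pos] = -np.log1p(-x[~pos])
    return out


rng = np.random.default_rng(0)
X = rng.exponential(size=(200, 2)) - 1.0  # toy data with both signs

pt = PowerTransformer(method="yeo-johnson", standardize=False).fit(X)
X_trans = pt.transform(X)

X_manual = np.column_stack(
    [yeo_johnson(X[:, j], lam) for j, lam in enumerate(pt.lambdas_)]
)
print(np.allclose(X_trans, X_manual))
```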