TimeSeriesSplit
- class sklearn.model_selection.TimeSeriesSplit(n_splits=5, *, max_train_size=None, test_size=None, gap=0)[source]
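Time Series cross-validator.
Provides train/test indices to split time-series data samples that are observed at fixed time intervals into train/test sets. In each split, the test indices must be higher than those of the previous split (and higher than every training index), so shuffling in the cross-validator is inappropriate.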
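To make the ordering requirement concrete, here is a small pure-Python check (the helper `has_no_lookahead` is hypothetical, written only for illustration and not part of scikit-learn) that a split trains only on samples observed before every test sample. A plain contiguous K-fold split whose test fold sits in the middle of the series fails it, which is the kind of leakage the requirement rules out.

```python
def has_no_lookahead(train_idx, test_idx):
    """Return True if every training index precedes every test index.

    Hypothetical helper for illustration; not a scikit-learn function.
    """
    if not train_idx or not test_idx:
        return True
    return max(train_idx) < min(test_idx)


# Forward-chained split in the style of TimeSeriesSplit: train on the past only.
forward_train, forward_test = [0, 1, 2, 3], [4, 5]

# Plain K-fold on 6 samples with 3 folds, testing on the middle fold:
# samples 4 and 5 come after the test period but are used for training.
kfold_train, kfold_test = [0, 1, 4, 5], [2, 3]

print(has_no_lookahead(forward_train, forward_test))  # True
print(has_no_lookahead(kfold_train, kfold_test))      # False
```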
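This cross-validation object is a variation of
KFold
. In the kth split, it returns the first k folds as the train set and the (k+1)th fold as the test set. Note that, unlike standard cross-validation methods, successive training sets are supersets of those that come before them (with the default max_train_size=None).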
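The fold description above can be sketched in pure Python under the default settings. The function `fold_view_splits` below is hypothetical, not scikit-learn's source: it cuts the series into n_splits + 1 contiguous folds, assumes the n_samples % (n_splits + 1) leftover samples go into the first (oldest) fold (which is what the remainder term in the Notes formula implies), and then trains on the first k folds while testing on fold k + 1.

```python
def fold_view_splits(n_samples, n_splits):
    """Yield (train, test) index lists from n_splits + 1 contiguous folds.

    Illustrative sketch of the default layout (test_size, gap and
    max_train_size at their defaults); not scikit-learn's source code.
    """
    n_folds = n_splits + 1
    fold_size = n_samples // n_folds
    if fold_size == 0:
        raise ValueError("n_splits + 1 must not exceed n_samples")
    # Leftover samples (n_samples % n_folds) are placed in the first, oldest
    # fold, so they always belong to the training sets.
    first_end = fold_size + n_samples % n_folds
    folds = [list(range(first_end))]
    for start in range(first_end, n_samples, fold_size):
        folds.append(list(range(start, start + fold_size)))
    for k in range(1, n_folds):
        # k-th split (1-based): the first k folds train, fold k + 1 tests.
        train = [idx for fold in folds[:k] for idx in fold]
        yield train, folds[k]


splits = list(fold_view_splits(6, n_splits=5))
for k, (train, test) in enumerate(splits):
    print(f"Fold {k}: train={train} test={test}")

# Successive training sets only ever grow (each is a superset of the last).
assert all(set(prev[0]) <= set(curr[0]) for prev, curr in zip(splits, splits[1:]))
```

With 6 samples and n_splits=5 this prints the same indices as the first doctest example in the Examples section, from `Fold 0: train=[0] test=[1]` up to `Fold 4: train=[0, 1, 2, 3, 4] test=[5]`.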
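Read more in the User Guide.
For a visualization of cross-validation behavior and a comparison between common scikit-learn split methods, refer to Visualizing cross-validation behavior in scikit-learn.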
Added in version 0.18.
- Parameters:
- n_splits : int, default=5
Number of splits. Must be at least 2.
Changed in version 0.22: n_splits default value changed from 3 to 5.
- max_train_size : int, default=None
Maximum size for a single training set.
- test_size : int, default=None
Used to limit the size of the test set. Defaults to n_samples // (n_splits + 1), which is the maximum allowed value with gap=0.
Added in version 0.24.
- gap : int, default=0
Number of samples to exclude from the end of each training set before the test set.
Added in version 0.24.
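The parameters above combine in a simple way that can be sketched in pure Python. The function `time_series_splits` below is hypothetical, written only for illustration and not scikit-learn's implementation; it assumes, as the doctest output in the Examples section shows, that the n_splits test sets tile the last n_splits * test_size samples in order. It then drops gap samples from the end of each training set and, when max_train_size is set, keeps only the most recent training samples (a sliding rather than expanding window). The error check is also an assumption of this sketch.

```python
def time_series_splits(n_samples, n_splits=5, *, max_train_size=None,
                       test_size=None, gap=0):
    """Yield (train, test) index lists following the documented parameters.

    Pure-Python sketch for illustration; the function name and the error
    check are assumptions, not scikit-learn's implementation.
    """
    if test_size is None:
        test_size = n_samples // (n_splits + 1)  # documented default
    # The n_splits test sets tile the last n_splits * test_size samples.
    first_test_start = n_samples - n_splits * test_size
    if test_size < 1 or first_test_start - gap <= 0:
        raise ValueError("too few samples for these split parameters")
    for test_start in range(first_test_start, n_samples, test_size):
        train_end = test_start - gap  # skip `gap` samples before the test set
        train_start = 0
        if max_train_size is not None and max_train_size < train_end:
            # Sliding window: keep only the most recent max_train_size samples.
            train_start = train_end - max_train_size
        yield (list(range(train_start, train_end)),
               list(range(test_start, test_start + test_size)))


# Same indices as the gap=2 doctest example in the Examples section.
for train, test in time_series_splits(12, 3, test_size=2, gap=2):
    print(train, test)

# max_train_size=3 turns the expanding window into a sliding one; the last
# two splits print [1, 2, 3] [4] and [2, 3, 4] [5].
for train, test in time_series_splits(6, 5, max_train_size=3):
    print(train, test)
```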
Notes
The training set has size i * (n_samples // (n_splits + 1)) + n_samples % (n_splits + 1) in the i-th split (counting from i = 1), with a test set of size n_samples // (n_splits + 1) by default, where n_samples is the number of samples. Note that this formula is only valid when test_size, max_train_size and gap are left to their default values.
Examples
>>> import numpy as np
>>> from sklearn.model_selection import TimeSeriesSplit
>>> X = np.array([[1, 2], [3, 4], [1, 2], [3, 4], [1, 2], [3, 4]])
>>> y = np.array([1, 2, 3, 4, 5, 6])
>>> tscv = TimeSeriesSplit()
>>> print(tscv)
TimeSeriesSplit(gap=0, max_train_size=None, n_splits=5, test_size=None)
>>> for i, (train_index, test_index) in enumerate(tscv.split(X)):
...     print(f"Fold {i}:")
...     print(f"  Train: index={train_index}")
...     print(f"  Test:  index={test_index}")
Fold 0:
  Train: index=[0]
  Test:  index=[1]
Fold 1:
  Train: index=[0 1]
  Test:  index=[2]
Fold 2:
  Train: index=[0 1 2]
  Test:  index=[3]
Fold 3:
  Train: index=[0 1 2 3]
  Test:  index=[4]
Fold 4:
  Train: index=[0 1 2 3 4]
  Test:  index=[5]
>>> # Fix test_size to 2 with 12 samples
>>> X = np.random.randn(12, 2)
>>> y = np.random.randint(0, 2, 12)
>>> tscv = TimeSeriesSplit(n_splits=3, test_size=2)
>>> for i, (train_index, test_index) in enumerate(tscv.split(X)):
...     print(f"Fold {i}:")
...     print(f"  Train: index={train_index}")
...     print(f"  Test:  index={test_index}")
Fold 0:
  Train: index=[0 1 2 3 4 5]
  Test:  index=[6 7]
Fold 1:
  Train: index=[0 1 2 3 4 5 6 7]
  Test:  index=[8 9]
Fold 2:
  Train: index=[0 1 2 3 4 5 6 7 8 9]
  Test:  index=[10 11]
>>> # Add in a 2 period gap
>>> tscv = TimeSeriesSplit(n_splits=3, test_size=2, gap=2)
>>> for i, (train_index, test_index) in enumerate(tscv.split(X)):
...     print(f"Fold {i}:")
...     print(f"  Train: index={train_index}")
...     print(f"  Test:  index={test_index}")
Fold 0:
  Train: index=[0 1 2 3]
  Test:  index=[6 7]
Fold 1:
  Train: index=[0 1 2 3 4 5]
  Test:  index=[8 9]
Fold 2:
  Train: index=[0 1 2 3 4 5 6 7]
  Test:  index=[10 11]
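The training-set size formula from the Notes can also be checked numerically. The pure-Python sketch below uses two hypothetical helpers (not scikit-learn functions): one evaluates the formula with i counting from 1, the other assumes the layout visible in the doctests above, where the default-sized test blocks fill the end of the series and, with gap=0, everything before the i-th test block is used for training. It also shows why the first product has to be read as i * (n_samples // (n_splits + 1)) rather than (i * n_samples) // (n_splits + 1).

```python
def train_size_formula(i, n_samples, n_splits):
    """Training-set size from the Notes formula, for split i = 1, ..., n_splits."""
    n_folds = n_splits + 1
    return i * (n_samples // n_folds) + n_samples % n_folds


def train_size_from_layout(i, n_samples, n_splits):
    """Training-set size if the n_splits test blocks of size
    n_samples // (n_splits + 1) fill the end of the series (gap=0)."""
    test_size = n_samples // (n_splits + 1)
    first_test_start = n_samples - n_splits * test_size
    # Everything before the i-th test block is used for training.
    return first_test_start + (i - 1) * test_size


# The two agree for every (n_samples, n_splits, i) combination tried here.
for n_samples in range(3, 40):
    for n_splits in range(2, n_samples):
        for i in range(1, n_splits + 1):
            assert (train_size_formula(i, n_samples, n_splits)
                    == train_size_from_layout(i, n_samples, n_splits))

# Grouping matters: (i * n_samples) // (n_splits + 1) is a different quantity.
print(train_size_formula(2, n_samples=8, n_splits=2))  # 6
print((2 * 8) // (2 + 1) + 8 % (2 + 1))               # 7
```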
For a more extended example, see Time-related feature engineering.
- get_metadata_routing()[source]
Get metadata routing of this object.
Please check the User Guide on how the routing mechanism works.
- Returns:
- routing : MetadataRequest
A MetadataRequest encapsulating routing information.