pandas.unique

pandas.unique(values)

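Return unique values based on a hash table.

Uniques are returned in order of appearance. This does NOT sort.

Significantly faster than numpy.unique for long enough sequences. Includes NA values.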
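The ordering and NA behavior described above can be illustrated with a minimal sketch. The input arrays are made up for illustration, and the comparison with numpy.unique only shows the ordering difference, not the speed claim.

```python
import numpy as np
import pandas as pd

# Illustrative input with repeated values in unsorted order.
values = np.array([3, 1, 3, 2, 1])

# pd.unique keeps the order of first appearance.
by_appearance = pd.unique(values)

# np.unique returns the unique values sorted.
sorted_unique = np.unique(values)

# NA values are included in the result; repeated NaN is kept once.
with_na = pd.unique(np.array([2.0, np.nan, 2.0, np.nan]))

print(by_appearance)
print(sorted_unique)
print(with_na)
```

If the sorted order is needed, sorting the result of pd.unique afterwards is one option.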

Parameters

values : 1d array-like

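Returns

numpy.ndarray or ExtensionArray

The return can be:

Index : when the input is an Index
Categorical : when the input is a Categorical dtype
ndarray : when the input is a Series/ndarray

Return numpy.ndarray or ExtensionArray.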
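The mapping from input type to return type listed above can be checked with a short sketch. The inputs are illustrative, and a timezone-aware DatetimeIndex is used for the Index case on the assumption that it behaves like the Index example on this page.

```python
import numpy as np
import pandas as pd

# Series of integers: the result is a plain NumPy array.
from_series = pd.unique(pd.Series([2, 1, 3, 3]))

# Timezone-aware Index: the result is an Index of the same kind.
stamp = pd.Timestamp("20160101", tz="UTC")
from_index = pd.unique(pd.Index([stamp, stamp]))

# Series with Categorical dtype: the result is a Categorical.
from_categorical = pd.unique(pd.Series(pd.Categorical(list("baabc"))))

for result in (from_series, from_index, from_categorical):
    print(type(result).__name__)
```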

See also

Index.unique: Return unique values from an Index.
Series.unique: Return unique values of Series object.

Examples

>>> pd.unique(pd.Series([2, 1, 3, 3]))
array([2, 1, 3])

>>> pd.unique(pd.Series([2] + [1] * 5))
array([2, 1])

>>> pd.unique(pd.Series([pd.Timestamp("20160101"), pd.Timestamp("20160101")]))
array(['2016-01-01T00:00:00.000000000'], dtype='datetime64[ns]')

>>> pd.unique(
...     pd.Series(
...         [
...             pd.Timestamp("20160101", tz="US/Eastern"),
...             pd.Timestamp("20160101", tz="US/Eastern"),
...         ]
...     )
... )
<DatetimeArray>
['2016-01-01 00:00:00-05:00']
Length: 1, dtype: datetime64[ns, US/Eastern]

>>> pd.unique(
...     pd.Index(
...         [
...             pd.Timestamp("20160101", tz="US/Eastern"),
...             pd.Timestamp("20160101", tz="US/Eastern"),
...         ]
...     )
... )
DatetimeIndex(['2016-01-01 00:00:00-05:00'],
        dtype='datetime64[ns, US/Eastern]',
        freq=None)

>>> pd.unique(list("baabc"))
array(['b', 'a', 'c'], dtype=object)

An unordered Categorical will return categories in the order of appearance.

>>> pd.unique(pd.Series(pd.Categorical(list("baabc"))))
['b', 'a', 'c']
Categories (3, object): ['a', 'b', 'c']

>>> pd.unique(pd.Series(pd.Categorical(list("baabc"), categories=list("abc"))))
['b', 'a', 'c']
Categories (3, object): ['a', 'b', 'c']

An ordered Categorical preserves the category ordering.

>>> pd.unique(
...     pd.Series(
...         pd.Categorical(list("baabc"), categories=list("abc"), ordered=True)
...     )
... )
['b', 'a', 'c']
Categories (3, object): ['a' < 'b' < 'c']

An array of tuples

>>> pd.unique([("a", "b"), ("b", "a"), ("a", "c"), ("b", "a")])
array([('a', 'b'), ('b', 'a'), ('a', 'c')], dtype=object)
