>>> from env_helper import info; info()

Page last updated: 2023-12-14 16:38:33
Runtime environment:
    Linux distribution: Debian GNU/Linux 12 (bookworm)
    OS kernel: Linux-6.1.0-15-amd64-x86_64-with-glibc2.36
    Python version: 3.11.2

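4.4. Counting with Counter

Counting is a familiar task: put simply, it means tallying how many times each item occurs. Many practical requirements follow this model, such as counting how often a value appears in a sample, measuring how frequently a message occurs in a log, or finding how often the same string is repeated in a file. There are several ways to implement it. Let us look at each in turn, using a different data structure each time.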

4.4.1. Using dict

>>> some_data = ["a ", "f", "12", "2", "4", 5, 2, ' b', 4, 7, ' a ', 5, 'd', 'a', ' z ']
>>> count_frq = dict()
>>> for item in some_data:
>>>     if item in count_frq:
>>>         count_frq[item] += 1
>>>     else:
>>>         count_frq[item] = 1
>>> print(count_frq)

{'a ': 1, 'f': 1, '12': 1, '2': 1, '4': 1, 5: 2, 2: 1, ' b': 1, 4: 1, 7: 1, ' a ': 1, 'd': 1, 'a': 1, ' z ': 1}

4.4.2. Using defaultdict

>>> from collections import defaultdict
>>> count_frq = defaultdict(int)
>>> for item in some_data:
>>>     count_frq[item] += 1
>>> print(count_frq)

defaultdict(<class 'int'>, {'a ': 1, 'f': 1, '12': 1, '2': 1, '4': 1, 5: 2, 2: 1, ' b': 1, 4: 1, 7: 1, ' a ': 1, 'd': 1, 'a': 1, ' z ': 1})

4.4.3. Using set and list

>>> count_set = set(some_data)
>>> count_list = []
>>> for item in count_set:
>>>     count_list.append((item, some_data.count(item)))
>>> print(count_list)

[(2, 1), ('a ', 1), (4, 1), ('4', 1), (5, 2), (7, 1), ('a', 1), ('d', 1), ('12', 1), (' a ', 1), (' z ', 1), ('2', 1), ('f', 1), (' b', 1)]

The approaches above are all fairly simple, but is there a more elegant, more Pythonic solution? The answer is collections.Counter:

>>> from collections import Counter
>>> print(Counter(some_data))

Counter({5: 2, 'a ': 1, 'f': 1, '12': 1, '2': 1, '4': 1, 2: 1, ' b': 1, 4: 1, 7: 1, ' a ': 1, 'd': 1, 'a': 1, ' z ': 1})
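The four approaches shown so far should all arrive at the same counts. The following is a minimal sketch that checks this on the sample list from this section; the names by_dict, by_defaultdict, by_set, by_counter and all_equal are illustrative and do not come from the original text.

```python
from collections import Counter, defaultdict

# Same sample data as in the sections above.
some_data = ["a ", "f", "12", "2", "4", 5, 2, " b", 4, 7, " a ", 5, "d", "a", " z "]

# 1. Plain dict with an explicit membership test.
by_dict = {}
for item in some_data:
    if item in by_dict:
        by_dict[item] += 1
    else:
        by_dict[item] = 1

# 2. defaultdict(int): a missing key starts at int() == 0.
by_defaultdict = defaultdict(int)
for item in some_data:
    by_defaultdict[item] += 1

# 3. set + list.count(): rescans the whole list once per distinct item.
by_set = {item: some_data.count(item) for item in set(some_data)}

# 4. Counter: a single call.
by_counter = Counter(some_data)

# Convert everything to plain dicts so the comparison is purely by content.
all_equal = by_dict == dict(by_defaultdict) == by_set == dict(by_counter)
print(all_equal)
```

Note that the set and list version calls some_data.count() once per distinct item, so it walks the list repeatedly, while the other three make a single pass.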

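The Counter class was added in Python 2.7. It is a subclass of dict, a container object used mainly for counting hashable objects. It supports the operators +, -, & and |, where & and | return the element-wise minimum and maximum, respectively, of the counts in the two Counter objects. It can be initialized in three different ways:

Counter("success")                           # iterable
Counter(s=3, c=2, e=1, u=1)                  # keyword arguments
Counter({'s': 3, "c": 2, "u": 1, "e": 1})    # dictionary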
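As a small illustration of the three constructors and the four operators, here is a sketch using two hand-picked Counter objects. The variable names and the second word "cell" are assumptions made for the example only.

```python
from collections import Counter

# The three initialization styles describe the same counts.
same_counts = (
    Counter("success")
    == Counter(s=3, c=2, e=1, u=1)
    == Counter({"s": 3, "c": 2, "u": 1, "e": 1})
)
print(same_counts)

a = Counter("success")   # s: 3, c: 2, u: 1, e: 1
b = Counter("cell")      # c: 1, e: 1, l: 2

added = a + b            # counts are summed:      s: 3, c: 3, e: 2, u: 1, l: 2
subtracted = a - b       # counts are subtracted:  s: 3, c: 1, u: 1
intersect = a & b        # minimum of each pair:   c: 1, e: 1
union = a | b            # maximum of each pair:   s: 3, c: 2, u: 1, e: 1, l: 2

for name, result in [("+", added), ("-", subtracted), ("&", intersect), ("|", union)]:
    print(name, dict(result))
```

All four operators drop entries whose resulting count is zero or negative, which is why 'e' and 'l' are absent from the result of a - b.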

The elements() method returns an iterator over the elements in a Counter, repeating each key as many times as its count.

>>> list(Counter(some_data).elements())

['a ', 'f', '12', '2', '4', 5, 5, 2, ' b', 4, 7, ' a ', 'd', 'a', ' z ']
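The difference between iterating over the keys and calling elements() shows up most clearly when some counts are zero or negative. A minimal sketch, with made-up keys and counts:

```python
from collections import Counter

c = Counter(a=2, b=1, z=0, q=-1)

keys = list(c)                  # every key, whatever its count
expanded = list(c.elements())   # each key repeated count times

print(keys)      # ['a', 'b', 'z', 'q']
print(expanded)  # ['a', 'a', 'b']
```

Keys whose count is less than one are skipped by elements(), so 'z' and 'q' do not appear in the expanded list.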

The most_common() method returns the N most frequent elements together with their counts.

>>> Counter(some_data).most_common(2)

[(5, 2), ('a ', 1)]
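A few variations on most_common() may be useful in practice. The sketch below assumes Python 3.7 or later, where elements with equal counts are returned in the order they were first encountered; the variable names are illustrative.

```python
from collections import Counter

c = Counter("successfully")   # s: 3, u: 2, c: 2, l: 2, e: 1, f: 1, y: 1

top_two = c.most_common(2)            # the two most frequent elements
everything = c.most_common()          # no argument: all elements, most frequent first
least_two = c.most_common()[:-3:-1]   # slice from the end for the least frequent

print(top_two)     # [('s', 3), ('u', 2)]
print(least_two)   # [('y', 1), ('f', 1)]
```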

Accessing an element that does not exist returns 0 by default instead of raising a KeyError exception.

>>> Counter(some_data)['y']

0
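Looking up a missing key in a Counter also leaves the Counter unchanged, which is one way it differs from defaultdict(int). The following sketch contrasts the two; the sample values are made up for illustration.

```python
from collections import Counter, defaultdict

c = Counter("aab")
d = defaultdict(int, {"a": 2, "b": 1})

missing_in_counter = c["y"]    # 0, and no key is created
counter_has_y = "y" in c       # False

missing_in_default = d["y"]    # 0, but the lookup inserts the key
default_has_y = "y" in d       # True

c["a"] = 0                     # setting a count to zero keeps the key
still_has_a = "a" in c         # True
del c["a"]                     # use del to remove the entry entirely
has_a_after_del = "a" in c     # False

print(missing_in_counter, counter_has_y)
print(missing_in_default, default_has_y)
print(still_has_a, has_a_after_del)
```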

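The update() method adds the counts of new elements to a Counter: the existing counts and the counts of the added elements are summed rather than replaced. The subtract() method subtracts element counts from a Counter; both the inputs and the resulting counts may be zero or negative.

>>> c = Counter("success")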
>>> c.update("successfully")
>>> c

Counter({'s': 6, 'u': 3, 'c': 4, 'e': 2, 'f': 1, 'l': 2, 'y': 1})

>>> c.subtract("successfulllly")
>>> c

Counter({'s': 3, 'u': 1, 'c': 2, 'e': 1, 'f': 0, 'l': -2, 'y': 0})
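Both update() and subtract() also accept a mapping or keyword arguments, not only an iterable. The sketch below shows this, contrasts Counter.update() with dict.update(), and uses the unary + operator to discard zero and negative counts afterwards. All values and variable names are chosen for illustration only.

```python
from collections import Counter

c = Counter("success")           # s: 3, c: 2, u: 1, e: 1
c.update({"s": 1, "x": 2})       # mapping: counts are added      -> s: 4, x: 2
c.update(e=4)                    # keyword arguments also work    -> e: 5
c.subtract("sssss")              # iterable: one per occurrence   -> s: -1

print(dict(c))

# dict.update() replaces the value instead of adding to it.
plain = dict(Counter("success"))
plain.update({"s": 1})
print(plain["s"])                # 1

positive_only = +c               # unary plus keeps only counts greater than zero
print(dict(positive_only))
```

Because subtract() can leave zero or negative counts behind, +c is a convenient way to obtain a cleaned-up copy when only positive counts are of interest.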
