dcg_score

sklearn.metrics.dcg_score(y_true, y_score, *, k=None, log_base=2, sample_weight=None, ignore_ties=False)

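Compute Discounted Cumulative Gain.

Sum the true scores ranked in the order induced by the predicted scores, after applying a logarithmic discount.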

This ranking metric yields a high value if true labels are ranked high by y_score.

Usually the Normalized Discounted Cumulative Gain (NDCG, computed by ndcg_score) is preferred.
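The description above can be made concrete with a minimal pure-Python sketch for a single sample. The helper name `manual_dcg` is hypothetical and not part of scikit-learn. It assumes there are no ties in the predicted scores and that the item at rank i (counting from 1) is discounted by 1 / log_base(i + 1).

```python
import math


def manual_dcg(y_true, y_score, k=None, log_base=2):
    """Hypothetical helper: DCG of one sample, assuming no tied scores."""
    # Indices of the items, ordered by decreasing predicted score.
    order = sorted(range(len(y_score)), key=lambda i: y_score[i], reverse=True)
    if k is not None:
        order = order[:k]  # only the top-k ranked items contribute
    total = 0.0
    for rank, idx in enumerate(order, start=1):
        # Assumed discount: 1 / log_base(rank + 1), so rank 1 is undiscounted.
        total += y_true[idx] / math.log(rank + 1, log_base)
    return total


true_relevance = [10, 0, 0, 1, 5]
scores = [0.1, 0.2, 0.3, 4, 70]
dcg_full = manual_dcg(true_relevance, scores)       # about 9.4995
dcg_top2 = manual_dcg(true_relevance, scores, k=2)  # about 5.6309
```

Under these assumptions the two values agree with the `9.49...` and `5.63...` results shown in the examples for this function.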

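Parameters:

y_true, array-like of shape (n_samples, n_labels): True targets of multilabel classification, or true scores of entities to be ranked.
y_score, array-like of shape (n_samples, n_labels): Target scores. These can be probability estimates, confidence values, or non-thresholded measures of decisions (as returned by "decision_function" on some classifiers).
k, int, default=None: Only consider the highest k scores in the ranking. If None, use all outputs.
log_base, float, default=2: Base of the logarithm used for the discount. A low value means a sharper discount (top results are more important).
sample_weight, array-like of shape (n_samples,), default=None: Sample weights. If None, all samples are given the same weight.
ignore_ties, bool, default=False: Assume that there are no ties in y_score (which is likely to be the case if y_score is continuous) for efficiency gains.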

Returns:

discounted_cumulative_gain, float: The averaged sample DCG scores.
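As a rough illustration of how a single float can summarize several samples, the sketch below averages per-sample DCG values with optional weights. The helper `weighted_mean_dcg` and the per-sample numbers are hypothetical, and the sketch assumes the aggregation is a plain weighted arithmetic mean.

```python
def weighted_mean_dcg(per_sample_dcg, sample_weight=None):
    """Hypothetical helper: weighted arithmetic mean of per-sample DCG values."""
    if sample_weight is None:
        # No weights given: every sample counts equally.
        sample_weight = [1.0] * len(per_sample_dcg)
    total_weight = sum(sample_weight)
    weighted_sum = sum(w * s for w, s in zip(sample_weight, per_sample_dcg))
    return weighted_sum / total_weight


per_sample = [9.0, 3.0]  # made-up DCG values for two queries
unweighted = weighted_mean_dcg(per_sample)            # (9 + 3) / 2 = 6.0
weighted = weighted_mean_dcg(per_sample, [3.0, 1.0])  # (27 + 3) / 4 = 7.5
```

With uniform weights each query contributes equally. Raising a query's weight pulls the reported value towards that query's DCG.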

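See also

ndcg_score: The Discounted Cumulative Gain divided by the Ideal Discounted Cumulative Gain (the DCG obtained for a perfect ranking), in order to have a score between 0 and 1.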
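The normalization can be sketched in pure Python as follows. The helpers `dcg` and `ndcg` are hypothetical names, not scikit-learn functions. The sketch assumes no ties in the predicted scores and non-negative true relevances.

```python
import math


def dcg(gains_in_rank_order, log_base=2):
    """Hypothetical helper: discounted sum of gains already sorted by rank."""
    return sum(
        gain / math.log(rank + 1, log_base)
        for rank, gain in enumerate(gains_in_rank_order, start=1)
    )


def ndcg(y_true, y_score, log_base=2):
    """Hypothetical helper: DCG divided by the DCG of a perfect ranking."""
    predicted_order = sorted(
        range(len(y_score)), key=lambda i: y_score[i], reverse=True
    )
    actual = dcg([y_true[i] for i in predicted_order], log_base)
    # Ideal ranking: items sorted by their true relevance.
    ideal = dcg(sorted(y_true, reverse=True), log_base)
    return actual / ideal if ideal > 0 else 0.0


value = ndcg([10, 0, 0, 1, 5], [0.1, 0.2, 0.3, 4, 70])  # about 0.6957
perfect = ndcg([3, 2, 1], [0.9, 0.5, 0.1])              # ranking already ideal
```

Because the ideal ranking maximizes the discounted sum, the ratio cannot exceed 1, and it equals 1 when the predicted order matches the order of the true relevances.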

References

Wikipedia entry for Discounted Cumulative Gain: https://en.wikipedia.org/wiki/Discounted_cumulative_gain

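Jarvelin, K., & Kekalainen, J. (2002). Cumulated gain-based evaluation of IR techniques. ACM Transactions on Information Systems (TOIS), 20(4), 422-446.

Wang, Y., Wang, L., Li, Y., He, D., Chen, W., & Liu, T. Y. (2013, May). A theoretical analysis of NDCG ranking measures. In Proceedings of the 26th Annual Conference on Learning Theory (COLT 2013).

McSherry, F., & Najork, M. (2008, March). Computing information retrieval performance measures efficiently in the presence of tied scores. In European Conference on Information Retrieval (pp. 414-421). Springer, Berlin, Heidelberg.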

Examples

>>> import numpy as np
>>> from sklearn.metrics import dcg_score
>>> # we have ground-truth relevance of some answers to a query:
>>> true_relevance = np.asarray([[10, 0, 0, 1, 5]])
>>> # we predict scores for the answers
>>> scores = np.asarray([[.1, .2, .3, 4, 70]])
>>> dcg_score(true_relevance, scores)
np.float64(9.49...)
>>> # we can set k to truncate the sum; only top k answers contribute
>>> dcg_score(true_relevance, scores, k=2)
np.float64(5.63...)
>>> # now we have some ties in our prediction
>>> scores = np.asarray([[1, 0, 0, 0, 1]])
>>> # by default ties are averaged, so here we get the average true
>>> # relevance of our top predictions: (10 + 5) / 2 = 7.5
>>> dcg_score(true_relevance, scores, k=1)
np.float64(7.5)
>>> # we can choose to ignore ties for faster results, but only
>>> # if we know there aren't ties in our scores, otherwise we get
>>> # wrong results:
>>> dcg_score(true_relevance,
...           scores, k=1, ignore_ties=True)
np.float64(5.0)
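
The tie-averaging behaviour in the examples above can be approximated with the following pure-Python sketch. The helper `tie_averaged_dcg` is hypothetical and is not the library's implementation. It assumes that every item in a group of tied predicted scores receives the mean true relevance of that group before the discount is applied.

```python
import math
from itertools import groupby


def tie_averaged_dcg(y_true, y_score, k=None, log_base=2):
    """Hypothetical helper: DCG of one sample with tied scores averaged."""
    # Sort (score, relevance) pairs by decreasing predicted score.
    pairs = sorted(zip(y_score, y_true), key=lambda p: p[0], reverse=True)
    gains = []
    for _, group in groupby(pairs, key=lambda p: p[0]):
        relevances = [rel for _, rel in group]
        mean_rel = sum(relevances) / len(relevances)
        # Every item in a tied group gets the group's mean relevance.
        gains.extend([mean_rel] * len(relevances))
    if k is not None:
        gains = gains[:k]  # only the top-k positions contribute
    return sum(
        gain / math.log(rank + 1, log_base)
        for rank, gain in enumerate(gains, start=1)
    )


true_relevance = [10, 0, 0, 1, 5]
tied_scores = [1, 0, 0, 0, 1]
top1 = tie_averaged_dcg(true_relevance, tied_scores, k=1)  # (10 + 5) / 2 = 7.5
```

When no scores are tied, every group has a single item and the sketch reduces to the plain discounted sum.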