ndcg_score

sklearn.metrics.ndcg_score(y_true, y_score, *, k=None, sample_weight=None, ignore_ties=False)

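Compute Normalized Discounted Cumulative Gain (NDCG).

Sum the true scores, ranked in the order induced by the predicted scores, after applying a logarithmic discount. Then divide by the best possible score (Ideal DCG, obtained for a perfect ranking) to obtain a score between 0 and 1.

This ranking metric returns a high value if the true labels are ranked high by y_score.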
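
The computation described above can be sketched in plain NumPy for a single query. This is an illustrative sketch, not scikit-learn's implementation: the helper name `ndcg_no_ties` is hypothetical, a base-2 logarithmic discount is assumed, and y_score is assumed to contain no ties.

```python
import numpy as np


def ndcg_no_ties(y_true, y_score, k=None):
    """NDCG for one query (hypothetical helper; assumes no ties in y_score)."""
    y_true = np.asarray(y_true, dtype=float)
    y_score = np.asarray(y_score, dtype=float)

    # Assumed discount for 0-based rank i: 1 / log2(i + 2).
    discount = 1.0 / np.log2(np.arange(y_true.size) + 2)
    if k is not None:
        discount[k:] = 0.0  # ranks beyond the top k contribute nothing

    # True scores in the order induced by the predicted scores.
    ranked = y_true[np.argsort(-y_score)]
    # True scores in the best possible order (used for the ideal DCG).
    ideal = np.sort(y_true)[::-1]

    dcg = np.sum(ranked * discount)
    idcg = np.sum(ideal * discount)
    # If every true score is zero there is nothing to rank; return 0.0.
    return float(dcg / idcg) if idcg > 0 else 0.0


true_relevance = [10, 0, 0, 1, 5]
scores = [0.1, 0.2, 0.3, 4, 70]
value = ndcg_no_ties(true_relevance, scores)
```

With these inputs, `value` falls between 0.69 and 0.70, and scoring `true_relevance` against itself gives 1.0 because the predicted order is already the ideal one.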

Parameters:

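y_true: array-like of shape (n_samples, n_labels). True targets of multilabel classification, or true scores of the entities to be ranked. Negative values in y_true may result in an output that is not between 0 and 1.
y_score: array-like of shape (n_samples, n_labels). Target scores; these can be probability estimates, confidence values, or a non-thresholded measure of decisions (as returned by "decision_function" on some classifiers).
k: int, default=None. Only consider the highest k scores in the ranking. If None, use all outputs.
sample_weight: array-like of shape (n_samples,), default=None. Sample weights. If None, all samples are given the same weight.
ignore_ties: bool, default=False. Assume that there are no ties in y_score (which is likely to be the case if y_score is continuous), for efficiency gains.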
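
The default tie handling (ignore_ties=False) can be pictured as giving every item in a group of tied predictions the group's average true score. The following NumPy sketch illustrates that idea for one query. The helper names are hypothetical, the base-2 discount is an assumption, and the code follows the averaging approach of McSherry & Najork rather than reproducing the library's code.

```python
import numpy as np


def dcg_tie_averaged(y_true, y_score, k=None):
    """DCG for one query with tied predictions averaged (hypothetical helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_score = np.asarray(y_score, dtype=float)
    n = y_true.size

    # Assumed discount for 0-based rank i: 1 / log2(i + 2), zero beyond rank k.
    discount = 1.0 / np.log2(np.arange(n) + 2)
    if k is not None:
        discount[k:] = 0.0

    # Group items by distinct predicted score, highest score first.
    _, inverse, counts = np.unique(-y_score, return_inverse=True, return_counts=True)
    # Average true score inside each tie group.
    group_mean = np.bincount(inverse, weights=y_true) / counts

    # Each group occupies a contiguous block of ranks; sum the discounts there.
    ends = np.cumsum(counts)
    starts = ends - counts
    cumulative = np.concatenate([[0.0], np.cumsum(discount)])
    group_discount = cumulative[ends] - cumulative[starts]

    # Every item in a group contributes the group mean at its rank.
    return float(np.sum(group_mean * group_discount))


def ndcg_tie_averaged(y_true, y_score, k=None):
    """Normalize by the DCG of the ideal ordering (hypothetical helper)."""
    ideal = dcg_tie_averaged(y_true, y_true, k=k)
    return dcg_tie_averaged(y_true, y_score, k=k) / ideal if ideal > 0 else 0.0


tied = ndcg_tie_averaged([10, 0, 0, 1, 5], [1, 0, 0, 0, 1], k=1)
```

Here the two top-scored items have true scores 10 and 5, so the single top rank is credited with their mean, 7.5, against an ideal of 10, and `tied` is 0.75.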

Returns:

normalized_discounted_cumulative_gain: float in [0., 1.]. The averaged NDCG scores for all samples.
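
Because the return value is an average over samples, a call with several queries can be compared with per-query calls. The sketch below uses made-up relevance values and weights, and assumes the result is the sample_weight-weighted mean of the per-sample NDCG scores.

```python
import numpy as np
from sklearn.metrics import ndcg_score

# Two queries (rows), five candidate answers each; values are illustrative.
y_true = np.asarray([[10, 0, 0, 1, 5],
                     [3, 2, 0, 0, 1]])
y_score = np.asarray([[0.1, 0.2, 0.3, 4.0, 70.0],
                      [0.9, 0.8, 0.1, 0.2, 0.7]])
weights = np.asarray([1.0, 3.0])  # the second query counts three times as much

# NDCG of each query on its own, computed by passing one row at a time.
per_sample = np.asarray([
    ndcg_score(y_true[i:i + 1], y_score[i:i + 1])
    for i in range(y_true.shape[0])
])

# NDCG over both queries, weighted by `weights`.
weighted = ndcg_score(y_true, y_score, sample_weight=weights)

# Under the stated assumption, these two numbers agree.
manual = np.average(per_sample, weights=weights)
```

The second query is ranked perfectly, so its per-sample score is 1.0, and giving it a larger weight pulls the overall value toward 1.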

See also

dcg_score: Discounted Cumulative Gain (not normalized).

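References

Wikipedia entry for Discounted Cumulative Gain <https://en.wikipedia.org/wiki/Discounted_cumulative_gain>

Jarvelin, K., & Kekalainen, J. (2002). Cumulated gain-based evaluation of IR techniques. ACM Transactions on Information Systems (TOIS), 20(4), 422-446.

Wang, Y., Wang, L., Li, Y., He, D., Chen, W., & Liu, T. Y. (2013, May). A theoretical analysis of NDCG ranking measures. In Proceedings of the 26th Annual Conference on Learning Theory (COLT 2013).

McSherry, F., & Najork, M. (2008, March). Computing information retrieval performance measures efficiently in the presence of tied scores. In European Conference on Information Retrieval (pp. 414-421). Springer, Berlin, Heidelberg.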

Examples

>>> import numpy as np
>>> from sklearn.metrics import ndcg_score
>>> # we have ground-truth relevance of some answers to a query:
>>> true_relevance = np.asarray([[10, 0, 0, 1, 5]])
>>> # we predict some scores (relevance) for the answers
>>> scores = np.asarray([[.1, .2, .3, 4, 70]])
>>> ndcg_score(true_relevance, scores)
np.float64(0.69...)
>>> scores = np.asarray([[.05, 1.1, 1., .5, .0]])
>>> ndcg_score(true_relevance, scores)
np.float64(0.49...)
>>> # we can set k to truncate the sum; only top k answers contribute.
>>> ndcg_score(true_relevance, scores, k=4)
np.float64(0.35...)
>>> # the normalization takes k into account so a perfect answer
>>> # would still get 1.0
>>> ndcg_score(true_relevance, true_relevance, k=4)
np.float64(1.0...)
>>> # now we have some ties in our prediction
>>> scores = np.asarray([[1, 0, 0, 0, 1]])
>>> # by default ties are averaged, so here we get the average (normalized)
>>> # true relevance of our top predictions: (10 / 10 + 5 / 10) / 2 = .75
>>> ndcg_score(true_relevance, scores, k=1)
np.float64(0.75...)
>>> # we can choose to ignore ties for faster results, but only
>>> # if we know there aren't ties in our scores, otherwise we get
>>> # wrong results:
>>> ndcg_score(true_relevance,
...           scores, k=1, ignore_ties=True)
np.float64(0.5...)