# 10.1. Understanding k-Nearest Neighbour

## 10.1.2. Theory

kNN is one of the simplest classification algorithms available for supervised learning. The idea is to search the feature space for the closest match to the test data. We will look into it with the help of the figure below.

• You need to know about all the houses in town, right? Because we must check the distance from the newcomer to every existing house to find the nearest neighbour. If there are plenty of houses and families, this takes a lot of memory, and more time for computation too.

• It requires almost zero time for any kind of training or preparation.
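The brute-force search described above can be sketched in a few lines of NumPy (a minimal illustration of the idea, not OpenCV's implementation; the function name and sample data are made up for this example):

```python
import numpy as np

def knn_classify(train, labels, sample, k=3):
    """Brute-force kNN: measure the distance from `sample` to every
    training point, then take a majority vote among the k closest."""
    dists = np.sum((train - sample) ** 2, axis=1)  # squared Euclidean distance to all points
    nearest = np.argsort(dists)[:k]                # indices of the k closest training points
    votes = labels[nearest]
    return np.bincount(votes).argmax()             # majority label among the k neighbours

train = np.array([[1.0, 1.0], [2.0, 1.0], [9.0, 9.0], [8.0, 9.0]])
labels = np.array([0, 0, 1, 1])                    # 0 = Red, 1 = Blue
print(knn_classify(train, labels, np.array([1.5, 1.2]), k=3))  # 0: closest to the Red cluster
```

Note that every training point is touched for every query, which is exactly why the method needs the whole "town" in memory.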

## 10.1.3. kNN in OpenCV

>>> %matplotlib inline
>>>
>>> import cv2
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>>
>>> # Feature set containing (x,y) values of 25 known/training data
>>> trainData = np.random.randint(0,100,(25,2)).astype(np.float32)
>>>
>>> # Labels each one either Red or Blue with numbers 0 and 1
>>> responses = np.random.randint(0,2,(25,1)).astype(np.float32)
>>>
>>> # Take Red families and plot them
>>> red = trainData[responses.ravel()==0]
>>> plt.scatter(red[:,0],red[:,1],80,'r','^')
>>>
>>> # Take Blue families and plot them
>>> blue = trainData[responses.ravel()==1]
>>> plt.scatter(blue[:,0],blue[:,1],80,'b','s')
>>>
>>> plt.show()


Running the classifier gives us:

1. The label given to the newcomer, depending on the kNN theory we saw earlier. If you want the Nearest Neighbour algorithm, just specify $k=1$, where k is the number of neighbours.

2. The labels of the k nearest neighbours.

3. The corresponding distances from the newcomer to each nearest neighbour.

>>> newcomer = np.random.randint(0,100,(1,2)).astype(np.float32)
>>> plt.scatter(newcomer[:,0],newcomer[:,1],80,'g','o')
>>> # The OpenCV 2.x API used cv2.KNearest() / knn.train(trainData, responses)
>>> # / knn.find_nearest(); in OpenCV 3+ the classifier lives under cv2.ml:
>>> # https://blog.csdn.net/zhangpan929/article/details/86217374
>>> knn = cv2.ml.KNearest_create()
>>> knn.train(trainData, cv2.ml.ROW_SAMPLE, responses)
>>> ret, results, neighbours, dist = knn.findNearest(newcomer, 3)
>>>
>>> print("result: ", results, "\n")
>>> print("neighbours: ", neighbours, "\n")
>>> print("distance: ", dist)
>>>
>>> plt.show()

result:  [[1.]]

neighbours:  [[1. 1. 0.]]

distance:  [[117. 305. 314.]]

Here two of the three nearest neighbours carry label 1 (Blue), so the majority vote classifies the newcomer as Blue.




>>> # 10 newcomers
>>> newcomers = np.random.randint(0,100,(10,2)).astype(np.float32)
>>> ret, results, neighbours, dist = knn.findNearest(newcomers, 3)
>>> # results will now contain 10 labels, one per newcomer.