Machine Learning K-Nearest Neighbors (KNN) Algorithm: Palin Analytics
K-Nearest Neighbors (KNN), as the name suggests, is a classification technique that derives information from a data point's immediate neighbors to arrive at the best result.
There are two types of classifications:
- Binary class classification
- Multi-class classification
In binary class classification, you have only two options to choose from, e.g., Yes or No.
In multi-class classification, there are more than two classes among which to arrive at the best result.
K-Nearest Neighbors supports multi-class classification.
A lazy algorithm is one that simply stores the training data and defers all computation until a specific query arrives at prediction time.
Parameters are attributes of a model that are derived from the data. Hyperparameters are attributes that must be set before training; they cannot be derived from the data on their own. K is a hyperparameter.
A non-parametric algorithm makes no fixed assumption about the form or distribution of the underlying data; the model's complexity grows with the training data itself.
K-Nearest Neighbors is a non-parametric algorithm. KNN is also a lazy algorithm.
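To make these ideas concrete, here is a minimal from-scratch sketch of KNN (the function name and data are my own illustration, not from the post). Note how the algorithm is "lazy": there is no training step at all, and every distance computation happens at query time. K arrives as an argument, i.e., a hyperparameter chosen by us rather than learned from the data.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=5):
    """Classify `query` by majority vote among its k nearest training points.

    Lazy: all computation happens here, at prediction time; the "model"
    is just the stored training data.
    """
    # Sort training indices by Euclidean distance to the query point.
    nearest = sorted(
        range(len(train_points)),
        key=lambda i: math.dist(train_points[i], query))
    # Majority vote among the k closest labels.
    top_k_labels = [train_labels[i] for i in nearest[:k]]
    return Counter(top_k_labels).most_common(1)[0][0]
```

For example, `knn_predict([(1, 1), (2, 2), (8, 8)], ["a", "a", "b"], (1.5, 1.5), k=3)` would vote among all three points and return `"a"`.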
When data is fed into the machine for it to learn from, not all of it is used for training; some is held back to test the accuracy of the machine's predictions. KNN can, and often should, be the first choice for a classification study when there is little or no prior knowledge of the distribution of the data.
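The train/test split described above can be sketched with scikit-learn; the dataset (Iris) and the 70/30 split ratio here are assumptions chosen purely for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Hold back 30% of the data to test the accuracy of the predictions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

model = KNeighborsClassifier(n_neighbors=5)  # K is the hyperparameter we set
model.fit(X_train, y_train)                  # lazy: this only stores the data

accuracy = model.score(X_test, y_test)       # evaluate on the held-back data
print(f"Accuracy on held-back data: {accuracy:.2f}")
```

The held-back test set gives an honest estimate of how the model will perform on data it has never seen.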
A POC, or proof of concept, is a feasibility study done before building the model in order to predict the model's chances of success.
For example, in the figure below you can see two sets of data points, in red and green, and a new data point shown separately in grey. To decide the class of the new data point, we group it with its nearest neighbors, which in this case are red.
The x1 axis represents age and the x2 axis represents salary, so the red data points represent persons of younger age and lower salary, whereas the green data points represent persons of higher age and higher salary.
Suppose the red category travels by public transport while the green category uses private vehicles to reach the office, and we have to find out the mode of transport of the person in grey.
Assuming we choose k (the number of neighbors) as 5, the chart below explains how we conclude that the person or data point in grey belongs to the category of persons marked red.
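The grey-point example can be reproduced with scikit-learn. The (age, salary) coordinates below are made up to match the description, not taken from the actual figure:

```python
from sklearn.neighbors import KNeighborsClassifier

# Illustrative (age, salary) points: red = young, low salary;
# green = older, higher salary.
X = [[22, 20000], [25, 22000], [23, 25000], [28, 24000],   # red
     [45, 80000], [50, 90000], [48, 85000], [52, 95000]]   # green
y = ["public transport"] * 4 + ["private vehicle"] * 4

model = KNeighborsClassifier(n_neighbors=5)  # k = 5, as in the example
model.fit(X, y)

# The new grey data point: a 26-year-old earning 23,000.
grey = [[26, 23000]]
print(model.predict(grey))  # the 5 nearest neighbors vote 4 red to 1 green
```

One caveat worth knowing: salary's much larger numeric scale dominates the Euclidean distance here, so in practice you would scale the features first (for example with scikit-learn's `StandardScaler`) before fitting KNN.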
I hope you liked this post. If you have any questions, feel free to comment below.
- Data Science with R
- Data Science using SAS
- Hadoop and Spark Developer
- Data Science with Hadoop
- Big Data with Analytics
Do check out our data science courses.