
k-Nearest Neighbors (kNN)

Overview

k-Nearest Neighbors (kNN) is a family of instance-based, non-parametric learning methods that make predictions by comparing a query point to the most similar examples in the training data. Rather than learning an explicit model during training, kNN defers computation until inference time, using distance or similarity measures to identify relevant neighbors.

kNN can be applied to classification, regression, and anomaly detection tasks and is often used as a simple baseline or reference method due to its conceptual simplicity and minimal assumptions about the data.

Model Structure

  • No explicit parametric model or learned weights
  • Training data stored directly as the model
  • Predictions based on the k closest data points
  • Distance or similarity metric defines neighborhood structure
  • Aggregation of neighbor labels or values determines output
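The structure above can be sketched in a few lines. This is a minimal illustrative implementation (not a library's API): store the training data, compute distances at query time, and aggregate the labels of the k closest points by majority vote.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, query, k=3):
    """Predict a label by majority vote among the k nearest training points."""
    # Euclidean distance from the query to every stored training point;
    # the "model" is just the stored (X_train, y_train) pair.
    dists = np.linalg.norm(X_train - query, axis=1)
    # Indices of the k closest points define the neighborhood
    nearest = np.argsort(dists)[:k]
    # Aggregate: majority vote over the neighbors' labels
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Toy data: two well-separated clusters
X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.8]])
y = np.array([0, 0, 1, 1])
print(knn_predict(X, y, np.array([0.2, 0.1]), k=3))  # → 0
```

Swapping the Euclidean norm for another distance (Manhattan, cosine, etc.) changes the neighborhood structure without touching the rest of the procedure.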

Design Rationale

kNN was designed to provide a straightforward approach to learning based on the principle that similar inputs should yield similar outputs. By avoiding an explicit training phase, kNN places all modeling assumptions in the choice of distance metric and neighborhood size.

This design makes kNN flexible and easy to implement, while also exposing clear tradeoffs between bias, variance, and computational cost.

Training Paradigm

  • No explicit training beyond storing the dataset
  • Distance computation performed at inference time
  • Choice of k controls smoothing: larger k averages over more neighbors (higher bias, lower variance), smaller k is more sensitive to noise
  • Optional weighting schemes based on distance
  • Efficiency improvements rely on indexing or approximate nearest neighbor methods
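One of the optional weighting schemes mentioned above is inverse-distance weighting, where closer neighbors contribute more to the vote. The sketch below is illustrative only; the data and the `eps` smoothing constant are assumptions, not part of any standard API.

```python
import numpy as np

def weighted_knn_predict(X_train, y_train, query, k=3, eps=1e-9):
    """Classify by inverse-distance-weighted vote among the k nearest points."""
    dists = np.linalg.norm(X_train - query, axis=1)
    nearest = np.argsort(dists)[:k]
    # Closer neighbors get larger weights; eps guards against division by zero
    weights = 1.0 / (dists[nearest] + eps)
    # Accumulate weight per class and return the heaviest class
    votes = {}
    for label, w in zip(y_train[nearest], weights):
        votes[label] = votes.get(label, 0.0) + w
    return max(votes, key=votes.get)

# Example where weighting changes the outcome: with k=3 an unweighted
# majority vote would predict 0 (two class-0 neighbors), but the single
# class-1 point is far closer to the query, so the weighted vote picks 1.
X = np.array([[0.0], [0.3], [1.0]])
y = np.array([0, 0, 1])
print(weighted_knn_predict(X, y, np.array([0.95]), k=3))  # → 1
```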

Notable Variants

  • kNN Classification
  • kNN Regression
  • Distance-weighted kNN
  • Approximate Nearest Neighbor (ANN) variants
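The regression variant replaces the majority vote with an average of the neighbors' target values. A minimal sketch, with the data chosen purely for illustration:

```python
import numpy as np

def knn_regress(X_train, y_train, query, k=3):
    """Predict the mean target value of the k nearest training points."""
    dists = np.linalg.norm(X_train - query, axis=1)
    nearest = np.argsort(dists)[:k]
    # Regression output: unweighted mean over the neighborhood
    return float(np.mean(y_train[nearest]))

X = np.array([[0.0], [1.0], [2.0], [10.0]])
y = np.array([0.0, 1.0, 2.0, 10.0])
print(knn_regress(X, y, np.array([1.0]), k=3))  # → 1.0 (mean of 0.0, 1.0, 2.0)
```

Distance-weighted averaging applies here as well, and ANN variants simply swap the exact neighbor search for an approximate index while leaving the aggregation step unchanged.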
