
k-Nearest Neighbors (kNN)

Overview

k-Nearest Neighbors (kNN) is a family of instance-based, non-parametric learning methods that make predictions by comparing a query point to the most similar examples in the training data. Rather than learning an explicit model during training, kNN defers computation until inference time, using distance or similarity measures to identify relevant neighbors.

kNN can be applied to classification, regression, and anomaly detection tasks and is often used as a simple baseline or reference method due to its conceptual simplicity and minimal assumptions about the data.

Model Structure

  • No explicit parametric model or learned weights
  • Training data stored directly as the model
  • Predictions based on the k closest data points
  • Distance or similarity metric defines neighborhood structure
  • Aggregation of neighbor labels or values determines output
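The structure above can be sketched in a few lines. This is a minimal illustrative implementation (not a library's API): store the training data, compute distances at query time, and aggregate the labels of the k closest points by majority vote.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, query, k=3):
    """Predict a label by majority vote among the k nearest training points."""
    # Euclidean distance from the query to every stored training point;
    # the "model" is just the stored (X_train, y_train) pair.
    dists = np.linalg.norm(X_train - query, axis=1)
    # Indices of the k closest points define the neighborhood
    nearest = np.argsort(dists)[:k]
    # Aggregate: majority vote over the neighbors' labels
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Toy data: two well-separated clusters
X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.8]])
y = np.array([0, 0, 1, 1])
print(knn_predict(X, y, np.array([0.2, 0.1]), k=3))  # → 0
```

Swapping the Euclidean norm for another distance (Manhattan, cosine, etc.) changes the neighborhood structure without touching the rest of the procedure.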

Design Rationale

kNN was designed to provide a straightforward approach to learning based on the principle that similar inputs should yield similar outputs. By avoiding an explicit training phase, kNN places all modeling assumptions in the choice of distance metric and neighborhood size.

This design makes kNN flexible and easy to implement, while also exposing clear tradeoffs between bias, variance, and computational cost.

Training Paradigm

  • No explicit training beyond storing the dataset
  • Distance computation performed at inference time
  • Choice of k controls smoothing: larger k averages over more neighbors (higher bias, lower variance), smaller k is more sensitive to noise
  • Optional weighting schemes based on distance
  • Efficiency improvements rely on indexing or approximate nearest neighbor methods
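One of the optional weighting schemes mentioned above is inverse-distance weighting, where closer neighbors contribute more to the vote. The sketch below is illustrative only; the data and the `eps` smoothing constant are assumptions, not part of any standard API.

```python
import numpy as np

def weighted_knn_predict(X_train, y_train, query, k=3, eps=1e-9):
    """Classify by inverse-distance-weighted vote among the k nearest points."""
    dists = np.linalg.norm(X_train - query, axis=1)
    nearest = np.argsort(dists)[:k]
    # Closer neighbors get larger weights; eps guards against division by zero
    weights = 1.0 / (dists[nearest] + eps)
    # Accumulate weight per class and return the heaviest class
    votes = {}
    for label, w in zip(y_train[nearest], weights):
        votes[label] = votes.get(label, 0.0) + w
    return max(votes, key=votes.get)

# Example where weighting changes the outcome: with k=3 an unweighted
# majority vote would predict 0 (two class-0 neighbors), but the single
# class-1 point is far closer to the query, so the weighted vote picks 1.
X = np.array([[0.0], [0.3], [1.0]])
y = np.array([0, 0, 1])
print(weighted_knn_predict(X, y, np.array([0.95]), k=3))  # → 1
```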

Notable Variants

  • kNN Classification
  • kNN Regression
  • Distance-weighted kNN
  • Approximate Nearest Neighbor (ANN) variants
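The regression variant replaces the majority vote with an average of the neighbors' target values. A minimal sketch, with the data chosen purely for illustration:

```python
import numpy as np

def knn_regress(X_train, y_train, query, k=3):
    """Predict the mean target value of the k nearest training points."""
    dists = np.linalg.norm(X_train - query, axis=1)
    nearest = np.argsort(dists)[:k]
    # Regression output: unweighted mean over the neighborhood
    return float(np.mean(y_train[nearest]))

X = np.array([[0.0], [1.0], [2.0], [10.0]])
y = np.array([0.0, 1.0, 2.0, 10.0])
print(knn_regress(X, y, np.array([1.0]), k=3))  # → 1.0 (mean of 0.0, 1.0, 2.0)
```

Distance-weighted averaging applies here as well, and ANN variants simply swap the exact neighbor search for an approximate index while leaving the aggregation step unchanged.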
