Linear Regression: Mathematical Foundations

Linear regression is a fundamental statistical technique used to predict a real-valued output \( y \in \mathbb{R} \) for a given input data point \( x \in \mathbb{R}^D \). It assumes that the expected value of the target variable is a linear function of the input features:

$$ \mathbb{E}[y \mid x] = w^\top x $$

1. Dataset Representation

Let the training dataset be represented by a feature matrix:

$$ X \in \mathbb{R}^{N \times D} $$

where \( N \) is the number of data points and \( D \) is the number of features. The dataset can be expressed as:

$$ X = [x_1, x_2, \dots, x_D] $$

Each \( x_i \) (for \( i = 1, \dots, D \)) is a column vector representing one feature across all samples.
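As a concrete illustration, here is a minimal NumPy sketch (array names and values are hypothetical) of a dataset with \( N = 4 \) samples and \( D = 2 \) features, stored as a matrix whose columns are the feature vectors \( x_1 \) and \( x_2 \):

```python
import numpy as np

# Hypothetical example: N = 4 samples, D = 2 features.
x1 = np.array([1.0, 2.0, 3.0, 4.0])   # feature 1 across all samples
x2 = np.array([0.5, 1.5, 2.5, 3.5])   # feature 2 across all samples

# Stack the feature columns to form X with shape (N, D).
X = np.column_stack([x1, x2])
print(X.shape)  # (4, 2)
```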

2. Model Formulation

A general polynomial form of regression can be written as:

$$ y = w_0 + w_{11} x_1 + w_{12} x_1^2 + \dots + w_{21} x_2 + w_{22} x_2^2 + \dots $$

or more compactly using basis functions:

$$ y = w_0 + \sum_{i=1}^{D} w_i \, \phi_i(x_i) $$

In practice, the simplest and most common form is the linear (degree-1) model:

$$ y = w_0 + \sum_{i=1}^{D} w_i x_i = w_0 + w^\top x $$
Notation:
  • \( y \): output (target) variable
  • \( x \in \mathbb{R}^D \): input feature vector
  • \( w = [w_1, \dots, w_D]^\top \): weight vector
  • \( w_0 \): bias (intercept) term
  • \( \phi_i(x_i) \): basis or feature transformation (e.g., polynomial term)
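A minimal sketch of both formulations, assuming degree-2 polynomial basis functions \( \phi(x_i) = (x_i, x_i^2) \) and hypothetical weight values:

```python
import numpy as np

x = np.array([2.0, 3.0])           # input feature vector, D = 2
w = np.array([0.4, -0.2])          # weight vector (hypothetical values)
w0 = 1.0                           # bias term

# Linear (degree-1) model: y = w0 + w^T x
y_linear = w0 + w @ x

# Degree-2 polynomial expansion: each feature contributes x_i and x_i^2,
# each with its own weight (hypothetical values).
phi = np.concatenate([x, x**2])            # basis-expanded features
w_poly = np.array([0.4, -0.2, 0.1, 0.05])  # weights for [x1, x2, x1^2, x2^2]
y_poly = w0 + w_poly @ phi

print(y_linear, y_poly)
```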

3. Estimating Parameters

There are two primary approaches for estimating the parameters \( w \) and \( w_0 \):

  1. Normal Equation — analytical solution using matrix operations
  2. Gradient Descent — iterative optimization minimizing a loss function

3.1 Normal Equation

The objective is to minimize the sum of squared errors between predicted and true values (equal to the mean squared error up to a factor of \( 1/N \)), where \( y \in \mathbb{R}^N \) now denotes the vector of all training targets and the bias \( w_0 \) is absorbed into \( w \) by appending a column of ones to \( X \):

$$ J(w) = (Xw - y)^\top (Xw - y) $$

Taking the gradient with respect to \( w \):

$$ \nabla_w J(w) = 2(X^\top X w - X^\top y) $$

Setting the gradient to zero yields the least-squares estimate of \( w \), which coincides with the maximum likelihood estimate (MLE) under Gaussian noise:

$$ \hat{w} = (X^\top X)^{-1} X^\top y $$
Note: The inverse \( (X^\top X)^{-1} \) exists only if \( X^\top X \) is full rank (non-singular). If it is not, regularization or the Moore–Penrose pseudo-inverse is used instead.
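A minimal NumPy sketch of the normal-equation solution on synthetic data (all values are hypothetical), absorbing the bias via a column of ones; np.linalg.lstsq is the numerically safer alternative when \( X^\top X \) is ill-conditioned:

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 100, 3
X = rng.normal(size=(N, D))
w_true = np.array([2.0, -1.0, 0.5])
y = X @ w_true + 3.0 + 0.1 * rng.normal(size=N)   # bias = 3.0, small noise

# Absorb the bias by appending a column of ones.
Xb = np.hstack([X, np.ones((N, 1))])

# Normal equation: w_hat = (X^T X)^{-1} X^T y, solved without an explicit inverse.
w_hat = np.linalg.solve(Xb.T @ Xb, Xb.T @ y)
print(w_hat)   # approximately [2.0, -1.0, 0.5, 3.0]

# When X^T X is singular or ill-conditioned, prefer lstsq (pseudo-inverse based):
w_lstsq, *_ = np.linalg.lstsq(Xb, y, rcond=None)
```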

3.2 Gradient Descent Method

Gradient descent updates weights iteratively:

$$ w^{(t+1)} = w^{(t)} - \eta \, \nabla_w J(w^{(t)}) $$

where \( \eta \) is the learning rate controlling the step size.
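A minimal gradient-descent sketch for the same objective; the learning rate and iteration count are hypothetical choices, and the gradient from Section 3.1 is divided by \( N \) so the step size does not depend on the dataset size:

```python
import numpy as np

def gradient_descent(X, y, eta=0.1, num_iters=5000):
    """Minimize J(w) = (Xw - y)^T (Xw - y) with the gradient from Section 3.1."""
    N, D = X.shape
    w = np.zeros(D)
    for _ in range(num_iters):
        grad = 2 * (X.T @ X @ w - X.T @ y)
        w = w - (eta / N) * grad   # 1/N scaling keeps the step size dataset-independent
    return w

# Small synthetic check (hypothetical data): the result should approach
# the normal-equation solution from Section 3.1.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=100)
print(gradient_descent(X, y))   # approximately [2.0, -1.0, 0.5]
```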

4. Ridge Regression (L2 Regularization)

To prevent overfitting or handle multicollinearity, a regularization term is added:

$$ J_{\text{ridge}}(w) = (Xw - y)^\top (Xw - y) + \lambda \|w\|_2^2 $$

The gradient becomes:

$$ \nabla_w J_{\text{ridge}}(w) = 2(X^\top X w - X^\top y + \lambda w) $$

and the closed-form solution is:

$$ \hat{w}_{\text{ridge}} = (X^\top X + \lambda I)^{-1} X^\top y $$
Notation:
  • \( \lambda \): regularization coefficient controlling shrinkage of the weights
  • \( I \): identity matrix of size \( D \times D \)
Larger \( \lambda \) reduces variance but increases bias.
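A minimal sketch of the ridge closed-form solution; the data and the value of \( \lambda \) are arbitrary illustrations (in practice \( \lambda \) is typically chosen by cross-validation):

```python
import numpy as np

def ridge_fit(X, y, lam=1.0):
    """Closed-form ridge solution: (X^T X + lambda * I)^{-1} X^T y."""
    D = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(D), X.T @ y)

# Hypothetical example with strongly correlated (collinear) features,
# where plain least squares is numerically unstable but ridge is not.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 1e-6 * rng.normal(size=200)      # nearly identical to x1
X = np.column_stack([x1, x2])
y = x1 + 0.1 * rng.normal(size=200)

print(ridge_fit(X, y, lam=1.0))            # weights shrunk, roughly [0.5, 0.5]
```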

5. Summary

  • Linear regression models the relationship between input features and a continuous target.
  • Parameters can be estimated analytically or via optimization.
  • Regularization techniques like Ridge Regression improve generalization and numerical stability.
