Linear Regression: Mathematical Foundations
Linear regression is a fundamental statistical technique used to predict a real-valued output \( y \in \mathbb{R} \) for a given input data point \( x \in \mathbb{R}^D \). It assumes that the expected value of the target variable is a linear function of the input features:

\[ \mathbb{E}[y \mid x] = w_0 + w^\top x. \]
1. Dataset Representation
Let the training dataset be represented by a feature matrix:

\[ X \in \mathbb{R}^{N \times D}, \]

where \( N \) is the number of data points and \( D \) is the number of features. The dataset can be expressed column-wise as:

\[ X = [\, x_1 \;\; x_2 \;\; \cdots \;\; x_D \,], \qquad x_i \in \mathbb{R}^N. \]

Each \( x_i \) (for \( i = 1, \dots, D \)) is a column vector representing one feature across all samples; each row of \( X \) holds the feature values of one sample. The corresponding targets are collected in a vector \( y \in \mathbb{R}^N \).
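As a concrete illustration, the sketch below builds a small synthetic dataset in exactly this shape. The values here (\( N \), \( D \), the true weights, the noise scale) are arbitrary choices for the example, not part of the text above; later snippets in this post reuse this `X` and `y`.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

N, D = 100, 3                        # number of samples and features
X = rng.normal(size=(N, D))          # feature matrix: one row per sample, one column per feature
true_w = np.array([2.0, -1.0, 0.5])  # hypothetical ground-truth weights
true_w0 = 4.0                        # hypothetical ground-truth bias
y = true_w0 + X @ true_w + 0.1 * rng.normal(size=N)  # noisy linear targets
```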
2. Model Formulation
A general polynomial form of regression, written for a scalar input \( x \), is:

\[ \hat{y} = w_0 + w_1 x + w_2 x^2 + \cdots + w_M x^M, \]

or more compactly using basis functions \( \phi_i(x) = x^i \):

\[ \hat{y} = w_0 + \sum_{i=1}^{M} w_i \phi_i(x). \]

In practice, the most common and simplest form is the linear (degree 1) model (a short code sketch follows the symbol list below):

\[ \hat{y} = w_0 + w^\top x = w_0 + \sum_{i=1}^{D} w_i x_i, \]

where:
\( y \): output (target) variable
\( x \in \mathbb{R}^D \): input feature vector
\( w = [w_1, \dots, w_D]^\top \): weight vector
\( w_0 \): bias (intercept) term
\( \phi_i(x) \): basis or feature transformation (e.g., the polynomial term \( x^i \))
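To make the model concrete, here is a minimal sketch of degree-1 prediction and of a polynomial basis expansion for a scalar feature. The function names are illustrative, not from any particular library.

```python
import numpy as np

def predict(X, w, w0):
    """Degree-1 linear model: returns w0 + X @ w, one prediction per row of X."""
    return w0 + X @ w

def poly_basis(x, degree):
    """Stacks the basis functions phi_i(x) = x**i for i = 1..degree as columns."""
    return np.column_stack([x ** i for i in range(1, degree + 1)])
```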
3. Estimating Parameters
There are two primary approaches for estimating the parameters \( w \) and \( w_0 \):
- Normal Equation — analytical solution using matrix operations
- Gradient Descent — iterative optimization minimizing a loss function
3.1 Normal Equation
The objective is to minimize the mean squared error (MSE) between predicted and true values. Absorbing the bias \( w_0 \) into \( w \) by appending a column of ones to \( X \), the loss is:

\[ J(w) = \frac{1}{N} \| y - Xw \|^2 = \frac{1}{N} (y - Xw)^\top (y - Xw). \]

Taking the gradient with respect to \( w \):

\[ \nabla_w J(w) = -\frac{2}{N} X^\top (y - Xw). \]

Setting the gradient to zero yields the normal equation \( X^\top X \, w = X^\top y \), whose solution is the least-squares estimate (equivalently, the maximum likelihood estimate under Gaussian noise), assuming \( X^\top X \) is invertible:

\[ w = (X^\top X)^{-1} X^\top y. \]
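A sketch of this closed form in code, assuming a column of ones is prepended to \( X \) to absorb the bias and that \( X^\top X \) is invertible. `np.linalg.solve` is used instead of an explicit matrix inverse for numerical stability.

```python
import numpy as np

def fit_ols(X, y):
    """Solve the normal equation (Xb^T Xb) w = Xb^T y for the OLS estimate."""
    Xb = np.column_stack([np.ones(len(X)), X])     # prepend a ones column for the bias
    w_full = np.linalg.solve(Xb.T @ Xb, Xb.T @ y)  # avoids forming an explicit inverse
    return w_full[0], w_full[1:]                   # (w0, w)
```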
3.2 Gradient Descent Method
Gradient descent updates the weights iteratively:

\[ w^{(t+1)} = w^{(t)} - \eta \, \nabla_w J\big(w^{(t)}\big) = w^{(t)} + \frac{2\eta}{N} X^\top \big(y - X w^{(t)}\big), \]

where \( \eta \) is the learning rate controlling the step size.
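A minimal batch gradient descent sketch for the MSE objective above; the learning rate and iteration count are example values that would need tuning in practice.

```python
import numpy as np

def fit_gd(X, y, eta=0.01, n_iters=1000):
    """Minimize the MSE by batch gradient descent on (w0, w)."""
    N, D = X.shape
    w, w0 = np.zeros(D), 0.0
    for _ in range(n_iters):
        resid = y - (w0 + X @ w)                   # residuals at the current parameters
        w = w + eta * (2.0 / N) * (X.T @ resid)    # step along -grad_w J
        w0 = w0 + eta * (2.0 / N) * resid.sum()    # step along -grad_w0 J
    return w0, w
```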
4. Ridge Regression (L2 Regularization)
To prevent overfitting or to handle multicollinearity, an L2 penalty on the weights is added to the objective (the \( 1/N \) factor is dropped here; keeping it would only rescale \( \lambda \)):

\[ J_{\text{ridge}}(w) = \| y - Xw \|^2 + \lambda \| w \|^2. \]

The gradient becomes:

\[ \nabla_w J_{\text{ridge}}(w) = -2 X^\top (y - Xw) + 2 \lambda w, \]

and setting it to zero gives the closed-form solution (sketched in code after the notes below):

\[ w = (X^\top X + \lambda I)^{-1} X^\top y, \]

where:
\( \lambda \): regularization coefficient controlling shrinkage of weights
\( I \): identity matrix of size \( D \times D \)
Larger \( \lambda \) reduces variance but increases bias.
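A sketch of the ridge closed form, following the common convention (an assumption here, not stated above) that the bias term is excluded from the penalty:

```python
import numpy as np

def fit_ridge(X, y, lam=1.0):
    """Closed-form ridge: solves (Xb^T Xb + lam * I') w = Xb^T y, bias unpenalized."""
    N, D = X.shape
    Xb = np.column_stack([np.ones(N), X])  # ones column absorbs the bias
    penalty = lam * np.eye(D + 1)
    penalty[0, 0] = 0.0                    # do not shrink the bias term
    w_full = np.linalg.solve(Xb.T @ Xb + penalty, Xb.T @ y)
    return w_full[0], w_full[1:]           # (w0, w)
```

Note that adding \( \lambda I \) also makes the linear system well-conditioned even when \( X^\top X \) is singular, which is the numerical-stability benefit mentioned in the summary.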
5. Summary
- Linear regression models the relationship between input features and a continuous target.
- Parameters can be estimated in closed form (normal equation) or iteratively (gradient descent).
- Regularization techniques like Ridge Regression improve generalization and numerical stability.
