Basis Change and Matrix Approximation with SVD

1. Basis Change

In this section, we examine how the transformation matrix of a linear mapping changes when we change the bases. Consider a linear transformation:

$$ \phi: V \rightarrow W $$

where \( \dim V = n \) and \( \dim W = m \). Let the ordered bases for \( V \) and \( W \) be:

$$ B = (b_1, \dots, b_n), \quad C = (c_1, \dots, c_m) $$

and the new bases be:

$$ \tilde{B} = (\tilde{b}_1, \dots, \tilde{b}_n), \quad \tilde{C} = (\tilde{c}_1, \dots, \tilde{c}_m) $$

If \( A \) represents the transformation matrix of \( \phi \) with respect to the bases \( (B, C) \), and \( \tilde{A} \) is the corresponding matrix with respect to \( (\tilde{B}, \tilde{C}) \), we aim to relate \( A \) and \( \tilde{A} \).

Basis change mapping diagram

1.1 Relationship between old and new bases

Each new basis vector in \( \tilde{B} \) can be written as a linear combination of the old basis vectors in \( B \):

$$ \tilde{b}_j = \sum_{i=1}^{n} s_{ij} b_i, \quad \forall j = 1, \dots, n $$

where \( S \in \mathbb{R}^{n \times n} \) is the basis change matrix for \( V \):

$$ S = [s_{:,1}, \dots, s_{:,n}] $$

Similarly, for the codomain basis:

$$ \tilde{c}_l = \sum_{k=1}^{m} t_{kl} c_k, \quad \forall l = 1, \dots, m $$

where \( T \in \mathbb{R}^{m \times m} \) is the basis change matrix for \( W \):

$$ T = [t_{:,1}, \dots, t_{:,m}] $$
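As a concrete sketch (with hypothetical values, and assuming the basis vectors are available as matrix columns in standard coordinates), column \( j \) of \( S \) holds the coordinates of \( \tilde{b}_j \) with respect to \( B \), so \( S \) solves \( B S = \tilde{B} \):

```python
import numpy as np

# Hypothetical example: old basis B and new basis B~ of R^2,
# basis vectors stored as matrix columns in standard coordinates.
B = np.array([[1.0, 0.0],
              [0.0, 1.0]])          # old basis (here, the standard basis)
B_tilde = np.array([[1.0, 1.0],
                    [0.0, 1.0]])    # new basis

# Column j of S expresses b~_j in the old basis: B @ S = B_tilde.
S = np.linalg.solve(B, B_tilde)

# Sanity check: the old basis times S reproduces the new basis.
assert np.allclose(B @ S, B_tilde)
print(S)
```

The codomain matrix \( T \) is built the same way from \( C \) and \( \tilde{C} \).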

1.2 Relationship between transformation matrices

Since \( \phi \) is linear, we can write:

$$ \phi(\tilde{b}_j) = \sum_{l=1}^{m} \tilde{a}_{lj} \tilde{c}_l = \sum_{l=1}^{m} \tilde{a}_{lj} \sum_{k=1}^{m} t_{kl} c_k = \sum_{k=1}^{m} \left( \sum_{l=1}^{m} t_{kl} \tilde{a}_{lj} \right) c_k $$

Also, from the old basis representation:

$$ \phi(\tilde{b}_j) = \phi\left( \sum_{i=1}^{n} s_{ij} b_i \right) = \sum_{i=1}^{n} s_{ij} \phi(b_i) = \sum_{i=1}^{n} s_{ij} \sum_{k=1}^{m} a_{ki} c_k = \sum_{k=1}^{m} \left( \sum_{i=1}^{n} a_{ki} s_{ij} \right) c_k $$

Comparing both forms, we get:

$$ \sum_{l=1}^{m} t_{kl} \tilde{a}_{lj} = \sum_{i=1}^{n} a_{ki} s_{ij} $$

In matrix form:

$$ T \tilde{A} = A S $$

Thus, the new transformation matrix under the changed bases is:

$$ \tilde{A} = T^{-1} A S $$
Interpretation:
\( S \) expresses the new domain basis \( \tilde{B} \) in terms of the old basis \( B \), and \( T \) does the same for the codomain bases \( \tilde{C} \) and \( C \). The formula \( \tilde{A} = T^{-1} A S \) therefore represents the same linear mapping \( \phi \) in the new coordinate systems.
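A quick numerical check of this relation, using randomly chosen matrices (square random matrices are invertible with probability one, so \( T^{-1} \) exists here):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: phi: R^3 -> R^2 with matrix A in the old bases (B, C).
A = rng.standard_normal((2, 3))

# Random invertible basis change matrices S (domain) and T (codomain).
S = rng.standard_normal((3, 3))
T = rng.standard_normal((2, 2))

# New transformation matrix under the changed bases.
A_tilde = np.linalg.inv(T) @ A @ S

# Consistency check of the derived relation  T A~ = A S.
assert np.allclose(T @ A_tilde, A @ S)
```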

2. Matrix Approximation with Singular Value Decomposition (SVD)

A matrix \( A \in \mathbb{R}^{m \times n} \) can be factorized using SVD as:

$$ A = U \Sigma V^\top $$

where:

  • \( U \in \mathbb{R}^{m \times m} \): orthogonal matrix of left-singular vectors (eigenvectors of \( A A^\top \))
  • \( V \in \mathbb{R}^{n \times n} \): orthogonal matrix of right-singular vectors (eigenvectors of \( A^\top A \))
  • \( \Sigma \in \mathbb{R}^{m \times n} \): rectangular diagonal matrix of singular values \( \sigma_i = \sqrt{\lambda_i} \), where the \( \lambda_i \) are the eigenvalues of \( A^\top A \) (we write \( \Sigma \) rather than \( S \) to avoid a clash with the basis change matrix of Section 1)
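The factorization and the eigenvalue relation can be verified numerically (here with a small random matrix; NumPy returns the singular values as a vector in descending order):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3))

# Thin SVD: U is 4x3, s is a length-3 vector, Vt is 3x3.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Reconstruction: A = U diag(s) V^T.
assert np.allclose(U @ np.diag(s) @ Vt, A)

# Singular values are the square roots of the eigenvalues of A^T A.
eigvals = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]
assert np.allclose(s, np.sqrt(eigvals))
```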

Each rank-1 component of \( A \) can be expressed as:

$$ A_i = u_i v_i^\top $$

Thus, a rank-\( k \) approximation of \( A \) can be constructed as:

$$ \tilde{A}(k) = \sum_{i=1}^{k} \sigma_i u_i v_i^\top = \sum_{i=1}^{k} \sigma_i A_i $$
Remark: We typically keep singular vectors corresponding to the largest singular values to retain most of the data variance while reducing dimensionality.
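The truncation can be sketched in a few lines. By the Eckart–Young theorem, the resulting \( \tilde{A}(k) \) is the best rank-\( k \) approximation, and its spectral-norm error equals the first discarded singular value \( \sigma_{k+1} \), which the code below checks:

```python
import numpy as np

def rank_k_approx(A, k):
    """Rank-k approximation of A via truncated SVD."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 4))
k = 2

A_k = rank_k_approx(A, k)
s = np.linalg.svd(A, compute_uv=False)

# Eckart-Young: the spectral-norm error is the first discarded
# singular value, sigma_{k+1} (index k, zero-based).
err = np.linalg.norm(A - A_k, ord=2)
assert np.allclose(err, s[k])
assert np.linalg.matrix_rank(A_k) == k
```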

2.1 Visual Example: Image Approximation

Consider an image \( I \) (e.g., a picture of a Pug). Below are visualizations of rank-\( k \) approximations for \( k \in \{1, 2, 3, 4, 10, 20, 50, 100\} \).

Original image and its SVD rank-\( k \) approximations

Images \( I_1, I_2, I_3, I_4 \) are the outer products \( I_i = u_i v_i^\top \) of the first four singular vector pairs. For instance:

  • Rank-1 approximation: \( \sigma_1 u_1 v_1^\top = \sigma_1 I_1 \)
  • Rank-2 approximation: \( \sigma_1 u_1 v_1^\top + \sigma_2 u_2 v_2^\top = \sigma_1 I_1 + \sigma_2 I_2 \)
  • Rank-3 approximation: \( \sigma_1 I_1 + \sigma_2 I_2 + \sigma_3 I_3 \)
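The experiment can be reproduced in outline. A synthetic grayscale array stands in for the Pug photo here, to keep the sketch self-contained; in practice you would load a real image (e.g. via PIL or matplotlib) as a 2-D array:

```python
import numpy as np

# Synthetic grayscale "image": a smooth pattern plus noise, so that
# the singular values decay but the matrix is full rank.
rng = np.random.default_rng(3)
h, w = 64, 64
y, x = np.mgrid[0:h, 0:w]
I = np.sin(x / 7.0) * np.cos(y / 5.0) + 0.1 * rng.standard_normal((h, w))

U, s, Vt = np.linalg.svd(I, full_matrices=False)

errors = {}
for k in [1, 2, 3, 4, 10, 20, 50]:
    # Rank-k approximation: keep only the k largest singular triplets.
    I_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
    errors[k] = np.linalg.norm(I - I_k, "fro") / np.linalg.norm(I, "fro")
    print(f"rank {k:3d}: relative Frobenius error = {errors[k]:.4f}")
```

The relative error drops sharply over the first few ranks and then flattens, mirroring the visual convergence of the approximations above.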

