Dimensionality Reduction: Trade-offs and Techniques
In this blog, we discuss some trade-offs involved when choosing a Dimensionality Reduction (DR) technique. Dimensionality reduction aims to represent high-dimensional data in a lower-dimensional space while preserving as much meaningful structure as possible.
The goal is to find a mapping \( f \) that captures the essential relationships among the data points in \( \mathbb{R}^n \) within a lower-dimensional space \( \mathbb{R}^k \). However, depending on the technique used, distortions such as artificial clusters or misplaced neighbors may appear. We will focus on three widely used methods: t-SNE, UMAP, and TriMap.
1. t-SNE (t-distributed Stochastic Neighbor Embedding)
t-SNE models the pairwise similarities between points in both the high-dimensional and low-dimensional spaces using probability distributions.
In the high-dimensional space, the conditional probability of point \( x_j \) being a neighbor of \( x_i \) is defined as:
\[ p_{j|i} = \frac{\exp\left(-\|x_i - x_j\|^2 / 2\sigma_i^2\right)}{\sum_{k \neq i} \exp\left(-\|x_i - x_k\|^2 / 2\sigma_i^2\right)} \]
Here, \( \sigma_i \) controls the local neighborhood size around point \( x_i \) and is chosen such that the perplexity of the distribution equals a user-defined value:
\[ \mathrm{Perp}(P_i) = 2^{H(P_i)}, \qquad H(P_i) = -\sum_{j} p_{j|i} \log_2 p_{j|i} \]
To make the similarity symmetric, we define:
\[ p_{ij} = \frac{p_{j|i} + p_{i|j}}{2n} \]
In the low-dimensional space, a Student-t distribution with one degree of freedom (equivalent to a Cauchy distribution) is used to measure similarity:
\[ q_{ij} = \frac{\left(1 + \|y_i - y_j\|^2\right)^{-1}}{\sum_{k \neq l} \left(1 + \|y_k - y_l\|^2\right)^{-1}} \]
The cost function minimizes the Kullback–Leibler (KL) divergence between the high- and low-dimensional distributions:
\[ C = \mathrm{KL}(P \,\|\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}} \]
The embedding \( y_i \) is initialized from a Gaussian distribution with small variance:
\[ y_i \sim \mathcal{N}\!\left(0,\; 10^{-4} I_k\right) \]
Notation:
- \( x_i, x_j \in \mathbb{R}^n \): high-dimensional data points
- \( y_i, y_j \in \mathbb{R}^k \): corresponding low-dimensional embeddings
- \( \sigma_i \): local bandwidth of the Gaussian kernel around \( x_i \) (determined by the perplexity)
- \( p_{ij}, q_{ij} \): pairwise similarity probabilities in high and low dimensions
- \( n \): number of data points
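To make these formulas concrete, here is a minimal NumPy sketch of the affinity computations (the function names are ours, not from any library). It takes the bandwidths \( \sigma_i \) as given; a full implementation finds each \( \sigma_i \) by binary search until the row's perplexity matches the target, and then minimizes the KL cost by gradient descent.

```python
import numpy as np

def high_dim_affinities(X, sigmas):
    """Conditional probabilities p_{j|i} from Gaussian kernels, symmetrized to p_ij."""
    n = X.shape[0]
    sq_dists = np.square(X[:, None, :] - X[None, :, :]).sum(-1)  # ||x_i - x_j||^2
    P = np.exp(-sq_dists / (2.0 * sigmas[:, None] ** 2))
    np.fill_diagonal(P, 0.0)                  # a point is not its own neighbor
    P /= P.sum(axis=1, keepdims=True)         # row-normalize -> p_{j|i}
    return (P + P.T) / (2.0 * n)              # symmetrize -> p_ij

def low_dim_affinities(Y):
    """Student-t (one degree of freedom) similarities q_ij in the embedding."""
    sq_dists = np.square(Y[:, None, :] - Y[None, :, :]).sum(-1)
    Q = 1.0 / (1.0 + sq_dists)
    np.fill_diagonal(Q, 0.0)
    return Q / Q.sum()                        # normalize over all pairs k != l

def kl_divergence(P, Q, eps=1e-12):
    """The t-SNE cost C = KL(P || Q); eps guards against log(0)."""
    return np.sum(P * np.log((P + eps) / (Q + eps)))
```

In practice, scikit-learn's `sklearn.manifold.TSNE` performs the perplexity calibration and the gradient-based optimization internally.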
2. UMAP (Uniform Manifold Approximation and Projection)
UMAP constructs a weighted graph from high-dimensional data that approximates the local manifold structure, then optimizes a corresponding low-dimensional layout that preserves these relationships.
The algorithm proceeds in two stages:
- Graph Construction: A fuzzy topological representation is built by connecting each point to its nearest neighbors with membership strength based on distance and local connectivity.
- Graph Optimization: The low-dimensional embedding is found by minimizing a cross-entropy loss between the high- and low-dimensional graphs:
\[ C = \sum_{(i,j)} \left[ w_{ij} \log \frac{w_{ij}}{w'_{ij}} + \left(1 - w_{ij}\right) \log \frac{1 - w_{ij}}{1 - w'_{ij}} \right] \]
Notation:
- \( w_{ij} \): edge weight (similarity) in the high-dimensional graph
- \( w'_{ij} \): corresponding edge weight in the low-dimensional graph
- \( (i,j) \): pair of neighboring points
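For practical use, the reference implementation is the umap-learn package. A minimal usage sketch on synthetic data (the data and parameter values here are arbitrary choices for illustration):

```python
import numpy as np
import umap  # pip install umap-learn

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 30))  # toy high-dimensional data

# n_neighbors sets the neighborhood size used to build the fuzzy graph
# (small = more local structure); min_dist controls how tightly points
# may pack together in the embedding.
reducer = umap.UMAP(n_neighbors=15, min_dist=0.1, n_components=2, random_state=42)
embedding = reducer.fit_transform(X)  # shape (500, 2)
```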
3. TriMap
TriMap takes a different approach by considering triplets of points rather than pairs. It seeks an embedding that preserves the relative ordering of distances among triplets.
Given a triplet \((i, j, k)\), the objective is to ensure that if \(x_i\) is closer to \(x_j\) than to \(x_k\) in high-dimensional space, the same holds true for their low-dimensional representations:
\[ \|x_i - x_j\| < \|x_i - x_k\| \;\Longrightarrow\; \|y_i - y_j\| < \|y_i - y_k\| \]
TriMap minimizes a weighted sum of triplet-based loss terms, each a smooth logistic-like ratio of Student-t similarities:
\[ \mathcal{L} = \sum_{(i,j,k)} w_{ijk}\, \frac{s(y_i, y_k)}{s(y_i, y_j) + s(y_i, y_k)}, \qquad s(a, b) = \left(1 + \|a - b\|^2\right)^{-1} \]
Notation:
- \( (i,j,k) \): triplet of indices
- \( w_{ijk} \): weight indicating the importance of triplet \((i,j,k)\)
- \( y_i, y_j, y_k \in \mathbb{R}^k \): low-dimensional representations
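The authors' trimap package exposes an interface in the same style as umap-learn. A minimal sketch, assuming the package's documented API (worth verifying against the version you install):

```python
import numpy as np
import trimap  # pip install trimap

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 30))  # toy high-dimensional data

# TriMap samples triplets per point (nearest-neighbor "inliers" versus
# farther "outliers") and optimizes the weighted triplet loss above.
embedding = trimap.TRIMAP().fit_transform(X)  # 2-D embedding by default
```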
4. Parametric vs. Non-Parametric Mapping
Dimensionality reduction methods can be categorized as:
- Parametric: Learn an explicit mapping function \( f_\theta \), so that new points can be embedded directly by applying the learned parameters \( \theta \). Examples: parametric t-SNE, autoencoders.
- Non-Parametric: The embedding is learned only for the training data; embedding new points requires re-running (or approximating) the optimization. Examples: standard t-SNE, UMAP, TriMap.
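To illustrate the parametric case, here is a minimal PyTorch autoencoder sketch in which the encoder plays the role of \( f_\theta \). The architecture and training settings are arbitrary choices for illustration; the point is only that, once trained, the same parameters embed unseen points without re-fitting:

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    """Encoder f_theta: R^n -> R^k and decoder g_phi: R^k -> R^n."""
    def __init__(self, n_in: int, k: int = 2):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_in, 64), nn.ReLU(), nn.Linear(64, k))
        self.decoder = nn.Sequential(nn.Linear(k, 64), nn.ReLU(), nn.Linear(64, n_in))

    def forward(self, x):
        return self.decoder(self.encoder(x))

X = torch.randn(500, 30)                      # toy training data
model = Autoencoder(n_in=30, k=2)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(200):                          # minimize reconstruction error
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(X), X)
    loss.backward()
    opt.step()

# The learned parameters theta now embed unseen points directly:
X_new = torch.randn(10, 30)
embedding_new = model.encoder(X_new)          # shape (10, 2), no re-fitting needed
```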