Hi! In this blog, I will discuss some of the trade-offs we make when choosing a dimensionality reduction technique for a problem. Let's jump right in.
Dimensionality reduction (DR) maps high-dimensional data to a lower-dimensional space. More precisely, DR maps $D$-dimensional data into $d$ dimensions ($x_i \in \mathbb{R}^D \mapsto y_i \in \mathbb{R}^d$, with $d < D$), where the new $d$ dimensions retain nearly all of the relevant information about the original data. In practice, however, a DR result can show clusters that are not present in the original data, and it can map two points that are neighbors in the high-dimensional space far apart in the low-dimensional space. Let's discuss and compare some methods that try to avoid these problems. I will cover t-SNE, UMAP, and TriMap in this blog.
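To make the second failure mode concrete, here is a small Python/NumPy sketch (my own illustration, not part of any of the three methods) that measures how many of each point's $k$ nearest neighbors in the original space are still among its $k$ nearest neighbors after the mapping. The function names and the choice of $k$ are assumptions for this example.

```python
import numpy as np

def knn_indices(X, k):
    """Indices of the k nearest neighbors of every row of X (excluding the row itself)."""
    sq = np.sum(X**2, axis=1)
    D = sq[:, None] + sq[None, :] - 2 * X @ X.T  # squared Euclidean distances
    np.fill_diagonal(D, np.inf)                  # a point is not its own neighbor
    return np.argsort(D, axis=1)[:, :k]

def knn_preservation(X_high, Y_low, k=5):
    """Average fraction of each point's k high-dimensional neighbors that are
    also among its k low-dimensional neighbors (1.0 means nothing was torn apart)."""
    nh = knn_indices(X_high, k)
    nl = knn_indices(Y_low, k)
    overlap = [len(set(a) & set(b)) / k for a, b in zip(nh, nl)]
    return float(np.mean(overlap))

# Usage: naively keeping only 2 of 10 coordinates loses many true neighbors.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
print(knn_preservation(X, X[:, :2], k=10))
```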
1. t-SNE (t-distributed Stochastic Neighbor Embedding)
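t-SNE converts the distances between points in the high-dimensional space into probabilities and then looks for a low-dimensional map whose probabilities match them. In the high-dimensional space, the probability that $x_i$ picks $x_j$ as its neighbor is

$$p_{j|i} = \frac{\exp\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}{\sum_{k \neq i} \exp\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)}, \qquad p_{i|i} = 0,$$

where $\sigma_i$ is found by binary search so that

$$\mathrm{Perp}(P_i) = 2^{H(P_i)}, \qquad H(P_i) = -\sum_j p_{j|i} \log_2 p_{j|i},$$

equals the perplexity, which is a user-defined parameter.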
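A minimal NumPy sketch of the conditional probabilities and the binary search for $\sigma_i$. It searches over $\beta_i = 1/(2\sigma_i^2)$ instead of $\sigma_i$ directly, which is a common convenience; the function name `conditional_p`, the tolerance, and the iteration cap are assumptions for illustration.

```python
import numpy as np

def conditional_p(X, perplexity=30.0, tol=1e-5, max_iter=100):
    """Row i holds p_{j|i}; sigma_i is tuned by binary search so that
    2**H(P_i) matches the requested perplexity."""
    n = X.shape[0]
    sq = np.sum(X**2, axis=1)
    D = np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0.0)  # squared distances
    P = np.zeros((n, n))
    target_H = np.log2(perplexity)  # desired entropy in bits
    for i in range(n):
        d = np.delete(D[i], i)       # distances from x_i to every other point
        lo, hi = 0.0, np.inf
        beta = 1.0                   # beta = 1 / (2 * sigma_i**2)
        for _ in range(max_iter):
            w = np.exp(-(d - d.min()) * beta)  # shifting by the min leaves p unchanged
            p = w / w.sum()
            H = -np.sum(p * np.log2(np.maximum(p, 1e-300)))
            if abs(H - target_H) < tol:
                break
            if H > target_H:         # too flat: increase beta (shrink sigma_i)
                lo = beta
                beta = beta * 2 if hi == np.inf else (beta + hi) / 2
            else:                    # too peaked: decrease beta (grow sigma_i)
                hi = beta
                beta = (beta + lo) / 2
        P[i, np.arange(n) != i] = p
    return P

# Usage
rng = np.random.default_rng(0)
P_cond = conditional_p(rng.normal(size=(100, 10)), perplexity=20.0)
```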
Next, we define the symmetric joint probability

$$p_{ij} = \frac{p_{j|i} + p_{i|j}}{2n},$$

where $n$ is the number of data points.
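In the low-dimensional space, the symmetric probability uses a Student t-distribution with one degree of freedom:

$$q_{ij} = \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}{\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}}.$$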
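The loss function is the Kullback-Leibler divergence between $P$ and $Q$:

$$C = \mathrm{KL}(P \,\Vert\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}.$$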
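The symmetrization, the low-dimensional probabilities, and the loss can be sketched in NumPy as follows. This assumes `P_cond` is an $n \times n$ matrix whose row $i$ holds $p_{j|i}$ with a zero diagonal; the helper names `joint_p`, `joint_q`, and `kl_loss` are hypothetical.

```python
import numpy as np

def joint_p(P_cond):
    """Symmetrize conditional probabilities: p_ij = (p_{j|i} + p_{i|j}) / 2n."""
    n = P_cond.shape[0]
    return (P_cond + P_cond.T) / (2 * n)

def joint_q(Y):
    """Student-t (one degree of freedom) joint probabilities in the embedding Y."""
    sq = np.sum(Y**2, axis=1)
    num = 1.0 / (1.0 + np.maximum(sq[:, None] + sq[None, :] - 2 * Y @ Y.T, 0.0))
    np.fill_diagonal(num, 0.0)  # q_ii is defined as 0, so the sum runs over k != l
    return num / num.sum()

def kl_loss(P, Q, eps=1e-12):
    """KL(P || Q); pairs with p_ij = 0 contribute nothing and are skipped."""
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / np.maximum(Q[mask], eps))))
```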
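Finally, the mapped points $y_i$ are initialized by sampling from an isotropic multivariate normal distribution $\mathcal{N}(0, 10^{-4} I)$ and then moved by gradient descent to minimize $C$.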
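Putting the pieces together, a bare-bones optimizer might look like the sketch below. It uses the standard t-SNE gradient $\partial C / \partial y_i = 4 \sum_j (p_{ij} - q_{ij})(y_i - y_j)\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}$ with plain gradient descent. The momentum and early exaggeration used by real implementations are deliberately left out, and the fixed step size `lr` (like the name `tsne_optimize`) is an assumption that only suits small toy inputs.

```python
import numpy as np

def tsne_optimize(P, d=2, n_iter=500, lr=10.0, seed=0):
    """Plain gradient descent on KL(P || Q), starting from N(0, 1e-4 I).
    P must be a symmetric joint-probability matrix with a zero diagonal."""
    n = P.shape[0]
    rng = np.random.default_rng(seed)
    Y = rng.normal(scale=1e-2, size=(n, d))  # std 1e-2 -> covariance 1e-4 I
    for _ in range(n_iter):
        sq = np.sum(Y**2, axis=1)
        # num_ij = (1 + ||y_i - y_j||^2)^-1, the unnormalized Student-t kernel
        num = 1.0 / (1.0 + np.maximum(sq[:, None] + sq[None, :] - 2 * Y @ Y.T, 0.0))
        np.fill_diagonal(num, 0.0)
        Q = num / num.sum()
        # grad_i = 4 * sum_j W_ij (y_i - y_j) with W_ij = (p_ij - q_ij) * num_ij,
        # written in matrix form as 4 * (diag(row sums of W) - W) @ Y
        W = (P - Q) * num
        grad = 4.0 * (np.diag(W.sum(axis=1)) - W) @ Y
        Y -= lr * grad
    return Y
```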
2. UMAP (Uniform Manifold Approximation and Projection)
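UMAP is a two-step method: first, it builds a weighted graph (a fuzzy $k$-nearest-neighbor graph) of the high-dimensional data, and second, it optimizes a low-dimensional layout of that graph so that it resembles the high-dimensional one.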
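For the first step, the UMAP paper gives each point a local offset $\rho_i$ (the distance to its nearest neighbor) and a scale $\sigma_i$ chosen so that the weights $\exp\left(-(d_{ij} - \rho_i)/\sigma_i\right)$ to its $k$ neighbors sum to $\log_2 k$; the directed weights are then combined with the fuzzy union $a + b - ab$. The sketch below is a simplified NumPy version of that step only (brute-force distances, no approximate nearest-neighbor search, no layout optimization). It is not the `umap-learn` implementation, and `fuzzy_knn_graph` is a hypothetical name.

```python
import numpy as np

def fuzzy_knn_graph(X, k=10, n_iter=64):
    """Simplified sketch of UMAP's first step: a symmetric weighted kNN graph."""
    n = X.shape[0]
    sq = np.sum(X**2, axis=1)
    D = np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0.0))
    np.fill_diagonal(D, np.inf)          # exclude self-loops
    A = np.zeros((n, n))
    target = np.log2(k)
    for i in range(n):
        nbrs = np.argsort(D[i])[:k]
        d = D[i, nbrs]
        rho = d[0]                       # distance to the nearest neighbor
        lo, hi = 0.0, np.inf
        sigma = 1.0
        for _ in range(n_iter):          # binary search: sum of weights == log2(k)
            s = np.sum(np.exp(-(d - rho) / sigma))
            if s > target:               # weights too large: shrink sigma
                hi = sigma
                sigma = (lo + hi) / 2
            else:                        # weights too small: grow sigma
                lo = sigma
                sigma = sigma * 2 if hi == np.inf else (lo + hi) / 2
        A[i, nbrs] = np.exp(-(d - rho) / sigma)  # nearest neighbor gets weight 1
    # fuzzy union of the directed graph with its transpose: a + b - a*b
    return A + A.T - A * A.T
```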
3. TriMap
TriMap considers groups of three points (triplets) in the high-dimensional space and finds an embedding that preserves the ordering of distances within a subset of these triplets.
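To illustrate what preserving the ordering of distances within triplets means, the sketch below samples triplets $(i, j, k)$ labeled so that $x_j$ is closer to $x_i$ than $x_k$ is, then checks how many keep that order in an embedding $Y$. It also reports an unweighted loss of the form $s_{ik} / (s_{ij} + s_{ik})$ with $s = 1/(1 + d^2)$, similar in spirit to TriMap's triplet loss. Assumptions: triplets here are sampled uniformly at random and left unweighted, whereas TriMap itself samples most triplets from nearest neighbors and weights them, so treat this as an illustration rather than the method. The function names are hypothetical.

```python
import numpy as np

def sample_triplets(X, n_triplets=1000, seed=0):
    """Hypothetical uniform sampling of triplets (i, j, k), ordered so that
    x_j is closer to x_i than x_k is in the high-dimensional space."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    trips = []
    while len(trips) < n_triplets:
        i, j, k = rng.choice(n, size=3, replace=False)
        dij = np.sum((X[i] - X[j]) ** 2)
        dik = np.sum((X[i] - X[k]) ** 2)
        trips.append((i, j, k) if dij < dik else (i, k, j))
    return np.array(trips)

def triplet_scores(Y, trips):
    """Fraction of triplets whose ordering survives in Y, plus an unweighted
    TriMap-style loss s_ik / (s_ij + s_ik) with s = 1 / (1 + squared distance)."""
    i, j, k = trips.T
    dij = np.sum((Y[i] - Y[j]) ** 2, axis=1)
    dik = np.sum((Y[i] - Y[k]) ** 2, axis=1)
    preserved = np.mean(dij < dik)
    s_ij, s_ik = 1.0 / (1.0 + dij), 1.0 / (1.0 + dik)
    loss = np.mean(s_ik / (s_ij + s_ik))  # below 0.5 when orderings are kept
    return float(preserved), float(loss)
```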
_____
The mapping itself can be either parametric or non-parametric. A parametric method learns an explicit function from the high-dimensional space to the low-dimensional one, so to map a new data point we simply plug it into that function. A non-parametric method only learns positions for the points it was given, so when new data points arrive we have to learn the mapping again.
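As a small illustration of the parametric case, the sketch below uses PCA (not one of the three methods above; it is chosen only because it is the simplest parametric map) to learn an explicit linear function that can then map new points without refitting. `PCAMap` is a hypothetical class written for this example.

```python
import numpy as np

class PCAMap:
    """Parametric map: learns an explicit function f(x) = (x - mean) @ W."""

    def fit(self, X, d=2):
        self.mean = X.mean(axis=0)
        # rows of Vt are the principal directions, sorted by singular value
        _, _, Vt = np.linalg.svd(X - self.mean, full_matrices=False)
        self.W = Vt[:d].T  # shape (D, d)
        return self

    def transform(self, X_new):
        # new points are just plugged into the learned function; no refitting
        return (X_new - self.mean) @ self.W

# Usage
rng = np.random.default_rng(0)
model = PCAMap().fit(rng.normal(size=(100, 10)), d=2)
new_points = rng.normal(size=(3, 10))
print(model.transform(new_points).shape)  # (3, 2)
```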