SNE squared norm estimation

SNE (Stochastic Neighbor Embedding) is a popular dimensionality reduction technique used in machine learning and data visualization. It aims to represent high-dimensional data in a lower-dimensional space while preserving the similarity relationships between data points. SNE achieves this by modeling the probability distribution of pairwise similarities in both the high-dimensional and low-dimensional spaces.

In SNE, the pairwise similarity between two data points is defined as the conditional probability that point i would pick point j as its neighbor, under a Gaussian distribution centered on point i. The similarity is calculated using a Gaussian kernel function:

p_{j|i} = exp(-||x_i - x_j||^2 / (2 * sigma_i^2)) / \sum_{k != i} exp(-||x_i - x_k||^2 / (2 * sigma_i^2))

where p_{j|i} represents the similarity between data points i and j, x_i and x_j are the high-dimensional input feature vectors, and sigma_i is the bandwidth of the Gaussian centered on point i. The denominator normalizes the similarities so that they form a probability distribution over the neighbors of point i; in practice, each sigma_i is chosen so that this distribution has a user-specified perplexity.
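For concreteness, here is a minimal NumPy sketch of this computation (the function name and interface are illustrative, not from any particular library):

```python
import numpy as np

def high_dim_similarities(X, i, sigma_i):
    """Conditional similarities p_{j|i} for point i under a Gaussian kernel.

    X       : (n, d) array of high-dimensional points
    i       : index of the reference point
    sigma_i : Gaussian bandwidth for point i (in practice tuned to hit
              a target perplexity via binary search)
    """
    # Squared Euclidean distances from x_i to every point.
    sq_dists = np.sum((X - X[i]) ** 2, axis=1)
    # Unnormalized Gaussian affinities; a point never picks itself.
    affinities = np.exp(-sq_dists / (2.0 * sigma_i ** 2))
    affinities[i] = 0.0
    # Normalize so the p_{j|i} form a probability distribution over j.
    return affinities / affinities.sum()
```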

The goal of SNE is to learn a lower-dimensional representation, y_i, for each data point that captures the similarities present in the high-dimensional space. The similarity between two points in the low-dimensional space is defined in a similar manner; in the widely used t-SNE variant, a heavy-tailed Student's t-distribution with one degree of freedom is used:

q_{j|i} = (1 + ||y_i - y_j||^2)^{-1} / Z

where q_{j|i} represents the similarity between the low-dimensional representations y_i and y_j, and Z is a normalization constant; in t-SNE, Z = \sum_{k != l} (1 + ||y_k - y_l||^2)^{-1}, the sum of the kernel values over all pairs, so that the similarities sum to one.
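A corresponding sketch for the t-distributed low-dimensional similarities, normalizing over all pairs as t-SNE does:

```python
import numpy as np

def low_dim_similarities(Y):
    """Pairwise t-distributed similarities for an embedding Y of shape (n, k),
    normalized over all pairs as in t-SNE."""
    # Pairwise squared distances via broadcasting: shape (n, n).
    sq_dists = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    # Heavy-tailed Student's t kernel with one degree of freedom.
    inv = 1.0 / (1.0 + sq_dists)
    np.fill_diagonal(inv, 0.0)  # a point is not its own neighbor
    # Z is the sum over all pairs, so the similarities sum to one.
    return inv / inv.sum()
```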

The SNE algorithm seeks to minimize the divergence between the two distributions, p_{j|i} and q_{j|i}, by adjusting the positions of the points in the low-dimensional space. It uses gradient descent to optimize an objective function known as the Kullback-Leibler (KL) divergence:

KL(P || Q) = \sum_i \sum_j p_{j|i} log(p_{j|i} / q_{j|i})
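Given the two similarity matrices, this objective is straightforward to evaluate; a sketch, with a small epsilon (an added guard, not part of the formula) to avoid log(0):

```python
import numpy as np

def kl_divergence(P, Q, eps=1e-12):
    """KL(P || Q) summed over all pairwise similarities."""
    # Clamp to eps so that log(0) and division by zero cannot occur.
    P = np.maximum(P, eps)
    Q = np.maximum(Q, eps)
    return np.sum(P * np.log(P / Q))
```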

The "stochastic" in the name refers to this probabilistic notion of neighborhood: each point picks its neighbors at random with probabilities given by the similarity distribution. The optimization itself proceeds iteratively, following the gradient of the KL divergence, usually with momentum and other heuristics to aid convergence.
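As an illustration of one such update, here is a sketch using the symmetric t-SNE gradient (the learning rate is an illustrative choice, and P is assumed to be a symmetric similarity matrix; real implementations add momentum, early exaggeration, and similar tricks):

```python
import numpy as np

def gradient_step(P, Y, learning_rate=100.0):
    """One gradient-descent step on KL(P || Q) for a t-SNE-style embedding.

    P : (n, n) symmetric matrix of high-dimensional similarities
    Y : (n, k) current low-dimensional embedding
    """
    # Pairwise differences and t-kernel values in the embedding space.
    diff = Y[:, None, :] - Y[None, :, :]               # (n, n, k)
    inv = 1.0 / (1.0 + np.sum(diff ** 2, axis=-1))     # (n, n)
    np.fill_diagonal(inv, 0.0)
    Q = inv / inv.sum()
    # Symmetric t-SNE gradient: dC/dy_i = 4 * sum_j (p_ij - q_ij) * inv_ij * (y_i - y_j).
    grad = 4.0 * np.sum(((P - Q) * inv)[:, :, None] * diff, axis=1)
    return Y - learning_rate * grad
```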

Now, let's focus on SNE squared norm estimation. One limitation of the original SNE objective is that it treats pairwise similarities as probabilities and takes their logarithms, which is computationally expensive and numerically unstable when similarities approach zero. To address this, the squared Euclidean distance can be used directly as a proxy for pairwise similarity.

The squared Euclidean distance between two points i and j in the high-dimensional space is given by:

||x_i - x_j||^2

Similarly, the squared Euclidean distance between the low-dimensional representations y_i and y_j can be computed as:

||y_i - y_j||^2

By using the squared Euclidean distance, we can avoid the need for costly logarithmic calculations and simplify the estimation process.
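Both quantities come from the same vectorized computation; a sketch using the standard expansion ||a - b||^2 = ||a||^2 + ||b||^2 - 2a.b:

```python
import numpy as np

def pairwise_sq_dists(Z):
    """All pairwise squared Euclidean distances ||z_i - z_j||^2 for rows of Z."""
    # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b, vectorized over all pairs.
    sq_norms = np.sum(Z ** 2, axis=1)
    D = sq_norms[:, None] + sq_norms[None, :] - 2.0 * Z @ Z.T
    # Clip tiny negative values introduced by floating-point error.
    return np.maximum(D, 0.0)
```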

To estimate similarities in the low-dimensional space, a modification is made to the algorithm: instead of modeling the low-dimensional similarities with a Student's t-distribution, a Gaussian distribution is used, mirroring the high-dimensional side:

q_{j|i} = exp(-||y_i - y_j||^2 / (2 * sigma_i^2)) / \sum_{k != i} exp(-||y_i - y_k||^2 / (2 * sigma_i^2))

where sigma_i is the bandwidth parameter for each data point, playing the same role as in the high-dimensional similarities of the original SNE.

With this modification, the objective function to be minimized becomes the squared norm of the differences between the pairwise similarities in the high-dimensional and low-dimensional spaces:

C(P, Q) = \sum_i \sum_j (p_{j|i} - q_{j|i})^2

Note that this objective is no longer a KL divergence but a plain sum of squared errors between the two similarity matrices. This formulation allows for more efficient computation and avoids the numerical instability associated with taking logarithms.
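Putting the pieces together, here is a sketch of the modified low-dimensional similarities and the squared-norm objective (helper names are illustrative; the per-row normalization mirrors the high-dimensional side and is an assumption of this sketch):

```python
import numpy as np

def gaussian_similarities(Y, sigmas):
    """Low-dimensional Gaussian similarities q_{j|i}, one bandwidth per point,
    normalized per row as on the high-dimensional side."""
    sq_norms = np.sum(Y ** 2, axis=1)
    D = np.maximum(sq_norms[:, None] + sq_norms[None, :] - 2.0 * Y @ Y.T, 0.0)
    Q = np.exp(-D / (2.0 * sigmas[:, None] ** 2))
    np.fill_diagonal(Q, 0.0)          # exclude self-similarity
    return Q / Q.sum(axis=1, keepdims=True)

def squared_norm_objective(P, Q):
    """Sum of squared differences between the two similarity matrices."""
    return np.sum((P - Q) ** 2)
```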

In summary, SNE squared norm estimation is a modification of the original SNE algorithm that replaces the log-based KL objective with a squared-norm objective over the pairwise similarities. By working with squared norms instead of probabilities and logarithms, it simplifies the estimation process and improves computational efficiency.