Theoretical Foundations of t-SNE for Visualizing High-Dimensional Clustered Data
This paper investigates the theoretical foundations of the t-distributed
stochastic neighbor embedding (t-SNE) algorithm, a popular nonlinear dimension
reduction and data visualization method. A novel theoretical framework for the
analysis of t-SNE based on the gradient descent approach is presented. For the
early exaggeration stage of t-SNE, we show its asymptotic equivalence to power
iterations based on the underlying graph Laplacian, characterize its limiting
behavior, and uncover its deep connection to Laplacian spectral clustering and
to fundamental principles such as early stopping as implicit regularization. The
results explain the intrinsic mechanism and the empirical benefits of such a
computational strategy. For the embedding stage of t-SNE, we characterize the
kinematics of the low-dimensional map throughout the iterations, and identify
an amplification phase, featuring the intercluster repulsion and the expansive
behavior of the low-dimensional map, and a stabilization phase. The general
theory explains the fast convergence rate and the exceptional empirical
performance of t-SNE for visualizing clustered data, brings forth
interpretations of the t-SNE visualizations, and provides theoretical guidance
for applying t-SNE and selecting its tuning parameters in various applications.
Comment: Accepted by Journal of Machine Learning Research
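The claimed equivalence between early exaggeration and power iterations can be illustrated in a few lines. The sketch below is a simplification, not the paper's analysis: a single-bandwidth Gaussian affinity stands in for the perplexity-calibrated P matrix, and an attraction-only update Y ← (I − hL_α)Y, with L_α the graph Laplacian of the exaggerated affinities, plays the role of the early-exaggeration gradient step. All function names and parameter values are illustrative.

```python
import numpy as np

def gaussian_affinities(X, sigma=1.0):
    """Symmetric Gaussian affinities; a single-bandwidth stand-in for
    t-SNE's perplexity-calibrated P matrix."""
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(K, 0.0)
    P = K / K.sum(axis=1, keepdims=True)
    return (P + P.T) / (2.0 * len(X))

def early_exaggeration_updates(P, alpha=12.0, h=0.5, steps=100, dim=2, seed=0):
    """Attraction-only gradient steps with exaggerated affinities.
    Each step multiplies Y by M = I - h * L_alpha, where
    L_alpha = D_alpha - alpha * P is the graph Laplacian of the
    exaggerated affinities -- a power-iteration-like update."""
    n = P.shape[0]
    A = alpha * P
    L = np.diag(A.sum(axis=1)) - A        # scaled graph Laplacian
    M = np.eye(n) - h * L
    rng = np.random.default_rng(seed)
    Y = 1e-4 * rng.standard_normal((n, dim))
    for _ in range(steps):
        Y = M @ Y                          # repeated multiplication
    return Y
```

Repeated multiplication by M amplifies the eigenvectors associated with the Laplacian's smallest eigenvalues, which for well-separated clusters approximate cluster indicators; this is the connection to Laplacian spectral clustering that the abstract refers to.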
Parametric t-Distributed Stochastic Exemplar-centered Embedding
Parametric embedding methods such as parametric t-SNE (pt-SNE) have been
widely adopted for data visualization and out-of-sample data embedding without
further computationally expensive optimization or approximation. However, the
performance of pt-SNE is highly sensitive to the batch-size hyper-parameter due
to conflicting optimization goals, and it often produces dramatically different
embeddings for different choices of the user-defined perplexity. To effectively
solve these issues, we present parametric t-distributed stochastic
exemplar-centered embedding methods. Our strategy learns embedding parameters
by comparing given data only with precomputed exemplars, resulting in a cost
function with linear computational and memory complexity, which is further
reduced by noise contrastive samples. Moreover, we propose a shallow embedding
network with high-order feature interactions for data visualization, which is
much easier to tune yet achieves performance comparable to the deep neural
network employed by pt-SNE. We empirically demonstrate, using several
benchmark datasets, that our proposed methods significantly outperform pt-SNE
in terms of robustness, visual effects, and quantitative evaluations.
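To make the linear-complexity idea concrete, here is a minimal sketch of an exemplar-centered parametric embedding, assuming randomly sampled exemplars and a plain linear map in place of the shallow high-order network the abstract describes; the KL cost compares each data point only with the m exemplars, so it scales as O(nm) rather than O(n²). Names and defaults are illustrative, and the noise-contrastive reduction is omitted.

```python
import numpy as np
from scipy.optimize import minimize

def exemplar_affinities(X, E, sigma=1.0):
    """Data-to-exemplar affinities: an n x m matrix instead of n x n."""
    d2 = np.sum((X[:, None, :] - E[None, :, :]) ** 2, axis=-1)
    P = np.exp(-d2 / (2.0 * sigma ** 2))
    return P / P.sum(axis=1, keepdims=True)

def fit_exemplar_embedding(X, n_exemplars=20, dim=2, seed=0):
    """Learn a linear map W by matching data-to-exemplar affinities in
    input space and in the embedding (heavy-tailed Student-t kernel);
    the cost touches only n * m pairs."""
    rng = np.random.default_rng(seed)
    E = X[rng.choice(len(X), n_exemplars, replace=False)]  # precomputed exemplars
    P = exemplar_affinities(X, E)

    def kl_cost(w):
        W = w.reshape(X.shape[1], dim)
        Y, F = X @ W, E @ W                # embedded data and exemplars
        d2 = np.sum((Y[:, None, :] - F[None, :, :]) ** 2, axis=-1)
        Q = 1.0 / (1.0 + d2)               # Student-t kernel
        Q = Q / Q.sum(axis=1, keepdims=True)
        return np.sum(P * np.log((P + 1e-12) / (Q + 1e-12)))

    w0 = 1e-2 * rng.standard_normal(X.shape[1] * dim)
    res = minimize(kl_cost, w0, method="L-BFGS-B")  # finite-difference gradients
    return res.x.reshape(X.shape[1], dim), E
```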
Universally Consistent Latent Position Estimation and Vertex Classification for Random Dot Product Graphs
In this work we show that, using the eigen-decomposition of the adjacency
matrix, we can consistently estimate latent positions for random dot product
graphs provided the latent positions are i.i.d. from some distribution. If
class labels are observed for a number of vertices tending to infinity, then we
show that the remaining vertices can be classified with error converging to
Bayes optimal using the k-nearest-neighbors classification rule. We evaluate
the proposed methods on simulated data and on a graph derived from Wikipedia.
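A compact sketch of this pipeline, using the common adjacency-spectral-embedding convention of scaling the top-d eigenvectors by the square roots of their eigenvalues (a standard choice for this estimator, though the abstract does not spell it out); scikit-learn's k-NN classifier is assumed for the classification step.

```python
import numpy as np
from scipy.sparse.linalg import eigsh
from sklearn.neighbors import KNeighborsClassifier

def adjacency_spectral_embedding(A, d=2):
    """Estimated latent positions: top-d eigenpairs of the adjacency
    matrix, eigenvectors scaled by sqrt(|eigenvalue|)."""
    vals, vecs = eigsh(A.astype(float), k=d, which="LA")
    return vecs * np.sqrt(np.abs(vals))

def classify_vertices(A, labels, labeled_idx, d=2, k=5):
    """k-NN classification of the unlabeled vertices in the estimated
    latent-position space."""
    Xhat = adjacency_spectral_embedding(A, d)
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(Xhat[labeled_idx], labels[labeled_idx])
    unlabeled = np.setdiff1d(np.arange(A.shape[0]), labeled_idx)
    return unlabeled, knn.predict(Xhat[unlabeled])
```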
Dimensionality Reduction Mappings
A wealth of powerful dimensionality reduction methods has been established which can be used for data visualization and preprocessing. These are accompanied by formal evaluation schemes, which allow a quantitative evaluation along general principles and which even lead to further visualization schemes based on these objectives. Most methods, however, provide a mapping of an a priori given finite set of points only, requiring additional steps for out-of-sample extensions. We propose a general view on dimensionality reduction based on the concept of cost functions and, based on this general principle, extend dimensionality reduction to explicit mappings of the data manifold. This offers simple out-of-sample extensions. Further, it opens a way towards a theory of data visualization that takes the perspective of generalization ability to new data points. We demonstrate the approach with a simple global linear mapping as well as prototype-based local linear mappings.
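As a minimal illustration of the cost-function view with an explicit mapping, the sketch below fits the simple global linear mapping the abstract mentions by minimizing a stress-type cost; because the learned matrix applies to any input vector, out-of-sample points are embedded by the same multiplication. The cost and optimizer are illustrative choices, not the paper's exact objective.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.spatial.distance import pdist

def fit_linear_dr(X, dim=2, seed=0):
    """Fit a global linear mapping x -> x @ W by minimizing a
    stress-type cost: squared mismatch between pairwise distances
    in input space and in the embedding."""
    D = pdist(X)                            # original pairwise distances

    def stress(w):
        W = w.reshape(X.shape[1], dim)
        return np.sum((pdist(X @ W) - D) ** 2)

    rng = np.random.default_rng(seed)
    w0 = 1e-2 * rng.standard_normal(X.shape[1] * dim)
    res = minimize(stress, w0, method="L-BFGS-B")
    return res.x.reshape(X.shape[1], dim)

# Out-of-sample extension is immediate: the same matrix embeds unseen
# points, e.g. W = fit_linear_dr(X_train); y_new = x_new @ W.
```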