7 research outputs found

    Heavy-tailed kernels reveal a finer cluster structure in t-SNE visualisations

    T-distributed stochastic neighbour embedding (t-SNE) is a widely used data visualisation technique. It differs from its predecessor SNE by its low-dimensional similarity kernel: the Gaussian kernel was replaced by the heavy-tailed Cauchy kernel, solving the "crowding problem" of SNE. Here, we develop an efficient implementation of t-SNE for a t-distribution kernel with an arbitrary degree of freedom ν, with ν → ∞ corresponding to SNE and ν = 1 corresponding to the standard t-SNE. Using theoretical analysis and toy examples, we show that ν < 1 can further reduce the crowding problem and reveal finer cluster structure that is invisible in standard t-SNE. We further demonstrate the striking effect of heavier-tailed kernels on large real-life data sets such as MNIST, single-cell RNA-sequencing data, and the HathiTrust library. We use domain knowledge to confirm that the revealed clusters are meaningful. Overall, we argue that modifying the tail heaviness of the t-SNE kernel can yield additional insight into the cluster structure of the data.
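
    The kernel family this abstract describes can be written down compactly. Below is a minimal NumPy sketch (not the paper's implementation), assuming the standard Student-t parameterisation with ν degrees of freedom, in which ν = 1 recovers the Cauchy kernel of standard t-SNE and ν → ∞ recovers the Gaussian kernel of SNE; the function names are illustrative.

        import numpy as np

        def t_kernel(sq_dist, nu):
            # Unnormalised Student-t similarity for squared distances;
            # smaller nu gives a heavier tail.
            return (1.0 + sq_dist / nu) ** (-(nu + 1.0) / 2.0)

        def gaussian_kernel(sq_dist):
            # Unnormalised Gaussian similarity (the SNE limit).
            return np.exp(-sq_dist / 2.0)

        d2 = np.linspace(0.0, 25.0, 6)    # example squared distances
        print(t_kernel(d2, nu=0.5))       # heavier-tailed than standard t-SNE
        print(t_kernel(d2, nu=1.0))       # standard t-SNE (Cauchy) kernel
        print(t_kernel(d2, nu=100.0))     # close to the Gaussian kernel below
        print(gaussian_kernel(d2))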

    Stochastic Neighbor Embedding with Gaussian and Student-t Distributions: Tutorial and Survey

    Stochastic Neighbor Embedding (SNE) is a manifold learning and dimensionality reduction method with a probabilistic approach. In SNE, every point is considered to be a neighbor of all other points with some probability, and the method tries to preserve these probabilities in the embedding space. SNE uses Gaussian distributions for the probabilities in both the input and embedding spaces, whereas t-SNE uses the Gaussian and Student-t distributions in these spaces, respectively. In this tutorial and survey paper, we explain SNE, symmetric SNE, t-SNE (or Cauchy-SNE), and t-SNE with general degrees of freedom. We also cover the out-of-sample extension and acceleration for these methods. Some simulations to visualize the embeddings are also provided. Comment: To appear as part of an upcoming academic book on dimensionality reduction and manifold learning.
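
    As a concrete illustration of the neighbor probabilities this survey covers, here is a minimal NumPy sketch (not from the survey itself), assuming a fixed unit Gaussian bandwidth rather than the usual perplexity calibration; the function names are illustrative.

        import numpy as np

        def sq_dists(X):
            # Pairwise squared Euclidean distances.
            s = np.sum(X**2, axis=1)
            return s[:, None] + s[None, :] - 2.0 * X @ X.T

        def sne_p(X, sigma=1.0):
            # Gaussian conditional probabilities p_{j|i} (self excluded).
            P = np.exp(-sq_dists(X) / (2.0 * sigma**2))
            np.fill_diagonal(P, 0.0)
            return P / P.sum(axis=1, keepdims=True)

        def tsne_q(Y):
            # Student-t (Cauchy) joint probabilities q_{ij} of t-SNE.
            Q = 1.0 / (1.0 + sq_dists(Y))
            np.fill_diagonal(Q, 0.0)
            return Q / Q.sum()

        rng = np.random.default_rng(0)
        X = rng.normal(size=(5, 10))    # toy high-dimensional points
        Y = rng.normal(size=(5, 2))     # toy 2-D embedding
        print(sne_p(X).sum(axis=1))     # each row sums to 1
        print(tsne_q(Y).sum())          # sums to 1 overall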

    Manifold Learning in Atomistic Simulations: A Conceptual Review

    Analyzing large volumes of high-dimensional data requires dimensionality reduction: finding meaningful low-dimensional structures hidden in the high-dimensional observations. Such practice is needed in atomistic simulations of complex systems, where even thousands of degrees of freedom are sampled. The abundance of such data makes gaining insight into a specific physical problem strenuous. Our primary aim in this review is to focus on unsupervised machine learning methods that can be used on simulation data to find a low-dimensional manifold providing a collective and informative characterization of the studied process. Such manifolds can be used for sampling long-timescale processes and free-energy estimation. We describe methods that can work on datasets from both standard and enhanced-sampling atomistic simulations. Unlike recent reviews on manifold learning for atomistic simulations, we consider only methods that construct low-dimensional manifolds based on Markov transition probabilities between high-dimensional samples. We discuss these techniques from a conceptual point of view, including their underlying theoretical frameworks and possible limitations.
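
    To make the class of methods concrete, here is a minimal sketch (not taken from the review) of a Markov-transition-based embedding in the spirit of diffusion maps: a Gaussian affinity between samples is row-normalised into a transition matrix, and its leading non-trivial eigenvectors serve as low-dimensional coordinates. The bandwidth eps and the function name are illustrative assumptions.

        import numpy as np

        def markov_embedding(X, eps=1.0, n_coords=2):
            s = np.sum(X**2, axis=1)
            d2 = s[:, None] + s[None, :] - 2.0 * X @ X.T
            K = np.exp(-d2 / eps)                  # Gaussian affinity
            P = K / K.sum(axis=1, keepdims=True)   # row-stochastic Markov matrix
            vals, vecs = np.linalg.eig(P)
            order = np.argsort(-vals.real)         # sort by eigenvalue
            # Skip the trivial constant eigenvector (eigenvalue 1).
            return vecs[:, order[1:n_coords + 1]].real

        rng = np.random.default_rng(0)
        X = rng.normal(size=(50, 8))        # stand-in for simulation samples
        print(markov_embedding(X).shape)    # (50, 2)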