1,764 research outputs found
Approximated and User Steerable tSNE for Progressive Visual Analytics
Progressive Visual Analytics aims at improving the interactivity in existing
analytics techniques by means of visualization as well as interaction with
intermediate results. One key method for data analysis is dimensionality
reduction, for example, to produce 2D embeddings that can be visualized and
analyzed efficiently. t-Distributed Stochastic Neighbor Embedding (tSNE) is a
well-suited technique for the visualization of several high-dimensional data.
tSNE can create meaningful intermediate results but suffers from a slow
initialization that constrains its application in Progressive Visual Analytics.
We introduce a controllable tSNE approximation (A-tSNE), which trades off speed
and accuracy, to enable interactive data exploration. We offer real-time
visualization techniques, including a density-based solution and a Magic Lens
to inspect the degree of approximation. With this feedback, the user can decide
on local refinements and steer the approximation level during the analysis. We
demonstrate our technique with several datasets, in a real-world research
scenario and for the real-time analysis of high-dimensional streams to
illustrate its effectiveness for interactive data analysis
Vibration-Based structural health monitoring using piezoelectric transducers and parametric t-SNE
In this paper, we evaluate the performance of the so-called parametric t-distributed stochastic neighbor embedding (P-t-SNE), comparing it to the performance of the t-SNE, the non-parametric version. The methodology used in this study is introduced for the detection and classification of structural changes in the field of structural health monitoring. This method is based on the combination of principal component analysis (PCA) and P-t-SNE, and it is applied to an experimental case study of an aluminum plate with four piezoelectric transducers. The basic steps of the detection and classification process are: (i) the raw data are scaled using mean-centered group scaling and then PCA is applied to reduce its dimensionality; (ii) P-t-SNE is applied to represent the scaled and reduced data as 2-dimensional points, defining a cluster for each structural state; and (iii) the current structure to be diagnosed is associated with a cluster employing two strategies: (a) majority voting; and (b) the sum of the inverse distances. The results in the frequency domain manifest the strong performance of P-t-SNE, which is comparable to the performance of t-SNE but outperforms t-SNE in terms of computational cost and runtime. When the method is based on P-t-SNE, the overall accuracy fluctuates between 99.5% and 99.75%.Peer ReviewedPostprint (published version
Visualizing probabilistic models: Intensive Principal Component Analysis
Unsupervised learning makes manifest the underlying structure of data without
curated training and specific problem definitions. However, the inference of
relationships between data points is frustrated by the `curse of
dimensionality' in high-dimensions. Inspired by replica theory from statistical
mechanics, we consider replicas of the system to tune the dimensionality and
take the limit as the number of replicas goes to zero. The result is the
intensive embedding, which is not only isometric (preserving local distances)
but allows global structure to be more transparently visualized. We develop the
Intensive Principal Component Analysis (InPCA) and demonstrate clear
improvements in visualizations of the Ising model of magnetic spins, a neural
network, and the dark energy cold dark matter ({\Lambda}CDM) model as applied
to the Cosmic Microwave Background.Comment: 6 pages, 5 figure
Compressive Embedding and Visualization using Graphs
Visualizing high-dimensional data has been a focus in data analysis
communities for decades, which has led to the design of many algorithms, some
of which are now considered references (such as t-SNE for example). In our era
of overwhelming data volumes, the scalability of such methods have become more
and more important. In this work, we present a method which allows to apply any
visualization or embedding algorithm on very large datasets by considering only
a fraction of the data as input and then extending the information to all data
points using a graph encoding its global similarity. We show that in most
cases, using only samples is sufficient to diffuse the
information to all data points. In addition, we propose quantitative
methods to measure the quality of embeddings and demonstrate the validity of
our technique on both synthetic and real-world datasets
- …