
    Approximated and User Steerable tSNE for Progressive Visual Analytics

    Progressive Visual Analytics aims at improving the interactivity of existing analytics techniques by means of visualization of, and interaction with, intermediate results. One key method for data analysis is dimensionality reduction, for example, to produce 2D embeddings that can be visualized and analyzed efficiently. t-Distributed Stochastic Neighbor Embedding (tSNE) is a well-suited technique for the visualization of high-dimensional data. tSNE can create meaningful intermediate results but suffers from a slow initialization that constrains its application in Progressive Visual Analytics. We introduce a controllable tSNE approximation (A-tSNE), which trades off speed and accuracy, to enable interactive data exploration. We offer real-time visualization techniques, including a density-based solution and a Magic Lens to inspect the degree of approximation. With this feedback, the user can decide on local refinements and steer the approximation level during the analysis. We demonstrate our technique with several datasets, in a real-world research scenario and for the real-time analysis of high-dimensional streams, to illustrate its effectiveness for interactive data analysis.
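    What follows is a minimal sketch, not the authors' A-tSNE implementation: it only illustrates the kind of speed/accuracy trade-off the abstract describes, using scikit-learn's Barnes-Hut t-SNE, whose angle parameter controls the degree of approximation (the data, perplexity, and angle values are illustrative assumptions).

    # Minimal sketch (not A-tSNE): trading speed against accuracy in t-SNE
    # via scikit-learn's Barnes-Hut approximation.
    import numpy as np
    from sklearn.manifold import TSNE

    rng = np.random.default_rng(0)
    X = rng.normal(size=(5000, 50))  # placeholder high-dimensional data

    # Coarse, fast embedding for early interactive feedback.
    coarse = TSNE(n_components=2, method="barnes_hut", angle=0.8,
                  perplexity=30, random_state=0).fit_transform(X)

    # Slower, more faithful embedding once a region of interest is identified.
    refined = TSNE(n_components=2, method="barnes_hut", angle=0.2,
                   perplexity=30, random_state=0).fit_transform(X)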

    Vibration-Based structural health monitoring using piezoelectric transducers and parametric t-SNE

    In this paper, we evaluate the performance of the so-called parametric t-distributed stochastic neighbor embedding (P-t-SNE), comparing it to the performance of t-SNE, the non-parametric version. The methodology used in this study is introduced for the detection and classification of structural changes in the field of structural health monitoring. The method is based on the combination of principal component analysis (PCA) and P-t-SNE, and it is applied to an experimental case study of an aluminum plate with four piezoelectric transducers. The basic steps of the detection and classification process are: (i) the raw data are scaled using mean-centered group scaling and then PCA is applied to reduce their dimensionality; (ii) P-t-SNE is applied to represent the scaled and reduced data as 2-dimensional points, defining a cluster for each structural state; and (iii) the current structure to be diagnosed is associated with a cluster using one of two strategies: (a) majority voting; and (b) the sum of the inverse distances. The results in the frequency domain demonstrate the strong performance of P-t-SNE, which is comparable to that of t-SNE while outperforming it in terms of computational cost and runtime. When the method is based on P-t-SNE, the overall accuracy fluctuates between 99.5% and 99.75%.
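    As a hedged illustration of step (iii) above, the sketch below implements the two cluster-assignment strategies on already-embedded 2-D points; the scaling, PCA, and P-t-SNE steps that would produce the embedding are not shown, and all names and the choice of k are assumptions.

    # Sketch of the two diagnosis strategies described in the abstract,
    # operating on 2-D embedded points with known structural-state labels.
    import numpy as np

    def majority_vote(embedded, labels, query, k=5):
        """Assign `query` to the most common label among its k nearest points."""
        d = np.linalg.norm(embedded - query, axis=1)
        nearest = labels[np.argsort(d)[:k]]
        values, counts = np.unique(nearest, return_counts=True)
        return values[np.argmax(counts)]

    def inverse_distance_sum(embedded, labels, query, eps=1e-12):
        """Assign `query` to the class with the largest sum of inverse distances."""
        d = np.linalg.norm(embedded - query, axis=1)
        classes = np.unique(labels)
        scores = {c: np.sum(1.0 / (d[labels == c] + eps)) for c in classes}
        return max(scores, key=scores.get)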

    Visualizing probabilistic models: Intensive Principal Component Analysis

    Unsupervised learning makes manifest the underlying structure of data without curated training and specific problem definitions. However, the inference of relationships between data points is frustrated by the 'curse of dimensionality' in high dimensions. Inspired by replica theory from statistical mechanics, we consider replicas of the system to tune the dimensionality and take the limit as the number of replicas goes to zero. The result is the intensive embedding, which is not only isometric (preserving local distances) but also allows global structure to be more transparently visualized. We develop Intensive Principal Component Analysis (InPCA) and demonstrate clear improvements in visualizations of the Ising model of magnetic spins, a neural network, and the dark energy cold dark matter (ΛCDM) model as applied to the Cosmic Microwave Background.
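    The sketch below shows one way the core linear-algebra step of an InPCA-style embedding can be realized, assuming a precomputed symmetric matrix of squared distances between probabilistic models; the paper's intensive distance obtained via the replica trick is not reproduced here, and the MDS-style double centering and handling of negative eigenvalues are assumptions about the general construction.

    # Hedged sketch: MDS-style embedding of a precomputed model-to-model
    # distance matrix, keeping directions for eigenvalues of either sign.
    import numpy as np

    def inpca_style_coordinates(D, n_components=2):
        """D: (n, n) symmetric matrix of squared distances between models."""
        n = D.shape[0]
        J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
        W = -0.5 * J @ D @ J                       # double-centered Gram-like matrix
        eigvals, eigvecs = np.linalg.eigh(W)
        order = np.argsort(np.abs(eigvals))[::-1]  # rank directions by |eigenvalue|
        eigvals, eigvecs = eigvals[order], eigvecs[:, order]
        # Scale by sqrt(|eigenvalue|); negative eigenvalues correspond to
        # "imaginary" directions, kept here as real coordinates for plotting.
        coords = eigvecs[:, :n_components] * np.sqrt(np.abs(eigvals[:n_components]))
        return coords, eigvals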

    Compressive Embedding and Visualization using Graphs

    Visualizing high-dimensional data has been a focus of the data analysis community for decades, which has led to the design of many algorithms, some of which are now considered reference methods (such as t-SNE). In our era of overwhelming data volumes, the scalability of such methods has become more and more important. In this work, we present a method that allows any visualization or embedding algorithm to be applied to very large datasets by considering only a fraction of the data as input and then extending the information to all data points using a graph encoding their global similarity. We show that in most cases, using only O(log N) samples is sufficient to diffuse the information to all N data points. In addition, we propose quantitative methods to measure the quality of embeddings and demonstrate the validity of our technique on both synthetic and real-world datasets.
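    The following is a rough sketch of the general idea, not the paper's algorithm: embed only a small subsample with an off-the-shelf method (t-SNE here), then extend positions to the remaining points through a nearest-neighbour similarity graph. The sample size, neighbourhood size, and weighting scheme are illustrative assumptions.

    # Sketch: embed a small landmark subsample, then place every point at the
    # similarity-weighted average of its nearest landmarks.
    import numpy as np
    from sklearn.manifold import TSNE
    from sklearn.neighbors import NearestNeighbors

    rng = np.random.default_rng(0)
    X = rng.normal(size=(20000, 30))                # placeholder dataset
    n_landmarks = max(2 * int(np.log(len(X))), 100)
    idx = rng.choice(len(X), n_landmarks, replace=False)

    # Any embedding algorithm could be used on the subsample.
    landmark_pos = TSNE(n_components=2, random_state=0).fit_transform(X[idx])

    # Extend to all N points via inverse-distance weights over the kNN graph.
    nn = NearestNeighbors(n_neighbors=10).fit(X[idx])
    dist, neigh = nn.kneighbors(X)
    weights = 1.0 / (dist + 1e-12)
    weights /= weights.sum(axis=1, keepdims=True)
    Y = (weights[:, :, None] * landmark_pos[neigh]).sum(axis=1)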