Visualizing Bags of Vectors

Authors: Balasubramanian Sriramkumar; Nagireddy Raghuram Reddy
Publication date: 11/10/2013

The motivation of this work is two-fold: (a) to compare two different modes of visualizing data that exists in a bag-of-vectors format, and (b) to propose a theoretical model that supports a new mode of visualizing such data. High-dimensional data can be visualized using Minimum Volume Embedding, but the data must exist in a format suitable for computing similarities while preserving local distances. This paper compares the visualizations produced by two methods of representing data and proposes a new method, providing sample visualizations for it.

arXiv.org e-Print Archive

CiteSeerX

Dynamical projections for the visualization of PDFSense data

Authors: Cook Dianne; Laa Ursula; Valencia German
Publication venue: Springer Science and Business Media LLC
Publication date: 22/07/2018

A recent paper on visualizing the sensitivity of hadronic experiments to nucleon structure [1] introduces the tool PDFSense, which defines measures that allow the user to judge the sensitivity of PDF fits to a given experiment. The sensitivity is characterized by high-dimensional data residuals that are visualized in a 3-d subspace of the first 10 principal components or using t-SNE [2]. We show how a tour, a dynamic visualisation of high-dimensional data, can extend this tool beyond 3-d relationships. This approach enables resolving structure orthogonal to the 2-d viewing plane used so far, and hence a more finely tuned assessment of the sensitivity.
Comment: Format of the animations changed for easier viewing
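The idea of a tour, viewing high-dimensional data through a sequence of 2-d projection planes, can be sketched roughly as follows. This is a minimal illustration only, not the authors' tooling: the function names `random_frame`, `project`, and `tour` are hypothetical, and the sketch jumps between independent random planes, whereas a real tour would typically interpolate smoothly between them.

```python
import math
import random


def random_frame(p, rng):
    """Return two orthonormal p-vectors spanning a random 2-d plane."""
    def unit(v):
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v]

    a = unit([rng.gauss(0.0, 1.0) for _ in range(p)])
    b = [rng.gauss(0.0, 1.0) for _ in range(p)]
    # Gram-Schmidt: remove the component of b along a, then normalise.
    dot = sum(x * y for x, y in zip(a, b))
    b = unit([y - dot * x for x, y in zip(a, b)])
    return a, b


def project(points, frame):
    """Project each p-dimensional point onto the plane spanned by frame."""
    a, b = frame
    return [
        (sum(x * u for x, u in zip(pt, a)), sum(x * v for x, v in zip(pt, b)))
        for pt in points
    ]


def tour(points, n_frames, seed=0):
    """Yield successive 2-d views of the data, one per random plane."""
    rng = random.Random(seed)
    p = len(points[0])
    for _ in range(n_frames):
        yield project(points, random_frame(p, rng))
```

Each view returned by `tour` is a list of (x, y) pairs that could be passed to any scatterplot routine; structure hidden in one plane may become visible in another.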

arXiv.org e-Print Archive

EDP Sciences OAI-PMH repository (1.2.0)

Directory of Open Access Journals

Monash University Research Portal

Theoretical Foundations of t-SNE for Visualizing High-Dimensional Clustered Data

Authors: Cai T. Tony; Ma Rong
Publication date: 31/10/2022

This paper investigates the theoretical foundations of the t-distributed stochastic neighbor embedding (t-SNE) algorithm, a popular nonlinear dimension reduction and data visualization method. A novel theoretical framework for the analysis of t-SNE based on the gradient descent approach is presented. For the early exaggeration stage of t-SNE, we show its asymptotic equivalence to power iterations based on the underlying graph Laplacian, characterize its limiting behavior, and uncover its deep connection to Laplacian spectral clustering, and fundamental principles including early stopping as implicit regularization. The results explain the intrinsic mechanism and the empirical benefits of such a computational strategy. For the embedding stage of t-SNE, we characterize the kinematics of the low-dimensional map throughout the iterations, and identify an amplification phase, featuring the intercluster repulsion and the expansive behavior of the low-dimensional map, and a stabilization phase. The general theory explains the fast convergence rate and the exceptional empirical performance of t-SNE for visualizing clustered data, brings forth interpretations of the t-SNE visualizations, and provides theoretical guidance for applying t-SNE and selecting its tuning parameters in various applications.
Comment: Accepted by Journal of Machine Learning Research
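As a rough illustration of what a power iteration based on a graph Laplacian looks like, the sketch below repeatedly applies y <- (I - hL)y to a one-dimensional embedding. Everything here is an assumption made for illustration: the toy affinity matrix `W`, the step size `h`, the iteration count, and the function names are hypothetical, and the abstract does not give the exact update or scaling analysed in the paper.

```python
def laplacian(W):
    """Unnormalised graph Laplacian L = D - W of a symmetric affinity matrix."""
    n = len(W)
    return [
        [(sum(W[i]) if i == j else 0.0) - W[i][j] for j in range(n)]
        for i in range(n)
    ]


def power_step(L, y, h):
    """One iteration y <- (I - h L) y on a 1-d embedding y."""
    n = len(y)
    return [y[i] - h * sum(L[i][j] * y[j] for j in range(n)) for i in range(n)]


def iterate(W, y, h=0.2, steps=50):
    """Apply the power step repeatedly (h and steps are illustrative choices)."""
    L = laplacian(W)
    for _ in range(steps):
        y = power_step(L, y, h)
    return y


# Toy affinities (assumed): points 0-2 form one cluster, points 3-5 another,
# with no affinity between the two clusters.
W = [[1.0 if (i != j and i // 3 == j // 3) else 0.0 for j in range(6)]
     for i in range(6)]
y0 = [0.9, -0.4, 0.1, -0.7, -0.3, -1.1]
y = iterate(W, y0)
```

In this toy setting the coordinates within each cluster contract toward that cluster's initial mean while the two clusters stay apart, which is the kind of behaviour associated with Laplacian spectral clustering.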

arXiv.org e-Print Archive

Superheat: An R package for creating beautiful and extendable heatmaps for visualizing complex data

Authors: Barter Rebecca L; Yu Bin
Publication date: 26/01/2017

The technological advancements of the modern era have enabled the collection of huge amounts of data in science and beyond. Extracting useful information from such massive datasets is an ongoing challenge as traditional data visualization tools typically do not scale well in high-dimensional settings. An existing visualization technique that is particularly well suited to visualizing large datasets is the heatmap. Although heatmaps are extremely popular in fields such as bioinformatics for visualizing large gene expression datasets, they remain a severely underutilized visualization tool in modern data analysis. In this paper we introduce superheat, a new R package that provides an extremely flexible and customizable platform for visualizing large datasets using extendable heatmaps. Superheat enhances the traditional heatmap by providing a platform to visualize a wide range of data types simultaneously, adding to the heatmap a response variable as a scatterplot, model results as boxplots, correlation information as barplots, text information, and more. Superheat allows the user to explore their data to greater depths and to take advantage of the heterogeneity present in the data to inform analysis decisions. The goal of this paper is two-fold: (1) to demonstrate the potential of the heatmap as a default visualization method for a wide range of data types using reproducible examples, and (2) to highlight the customizability and ease of implementation of the superheat package in R for creating beautiful and extendable heatmaps. The capabilities and fundamental applicability of the superheat package will be explored via three case studies, each based on publicly available data sources and accompanied by a file outlining the step-by-step analytic pipeline (with code).
Comment: 26 pages, 10 figures

arXiv.org e-Print Archive

eScholarship - University of California