Clustrophile: A Tool for Visual Clustering Analysis
While clustering is one of the most popular methods for data mining, analysts
lack adequate tools for quick, iterative clustering analysis, which is
essential for hypothesis generation and data reasoning. We introduce
Clustrophile, an interactive tool for iteratively computing discrete and
continuous data clusters, rapidly exploring different choices of clustering
parameters, and reasoning about clustering instances in relation to data
dimensions. Clustrophile combines three basic visualizations -- a table of raw
datasets, a scatter plot of planar projections, and a matrix diagram (heatmap)
of discrete clusterings -- through interaction and intermediate visual
encoding. Clustrophile also contributes two spatial interaction techniques and
a visualization method for reasoning about two-dimensional
projections obtained through dimensionality reductions. Comment: KDD IDEA'1
A Space-Efficient Method for Navigable Ensemble Analysis and Visualization
Scientists increasingly rely on simulation runs of complex models in lieu of
cost-prohibitive or infeasible experimentation. The data output of many
controlled simulation runs, the ensemble, is used to verify correctness and
quantify uncertainty. However, due to their size and complexity, ensembles are
difficult to visually analyze because the working set often exceeds strict
memory limitations. We present a navigable ensemble analysis tool, NEA, for
interactive exploration of ensembles. NEA's pre-processing component takes
advantage of the data similarity characteristics of ensembles to represent the
data in a new, spatially-efficient data structure which does not require fully
reconstructing the original data at visualization time. This data structure
allows a fine degree of control in working set management, which enables
interactive ensemble exploration while fitting within memory limitations.
Scientists can also gain new insights from the data-similarity analysis in the
pre-processing component. Comment: 11 pages, 10 figures
Exploring the Human Connectome Topology in Group Studies
Visually comparing brain networks, or connectomes, is an essential task in
the field of neuroscience. Especially relevant to the field of clinical
neuroscience, group studies that examine differences between populations or
changes over time within a population enable neuroscientists to reason about
effective diagnoses and treatments for a range of neuropsychiatric disorders.
In this paper, we specifically explore how visual analytics tools can be used
to facilitate various clinical neuroscience tasks, in which observation and
analysis of meaningful patterns in the connectome can support patient diagnosis
and treatment. We conduct a survey of visualization tasks that enable clinical
neuroscience activities, and further explore how existing connectome
visualization tools support or fail to support these tasks. Based on our
investigation of these tasks, we introduce a novel visualization tool,
NeuroCave, to support group studies analyses. We discuss how our design
decisions (the use of immersive visualization, the use of hierarchical
clustering and dimensionality reduction techniques, and the choice of visual
encodings) are motivated by these tasks. We evaluate NeuroCave through two use
cases that illustrate the utility of interactive connectome visualization in
clinical neuroscience contexts. In the first use case, we study sex differences
using functional connectomes and discover hidden connectome patterns associated
with well-known cognitive differences in spatial and verbal abilities. In the
second use case, we show how visualizing the brain in different
topological spaces, coupled with clustering information, can reveal the brain's
intrinsic structure.
Visual Analytics of Image-Centric Cohort Studies in Epidemiology
Epidemiology characterizes the influence of causes on disease and health
conditions of defined populations. Cohort studies are population-based studies
involving usually large numbers of randomly selected individuals and comprising
numerous attributes, ranging from self-reported interview data to results from
various medical examinations, e.g., blood and urine samples. Recently,
medical imaging has been used as an additional instrument to assess risk
factors and potential prognostic information. In this chapter, we discuss such
studies and how the evaluation may benefit from visual analytics. Cluster
analysis to define groups, reliable image analysis of organs in medical imaging
data and shape space exploration to characterize anatomical shapes are among
the visual analytics tools that may enable epidemiologists to fully exploit the
potential of their huge and complex data. To gain acceptance, visual analytics
tools need to complement more classical epidemiologic tools, primarily
hypothesis-driven statistical analysis.
GPGPU Linear Complexity t-SNE Optimization
The t-distributed Stochastic Neighbor Embedding (tSNE) algorithm has become
in recent years one of the most used and insightful techniques for the
exploratory data analysis of high-dimensional data. tSNE reveals clusters of
high-dimensional data points at different scales while it requires only minimal
tuning of its parameters. Despite these advantages, the computational
complexity of the algorithm limits its application to relatively small
datasets. To address this problem, several evolutions of tSNE have been
developed in recent years, mainly focusing on the scalability of the similarity
computations between data points. However, these contributions are insufficient
to achieve interactive rates when visualizing the evolution of the tSNE
embedding for large datasets. In this work, we present a novel approach to the
minimization of the tSNE objective function that heavily relies on modern
graphics hardware and has linear computational complexity. Our technique
not only beats the state of the art, but can even be executed on the client side
in a browser. We propose to approximate the repulsion forces between data
points using adaptive-resolution textures that are drawn at every iteration
with WebGL. This approximation allows us to reformulate the tSNE minimization
problem as a series of tensor operations that are computed with TensorFlow.js, a
JavaScript library for scalable tensor computations.
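For orientation, the quantity being minimized can be illustrated with the standard exact t-SNE gradient, which splits into an attractive and a repulsive sum. The sketch below is a plain O(N^2) reference in pure Python, not the texture-based linear-time approximation the abstract describes; the function name `tsne_gradient` and the toy inputs are assumptions made for illustration only.

```python
def tsne_gradient(P, Y):
    """Exact O(N^2) t-SNE gradient, split into attractive and repulsive forces.

    P: symmetric matrix of high-dimensional joint probabilities p_ij.
    Y: list of low-dimensional embedding points.
    """
    n, dim = len(Y), len(Y[0])
    # Unnormalized Student-t similarities w_ij = 1 / (1 + ||y_i - y_j||^2).
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                dist2 = sum((a - b) ** 2 for a, b in zip(Y[i], Y[j]))
                W[i][j] = 1.0 / (1.0 + dist2)
    Z = sum(map(sum, W))  # normalization term, so that q_ij = w_ij / Z

    grad = []
    for i in range(n):
        attract = [0.0] * dim
        repulse = [0.0] * dim
        for j in range(n):
            if i == j:
                continue
            q = W[i][j] / Z
            for d in range(dim):
                diff = Y[i][d] - Y[j][d]
                attract[d] += P[i][j] * W[i][j] * diff  # pulls neighbors together
                repulse[d] += q * W[i][j] * diff        # pushes all points apart
        grad.append([4.0 * (a - r) for a, r in zip(attract, repulse)])
    return grad


# Toy input (hypothetical): three points with a symmetric P that sums to 1.
P = [[0.0, 0.3, 0.0], [0.3, 0.0, 0.2], [0.0, 0.2, 0.0]]
Y = [[0.0, 0.0], [1.0, 0.5], [-0.5, 2.0]]
gradient = tsne_gradient(P, Y)
```

The repulsive sum runs over every pair of points, which is the part that makes the exact computation quadratic and that approximation schemes target.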
Parameter clustering in Bayesian functional PCA of fMRI data
The extraordinary advancements in neuroscientific technology for brain
recordings over the last decades have led to increasingly complex
spatio-temporal datasets. To reduce oversimplifications, new models have been
developed to be able to identify meaningful patterns and new insights within a
highly demanding data environment. To this end, we propose a new model
called parameter clustering functional Principal Component Analysis (PCl-fPCA)
that merges ideas from Functional Data Analysis and Bayesian nonparametrics to
obtain a flexible and computationally feasible signal reconstruction and
exploration of spatio-temporal neuroscientific data. In particular, we use a
Dirichlet process Gaussian mixture model to cluster functional principal
component scores within the standard Bayesian functional PCA framework. This
approach captures the spatial dependence structure among smoothed time series
(curves) and its interaction with the time domain without imposing a prior
spatial structure on the data. Moreover, by moving the mixture from data to
functional principal component scores, we obtain a more general clustering
procedure, thus allowing a higher level of intricate insight and understanding
of the data. We present results from a simulation study showing improvements in
curve and correlation reconstruction compared with different Bayesian and
frequentist fPCA models and we apply our method to functional Magnetic
Resonance Imaging and Electroencephalogram data analyses providing a rich
exploration of the spatio-temporal dependence in brain time series.
Exploration of Heterogeneous Data Using Robust Similarity
Heterogeneous data pose serious challenges to data analysis tasks, including
exploration and visualization. Current techniques often utilize dimensionality
reductions, aggregation, or conversion to numerical values to analyze
heterogeneous data. However, the effectiveness of such techniques to find
subtle structures such as the presence of multiple modes or detection of
outliers is hindered by the challenge to find the proper subspaces or prior
knowledge to reveal the structures. In this paper, we propose a generic
similarity-based exploration technique that is applicable to a wide variety of
datatypes and their combinations, including heterogeneous ensembles. The
proposed concept of similarity has a close connection to statistical analysis
and can be deployed for summarization, revealing fine structures such as the
presence of multiple modes, and detection of anomalies or outliers. We then
propose a visual encoding framework that enables the exploration of a
heterogeneous dataset in different levels of detail and provides insightful
information about both global and local structures. We demonstrate the utility
of the proposed technique using various real datasets, including ensemble data. Comment: Presented at Visualization in Data Science (VDS at IEEE VIS 2017)
Sherlock: Sparse Hierarchical Embeddings for Visually-aware One-class Collaborative Filtering
Building successful recommender systems requires uncovering the underlying
dimensions that describe the properties of items as well as users' preferences
toward them. In domains like clothing recommendation, explaining users'
preferences requires modeling the visual appearance of the items in question.
This makes recommendation especially challenging, due to both the complexity
and subtlety of people's 'visual preferences,' as well as the scale and
dimensionality of the data and features involved. Ultimately, a successful
model should be capable of capturing considerable variance across different
categories and styles, while still modeling the commonalities explained by
'global' structures in order to combat the sparsity (e.g. cold-start),
variability, and scale of real-world datasets. Here, we address these
challenges by building such structures to model the visual dimensions across
different product categories. With a novel hierarchical embedding architecture,
our method accounts for both high-level (colorfulness, darkness, etc.) and
subtle (e.g. casualness) visual characteristics simultaneously. Comment: 7 pages, 3 figures
Visual Feature Fusion and its Application to Support Unsupervised Clustering Tasks
In visual analytics applications, the concept of putting the user in the loop
refers to the ability to replace heuristics by user knowledge on machine
learning and data mining tasks. In supervised tasks, the user engagement occurs
via the manipulation of the training data. However, in unsupervised tasks, the
user involvement is limited to changes in the algorithm parametrization or the
input data representation, also known as features. Depending on the application
domain, different types of features can be extracted from the raw data.
Therefore, the result of unsupervised algorithms heavily depends on the type of
employed feature. Since there is no perfect feature extractor, combining
different features has been explored in a process called feature fusion. The
feature fusion is straightforward when the machine learning or data mining task
has a cost function. However, when such a function does not exist, user support
for combination needs to be provided; otherwise the process is impractical. In
this paper, we present a novel feature fusion approach that uses small data
samples to allow users not only to effortlessly control the combination of
different feature sets but also to interpret the attained results. The
effectiveness of our approach is confirmed by a comprehensive set of
qualitative and quantitative tests, opening up different possibilities of
user-guided analytical scenarios not covered yet. The ability of our approach
to provide real-time feedback for the feature fusion is exploited in the
context of unsupervised clustering techniques, where the composed groups
reflect the semantics of the feature combination. Comment: 15 pages, 21 figures
Bag of Visual Words and Fusion Methods for Action Recognition: Comprehensive Study and Good Practice
Video-based action recognition is one of the important and challenging
problems in computer vision research. The Bag of Visual Words (BoVW) model with
local features has become the most popular method and obtained the
state-of-the-art performance on several realistic datasets, such as the HMDB51,
UCF50, and UCF101. BoVW is a general pipeline to construct a global
representation from a set of local features, which is mainly composed of five
steps: (i) feature extraction, (ii) feature pre-processing, (iii) codebook
generation, (iv) feature encoding, and (v) pooling and normalization. Many
efforts have been made in each step independently in different scenarios and
their effect on action recognition is still unknown. Meanwhile, video data
exhibits different views of visual patterns, such as static appearance and
motion dynamics. Multiple descriptors are usually extracted to represent these
different views. Many feature fusion methods have been developed in other areas
and their influence on action recognition has never been investigated before.
This paper aims to provide a comprehensive study of all steps in BoVW and
different fusion methods, and uncover some good practice to produce a
state-of-the-art action recognition system. Specifically, we explore two kinds
of local features, ten kinds of encoding methods, eight kinds of pooling and
normalization strategies, and three kinds of fusion methods. We conclude that
every step is crucial for contributing to the final recognition rate.
Furthermore, based on our comprehensive study, we propose a simple yet
effective representation, called hybrid representation, by exploring the
complementarity of different BoVW frameworks and local descriptors. Using this
representation, we obtain state-of-the-art results on the three challenging
datasets: HMDB51 (61.1%), UCF50 (92.3%), and UCF101 (87.9%).
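The five-step BoVW pipeline named in this abstract can be sketched in a few lines. This is a minimal illustration in pure Python, not the paper's system: it assumes steps (i) and (ii) have already produced small descriptor vectors, uses plain k-means for the codebook, hard assignment for encoding, and sum pooling with L2 normalization. All function names and the toy descriptors are hypothetical.

```python
import math
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(codebook, descriptor):
    """Index of the codeword closest to the descriptor."""
    return min(range(len(codebook)),
               key=lambda k: squared_distance(codebook[k], descriptor))


def build_codebook(descriptors, k, iterations=20, seed=0):
    """(iii) Codebook generation with plain k-means (Lloyd's algorithm)."""
    rng = random.Random(seed)
    codebook = rng.sample(descriptors, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for d in descriptors:
            clusters[nearest(codebook, d)].append(d)
        for j, members in enumerate(clusters):
            if members:  # keep the old codeword if a cluster is empty
                codebook[j] = [sum(col) / len(members) for col in zip(*members)]
    return codebook


def encode_and_pool(codebook, descriptors):
    """(iv) Hard-assignment encoding, (v) sum pooling and L2 normalization."""
    histogram = [0.0] * len(codebook)
    for d in descriptors:
        histogram[nearest(codebook, d)] += 1.0
    norm = math.sqrt(sum(v * v for v in histogram))
    return [v / norm for v in histogram] if norm > 0 else histogram


# Toy descriptors standing in for the output of steps (i) and (ii).
video_a = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1]]
video_b = [[5.1, 5.0], [4.9, 5.2], [0.05, 0.05]]
codebook = build_codebook(video_a + video_b, k=2)
rep_a = encode_and_pool(codebook, video_a)
rep_b = encode_and_pool(codebook, video_b)
```

Each video ends up as one fixed-length vector regardless of how many local descriptors it contains, which is what lets a standard classifier consume the result. Swapping the encoding or pooling function is the kind of per-step variation the study compares.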