Towards Data-Driven Large Scale Scientific Visualization and Exploration
Technological advances have enabled us to acquire extremely large
datasets but it remains a challenge to store, process, and extract
information from them. This dissertation builds upon recent advances
in machine learning, visualization, and user interactions to
facilitate exploration of large-scale scientific datasets. First, we
use data-driven approaches to computationally identify regions of
interest in the datasets. Second, we use visual presentation for
effective user comprehension. Third, we provide interactions for
human users to integrate domain knowledge and semantic information
into this exploration process.
Our research shows how to extract, visualize, and explore informative
regions on very large 2D landscape images, 3D volumetric datasets,
high-dimensional volumetric mouse brain datasets with thousands of
spatially-mapped gene expression profiles, and geospatial trajectories
that evolve over time. The contributions of this dissertation include:
(1) We introduce a sliding-window saliency model that discovers
regions of user interest in very large images; (2) We develop visual
segmentation of intensity-gradient histograms to identify meaningful
components from volumetric datasets; (3) We extract boundary surfaces
from a wealth of volumetric gene expression mouse brain profiles to
personalize the reference brain atlas; (4) We show how to efficiently
cluster geospatial trajectories by mapping each sequence of locations
to a high-dimensional point with the kernel distance framework.
We aim to discover patterns, relationships, and anomalies that would
lead to new scientific, engineering, and medical advances. This work
represents one of the first steps toward better visual understanding
of large-scale scientific data by combining machine learning and human
intelligence.
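The kernel distance mentioned in contribution (4) can be illustrated with a minimal sketch. It assumes a Gaussian kernel, 2D locations, and uniform weighting of the points in each trajectory; the function names and the `sigma` bandwidth are illustrative choices, not details taken from the dissertation. Under these assumptions the distance equals the Euclidean distance between the two trajectories once each is mapped to a single point in the kernel's feature space.

```python
import math

def gaussian_kernel(p, q, sigma):
    """Gaussian similarity between two 2D locations (assumed kernel choice)."""
    d2 = (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
    return math.exp(-d2 / (2.0 * sigma ** 2))

def cross_similarity(P, Q, sigma):
    """Mean pairwise kernel value between two trajectories (uniform weights)."""
    total = sum(gaussian_kernel(p, q, sigma) for p in P for q in Q)
    return total / (len(P) * len(Q))

def kernel_distance(P, Q, sigma=1.0):
    """Kernel distance: sqrt(k(P,P) + k(Q,Q) - 2 k(P,Q))."""
    d2 = (cross_similarity(P, P, sigma)
          + cross_similarity(Q, Q, sigma)
          - 2.0 * cross_similarity(P, Q, sigma))
    return math.sqrt(max(d2, 0.0))  # clamp tiny negative rounding errors

# Hypothetical toy trajectories: one close to the reference, one far away.
reference = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
nearby = [(0.0, 0.1), (1.0, 0.1), (2.0, 0.1)]
distant = [(0.0, 5.0), (1.0, 5.0), (2.0, 5.0)]
```

With such a distance in hand, any standard clustering routine that accepts pairwise distances could be applied; the exact quadratic-time sum above would likely need an approximation for large trajectory collections.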
Lagrangian Based Methods for Coherent Structure Detection
There has been a proliferation in the development of Lagrangian analytical methods for detecting coherent structures in fluid flow transport, yielding a variety of qualitatively different approaches. We present a review of four approaches and demonstrate the utility of these methods via their application to the same sample analytic model, the canonical double-gyre flow, highlighting the pros and cons of each approach. Two of the methods, the geometric and probabilistic approaches, are well established and require velocity field data over the time interval of interest to identify particularly important material lines and surfaces, and influential regions, respectively. The other two approaches, implementing tools from cluster and braid theory, seek coherent structures based on limited trajectory data, attempting to partition the flow transport into distinct regions. All four of these approaches share the common trait that they are objective methods, meaning that their results do not depend on the frame of reference used. For each method, we also present a number of example applications ranging from blood flow and chemical reactions to ocean and atmospheric flows. (C) 2015 AIP Publishing LLC. ONR N000141210665. Center for Nonlinear Dynamic
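For readers unfamiliar with the double-gyre flow used as the common test case, the following is a minimal sketch of its standard time-periodic velocity field together with a fourth-order Runge-Kutta tracer integrator. The parameter values (`A`, `EPS`, `OMEGA`) are commonly used defaults and are assumptions here; the abstract does not state which values the review uses, and the integrator is an illustrative choice.

```python
import math

# Commonly used double-gyre parameters (assumed; not given in the abstract).
A = 0.1                      # velocity amplitude
EPS = 0.25                   # strength of the periodic perturbation
OMEGA = 2.0 * math.pi / 10   # perturbation frequency (period 10)

def velocity(x, y, t):
    """Double-gyre velocity on the domain [0, 2] x [0, 1]."""
    s = EPS * math.sin(OMEGA * t)
    f = s * x * x + (1.0 - 2.0 * s) * x
    dfdx = 2.0 * s * x + (1.0 - 2.0 * s)
    u = -math.pi * A * math.sin(math.pi * f) * math.cos(math.pi * y)
    v = math.pi * A * math.cos(math.pi * f) * math.sin(math.pi * y) * dfdx
    return u, v

def advect(x, y, t0, t1, dt=0.01):
    """Advect one tracer particle from t0 to t1 with classical RK4 steps."""
    steps = int(round((t1 - t0) / dt))
    t = t0
    for _ in range(steps):
        k1 = velocity(x, y, t)
        k2 = velocity(x + 0.5 * dt * k1[0], y + 0.5 * dt * k1[1], t + 0.5 * dt)
        k3 = velocity(x + 0.5 * dt * k2[0], y + 0.5 * dt * k2[1], t + 0.5 * dt)
        k4 = velocity(x + dt * k3[0], y + dt * k3[1], t + dt)
        x += dt * (k1[0] + 2.0 * k2[0] + 2.0 * k3[0] + k4[0]) / 6.0
        y += dt * (k1[1] + 2.0 * k2[1] + 2.0 * k3[1] + k4[1]) / 6.0
        t += dt
    return x, y

# Example: follow a particle released at (0.5, 0.5) for one forcing period.
end_x, end_y = advect(0.5, 0.5, 0.0, 10.0)
```

Trajectories produced this way are the raw input for the trajectory-based methods, while the geometric and probabilistic methods would sample `velocity` over the whole time interval.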
voxel2vec: A Natural Language Processing Approach to Learning Distributed Representations for Scientific Data
Relationships in scientific data, such as the numerical and spatial
distribution relations of features in univariate data, the scalar-value
combinations' relations in multivariate data, and the association of volumes in
time-varying and ensemble data, are intricate and complex. This paper presents
voxel2vec, a novel unsupervised representation learning model, which is used to
learn distributed representations of scalar values/scalar-value combinations in
a low-dimensional vector space. Its basic assumption is that if two scalar
values/scalar-value combinations have similar contexts, they usually have high
similarity in terms of features. By representing scalar values/scalar-value
combinations as symbols, voxel2vec learns the similarity between them in the
context of spatial distribution and then allows us to explore the overall
association between volumes by transfer prediction. We demonstrate the
usefulness and effectiveness of voxel2vec by comparing it with the isosurface
similarity map of univariate data and applying the learned distributed
representations to feature classification for multivariate data and to
association analysis for time-varying and ensemble data.
Comment: Accepted by IEEE Transactions on Visualization and Computer Graphics
(TVCG)
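The idea of treating scalar values as symbols with spatial contexts can be sketched as follows. This is only an illustration under stated assumptions: uniform binning into `bins` symbols and a 6-connected voxel neighborhood are choices made here, and the helper names are hypothetical. The abstract does not specify how voxel2vec defines symbols or contexts, and the embedding training itself is not shown.

```python
from collections import Counter

def quantize(value, vmin, vmax, bins):
    """Map a scalar value to a symbol index in [0, bins - 1] (uniform bins)."""
    if vmax == vmin:
        return 0
    idx = int((value - vmin) / (vmax - vmin) * bins)
    return min(idx, bins - 1)

def context_pairs(volume, bins=4):
    """Count (center symbol, neighbor symbol) pairs over a volume[z][y][x]."""
    flat = [v for plane in volume for row in plane for v in row]
    vmin, vmax = min(flat), max(flat)
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    # 6-connected neighborhood (an assumed definition of spatial context).
    offsets = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    counts = Counter()
    for z in range(nz):
        for y in range(ny):
            for x in range(nx):
                center = quantize(volume[z][y][x], vmin, vmax, bins)
                for dz, dy, dx in offsets:
                    zz, yy, xx = z + dz, y + dy, x + dx
                    if 0 <= zz < nz and 0 <= yy < ny and 0 <= xx < nx:
                        neighbor = quantize(volume[zz][yy][xx], vmin, vmax, bins)
                        counts[(center, neighbor)] += 1
    return counts

# Hypothetical 1 x 1 x 3 volume with two symbols.
toy_counts = context_pairs([[[0.0, 0.0, 1.0]]], bins=2)
```

Pairs like these play the role that (word, context word) pairs play in word-embedding models; symbols that co-occur with similar neighbors would end up with similar vectors.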
Lifted Wasserstein Matcher for Fast and Robust Topology Tracking
This paper presents a robust and efficient method for tracking topological
features in time-varying scalar data. Structures are tracked based on the
optimal matching between persistence diagrams with respect to the Wasserstein
metric. This fundamentally relies on solving the assignment problem, a special
case of optimal transport, for all consecutive timesteps. Our approach relies
on two main contributions. First, we revisit the seminal assignment algorithm
by Kuhn and Munkres which we specifically adapt to the problem of matching
persistence diagrams in an efficient way. Second, we propose an extension of
the Wasserstein metric that significantly improves the geometrical stability of
the matching of domain-embedded persistence pairs. We show that this
geometrical lifting has the additional positive side-effect of improving the
assignment matrix sparsity and therefore computing time. The global framework
implements a coarse-grained parallelism by computing persistence diagrams and
finding optimal matchings in parallel for every pair of consecutive
timesteps. Critical trajectories are constructed by associating successively
matched persistence pairs over time. Merging and splitting events are detected
with a geometrical threshold in a post-processing stage. Extensive experiments
on real-life datasets show that our matching approach is an order of magnitude
faster than the seminal Munkres algorithm. Moreover, compared to a modern
approximation method, our method provides competitive runtimes while yielding
exact results. We demonstrate the utility of our global framework by extracting
critical point trajectories from various simulated time-varying datasets and
compare it to the existing methods based on associated overlaps of volumes.
Robustness to noise and temporal resolution downsampling is empirically
demonstrated.
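The reduction of persistence-diagram matching to an assignment problem can be sketched as follows. This is not the paper's algorithm: the sketch searches all permutations by brute force in place of the adapted Kuhn-Munkres solver (so it is only viable for tiny diagrams), uses the plain 2-Wasserstein cost, and omits the geometrical lifting. All function names are hypothetical.

```python
import math
from itertools import permutations

def diag_cost(p):
    """Squared distance from (birth, death) to its projection on the diagonal."""
    return (p[1] - p[0]) ** 2 / 2.0

def pair_cost(p, q):
    """Squared Euclidean distance between two persistence pairs."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def match_diagrams(d1, d2):
    """Return (2-Wasserstein distance, list of matched index pairs)."""
    n, m = len(d1), len(d2)
    size = n + m  # each diagram is padded with diagonal slots for the other
    cost = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            if i < n and j < m:
                cost[i][j] = pair_cost(d1[i], d2[j])
            elif i < n:
                cost[i][j] = diag_cost(d1[i])   # d1 point sent to the diagonal
            elif j < m:
                cost[i][j] = diag_cost(d2[j])   # d2 point sent to the diagonal
    best_cost, best_perm = math.inf, None
    for perm in permutations(range(size)):      # brute force, tiny inputs only
        total = sum(cost[i][perm[i]] for i in range(size))
        if total < best_cost:
            best_cost, best_perm = total, perm
    matches = [(i, best_perm[i]) for i in range(n) if best_perm[i] < m]
    return math.sqrt(best_cost), matches

# Hypothetical diagrams from two consecutive timesteps.
dist, matches = match_diagrams([(0.0, 1.0), (0.2, 0.3)], [(0.0, 1.1)])
```

In this toy case the high-persistence pair is matched across timesteps and the low-persistence pair is sent to the diagonal, which is the behavior that lets tracked features appear and disappear over time.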
An Ensemble Machine Learning Approach for Tropical Cyclone Detection Using ERA5 Reanalysis Data
Tropical Cyclones (TCs) are counted among the most destructive phenomena that
can be found in nature. Every year, globally an average of 90 TCs occur over
tropical waters, and global warming is making them stronger, larger and more
destructive. The accurate detection and tracking of such phenomena have become
a relevant and interesting area of research in weather and climate science.
Traditionally, TCs have been identified in large climate datasets through the
use of deterministic tracking schemes that rely on subjective thresholds.
Machine Learning (ML) models can complement deterministic approaches due to
their ability to capture the mapping between the input climatic drivers and the
geographical position of the TC center from the available data. This study
presents an ML ensemble approach for locating TC center coordinates, embedding
both TC classification and localization in a single end-to-end learning task.
The ensemble combines TC center estimates of different ML models that agree
about the presence of a TC in input data. ERA5 reanalysis data were used for model
training and testing jointly with the International Best Track Archive for
Climate Stewardship records. Results showed that the ML approach is well suited
for TC detection, providing good generalization capabilities on out-of-sample
data. In particular, it was able to accurately detect lower TC categories than
those used for training the models. On top of this, the ensemble approach was
able to further improve TC localization performance with respect to single
model TC center estimates, demonstrating the good capabilities of the proposed
approach.
Comment: 27 pages, 8 figures, 1 table, submitted to Journal of Advances in
Modeling Earth System
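The ensemble step, combining center estimates from models that agree a TC is present, might look like the sketch below. The agreement rule (`min_agree`), the use of a simple arithmetic mean, and the tuple layout of each prediction are assumptions made for illustration; the abstract does not say how the estimates are combined.

```python
def ensemble_center(predictions, min_agree=2):
    """Combine per-model outputs of the form (has_tc, lat, lon).

    Returns the mean center of the models that report a TC, or None when
    fewer than `min_agree` models report one. The plain mean of longitudes
    assumes the estimates do not straddle the antimeridian.
    """
    positives = [(lat, lon) for has_tc, lat, lon in predictions if has_tc]
    if len(positives) < min_agree:
        return None
    mean_lat = sum(lat for lat, _ in positives) / len(positives)
    mean_lon = sum(lon for _, lon in positives) / len(positives)
    return mean_lat, mean_lon

# Hypothetical outputs from three models for one input patch.
model_outputs = [(True, 10.0, 120.0), (True, 12.0, 122.0), (False, 0.0, 0.0)]
center = ensemble_center(model_outputs)
```

Requiring agreement before averaging keeps a single false positive from producing a detection, at the cost of possibly missing weak systems that only one model picks up.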