1,434 research outputs found
Combining in-situ and in-transit processing to enable extreme-scale scientific analysis
pre-printWith the onset of extreme-scale computing, I/O constraints make it increasingly difficult for scientists to save a sufficient amount of raw simulation data to persistent storage. One potential solution is to change the data analysis pipeline from a post-process centric to a concurrent approach based on either in-situ or in-transit processing. In this context computations are considered in-situ if they utilize the primary compute resources, while in-transit processing refers to offloading computations to a set of secondary resources using asynchronous data transfers. In this paper we explore the design and implementation of three common analysis techniques typically performed on large-scale scientific simulations: topological analysis, descriptive statistics, and visualization. We summarize algorithmic developments, describe a resource scheduling system to coordinate the execution of various analysis workflows, and discuss our implementation using the DataSpaces and ADIOS frameworks that support efficient data movement between in-situ and in-transit computations. We demonstrate the efficiency of our lightweight, flexible framework by deploying it on the Jaguar XK6 to analyze data generated by S3D, a massively parallel turbulent combustion code. Our framework allows scientists dealing with the data deluge at extreme scale to perform analyses at increased temporal resolutions, mitigate I/O costs, and significantly improve the time to insight
Task-based Augmented Contour Trees with Fibonacci Heaps
This paper presents a new algorithm for the fast, shared memory, multi-core
computation of augmented contour trees on triangulations. In contrast to most
existing parallel algorithms our technique computes augmented trees, enabling
the full extent of contour tree based applications including data segmentation.
Our approach completely revisits the traditional, sequential contour tree
algorithm to re-formulate all the steps of the computation as a set of
independent local tasks. This includes a new computation procedure based on
Fibonacci heaps for the join and split trees, two intermediate data structures
used to compute the contour tree, whose constructions are efficiently carried
out concurrently thanks to the dynamic scheduling of task parallelism. We also
introduce a new parallel algorithm for the combination of these two trees into
the output global contour tree. Overall, this results in superior time
performance in practice, both in sequential and in parallel thanks to the
OpenMP task runtime. We report performance numbers that compare our approach to
reference sequential and multi-threaded implementations for the computation of
augmented merge and contour trees. These experiments demonstrate the run-time
efficiency of our approach and its scalability on common workstations. We
demonstrate the utility of our approach in data segmentation applications
Exploring power behaviors and trade-offs of in-situ data analytics
pre-printAs scientific applications target exascale, challenges related to data and energy are becoming dominating concerns. For example, coupled simulation workflows are increasingly adopting in-situ data processing and analysis techniques to address costs and overheads due to data movement and I/O. However it is also critical to understand these overheads and associated trade-offs from an energy perspective. The goal of this paper is exploring data-related energy/performance trade-offs for end-to-end simulation workflows running at scale on current high-end computing systems. Specifically, this paper presents: (1) an analysis of the data-related behaviors of a combustion simulation workflow with an in-situ data analytics pipeline, running on the Titan system at ORNL; (2) a power model based on system power and data exchange patterns, which is empirically validated; and (3) the use of the model to characterize the energy behavior of the workflow and to explore energy/performance tradeoffs on current as well as emerging systems
Doctor of Philosophy
dissertationA broad range of applications capture dynamic data at an unprecedented scale. Independent of the application area, finding intuitive ways to understand the dynamic aspects of these increasingly large data sets remains an interesting and, to some extent, unsolved research problem. Generically, dynamic data sets can be described by some, often hierarchical, notion of feature of interest that exists at each moment in time, and those features evolve across time. Consequently, exploring the evolution of these features is considered to be one natural way of studying these data sets. Usually, this process entails the ability to: 1) define and extract features from each time step in the data set; 2) find their correspondences over time; and 3) analyze their evolution across time. However, due to the large data sizes, visualizing the evolution of features in a comprehensible manner and performing interactive changes are challenging. Furthermore, feature evolution details are often unmanageably large and complex, making it difficult to identify the temporal trends in the underlying data. Additionally, many existing approaches develop these components in a specialized and standalone manner, thus failing to address the general task of understanding feature evolution across time. This dissertation demonstrates that interactive exploration of feature evolution can be achieved in a non-domain-specific manner so that it can be applied across a wide variety of application domains. In particular, a novel generic visualization and analysis environment that couples a multiresolution unified spatiotemporal representation of features with progressive layout and visualization strategies for studying the feature evolution across time is introduced. This flexible framework enables on-the-fly changes to feature definitions, their correspondences, and other arbitrary attributes while providing an interactive view of the resulting feature evolution details. Furthermore, to reduce the visual complexity within the feature evolution details, several subselection-based and localized, per-feature parameter value-based strategies are also enabled. The utility and generality of this framework is demonstrated by using several large-scale dynamic data sets
Lifted Wasserstein Matcher for Fast and Robust Topology Tracking
This paper presents a robust and efficient method for tracking topological
features in time-varying scalar data. Structures are tracked based on the
optimal matching between persistence diagrams with respect to the Wasserstein
metric. This fundamentally relies on solving the assignment problem, a special
case of optimal transport, for all consecutive timesteps. Our approach relies
on two main contributions. First, we revisit the seminal assignment algorithm
by Kuhn and Munkres which we specifically adapt to the problem of matching
persistence diagrams in an efficient way. Second, we propose an extension of
the Wasserstein metric that significantly improves the geometrical stability of
the matching of domain-embedded persistence pairs. We show that this
geometrical lifting has the additional positive side-effect of improving the
assignment matrix sparsity and therefore computing time. The global framework
implements a coarse-grained parallelism by computing persistence diagrams and
finding optimal matchings in parallel for every couple of consecutive
timesteps. Critical trajectories are constructed by associating successively
matched persistence pairs over time. Merging and splitting events are detected
with a geometrical threshold in a post-processing stage. Extensive experiments
on real-life datasets show that our matching approach is an order of magnitude
faster than the seminal Munkres algorithm. Moreover, compared to a modern
approximation method, our method provides competitive runtimes while yielding
exact results. We demonstrate the utility of our global framework by extracting
critical point trajectories from various simulated time-varying datasets and
compare it to the existing methods based on associated overlaps of volumes.
Robustness to noise and temporal resolution downsampling is empirically
demonstrated
Viscous Fingering: A Topological Visual Analytic Approach
International audienceWe present a methodology to analyze and visualize an ensemble of finite pointset method (FPM) simulations that model the viscous fingering process of salt solutions inside water. In course of the simulations the solutions form structures with increased salt concentration called viscous fingers. These structures are of primary interest to domain scientists as it is not clear when and where viscous fingers appear and how they evolve. To explore the aleatoric uncertainty embedded in the simulations we analyze an ensemble of simulation runs which differ due to stochastic effects. To detect and track the viscous fingers we derive a voxel volume for each simulation where fingers are identified as subvolumes that satisfy geometrical and topological constraints. Properties and the evolution of fingers are illustrated through tracking graphs that visualize when fingers form, dissolve, merge, and split. We provide multiple linked views to compare, browse, and analyze the ensemble in real-time. Fig. 1: Detail view of our visualization tool consisting of the 3D rendering of viscous fingers (left), and the tracking graph showing their evolution (right)
Doctor of Philosophy
dissertationWith the ever-increasing amount of available computing resources and sensing devices, a wide variety of high-dimensional datasets are being produced in numerous fields. The complexity and increasing popularity of these data have led to new challenges and opportunities in visualization. Since most display devices are limited to communication through two-dimensional (2D) images, many visualization methods rely on 2D projections to express high-dimensional information. Such a reduction of dimension leads to an explosion in the number of 2D representations required to visualize high-dimensional spaces, each giving a glimpse of the high-dimensional information. As a result, one of the most important challenges in visualizing high-dimensional datasets is the automatic filtration and summarization of the large exploration space consisting of all 2D projections. In this dissertation, a new type of algorithm is introduced to reduce the exploration space that identifies a small set of projections that capture the intrinsic structure of high-dimensional data. In addition, a general framework for summarizing the structure of quality measures in the space of all linear 2D projections is presented. However, identifying the representative or informative projections is only part of the challenge. Due to the high-dimensional nature of these datasets, obtaining insights and arriving at conclusions based solely on 2D representations are limited and prone to error. How to interpret the inaccuracies and resolve the ambiguity in the 2D projections is the other half of the puzzle. This dissertation introduces projection distortion error measures and interactive manipulation schemes that allow the understanding of high-dimensional structures via data manipulation in 2D projections
- …