44 research outputs found

    Combining in-situ and in-transit processing to enable extreme-scale scientific analysis

    Get PDF
    pre-printWith the onset of extreme-scale computing, I/O constraints make it increasingly difficult for scientists to save a sufficient amount of raw simulation data to persistent storage. One potential solution is to change the data analysis pipeline from a post-process centric to a concurrent approach based on either in-situ or in-transit processing. In this context computations are considered in-situ if they utilize the primary compute resources, while in-transit processing refers to offloading computations to a set of secondary resources using asynchronous data transfers. In this paper we explore the design and implementation of three common analysis techniques typically performed on large-scale scientific simulations: topological analysis, descriptive statistics, and visualization. We summarize algorithmic developments, describe a resource scheduling system to coordinate the execution of various analysis workflows, and discuss our implementation using the DataSpaces and ADIOS frameworks that support efficient data movement between in-situ and in-transit computations. We demonstrate the efficiency of our lightweight, flexible framework by deploying it on the Jaguar XK6 to analyze data generated by S3D, a massively parallel turbulent combustion code. Our framework allows scientists dealing with the data deluge at extreme scale to perform analyses at increased temporal resolutions, mitigate I/O costs, and significantly improve the time to insight

    ISCR Annual Report: Fical Year 2004

    Full text link

    Exploring power behaviors and trade-offs of in-situ data analytics

    Get PDF
    pre-printAs scientific applications target exascale, challenges related to data and energy are becoming dominating concerns. For example, coupled simulation workflows are increasingly adopting in-situ data processing and analysis techniques to address costs and overheads due to data movement and I/O. However it is also critical to understand these overheads and associated trade-offs from an energy perspective. The goal of this paper is exploring data-related energy/performance trade-offs for end-to-end simulation workflows running at scale on current high-end computing systems. Specifically, this paper presents: (1) an analysis of the data-related behaviors of a combustion simulation workflow with an in-situ data analytics pipeline, running on the Titan system at ORNL; (2) a power model based on system power and data exchange patterns, which is empirically validated; and (3) the use of the model to characterize the energy behavior of the workflow and to explore energy/performance tradeoffs on current as well as emerging systems

    Adaptive remote visualization system with optimized network performance for large scale scientific data

    Get PDF
    This dissertation discusses algorithmic and implementation aspects of an automatically configurable remote visualization system, which optimally decomposes and adaptively maps the visualization pipeline to a wide-area network. The first node typically serves as a data server that generates or stores raw data sets and a remote client resides on the last node equipped with a display device ranging from a personal desktop to a powerwall. Intermediate nodes can be located anywhere on the network and often include workstations, clusters, or custom rendering engines. We employ a regression model-based network daemon to estimate the effective bandwidth and minimal delay of a transport path using active traffic measurement. Data processing time is predicted for various visualization algorithms using block partition and statistical technique. Based on the link measurements, node characteristics, and module properties, we strategically organize visualization pipeline modules such as filtering, geometry generation, rendering, and display into groups, and dynamically assign them to appropriate network nodes to achieve minimal total delay for post-processing or maximal frame rate for streaming applications. We propose polynomial-time algorithms using the dynamic programming method to compute the optimal solutions for the problems of pipeline decomposition and network mapping under different constraints. A parallel based remote visualization system, which comprises a logical group of autonomous nodes that cooperate to enable sharing, selection, and aggregation of various types of resources distributed over a network, is implemented and deployed at geographically distributed nodes for experimental testing. Our system is capable of handling a complete spectrum of remote visualization tasks expertly including post processing, computational steering and wireless sensor network monitoring. Visualization functionalities such as isosurface, ray casting, streamline, linear integral convolution (LIC) are supported in our system. The proposed decomposition and mapping scheme is generic and can be applied to other network-oriented computation applications whose computing components form a linear arrangement

    Doctor of Philosophy

    Get PDF
    dissertationWith modern computational resources rapidly advancing towards exascale, large-scale simulations useful for understanding natural and man-made phenomena are becoming in- creasingly accessible. As a result, the size and complexity of data representing such phenom- ena are also increasing, making the role of data analysis to propel science even more integral. This dissertation presents research on addressing some of the contemporary challenges in the analysis of vector fields--an important type of scientific data useful for representing a multitude of physical phenomena, such as wind flow and ocean currents. In particular, new theories and computational frameworks to enable consistent feature extraction from vector fields are presented. One of the most fundamental challenges in the analysis of vector fields is that their features are defined with respect to reference frames. Unfortunately, there is no single ""correct"" reference frame for analysis, and an unsuitable frame may cause features of interest to remain undetected, thus creating serious physical consequences. This work develops new reference frames that enable extraction of localized features that other techniques and frames fail to detect. As a result, these reference frames objectify the notion of ""correctness"" of features for certain goals by revealing the phenomena of importance from the underlying data. An important consequence of using these local frames is that the analysis of unsteady (time-varying) vector fields can be reduced to the analysis of sequences of steady (time- independent) vector fields, which can be performed using simpler and scalable techniques that allow better data management by accessing the data on a per-time-step basis. Nevertheless, the state-of-the-art analysis of steady vector fields is not robust, as most techniques are numerical in nature. The residing numerical errors can violate consistency with the underlying theory by breaching important fundamental laws, which may lead to serious physical consequences. This dissertation considers consistency as the most fundamental characteristic of computational analysis that must always be preserved, and presents a new discrete theory that uses combinatorial representations and algorithms to provide consistency guarantees during vector field analysis along with the uncertainty visualization of unavoidable discretization errors. Together, the two main contributions of this dissertation address two important concerns regarding feature extraction from scientific data: correctness and precision. The work presented here also opens new avenues for further research by exploring more-general reference frames and more-sophisticated domain discretizations

    Data Analysis with the Morse-Smale Complex: The msr Package for R

    Get PDF
    In many areas, scientists deal with increasingly high-dimensional data sets. An important aspect for these scientists is to gain a qualitative understanding of the process or system from which the data is gathered. Often, both input variables and an outcome are observed and the data can be characterized as a sample from a high-dimensional scalar function. This work presents the R package msr for exploratory data analysis of multivariate scalar functions based on the Morse-Smale complex. The Morse-Smale complex provides a topologically meaningful decomposition of the domain. The msr package implements a discrete approximation of the Morse-Smale complex for data sets. In previous work this approximation has been exploited for visualization and partition-based regression, which are both supported in the msr package. The visualization combines the Morse-Smale complex with dimension-reduction techniques for a visual summary representation that serves as a guide for interactive exploration of the high-dimensional function. In a similar fashion, the regression employs a combination of linear models based on the Morse-Smale decomposition of the domain. This regression approach yields topologically accurate estimates and facilitates interpretation of general trends and statistical comparisons between partitions. In this manner, the msr package supports high-dimensional data understanding and exploration through the Morse-Smale complex

    Institute for Scientific Computing Research Annual Report: Fiscal Year 2004

    Full text link

    Doctor of Philosophy

    Get PDF
    dissertationA broad range of applications capture dynamic data at an unprecedented scale. Independent of the application area, finding intuitive ways to understand the dynamic aspects of these increasingly large data sets remains an interesting and, to some extent, unsolved research problem. Generically, dynamic data sets can be described by some, often hierarchical, notion of feature of interest that exists at each moment in time, and those features evolve across time. Consequently, exploring the evolution of these features is considered to be one natural way of studying these data sets. Usually, this process entails the ability to: 1) define and extract features from each time step in the data set; 2) find their correspondences over time; and 3) analyze their evolution across time. However, due to the large data sizes, visualizing the evolution of features in a comprehensible manner and performing interactive changes are challenging. Furthermore, feature evolution details are often unmanageably large and complex, making it difficult to identify the temporal trends in the underlying data. Additionally, many existing approaches develop these components in a specialized and standalone manner, thus failing to address the general task of understanding feature evolution across time. This dissertation demonstrates that interactive exploration of feature evolution can be achieved in a non-domain-specific manner so that it can be applied across a wide variety of application domains. In particular, a novel generic visualization and analysis environment that couples a multiresolution unified spatiotemporal representation of features with progressive layout and visualization strategies for studying the feature evolution across time is introduced. This flexible framework enables on-the-fly changes to feature definitions, their correspondences, and other arbitrary attributes while providing an interactive view of the resulting feature evolution details. Furthermore, to reduce the visual complexity within the feature evolution details, several subselection-based and localized, per-feature parameter value-based strategies are also enabled. The utility and generality of this framework is demonstrated by using several large-scale dynamic data sets

    Avoiding the Global Sort: A Faster Contour Tree Algorithm

    Get PDF
    We revisit the classical problem of computing the \emph{contour tree} of a scalar field f:MRf:\mathbb{M} \to \mathbb{R}, where M\mathbb{M} is a triangulated simplicial mesh in Rd\mathbb{R}^d. The contour tree is a fundamental topological structure that tracks the evolution of level sets of ff and has numerous applications in data analysis and visualization. All existing algorithms begin with a global sort of at least all critical values of ff, which can require (roughly) Ω(nlogn)\Omega(n\log n) time. Existing lower bounds show that there are pathological instances where this sort is required. We present the first algorithm whose time complexity depends on the contour tree structure, and avoids the global sort for non-pathological inputs. If CC denotes the set of critical points in M\mathbb{M}, the running time is roughly O(vClogv)O(\sum_{v \in C} \log \ell_v), where v\ell_v is the depth of vv in the contour tree. This matches all existing upper bounds, but is a significant improvement when the contour tree is short and fat. Specifically, our approach ensures that any comparison made is between nodes in the same descending path in the contour tree, allowing us to argue strong optimality properties of our algorithm. Our algorithm requires several novel ideas: partitioning M\mathbb{M} in well-behaved portions, a local growing procedure to iteratively build contour trees, and the use of heavy path decompositions for the time complexity analysis
    corecore