33 research outputs found

    Out-of-core visualization using iterator-aware multidimensional prefetching

    Granite: A scientific database model and implementation

    The principal goal of this research was to develop a formal, comprehensive model for representing highly complex scientific data. An effective model should provide a conceptually uniform way to represent data and serve as the framework for an efficient, easy-to-use software environment that implements it. The dissertation work presented here describes such a model and its contributions to the field of scientific databases. In particular, the Granite model encompasses a wide variety of datatypes used across many disciplines of science and engineering today. It is unique in that it defines dataset geometry and topology as separate conceptual components of a scientific dataset. We provide a novel classification of geometries and topologies that has important practical implications for a scientific database implementation. The Granite model also offers integrated support for multiresolution and adaptive-resolution data. Many of these ideas have been addressed by others, but no one has tried to bring them all together in a single comprehensive model. The datasource portion of the Granite model offers several further contributions. In addition to providing a convenient conceptual view of rectilinear data, it also supports multisource data: data can be taken from various sources and combined into a unified view. The rod storage model is an abstraction for file storage that has proven an effective platform upon which to develop efficient access to storage. Our spatial prefetching technique is built upon the rod storage model; it demonstrates very significant improvements in access to scientific datasets and also makes it possible to work with data that is far too large to fit in main memory. These improvements bring the extremely large datasets now being generated in many scientific fields into the realm of tractability for the ordinary researcher. We validated the feasibility and viability of the model by implementing a significant portion of it in the Granite system. Extensive performance evaluations of the implementation indicate that the features of the model can be provided in a user-friendly manner with an efficiency that is competitive with more ad hoc systems and more specialized, application-specific solutions.
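
    The rod storage idea lends itself to a compact illustration. The sketch below is not the Granite API; it is a minimal, hypothetical Python stand-in in which a "rod" is one contiguous run along the fastest-varying axis of a memory-mapped 3-D volume, and spatial prefetching simply pulls in the neighboring rods around an access so that later neighborhood queries hit memory rather than disk.

```python
import numpy as np

class RodStore:
    """Toy stand-in for a rod-based file store (hypothetical, not Granite):
    a 3-D rectilinear dataset kept on disk as a raw binary file and accessed
    one 'rod' (a contiguous run along the fastest-varying axis) at a time."""

    def __init__(self, path, shape, dtype=np.float32):
        # Assumes `path` already holds shape[0] * shape[1] * shape[2] values.
        self.data = np.memmap(path, dtype=dtype, mode="r", shape=shape)
        self.cache = {}                      # (z, y) -> rod already in memory

    def rod(self, z, y):
        """Return the rod at (z, y), loading and caching it on first touch."""
        if (z, y) not in self.cache:
            self.cache[(z, y)] = np.array(self.data[z, y, :])   # one contiguous read
        return self.cache[(z, y)]

    def prefetch_neighbors(self, z, y):
        """Spatial prefetching: load the rods surrounding (z, y) so that a
        subsequent neighborhood query is served from memory, not disk."""
        for dz in (-1, 0, 1):
            for dy in (-1, 0, 1):
                zz, yy = z + dz, y + dy
                if 0 <= zz < self.data.shape[0] and 0 <= yy < self.data.shape[1]:
                    self.rod(zz, yy)
```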

    Dynamic Chunking for Out-of-Core Volume Visualization Applications

    Out-of-Core Wavefront Computations with Reduced Synchronization

    Matrix computation algorithms often exhibit dependencies between neighboring elements inside loop nests, such that the frontier between computed elements and those still to be computed wanders through the matrix in the form of a 'wave'. Macro-pipelining techniques can achieve an efficient parallelization of such algorithms by overlapping communication and computation. Usually these techniques are limited to situations where all the data to be processed fits into main memory, whereas for larger data the I/O usage pattern for external storage requires special attention. The work [CDS05] presented a first extension of the wavefront framework to these so-called out-of-core problems. The present paper proposes a redesign of their algorithm that minimizes both overhead and perturbations coming from communication. To tackle the issue of non-contiguous I/O, we also propose an optimized data layout. These two major modifications of the original algorithm allow us to present a third improvement: our implementation shortens the transition phase between two consecutive iterations of the wavefront algorithm. Experiments performed with the parXXL library show that we can significantly reduce the time lost in inefficient I/O operations and thus obtain faster computations.
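
    The dependency pattern the paper builds on can be shown in a few lines. The function below is only an in-memory illustration of a wavefront sweep (it is not the parXXL out-of-core implementation): every element depends on its upper and left neighbors, so all elements on one anti-diagonal are mutually independent and can be computed in parallel or pipelined with I/O for out-of-core tiles.

```python
import numpy as np

def wavefront_sweep(a):
    """Sweep a matrix along anti-diagonals; elements on the same diagonal
    are independent and could be updated concurrently."""
    n, m = a.shape
    for d in range(2, n + m - 1):            # anti-diagonal index d = i + j
        i_lo = max(1, d - (m - 1))
        i_hi = min(n - 1, d - 1)
        for i in range(i_lo, i_hi + 1):      # independent work within one diagonal
            j = d - i
            a[i, j] += min(a[i - 1, j], a[i, j - 1])
    return a

print(wavefront_sweep(np.ones((4, 5))))
```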

    ArrayBridge: Interweaving declarative array processing with high-performance computing

    Scientists are increasingly turning to datacenter-scale computers to produce and analyze massive arrays. Despite decades of database research that extols the virtues of declarative query processing, scientists still write, debug, and parallelize imperative HPC kernels even for the most mundane queries. This impedance mismatch has been partly attributed to the cumbersome data loading process; in response, the database community has proposed in situ mechanisms to access data in scientific file formats. Scientists, however, desire more than a passive access method that reads arrays from files. This paper describes ArrayBridge, a bi-directional array view mechanism for scientific file formats that aims to make declarative array manipulations interoperable with imperative file-centric analyses. Our prototype implementation of ArrayBridge uses HDF5 as the underlying array storage library and seamlessly integrates into the SciDB open-source array database system. In addition to fast querying over external array objects, ArrayBridge produces arrays in the HDF5 file format just as easily as it reads them. ArrayBridge also supports time travel queries from imperative kernels through the unmodified HDF5 API, and automatically deduplicates between array versions for space efficiency. Our extensive performance evaluation at NERSC, a large-scale scientific computing facility, shows that ArrayBridge exhibits statistically indistinguishable performance and I/O scalability relative to the native SciDB storage engine.
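
    For context, the file-centric side of this picture looks like the following h5py snippet: an imperative kernel writes a chunked HDF5 dataset and later reads a sub-block back through the plain HDF5 API. The file and dataset names are made up for illustration; the snippet shows only standard HDF5 usage, not the ArrayBridge view mechanism or the SciDB integration.

```python
import h5py
import numpy as np

# Write a chunked 2-D array the way an imperative HPC kernel might ...
with h5py.File("simulation.h5", "w") as f:
    dset = f.create_dataset("temperature", shape=(4096, 4096),
                            dtype="f8", chunks=(256, 256))
    dset[0:256, 0:256] = np.random.rand(256, 256)    # one tile of results

# ... then read a sub-block back without loading the whole array.
with h5py.File("simulation.h5", "r") as f:
    block = f["temperature"][0:256, 0:256]
    print(block.mean())
```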

    Performance Modeling and Prediction for Dense Linear Algebra

    This dissertation introduces measurement-based performance modeling and prediction techniques for dense linear algebra algorithms. As a core principle, these techniques avoid executing such algorithms entirely and instead predict their performance through runtime estimates for the underlying compute kernels. For a variety of operations, these predictions make it possible to quickly select the fastest algorithm configuration from the available alternatives. We consider two scenarios that cover a wide range of computations: To predict the performance of blocked algorithms, we design algorithm-independent performance models for kernel operations that are generated automatically once per platform. For various matrix operations, instantaneous predictions based on such models both accurately identify the fastest algorithm and select a near-optimal block size. For performance predictions of BLAS-based tensor contractions, we propose cache-aware micro-benchmarks that take advantage of the highly regular structure inherent to contraction algorithms. At merely a fraction of a contraction's runtime, predictions based on such micro-benchmarks identify the fastest combination of tensor traversal and compute kernel.
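
    The core idea, stripped of the dissertation's actual models, can be sketched as follows: time the block-sized kernel once, then estimate a blocked algorithm's runtime by counting kernel invocations instead of running it. This toy version is only illustrative and ignores the cache and platform effects the real models account for, but it already lets one rank candidate block sizes by predicted runtime.

```python
import time
import numpy as np

def time_gemm(n, reps=5):
    """Measure the best-of-reps runtime of one n x n GEMM kernel call."""
    a, b = np.random.rand(n, n), np.random.rand(n, n)
    best = float("inf")
    for _ in range(reps):
        t0 = time.perf_counter()
        a @ b
        best = min(best, time.perf_counter() - t0)
    return best

def predict_blocked_matmul(n, block):
    """Predict the runtime of a blocked n x n matrix multiply from the cost
    of a single block-sized GEMM, without ever running the full algorithm."""
    kernel_calls = (n // block) ** 3         # block GEMMs in a blocked multiply
    return kernel_calls * time_gemm(block)

# Rank candidate block sizes by predicted runtime and keep the fastest.
candidates = [64, 128, 256, 512]
print(min(candidates, key=lambda b: predict_blocked_matmul(4096, b)))
```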

    Gridfields: Model-Driven Data Transformation in the Physical Sciences

    Scientists' ability to generate and store simulation results is outpacing their ability to analyze them via ad hoc programs. We observe that these programs exhibit an algebraic structure that can be used to facilitate reasoning and improve performance. In this dissertation, we present a formal data model that exposes this algebraic structure, then implement the model, evaluate it, and use it to express, optimize, and reason about data transformations in a variety of scientific domains. Simulation results are defined over a logical grid structure that allows a continuous domain to be represented discretely in the computer. Existing approaches for manipulating these gridded datasets are incomplete. The performance of SQL queries that manipulate large numeric datasets is not competitive with that of specialized tools, and the up-front effort required to deploy a relational database makes them unpopular for dynamic scientific applications. Tools for processing multidimensional arrays can only capture regular, rectilinear grids. Visualization libraries accommodate arbitrary grids, but no algebra has been developed to simplify their use and afford optimization. Further, these libraries are data dependent: physical changes to data characteristics break user programs. We adopt the grid as a first-class citizen, separating topology from geometry and separating structure from data. Our model is agnostic with respect to dimension, uniformly capturing, for example, particle trajectories (1-D), sea-surface temperatures (2-D), and blood flow in the heart (3-D). Equipped with data, a grid becomes a gridfield. We provide operators for constructing, transforming, and aggregating gridfields that admit algebraic laws useful for optimization. We implement the model by analyzing several candidate data structures and incorporating their best features. We then show how to deploy gridfields in practice by injecting the model as middleware between heterogeneous, ad hoc file formats and a popular visualization library. In this dissertation, we define, develop, implement, evaluate, and deploy a model of gridded datasets that accommodates a variety of complex grid structures and a variety of complex data products. We evaluate the applicability and performance of the model using datasets from oceanography, seismology, and medicine and conclude that our model-driven approach offers significant advantages over the status quo.
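
    To make the separation of topology, geometry, and data concrete, here is a deliberately small Python approximation of a gridfield with a single restriction operator. The class and operator are hypothetical simplifications, not the dissertation's formal algebra: cells are tuples of node indices (topology), coordinates live in a separate array (geometry), and named data arrays are bound to the nodes.

```python
import numpy as np

class GridField:
    """Toy gridfield: topology (cells over node indices) is kept separate
    from geometry (node coordinates), with node-centered data bound by name."""

    def __init__(self, cells, coords, node_data):
        self.cells = cells            # list of tuples of node indices   (topology)
        self.coords = coords          # (n_nodes, dim) array             (geometry)
        self.data = node_data         # dict: name -> (n_nodes,) array   (data)

    def restrict(self, name, predicate):
        """Keep only nodes whose bound value satisfies the predicate, and
        drop any cell that loses one of its nodes."""
        keep = np.array([predicate(v) for v in self.data[name]])
        remap = {old: new for new, old in enumerate(np.flatnonzero(keep))}
        cells = [tuple(remap[i] for i in c) for c in self.cells
                 if all(keep[i] for i in c)]
        data = {k: v[keep] for k, v in self.data.items()}
        return GridField(cells, self.coords[keep], data)

# A two-triangle grid carrying sea-surface temperature at its nodes.
coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
cells = [(0, 1, 2), (1, 3, 2)]
gf = GridField(cells, coords, {"sst": np.array([14.2, 15.1, 14.8, 12.0])})
warm = gf.restrict("sst", lambda t: t > 13.0)
print(warm.cells, warm.data["sst"])       # [(0, 1, 2)] [14.2 15.1 14.8]
```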

    Polyhedral+Dataflow Graphs

    This research presents an intermediate compiler representation that is designed for optimization and emphasizes the temporary storage requirements and execution schedule of a given computation to guide optimization decisions. The representation is expressed as a dataflow graph that describes computational statements and data mappings within the polyhedral compilation model. The targeted applications include both regular and irregular scientific domains. The intermediate representation can be integrated into existing compiler infrastructures. A specification language, implemented as a domain-specific language in C++, describes the graph components and the transformations that can be applied. The visual representation allows users to reason about optimizations. Graph variants can be translated into source code or other representations. The language, intermediate representation, and associated transformations have been applied to improve the performance of differential equation solvers, sparse matrix operations, tensor decompositions, and structured multigrid methods.
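
    The flavor of such a graph can be suggested with a small, hypothetical Python mock-up (the actual specification language is a C++ DSL, and this sketch omits the polyhedral machinery): statement nodes carry an iteration domain as a string, and producer-consumer edges are inferred from the data spaces each statement reads and writes.

```python
from dataclasses import dataclass, field

@dataclass
class Statement:
    name: str
    domain: str                       # e.g. "{ [i] : 0 <= i < N }"
    reads: list = field(default_factory=list)
    writes: list = field(default_factory=list)

@dataclass
class DataflowGraph:
    statements: list = field(default_factory=list)

    def add(self, stmt):
        self.statements.append(stmt)

    def edges(self):
        """Producer -> consumer edges inferred from shared data spaces."""
        return [(p.name, c.name)
                for p in self.statements for c in self.statements
                if set(p.writes) & set(c.reads)]

g = DataflowGraph()
g.add(Statement("residual", "{ [i] : 0 <= i < N }", reads=["b", "Ax"], writes=["r"]))
g.add(Statement("smooth",   "{ [i] : 0 <= i < N }", reads=["r"], writes=["x"]))
print(g.edges())                      # [('residual', 'smooth')]
```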