4 research outputs found

    Efficient Iterative Processing in the SciDB Parallel Array Engine

    Many scientific data-intensive applications perform iterative computations on array data. Multiple engines specialized for array processing exist. These engines efficiently support various types of operations, but none includes native support for iterative processing. In this paper, we develop a model for iterative array computations and a series of optimizations. We evaluate the benefits of optimized, native support for iterative array processing on the SciDB engine and real workloads from the astronomy domain.
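
    To make the notion of an iterative array computation concrete, here is a minimal Python/NumPy sketch, not the paper's implementation (SciDB exposes such operations through its own query languages), of repeatedly applying an array operation until a fixed point is reached. The smoothing step and the convergence threshold are illustrative assumptions.

        # Sketch: iterate an array operation until the result stops changing.
        import numpy as np

        def iterate_to_fixpoint(a, step, tol=1e-6, max_iters=100):
            """Apply `step` to array `a` until changes fall below `tol`."""
            for _ in range(max_iters):
                b = step(a)
                if np.max(np.abs(b - a)) < tol:  # converged: stop iterating
                    return b
                a = b
            return a

        def smooth(a):
            # One iteration step: average each cell with its 4 neighbors,
            # a typical stencil operation on a 2-D array (e.g., an image).
            padded = np.pad(a, 1, mode="edge")
            return (padded[:-2, 1:-1] + padded[2:, 1:-1] +
                    padded[1:-1, :-2] + padded[1:-1, 2:] + a) / 5.0

        image = np.random.rand(64, 64)
        result = iterate_to_fixpoint(image, smooth)

    The paper's optimizations target exactly this pattern: the engine can, for example, recompute only the cells that changed between iterations rather than the whole array.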

    Incremental aggregation on MOLAP cube based on n-dimensional extendible karnaugh arrays

    Data volumes are increasing so rapidly that new data warehousing approaches are required to process and analyze them. Incremental aggregation is needed for fast data access and efficient computation of aggregate functions. Multidimensional arrays are generally used for this purpose, but they have disadvantages: the address space requirement is large, and aggregation is comparatively slow. We therefore use the Extendible Karnaugh Array (EKA), an efficient scheme that performed better than the other data structures we tested in our research. In this work we use the EKA as the basic structure for implementing incremental aggregation of data and evaluate its performance against other approaches. We use Multidimensional Online Analytical Processing (MOLAP), which stores data in optimized multidimensional array storage rather than in a relational database. We create 4- and 6-dimensional MOLAP data cubes using the Traditional Multidimensional Array (TMA) and EKA schemes and compare incremental aggregation with Relational Online Analytical Processing (ROLAP). Experimental results show the effectiveness of the EKA structure for incremental aggregation on 4- and 6-dimensional MOLAP structures and demonstrate its efficiency for higher dimensions n.
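
    As a rough illustration of incremental aggregation on a MOLAP cube, the Python sketch below maintains SUM aggregates as facts arrive, updating every roll-up cell that covers the new fact instead of recomputing from scratch. The dict-based storage is a stand-in: the EKA layout itself is a compact extendible-array addressing scheme whose details are not given in this abstract.

        # Sketch: incremental SUM aggregation over an n-dimensional cube.
        from itertools import product

        ALL = None  # wildcard: dimension rolled up in this aggregate cell

        class IncrementalCube:
            def __init__(self, ndims):
                self.ndims = ndims
                self.cells = {}  # coordinate tuple -> running SUM

            def insert(self, coords, value):
                # Update the base cell and all 2^n - 1 aggregate cells
                # covering it, so queries never rescan the raw data.
                assert len(coords) == self.ndims
                for mask in product((False, True), repeat=self.ndims):
                    key = tuple(ALL if m else c
                                for m, c in zip(mask, coords))
                    self.cells[key] = self.cells.get(key, 0) + value

            def query(self, coords):
                # `coords` may use ALL for roll-ups, e.g. (2, ALL, 7, ALL).
                return self.cells.get(tuple(coords), 0)

        cube = IncrementalCube(4)
        cube.insert((1, 2, 0, 3), 10.0)
        cube.insert((1, 5, 0, 3), 4.0)
        print(cube.query((1, ALL, ALL, ALL)))  # 14.0: sum over dims 2-4

    Updating all 2^n covering cells per insert is what makes higher dimensionality expensive, which is why a compact, cheaply addressable structure like the EKA matters as n grows.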

    Optimal caching of large multi-dimensional datasets

    We propose a novel organization for multi-dimensional data based on the concept of macro-voxels. This organization improves computer performance by enhancing spatial and temporal locality. Caching of macro-voxels not only reduces the required storage space but also leads to an efficient organization of the dataset, resulting in faster data access. We have developed a macro-voxel caching theory that predicts the optimal macro-voxel sizes required for minimum cache size and access time. The model also identifies a region of trade-off between time and storage, which can be exploited in making an efficient choice of macro-voxel size for this scheme. Based on the macro-voxel caching model, we have implemented a macro-voxel I/O layer in C, intended to be used as an interface between applications and datasets. It is capable of both scattered access, typical in online applications, and row/column access, typical in batched applications. We integrated this I/O layer into the ALIGN program (an online application), which aligns images based on 3D distance maps; this improved access time by a factor of 3 when accessing local disks and a factor of 20 for remote disks. We also applied the macro-voxel caching scheme to SPEC's Seismic benchmark datasets (a batched application), which improved the read process by a factor of 8.
    Ph.D., Electrical and Computer Engineering -- Drexel University, 200
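
    The macro-voxel idea can be sketched as blocked access through a block cache. The Python below (the thesis's actual I/O layer is in C) groups a 3-D volume into fixed-size macro-voxels and caches whole blocks with LRU eviction, so nearby voxel accesses hit the cache. Block size and cache capacity here are illustrative placeholders; the thesis's model is what predicts their optimal values.

        # Sketch: serve scattered voxel reads through a macro-voxel cache.
        from collections import OrderedDict
        import numpy as np

        class MacroVoxelCache:
            def __init__(self, volume, block=8, capacity=64):
                self.volume = volume        # full 3-D dataset (e.g. on disk)
                self.block = block          # macro-voxel edge length
                self.capacity = capacity    # max cached macro-voxels
                self.cache = OrderedDict()  # block index -> ndarray (LRU)

            def _load_block(self, bi, bj, bk):
                b = self.block
                # In the real I/O layer this is one contiguous disk read.
                return self.volume[bi*b:(bi+1)*b,
                                   bj*b:(bj+1)*b,
                                   bk*b:(bk+1)*b]

            def read(self, i, j, k):
                b = self.block
                key = (i // b, j // b, k // b)
                if key not in self.cache:
                    if len(self.cache) >= self.capacity:
                        self.cache.popitem(last=False)  # evict LRU block
                    self.cache[key] = self._load_block(*key)
                self.cache.move_to_end(key)  # mark most recently used
                return self.cache[key][i % b, j % b, k % b]

        vol = np.random.rand(64, 64, 64)
        cache = MacroVoxelCache(vol)
        print(cache.read(10, 20, 30))

    Larger blocks amortize I/O but waste cache space on unused voxels; smaller blocks do the reverse. That tension is the time/storage trade-off region the caching model identifies.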