36 research outputs found

    CubiST: A New Algorithm for Improving the Performance of Ad-hoc OLAP Queries

    Being able to efficiently answer arbitrary OLAP queries that aggregate along any combination of dimensions over numerical and categorical attributes has been a continuing major concern in data warehousing. In this paper, we introduce a new data structure, called Statistics Tree (ST), together with an efficient algorithm called CubiST, for evaluating ad-hoc OLAP queries on top of a relational data warehouse. We focus on a class of queries called cube queries, which generalize the data cube operator. CubiST represents a drastic departure from existing relational (ROLAP) and multi-dimensional (MOLAP) approaches in that it does not use the familiar view lattice to compute and materialize new views from existing views in some heuristic fashion. CubiST is the first OLAP algorithm that needs only one scan over the detailed data set and can efficiently answer any cube query without additional I/O when the ST fits into memory. We have implemented CubiST, and our experiments have demonstrated significant improvements in performance and scalability over existing ROLAP/MOLAP approaches.

    CubiST++: Evaluating Ad-Hoc CUBE Queries Using Statistics Trees

    We report on a new, efficient encoding for the data cube, which results in a drastic speed-up of OLAP queries that aggregate along any combination of dimensions over numerical and categorical attributes. We focus on a class of queries called cube queries, which return aggregated values rather than sets of tuples. Our approach, termed CubiST++ (Cubing with Statistics Trees Plus Families), represents a drastic departure from existing relational (ROLAP) and multi-dimensional (MOLAP) approaches in that it does not use the view lattice to compute and materialize new views from existing views in some heuristic fashion. Instead, CubiST++ encodes all possible aggregate views in the leaves of a new data structure called a statistics tree (ST) during a one-time scan of the detailed data. To optimize queries involving constraints on hierarchy levels of the underlying dimensions, we select and materialize a family of candidate trees, which represent superviews over the different hierarchical levels of the dimensions. Given a query, our query evaluation algorithm selects the smallest tree in the family that can provide the answer. Extensive evaluations of our prototype implementation have demonstrated its superior run-time performance and scalability when compared with existing MOLAP and ROLAP systems.
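The idea of encoding every aggregate view in the leaves during a single scan can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each dimension gets one extra "star" branch meaning "all values", and it keeps the leaves in a flat dictionary keyed by the root-to-leaf path instead of a pointer-based tree. The names `STAR`, `build_statistics_tree` and `cube_query` are hypothetical.

```python
from itertools import product

STAR = "*"  # hypothetical marker for the "all values" branch of a dimension


def build_statistics_tree(rows):
    """Scan the detail rows once; each row updates 2**d leaves (d = dimensions).

    Each row is (dim_1, ..., dim_d, measure). Leaves are stored in a dict
    keyed by the root-to-leaf path, a simplification of a real tree.
    """
    leaves = {}
    for *dims, measure in rows:
        # At every level follow both the concrete value and the star branch.
        for path in product(*[(value, STAR) for value in dims]):
            count, total = leaves.get(path, (0, 0))
            leaves[path] = (count + 1, total + measure)
    return leaves


def cube_query(leaves, constraints):
    """Return (count, sum); each constraint is STAR or a set of allowed values."""
    branches = [(STAR,) if c == STAR else tuple(c) for c in constraints]
    count = total = 0
    for path in product(*branches):
        leaf_count, leaf_total = leaves.get(path, (0, 0))
        count += leaf_count
        total += leaf_total
    return count, total


rows = [("east", "tv", 100), ("east", "radio", 50), ("west", "tv", 70)]
tree = build_statistics_tree(rows)
print(cube_query(tree, [{"east"}, STAR]))  # count and sum for region = east
```

Once the leaves are built, a query touches only the leaves named by its constraints and never rereads the detail rows, which is the property the abstract emphasizes. The price is that every row updates an exponential number of leaves in the number of dimensions.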

    One size does not fit all: accelerating OLAP workloads with GPUs

    GPUs have been considered one of the next-generation platforms for real-time query processing databases. In this paper we empirically demonstrate that representative GPU databases [e.g., OmniSci (Open Source Analytical Database & SQL Engine, 2019)] may be slower than representative in-memory databases [e.g., Hyper (Neumann and Leis, IEEE Data Eng Bull 37(1):3-11, 2014)] on typical OLAP workloads (the Star Schema Benchmark), even if the actual dataset of each query fits completely in GPU memory. Therefore, we argue that GPU database designs should not be one-size-fits-all; a general-purpose GPU database engine may not be well suited to OLAP workloads without carefully designed GPU memory assignment and GPU computing locality. To achieve better performance for GPU OLAP, we need to reorganize the OLAP operators and re-optimize the OLAP model. In particular, we propose a 3-layer OLAP model to match heterogeneous computing platforms. The core idea is to maximize data and computing locality on the specified hardware. We design a vector grouping algorithm for the data-intensive workload, which proves to be better suited to the CPU platform. We design a top-down query plan tree strategy that guarantees optimal operation in the final stage and pushes the respective optimizations down to the lower layers to achieve global optimization gains. With this strategy, we design a 3-stage processing model (the OLAP acceleration engine) for hybrid CPU-GPU platforms, where the computing-intensive star-join stage is accelerated by the GPU and the data-intensive grouping and aggregation stage is accelerated by the CPU. This design maximizes the locality of the different workloads and simplifies the GPU acceleration implementation. Our experimental results show that, with vector grouping and the GPU-accelerated star-join implementation, the OLAP acceleration engine runs 1.9x, 3.05x and 3.92x faster than Hyper, OmniSci GPU and OmniSci CPU, respectively, in the SSB evaluation with a dataset of SF = 100.

    Segment Oriented Compression Scheme for MOLAP Based on Extendible Multidimensional Arrays

    Many statistical and MOLAP applications use multidimensional arrays as the basic data structure to allow the efficient and convenient storage and retrieval of large volumes of business data for decision making. Allocation of data or data compression is a key performance factor for this purpose because performance strongly depends on the amount of storage required and the availability of memory. This holds especially for data warehousing environments, in which huge amounts of data have to be dealt with. The most evident consequence of data compression is that it reduces storage cost by packing more logical data per unit of physical capacity. Improved performance is a net outcome because less physical data needs to be retrieved during scan-oriented queries. In this paper, an efficient data compression technique is proposed based on the notion of an extendible array. The main idea of the scheme is to compress each of the segments of the extendible array using the position information only. We compare the proposed scheme with prominent compression schemes on several performance measures.

    Efficient Evaluation of Sparse Data Cubes

    Computing data cubes requires the aggregation of measures over arbitrary combinations of dimensions in a data set. Efficient data cube evaluation remains challenging because of the potentially very large sizes of input datasets (e.g., in the data warehousing context), the well-known curse of dimensionality, and the complexity of queries that need to be supported. This paper proposes a new dynamic data structure called SST (Sparse Statistics Trees) and a novel, interactive, and fast cube evaluation algorithm called CUPS (Cubing by Pruning SST), which is especially well suited to computing aggregates in cubes whose data sets are sparse. SST stores only the aggregations of non-empty cube cells instead of the detailed records. Furthermore, it retains in memory the dense cubes (a.k.a. iceberg cubes) whose aggregate values are above a threshold. Sparse cubes are stored on disk. This allows a fast, accurate approximation for queries. If users desire more refined answers, related sparse cubes are aggregated. SST is incrementally maintainable, which makes CUPS suitable for data warehousing and analysis of streaming data. Experimental results demonstrate the excellent performance and good scalability of our approach.
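The split between dense cells kept in memory and sparse cells consulted only on request can be sketched as follows. This is a rough illustration under stated assumptions, not CUPS or the SST itself: cells live in flat dictionaries, not in a tree, the `sparse` dictionary merely stands in for disk storage, the aggregate is a simple count, and all function names are hypothetical.

```python
from collections import Counter

ALL = None  # hypothetical wildcard: no constraint on this dimension


def aggregate_cells(rows):
    """Keep one count per non-empty cell instead of the detail rows."""
    return Counter(tuple(row) for row in rows)


def split_cells(cells, threshold):
    """Dense cells (count above threshold) stay 'in memory'; the rest go 'to disk'."""
    dense = {cell: n for cell, n in cells.items() if n > threshold}
    sparse = {cell: n for cell, n in cells.items() if n <= threshold}
    return dense, sparse


def _matches(cell, query):
    return all(q is ALL or q == value for value, q in zip(cell, query))


def count_query(dense, sparse, query, refine=False):
    """Fast answer from dense cells; add the sparse cells only when refining."""
    total = sum(n for cell, n in dense.items() if _matches(cell, query))
    if refine:
        total += sum(n for cell, n in sparse.items() if _matches(cell, query))
    return total


rows = [("a", "x")] * 5 + [("a", "y")] + [("b", "x")] * 3
dense, sparse = split_cells(aggregate_cells(rows), threshold=2)
print(count_query(dense, sparse, ("a", ALL)))               # approximate
print(count_query(dense, sparse, ("a", ALL), refine=True))  # exact
```

For counts, the fast answer is a lower bound that misses at most the contribution of cells at or below the threshold, which is why a refinement pass over the sparse cells recovers the exact value.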

    Fusion OLAP: Fusing the Pros of MOLAP and ROLAP Together for In-memory OLAP

    OLAP models can be categorized into two types: MOLAP (multidimensional OLAP) and ROLAP (relational OLAP). In particular, MOLAP is efficient in multidimensional computing at the cost of cube maintenance, while ROLAP reduces the data storage size at the cost of expensive multidimensional join operations. In this paper, we propose a novel Fusion OLAP model that fuses the multidimensional computing model and the relational storage model to combine the best aspects of both the MOLAP and ROLAP worlds. This is achieved by mapping the relational tables into a virtual multidimensional model and binding the multidimensional operations to a set of vector indexes to enable multidimensional computing on relational tables. The Fusion OLAP model can be integrated into state-of-the-art in-memory databases with additional surrogate key indexes and vector indexes. We compared the Fusion OLAP implementations with three leading analytical in-memory databases. Our comprehensive experimental results show that Fusion OLAP implementations can achieve up to 35, 365, and 169 percent performance improvements based on the Hyper, Vectorwise, and MonetDB databases, respectively, for the Star Schema Benchmark (SSB) with scale factor 100.
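One way to picture multidimensional computing on relational tables through vector indexes is the sketch below. It is an illustration under assumptions the abstract does not confirm: surrogate keys are taken to be dense 0-based row positions in each dimension table, a vector index maps each surrogate key to a group id (or -1 when the row fails the predicate), and the fact scan addresses a flat aggregate array directly. All names are hypothetical.

```python
def build_vector_index(dimension_rows, predicate, group_attr):
    """Map each surrogate key (row position) to a group id, or -1 if filtered out."""
    group_ids = {}
    vector = []
    for row in dimension_rows:
        if predicate(row):
            # New group values receive the next free id.
            vector.append(group_ids.setdefault(row[group_attr], len(group_ids)))
        else:
            vector.append(-1)
    return vector, list(group_ids)


def star_join_aggregate(fact_rows, vectors, cardinalities):
    """Scan the fact table once and sum measures into a flat aggregate array.

    Each fact row is (key_1, ..., key_d, measure); cardinalities[i] is the
    number of groups produced by dimension i.
    """
    size = 1
    for cardinality in cardinalities:
        size *= cardinality
    sums = [0] * size
    for *keys, measure in fact_rows:
        cell = 0
        for key, vector, cardinality in zip(keys, vectors, cardinalities):
            group_id = vector[key]
            if group_id < 0:
                break  # the row is filtered out by this dimension
            cell = cell * cardinality + group_id
        else:
            sums[cell] += measure
    return sums


customers = [{"region": "ASIA"}, {"region": "EUROPE"}, {"region": "ASIA"}]
dates = [{"year": 1997}, {"year": 1998}]
cust_vec, cust_groups = build_vector_index(customers, lambda r: True, "region")
date_vec, date_groups = build_vector_index(dates, lambda r: r["year"] == 1997, "year")
facts = [(0, 0, 10), (1, 0, 20), (2, 0, 5), (0, 1, 99)]
sums = star_join_aggregate(facts, [cust_vec, date_vec],
                           [len(cust_groups), len(date_groups)])
print(dict(zip(cust_groups, sums)))
```

The join, the filter and the grouping collapse into array lookups by position, so no hash table is probed during the fact scan. That addressing-by-position style is what makes the relational tables behave like a multidimensional array in this sketch.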

    Business Intelligence for Small and Middle-Sized Entreprises

    Data warehouses are the core of decision support systems, which nowadays are used by all kinds of enterprises in the entire world. Although many studies have been conducted on the need for decision support systems (DSSs) for small businesses, most of them adopt existing solutions and approaches, which are appropriate for large-scale enterprises but are inadequate for small and middle-sized enterprises. Small enterprises require cheap, lightweight architectures and tools (hardware and software) providing online data analysis. In order to ensure these features, we review web-based business intelligence approaches. For real-time analysis, the traditional OLAP architecture is cumbersome and storage-costly; therefore, we also review in-memory processing. Consequently, this paper discusses the existing approaches and tools working in main memory and/or with web interfaces (including freeware tools), relevant for small and middle-sized enterprises in decision making.

    Query Optimization and Execution for Multi-Dimensional OLAP

    Online Analytical Processing (OLAP) is a database paradigm that supports the rich analysis of multi-dimensional data. While current OLAP tools are primarily constructed as extensions to conventional relational databases, the unique modeling and processing requirements of OLAP systems often make for a relatively awkward fit with RDBM systems in general, and their embedded string-based query languages in particular. In this thesis, we discuss the design, implementation, and evaluation of a robust multi-dimensional OLAP server. We focus on several distinct but related themes. To begin, we investigate the integration of an open source embedded storage engine with our own OLAP-specific indexing and access methods. We then present a comprehensive OLAP query algebra that ultimately allows developers to create expressive OLAP queries in native client languages such as Java. By utilizing a formal algebraic model, we are able to support an intuitive object-oriented query API, as well as a powerful query optimization and execution engine. The thesis describes both the optimization methodology and the related algorithms for the efficient execution of the associated query plans. The end result of our research is a comprehensive OLAP DBMS prototype that clearly demonstrates new opportunities for improving the accessibility, functionality, and performance of current OLAP database management systems.

    Sparse array representations and some selected array operations on GPUs

    A dissertation submitted to the Faculty of Science, University of the Witwatersrand, Johannesburg, in fulfilment of the requirements for the degree of Master of Science. Johannesburg, 2014. A multi-dimensional data model provides a good conceptual view of the data in data warehousing and On-Line Analytical Processing (OLAP). A typical representation of such a data model is a multi-dimensional array, which is well suited when the array is dense. If the array is sparse, i.e., has few non-zero elements relative to the product of the cardinalities of the dimensions, using a multi-dimensional array to represent the data set requires extremely large memory space while the actual data elements occupy a relatively small fraction of the space. Existing storage schemes for Multi-Dimensional Sparse Arrays (MDSAs) of higher dimensions k (k > 2) focus on optimizing the storage utilization and offer little flexibility in data access efficiency. Most efficient storage schemes for sparse arrays are limited to matrices, that is, arrays in 2 dimensions. In this dissertation, we introduce four storage schemes for MDSAs that handle the sparsity of the array with two primary goals: reducing the storage overhead and maintaining efficient data element access. These schemes, together with a well-known method referred to as the Bit Encoded Sparse Storage (BESS), were evaluated and compared on four basic array operations, namely construction of a scheme, large-scale random element access, sub-array retrieval and multi-dimensional aggregation. The four proposed storage schemes, together with the evaluation results, are: (i) the extended compressed row storage (xCRS), which extends the CRS method for sparse matrix storage to sparse arrays of higher dimensions and achieves the best data element access efficiency among the methods compared; (ii) the bit encoded xCRS (BxCRS), which optimizes the storage utilization of xCRS by applying data compression methods with run length encoding, while maintaining its data access efficiency; (iii) a hybrid approach (Hybrid) that provides the best control of the balance between storage utilization and data manipulation efficiency by combining xCRS and BESS; and (iv) the PATRICIA trie compressed storage (PTCS), which uses a PATRICIA trie to store the valid non-zero array elements. PTCS supports efficient data access and has the unique property of supporting update operations conveniently. BESS performs the best for multi-dimensional aggregation, closely followed by the other schemes. We also addressed the problem of accelerating some selected array operations using General Purpose Computing on Graphics Processing Units (GPGPU). The experimental results showed different levels of speed-up, ranging from 2 to over 20 times, on large-scale random element access and sub-array retrieval. In particular, we utilized GPUs for the computation of the cube operator, a special case of multi-dimensional aggregation, using BESS. This resulted in a 5 to 8 times speed-up compared with our CPU-only implementation. The main contributions of this dissertation include the development, implementation and evaluation of four efficient schemes to store multi-dimensional sparse arrays, as well as utilizing the massive parallelism of GPUs for some data warehousing operations.
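As a rough illustration of bit-encoded sparse storage and multi-dimensional aggregation, the sketch below assumes that a BESS-style key packs an element's dimension indices into fixed-width bit fields of one integer. The abstract does not spell out the layout, so the field order, the function names and the dictionary-based store are all assumptions, and the xCRS, BxCRS, Hybrid and PTCS schemes are not modelled.

```python
def bit_widths(shape):
    """Bits needed per dimension to hold the indices 0 .. n-1."""
    return [max(1, (n - 1).bit_length()) for n in shape]


def encode(index, widths):
    """Pack a tuple of dimension indices into one integer key."""
    key = 0
    for i, width in zip(index, widths):
        key = (key << width) | i
    return key


def decode(key, widths):
    """Recover the tuple of dimension indices from an integer key."""
    index = []
    for width in reversed(widths):
        index.append(key & ((1 << width) - 1))
        key >>= width
    return tuple(reversed(index))


def aggregate(store, widths, keep_dims):
    """Sum the stored values over every dimension not listed in keep_dims."""
    result = {}
    for key, value in store.items():
        index = decode(key, widths)
        group = tuple(index[d] for d in keep_dims)
        result[group] = result.get(group, 0) + value
    return result


shape = (4, 3, 5)
widths = bit_widths(shape)
store = {
    encode((0, 1, 2), widths): 10,
    encode((0, 2, 2), widths): 5,
    encode((3, 1, 0), widths): 7,
}
print(aggregate(store, widths, keep_dims=[0]))  # totals per index of dimension 0
```

Only the non-zero elements are stored, each at the cost of one packed key plus its value, and aggregation is a single pass that regroups elements by the bit fields being kept. Running `aggregate` once for every subset of dimensions would give a naive version of the cube operator mentioned in the abstract.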