Efficient Evaluation of Sparse Data Cubes

Author: Fu Lixin
NC DOCKS at The University of North Carolina at Greensboro
Publication date: 01/01/2004

Computing data cubes requires the aggregation of measures over arbitrary combinations of dimensions in a data set. Efficient data cube evaluation remains challenging because of the potentially very large sizes of input datasets (e.g., in the data warehousing context), the well-known curse of dimensionality, and the complexity of queries that need to be supported. This paper proposes a new dynamic data structure called SST (Sparse Statistics Trees) and a novel, interactive, and fast cube evaluation algorithm called CUPS (Cubing by Pruning SST), which is especially well suited for computing aggregates in cubes whose data sets are sparse. SST stores only the aggregations of non-empty cube cells instead of the detailed records. Furthermore, it retains in memory the dense cubes (a.k.a. iceberg cubes) whose aggregate values are above a threshold. Sparse cubes are stored on disk. This allows a fast, accurate approximation for queries. If users desire more refined answers, related sparse cubes are aggregated. SST is incrementally maintainable, which makes CUPS suitable for data warehousing and analysis of streaming data. Experimental results demonstrate the excellent performance and good scalability of our approach.
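The split between dense cells kept in memory and sparse cells pushed to disk can be illustrated with a minimal Python sketch. It is not the SST or CUPS implementation from the paper: the flat dictionary, the `ALL` marker, the function names, and the per-cell (rather than per-cube) threshold test are all assumptions made for illustration, and the "disk" side is just a second dictionary.

```python
from collections import defaultdict
from itertools import product

ALL = "*"  # hypothetical marker meaning "aggregated over this dimension"


def aggregate_cells(records):
    """Sum the measure into every cube cell that a record falls in.

    Only non-empty cells ever get an entry, so no space is spent on
    empty cells. Each record touches 2**d cells for d dimensions.
    """
    cells = defaultdict(float)
    for dims, measure in records:
        for stars in product((False, True), repeat=len(dims)):
            cell = tuple(ALL if s else v for v, s in zip(dims, stars))
            cells[cell] += measure
    return dict(cells)


def split_by_threshold(cells, threshold):
    """Partition cells into dense (kept in memory) and sparse (spilled)."""
    dense = {c: v for c, v in cells.items() if v > threshold}
    sparse = {c: v for c, v in cells.items() if v <= threshold}
    return dense, sparse


# Made-up records: (dimension values, measure).
records = [
    (("east", "tv"), 5.0),
    (("east", "tv"), 4.0),
    (("west", "radio"), 1.0),
]
cells = aggregate_cells(records)
dense, sparse = split_by_threshold(cells, threshold=4.0)
```

In this toy run the cells involving "east" or "tv", plus the grand total, end up in `dense`, while the three cells that only the single "west"/"radio" record reaches end up in `sparse`. The pruning and approximate-answer steps that the abstract attributes to CUPS are not modeled here.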

The University of North Carolina at Greensboro

CubiST: A New Algorithm for Improving the Performance of Ad-hoc OLAP Queries

Author: Fu Lixin
NC DOCKS at The University of North Carolina at Greensboro
Publication date: 01/01/2000

Being able to efficiently answer arbitrary OLAP queries that aggregate along any combination of dimensions over numerical and categorical attributes has been a continuing major concern in data warehousing. In this paper, we introduce a new data structure, called Statistics Tree (ST), together with an efficient algorithm called CubiST, for evaluating ad-hoc OLAP queries on top of a relational data warehouse. We focus on a class of queries called cube queries, which generalize the data cube operator. CubiST represents a drastic departure from existing relational (ROLAP) and multi-dimensional (MOLAP) approaches in that it does not use the familiar view lattice to compute and materialize new views from existing views in some heuristic fashion. CubiST is the first OLAP algorithm that needs only one scan over the detailed data set and can efficiently answer any cube query without additional I/O when the ST fits into memory. We have implemented CubiST, and our experiments have demonstrated significant improvements in performance and scalability over existing ROLAP/MOLAP approaches.
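The idea of building a tree in one scan and then answering cube queries from memory can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' ST or CubiST code: the class name, the nested-dictionary layout, the `ALL` star marker, and the query format (a set of accepted values or `ALL` per dimension) are all hypothetical, and only SUM is supported.

```python
ALL = "*"  # hypothetical marker for the "star" branch of a node


class StatisticsTree:
    """Toy statistics tree: one level per dimension, sums in the leaves."""

    def __init__(self, num_dims):
        self.num_dims = num_dims
        self.root = {}

    def insert(self, dims, measure):
        """Fold one detailed record into the tree (called once per record)."""
        self._insert(self.root, dims, 0, measure)

    def _insert(self, node, dims, depth, measure):
        # Follow both the branch for the record's value and the star branch.
        for key in (dims[depth], ALL):
            if depth == self.num_dims - 1:
                node[key] = node.get(key, 0.0) + measure
            else:
                self._insert(node.setdefault(key, {}), dims, depth + 1, measure)

    def query(self, constraints):
        """Answer a cube query using only the in-memory tree.

        constraints[i] is ALL or a set of accepted values for dimension i.
        """
        return self._query(self.root, constraints, 0)

    def _query(self, node, constraints, depth):
        wanted = constraints[depth]
        keys = [ALL] if wanted == ALL else set(wanted)
        total = 0.0
        for key in keys:
            if key not in node:
                continue  # empty cell contributes nothing
            if depth == self.num_dims - 1:
                total += node[key]
            else:
                total += self._query(node[key], constraints, depth + 1)
        return total


# Single pass over made-up (region, product) sales records.
tree = StatisticsTree(num_dims=2)
for dims, sales in [(("east", "tv"), 5.0), (("east", "radio"), 2.0), (("west", "tv"), 1.0)]:
    tree.insert(dims, sales)

east_total = tree.query([{"east"}, ALL])
tv_total = tree.query([{"east", "west"}, {"tv"}])
grand_total = tree.query([ALL, ALL])
```

Each record updates both the value branch and the star branch at every level, so an unconstrained dimension is answered by following a single star branch instead of summing over all of its values. The sketch assumes no real dimension value equals the `ALL` marker.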

The University of North Carolina at Greensboro

Efficient Evaluation of Sparse Data Cubes

Author: J. Gray
S. Chaudhuri
S. Goil
T. Johnson
V. Harinarayan
Y. Zhao
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2004

Available at www.springerlink.com. Computing data cubes requires the aggregation of measures over arbitrary combinations of dimensions in a data set. Efficient data cube evaluation remains challenging because of the potentially very large sizes of input datasets (e.g., in the data warehousing context), the well-known curse of dimensionality, and the complexity of queries that need to be supported. This paper proposes a new dynamic data structure called SST (Sparse Statistics Trees) and a novel, interactive, and fast cube evaluation algorithm called CUPS (Cubing by Pruning SST), which is especially well suited for computing aggregates in cubes whose data sets are sparse. SST stores only the aggregations of non-empty cube cells instead of the detailed records. Furthermore, it retains in memory the dense cubes (a.k.a. iceberg cubes) whose aggregate values are above a threshold. Sparse cubes are stored on disk. This allows a fast, accurate approximation for queries. If users desire more refined answers, related sparse cubes are aggregated. SST is incrementally maintainable, which makes CUPS suitable for data warehousing and analysis of streaming data. Experimental results demonstrate the excellent performance and good scalability of our approach.

CiteSeerX

Crossref

Dynamic Programming: The Next Step

Author: Eich Marius
Moerkotte Guido
Publication date: 01/01/2014

Since 2013, dynamic programming (DP)-based plan generators have been capable of correctly reordering not only inner joins, but also outer joins. Now, we consider the next big step: reordering not only joins, but joins together with grouping. Since only reorderings of grouping with inner joins are known, we first develop equivalences which allow reordering of grouping with outer joins. Then, we show how to extend a state-of-the-art DP-based plan generator to fully explore these new plan alternatives.

MAnnheim DOCument Server

CubiST++: Evaluating Ad-Hoc CUBE Queries Using Statistics Trees

Author: Fu Lixin
NC DOCKS at The University of North Carolina at Greensboro
Publication date: 01/01/2003

We report on a new, efficient encoding for the data cube, which results in a drastic speed-up of OLAP queries that aggregate along any combination of dimensions over numerical and categorical attributes. We focus on a class of queries called cube queries, which return aggregated values rather than sets of tuples. Our approach, termed CubiST++ (Cubing with Statistics Trees Plus Families), represents a drastic departure from existing relational (ROLAP) and multi-dimensional (MOLAP) approaches in that it does not use the view lattice to compute and materialize new views from existing views in some heuristic fashion. Instead, CubiST++ encodes all possible aggregate views in the leaves of a new data structure called statistics tree (ST) during a one-time scan of the detailed data. To optimize queries involving constraints on hierarchy levels of the underlying dimensions, we select and materialize a family of candidate trees, which represent superviews over the different hierarchical levels of the dimensions. Given a query, our query evaluation algorithm selects the smallest tree in the family that can provide the answer. Extensive evaluations of our prototype implementation have demonstrated its superior run-time performance and scalability when compared with existing MOLAP and ROLAP systems.
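The selection step described above, picking the smallest tree in the family that can still answer a query, can be sketched in Python. Everything here is an assumption for illustration rather than the CubiST++ implementation: the `CandidateTree` record, the encoding of hierarchy levels as integers (0 is the finest level), the size measure, and the rule that a tree can answer a query when it is at least as fine as the query in every dimension.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateTree:
    name: str      # hypothetical label for the materialized tree
    levels: tuple  # hierarchy level per dimension; 0 = finest
    size: int      # assumed cost measure, e.g. number of leaves


def can_answer(tree, query_levels):
    """A tree is usable if it is at least as fine as the query in every dimension."""
    return all(t <= q for t, q in zip(tree.levels, query_levels))


def select_tree(family, query_levels):
    """Return the smallest usable tree, or raise if none can answer."""
    usable = [t for t in family if can_answer(t, query_levels)]
    if not usable:
        raise ValueError("no tree in the family can answer this query")
    return min(usable, key=lambda t: t.size)


# Made-up family over two dimensions, e.g. (location, time).
family = [
    CandidateTree("base", (0, 0), 1000),
    CandidateTree("coarse-location", (1, 0), 200),
    CandidateTree("coarse-both", (1, 1), 40),
]

picked_a = select_tree(family, (1, 0))  # needs fine time, coarse location
picked_b = select_tree(family, (1, 1))  # coarse in both dimensions
picked_c = select_tree(family, (0, 1))  # needs fine location
```

With this family, a query that is coarse in both dimensions is served by the 40-leaf tree, while a query needing the finest location level falls back to the base tree because the coarser trees have already aggregated that detail away.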

The University of North Carolina at Greensboro