16 research outputs found
Diamond Dicing
In OLAP, analysts often select an interesting sample of the data. For
example, an analyst might focus on products bringing revenues of at least 100
000 dollars, or on shops having sales greater than 400 000 dollars. However,
current systems do not allow the application of both of these thresholds
simultaneously, selecting products and shops satisfying both thresholds. For
such purposes, we introduce the diamond cube operator, filling a gap among
existing data warehouse operations.
Because of the interaction between dimensions the computation of diamond
cubes is challenging. We compare and test various algorithms on large data sets
of more than 100 million facts. We find that while it is possible to implement
diamonds in SQL, it is inefficient. Indeed, our custom implementation can be a
hundred times faster than popular database engines (including a row-store and a
column-store).Comment: 29 page
Pruning Attributes From Data Cubes with Diamond Dicing
Data stored in a data warehouse are inherently multidimensional, but most data-pruning techniques (such as iceberg and top-k queries) are unidimensional. However, analysts need to issue multidimensional queries. For example, an analyst may need to select not just the most profitable stores or--separately--the most profitable products, but simultaneous sets of stores and products fulfilling some profitability constraints. To fill this need, we propose a new operator, the diamond dice. Because of the interaction between dimensions, the computation of diamonds is challenging. We present the first diamond-dicing experiments on large data sets. Experiments show that we can compute diamond cubes over fact tables containing 100 million facts in less than 35 minutes using a standard PC
Compressed bitmap indexes: beyond unions and intersections
Compressed bitmap indexes are used to speed up simple aggregate queries in databases. Indeed, set operations like intersections, unions and complements can be represented as logical operations (AND,OR,NOT) that are ideally suited for bitmaps. However, it is less obvious how to apply bitmaps to more advanced queries. For example, we might seek products in a store that meet some, but maybe not all, criteria. Such threshold queries generalize intersections and unions; they are often used in information-retrieval and data-mining applications. We introduce new algorithms that are sometimes three orders of magnitude faster than a naive approach. Our work shows that bitmap indexes are more broadly applicable than is commonly believed
Reordering Columns for Smaller Indexes
Column-oriented indexes-such as projection or bitmap indexes-are compressed
by run-length encoding to reduce storage and increase speed. Sorting the tables
improves compression. On realistic data sets, permuting the columns in the
right order before sorting can reduce the number of runs by a factor of two or
more. Unfortunately, determining the best column order is NP-hard. For many
cases, we prove that the number of runs in table columns is minimized if we
sort columns by increasing cardinality. Experimentally, sorting based on
Hilbert space-filling curves is poor at minimizing the number of runs.Comment: to appear in Information Science
Ontology based data warehousing for mining of heterogeneous and multidimensional data sources
Heterogeneous and multidimensional big-data sources are virtually prevalent in all business environments. System and data analysts are unable to fast-track and access big-data sources. A robust and versatile data warehousing system is developed, integrating domain ontologies from multidimensional data sources. For example, petroleum digital ecosystems and digital oil field solutions, derived from big-data petroleum (information) systems, are in increasing demand in multibillion dollar resource businesses worldwide. This work is recognized by Industrial Electronic Society of IEEE and appeared in more than 50 international conference proceedings and journals
IDEAS-1997-2021-Final-Programs
This document records the final program for each of the 26 meetings of the International Database and Engineering Application Symposium from 1997 through 2021. These meetings were organized in various locations on three continents. Most of the papers published during these years are in the digital libraries of IEEE(1997-2007) or ACM(2008-2021)