404 research outputs found
Partial-sum queries in OLAP data cubes using covering codes
A partial-sum query obtains the summation over a set of specified cells of a data cube. We establish a connection between the covering problem in the theory of error-correcting codes and the partial-sum problem and use this connection to devise algorithms for the partial-sum problem with efficient space-time trade-offs. For example, using our algorithms, with 44 percent additional storage, the query response time can be improved by about 12 percent; by roughly doubling the storage requirement, the query response time can be improved by about 34 percent
Caching in Multidimensional Databases
One utilisation of multidimensional databases is the field of On-line
Analytical Processing (OLAP). The applications in this area are designed to
make the analysis of shared multidimensional information fast [9]. On one hand,
speed can be achieved by specially devised data structures and algorithms. On
the other hand, the analytical process is cyclic. In other words, the user of
the OLAP application runs his or her queries one after the other. The output of
the last query may be there (at least partly) in one of the previous results.
Therefore caching also plays an important role in the operation of these
systems. However, caching itself may not be enough to ensure acceptable
performance. Size does matter: The more memory is available, the more we gain
by loading and keeping information in there. Oftentimes, the cache size is
fixed. This limits the performance of the multidimensional database, as well,
unless we compress the data in order to move a greater proportion of them into
the memory. Caching combined with proper compression methods promise further
performance improvements. In this paper, we investigate how caching influences
the speed of OLAP systems. Different physical representations (multidimensional
and table) are evaluated. For the thorough comparison, models are proposed. We
draw conclusions based on these models, and the conclusions are verified with
empirical data.Comment: 14 pages, 5 figures, 8 tables. Paper presented at the Fifth
Conference of PhD Students in Computer Science, Szeged, Hungary, 27 - 30 June
2006. For further details, please refer to
http://www.inf.u-szeged.hu/~szepkuti/papers.html#cachin
Data Cube Approximation and Mining using Probabilistic Modeling
On-line Analytical Processing (OLAP) techniques commonly used in data warehouses allow the exploration of data cubes according to different analysis axes (dimensions) and under different abstraction levels in a dimension hierarchy. However, such techniques are not aimed at mining multidimensional data.
Since data cubes are nothing but multi-way tables, we propose to analyze the potential of two probabilistic modeling techniques, namely non-negative multi-way array factorization and log-linear modeling, with the ultimate objective of compressing and mining aggregate and multidimensional values. With the first technique, we compute the set of components that best fit the initial data set and whose superposition coincides with the original data; with the second technique we identify a parsimonious model (i.e., one with a reduced set of parameters), highlight strong associations among dimensions and discover possible outliers in data cells. A real life example will be
used to (i) discuss the potential benefits of the modeling output on cube exploration and mining, (ii) show how OLAP queries can be answered in an approximate way, and (iii) illustrate the strengths and limitations of these modeling approaches
Using Fuzzy Linguistic Representations to Provide Explanatory Semantics for Data Warehouses
A data warehouse integrates large amounts of extracted and summarized data from multiple sources for direct querying and analysis. While it provides decision makers with easy access to such historical and aggregate data, the real meaning of the data has been ignored. For example, "whether a total sales amount 1,000 items indicates a good or bad sales performance" is still unclear. From the decision makers' point of view, the semantics rather than raw numbers which convey the meaning of the data is very important. In this paper, we explore the use of fuzzy technology to provide this semantics for the summarizations and aggregates developed in data warehousing systems. A three layered data warehouse semantic model, consisting of quantitative (numerical) summarization, qualitative (categorical) summarization, and quantifier summarization, is proposed for capturing and explicating the semantics of warehoused data. Based on the model, several algebraic operators are defined. We also extend the SQL language to allow for flexible queries against such enhanced data warehouses
- …