QB2OLAP : enabling OLAP on statistical linked open data
Publication and sharing of multidimensional (MD) data on the Semantic Web (SW) opens new opportunities for the use of On-Line Analytical Processing (OLAP). The RDF Data Cube (QB) vocabulary, the current standard for statistical data publishing, however, lacks key MD concepts such as dimension hierarchies and aggregate functions. QB4OLAP was proposed to remedy this. However, QB4OLAP requires extensive manual annotation and users must still write queries in SPARQL, the standard query language for RDF, which typical OLAP users are not familiar with. In this demo, we present QB2OLAP, a tool for enabling OLAP on existing QB data. Without requiring any RDF, QB(4OLAP), or SPARQL skills, it allows semi-automatic transformation of a QB data set into a QB4OLAP one via enrichment with QB4OLAP semantics, exploration of the enriched schema, and querying with the high-level OLAP language QL that exploits the QB4OLAP semantics and is automatically translated to SPARQL.
Caching in Multidimensional Databases
One utilisation of multidimensional databases is the field of On-line
Analytical Processing (OLAP). The applications in this area are designed to
make the analysis of shared multidimensional information fast [9]. On one hand,
speed can be achieved by specially devised data structures and algorithms. On
the other hand, the analytical process is cyclic: the user of the OLAP
application runs queries one after the other, and the result of the current
query may already be (at least partly) contained in a previous result.
Caching therefore also plays an important role in the operation of these
systems. However, caching by itself may not be enough to ensure acceptable
performance. Size matters: the more memory is available, the more we gain
by loading and keeping information in it. Often, however, the cache size is
fixed. This limits the performance of the multidimensional database as well,
unless we compress the data so that a greater proportion of it fits into
memory. Caching combined with suitable compression methods thus promises further
performance improvements. In this paper, we investigate how caching influences
the speed of OLAP systems. Different physical representations (multidimensional
and table) are evaluated. For the thorough comparison, models are proposed. We
draw conclusions based on these models, and the conclusions are verified with
empirical data.
Comment: 14 pages, 5 figures, 8 tables. Paper presented at the Fifth Conference of PhD Students in Computer Science, Szeged, Hungary, 27-30 June 2006. For further details, please refer to http://www.inf.u-szeged.hu/~szepkuti/papers.html#cachin
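The interplay of caching and compression described above can be illustrated with a toy cache: a fixed byte budget holds more query results if each entry is compressed before it is stored, and least-recently-used entries are evicted first. This is a minimal sketch under assumed names and sizes, not the data structures evaluated in the paper.

```python
# Toy fixed-budget result cache: values are pickled and zlib-compressed
# so that more query results fit into the same amount of memory.
# Illustrative sketch only; not the paper's system.
import pickle
import zlib
from collections import OrderedDict

class CompressedLRUCache:
    """LRU cache storing compressed values under a total byte budget."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.store = OrderedDict()  # key -> compressed bytes

    def get(self, key):
        if key not in self.store:
            return None                        # cache miss
        self.store.move_to_end(key)            # mark as most recently used
        return pickle.loads(zlib.decompress(self.store[key]))

    def put(self, key, value):
        blob = zlib.compress(pickle.dumps(value))
        if key in self.store:
            self.used -= len(self.store.pop(key))
        # Evict least-recently-used entries until the new blob fits.
        while self.store and self.used + len(blob) > self.capacity:
            _, evicted = self.store.popitem(last=False)
            self.used -= len(evicted)
        self.store[key] = blob
        self.used += len(blob)

cache = CompressedLRUCache(capacity_bytes=4096)
cache.put(("sales", "2006", "by_region"), [("North", 25), ("South", 7)])
print(cache.get(("sales", "2006", "by_region")))
```

Compression trades a little CPU on each hit for a larger effective cache, which is exactly the trade-off the abstract argues can pay off in OLAP workloads.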
On-line analytical processing
On-line analytical processing (OLAP) describes an approach to decision support that aims to extract knowledge from a data warehouse or, more specifically, from data marts. Its main idea is to let non-expert users navigate through the data, so that they can interactively generate ad hoc queries without the intervention of IT professionals. The name was introduced in contrast to on-line transaction processing (OLTP) to reflect the different requirements and characteristics of these two classes of use. The concept falls in the area of business intelligence.
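As a toy illustration of the kind of ad hoc query described above (with made-up data, not taken from any real data mart), a roll-up sums a measure of a small fact table along one chosen dimension:

```python
# Minimal OLAP-style roll-up over an in-memory fact table.
# The data and column layout are illustrative assumptions.
from collections import defaultdict

facts = [  # (region, product, amount)
    ("North", "bikes", 10),
    ("North", "cars", 15),
    ("South", "bikes", 7),
]

def rollup(rows, dim_index, measure_index):
    """Sum the measure grouped by one dimension (a 1-D roll-up)."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[dim_index]] += row[measure_index]
    return dict(totals)

print(rollup(facts, 0, 2))  # by region -> {'North': 25, 'South': 7}
```

Interactive OLAP tools generate many such group-by aggregations as the user drills down or rolls up, without the user writing any query code.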
Scalable Model-Based Management of Correlated Dimensional Time Series in ModelarDB+
To monitor critical infrastructure, high-quality sensors sampled at a high
frequency are increasingly used. However, as they produce huge amounts of data,
only simple aggregates are stored. This removes outliers and fluctuations that
could indicate problems. As a remedy, we present a model-based approach for
managing time series with dimensions that exploits correlation in and among
time series. Specifically, we propose compressing groups of correlated time
series using an extensible set of model types within a user-defined error bound
(possibly zero). We name this new category of model-based compression methods
for time series Multi-Model Group Compression (MMGC). We present the first MMGC
method, GOLEMM, and extend model types to compress time series groups. We propose
primitives for users to effectively define groups for differently sized data
sets, and based on these, an automated grouping method using only the time
series dimensions. We propose algorithms for executing simple and
multi-dimensional aggregate queries on models. Last, we implement our methods
in the Time Series Management System (TSMS) ModelarDB (ModelarDB+). Our
evaluation shows that compared to widely used formats, ModelarDB+ provides up
to 13.7 times faster ingestion due to high compression, 113 times better
compression due to the adaptivity of GOLEMM, 630 times faster aggregates by
using models, and close to linear scalability. It is also extensible and
supports online query processing.
Comment: 12 pages, 28 figures, and 1 table
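The core idea of representing values with models within a user-defined error bound can be sketched in a few lines. This is a generic piecewise-constant (PMC-style) sketch on assumed data, not GOLEMM itself, which additionally groups correlated series and chooses among multiple model types:

```python
# Greedy piecewise-constant model compression within an error bound:
# a run of values is replaced by one constant as long as every point
# stays within +/- error of that constant. Illustrative sketch only.

def compress_constant(values, error):
    """Emit (start, length, model) segments; model is the run's midpoint."""
    segments = []
    start = 0
    lo = hi = values[0]
    for i in range(1, len(values) + 1):
        if i < len(values):
            new_lo, new_hi = min(lo, values[i]), max(hi, values[i])
        if i == len(values) or new_hi - new_lo > 2 * error:
            segments.append((start, i - start, (lo + hi) / 2))
            if i < len(values):
                start, lo, hi = i, values[i], values[i]
        else:
            lo, hi = new_lo, new_hi
    return segments

def decompress(segments):
    """Reconstruct the (approximate) series from its segments."""
    out = []
    for _, length, model in segments:
        out.extend([model] * length)
    return out

series = [10.0, 10.2, 9.9, 10.1, 20.0, 20.3, 19.8]
segs = compress_constant(series, error=0.5)
approx = decompress(segs)
# Every reconstructed point is guaranteed to be within the error bound.
assert all(abs(a - b) <= 0.5 for a, b in zip(series, approx))
print(segs)
```

With `error=0.5` the seven values collapse into two segments, which hints at why model-based storage can yield the large compression and aggregation speedups reported: aggregates can be computed directly from the few segment models instead of the raw points.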