On the selection of secondary indices in relational databases
An important problem in the physical design of databases is the selection of secondary indices. In general, this problem cannot be solved optimally because of the complexity of the selection process. Heuristics such as the well-known ADD and DROP algorithms are often used instead. This paper shows that frequently used cost functions can be classified as super- or submodular functions. For these functions, several mathematical properties have been derived that reduce the complexity of the index selection problem. These properties are used to develop a tool for physical database design, and they also give a mathematical foundation for the success of the aforementioned ADD and DROP algorithms.
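The abstract names the ADD heuristic without describing it. As a rough illustration only, the sketch below shows one common greedy reading of ADD: start with no indices and keep adding the single index that lowers total cost the most, stopping when no addition helps. The function name `add_heuristic`, the cost model `toy_cost`, and all numbers are hypothetical and are not taken from the paper.

```python
from typing import Callable, FrozenSet, Iterable, Optional, Set


def add_heuristic(candidates: Iterable[str],
                  cost: Callable[[FrozenSet[str]], float]) -> Set[str]:
    """Greedy ADD sketch: grow the index set one index at a time."""
    chosen: Set[str] = set()
    remaining = set(candidates)
    current = cost(frozenset(chosen))
    while remaining:
        best: Optional[str] = None
        best_cost = current
        # Try each remaining index; sorted() keeps tie-breaking deterministic.
        for idx in sorted(remaining):
            c = cost(frozenset(chosen | {idx}))
            if c < best_cost:
                best, best_cost = idx, c
        if best is None:  # no single addition lowers the cost any further
            break
        chosen.add(best)
        remaining.remove(best)
        current = best_cost
    return chosen


def toy_cost(indices: FrozenSet[str]) -> float:
    """Hypothetical cost: base query cost, minus per-index savings,
    plus a fixed maintenance charge per index (all numbers invented)."""
    base = 100.0
    savings = {"a": 40.0, "b": 25.0, "c": 5.0}
    maintenance = 10.0
    return base - sum(savings[i] for i in indices) + maintenance * len(indices)


selected = add_heuristic(["a", "b", "c"], toy_cost)
print(sorted(selected), toy_cost(frozenset(selected)))
```

In this toy model, index `c` saves less than it costs to maintain, so the greedy loop stops before adding it. A DROP variant would run the same loop in reverse, starting from all candidates and removing one index at a time.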
Interactive Data Exploration with Smart Drill-Down
We present smart drill-down, an operator for interactively exploring a
relational table to discover and summarize "interesting" groups of tuples. Each
group of tuples is described by a rule. For instance, the rule tells us that there are a thousand tuples with value in the
first column and in the second column (and any value in the third column).
Smart drill-down presents an analyst with a list of rules that together
describe interesting aspects of the table. The analyst can tailor the
definition of interesting, and can interactively apply smart drill-down on an
existing rule to explore that part of the table. We demonstrate that the
underlying optimization problems are NP-hard, and describe an algorithm
for finding the approximately optimal list of rules to display when the user
uses a smart drill-down, and a dynamic sampling scheme for efficiently
interacting with large tables. Finally, we run experiments on real datasets
with our experimental prototype to demonstrate the usefulness of smart drill-down
and to study the performance of our algorithms.
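To make the notion of a rule concrete, here is a minimal sketch that assumes a rule fixes a value for some columns and leaves the others as wildcards, and that its count is the number of tuples it matches. The representation (a tuple with `None` as the wildcard), the helper names, and the table contents are all hypothetical. The sketch does not attempt the rule-selection optimization or the sampling scheme described in the abstract.

```python
from typing import Optional, Sequence, Tuple

# A rule has one entry per column; None stands for "any value".
Rule = Tuple[Optional[str], ...]
Row = Tuple[str, ...]


def matches(rule: Rule, row: Row) -> bool:
    """True if every non-wildcard entry of the rule equals the row's value."""
    return all(r is None or r == v for r, v in zip(rule, row))


def count(rule: Rule, table: Sequence[Row]) -> int:
    """Number of tuples in the table covered by the rule."""
    return sum(1 for row in table if matches(rule, row))


# Invented three-column table, used only for illustration.
table = [
    ("x", "y", "1"),
    ("x", "y", "2"),
    ("x", "z", "1"),
]

print(count(("x", "y", None), table))    # fixed first two columns, any third
print(count((None, None, None), table))  # the all-wildcard rule covers every row
```

Drilling down on a rule would then amount to restricting the table to the rows it matches and looking for more specific rules inside that subset.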
QB2OLAP: enabling OLAP on statistical linked open data
Publication and sharing of multidimensional (MD) data on the Semantic Web (SW) opens new opportunities for the use of On-Line Analytical Processing (OLAP). The RDF Data Cube (QB) vocabulary, the current standard for statistical data publishing, however, lacks key MD concepts such as dimension hierarchies and aggregate functions. QB4OLAP was proposed to remedy this. However, QB4OLAP requires extensive manual annotation and users must still write queries in SPARQL, the standard query language for RDF, which typical OLAP users are not familiar with. In this demo, we present QB2OLAP, a tool for enabling OLAP on existing QB data. Without requiring any RDF, QB(4OLAP), or SPARQL skills, it allows semi-automatic transformation of a QB data set into a QB4OLAP one via enrichment with QB4OLAP semantics, exploration of the enriched schema, and querying with the high-level OLAP language QL that exploits the QB4OLAP semantics and is automatically translated to SPARQL. Peer reviewed. Postprint (author's final draft)
Rapid Sampling for Visualizations with Ordering Guarantees
Visualizations are frequently used as a means to understand trends and gather
insights from datasets, but often take a long time to generate. In this paper,
we focus on the problem of rapidly generating approximate visualizations while
preserving crucial visual properties of interest to analysts. Our primary
focus will be on sampling algorithms that preserve the visual property of
ordering; our techniques will also apply to some other visual properties. For
instance, our algorithms can be used to generate an approximate visualization
of a bar chart very rapidly, where the comparisons between any two bars are
correct. We formally show that our sampling algorithms are generally applicable
and provably optimal in theory, in that they do not take more samples than
necessary to generate the visualizations with ordering guarantees. They also
work well in practice, correctly ordering output groups while taking orders of
magnitude fewer samples and much less time than conventional sampling schemes. Comment: Tech Report. 17 pages. Condensed version to appear in VLDB Vol. 8 No.
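The abstract does not spell out the sampling algorithm. As a hedged illustration of the general idea only, the sketch below samples every group in rounds, keeps a confidence interval around each group's running mean, and stops sampling a group once its interval no longer overlaps any other group's. It assumes values lie in [0, 1] and uses a Hoeffding bound with a union bound over groups and sample counts. The names `half_width` and `order_groups` and the demo data are hypothetical, and this is not claimed to be the authors' algorithm or to match its optimality guarantees.

```python
import math
import random
from typing import Callable, Dict, List


def half_width(n: int, k: int, delta: float) -> float:
    """Hoeffding half-width for values in [0, 1], with a union bound over
    k groups and all sample counts n = 1, 2, ... (total failure <= delta)."""
    return math.sqrt(math.log(math.pi ** 2 * k * n ** 2 / (3 * delta)) / (2 * n))


def order_groups(samplers: Dict[str, Callable[[], float]],
                 delta: float = 0.05,
                 max_rounds: int = 100_000) -> List[str]:
    """Return group names sorted by estimated mean, sampling each group only
    until its interval is disjoint from every other group's interval."""
    k = len(samplers)
    sums = {g: 0.0 for g in samplers}
    counts = {g: 0 for g in samplers}
    active = sorted(samplers)  # fixed order keeps seeded runs reproducible
    for _ in range(max_rounds):
        if not active:
            break
        for g in active:  # one more sample for each unresolved group
            sums[g] += samplers[g]()
            counts[g] += 1
        lo = {g: sums[g] / counts[g] - half_width(counts[g], k, delta)
              for g in samplers}
        hi = {g: sums[g] / counts[g] + half_width(counts[g], k, delta)
              for g in samplers}
        # A group stays active while its interval overlaps some other interval.
        active = [g for g in active
                  if any(h != g and lo[g] <= hi[h] and lo[h] <= hi[g]
                         for h in samplers)]
    # If max_rounds is exhausted, this is a best-effort ordering.
    return sorted(samplers, key=lambda g: sums[g] / counts[g])


rng = random.Random(0)


def bernoulli(p: float) -> Callable[[], float]:
    """Sampler for a hypothetical group whose true mean is p."""
    return lambda: 1.0 if rng.random() < p else 0.0


groups = {"low": bernoulli(0.2), "mid": bernoulli(0.5), "high": bernoulli(0.8)}
print(order_groups(groups))
```

Groups whose means are far apart are resolved after few samples, while close groups keep being sampled. That is where the savings over sampling every group equally would come from in this sketch.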
Historical forest biomass dynamics modelled with Landsat spectral trajectories
Acknowledgements: National Forest Inventory data are available online, provided by Ministerio de Agricultura, Alimentación y Medio Ambiente (España). Landsat images are available online, provided by the USGS. Peer reviewed. Postprint