The Bag Semantics of Ontology-Based Data Access
Ontology-based data access (OBDA) is a popular approach for integrating and
querying multiple data sources by means of a shared ontology. The ontology is
linked to the sources using mappings, which assign views over the data to
ontology predicates. Motivated by the need for OBDA systems supporting
database-style aggregate queries, we propose a bag semantics for OBDA, where
duplicate tuples in the views defined by the mappings are retained, as is the
case in standard databases. We show that bag semantics makes conjunctive query
answering in OBDA coNP-hard in data complexity. To regain tractability, we
consider a rather general class of queries and show its rewritability to a
generalisation of the relational calculus to bags.
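To illustrate why retaining duplicates matters for aggregate queries, here is a minimal sketch in Python. The table, its column names, and the projection "mapping" are hypothetical and are not taken from the paper; the sketch only contrasts a view that keeps multiplicities with one that discards them, and does not model the ontology or the paper's formal semantics.

```python
from collections import Counter

# Hypothetical source table of (employee, department) rows.
rows = [("ann", "sales"), ("bob", "sales"), ("eve", "it")]

# A hypothetical mapping that projects each row onto its department.
# Under bag semantics the view keeps one copy per source row.
bag_view = Counter(dept for _, dept in rows)

# Under set semantics the duplicates produced by the projection collapse.
set_view = set(bag_view)

# A COUNT-style aggregate over the view gives different answers.
count_sales_bag = bag_view["sales"]                 # multiplicity in the bag
count_sales_set = 1 if "sales" in set_view else 0   # at most one copy in a set

print(count_sales_bag, count_sales_set)
```

With the bag view, the count for "sales" reflects the two underlying rows, whereas the set view can only report that "sales" occurs.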
Certain Answers of Extensions of Conjunctive Queries by Datalog and First-Order Rewriting
Get the Most out of Your Sample: Optimal Unbiased Estimators using Partial Information
Random sampling is an essential tool in the processing and transmission of
data. It is used to summarize data too large to store or manipulate and to meet
resource constraints on bandwidth or battery power. Estimators that are applied
to the sample facilitate fast approximate processing of queries posed over the
original data and the value of the sample hinges on the quality of these
estimators.
Our work targets data sets such as request and traffic logs and sensor
measurements, where data is repeatedly collected over multiple instances:
time periods, locations, or snapshots.
We are interested in queries that span multiple instances, such as distinct
counts and distance measures over selected records. These queries are used for
applications ranging from planning to anomaly and change detection.
Unbiased low-variance estimators are particularly effective as the relative
error decreases with the number of selected record keys.
The Horvitz-Thompson estimator, known to minimize variance for sampling with
"all or nothing" outcomes (which reveal either the exact value or no information
about the estimated quantity), is not optimal for multi-instance operations, for
which an outcome may provide partial information.
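For reference, the standard Horvitz-Thompson estimator for a sum can be sketched as follows. This is only a minimal illustration of the "all or nothing" single-instance case, assuming independent (Poisson) sampling with known inclusion probabilities; the data, the probabilities, and the function names are hypothetical, and the sketch does not implement the multi-instance estimators the abstract proposes.

```python
import random

def poisson_sample(values, inclusion_prob, rng):
    """Include each key independently with its own inclusion probability."""
    return [(k, v) for k, v in values.items() if rng.random() < inclusion_prob[k]]

def horvitz_thompson_sum(sample, inclusion_prob):
    """Estimate the total by inverse-probability weighting each sampled value.

    A sampled key contributes value / p; an unsampled key contributes 0,
    so each key's expected contribution equals its true value.
    """
    return sum(v / inclusion_prob[k] for k, v in sample)

# Hypothetical data: true total is 20.0.
values = {"a": 10.0, "b": 4.0, "c": 6.0}
probs = {"a": 0.5, "b": 0.25, "c": 0.5}

# Averaging over many independent samples should approach the true total,
# which is what unbiasedness means here.
rng = random.Random(0)
trials = 20000
mean_estimate = sum(
    horvitz_thompson_sum(poisson_sample(values, probs, rng), probs)
    for _ in range(trials)
) / trials
print(mean_estimate)
```

Any single estimate can be far from the true total; it is the average over repeated samples that converges to it.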
We present a general principled methodology for the derivation of (Pareto)
optimal unbiased estimators over sampled instances and aim to understand its
potential. We demonstrate significant improvement in estimate accuracy of
fundamental queries for common sampling schemes.
Comment: This is a full version of a PODS 2011 pape