77,872 research outputs found
PF-OLA: A High-Performance Framework for Parallel On-Line Aggregation
Online aggregation provides estimates to the final result of a computation
during the actual processing. The user can stop the computation as soon as the
estimate is accurate enough, typically early in the execution. This allows for
the interactive data exploration of the largest datasets. In this paper we
introduce the first framework for parallel online aggregation in which the
estimation virtually does not incur any overhead on top of the actual
execution. We define a generic interface to express any estimation model that
abstracts completely the execution details. We design a novel estimator
specifically targeted at parallel online aggregation. When executed by the
framework over a massive TPC-H instance, the estimator provides
accurate confidence bounds early in the execution even when the cardinality of
the final result is seven orders of magnitude smaller than the dataset size and
without incurring overhead.Comment: 36 page
XML Document Adaptation Queries (XDAQ)
Adaptive web applications combine data retrieval on the web with reasoning so as to generate context dependent contents. The data is retrieved either as content or as context specifications. Content data is, for example, fragments of a textbook or e-commerce catalogue, whereas context data is, for example, a user model or a device profile. Current adaptive web applications are often implemented using ad hoc and heterogeneous techniques. This paper describes a novel approach called ”XML Document Adaptation Queries (XDAQ)” requiring less heterogeneous software components. The approach is based on using a web query language for data retrieval (content as well as context) and on a novel generic formalism to express adaptation. The approach is generic in the sense that it is applicable with all web query and transformation languages, for example with XQuery and XSLT
On Reasoning with RDF Statements about Statements using Singleton Property Triples
The Singleton Property (SP) approach has been proposed for representing and
querying metadata about RDF triples such as provenance, time, location, and
evidence. In this approach, one singleton property is created to uniquely
represent a relationship in a particular context, and in general, generates a
large property hierarchy in the schema. It has become the subject of important
questions from Semantic Web practitioners. Can an existing reasoner recognize
the singleton property triples? And how? If the singleton property triples
describe a data triple, then how can a reasoner infer this data triple from the
singleton property triples? Or would the large property hierarchy affect the
reasoners in some way? We address these questions in this paper and present our
study about the reasoning aspects of the singleton properties. We propose a
simple mechanism to enable existing reasoners to recognize the singleton
property triples, as well as to infer the data triples described by the
singleton property triples. We evaluate the effect of the singleton property
triples in the reasoning processes by comparing the performance on RDF datasets
with and without singleton properties. Our evaluation uses as benchmark the
LUBM datasets and the LUBM-SP datasets derived from LUBM with temporal
information added through singleton properties
Graphing of E-Science Data with varying user requirements
Based on our experience in the Swiss Experiment, exploring experimental, scientific data is often done in a visual way. Starting from a global overview the users are zooming in on interesting events. In case of huge data volumes special data structures have to be introduced to provide fast and easy access to the data. Since it is hard to predict on how users will work with the data a generic approach requires self-adaptation of the required special data structures. In this paper we describe the underlying NP-hard problem and present several approaches to address the problem with varying properties. The approaches are illustrated with a small example and are evaluated with a synthetic data set and user queries
- …