421 research outputs found

    The Odyssey Approach for Optimizing Federated SPARQL Queries

    Full text link
    Answering queries over a federation of SPARQL endpoints requires combining data from more than one data source. Optimizing queries in such scenarios is particularly challenging not only because of (i) the large variety of possible query execution plans that correctly answer the query but also because (ii) there is only limited access to statistics about schema and instance data of remote sources. To overcome these challenges, most federated query engines rely on heuristics to reduce the space of possible query execution plans or on dynamic programming strategies to produce optimal plans. Nevertheless, these plans may still exhibit a high number of intermediate results or high execution times because of heuristics and inaccurate cost estimations. In this paper, we present Odyssey, an approach that uses statistics that allow for a more accurate cost estimation for federated queries and therefore enables Odyssey to produce better query execution plans. Our experimental results show that Odyssey produces query execution plans that are better in terms of data transfer and execution time than state-of-the-art optimizers. Our experiments using the FedBench benchmark show execution time gains of at least 25 times on average.Comment: 16 pages, 10 figure

    Knowledge-infused and Consistent Complex Event Processing over Real-time and Persistent Streams

    Full text link
    Emerging applications in Internet of Things (IoT) and Cyber-Physical Systems (CPS) present novel challenges to Big Data platforms for performing online analytics. Ubiquitous sensors from IoT deployments are able to generate data streams at high velocity, that include information from a variety of domains, and accumulate to large volumes on disk. Complex Event Processing (CEP) is recognized as an important real-time computing paradigm for analyzing continuous data streams. However, existing work on CEP is largely limited to relational query processing, exposing two distinctive gaps for query specification and execution: (1) infusing the relational query model with higher level knowledge semantics, and (2) seamless query evaluation across temporal spaces that span past, present and future events. These allow accessible analytics over data streams having properties from different disciplines, and help span the velocity (real-time) and volume (persistent) dimensions. In this article, we introduce a Knowledge-infused CEP (X-CEP) framework that provides domain-aware knowledge query constructs along with temporal operators that allow end-to-end queries to span across real-time and persistent streams. We translate this query model to efficient query execution over online and offline data streams, proposing several optimizations to mitigate the overheads introduced by evaluating semantic predicates and in accessing high-volume historic data streams. The proposed X-CEP query model and execution approaches are implemented in our prototype semantic CEP engine, SCEPter. We validate our query model using domain-aware CEP queries from a real-world Smart Power Grid application, and experimentally analyze the benefits of our optimizations for executing these queries, using event streams from a campus-microgrid IoT deployment.Comment: 34 pages, 16 figures, accepted in Future Generation Computer Systems, October 27, 201

    Manycore processing of repeated range queries over massive moving objects observations

    Full text link
    The ability to timely process significant amounts of continuously updated spatial data is mandatory for an increasing number of applications. Parallelism enables such applications to face this data-intensive challenge and allows the devised systems to feature low latency and high scalability. In this paper we focus on a specific data-intensive problem, concerning the repeated processing of huge amounts of range queries over massive sets of moving objects, where the spatial extents of queries and objects are continuously modified over time. To tackle this problem and significantly accelerate query processing we devise a hybrid CPU/GPU pipeline that compresses data output and save query processing work. The devised system relies on an ad-hoc spatial index leading to a problem decomposition that results in a set of independent data-parallel tasks. The index is based on a point-region quadtree space decomposition and allows to tackle effectively a broad range of spatial object distributions, even those very skewed. Also, to deal with the architectural peculiarities and limitations of the GPUs, we adopt non-trivial GPU data structures that avoid the need of locked memory accesses and favour coalesced memory accesses, thus enhancing the overall memory throughput. To the best of our knowledge this is the first work that exploits GPUs to efficiently solve repeated range queries over massive sets of continuously moving objects, characterized by highly skewed spatial distributions. In comparison with state-of-the-art CPU-based implementations, our method highlights significant speedups in the order of 14x-20x, depending on the datasets, even when considering very cheap GPUs

    Query Modification in Object-oriented Database Federation

    Get PDF
    We discuss the modification of queries against an integrated view in a federation of object-oriented databases. We present a generalisation of existing algorithms for simple global query processing that works for arbitrarily defined integration classes. We then extend this algorithm to deal with object-oriented features such as queries involving path expressions and nesting. We show how properties of the OO-style of modelling relationships through object references can be exploited to reduce the number of subqueries necessary to evaluate such querie

    Energy-efficient query management scheme for a wireless sensor database system

    Get PDF
    Minimizing the communication overhead to reduce the energy consumption is an essential consideration in sensor network applications, and existing research has mostly concentrated on data aggregation and in-network processing. However, effective query management to optimize the query aggregation plan at the gateway side is also a significant approach to energy saving in practice. In this paper, we present a multiquery management framework to support historical and continuous queries, where the key idea is to reduce common tasks in a collection of queries through merging and aggregation, according to query region, attribute, time duration, and frequency, by executing the common subqueries only once. In this framework, we propose a query management scheme to support query partitioning, region aggregation and approximate processing, time partitioning and aggregation rules, multirate queries, and historical database. In order to validate the performance of our algorithm, a heuristic routing protocol is also described. The performance simulation results show that the overall energy consumption for forwarding and answering a collection of queries can be significantly reduced by applying our query management scheme. The advantages and disadvantages of the proposed scheme are discussed, together with open research issues

    Partout: A Distributed Engine for Efficient RDF Processing

    Full text link
    The increasing interest in Semantic Web technologies has led not only to a rapid growth of semantic data on the Web but also to an increasing number of backend applications with already more than a trillion triples in some cases. Confronted with such huge amounts of data and the future growth, existing state-of-the-art systems for storing RDF and processing SPARQL queries are no longer sufficient. In this paper, we introduce Partout, a distributed engine for efficient RDF processing in a cluster of machines. We propose an effective approach for fragmenting RDF data sets based on a query log, allocating the fragments to nodes in a cluster, and finding the optimal configuration. Partout can efficiently handle updates and its query optimizer produces efficient query execution plans for ad-hoc SPARQL queries. Our experiments show the superiority of our approach to state-of-the-art approaches for partitioning and distributed SPARQL query processing
    corecore