204 research outputs found

    The Odyssey Approach for Optimizing Federated SPARQL Queries

    Full text link
    Answering queries over a federation of SPARQL endpoints requires combining data from more than one data source. Optimizing queries in such scenarios is particularly challenging not only because of (i) the large variety of possible query execution plans that correctly answer the query but also because (ii) there is only limited access to statistics about schema and instance data of remote sources. To overcome these challenges, most federated query engines rely on heuristics to reduce the space of possible query execution plans or on dynamic programming strategies to produce optimal plans. Nevertheless, these plans may still exhibit a high number of intermediate results or high execution times because of heuristics and inaccurate cost estimations. In this paper, we present Odyssey, an approach that uses statistics that allow for a more accurate cost estimation for federated queries and therefore enables Odyssey to produce better query execution plans. Our experimental results show that Odyssey produces query execution plans that are better in terms of data transfer and execution time than state-of-the-art optimizers. Our experiments using the FedBench benchmark show execution time gains of at least 25 times on average.Comment: 16 pages, 10 figure

    Efficient Query Processing for SPARQL Federations with Replicated Fragments

    Get PDF
    Low reliability and availability of public SPARQL endpoints prevent real-world applications from exploiting all the potential of these querying infras-tructures. Fragmenting data on servers can improve data availability but degrades performance. Replicating fragments can offer new tradeoff between performance and availability. We propose FEDRA, a framework for querying Linked Data that takes advantage of client-side data replication, and performs a source selection algorithm that aims to reduce the number of selected public SPARQL endpoints, execution time, and intermediate results. FEDRA has been implemented on the state-of-the-art query engines ANAPSID and FedX, and empirically evaluated on a variety of real-world datasets

    Hypermedia-based discovery for source selection using low-cost linked data interfaces

    Get PDF
    Evaluating federated Linked Data queries requires consulting multiple sources on the Web. Before a client can execute queries, it must discover data sources, and determine which ones are relevant. Federated query execution research focuses on the actual execution, while data source discovery is often marginally discussed-even though it has a strong impact on selecting sources that contribute to the query results. Therefore, the authors introduce a discovery approach for Linked Data interfaces based on hypermedia links and controls, and apply it to federated query execution with Triple Pattern Fragments. In addition, the authors identify quantitative metrics to evaluate this discovery approach. This article describes generic evaluation measures and results for their concrete approach. With low-cost data summaries as seed, interfaces to eight large real-world datasets can discover each other within 7 minutes. Hypermedia-based client-side querying shows a promising gain of up to 50% in execution time, but demands algorithms that visit a higher number of interfaces to improve result completeness

    KGRAM Versatile Inference and Query Engine for the Web of Linked Data

    Get PDF
    International audienceQuerying and linking distributed and heterogeneous databases is increasingly needed, as plentiful data resources are published over the Web. This work describes the design of a versatile query system named KGRAM that supports (i) multiple query languages among which the SPARQL 1.1 standard, (ii) federation of multiple heterogeneous and distributed data sources, and (iii) adaptability to various data manipulation use cases. KGRAM provides abstractions for both the query language and the data model, thus delivering unifying reasoning mechanisms. It is implemented as a modular software suite to ease architecting and deploying dedicated data manipulation platforms. Its design integrates optimization concerns to deliver high query performance. Both KGRAM's software versatility and performance are evaluated

    Distributed Join Approaches for W3C-Conform SPARQL Endpoints

    Get PDF
    Currently many SPARQL endpoints are freely available and accessible without any costs to users: Everyone can submit SPARQL queries to SPARQL endpoints via a standardized protocol, where the queries are processed on the datasets of the SPARQL endpoints and the query results are sent back to the user in a standardized format. As these distributed execution environments for semantic big data (as intersection of semantic data and big data) are freely accessible, the Semantic Web is an ideal playground for big data research. However, when utilizing these distributed execution environments, questions about the performance arise. Especially when several datasets (locally and those residing in SPARQL endpoints) need to be combined, distributed joins need to be computed. In this work we give an overview of the various possibilities of distributed join processing in SPARQL endpoints, which follow the SPARQL specification and hence are "W3C conform". We also introduce new distributed join approaches as variants of the Bitvector-Join and combination of the Semi- and Bitvector-Join. Finally we compare all the existing and newly proposed distributed join approaches for W3C conform SPARQL endpoints in an extensive experimental evaluation

    Federating Queries to RDF repositories

    Get PDF
    Currently large amounts of RDF data are being published in the Web. These data is commonly accessed by means of SPARQL endpoints. However to query a set of SPARQL endpoints new mechanisms are needed due to neither the SPARQL protocol nor the language provide any norms or guidelines about how to proceed. In this paper we present an approach for federating queries to a set of SPARQL endpoints, using relational database distributed query processing techniques and part of the WS-DAI specification for web-service based access to relational and XML databases

    Federating queries in SPARQL 1.1: syntax, semantics and evaluation

    Full text link
    Given the sustained growth that we are experiencing in the number of SPARQL endpoints available, the need to be able to send federated SPARQL queries across these has also grown. To address this use case, the W3C SPARQL working group is defining a federation extension for SPARQL 1.1 which allows for combining graph patterns that can be evaluated over several endpoints within a single query. In this paper, we describe the syntax of that extension and formalize its semantics. Additionally, we describe how a query evaluation system can be implemented for that federation extension, describing some static optimization techniques and reusing a query engine used for data-intensive science, so as to deal with large amounts of intermediate and final results. Finally we carry out a series of experiments that show that our optimizations speed up the federated query evaluation process
    • …
    corecore