6,685 research outputs found
Co-evolution of RDF Datasets
Linking Data initiatives have fostered the publication of large number of RDF
datasets in the Linked Open Data (LOD) cloud, as well as the development of
query processing infrastructures to access these data in a federated fashion.
However, different experimental studies have shown that availability of LOD
datasets cannot be always ensured, being RDF data replication required for
envisioning reliable federated query frameworks. Albeit enhancing data
availability, RDF data replication requires synchronization and conflict
resolution when replicas and source datasets are allowed to change data over
time, i.e., co-evolution management needs to be provided to ensure consistency.
In this paper, we tackle the problem of RDF data co-evolution and devise an
approach for conflict resolution during co-evolution of RDF datasets. Our
proposed approach is property-oriented and allows for exploiting semantics
about RDF properties during co-evolution management. The quality of our
approach is empirically evaluated in different scenarios on the DBpedia-live
dataset. Experimental results suggest that proposed proposed techniques have a
positive impact on the quality of data in source datasets and replicas.Comment: 18 pages, 4 figures, Accepted in ICWE, 201
Tracking Federated Queries in the Linked Data
Federated query engines allow data consumers to execute queries over the
federation of Linked Data (LD). However, as federated queries are decomposed
into potentially thousands of subqueries distributed among SPARQL endpoints,
data providers do not know federated queries, they only know subqueries they
process. Consequently, unlike warehousing approaches, LD data providers have no
access to secondary data. In this paper, we propose FETA (FEderated query
TrAcking), a query tracking algorithm that infers Basic Graph Patterns (BGPs)
processed by a federation from a shared log maintained by data providers.
Concurrent execution of thousand subqueries generated by multiple federated
query engines makes the query tracking process challenging and uncertain.
Experiments with Anapsid show that FETA is able to extract BGPs which, even in
a worst case scenario, contain BGPs of original queries
Hypermedia-based discovery for source selection using low-cost linked data interfaces
Evaluating federated Linked Data queries requires consulting multiple sources on the Web. Before a client can execute queries, it must discover data sources, and determine which ones are relevant. Federated query execution research focuses on the actual execution, while data source discovery is often marginally discussed-even though it has a strong impact on selecting sources that contribute to the query results. Therefore, the authors introduce a discovery approach for Linked Data interfaces based on hypermedia links and controls, and apply it to federated query execution with Triple Pattern Fragments. In addition, the authors identify quantitative metrics to evaluate this discovery approach. This article describes generic evaluation measures and results for their concrete approach. With low-cost data summaries as seed, interfaces to eight large real-world datasets can discover each other within 7 minutes. Hypermedia-based client-side querying shows a promising gain of up to 50% in execution time, but demands algorithms that visit a higher number of interfaces to improve result completeness
The Odyssey Approach for Optimizing Federated SPARQL Queries
Answering queries over a federation of SPARQL endpoints requires combining
data from more than one data source. Optimizing queries in such scenarios is
particularly challenging not only because of (i) the large variety of possible
query execution plans that correctly answer the query but also because (ii)
there is only limited access to statistics about schema and instance data of
remote sources. To overcome these challenges, most federated query engines rely
on heuristics to reduce the space of possible query execution plans or on
dynamic programming strategies to produce optimal plans. Nevertheless, these
plans may still exhibit a high number of intermediate results or high execution
times because of heuristics and inaccurate cost estimations. In this paper, we
present Odyssey, an approach that uses statistics that allow for a more
accurate cost estimation for federated queries and therefore enables Odyssey to
produce better query execution plans. Our experimental results show that
Odyssey produces query execution plans that are better in terms of data
transfer and execution time than state-of-the-art optimizers. Our experiments
using the FedBench benchmark show execution time gains of at least 25 times on
average.Comment: 16 pages, 10 figure
Efficient Query Processing for SPARQL Federations with Replicated Fragments
Low reliability and availability of public SPARQL endpoints prevent
real-world applications from exploiting all the potential of these querying
infras-tructures. Fragmenting data on servers can improve data availability but
degrades performance. Replicating fragments can offer new tradeoff between
performance and availability. We propose FEDRA, a framework for querying Linked
Data that takes advantage of client-side data replication, and performs a
source selection algorithm that aims to reduce the number of selected public
SPARQL endpoints, execution time, and intermediate results. FEDRA has been
implemented on the state-of-the-art query engines ANAPSID and FedX, and
empirically evaluated on a variety of real-world datasets
- …