Distributed Subweb Specifications for Traversing the Web
Link Traversal-based Query Processing (LTQP), in which a SPARQL query is
evaluated over a web of documents rather than a single dataset, is often seen
as a theoretically interesting yet impractical technique. However, in a time
where the hypercentralization of data has increasingly come under scrutiny, a
decentralized Web of Data with a simple document-based interface is appealing,
as it enables data publishers to control their data and access rights. While
LTQP allows evaluating complex queries over such webs, it suffers from
performance issues (due to the high number of documents containing data) as
well as information quality concerns (due to the many sources providing such
documents). In existing LTQP approaches, the burden of finding sources to query
falls entirely on the data consumer. In this paper, we argue that to
solve these issues, data publishers should also be able to suggest sources of
interest and guide the data consumer towards relevant and trustworthy data. We
introduce a theoretical framework that enables such guided link traversal and
study its properties. We illustrate with a theoretical example that this can
improve query results and reduce the number of network requests. We evaluate
our proposal experimentally on a virtual linked web with specifications and
indeed observe that not just the data quality but also the efficiency of
querying improves.
Under consideration in Theory and Practice of Logic Programming (TPLP).
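The effect described in the abstract above can be illustrated with a small sketch. This is not the paper's formal framework: the in-memory `WEB` dictionary, the `recommended` field (standing in for a publisher-supplied subweb specification), the `example.org` URIs, and the `traverse` function are all hypothetical names chosen for illustration. The sketch only shows how following publisher-suggested links, rather than every link, can lower the number of requests and keep untrusted data out of the result.

```python
from collections import deque

# Hypothetical in-memory "web": URI -> document. Each document carries its
# triples, all outgoing links, and the subset of links its publisher
# recommends following (a stand-in for a subweb specification).
WEB = {
    "https://example.org/alice": {
        "triples": [("alice", "knows", "bob")],
        "links": ["https://example.org/bob", "https://example.org/spam"],
        "recommended": ["https://example.org/bob"],
    },
    "https://example.org/bob": {
        "triples": [("bob", "knows", "carol")],
        "links": ["https://example.org/carol"],
        "recommended": ["https://example.org/carol"],
    },
    "https://example.org/carol": {
        "triples": [("carol", "name", "Carol")],
        "links": [],
        "recommended": [],
    },
    "https://example.org/spam": {
        "triples": [("alice", "knows", "mallory")],
        "links": ["https://example.org/spam2"],
        "recommended": [],
    },
    "https://example.org/spam2": {
        "triples": [("bob", "knows", "mallory")],
        "links": [],
        "recommended": [],
    },
}


def traverse(seed, guided):
    """Breadth-first link traversal from `seed`.

    Returns (collected triples, number of simulated requests).
    With guided=True only publisher-recommended links are followed.
    """
    seen = {seed}
    queue = deque([seed])
    triples = []
    requests = 0
    while queue:
        uri = queue.popleft()
        requests += 1  # one simulated HTTP request per dereferenced URI
        doc = WEB.get(uri)
        if doc is None:
            continue  # dead link: the request was made, nothing was gained
        triples.extend(doc["triples"])
        next_links = doc["recommended"] if guided else doc["links"]
        for link in next_links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return triples, requests


if __name__ == "__main__":
    for guided in (False, True):
        found, count = traverse("https://example.org/alice", guided)
        print("guided" if guided else "unguided", count, "requests", found)
```

In this toy web, the unguided run dereferences every reachable document, including the two untrusted ones, while the guided run stays within the documents the publishers vouch for.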
How Many and What Types of SPARQL Queries can be Answered through Zero-Knowledge Link Traversal?
The current de facto way to query the Web of Data is through the SPARQL
protocol, where a client sends queries to a server through a SPARQL endpoint.
Contrary to an HTTP server, providing and maintaining a robust and reliable
endpoint requires a significant effort that not all publishers are willing or
able to make. An alternative query evaluation method is through link traversal,
where a query is answered by dereferencing online web resources (URIs) in real
time. While several approaches for such a lookup-based query evaluation method
have been proposed, there exists no analysis of the types (patterns) of queries
that can be directly answered on the live Web, without accessing local or
remote endpoints and without a priori knowledge of available data sources. In
this paper, we first provide a method for checking if a SPARQL query (to be
evaluated on a SPARQL endpoint) can be answered through zero-knowledge link
traversal (without accessing the endpoint), and analyse a large corpus of real
SPARQL query logs for finding the frequency and distribution of answerable and
non-answerable query patterns. Subsequently, we provide an algorithm for
transforming answerable queries to SPARQL-LD queries that bypass the endpoints.
We report experimental results about the efficiency of the transformed queries
and discuss the benefits and the limitations of this query evaluation method.
Comment: Preprint of paper accepted for publication in the 34th ACM/SIGAPP
Symposium On Applied Computing (SAC 2019).
Hypermedia-based discovery for source selection using low-cost linked data interfaces
Evaluating federated Linked Data queries requires consulting multiple sources on the Web. Before a client can execute queries, it must discover data sources and determine which ones are relevant. Federated query execution research focuses on the actual execution, while data source discovery is often only marginally discussed, even though it has a strong impact on selecting sources that contribute to the query results. Therefore, the authors introduce a discovery approach for Linked Data interfaces based on hypermedia links and controls, and apply it to federated query execution with Triple Pattern Fragments. In addition, the authors identify quantitative metrics to evaluate this discovery approach. This article describes generic evaluation measures and results for their concrete approach. With low-cost data summaries as seed, interfaces to eight large real-world datasets can discover each other within 7 minutes. Hypermedia-based client-side querying shows a promising gain of up to 50% in execution time, but demands algorithms that visit a higher number of interfaces to improve result completeness.