532 research outputs found

An Analytical Study of Large SPARQL Query Logs

Author: Bonifati Angela
Martens Wim
Timm Thomas
Publication venue
Publication date: 01/08/2017
Field of study

With the adoption of RDF as the data model for Linked Data and the Semantic Web, query specification from end-users has become more and more common in SPARQL endpoints. In this paper, we conduct an in-depth analytical study of the queries formulated by end-users and harvested from large and up-to-date query logs from a wide variety of RDF data sources. As opposed to previous studies, ours is the first assessment on a voluminous query corpus, spanning over several years and covering many representative SPARQL endpoints. Apart from the syntactical structure of the queries, that exhibits already interesting results on this generalized corpus, we drill deeper in the structural characteristics related to the graph- and hypergraph representation of queries. We outline the most common shapes of queries when visually displayed as pseudographs, and characterize their (hyper-)tree width. Moreover, we analyze the evolution of queries over time, by introducing the novel concept of a streak, i.e., a sequence of queries that appear as subsequent modifications of a seed query. Our study offers several fresh insights on the already rich query features of real SPARQL queries formulated by real users, and brings us to draw a number of conclusions and pinpoint future directions for SPARQL query evaluation, query optimization, tuning, and benchmarking.

arXiv.org e-Print Archive

HAL

Hal-Diderot

An Analytical Study of Large SPARQL Query Logs

Author: Bonifati Angela
Martens Wim
Timm Thomas
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/06/2020
Field of study

With the adoption of RDF as the data model for Linked Data and the Semantic Web, query specification from end-users has become more and more common in SPARQL endpoints. In this paper, we conduct an in-depth analytical study of the queries formulated by end-users and harvested from large and up-to-date structured query logs from a wide variety of RDF data sources. As opposed to previous studies, ours is the first assessment on a voluminous query corpus, spanning over several years and covering many representative SPARQL endpoints. Apart from the syntactical structure of the queries, that exhibits already interesting results on this generalized corpus, we drill deeper in the structural characteristics related to the graph and hypergraph representation of queries. We outline the most common shapes of queries when visually displayed as undirected graphs, characterize their tree width, length of their cycles, maximal degree of nodes, and more. For queries that cannot be adequately represented as graphs, we investigate their hypergraphs and hypertree width. Moreover, we analyze the evolution of queries over time, by introducing the novel concept of a streak, i.e., a sequence of queries that appear as subsequent modifications of

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

HAL

Hal-Diderot

How Many and What Types of SPARQL Queries can be Answered through Zero-Knowledge Link Traversal?

Author: Fafalios Pavlos
Harth Andreas
Heath Tom
Luczak-Roesch Markus
Miranker Daniel P
Tzitzikas Y.
Verborgh Ruben
Yannakis T.
Publication venue
Publication date: 13/12/2018
Field of study

The current de facto way to query the Web of Data is through the SPARQL protocol, where a client sends queries to a server through a SPARQL endpoint. Contrary to an HTTP server, providing and maintaining a robust and reliable endpoint requires a significant effort that not all publishers are willing or able to make. An alternative query evaluation method is through link traversal, where a query is answered by dereferencing online web resources (URIs) in real time. While several approaches for such a lookup-based query evaluation method have been proposed, there exists no analysis of the types (patterns) of queries that can be directly answered on the live Web, without accessing local or remote endpoints and without a priori knowledge of available data sources. In this paper, we first provide a method for checking if a SPARQL query (to be evaluated on a SPARQL endpoint) can be answered through zero-knowledge link traversal (without accessing the endpoint), and analyse a large corpus of real SPARQL query logs for finding the frequency and distribution of answerable and non-answerable query patterns. Subsequently, we provide an algorithm for transforming answerable queries to SPARQL-LD queries that bypass the endpoints. We report experimental results about the efficiency of the transformed queries and discuss the benefits and the limitations of this query evaluation method. Comment: Preprint of paper accepted for publication in the 34th ACM/SIGAPP Symposium On Applied Computing (SAC 2019)

arXiv.org e-Print Archive

Crossref

Using SPARQL – the practitioners’ viewpoint

Author: A Bonifati
M Ford
R Angles
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2018
Field of study

A number of studies have analyzed SPARQL log data to draw conclusions about how SPARQL is being used. To complement this work, a survey of SPARQL users has been undertaken. Whilst confirming some of the conclusions of the previous studies, the current work is able to provide additional insight into how users create SPARQL queries, the difficulties they encounter, and the features they would like to see included in the language. Based on this insight, a number of recommendations are presented to the community. These relate to predicting and avoiding computationally expensive queries; extensions to the language; and extending the search paradigm.

Crossref

Open Research Online (The Open University)

SMART-KG: Hybrid Shipping for SPARQL Querying on the Web

Author: Amr Azzam
Fernandez Garcia Javier David
Maribel Acosta
Polleres Axel
Publication venue: Department für Informationsverarbeitung und Prozessmanagement
Publication date: 16/01/2020
Field of study

While Linked Data (LD) provides standards for publishing (RDF) and (SPARQL) querying Knowledge Graphs (KGs) on the Web, serving, accessing and processing such open, decentralized KGs is often practically impossible, as query timeouts on publicly available SPARQL endpoints show. Alternative solutions such as Triple Pattern Fragments (TPF) attempt to tackle the problem of availability by pushing query processing workload to the client side, but suffer from unnecessary transfer of irrelevant data on complex queries with large intermediate results. In this paper we present smart-KG, a novel approach to share the load between servers and clients, while significantly reducing data transfer volume, by combining TPF with shipping compressed KG partitions. Our evaluations show that smart-KG outperforms state-of-the-art client-side solutions and increases server-side availability towards more cost-effective and balanced hosting of open and decentralized KGs. Series: Working Papers on Information Systems, Information Business and Operation

Elektronische Publikationen der Wirtschaftsuniversität Wien

Exploration of Large-Scale SPARQL Query Collections : Finding Structure and Regularity for Optimizing Database Systems

Author: Timm Thomas
Publication venue
Publication date: 01/01/2020
Field of study

EPub Bayreuth

PFed: Recommending Plausible Federated SPARQL Queries

Author: El Hassad Sara
Hacques Florian
Molli Pascal
Skaf-Molli Hala
Publication venue: HAL CCSD
Publication date: 26/08/2019
Field of study

Federated SPARQL queries make it possible to query multiple interlinked datasets hosted by remote SPARQL endpoints. However, finding federated queries over a growing number of datasets is challenging. In this paper, we propose PFed, an approach to recommend plausible federated queries based on real query logs of different datasets. The problem is not to find similar federated queries, but plausible complementary queries over different datasets. Starting with a real SPARQL query from a given log, PFed stretches the query with real queries from different logs. To reduce the search space, PFed proposes a semantic summary to prune the query logs. Experimental results with real logs of DBpedia and SWDF demonstrate that PFed is able to prune the logs drastically and recommend plausible federated queries.

Machine Learning-based Query Augmentation for SPARQL Endpoints

Author: Pérez María S.
Queralt Calafat Anna
Rico Mariano
Touma Rizkallah
Publication venue: 'Scitepress'
Publication date: 01/01/2018
Field of study

Linked Data repositories have become a popular source of publicly-available data. Users accessing this data through SPARQL endpoints usually launch several restrictive yet similar consecutive queries, either to find the information they need through trial-and-error or to query related resources. However, instead of executing each individual query separately, query augmentation aims at modifying the incoming queries to retrieve more data that is potentially relevant to subsequent requests. In this paper, we propose a novel approach to query augmentation for SPARQL endpoints based on machine learning. Our approach separates the structure of the query from its contents and measures two types of similarity, which are then used to predict the structure and contents of the augmented query. We test the approach on the real-world query logs of the Spanish and English DBpedia and show that our approach yields high-accuracy prediction. We also show that, by caching the results of the predicted
This work has been supported by the European Union's Horizon 2020 research and innovation program (grant H2020-MSCA-ITN-2014-642963), the Spanish Ministry of Science and Innovation (contract TIN2015-65316, project RTC-2016-4952-7 and contract TIN2016-78011-C4-4-R), the Spanish Ministry of Education, Culture and Sports (contract CAS18/00333) and the Generalitat de Catalunya (contract 2014-SGR-1051). The authors would also like to thank Toni Cortes for his feedback.
Peer Reviewed. Postprint (author's final draft)

Crossref

UPCommons. Portal del coneixement obert de la UPC

Semantic data mining and linked data for a recommender system in the AEC industry

Author: Jensen Rasmus Lund
Pauwels Pieter
Petrova Ekaterina
Svidt Kjeld
Publication venue: 'European Council for Computing in Construction'
Publication date: 01/01/2019
Field of study

Even though it can provide design teams with valuable performance insights and enhance decision-making, monitored building data is rarely reused in an effective feedback loop from operation to design. Data mining allows users to obtain such insights from the large datasets generated throughout the building life cycle. Furthermore, semantic web technologies make it possible to formally represent the built environment and retrieve knowledge in response to domain-specific requirements. Both approaches have independently established themselves as powerful aids in decision-making. Combining them can enrich data mining processes with domain knowledge and facilitate knowledge discovery, representation and reuse. In this article, we look into the available data mining techniques and investigate to what extent they can be fused with semantic web technologies to provide recommendations to the end user in performance-oriented design. We demonstrate an initial implementation of a linked data-based system for generation of recommendations.

Crossref

Ghent University Academic Bibliography

Archivsystem Ask23

VBN