Search CORE

20 research outputs found

An Empirical Study of Real-World SPARQL Queries

Author: Arias Mario
de la Fuente Pablo
Fernández Javier D.
Martínez-Prieto Miguel A.
Publication venue
Publication date: 01/01/2011
Field of study

Understanding how users tailor their SPARQL queries is crucial when designing query evaluation engines or fine-tuning RDF stores with performance in mind. In this paper we analyze 3 million real-world SPARQL queries extracted from logs of the DBPedia and SWDF public endpoints. We aim at finding which are the most used language elements both from syntactical and structural perspectives, paying special attention to triple patterns and joins, since they are indeed some of the most expensive SPARQL operations at evaluation phase. We have determined that most of the queries are simple and include few triple patterns and joins, being Subject-Subject, Subject-Object and Object-Object the most common join types. The graph patterns are usually star-shaped and despite triple pattern chains exist, they are generally short.Comment: 1st International Workshop on Usage Analysis and the Web of Data (USEWOD2011) in the 20th International World Wide Web Conference (WWW2011), Hyderabad, India, March 28th, 201

arXiv.org e-Print Archive

CiteSeerX

SPARQL Query Recommendation by Example: Assessing the Impact of Structural Analysis on Star-Shaped Queries

Author: Adamou Alessandro
Allocca Carlo
d\u27Aquin Mathieu
Motta Enrico
Publication venue: OASIcs - OpenAccess Series in Informatics. 2nd Conference on Language, Data and Knowledge (LDK 2019)
Publication date: 01/01/2019
Field of study

One of the existing query recommendation strategies for unknown datasets is "by example", i.e. based on a query that the user already knows how to formulate on another dataset within a similar domain. In this paper we measure what contribution a structural analysis of the query and the datasets can bring to a recommendation strategy, to go alongside approaches that provide a semantic analysis. Here we concentrate on the case of star-shaped SPARQL queries over RDF datasets. The illustrated strategy performs a least general generalization on the given query, computes the specializations of it that are satisfiable by the target dataset, and organizes them into a graph. It then visits the graph to recommend first the reformulated queries that reflect the original query as closely as possible. This approach does not rely upon a semantic mapping between the two datasets. An implementation as part of the SQUIRE query recommendation library is discussed

Dagstuhl Research Online Publication Server

Extending SPARQL algebra to support efficient evaluation of top-k SPARQL queries

Author: Bozzon A.
Della Valle E
Magliacane S.
Publication venue: Springer
Publication date: 01/01/2012
Field of study

Archivio istituzionale della ricerca - Politecnico di Milano

VU Research Portal

Crossref

An Analytical Study of Large SPARQL Query Logs

Author: Bonifati Angela
Martens Wim
Timm Thomas
Publication venue
Publication date: 01/08/2017
Field of study

With the adoption of RDF as the data model for Linked Data and the Semantic Web, query specification from end- users has become more and more common in SPARQL end- points. In this paper, we conduct an in-depth analytical study of the queries formulated by end-users and harvested from large and up-to-date query logs from a wide variety of RDF data sources. As opposed to previous studies, ours is the first assessment on a voluminous query corpus, span- ning over several years and covering many representative SPARQL endpoints. Apart from the syntactical structure of the queries, that exhibits already interesting results on this generalized corpus, we drill deeper in the structural char- acteristics related to the graph- and hypergraph represen- tation of queries. We outline the most common shapes of queries when visually displayed as pseudographs, and char- acterize their (hyper-)tree width. Moreover, we analyze the evolution of queries over time, by introducing the novel con- cept of a streak, i.e., a sequence of queries that appear as subsequent modifications of a seed query. Our study offers several fresh insights on the already rich query features of real SPARQL queries formulated by real users, and brings us to draw a number of conclusions and pinpoint future di- rections for SPARQL query evaluation, query optimization, tuning, and benchmarking

arXiv.org e-Print Archive

HAL

Hal-Diderot

Heuristics-based query optimisation for SPARQL

Author: Boncz P.A.
Christophides V.
Fundulaki I.
Sidirourgos L.
Tsialiamanis P.
Publication venue
Publication date: 01/01/2012
Field of study

VU Research Portal

Heuristics-based query optimisation for SPARQL

Author: Boncz P.A. (Peter)
Christophides V.
Fundulaki I.
Sidirourgos E. (Eleftherios)
Tsialiamanis P. (Petros)
Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 01/03/2012
Field of study

Query optimization in RDF Stores is a challenging problem as SPARQL queries typically contain many more joins than equivalent relational plans, and hence lead to a large join order search space. In such cases, cost-based query optimization often is not possible. One practical reason for this is that statistics typically are missing in web scale setting such as the Linked Open Datasets (LOD). The more profound reason is that due to the absence of schematic structure in RDF, join-hit ratio estimation requires complicated forms of correlated join statistics; and currently there are no methods to identify the relevant correlations beforehand. For this reason, the use of good heuristics is essential in SPARQL query optimization, even in the case that are partially used with cost-based statistics (i.e., hybrid query optimization). In this paper we describe a set of useful heuristics for SPARQL query optimizers. We present these in the context of a new Heuristic SPARQL Planner (HSP) that is capable of exploiting the syntactic and the structural variations of the triple patterns in a SPARQL query in order to choose an execution plan without the need of any cost model. For this, we deﬁne the variable graph and we show a reduction of the SPARQL query optimization problem to the maximum weight independent set problem. We implemented our planner on top of the MonetDB open source column-store and evaluated its effectiveness against the state-ofthe-art RDF-3X engine as well as comparing the plan quality with a relational (SQL) equivalent of the benchmarks

CWI's Institutional Repository

ANAPSID: An Adaptive Query Processing Engine for SPARQL Endpoints

Author: A. Deshpande
B. Quilitz
C. Buil-Aranda
C. Weiss
D. Izquierdo
G. Ladwig
G. Ladwig
G. Wiederhold
M.-E. Vidal
O. Hartig
O. Hartig
T. Tran
T. Urhan
Y. Li
Z. Kaoudi
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2011
Field of study

Abstract. Following the design rules of Linked Data, the number of available SPARQL endpoints that support remote query processing is quickly growing; however, because of the lack of adaptivity, query executions may frequently be unsuccessful. First, fixed plans identified following the traditional optimize-then-execute paradigm, may timeout as a consequence of endpoint availability. Sec-ond, because blocking operators are usually implemented, endpoint query en-gines are not able to incrementally produce results, and may become blocked if data sources stop sending data. We present ANAPSID, an adaptive query engine for SPARQL endpoints that adapts query execution schedulers to data availabil-ity and run-time conditions. ANAPSID provides physical SPARQL operators that detect when a source becomes blocked or data traffic is bursty, and opportunis-tically, the operators produce results as quickly as data arrives from the sources. Additionally, ANAPSID operators implement main memory replacement policies to move previously computed matches to secondary memory avoiding duplicates. We compared ANAPSID performance with respect to RDF stores and endpoints, and observed that ANAPSID speeds up execution time, in some cases, in more than one order of magnitude.

CiteSeerX

Crossref

Characteristic sets profile features: Estimation and application to SPARQL query planning

Author: Acosta Maribel
Heling Lars
Kirrane Sabrina
Kirrane Sabrina
Ngonga Ngomo Axel-Cyrille
Ngonga Ngomo Axel-Cyrille
Publication venue: IOS Press
Publication date: 13/07/2023
Field of study

RDF dataset profiling is the task of extracting a formal representation of a dataset’s features. Such features may cover various aspects of the RDF dataset ranging from information on licensing and provenance to statistical descriptors of the data distribution and its semantics. In this work, we focus on the characteristics sets profile features that capture both structural and semantic information of an RDF dataset, making them a valuable resource for different downstream applications. While previous research demonstrated the benefits of characteristic sets in centralized and federated query processing, access to these fine-grained statistics is taken for granted. However, especially in federated query processing, computing this profile feature is challenging as it can be difficult and/or costly to access and process the entire data from all federation members. We address this shortcoming by introducing the concept of a profile feature estimation and propose a sampling-based approach to generate estimations for the characteristic sets profile feature. In addition, we showcase the applicability of these feature estimations in federated querying by proposing a query planning approach that is specifically designed to leverage these feature estimations. In our first experimental study, we intrinsically evaluate our approach on the representativeness of the feature estimation. The results show that even small samples of just 0.5% of the original graph’s entities allow for estimating both structural and statistical properties of the characteristic sets profile features. Our second experimental study extrinsically evaluates the estimations by investigating their applicability in our query planner using the well-known FedBench benchmark. The results of the experiments show that the estimated profile features allow for obtaining efficient query plans

KITopen