2,604 research outputs found

An Analytical Study of Large SPARQL Query Logs

Author: Bonifati Angela
Martens Wim
Timm Thomas
Publication venue
Publication date: 01/08/2017
Field of study

With the adoption of RDF as the data model for Linked Data and the Semantic Web, query specification from end-users has become more and more common in SPARQL endpoints. In this paper, we conduct an in-depth analytical study of the queries formulated by end-users and harvested from large and up-to-date query logs from a wide variety of RDF data sources. As opposed to previous studies, ours is the first assessment on a voluminous query corpus, spanning over several years and covering many representative SPARQL endpoints. Apart from the syntactical structure of the queries, which already exhibits interesting results on this generalized corpus, we drill deeper into the structural characteristics related to the graph and hypergraph representation of queries. We outline the most common shapes of queries when visually displayed as pseudographs, and characterize their (hyper-)tree width. Moreover, we analyze the evolution of queries over time, by introducing the novel concept of a streak, i.e., a sequence of queries that appear as subsequent modifications of a seed query. Our study offers several fresh insights on the already rich query features of real SPARQL queries formulated by real users, and brings us to draw a number of conclusions and pinpoint future directions for SPARQL query evaluation, query optimization, tuning, and benchmarking.

arXiv.org e-Print Archive

HAL

Hal-Diderot

Heuristics-based query optimisation for SPARQL

Author: Boncz P.A. (Peter)
Christophides V.
Fundulaki I.
Sidirourgos E. (Eleftherios)
Tsialiamanis P. (Petros)
Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 01/03/2012
Field of study

Query optimization in RDF stores is a challenging problem, as SPARQL queries typically contain many more joins than equivalent relational plans and hence lead to a large join order search space. In such cases, cost-based query optimization is often not possible. One practical reason for this is that statistics are typically missing in web-scale settings such as the Linked Open Datasets (LOD). The more profound reason is that, due to the absence of schematic structure in RDF, join-hit ratio estimation requires complicated forms of correlated join statistics, and currently there are no methods to identify the relevant correlations beforehand. For this reason, the use of good heuristics is essential in SPARQL query optimization, even when they are combined with cost-based statistics (i.e., hybrid query optimization). In this paper we describe a set of useful heuristics for SPARQL query optimizers. We present these in the context of a new Heuristic SPARQL Planner (HSP) that is capable of exploiting the syntactic and the structural variations of the triple patterns in a SPARQL query in order to choose an execution plan without the need of any cost model. For this, we define the variable graph and we show a reduction of the SPARQL query optimization problem to the maximum weight independent set problem. We implemented our planner on top of the MonetDB open source column-store and evaluated its effectiveness against the state-of-the-art RDF-3X engine, and also compared the plan quality with a relational (SQL) equivalent of the benchmarks.

CWI's Institutional Repository

Towards Efficient Path Query on Social Network with Hybrid RDF Management

Author: Chen Wei
Gai Lei
Qiu Changhe
Wang Tengjiao
Xu Zhichao
Publication venue
Publication date: 01/01/2014
Field of study

The scalability and flexibility of the Resource Description Framework (RDF) model make it ideally suited for representing online social networks (OSN). One basic operation in OSN is to find chains of relations, such as k-hop friends. Property path queries in SPARQL can express this type of operation, but their implementation suffers from performance problems given the ever-growing data size and complexity of OSN. In this paper, we present a main memory/disk based hybrid RDF data management framework for efficient property path queries. In this hybrid framework, we realize an efficient in-memory algebra operator for property path queries using graph traversal, and estimate the cost of this operator so that it can cooperate with existing cost-based optimization. Experiments on benchmark and real datasets demonstrate that our approach can achieve a good tradeoff between data load expense and online query performance.
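
As an illustration of the k-hop operation mentioned in the abstract above, here is a minimal sketch of a breadth-first traversal over an in-memory adjacency map. It is not the paper's operator: the function name `k_hop_neighbors`, the dictionary layout, and the sample data are all assumptions made for this example.

```python
from collections import deque


def k_hop_neighbors(adjacency, start, k):
    """Return every node reachable from `start` in 1..k hops (hypothetical helper)."""
    seen = {start}                  # nodes already discovered, including the start
    reached = set()                 # result set, excludes the start node
    frontier = deque([(start, 0)])  # (node, hop distance from start)
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue                # do not expand beyond k hops
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                reached.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return reached


# Illustrative "knows" edges, not taken from any dataset in the paper.
friends = {
    "alice": ["bob"],
    "bob": ["carol"],
    "carol": ["dave"],
}
two_hop = k_hop_neighbors(friends, "alice", 2)
```

Each node is visited at most once, so the traversal cost is bounded by the number of edges within k hops of the start node.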

arXiv.org e-Print Archive

Crossref

Optimasi SPARQL Query Menggunakan Graph Database Dengan Model Labeled Property Graph [SPARQL Query Optimization Using a Graph Database with the Labeled Property Graph Model]

Author: Rahmansyah Nafiar
Publication venue
Publication date: 18/07/2018
Field of study

The Semantic Web provides a common framework that allows data to be shared and reused across applications. The RDF data model is already used in a variety of Semantic Web applications, serving public search engines, knowledge engineering, research data storage, and other business processes. Because the stored data is important and must be durable, the RDF data available on the internet is already very large and will keep growing. As a result, querying data in RDF files takes a considerable amount of time. This final project addresses the problem by proposing a SPARQL query optimization method that uses a graph database implementing the Labeled Property Graph model. Compared with the existing RDF data model, the Labeled Property Graph model can reduce the number of nodes generated from an RDF file, so the method is expected to speed up traversal of the graph data. The author compares the performance of the Labeled Property Graph model with that of the Triple Store model when handling certain SPARQL queries. The tests conducted show that the Labeled Property Graph model gives much shorter running times than the Triple Store model for these queries.
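
The node reduction mentioned in the abstract above can be sketched with a common RDF-to-property-graph mapping, in which literal-valued triples become properties on the subject node instead of separate nodes. This is an assumed mapping for illustration, not necessarily the one used in the thesis. The function names and sample triples are hypothetical, and multi-valued properties are ignored for brevity.

```python
def rdf_node_count(triples):
    """Count nodes when every subject and object, literals included, is a node."""
    nodes = set()
    for subject, _predicate, obj, is_literal in triples:
        nodes.add(("resource", subject))
        nodes.add(("literal", obj) if is_literal else ("resource", obj))
    return len(nodes)


def to_property_graph(triples):
    """Fold literal-valued triples into node properties; keep the rest as edges."""
    nodes = {}   # node id -> dict of properties
    edges = []   # (source, label, target)
    for subject, predicate, obj, is_literal in triples:
        nodes.setdefault(subject, {})
        if is_literal:
            # Simplification: a repeated predicate overwrites the earlier value.
            nodes[subject][predicate] = obj
        else:
            nodes.setdefault(obj, {})
            edges.append((subject, predicate, obj))
    return nodes, edges


# Illustrative triples: (subject, predicate, object, object_is_literal).
sample = [
    ("ex:alice", "foaf:name", "Alice", True),
    ("ex:alice", "foaf:age", "30", True),
    ("ex:alice", "foaf:knows", "ex:bob", False),
    ("ex:bob", "foaf:name", "Bob", True),
]
lpg_nodes, lpg_edges = to_property_graph(sample)
```

On this sample the triple view has five nodes (two resources and three literals), while the property graph keeps only the two resource nodes and one edge.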

ITS Repository

Distributed Processing of Generalized Graph-Pattern Queries in SPARQL 1.1

Author: Gurajada Sairam
Theobald Martin
Publication venue
Publication date: 01/01/2016
Field of study

We propose an efficient and scalable architecture for processing generalized graph-pattern queries as they are specified by the current W3C recommendation of the SPARQL 1.1 "Query Language" component. Specifically, the class of queries we consider consists of sets of SPARQL triple patterns with labeled property paths. From a relational perspective, this class resolves to conjunctive queries of relational joins with additional graph-reachability predicates. For the scalable, i.e., distributed, processing of queries of this kind over very large RDF collections, we develop a suitable partitioning and indexing scheme, which allows us to shard the RDF triples over an entire cluster of compute nodes and to process an incoming SPARQL query over all of the relevant graph partitions (and thus compute nodes) in parallel. Unlike most prior works in this field, we specifically aim at the unified optimization and distributed processing of queries consisting of both relational joins and graph-reachability predicates. All communication among the compute nodes is established via a proprietary, asynchronous communication protocol based on the Message Passing Interface.

arXiv.org e-Print Archive

Open Repository and Bibliography - Luxembourg

MPG.PuRe