An Analytical Study of Large SPARQL Query Logs
With the adoption of RDF as the data model for Linked Data and the Semantic
Web, query specification from end-users has become more and more common in
SPARQL endpoints. In this paper, we conduct an in-depth analytical study of
the queries formulated by end-users and harvested from large and up-to-date
query logs from a wide variety of RDF data sources. As opposed to previous
studies, ours is the first assessment on a voluminous query corpus, spanning
several years and covering many representative SPARQL endpoints. Apart
from the syntactical structure of the queries, which already exhibits
interesting results on this generalized corpus, we drill deeper into the
structural characteristics related to the graph- and hypergraph
representation of queries. We outline the most common shapes of queries when
visually displayed as pseudographs, and characterize their (hyper-)tree width.
Moreover, we analyze the evolution of queries over time by introducing the
novel concept of a streak, i.e., a sequence of queries that appear as
subsequent modifications of a seed query. Our study offers several fresh
insights on the already rich query features of real SPARQL queries formulated
by real users, and leads us to draw a number of conclusions and pinpoint
future directions for SPARQL query evaluation, query optimization, tuning,
and benchmarking.
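The streak idea above can be illustrated with a minimal sketch. The abstract does not say how "subsequent modification" is measured, so the similarity test here (a `difflib` ratio against the previous query, with a hypothetical `threshold` of 0.75) is an assumption made for illustration, not the paper's definition.

```python
from difflib import SequenceMatcher


def find_streaks(queries, threshold=0.75):
    """Group a chronologically ordered query log into streaks.

    Assumption: a query extends the current streak when its text is
    similar enough to the previous query; otherwise it becomes the
    seed of a new streak.
    """
    streaks = []
    for query in queries:
        if streaks:
            previous = streaks[-1][-1]
            similarity = SequenceMatcher(None, previous, query).ratio()
            if similarity >= threshold:
                streaks[-1].append(query)
                continue
        streaks.append([query])  # start a new streak with this seed
    return streaks


# Hypothetical log: three refinements of one seed, then an unrelated query.
log = [
    "SELECT ?s WHERE { ?s a <http://example.org/Person> }",
    "SELECT ?s WHERE { ?s a <http://example.org/Person> } LIMIT 10",
    "SELECT ?s WHERE { ?s a <http://example.org/Person> } LIMIT 20",
    "ASK { ?x ?y ?z }",
]
streaks = find_streaks(log)
```

Comparing only against the previous query keeps the sketch short; a real log analysis would probably also need a time or position window so that interleaved queries from other users do not break a streak.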
Heuristics-based query optimisation for SPARQL
Query optimization in RDF stores is a challenging problem, as SPARQL queries typically contain many more joins than equivalent relational plans and hence lead to a large join-order search space. In such cases, cost-based query optimization is often not possible. One practical reason for this is that statistics are typically missing in web-scale settings such as the Linked Open Datasets (LOD). The more profound reason is that, due to the absence of schematic structure in RDF, join-hit ratio estimation requires complicated forms of correlated join statistics, and currently there are no methods to identify the relevant correlations beforehand. For this reason, the use of good heuristics is essential in SPARQL query optimization, even when they are only partially combined with cost-based statistics (i.e., hybrid query optimization). In this paper we describe a set of useful heuristics for SPARQL query optimizers. We present these in the context of a new Heuristic SPARQL Planner (HSP) that is capable of exploiting the syntactic and structural variations of the triple patterns in a SPARQL query in order to choose an execution plan without the need for any cost model. For this, we define the variable graph and we show a reduction of the SPARQL query optimization problem to the maximum weight independent set problem.
We implemented our planner on top of the MonetDB open source column-store and evaluated its effectiveness against the state-of-the-art RDF-3X engine, as well as comparing the plan quality with a relational (SQL) equivalent of the benchmarks.
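To make the maximum weight independent set formulation concrete, here is a small sketch. The abstract does not define the variable graph, so the construction below is an assumption: one node per variable, weighted by the number of triple patterns it occurs in, with an edge between two variables that share a triple pattern. The exhaustive search is for illustration only and is not the planner's algorithm.

```python
from itertools import combinations


def variable_graph(patterns):
    """Build an (assumed) variable graph from triple patterns.

    Node weight: number of patterns containing the variable.
    Edge: two variables that co-occur in the same pattern.
    """
    weights, edges = {}, set()
    for pattern in patterns:
        variables = sorted({term for term in pattern if term.startswith("?")})
        for var in variables:
            weights[var] = weights.get(var, 0) + 1
        edges.update(combinations(variables, 2))  # pairs stored in sorted order
    return weights, edges


def max_weight_independent_set(weights, edges):
    """Exhaustive search; only feasible for a handful of variables."""
    nodes = sorted(weights)
    best, best_weight = (), 0
    for size in range(1, len(nodes) + 1):
        for subset in combinations(nodes, size):
            if any(pair in edges for pair in combinations(subset, 2)):
                continue  # two chosen variables are adjacent
            weight = sum(weights[var] for var in subset)
            if weight > best_weight:
                best, best_weight = subset, weight
    return set(best), best_weight


# Hypothetical query with four triple patterns.
patterns = [
    ("?p", "rdf:type", "ex:Person"),
    ("?p", "ex:name", "?n"),
    ("?p", "ex:worksAt", "?c"),
    ("?c", "ex:locatedIn", "?city"),
]
weights, edges = variable_graph(patterns)
result = max_weight_independent_set(weights, edges)
```

In this example `?p` occurs in three patterns and is adjacent to `?n` and `?c`, so the heaviest set of mutually non-adjacent variables is `?p` together with `?city`.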
Towards Efficient Path Query on Social Network with Hybrid RDF Management
The scalability and flexibility of the Resource Description Framework (RDF)
model make it ideally suited for representing online social networks (OSN). One
basic operation in OSN is to find chains of relations, such as k-hop friends.
Property path queries in SPARQL can express this type of operation, but their
implementation suffers from performance problems given the ever-growing data
size and complexity of OSN. In this paper, we present a hybrid main-memory and
disk-based RDF data management framework for efficient property path queries.
In this hybrid framework, we realize an efficient in-memory algebra operator
for property path queries using graph traversal, and estimate the cost of this
operator so that it cooperates with existing cost-based optimization.
Experiments on benchmark and real datasets demonstrate that our approach
achieves a good tradeoff between data load expense and online query
performance.
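The k-hop friends operation mentioned above can be sketched as a breadth-first traversal over an in-memory adjacency index. This is a generic illustration, not the paper's operator; the function name `k_hop`, the tuple representation of triples, and the `foaf:knows` sample data are assumptions.

```python
from collections import deque


def k_hop(triples, start, predicate, k):
    """Return nodes reachable from `start` in 1..k steps along `predicate`.

    Edges are followed in the subject-to-object direction only.
    """
    # Build an in-memory adjacency index for the chosen predicate.
    adjacency = {}
    for subject, pred, obj in triples:
        if pred == predicate:
            adjacency.setdefault(subject, set()).add(obj)

    seen = {start}
    reached = set()
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue  # do not expand beyond k hops
        for neighbour in adjacency.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                reached.add(neighbour)
                frontier.append((neighbour, depth + 1))
    return reached


# Hypothetical friendship chain: alice -> bob -> carol -> dave.
triples = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "foaf:knows", "ex:carol"),
    ("ex:carol", "foaf:knows", "ex:dave"),
    ("ex:alice", "ex:livesIn", "ex:paris"),
]
two_hop = k_hop(triples, "ex:alice", "foaf:knows", 2)
```

The `seen` set ensures each node is expanded once, so cycles in the friendship graph do not cause repeated work.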
Optimasi SPARQL Query Menggunakan Graph Database Dengan Model Labeled Property Graph (SPARQL Query Optimization Using a Graph Database with the Labeled Property Graph Model)
The Semantic Web provides a common framework that allows data to be shared and reused across applications. The RDF data model is already used in a variety of Semantic Web applications, including public search engines, knowledge engineering, storage of research data, and other business processes. Because the stored data is important and must be durable, the RDF data available on the internet today is already very large and will keep growing. As a result, querying data in RDF files takes a considerable amount of time.
This final project addresses the problem by proposing a SPARQL query optimization method that uses a graph database with the Labeled Property Graph model. Compared with the existing RDF data model, the Labeled Property Graph model can reduce the number of nodes generated from an RDF file. The method is therefore expected to speed up traversal of the graph data. The author compares the performance of the Labeled Property Graph model with that of the Triple Store model when answering SPARQL queries.
The tests conducted show that this method yields shorter running times: the Labeled Property Graph model is considerably faster than the Triple Store model on the SPARQL queries tested.
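The claim that a labeled property graph needs fewer nodes than the RDF graph can be illustrated with a small conversion sketch. The mapping used here (literal-valued triples become node properties, resource-valued triples become edges) is a common convention and an assumption on our part; the abstract does not spell out the conversion it uses. The `Literal` wrapper and the sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    """Marks a triple object as an RDF literal rather than a resource."""
    value: object


def to_property_graph(triples):
    """Convert triples to (nodes, edges) of a labeled property graph.

    nodes: {resource: {property: value}}
    edges: [(source, label, target)]
    Simplification: a repeated property on a node overwrites the old value.
    """
    nodes, edges = {}, []
    for subject, predicate, obj in triples:
        properties = nodes.setdefault(subject, {})
        if isinstance(obj, Literal):
            properties[predicate] = obj.value  # literal folds into the node
        else:
            nodes.setdefault(obj, {})
            edges.append((subject, predicate, obj))
    return nodes, edges


triples = [
    ("ex:alice", "ex:name", Literal("Alice")),
    ("ex:alice", "ex:age", Literal(30)),
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:bob", "ex:name", Literal("Bob")),
]
nodes, edges = to_property_graph(triples)
```

Drawn as an RDF graph, these four triples have five nodes (two resources and three literals); the property graph keeps only the two resource nodes and one edge.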
Distributed Processing of Generalized Graph-Pattern Queries in SPARQL 1.1
We propose an efficient and scalable architecture for processing generalized
graph-pattern queries as they are specified by the current W3C recommendation
of the SPARQL 1.1 "Query Language" component. Specifically, the class of
queries we consider consists of sets of SPARQL triple patterns with labeled
property paths. From a relational perspective, this class resolves to
conjunctive queries of relational joins with additional graph-reachability
predicates. For the scalable, i.e., distributed, processing of such queries
over very large RDF collections, we develop a suitable partitioning and
indexing scheme, which allows us to shard the RDF triples over an entire
cluster of compute nodes and to process an incoming SPARQL query over all of
the relevant graph partitions (and thus compute nodes) in parallel. Unlike most
prior work in this field, we specifically aim at the unified optimization and
distributed processing of queries consisting of both relational joins and
graph-reachability predicates. All communication among the compute nodes is
established via a proprietary, asynchronous communication protocol based on the
Message Passing Interface.
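The idea of sharding triples over compute nodes and evaluating a pattern on every partition can be sketched as follows. The abstract does not describe its partitioning scheme, so hashing on the subject is a stand-in chosen for simplicity; the function names and sample data are hypothetical, and the loop over shards stands in for work that would run on separate nodes.

```python
import zlib


def shard_of(term, num_shards):
    """Deterministically map a term to a shard (assumed scheme: CRC32 hash)."""
    return zlib.crc32(term.encode("utf-8")) % num_shards


def partition(triples, num_shards):
    """Place each triple on the shard selected by its subject."""
    shards = [[] for _ in range(num_shards)]
    for triple in triples:
        shards[shard_of(triple[0], num_shards)].append(triple)
    return shards


def match(shards, pattern):
    """Evaluate one triple pattern on every shard; None acts as a wildcard."""
    results = []
    for shard in shards:  # in a cluster, each shard would be scanned in parallel
        for triple in shard:
            if all(want is None or want == got for want, got in zip(pattern, triple)):
                results.append(triple)
    return results


triples = [
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:alice", "ex:name", "Alice"),
    ("ex:bob", "ex:knows", "ex:carol"),
    ("ex:carol", "ex:name", "Carol"),
]
shards = partition(triples, 3)
knows = sorted(match(shards, (None, "ex:knows", None)))
```

Hashing on the subject keeps all triples about one resource on the same shard, so patterns that share a subject can be joined locally; joins on other positions would need data exchange between nodes.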