
    Processing SPARQL queries with regular expressions in RDF databases

    Background: The Resource Description Framework (RDF) data model is widely used to model and share online bioinformatics resources such as UniProt (dev.isb-sib.ch/projects/uniprot-rdf) and Bio2RDF (bio2rdf.org). As a result, SPARQL, the W3C-recommended query language for RDF databases, has become an important language for querying bioinformatics knowledge bases. Users' information needs vary widely, and users often do not know the exact value of each fact stored in an RDF database, so it is desirable to query RDF data with SPARQL queries that contain regular expression patterns. To the best of our knowledge, no existing work efficiently supports regular expression processing in SPARQL over RDF databases. Most existing techniques for processing regular expressions are designed either for querying a text corpus or only for matching paths in an RDF graph. Results: In this paper, we propose a novel framework for supporting regular expression processing in SPARQL queries. Our contributions are as follows. 1) We propose an efficient framework for processing SPARQL queries with regular expression patterns in RDF databases. 2) We propose a cost model so that the framework can be integrated into existing query optimizers. 3) We build a prototype of the framework in C++ and conduct extensive experiments demonstrating the efficiency and effectiveness of our technique. Conclusions: Experiments with a full-blown RDF engine show that our framework outperforms existing approaches by up to two orders of magnitude when processing SPARQL queries with regular expression patterns.
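    To make the problem concrete, the sketch below shows the naive way a SPARQL `FILTER regex(...)` is typically evaluated: match the triple pattern first, then test the regular expression on every candidate binding. This is not the authors' framework; it is a baseline under stated assumptions. The in-memory triple list, the `up:mnemonic` and `up:organism` predicates, and the identifiers are hypothetical toy data, and Python's `re` syntax only approximates the XPath regular expressions that SPARQL's `regex()` actually uses.

```python
import re
from typing import Iterator

# Hypothetical toy data in the style of UniProt RDF: (subject, predicate, object).
Triple = tuple[str, str, str]

TRIPLES: list[Triple] = [
    ("uniprot:P12345", "up:mnemonic", "AATM_RABIT"),
    ("uniprot:P05067", "up:mnemonic", "A4_HUMAN"),
    ("uniprot:P69905", "up:mnemonic", "HBA_HUMAN"),
    ("uniprot:P69905", "up:organism", "taxon:9606"),
]


def match_predicate(triples: list[Triple], predicate: str) -> Iterator[tuple[str, str]]:
    """Yield bindings (?s, ?o) for the triple pattern { ?s <predicate> ?o }."""
    for s, p, o in triples:
        if p == predicate:
            yield s, o


def select_with_regex(
    triples: list[Triple], predicate: str, pattern: str, flags: int = 0
) -> list[tuple[str, str]]:
    """Naive evaluation of
        SELECT ?s ?o WHERE { ?s <predicate> ?o . FILTER regex(?o, "pattern") }
    The triple pattern is matched first, then the regex is tested on every
    candidate ?o, so the cost grows with the number of candidate bindings."""
    rx = re.compile(pattern, flags)
    # re.search (not re.fullmatch): as with SPARQL regex(), an unanchored
    # pattern may match anywhere in the string.
    return [(s, o) for s, o in match_predicate(triples, predicate) if rx.search(o)]
```

    For example, `select_with_regex(TRIPLES, "up:mnemonic", "_HUMAN$")` keeps only the two human entries, and passing `re.IGNORECASE` plays the role of SPARQL's `"i"` flag. Because every candidate literal must be scanned, this baseline is exactly the kind of full evaluation that a regex-aware framework and cost model would aim to avoid.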

    An Analytical Study of Large SPARQL Query Logs

    With the adoption of RDF as the data model for Linked Data and the Semantic Web, query specification by end users has become increasingly common at SPARQL endpoints. In this paper, we conduct an in-depth analytical study of the queries formulated by end users and harvested from large, up-to-date query logs from a wide variety of RDF data sources. Unlike previous studies, ours is the first assessment of a voluminous query corpus spanning several years and covering many representative SPARQL endpoints. Beyond the syntactic structure of the queries, which already yields interesting results on this generalized corpus, we drill deeper into the structural characteristics of the graph and hypergraph representations of queries. We outline the most common shapes of queries when visually displayed as pseudographs, and characterize their (hyper-)tree width. Moreover, we analyze the evolution of queries over time by introducing the novel concept of a streak, i.e., a sequence of queries that appear as successive modifications of a seed query. Our study offers several fresh insights into the already rich features of real SPARQL queries formulated by real users, and leads us to draw a number of conclusions and pinpoint future directions for SPARQL query evaluation, query optimization, tuning, and benchmarking.
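    The notion of a streak can be illustrated with a small sketch that groups consecutive log entries when each query looks like a modification of the one before it. The similarity measure (`difflib.SequenceMatcher` on whitespace-normalized text), the 0.75 threshold, and the greedy consecutive grouping are all assumptions chosen for illustration; the study's own streak definition may use a different similarity function or a window over non-adjacent queries.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] after collapsing whitespace,
    so that reformatting alone does not break a streak (assumed measure)."""
    return SequenceMatcher(None, " ".join(a.split()), " ".join(b.split())).ratio()


def find_streaks(log: list[str], threshold: float = 0.75) -> list[list[str]]:
    """Greedily split a time-ordered query log into streaks: a query extends
    the current streak if it is similar enough to the streak's last query,
    otherwise it becomes the seed of a new streak. Threshold is hypothetical."""
    streaks: list[list[str]] = []
    for query in log:
        if streaks and similarity(streaks[-1][-1], query) >= threshold:
            streaks[-1].append(query)
        else:
            streaks.append([query])
    return streaks
```

    With a log in which a user first asks `SELECT ?s WHERE { ?s a :Person }`, then adds a `?s :name ?n` pattern, then projects `?n`, those three queries form one streak, while an unrelated `ASK` query starts a new one. Comparing only adjacent queries keeps the pass linear in the log size, at the cost of missing streaks interleaved with other users' queries.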

    Distributed RDF query processing and reasoning for big data / linked data

    Title from PDF of title page, viewed on August 27, 2014. Thesis advisor: Yugyung Lee. Vita. Includes bibliographical references (pages 61-65). Thesis (M.S.), School of Computing and Engineering, University of Missouri-Kansas City, 2014.
    The Linked Data movement aims to convert unstructured and semi-structured data in documents into semantically connected documents called the "web of data." It is based on the Resource Description Framework (RDF), which represents semantic data as statements; a collection of such statements forms an RDF graph. SPARQL is a query language designed specifically to query RDF data. Linked Data faces the same challenges as Big Data: we are entering a new paradigm in which Big Data and Linked Data represent massive amounts of data in connected form, and demand for using both continues to grow. We therefore need a scalable and accessible query system to support the reuse and availability of existing web data. However, existing SPARQL query systems are not sufficiently scalable for Big Data and Linked Data. In this thesis, we address how to improve the scalability and performance of query processing over Big Data / Linked Data. Our aim is to evaluate currently available SPARQL query engines and to develop an effective model for querying RDF data that is scalable and supports reasoning. We designed an efficient, distributed SPARQL engine using MapReduce (parallel and distributed processing of large data sets on a cluster) and the Apache Cassandra database (a scalable, highly available peer-to-peer distributed database system). We evaluated the existing in-memory ARQ engine provided by the Jena framework and found that it cannot handle large datasets, because it relies solely on the system's main memory. The results show that the proposed model has powerful reasoning capabilities and handles big datasets efficiently.
    Contents: Abstract; Illustrations; Tables; Introduction; Background and related work; Graph-store based SPARQL model; Graph-store based SPARQL model implementation; Results and evaluation; Conclusion and future work; References
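    As a rough illustration of how a SPARQL basic graph pattern can be evaluated with MapReduce, the sketch below simulates a reduce-side join of triple patterns that share a subject variable, for a query like `SELECT * WHERE { ?x foaf:name ?name . ?x foaf:mbox ?email }`. This is a generic textbook pattern, not the thesis's Cassandra-backed design; the in-process map, shuffle, and reduce functions and the restriction to star-shaped queries joined on the subject are assumptions made to keep the example small.

```python
from collections import defaultdict
from itertools import product
from typing import Iterable, Iterator

Triple = tuple[str, str, str]
Pattern = tuple[str, str, str]  # (subject variable, predicate IRI, object variable)
Binding = dict[str, str]
Emitted = tuple[str, tuple[int, Binding]]


def map_phase(triples: Iterable[Triple], patterns: list[Pattern]) -> Iterator[Emitted]:
    """Map: for each triple matching pattern i, emit (subject, (i, binding)).
    Keying on the subject implements a join on the shared subject variable."""
    for s, p, o in triples:
        for i, (svar, pred, ovar) in enumerate(patterns):
            if p == pred:
                yield s, (i, {svar: s, ovar: o})


def shuffle(pairs: Iterable[Emitted]) -> dict[str, list[tuple[int, Binding]]]:
    """Shuffle: group emitted values by key, as the framework would across nodes."""
    groups: dict[str, list[tuple[int, Binding]]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: dict[str, list[tuple[int, Binding]]], n_patterns: int) -> list[Binding]:
    """Reduce: for each key, combine one binding per pattern (an inner join)."""
    results: list[Binding] = []
    for key in sorted(groups):
        per_pattern: list[list[Binding]] = [[] for _ in range(n_patterns)]
        for i, binding in groups[key]:
            per_pattern[i].append(binding)
        # product() yields nothing if some pattern has no match for this key.
        for combo in product(*per_pattern):
            merged: Binding = {}
            for b in combo:
                merged.update(b)
            results.append(merged)
    return results
```

    For instance, with patterns `[("?x", "foaf:name", "?name"), ("?x", "foaf:mbox", "?email")]`, calling `reduce_phase(shuffle(map_phase(triples, patterns)), 2)` returns one solution per subject that has both a name and a mailbox. Subjects missing either property drop out in the reduce step, which matches SPARQL's join semantics for a basic graph pattern.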

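    SPARQL Query Optimization Using a Graph Database with the Labeled Property Graph Model (original title: Optimasi SPARQL Query Menggunakan Graph Database Dengan Model Labeled Property Graph)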

    The Semantic Web provides a common framework that allows data to be shared and reused across applications. The RDF data model is already used in a wide variety of Semantic Web applications, including public search engines, knowledge engineering, storage of research data, and other business processes. Because the stored data is important and must be durable, the RDF data available on the internet today is already very large and will continue to grow. As a result, querying data in RDF files takes considerable time. In this final project, the problem is addressed by proposing a SPARQL query optimization method that uses a graph database implementing the Labeled Property Graph model. Starting from the existing RDF data model, the Labeled Property Graph model can reduce the number of nodes generated from an RDF file, so the method is expected to speed up traversal of the graph data. The author compares the performance of the Labeled Property Graph model with that of the Triple Store model on SPARQL queries. The tests show that the method yields shorter running times, with the Labeled Property Graph model running considerably faster than the Triple Store model on the SPARQL queries tested.
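    The claim that a labeled property graph produces fewer nodes than the RDF graph can be seen in a minimal sketch, assuming one common mapping: triples whose object is a literal become properties of the subject node, and only IRI-to-IRI triples become edges. The `Literal` class, the `rdf_node_count` and `to_lpg` helpers, and the choice to store properties as lists (to keep multi-valued predicates) are hypothetical and not taken from the project; the sketch also ignores node labels derived from `rdf:type`.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Literal:
    """An RDF literal; plain strings stand for IRIs (hypothetical convention)."""
    value: str


Term = Union[str, Literal]
Triple = tuple[str, str, Term]
Props = dict[str, list[str]]


def rdf_node_count(triples: list[Triple]) -> int:
    """Nodes in the plain RDF graph view: every distinct subject and object,
    including each distinct literal, becomes a node."""
    nodes: set[Term] = set()
    for s, _, o in triples:
        nodes.add(s)
        nodes.add(o)
    return len(nodes)


def to_lpg(triples: list[Triple]) -> tuple[dict[str, Props], list[tuple[str, str, str]]]:
    """Map RDF to a labeled property graph: literal-valued triples become
    properties of the subject node; IRI-valued triples become edges."""
    nodes: dict[str, Props] = {}
    edges: list[tuple[str, str, str]] = []
    for s, p, o in triples:
        props = nodes.setdefault(s, {})
        if isinstance(o, Literal):
            props.setdefault(p, []).append(o.value)  # folded into the node
        else:
            nodes.setdefault(o, {})
            edges.append((s, p, o))
    return nodes, edges
```

    On a toy graph where `:alice` has a name, an age, and a `foaf:knows` link to `:bob`, who has a name, the RDF view has five nodes (two IRIs plus three literals) while the property graph has two nodes and one edge. Fewer nodes means fewer hops during traversal, which is the intuition behind the expected speed-up.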