248 research outputs found

    Distributed Processing of Generalized Graph-Pattern Queries in SPARQL 1.1

    We propose an efficient and scalable architecture for processing generalized graph-pattern queries as specified by the current W3C recommendation of the SPARQL 1.1 "Query Language" component. Specifically, the class of queries we consider consists of sets of SPARQL triple patterns with labeled property paths. From a relational perspective, this class resolves to conjunctive queries of relational joins with additional graph-reachability predicates. For scalable, i.e., distributed, processing of such queries over very large RDF collections, we develop a suitable partitioning and indexing scheme, which allows us to shard the RDF triples over an entire cluster of compute nodes and to process an incoming SPARQL query over all of the relevant graph partitions (and thus compute nodes) in parallel. Unlike most prior work in this field, we specifically aim at the unified optimization and distributed processing of queries consisting of both relational joins and graph-reachability predicates. All communication among the compute nodes is established via a proprietary, asynchronous communication protocol based on the Message Passing Interface.
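
    To make the relational view concrete, here is a minimal Python sketch of a conjunctive query with one reachability predicate, `?x :advisor ?y . ?y :knows+ ?z`, evaluated over triples hash-partitioned by subject. The toy data, the predicate names, the crc32-based sharding, and the helper names (`shard`, `match`, `reachable`, `evaluate`) are illustrative assumptions; this is not the paper's partitioning and indexing scheme or its MPI-based protocol, and the shards are scanned sequentially where a cluster would scan them in parallel.

```python
import zlib
from collections import defaultdict, deque

# Hypothetical toy RDF graph of (subject, predicate, object) triples.
TRIPLES = [
    ("alice", "advisor", "bob"),
    ("bob", "knows", "carol"),
    ("carol", "knows", "dave"),
    ("erin", "advisor", "carol"),
]

def shard(triples, num_nodes):
    """Hash-partition triples by subject (an assumed, deliberately simple scheme)."""
    shards = [[] for _ in range(num_nodes)]
    for s, p, o in triples:
        shards[zlib.crc32(s.encode()) % num_nodes].append((s, p, o))
    return shards

def match(shards, predicate):
    """Bindings (?s, ?o) for the triple pattern ?s <predicate> ?o across all shards."""
    return [(s, o) for part in shards for (s, p, o) in part if p == predicate]

def reachable(adj, start):
    """Nodes reachable from start via one or more edges (SPARQL path 'p+')."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def evaluate(shards):
    """SELECT ?x ?z WHERE { ?x :advisor ?y . ?y :knows+ ?z }"""
    knows = defaultdict(list)
    for s, o in match(shards, "knows"):
        knows[s].append(o)
    results = set()
    for x, y in match(shards, "advisor"):   # relational part: triple-pattern bindings
        for z in reachable(knows, y):       # reachability predicate on ?y
            results.add((x, z))
    return results
```

    Here `evaluate(shard(TRIPLES, 3))` joins each `advisor` binding with the nodes reachable over one or more `knows` edges. The answer does not depend on the number of shards; only the distribution of work does.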

    A new filtering index for fast processing of SPARQL queries

    Title from PDF of title page, viewed on October 21, 2013. Vita. Thesis advisor: Praveen Rao. Includes bibliographic references (pages 78-82). Thesis (M.S.)--School of Computing and Engineering. University of Missouri--Kansas City, 2013.

    The Resource Description Framework (RDF) has become a popular data model for representing data on the Web. Using RDF, any assertion can be represented as a (subject, predicate, object) triple. Essentially, RDF datasets can be viewed as directed, labeled graphs. Queries on RDF data are written in the SPARQL query language and contain basic graph patterns (BGPs). We present a new filtering index and query processing technique for processing large BGPs in SPARQL queries. Our approach, called RIS, treats RDF graphs as "first-class citizens." Unlike previous scalable approaches that store RDF data as triples in an RDBMS and process SPARQL queries by executing appropriate SQL queries, RIS aims to speed up query processing by reducing the cost of join operations. In RIS, RDF graphs are mapped into signatures, which are multisets. These signatures are grouped based on a similarity metric and indexed using Counting Bloom Filters. During query processing, the Counting Bloom Filters are checked to filter out non-matches, and the remaining candidates are verified using Apache Jena. The filtering step prunes away a large portion of the dataset and results in faster query processing. We have conducted an in-depth performance evaluation using the Lehigh University Benchmark (LUBM) dataset and SPARQL queries containing large BGPs. We compared RIS with RDF-3X, a state-of-the-art scalable RDF querying engine that uses an RDBMS. RIS can significantly outperform RDF-3X in terms of total execution time for the tested dataset and queries.

    Contents: Introduction -- Motivation and related work -- Background -- Bloom filters and Bloom counters -- System architecture -- Signature tree generation -- Querying the signature tree -- Evaluation -- Experiments -- Conclusion
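
    The filter-then-verify idea can be illustrated with a minimal counting Bloom filter over a multiset signature. This sketch assumes that a signature is simply the multiset of predicate labels in a graph, uses one filter per graph, and derives hash positions from SHA-256. The thesis's actual signature construction, similarity-based grouping, and signature tree are not reproduced, and the names (`CountingBloomFilter`, `signature`, `build_index`, `candidates`) are hypothetical. Because every counter is at least the true multiplicity of each item hashed to it, a counter below the multiplicity the query requires proves a non-match: filtering admits false positives but no false negatives.

```python
import hashlib
from collections import Counter

class CountingBloomFilter:
    """Counting Bloom filter over a multiset (sketch; sizes are arbitrary assumptions)."""

    def __init__(self, num_counters=64, num_hashes=3):
        self.counters = [0] * num_counters
        self.num_hashes = num_hashes

    def _positions(self, item):
        # Distinct counter positions for an item, derived from salted SHA-256 digests.
        return {
            int.from_bytes(hashlib.sha256(f"{i}:{item}".encode()).digest()[:8], "big")
            % len(self.counters)
            for i in range(self.num_hashes)
        }

    def add(self, item, count=1):
        for pos in self._positions(item):
            self.counters[pos] += count

    def estimate(self, item):
        # Upper bound on the item's multiplicity; never an underestimate.
        return min(self.counters[pos] for pos in self._positions(item))

    def may_contain_multiset(self, query):
        # False means the query multiset is definitely not contained.
        return all(self.estimate(item) >= n for item, n in query.items())

def signature(triples):
    """Hypothetical signature: the multiset of predicate labels in an RDF graph."""
    return Counter(p for _, p, _ in triples)

def build_index(graphs):
    """One counting Bloom filter per named graph."""
    index = {}
    for name, triples in graphs.items():
        cbf = CountingBloomFilter()
        for item, n in signature(triples).items():
            cbf.add(item, n)
        index[name] = cbf
    return index

def candidates(index, query_triples):
    """Names of graphs that survive filtering and still need exact verification."""
    q = signature(query_triples)
    return {name for name, cbf in index.items() if cbf.may_contain_multiset(q)}
```

    `candidates(index, query_triples)` returns the graphs that survive filtering; in RIS, such candidates are then checked by a full matcher (Apache Jena).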

    Distributed Set Reachability

    PReaCH: A Fast Lightweight Reachability Index using Pruning and Contraction Hierarchies

    We develop the data structure PReaCH (for Pruned Reachability Contraction Hierarchies), which supports reachability queries in a directed graph, i.e., queries that ask whether two nodes in the graph are connected by a directed path. PReaCH adapts the contraction hierarchy speedup techniques for shortest-path queries to the reachability setting. The resulting approach is surprisingly simple and guarantees linear space and near-linear preprocessing time. Orthogonally to that, we improve existing pruning techniques for the search by gathering more information from a single DFS traversal of the graph. PReaCH indices significantly outperform previous data structures with comparable preprocessing cost. Methods with faster queries need significantly more preprocessing time, in particular for the most difficult instances.
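
    The general idea of pruning a reachability search with information from a single DFS traversal can be sketched in a few lines. This Python sketch answers a reachability query on a DAG with a breadth-first search that uses two cuts: a positive cut (the target lies in the DFS subtree of the current node, tested with pre-order intervals) and a negative cut (a node cannot reach the target unless it precedes the target in a topological order, here reverse DFS post-order). It is not PReaCH: contraction hierarchies and PReaCH's additional pruning information are omitted, and the function names are hypothetical. It assumes the input is a DAG; a general directed graph would first be condensed into its strongly connected components.

```python
from collections import deque

def dfs_labels(n, adj):
    """Pre-order numbers, DFS-subtree ranges, and reverse post-order ranks for a DAG."""
    pre = [-1] * n
    last = [0] * n   # largest pre-order number inside u's DFS subtree
    rpo = [0] * n    # reverse post-order rank (a topological order on a DAG)
    counter, post = 0, []
    for root in range(n):
        if pre[root] != -1:
            continue
        pre[root] = counter
        counter += 1
        stack = [(root, iter(adj[root]))]
        while stack:
            u, it = stack[-1]
            child = next((w for w in it if pre[w] == -1), None)
            if child is None:            # all children handled: u finishes
                stack.pop()
                last[u] = counter - 1
                post.append(u)
            else:
                pre[child] = counter
                counter += 1
                stack.append((child, iter(adj[child])))
    for rank, u in enumerate(reversed(post)):
        rpo[u] = rank
    return pre, last, rpo

def pruned_reachable(u, v, adj, labels):
    """BFS from u to v, cut short by DFS-interval and topological-order tests."""
    pre, last, rpo = labels
    queue, seen = deque([u]), {u}
    while queue:
        w = queue.popleft()
        if w == v or pre[w] <= pre[v] <= last[w]:
            return True  # positive cut: v lies in w's DFS subtree
        for x in adj[w]:
            # Negative cut: in a DAG, x can reach v only if x == v or x precedes v.
            if x not in seen and (x == v or rpo[x] < rpo[v]):
                seen.add(x)
                queue.append(x)
    return False
```

    For an adjacency list such as `adj = [[1, 2], [3], [3], [], [5], []]`, compute `labels = dfs_labels(len(adj), adj)` once and then call `pruned_reachable(u, v, adj, labels)` per query. The labels take linear space and one traversal to build, which mirrors the preprocessing-versus-query trade-off the abstract describes.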