1,933 research outputs found
High-Performance Reachability Query Processing under Index Size Restrictions
In this paper, we propose a scalable and highly efficient index structure for
the reachability problem over graphs. We build on the well-known node interval
labeling scheme where the set of vertices reachable from a particular node is
compactly encoded as a collection of node identifier ranges. We impose an
explicit bound on the size of the index and flexibly assign approximate
reachability ranges to nodes of the graph such that the number of index probes
to answer a query is minimized. The resulting tunable index structure generates
a better range labeling if the space budget is increased, thus providing a
direct control over the trade off between index size and the query processing
performance. By using a fast recursive querying method in conjunction with our
index structure, we show that in practice, reachability queries can be answered
in the order of microseconds on an off-the-shelf computer - even for the case
of massive-scale real world graphs. Our claims are supported by an extensive
set of experimental results using a multitude of benchmark and real-world
web-scale graph datasets.Comment: 30 page
Distributed Processing of Generalized Graph-Pattern Queries in SPARQL 1.1
We propose an efficient and scalable architecture for processing generalized
graph-pattern queries as they are specified by the current W3C recommendation
of the SPARQL 1.1 "Query Language" component. Specifically, the class of
queries we consider consists of sets of SPARQL triple patterns with labeled
property paths. From a relational perspective, this class resolves to
conjunctive queries of relational joins with additional graph-reachability
predicates. For the scalable, i.e., distributed, processing of this kind of
queries over very large RDF collections, we develop a suitable partitioning and
indexing scheme, which allows us to shard the RDF triples over an entire
cluster of compute nodes and to process an incoming SPARQL query over all of
the relevant graph partitions (and thus compute nodes) in parallel. Unlike most
prior works in this field, we specifically aim at the unified optimization and
distributed processing of queries consisting of both relational joins and
graph-reachability predicates. All communication among the compute nodes is
established via a proprietary, asynchronous communication protocol based on the
Message Passing Interface
Adding Logical Operators to Tree Pattern Queries on Graph-Structured Data
As data are increasingly modeled as graphs for expressing complex
relationships, the tree pattern query on graph-structured data becomes an
important type of queries in real-world applications. Most practical query
languages, such as XQuery and SPARQL, support logical expressions using
logical-AND/OR/NOT operators to define structural constraints of tree patterns.
In this paper, (1) we propose generalized tree pattern queries (GTPQs) over
graph-structured data, which fully support propositional logic of structural
constraints. (2) We make a thorough study of fundamental problems including
satisfiability, containment and minimization, and analyze the computational
complexity and the decision procedures of these problems. (3) We propose a
compact graph representation of intermediate results and a pruning approach to
reduce the size of intermediate results and the number of join operations --
two factors that often impair the efficiency of traditional algorithms for
evaluating tree pattern queries. (4) We present an efficient algorithm for
evaluating GTPQs using 3-hop as the underlying reachability index. (5)
Experiments on both real-life and synthetic data sets demonstrate the
effectiveness and efficiency of our algorithm, from several times to orders of
magnitude faster than state-of-the-art algorithms in terms of evaluation time,
even for traditional tree pattern queries with only conjunctive operations.Comment: 16 page
TopCom: Index for Shortest Distance Query in Directed Graph
Finding shortest distance between two vertices in a graph is an important
problem due to its numerous applications in diverse domains, including
geo-spatial databases, social network analysis, and information retrieval.
Classical algorithms (such as, Dijkstra) solve this problem in polynomial time,
but these algorithms cannot provide real-time response for a large number of
bursty queries on a large graph. So, indexing based solutions that pre-process
the graph for efficiently answering (exactly or approximately) a large number
of distance queries in real-time is becoming increasingly popular. Existing
solutions have varying performance in terms of index size, index building time,
query time, and accuracy. In this work, we propose T OP C OM , a novel
indexing-based solution for exactly answering distance queries. Our experiments
with two of the existing state-of-the-art methods (IS-Label and TreeMap) show
the superiority of T OP C OM over these two methods considering scalability and
query time. Besides, indexing of T OP C OM exploits the DAG (directed acyclic
graph) structure in the graph, which makes it significantly faster than the
existing methods if the SCCs (strongly connected component) of the input graph
are relatively small
Reasoning & Querying – State of the Art
Various query languages for Web and Semantic Web data, both for practical use and as an area of research in the scientific community, have emerged in recent years. At the same time, the broad adoption of the internet where keyword search is used in many applications, e.g. search engines, has familiarized casual users with using keyword queries to retrieve information on the internet. Unlike this easy-to-use querying, traditional query languages require knowledge of the language itself as well as of the data to be queried. Keyword-based query languages for XML and RDF bridge the gap between the two, aiming at enabling simple querying of semi-structured data, which is relevant e.g. in the context of the emerging Semantic Web. This article presents an overview of the field of keyword querying for XML and RDF
Labeling Workflow Views with Fine-Grained Dependencies
This paper considers the problem of efficiently answering reachability
queries over views of provenance graphs, derived from executions of workflows
that may include recursion. Such views include composite modules and model
fine-grained dependencies between module inputs and outputs. A novel
view-adaptive dynamic labeling scheme is developed for efficient query
evaluation, in which view specifications are labeled statically (i.e. as they
are created) and data items are labeled dynamically as they are produced during
a workflow execution. Although the combination of fine-grained dependencies and
recursive workflows entail, in general, long (linear-size) data labels, we show
that for a large natural class of workflows and views, labels are compact
(logarithmic-size) and reachability queries can be evaluated in constant time.
Experimental results demonstrate the benefit of this approach over the
state-of-the-art technique when applied for labeling multiple views.Comment: VLDB201
- …