Reachability queries checking the existence of a path from a source node to a
target node are fundamental operators for querying and processing graph data.
Current approaches for index-based evaluation of reachability queries focus
either on plain reachability or on constraint-based reachability with
alternation only. In this paper, we study for the first time the problem of index-based
processing for recursive label-concatenated reachability queries, referred to
as RLC queries. These queries check whether there exists a path whose sequence
of edge labels satisfies a constraint defined by a concatenation of at most k
edge labels under the Kleene plus. RLC queries arise in many practical graph
database and network analysis applications. However, their evaluation remains
prohibitively expensive in current graph database engines.
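
To make the query semantics concrete, the following is a minimal sketch of how
an RLC query could be answered by a plain online traversal, which is the kind
of baseline an index is meant to avoid. It is not the RLC index itself. The
function name rlc_reachable, the (source, label, target) edge-triple format,
and the example labels are assumptions made for illustration only. The
traversal is a breadth-first search over pairs (vertex, position in the label
sequence); a pair (target, 0) reached after at least one edge means the label
sequence has been completed one or more times.

```python
from collections import deque


def rlc_reachable(edges, source, target, labels):
    """Check whether a non-empty path from source to target spells
    the sequence `labels` repeated one or more times (Kleene plus).

    edges  -- iterable of (u, label, v) triples (assumed input format)
    labels -- non-empty list of edge labels, e.g. ["knows", "worksFor"]
    """
    if not labels:
        raise ValueError("labels must contain at least one edge label")

    # Adjacency keyed by (vertex, label) so each step is one dictionary lookup.
    adjacency = {}
    for u, label, v in edges:
        adjacency.setdefault((u, label), []).append(v)

    period = len(labels)
    seen = set()
    queue = deque([(source, 0)])  # (vertex, index of the next expected label)

    while queue:
        vertex, pos = queue.popleft()
        for nxt in adjacency.get((vertex, labels[pos]), []):
            state = (nxt, (pos + 1) % period)
            # Position 0 after taking an edge means a full repetition ended here.
            if state == (target, 0):
                return True
            if state not in seen:
                seen.add(state)
                queue.append(state)
    return False


# Hypothetical example graph: a chain alternating two labels.
example_edges = [
    ("a", "knows", "b"),
    ("b", "worksFor", "c"),
    ("c", "knows", "d"),
    ("d", "worksFor", "e"),
]
```

With these example edges, rlc_reachable(example_edges, "a", "e", ["knows",
"worksFor"]) holds because the path a-b-c-d-e spells the two-label sequence
twice, whereas the same query with target "d" fails because the path stops in
the middle of a repetition. Each query of this kind may visit up to
(number of vertices) x (length of the label sequence) states.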
We introduce the RLC index, the first reachability index to efficiently
process RLC queries. The RLC index checks whether the source vertex can reach
an intermediate vertex that can also reach the target vertex under a recursive
label-concatenated constraint. We propose an indexing algorithm to build the
RLC index, which guarantees the soundness and the completeness of query
execution and avoids recording redundant index entries. Comprehensive
experiments on real-world graphs show that the RLC index can significantly
reduce both the offline processing cost and the memory overhead of transitive
closure while speeding up query processing by up to six orders of magnitude
over online traversals. Finally, our open-source implementation of the RLC
index significantly outperforms current mainstream graph engines for
evaluating RLC queries.