6,251 research outputs found

    Evaluation and Enumeration Problems for Regular Path Queries

    Get PDF
    Regular path queries (RPQs) are a central component of graph databases. We investigate decision- and enumeration problems concerning the evaluation of RPQs under several semantics that have recently been considered: arbitrary paths, shortest paths, and simple paths. Whereas arbitrary and shortest paths can be enumerated in polynomial delay, the situation is much more intricate for simple paths. For instance, already the question if a given graph contains a simple path of a certain length has cases with highly non-trivial solutions and cases that are long-standing open problems. We study RPQ evaluation for simple paths from a parameterized complexity perspective and define a class of simple transitive expressions that is prominent in practice and for which we can prove a dichotomy for the evaluation problem. We observe that, even though simple path semantics is intractable for RPQs in general, it is feasible for the vast majority of RPQs that are used in practice. At the heart of our study on simple paths is a result of independent interest: the two disjoint paths problem in directed graphs is W[1]-hard if parameterized by the length of one of the two paths

    Joining Extractions of Regular Expressions

    Get PDF
    Regular expressions with capture variables, also known as "regex formulas," extract relations of spans (interval positions) from text. These relations can be further manipulated via Relational Algebra as studied in the context of document spanners, Fagin et al.'s formal framework for information extraction. We investigate the complexity of querying text by Conjunctive Queries (CQs) and Unions of CQs (UCQs) on top of regex formulas. We show that the lower bounds (NP-completeness and W[1]-hardness) from the relational world also hold in our setting; in particular, hardness hits already single-character text! Yet, the upper bounds from the relational world do not carry over. Unlike the relational world, acyclic CQs, and even gamma-acyclic CQs, are hard to compute. The source of hardness is that it may be intractable to instantiate the relation defined by a regex formula, simply because it has an exponential number of tuples. Yet, we are able to establish general upper bounds. In particular, UCQs can be evaluated with polynomial delay, provided that every CQ has a bounded number of atoms (while unions and projection can be arbitrary). Furthermore, UCQ evaluation is solvable with FPT (Fixed-Parameter Tractable) delay when the parameter is the size of the UCQ

    Shortest Distances as Enumeration Problem

    Full text link
    We investigate the single source shortest distance (SSSD) and all pairs shortest distance (APSD) problems as enumeration problems (on unweighted and integer weighted graphs), meaning that the elements (u,v,d(u,v))(u, v, d(u, v)) -- where uu and vv are vertices with shortest distance d(u,v)d(u, v) -- are produced and listed one by one without repetition. The performance is measured in the RAM model of computation with respect to preprocessing time and delay, i.e., the maximum time that elapses between two consecutive outputs. This point of view reveals that specific types of output (e.g., excluding the non-reachable pairs (u,v,)(u, v, \infty), or excluding the self-distances (u,u,0)(u, u, 0)) and the order of enumeration (e.g., sorted by distance, sorted row-wise with respect to the distance matrix) have a huge impact on the complexity of APSD while they appear to have no effect on SSSD. In particular, we show for APSD that enumeration without output restrictions is possible with delay in the order of the average degree. Excluding non-reachable pairs, or requesting the output to be sorted by distance, increases this delay to the order of the maximum degree. Further, for weighted graphs, a delay in the order of the average degree is also not possible without preprocessing or considering self-distances as output. In contrast, for SSSD we find that a delay in the order of the maximum degree without preprocessing is attainable and unavoidable for any of these requirements.Comment: Updated version adds the study of space complexit

    A Trichotomy for Regular Trail Queries

    Get PDF
    Regular path queries (RPQs) are an essential component of graph query languages. Such queries consider a regular expression r and a directed edge-labeled graph G and search for paths in G for which the sequence of labels is in the language of r. In order to avoid having to consider infinitely many paths, some database engines restrict such paths to be trails, that is, they only consider paths without repeated edges. In this paper we consider the evaluation problem for RPQs under trail semantics, in the case where the expression is fixed. We show that, in this setting, there exists a trichotomy. More precisely, the complexity of RPQ evaluation divides the regular languages into the finite languages, the class T_tract (for which the problem is tractable), and the rest. Interestingly, the tractable class in the trichotomy is larger than for the trichotomy for simple paths, discovered by Bagan et al. [Bagan et al., 2013]. In addition to this trichotomy result, we also study characterizations of the tractable class, its expressivity, the recognition problem, closure properties, and show how the decision problem can be extended to the enumeration problem, which is relevant to practice

    Enumerating Subgraph Instances Using Map-Reduce

    Full text link
    The theme of this paper is how to find all instances of a given "sample" graph in a larger "data graph," using a single round of map-reduce. For the simplest sample graph, the triangle, we improve upon the best known such algorithm. We then examine the general case, considering both the communication cost between mappers and reducers and the total computation cost at the reducers. To minimize communication cost, we exploit the techniques of (Afrati and Ullman, TKDE 2011)for computing multiway joins (evaluating conjunctive queries) in a single map-reduce round. Several methods are shown for translating sample graphs into a union of conjunctive queries with as few queries as possible. We also address the matter of optimizing computation cost. Many serial algorithms are shown to be "convertible," in the sense that it is possible to partition the data graph, explore each partition in a separate reducer, and have the total computation cost at the reducers be of the same order as the computation cost of the serial algorithm.Comment: 37 page

    Dynamic Set Intersection

    Full text link
    Consider the problem of maintaining a family FF of dynamic sets subject to insertions, deletions, and set-intersection reporting queries: given S,SFS,S'\in F, report every member of SSS\cap S' in any order. We show that in the word RAM model, where ww is the word size, given a cap dd on the maximum size of any set, we can support set intersection queries in O(dw/log2w)O(\frac{d}{w/\log^2 w}) expected time, and updates in O(logw)O(\log w) expected time. Using this algorithm we can list all tt triangles of a graph G=(V,E)G=(V,E) in O(m+mαw/log2w+t)O(m+\frac{m\alpha}{w/\log^2 w} +t) expected time, where m=Em=|E| and α\alpha is the arboricity of GG. This improves a 30-year old triangle enumeration algorithm of Chiba and Nishizeki running in O(mα)O(m \alpha) time. We provide an incremental data structure on FF that supports intersection {\em witness} queries, where we only need to find {\em one} eSSe\in S\cap S'. Both queries and insertions take O\paren{\sqrt \frac{N}{w/\log^2 w}} expected time, where N=SFSN=\sum_{S\in F} |S|. Finally, we provide time/space tradeoffs for the fully dynamic set intersection reporting problem. Using MM words of space, each update costs O(MlogN)O(\sqrt {M \log N}) expected time, each reporting query costs O(NlogNMop+1)O(\frac{N\sqrt{\log N}}{\sqrt M}\sqrt{op+1}) expected time where opop is the size of the output, and each witness query costs O(NlogNM+logN)O(\frac{N\sqrt{\log N}}{\sqrt M} + \log N) expected time.Comment: Accepted to WADS 201
    corecore