An Invertible Transform for Efficient String Matching in Labeled Digraphs

Abstract

Let G = (V, E) be a digraph where each vertex is unlabeled, each edge is labeled by a character in some alphabet Ω, and any two edges with both the same head and the same tail have different labels. The powerset construction gives a transform of G into a weakly connected digraph G' = (V', E') that enables solving the decision problem of whether there is a walk in G matching an arbitrarily long query string q in time linear in |q| and independent of |E| and |V|. We show G is uniquely determined by G' when for every v_ℓ ∈ V, there is some distinct string s_ℓ on Ω such that v_ℓ is the origin of a closed walk in G matching s_ℓ, and no other walk in G matches s_ℓ unless it starts and ends at v_ℓ. We then exploit this invertibility condition to strategically alter any G so its transform G' enables retrieval of all t terminal vertices of walks in the unaltered G matching q in O(|q| + t log |V|) time. We conclude by proposing two defining properties of a class of transforms that includes the Burrows-Wheeler transform and the transform presented here.
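As an illustration of the decision problem only, the following minimal Python sketch applies the textbook powerset construction to a small edge-labeled digraph, treating every vertex as a possible start of a walk. The function names (powerset_transform, has_walk), the edge-triple representation, and the example graph are assumptions made for this sketch and are not taken from the paper; the sketch does not implement the invertibility condition or the O(|q| + t log |V|) retrieval.

```python
from collections import deque

def powerset_transform(vertices, edges):
    """Determinize G with every vertex allowed as the start of a walk.

    vertices: iterable of hashable vertex ids (hypothetical representation).
    edges: iterable of (tail, label, head) triples.
    Returns (start, delta), where delta[state][label] is the next state and
    each state is a frozenset of vertices of G.
    """
    # step[v][c] = heads of the edges leaving v that carry label c
    step = {v: {} for v in vertices}
    for tail, label, head in edges:
        step[tail].setdefault(label, set()).add(head)

    start = frozenset(step)  # a walk may begin at any vertex
    delta = {start: {}}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        by_label = {}
        for v in state:
            for label, heads in step[v].items():
                by_label.setdefault(label, set()).update(heads)
        for label, heads in by_label.items():
            target = frozenset(heads)
            delta[state][label] = target
            if target not in delta:  # first visit: schedule for expansion
                delta[target] = {}
                queue.append(target)
    return start, delta

def has_walk(start, delta, q):
    """Decide whether some walk in G matches q, one lookup per character."""
    state = start
    for c in q:
        state = delta[state].get(c)
        if state is None:  # no edge with this label leaves the current set
            return False
    return True

# Hypothetical example graph on three vertices.
V = [0, 1, 2]
E = [(0, "a", 1), (1, "a", 1), (1, "b", 2), (2, "a", 0)]
start, delta = powerset_transform(V, E)
print(has_walk(start, delta, "aab"))  # True: 0 -a-> 1 -a-> 1 -b-> 2
print(has_walk(start, delta, "bab"))  # False: no b-labeled edge leaves vertex 0
```

Each query costs one dictionary lookup per character, so its running time does not depend on |E| or |V| once the transform is built. The construction itself can produce a number of states exponential in |V| in the worst case, which this sketch makes no attempt to avoid.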
