
    Online Suffix Trees with Counts

    We extend Ukkonen’s online suffix tree construction algorithm to support substring frequency queries by adding count fields to the internal nodes of the tree. This has applications in sequential data compression, for example in the implementation of an efficient PPM-style compression algorithm with unbounded context length. Differing slightly from the suggestion in [1], we let the count field of a node invariantly equal the number of occurrences of the concatenation of the consecutive edge labels on the path from the root to that node. We show that under this invariant, due to the onlineness requirement, the algorithm’s worst-case time complexity is O(n²). However, its average-case performance is ∼ n log n under reasonable assumptions, so it may well be an acceptable solution in practice. One major problem that we address is the fact that Ukkonen’s online construction algorithm does not maintain explicit end-of-string markers in the tree. Consider a suffix tree that corresponds to a string T. If we want to know the number of occurrences of a string u in T, we have to follow a path from the root such that the consecutive edge labels spell out u. Since the tree is path compressed (compact), we may end up somewhere halfway along some edge. Letting v be the remainder of the edge label, we can find the number of occurrences of uv by looking at the count field of the node that corresponds to uv. We know that no copy of u in T has ever been followed by something other than v, or there would be a corresponding branching point in the tree. But since there are no end markers in the tree, there may be a proper prefix w of v such that uw is a suffix of T, in which case u occurred more often than uv.
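The end-marker problem described above can be reproduced on a small string without building a suffix tree at all. The following is a minimal brute-force sketch, not the paper's algorithm: it simulates where a query u would land in a suffix tree without end markers and compares the count stored at that node with the true count of u. The function names (`occurrences`, `followers`, `node_below`, `hidden_suffix_occurrences`) and the example string are assumptions made for illustration, and u is assumed to be non-empty.

```python
def occurrences(text, u):
    """Count (possibly overlapping) occurrences of u in text by brute force."""
    return sum(text.startswith(u, i) for i in range(len(text) - len(u) + 1))


def followers(text, u):
    """Set of characters that directly follow some occurrence of u in text."""
    n, m = len(text), len(u)
    return {text[i + m] for i in range(n - m) if text.startswith(u, i)}


def node_below(text, u):
    """Label uv of the nearest explicit node at or below u.

    Simulates a suffix tree WITHOUT end markers: u is extended while exactly
    one character can follow it. The walk stops at a branching point (two or
    more followers) or at a leaf (no follower at all).
    """
    if not u or occurrences(text, u) == 0:
        raise ValueError("u must be a non-empty substring of text")
    label = u
    while True:
        nxt = followers(text, label)
        if len(nxt) != 1:
            return label
        label += next(iter(nxt))


def hidden_suffix_occurrences(text, u):
    """Number of proper prefixes w of v such that uw is a suffix of text."""
    v = node_below(text, u)[len(u):]
    return sum(text.endswith(u + v[:k]) for k in range(len(v)))


if __name__ == "__main__":
    T, u = "abcab", "ab"
    uv = node_below(T, u)  # the query ends mid-edge; the node below spells uv
    print("node below u:", uv)
    print("count stored for uv:", occurrences(T, uv))
    print("true count of u:", occurrences(T, u))
    print("occurrences missed without end markers:", hidden_suffix_occurrences(T, u))
```

For T = "abcab" and u = "ab", the node below u spells "abcab", which occurs once, while "ab" occurs twice: the second copy is a suffix of T (the case w = empty string) and leaves no trace in a tree without end markers. In this sketch the true count of u equals the count of uv plus the number of such hidden suffixes, because every copy of u is either followed by all of v or runs into the end of T partway through v.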