Compressed Subsequence Matching and Packed Tree Coloring
We present a new algorithm for subsequence matching in grammar compressed
strings. Given a grammar of size compressing a string of size and a
pattern string of size over an alphabet of size , our algorithm
uses space and or time. Here
is the word size and is the number of occurrences of the pattern. Our
algorithm uses less space than previous algorithms and is also faster for
occurrences. The algorithm uses a new data structure
that allows us to efficiently find the next occurrence of a given character
after a given position in a compressed string. This data structure in turn is
based on a new data structure for the tree color problem, where the node colors
are packed in bit strings.
Comment: To appear at CPM '1
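The "next occurrence of a given character after a given position" query can be illustrated on a plain, uncompressed string. The sketch below is not the paper's grammar-compressed data structure; it is a simple table-based analogue, and the names `build_next_table` and `is_subsequence` are hypothetical, chosen only for illustration.

```python
def build_next_table(text, alphabet):
    """Return nxt where nxt[i][c] is the smallest j >= i with text[j] == c,
    or len(text) if c does not occur at or after position i."""
    n = len(text)
    nxt = [None] * (n + 1)
    nxt[n] = dict.fromkeys(alphabet, n)
    for i in range(n - 1, -1, -1):
        row = dict(nxt[i + 1])   # copy the answers for position i + 1
        row[text[i]] = i         # the character at i is its own next occurrence
        nxt[i] = row
    return nxt


def is_subsequence(pattern, text, nxt):
    """Greedy subsequence test driven by next-occurrence queries."""
    n = len(text)
    pos = 0
    for ch in pattern:
        j = nxt[pos].get(ch, n)  # characters outside the alphabet never occur
        if j == n:
            return False
        pos = j + 1
    return True


text = "abracadabra"
nxt = build_next_table(text, set(text))
print(nxt[0]["c"], nxt[5]["b"])
print(is_subsequence("acdb", text, nxt), is_subsequence("dc", text, nxt))
```

This table uses space proportional to the text length times the alphabet size, which is exactly the kind of cost a compressed representation aims to avoid.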
On the size of DASG for multiple texts
We present a left-to-right algorithm that builds the automaton accepting all subsequences of a given set of strings. We prove that the number of states of this automaton can be quadratic when it is built on at least two texts.
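As a rough illustration of an automaton that accepts every subsequence of at least one string in a set, the sketch below uses a straightforward product construction: a state records, for each text, the position reached by greedy matching. This is an assumption made for illustration and is not the left-to-right algorithm of the paper; the automaton it produces is not guaranteed to be minimal, and the function names are hypothetical.

```python
from collections import deque


def build_subsequence_automaton(texts):
    """Return (start, transitions). Every reachable state is accepting."""
    alphabet = sorted(set("".join(texts)))
    start = tuple(0 for _ in texts)
    transitions = {start: {}}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for ch in alphabet:
            target = []
            found = False
            for text, pos in zip(texts, state):
                j = text.find(ch, pos)
                if j == -1:
                    # Nothing of this text can match any more.
                    target.append(len(text))
                else:
                    target.append(j + 1)
                    found = True
            if not found:
                continue  # no text contains ch from here: no transition
            target = tuple(target)
            transitions[state][ch] = target
            if target not in transitions:
                transitions[target] = {}
                queue.append(target)
    return start, transitions


def accepts(start, transitions, word):
    """True if word is a subsequence of at least one of the texts."""
    state = start
    for ch in word:
        state = transitions[state].get(ch)
        if state is None:
            return False
    return True


start, transitions = build_subsequence_automaton(["ab", "ba"])
print(len(transitions), accepts(start, transitions, "ba"), accepts(start, transitions, "aba"))
```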
The minimum dawg for all suffixes of a string and its applications
Abstract. For a string w over an alphabet Σ, we consider a composite data structure called the all-suffixes directed acyclic word graph (ASDAWG). ASDAWG(w) has |w| + 1 initial nodes, and the dag induced by all reachable nodes from the k-th initial node conforms with DAWG(w[k:]), where w[k:] denotes the k-th suffix of w. We prove that the size of the minimum ASDAWG(w) (MASDAWG(w)) is Θ(|w|) for |Σ| = 1, and is Θ(|w|^2) for |Σ| ≥ 2. Moreover, we introduce an on-line algorithm which directly constructs MASDAWG(w) for given w, whose running time is linear with respect to its size. We also demonstrate some application problems, beginning-sensitive pattern matching, region-sensitive pattern matching, and VLDC-pattern matching, for which ASDAWGs are useful.
Discovering best variable-length-don't-care patterns
Abstract. A variable-length-don't-care pattern (VLDC pattern) is an element of the set Π = (Σ ∪ {⋆})*, where Σ is an alphabet and ⋆ is a wildcard matching any string in Σ*. Given two sets of strings, we consider the problem of finding the VLDC pattern that is the most common to one, and the least common to the other. We present a practical algorithm to find such best VLDC patterns exactly, powerfully sped up by pruning heuristics. We introduce two versions of our algorithm: one employs a pattern matching machine (PMM) whereas the other uses an index structure called the Wildcard Directed Acyclic Word Graph (WDAWG). In addition, we consider a more generalized problem of finding the best pair ⟨q, k⟩, where k is the window size that specifies the length of an occurrence of the VLDC pattern q matching a string w. We present three algorithms solving this problem with pruning heuristics, using dynamic programming (DP), PMMs and WDAWGs, respectively. Although the two problems are NP-hard, we experimentally show that our algorithms run remarkably fast.
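To make the definition of a VLDC pattern concrete, the sketch below tests whether a pattern matches a whole string by translating it to a regular expression. It assumes the ASCII character `*` stands in for the wildcard ⋆ and that matching is anchored at both ends of the string; it uses neither PMMs nor WDAWGs. The helper `count_matches`, which only counts how many strings in each set a pattern matches, is hypothetical and is not the scoring function of the paper.

```python
import re


def vldc_to_regex(pattern, wildcard="*"):
    """Compile a VLDC pattern: the wildcard matches any string, including the empty one."""
    parts = pattern.split(wildcard)
    return re.compile(".*".join(re.escape(p) for p in parts), re.DOTALL)


def vldc_matches(pattern, text, wildcard="*"):
    """True if the whole text belongs to the language of the VLDC pattern."""
    return vldc_to_regex(pattern, wildcard).fullmatch(text) is not None


def count_matches(pattern, positives, negatives):
    """Count matched strings in each set (illustrative only)."""
    regex = vldc_to_regex(pattern)
    pos = sum(1 for s in positives if regex.fullmatch(s))
    neg = sum(1 for s in negatives if regex.fullmatch(s))
    return pos, neg


print(vldc_matches("*ab*ba*", "xxabyybazz"))
print(count_matches("*ab*", ["cab", "abc", "xyz"], ["ba", "aab"]))
```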