Abstract. Many string manipulations can be performed efficiently on suffix trees. In this paper a CRCW parallel RAM algorithm is presented that constructs the suffix tree associated with a string of n symbols in O(log n) time with n processors. The algorithm requires $O(n^2)$ space. However, the space needed can be reduced to $O(n^{1+\epsilon})$ for any $0 < \epsilon \le 1$, with a corresponding slow-down proportional to $1/\epsilon$. Efficient parallel procedures are also given for some string problems that can be solved with suffix trees.
The main problem addressed in this paper is the parallel construction of the suffix tree $T_x$ associated with input string x. For fixed alphabet size, the known sequential algorithms construct $T_x$ in linear time. The time bound becomes $O(n \log |I|)$ if the size of the alphabet $I$ is not a constant. Suffix trees and their companion structures support many string manipulations, such as performing on-line string matching, finding the longest repeated substring in a string, testing square-freedom of a string, finding all the squares or repetitions in a string, computing substring statistics with or without overlap, and performing exact or approximate pattern matching. A more detailed list of applications is given in [Ap-85]. In the context of parallel computation, various open problems revolve around $T_x$. The only previous parallel algorithm for constructing suffix trees runs in time O(log n) and uses $n^2/\log n$ processors.
We adopt the concurrent-read concurrent-write (CRCW) parallel random access machine (PRAM) model of computation. We use n processors which can simultaneously read from and write to a common memory with $O(n^2)$ locations. In case several processors seek access to the same memory location for write purposes, one of them succeeds but we do not know in advance which. The overall processors × time cost of our algorithm is O(n log n), which is optimal when $\log |I|$ is of the same order of magnitude as log n. Although the algorithm requires quadratic space, only O(n log n) locations need initialization. Moreover, we show later that the space can be reduced to $O(n^{1+\epsilon})$, for any chosen $0 < \epsilon \le 1$, with a corresponding slow-down proportional to $1/\epsilon$.

Our approach to the construction of $T_x$ consists of two main parts. In the first part, described in Section 2, an approximate version of the tree is built, called the skeleton. This part of the construction is reminiscent of an early approach to subquadratic pattern matching. The second part, described in Section 3, consists of refining the skeleton to transform it into $T_x$.

The processor allocation technique that is used for the refinement is of independent interest. Allocating processors to jobs is often a crucial task in the design of efficient parallel algorithms, and there are papers mainly devoted to overcoming allocation problems. For example, allocation problems have been solved for an algorithm for finding the maximum among n elements and for an algorithm for merging. Deterministic and randomized allocation schemes for list ranking are given in [CV86b] and [Vi-84], among others.
Section 4 contains a brief analysis of the various allocation techniques that can be used for a suffix tree. In Section 5 we show how the space used in our construction can be reduced. Finally, we describe in Section 6 how our suffix tree construction leads to the design of efficient parallel algorithms for on-line string matching, finding a longest repeated substring in a string, and performing approximate pattern matching.
2. Constructing the Skeleton Tree. From now on we will assume without loss of generality that n is a power of 2. We also extend x by appending to it n − 1 instances of the symbol #. We use x# to refer to this modified string. We now list some salient features of the skeleton tree $D_x$ of x, and then give a constructive definition of $D_x$. The basic structure of the skeleton for the string of Figure 1 is shown in Figure 2. The skeleton $D_x$ of x is a tree with n leaves. Each internal node of $D_x$ has at least two children. The edges in $D_x$ point from each node to its parent. Each leaf or internal node of $D_x$ is labeled with the descriptor of some substring of x# having starting positions in [1, n]. If node $\mu$ is labeled with descriptor (i, l), then $l = 2^q$ for some q, $0 \le q \le \log n$. If $\mu$ is a leaf then l = n. If $\mu$ is an internal node other than the root, then q is the stagenumber of $\mu$. If the label of $\mu$ corresponds to substring w of x, then we write $w = W(\mu)$, and we call $\mu$ the locus of w. A constructive definition for $D_x$ is as follows:
(i) The root of $D_x$ is the locus of the empty word. The root has $|I|$ sons, each one being the locus of a distinct symbol of $I$.

(ii) Assume that all nodes of stagenumber up to $l - 1 \ge 0$ have been inserted in $D_x$. To expand $D_x$ to stagenumber $l \le \log n$, consider the nodes of stagenumber $l - 1$ one by one. For a generic such node $\mu$, let $w = W(\mu)$. Now do the following:

1. If w = z# for some string z over $I$, then make $\mu$ the (unique) leaf labeled (i, n), where i is the first component of the old label of $\mu$.

2. Assume instead that w cannot be written as z# for some string z over $I$. Let $\{s_1, s_2, \ldots, s_k\}$ be a set of maximum cardinality among the sets formed by distinct substrings of x# with the properties: $|s_t| = 2|w|$ and w is a prefix of $s_t$, t = 1, 2, ..., k. (Thus, if i is the starting position of an occurrence of w in x#, then there is some $s_t$ also starting at i. In the string of Figure 1, for example, we have that each occurrence of w = ab in x# extends into either $s_1$ = abaa, or $s_2$ = aba#, or $s_3$ = abab. On the other hand, w = aa occurs in x# only as a prefix of $s_1$ = aaba. Note that, in general, an $s_t$ may occur more than once in x#.) We distinguish two cases.

(a) k > 1. We create k sons of $\mu$, $\nu_1, \nu_2, \ldots, \nu_k$, and make $\nu_t$ the locus of $s_t$, t = 1, 2, ..., k.

(b) k = 1, i.e., w occurs always as a prefix of the same substring $s_1$. We make $\mu$ the locus of $s_1$.

Fig. 2. Basic structure of the skeleton tree $D_x$ for the string of Figure 1. Solid points are used to mark nonbranching nodes. Such nodes are introduced while constructing $D_x$, but they are also removed during the construction. Node labels are not reported in the figure.
Observe that no two nodes of $D_x$ can have the same label. A natural parallel construction of $D_x$ is based on the above definition. We describe such a construction in detail, to acquaint the reader with the basic concurrent steps which are used throughout this paper.
We use n processors $p_1, p_2, \ldots, p_n$, where i is the serial number of processor $p_i$. At the beginning, processor $p_i$ is assigned to the ith position of x, i = 1, 2, ..., n.
It is convenient to think of each processor as being assigned two segments of the common memory, each segment consisting of log n + 1 cells. The segments assigned to $p_i$ are called $ID_i$ and $NODE_i$, respectively. By the end of the computation, $ID_i[q]$ (i = 1, 2, ..., n; q = 0, 1, ..., log n) contains (the first component of) a descriptor for the substring of x# of length $2^q$ which starts at position i in x#, with the constraint that all the occurrences of the same substring of x get the same descriptor. If, for some value of q < log n, $NODE_i[q]$ is not empty, then it represents a node $\mu$ of stagenumber q in $D_x$, as follows: the field $NODE_i[q]$.LABEL is a replica of $ID_i[q]$, and the field $NODE_i[q]$.PARENT points to the location of the parent of $\mu$. Finally, $NODE_i[\log n]$ stores the leaf labeled (i, n) and thus is nonempty for i = 1, 2, ..., n. For convenience, we extend the notion of ID to all positions i > n through the convention: $ID_i[q] = n + 1$ for i > n.

The computation makes crucial use of a bulletin board (BB) of n × (n + 1) locations in the common memory. All processors can simultaneously write to BB and simultaneously read from it. We use the following concurrent-write convention. In case several processors try simultaneously to write into the same memory location, one of them succeeds but we do not know in advance which. In the following we call winner(i) the index of the processor which succeeds in writing to the location of the common memory attempted by $p_i$.
Procedure Skeleton-Tree takes as input the string x and a location of the common memory called ROOT, and computes the entries of the arrays $NODE_i[q]$, $ID_i[q]$ (i = 1, 2, ..., n; q = 0, 1, ..., log n). The procedure consists of some initializations, which implement point (i) in the definition of $D_x$, and log n main iterations, which implement point (ii).
The initializations are as follows. In parallel, all processors initialize their NODE and ID arrays. Next, processors facing the same symbol of $I$ attempt to write their serial number in the same location of BB. Say, if $x_i = s \in I$, processor $p_i$ attempts to write i in BB[1, s].

The correctness of the procedure follows by straightforward induction. Since no two n-symbol substrings of x# are identical, processor $p_i$ (i = 1, 2, ..., n) must be occupying the "leaf" $NODE_i[\log n]$ at the end of the computation. The time complexity is obviously O(log n). Note that $NODE_i[q]$.LABEL not empty implies $NODE_i[q]$.LABEL = $(i, 2^q)$, that is, the label of a node, when defined, is nothing but the address of that node. Although the LABEL fields are entirely redundant so far, assuming this node format from the start simplifies the rest of our presentation. Finally, we remark that BB need not be initialized.
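The computation of the ID tables can be illustrated by a small sequential simulation. The sketch below is not the procedure itself: it assumes that iteration q names the substring of length $2^q$ at position i through a write contest on the BB cell indexed by the pair ($ID_i[q-1]$, $ID_{i+2^{q-1}}[q-1]$), which is how Section 5 describes the pairs of IDs. The function name skeleton_ids, the dictionary standing in for BB, and the shuffled write order that models the arbitrary winner are all assumptions made for illustration. NODE entries and parent pointers are omitted.

```python
import random

def skeleton_ids(x, seed=0):
    """Simulate the ID tables: returns ID[i][q] for 1-based i, q = 0..log n."""
    n = len(x)
    assert n > 0 and n & (n - 1) == 0, "n is assumed to be a power of 2"
    assert "#" not in x, "'#' is reserved for padding"
    rng = random.Random(seed)
    log_n = n.bit_length() - 1
    ID = {i: [None] * (log_n + 1) for i in range(1, n + 1)}

    def get(i, q):
        # Convention from the text: ID_i[q] = n + 1 for every i > n.
        return ID[i][q] if i <= n else n + 1

    def contest(keys):
        """keys[i] is the BB cell that processor i writes to.
        Returns, for each i, the serial number of the winner on that cell."""
        order = list(keys)
        rng.shuffle(order)            # arbitrary CRCW: any writer may win
        bb = {}                       # stands in for an uninitialized BB
        for i in order:
            bb[keys[i]] = i           # the last write survives
        return {i: bb[keys[i]] for i in keys}

    # Initialization: processors facing the same symbol compete on BB[1, s].
    win = contest({i: (1, x[i - 1]) for i in range(1, n + 1)})
    for i, w in win.items():
        ID[i][0] = w

    # Iteration q: combine the names of the two halves of length 2^(q-1).
    for q in range(1, log_n + 1):
        half = 1 << (q - 1)
        keys = {i: (get(i, q - 1), get(i + half, q - 1))
                for i in range(1, n + 1)}
        win = contest(keys)
        for i, w in win.items():
            ID[i][q] = w
    return ID
```

In this simulation two positions receive the same $ID[q]$ exactly when the substrings of x# of length $2^q$ starting there are equal, and the n names at stage log n are pairwise distinct, which matches the remark that every processor ends up on its own leaf.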
3. Refining $D_x$. By the end of the construction of $D_x$, processor $p_i$ will be occupying leaf i, i = 1, 2, ..., n. Prior to starting the transformation of $D_x$ into $T_x$, the labels of all nodes of $D_x$ have to be modified as follows. Recall that the current LABEL of a node $\mu$ is a starting position of $W(\mu)$ in x# which is also the address of this node. The modified label (m-label) to be constructed for $\mu$ is any pair (i, l) such that, letting $W(\mu) = W(\mathrm{parent}(\mu)) \cdot w$, it is $l = |w|$ and i is the starting position of an occurrence of w in x#. In the following, we call the m-labeled skeleton the tree that is obtained by substituting every label of $D_x$ with a consistent m-label. Setting aside the orientation of edges, the main difference between $T_x$ and the m-labeled skeleton $D_x$ is that in $T_x$ there cannot be two sibling nodes such that their labels describe two substrings of x having a common prefix (i.e., $D_x$ is not a trie). However, the m-labeled $D_x$ shares with $T_x$ the properties (1), (3), and (4) listed in defining the latter, provided x# is used there in the place of x.
A processor can trivially compute the m-label of $\mu$ in constant time knowing the LABEL of $\mu$, and the stagenumbers, say q and q', of $\mu$ and parent($\mu$), respectively. Formally, if j is the LABEL of $\mu$, then $(j + 2^{q'}, 2^q - 2^{q'})$ is the m-label of $\mu$. The n processors can produce all m-labels in log n parallel steps. Using the parent pointers, the processors migrate toward ROOT with a synchronous pace based on stagenumbers: the m-labels of all children of nodes with the same stagenumber are computed at the same time. (Recall that the difference in stagenumber between a node and its parent is not necessarily 1.) At the beginning, all processors occupying leaves which are children of nodes of stagenumber log n − 1 change the labels of these nodes into m-labels. Next, the processors compete for the common parent node, say, by attempting to simultaneously write on it the labels (addresses) of the nodes which they currently occupy. The winners are marked "free": they ascend to the parent node where they will perform the necessary label adjustment at the appropriate stage. The losers simply take a record of the (old) label used by the winner. The (q − 1)th iteration involves all free processors on nodes with a stagenumber of q or higher. The operation is the same as above.
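As a small worked instance of the m-label formula, the following sketch computes the pair $(j + 2^{q'}, 2^q - 2^{q'})$ directly. The function name m_label is hypothetical. The treatment of children of ROOT (passing None as the parent stagenumber and returning $(j, 2^q)$, since ROOT is the locus of the empty word) is an assumption that the text does not spell out.

```python
def m_label(j, q, q_parent):
    """m-label (start, length) of a node with LABEL j and stagenumber q.

    q_parent is the stagenumber of the parent; None is used here (as an
    assumption) for a parent that is ROOT, whose word is empty.
    """
    if q_parent is None:
        return (j, 2 ** q)
    assert 0 <= q_parent < q, "a parent has a strictly smaller stagenumber"
    # W(node) has length 2^q and W(parent) is its prefix of length 2^q'.
    return (j + 2 ** q_parent, 2 ** q - 2 ** q_parent)
```

For example, a node with LABEL 3 and stagenumber 3 whose parent has stagenumber 1 gets the m-label (5, 6): the parent accounts for the first 2 symbols and the remaining 6 symbols start at position 5.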
A by-product of the m-label construction process is a mapping that assigns some leaves and internal nodes to processors in such a way that the following property is met.

PROPERTY 1. If a node other than ROOT has k children, then precisely k − 1 of the children have been assigned a processor. Moreover, each one of the k − 1 processors knows the address of the unique sibling without a processor.
The proof of Property 1 is straightforward. Let now (i, l) and (j, m) be the m-labels of two sibling nodes $\mu$ and $\nu$ of $D_x$, and let q be the stagenumber of parent($\mu$) = parent($\nu$). Assuming a fixed-size alphabet, the transformation of the m-labeled $D_x$ into $T_x$ is carried out in two steps. First, a tree is produced that is identical to $T_x$ save the fact that all edges are directed upward, as in $D_x$. Next, the directions of all edges are reversed.
The first and more important step is actuated by producing log n − 1 consecutive refinements of $D_x = D^{(\log n - 1)}$. The qth such refinement is denoted by $D^{(\log n - q - 1)}$, q = 1, 2, ..., log n − 1. $D^{(\log n - q - 1)}$ is a labeled tree with n leaves and no unary nodes which has much the same structure of the m-labeled $D_x$. In particular, properties (1), (3), and (4) of the definition of $T_x$ hold for any refinement of $D_x$. The refinement $D^{(0)}$ is identical to $T_x$ except for the edge directions. Figure 3 shows the second refinement for our example skeleton.

We now give rigorous definitions for $D^{(\log n - q - 1)}$, q = 1, 2, ..., log n − 1. We do so by specifying how $D^{(\log n - q - 1)}$ is obtained from $D^{(\log n - q)}$ for q = 1, 2, ..., log n − 1. For simplicity, we use k henceforth to denote log n − q − 1. First, two more definitions are needed. A nest is any set formed by all children of some node in $D^{(k)}$. Let (i, l)

Assume now that all refinements down to $D^{(k)}$, $\log n - 1 \ge k > 0$, have already been produced, and that $D^{(k)}$ meets the following condition(k):

(i) $D^{(k)}$ is a labeled tree with n leaves and no unary nodes.
(ii) $D^{(k)}$ enjoys properties (1), (3), and (4) of the definition of $T_x$.
(iii) $D^{(k)}$ is labeled in such a way that no pair of labels of nodes in the same nest admits a refiner of size $2^k$.
Observe that condition(log n − 1) is met trivially by $D_x$. Moreover, part (iii) of condition(0) implies that reversing the direction of all edges of $D^{(0)}$ would change $D^{(0)}$ into a digital-search tree that stores the collection of all suffixes of x. Clearly, such a trie fulfills precisely the definition of $T_x$.
We now define $D^{(k-1)}$ as the tree obtained by transforming $D^{(k)}$ as follows. The manipulations that transform $D^{(k)}$ into $D^{(k-1)}$ are performed synchronously on all and only the eligible nests of $D^{(k)}$, i.e., on those nests that might admit a refiner of size $2^{k-1}$. Clearly, the only eligible nests in $D_x$ are those whose parent nodes have stagenumber log n − 1. There is only one such nest in the skeleton of Figure 2, namely, that formed by leaves 1 and 9 (however, this nest does not have a refiner of size $2^{\log n - 2} = n/4$). The nests of nodes whose parents have stagenumber log n − 2 become eligible at the inception of the second refining stage (see Figure 3), and so on.
Assume that, in $D^{(k)}$, all nodes that are parents of currently eligible nests are suitably marked. Let $(i_1, l_1), (i_2, l_2), \ldots, (i_m, l_m)$ be the set of all labels in some eligible nest of $D^{(k)}$. Let $\nu$ be the parent node of that nest. The nest is refined in two steps.
Step 1. (1) Create a new parent node $\mu$ for the nodes in that class, and make $\mu$ a son of $\nu$.
(2) Set the LABEL of $\mu$ to $(i, 2^{k-1})$, where i is the first component of the split-label of all nodes in the class.
(3) Consider each child of $\mu$. For the child whose current LABEL is $(i_j, l_j)$,
Step 2. If more than one class resulted from the partition, then stop. Otherwise, let C be the unique class resulting from the partition. It follows from assumption (i) on $D^{(k)}$ that C cannot be a singleton class. Thus a new parent node $\mu$ as above was created for the nodes in C during step 1. Make $\mu$ a child of the parent of $\nu$ and set the LABEL of $\mu$ to $(i, l + 2^{k-1})$, where (i, l) is the label of $\nu$. Assume now that, in $D^{(k)}$, both the nest of $\nu$ and that of parent($\nu$) are eligible.
We claim that, in $\hat{D}^{(k)}$, either the parent of $\nu$ has not changed and it is a branching node, or it has changed but is still a branching node. Indeed, by definition of

In order to specify which nests of $D^{(k-1)}$ are eligible, we need to complete the rules for eligibility. In the light of the preceding discussion, it is easy to see that, once a node has become the parent of an eligible nest, it will not lose this property through the subsequent refinements (as long as it is not eliminated from the tree), even though the nest itself may undergo changes. Moreover, the nests of nodes created in producing $D^{(k-1)}$ are eligible for the transition from $D^{(k-1)}$ to $D^{(k-2)}$.
If the nest of $D^{(k)}$ rooted at $\nu$ had a row R of BB all to itself, then the transformation undergone by this nest in step 1 can be accomplished by m processors in constant time, m being the number of children. Each processor handles one child node. It generates the split-label for that node using its LABEL and the ID tables. Next, the processors use the row of BB assigned to the nest and the split-labels to partition themselves into equivalence classes: each processor in the nest whose split-label has first component i competes to write the address of its node in the ith location of R. A representative processor is elected for each class in this way. Singleton classes can be trivially spotted through a second concurrent write restricted to losing processors (after this second write, a representative processor which still reads its node address in R knows itself to be in a singleton class). The representatives of each nonsingleton class now create the new parent nodes, label them with the first component of their split-label, and make each new node accessible by all other processors in the class. To conclude step 1, the processors in the same class update the labels of their nodes.
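The two write rounds on row R can be mimicked sequentially as follows. This is only an illustrative sketch: the function name partition_nest is hypothetical, a dictionary stands in for the (uninitialized) row R, and a shuffled write order models the fact that the winner of a concurrent write is arbitrary. The split-labels are taken as given, since their computation from the LABEL and ID tables is not reproduced here.

```python
import random

def partition_nest(split_first, seed=0):
    """split_first maps a node address to the first component of its split-label.

    Returns (rep, singletons): rep[v] is the representative elected for the
    class of node v, and singletons is the set of representatives that are
    alone in their class.
    """
    rng = random.Random(seed)
    nodes = list(split_first)
    rng.shuffle(nodes)                    # arbitrary order of concurrent writes
    R = {}                                # the row of BB assigned to the nest

    # First concurrent write: every processor writes its node address
    # into location R[i], where i is the first component of its split-label.
    for v in nodes:
        R[split_first[v]] = v
    rep = {v: R[split_first[v]] for v in nodes}

    # Second concurrent write, restricted to the losers of the first round.
    for v in nodes:
        if rep[v] != v:
            R[split_first[v]] = v

    # A representative that still reads its own address had no competitor.
    singletons = {r for r in set(rep.values()) if R[split_first[r]] == r}
    return rep, singletons
```

With split-labels {10: 3, 11: 3, 12: 5, 13: 3}, nodes 10, 11, and 13 elect a common representative among themselves, while node 12 is detected as a singleton class.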
For step 2, the existence of more than one equivalence class needs to be tested. This is done through a competition of the representatives which uses the root of the nest as a common write location, and follows the same mechanism as in the construction of $D_x$. If only one equivalence class was produced in step 1, then its representative performs the adjustment of the label prescribed by step 2.
The above discussion suggests that, once each node of, say, $D_x = D^{(\log n - 1)}$ is assigned to a distinct processor, $D^{(\log n - 2)}$ could be produced in constant time. The difficulty, however, is how to assign the nodes (notably, the newly inserted ones) of $D^{(\log n - 2)}$ in constant time. It turns out that bringing fewer processors into the game leads to a crisp (re-)assignment strategy.
By definition, $D^{(k)}$ does not have unary nodes. It is seen then that the manipulations of steps 1 and 2 can be operated in constant time by assigning m − 1 processors, rather than m, to a nest of m nodes. The only additional assumption to be made is that, at the beginning, all m − 1 processors have access to the unique node which lacks a processor of its own. Before starting step 1, the processors elect one of them to serve as a substitute for the missing processor. After each elementary step, this simulator "catches up" with the others.
In view of Property 1, this shows that n processors can achieve the first refinement of $D_x$. As to the assignment of the rows of BB to the nodes of $D^{(k)}$, simply assign the ith row to processor $p_i$. Then, whenever $p_i$ is in charge of the simulation of the missing processor in a nest, its BB row is used by all processors in that nest.
For any given value of k, let a legal assignment of processors to the nodes of $D^{(k)}$ be an assignment that enjoys Property 1.

THEOREM 2. Given a legal assignment of processors for $D^{(k)}$, a legal assignment of processors for $D^{(k-1)}$ can be produced in constant time.

PROOF. We give first a constant-time policy that reallocates the processors in each nest of $D^{(k)}$ on the nodes of $\hat{D}^{(k)}$. We then show that our policy leads to a legal assignment for $D^{(k-1)}$.

Let $\nu$ be the parent of a nest of $D^{(k)}$. A node to which a processor has been assigned is called pebbled. By hypothesis, all but one of the children of $\nu$ are pebbled. Also, all children of $\nu$ are nodes of $\hat{D}^{(k)}$. In the general case, some of the children of $\nu$ in $D^{(k)}$ are still children of $\nu$ in $\hat{D}^{(k)}$, while others became children of newly inserted nodes $\mu_1, \mu_2, \ldots, \mu_t$. Our policy is as follows. At the end of step 1, for each node $\mu_r$ of $\hat{D}^{(k)}$ such that all children of $\mu_r$ are pebbled, one pebble (say, the representative processor) is chosen among the children and passed on to the parent. In step 2, whenever a pebbled node $\nu$ is removed, then its pebble is passed down to the (unique) son $\mu$ of $\nu$ in $\hat{D}^{(k)}$. Clearly, our policy can be implemented in constant time. To prove its correctness, we need to show that it generates a legal assignment for $D^{(k-1)}$. It is easy to see that if node $\nu$ is removed in the transition from $\hat{D}^{(k)}$ to $D^{(k-1)}$, then the unique son $\mu$ of $\nu$ in $\hat{D}^{(k)}$ is unpebbled in $\hat{D}^{(k)}$. Thus, in step 2, it can never happen that two pebbles are moved onto the same node of $D^{(k-1)}$.
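The step 1 part of the pebbling policy can be checked on small cases with the sketch below. It is a sequential illustration under stated assumptions, not the parallel procedure: the function name pass_pebbles, the representation of a nest as a list of classes, the tuple ("new", r) used as the identity of a newly created parent, and the choice of the first node of a class as the one giving up its pebble are all hypothetical.

```python
def pass_pebbles(classes, pebbled):
    """Apply the step 1 policy to one nest.

    classes: list of lists of child ids (the equivalence classes of the nest).
    pebbled: set of pebbled child ids (all children of the nest but one).
    Returns (new_nest, new_pebbled): the children of the nest's parent after
    step 1, and the set of pebbled nodes after pebbles have been passed up.
    """
    pebbled = set(pebbled)
    new_nest = []
    for r, cls in enumerate(classes):
        if len(cls) == 1:
            new_nest.append(cls[0])        # singleton: node keeps its parent
            continue
        parent = ("new", r)                # newly inserted node for this class
        new_nest.append(parent)
        if all(c in pebbled for c in cls):
            pebbled.discard(cls[0])        # one child gives up its pebble ...
            pebbled.add(parent)            # ... which moves to the new parent
    return new_nest, pebbled
```

In a nest with classes [[1, 2, 3], [4], [5, 6]] where only node 5 is unpebbled, the new parent of {1, 2, 3} receives a pebble and the new parent of {5, 6} does not, so each of the three resulting nests has exactly one unpebbled node.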
By definition of $D^{(k)}$, the nest of node $\nu$ cannot give rise to a singleton class. Thus at the end of step 1, either (Case 1) the nest has been refined in only one (nonsingleton) class, or (Case 2) it has been refined in more than one class, some of which are possibly singleton classes. Before analyzing these two cases, define a mapping f from the children in the nest of the generic node $\nu$ of $D^{(k)}$ into nodes of $D^{(k-1)}$ as follows. If node $\mu$ is in the nest of $\nu$ and also in $D^{(k-1)}$ then set $\mu' = f(\mu) = \mu$; if instead $\mu$ is not in $D^{(k-1)}$, let $\mu' = f(\mu)$ be the (unique) son of $\mu$ in $\hat{D}^{(k)}$.

In Case 1, exactly one node $\mu$ is unpebbled in $\hat{D}^{(k)}$. All the nodes $\mu'$ are siblings in $D^{(k-1)}$ and, by our policy, $\mu'$ is pebbled in

In Case 2, node $\nu$ is in $D^{(k-1)}$. Any node $\mu$ in the nest of $\nu$ is in $\hat{D}^{(k)}$. At the end of step 2, the pebble of node $\mu$ will go untouched unless $\mu$ is in a nonsingleton equivalence class. Each such class generates a new parent node, and a class passes a pebble on to that node only if all the nodes in the class were pebbled.

Such labels supply the branching information needed in the course of a downward search in $T_x$ of a string w. We examine two different ways of defining such information. More precisely, let (i, l) be the label of the upward edge $(\mu, \nu)$. One way is to label the matched downward edge $(\nu, \mu)$ with the symbol of $I$ that corresponds to $x_i$. This entails that the branching decision at each node be driven by the symbol that occupies a certain position of w. The second way is to use the value of $ID_i[0]$. To use this information during a search, an auxiliary table must have been precomputed that maps each symbol of $I$ into its corresponding ID.
In either case, the set of downward labels of each internal node of $T_x$ can be stored using a linear list, a binary trie, or an array. I log[I I ) extends also to the list implementation of the symbol-based downward labels. We describe below the trie implementation of symbol-labels and the array implementation of ID-labels, since all the others can be derived from one of these two quite easily.
We show how to implement symbol-based downward labels with tries, i.e., how to replace each original internal node of $D^{(0)}$ with a binary trie indexing to a suitable subset of $I$. This transformation can be obtained in $O(\log |I|)$ time using the legal assignment of processors that holds on $D^{(0)}$ at completion. We outline the basic mechanism and leave the details as an exercise. We simply perform $\log |I|$ further refinements of $D^{(0)}$, for which the ID tables are not needed.
In fact, the best descriptor for a string of $\log |I|$ bits or less is the string itself. Thus, we let the processors in each nest partition their associated nodes into finer and finer equivalence classes, based on the bit-by-bit inspection of their respective symbols. Clearly, a processor occupying a node with upward label (i, l) will use symbol $x_i$ in this process. Whenever a new branching node $\nu$ is created, one of the processors in the current nest of $\nu$ climbs to $\mu$ = father($\nu$) and assigns the appropriate downward label to $\mu$. At the end, the processors assign downward labels to the ultimate fathers of the nodes in the nest.
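The bit-by-bit partition can be pictured with a sequential sketch that turns one nest into a binary trie of depth $\log |I|$. Everything named here is an assumption for illustration: the functions build_symbol_trie and descend, the encoding of each child's first symbol as an integer code of a fixed number of bits, and the nested dictionaries used as trie nodes. The recursion replaces the rounds of concurrent writes performed by the processors.

```python
def build_symbol_trie(codes, bits):
    """codes maps each child of a node to the integer code (< 2**bits) of the
    first symbol of its upward label. Returns a trie as nested dicts keyed
    by 0/1, whose leaves are the children themselves."""
    def build(children, b):
        if b < 0:
            (child,) = children       # siblings start with distinct symbols
            return child
        groups = {0: [], 1: []}
        for c in children:            # inspect bit b, most significant first
            groups[(codes[c] >> b) & 1].append(c)
        return {bit: build(g, b - 1) for bit, g in groups.items() if g}
    return build(list(codes), bits - 1)

def descend(trie, code, bits):
    """Follow the bits of a symbol code; return the child reached or None."""
    node = trie
    for b in range(bits - 1, -1, -1):
        bit = (code >> b) & 1
        if bit not in node:
            return None
        node = node[bit]
    return node
```

A downward search then spends $\log |I|$ bit tests per original node, which is the price of replacing an array of size $|I|$ by a trie.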
Finally, we discuss the array implementation of ID-based downward labels. This representation is needed in Section 6. We assign a vector of size n, called

5. Reducing the Space. Both the preparation of $D_x$ and its subsequent refinements need $O(n^2)$ space. Procedure Skeleton-Tree needs $O(n^2)$ space due to the array BB, which is used at each iteration q to partition the composite labels (TIDs) into equivalence classes. In any refining stage, the nest of each node $\nu$ needs a distinct array of n locations for partitioning the split-labels of the nodes in the nest into equivalence classes. In this section we show that both problems can be solved using only $O(n^{1+\epsilon})$ space, for any $0 < \epsilon \le 1$, at the expense of a corresponding slow-down proportional to $1/\epsilon$.
We analyze the procedure Skeleton-Tree first. Consider some substring w of x of length $2^q$, with q > 0, and let $w = w_1 w_2$ with $|w_1| = |w_2| = 2^{q-1}$. Let $N_1$ and $N_2$ be the IDs assigned by the procedure to $w_1$ and $w_2$, respectively. Recall that each of $N_1$, $N_2$ is an integer between 1 and n. The difficulty in creating the ID for w is that the pair $(N_1, N_2)$ may assume $n^2$ values.
We show how to solve this problem using only $O(n^{1+\epsilon})$ space. We assume for simplicity that $n^\epsilon$ is an integer, but it is easy to generalize our solution to the cases where $n^\epsilon$ is not an integer. We focus on computing the ID of the string w above. The same manipulations are performed in parallel for all substrings of x of length $2^q$. The idea is to express $N_2$ by its representation in the base $n^\epsilon$. The coefficients $(a_1, a_2, \ldots, a_{1/\epsilon})$ (least-significant

The output of subiteration $\delta$ is an ID for the pair consisting of the left substring and the $\delta$-tuple $(a_1, \ldots, a_\delta)$. This ID is a number between 1 and n.
The concurrent-write contests that take place within any subiteration of iteration q of the Skeleton-Tree procedure are similar to the original ones. The only difference is that now an auxiliary array of size $(n + 1) \times n^\epsilon$ suffices. Details are left to the reader. For any fixed $0 < \epsilon \le 1$ the total space requirements are bounded by $O(n^{1+\epsilon})$ and the running time by $O((1/\epsilon) \log n) = O(\log n)$.
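The subiterations can be simulated sequentially as in the sketch below, which names pairs $(N_1, N_2)$ one base-$n^\epsilon$ digit of $N_2$ at a time. The function names digits and name_pairs are hypothetical, a dictionary keyed by (current ID, digit) stands in for the auxiliary array of size $(n + 1) \times n^\epsilon$, and a shuffled write order models the arbitrary winner. The sketch also assumes that $N_2$ fits in the chosen number of digits.

```python
import random

def digits(value, base, count):
    """Digits of value in the given base, least significant first."""
    out = []
    for _ in range(count):
        out.append(value % base)
        value //= base
    return out

def name_pairs(pairs, base, count, seed=0):
    """pairs[i] = (N1, N2) held by processor i; base plays the role of
    n^eps and count the role of 1/eps. Returns a name per processor such
    that two names are equal exactly when the two pairs are equal."""
    rng = random.Random(seed)
    procs = list(pairs)
    assert all(0 <= pairs[i][1] < base ** count for i in procs)
    cur = {i: pairs[i][0] for i in procs}      # start from the ID of the left half
    dig = {i: digits(pairs[i][1], base, count) for i in procs}
    for d in range(count):                     # one subiteration per digit
        table = {}                             # auxiliary (n + 1) x n^eps array
        order = procs[:]
        rng.shuffle(order)
        for i in order:
            table[(cur[i], dig[i][d])] = i     # arbitrary winner survives
        # The winner's serial number names (left half, a_1, ..., a_{d+1}).
        cur = {i: table[(cur[i], dig[i][d])] for i in procs}
    return cur
```

Each subiteration only distinguishes $n^\epsilon$ digit values per current name, which is why a table of $(n + 1) \times n^\epsilon$ cells is enough, at the cost of $1/\epsilon$ rounds instead of one.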
Our space reduction technique extends easily to the refining stages. We outline the main changes and omit the tedious details. With reference to the generic intermediate tree $D^{(k)}$, we focus on the processors that handle the nest of some node $\nu$. Recall that, in order to refine this nest, the processors partition their underlying nodes into equivalence classes, according to the first component of the split-labels. For this purpose, a row of BB was used in our original construction, namely, the row assigned to the representative processor of the nest. Assume instead that processor $p_i$, i = 1, ..., n, is assigned only an array LITTLE-BB$_i$ consisting of $n^\epsilon$ locations of the common memory, and let $p_j$ be the representative of the nest of $\nu$. We perform the partition of the nest in $1/\epsilon$ subiterations as follows. First, all processors in the nest compute the representation of the first component of their split-labels in the base $n^\epsilon$. There are $n^\epsilon$ possible values for the first coefficient of this representation. Thus, the processors in the nest can partition themselves in $n^\epsilon$ classes through a concurrent-write contest on LITTLE-BB$_j$. In this way, each class elects a representative processor. The LITTLE-BB arrays associated with these representatives are similarly used to obtain a second refinement of the classes. This refinement is based on the second coefficients in the representations of the split-labels in base $n^\epsilon$. It should be clear how to proceed with the remaining $1/\epsilon - 2$ subiterations. For any fixed $0 < \epsilon \le 1$ the total space requirements are bounded by $O(n^{1+\epsilon})$ and the running time by $O((1/\epsilon) \log n) = O(\log n)$.
If the suffix tree is implemented by $OUT_\nu$ vectors, as needed in the next section, it would require $O(n^2)$ space. However, we can reduce the space to $O((1/\epsilon)\, n^{1+\epsilon}) = O(n^{1+\epsilon})$ using the ideas of the space reduction described above.
6. Applications. In this section we describe some applications of our parallel suffix tree construction in the design of efficient parallel algorithms.

Preprocessing. Construct the suffix tree of x#. In the course of the computation we save:
(1) The log n BBs used in log n iterations of the procedure Skeleton-Tree. The computation of this step takes O(log n) time using n processors.
On-line Processing of the Queries.
Step 1. Recall that in Section 2 we computed $ID_i[q]$ (i = 1, ..., n; q = 0, ..., log n) for the string x#.

Step 3. We find a node $\nu$ in the suffix tree such that z is a prefix of $W(\nu)$ ( iterations is that $\nu$ is a node in $D^{(q+1)}$ and z' is a substring whose length is less than $2^{q+1}$. Our goal is to check whether z' follows an occurrence of $W(\nu)$. We work on $D^{(q)}$. There are two possibilities:

1. The node $\nu$ appears in $D^{(q)}$. Possibility 1 has two subpossibilities:

1.1. $2^q$ is larger than the length of z'. In this case we do nothing and the input parameters of the present iteration become the input parameters of the next iteration.

1.2. $2^q$ is less than or equal to the length of z'. Assume that z' starts at position j of z and b is the value stored in $PID_j[q]$. If the entry $OUT_\nu[b]$ is empty then z does not occur in x. Otherwise, the input parameters of the next iteration will be the suffix of z' starting at position $2^q + 1$ and the node pointed to by $OUT_\nu[b]$.

2. The node $\nu$ does not appear in $D^{(q)}$. This means that $\nu$ had only one son in $\hat{D}^{(q+1)}$ and so it was omitted from $D^{(q)}$ (in step 2 of refining $\hat{D}^{(q+1)}$). Let $\mu$ be the single son of $\nu$ in $\hat{D}^{(q+1)}$. Possibility 2 has two subpossibilities:

2.1. $2^q$ is larger than the length of z'. Assume that the LABEL of $\mu$ in $D^{(q)}$ is (i, l). In this case z' occurs in x if and only if z' is a prefix of $x_{i+l-2^q+1}, \ldots, x_{i+l}$. We check this letter by letter in log m time using m/log m processors.

2.2. $2^q$ is less than or equal to the length of z'. We compare $ID_{i+l-2^q+1}[q]$ (the unique name of $x_{i+l-2^q+1}, \ldots, x_{i+l}$) to the unique name of the prefix of z' whose length is $2^q$. If these names are different then z does not occur in x. Otherwise, the input parameters of the next iteration will be the suffix of z' starting at position $2^q + 1$ and the node $\mu$.
Remarks. (a) We did not initialize the vectors $OUT_\nu$, so a query may yield a false positive answer. To avoid mistakes, every time we get a positive answer we explicitly check whether z really appears in x at the position given in the answer. This can be done in $\lfloor \log m \rfloor$ time using m/log m processors as a last step.
(b) The on-line computation can be extended to obtain additional information about z. For example:
(1) What is the number of occurrences of z in x?
(2) In case there is more than one occurrence, what is the starting position of the first (or last or all) occurrence(s) of z in x?
(3) What is the longest prefix of z which occurs in x?
Complexity. The preprocessing takes O(log n) time using n processors. Answering a query takes O(log m) time using m/log m processors.

PROBLEM 2. Finding the longest repeated substring in a string. Given a string x find the longest substring which occurs in x more than once.

SOLUTION. $W(\nu)$ is defined in Section 2. Let $|W(\nu)|$ be the length of $W(\nu)$.

Step 1. Construct the suffix tree of x# and find $|W(\nu)|$ of each node $\nu$.

Step 2. Find the internal node $\nu$ with the maximum $|W(\nu)|$ field. The substring represented by the path from the root to $\nu$ is the longest repeated substring in x.
Step 2 can be carried out using a known parallel algorithm for finding the maximum.
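The characterization used in step 2 (the deepest branching node spells the longest repeated substring) can be checked on small inputs with the sequential sketch below. It deliberately swaps the parallel suffix tree construction for a naive, uncompacted suffix trie that takes quadratic time and space, and it appends a single sentinel rather than n − 1 copies of #. The function name and the nested-dictionary trie are assumptions made for illustration.

```python
def longest_repeated_substring(x, sentinel="#"):
    """Return a longest substring occurring at least twice in x (overlaps
    allowed), via the deepest branching node of a naive suffix trie."""
    assert sentinel not in x, "the sentinel must not occur in x"
    s = x + sentinel

    # Insert every suffix of s into an uncompacted trie of nested dicts.
    root = {}
    for i in range(len(s)):
        node = root
        for ch in s[i:]:
            node = node.setdefault(ch, {})

    # Depth-first search for the deepest node with at least two children.
    best = ""
    stack = [(root, "")]
    while stack:
        node, path = stack.pop()
        if len(node) >= 2 and len(path) > len(best):
            best = path
        for ch, child in node.items():
            stack.append((child, path + ch))
    return best
```

For x = banana the deepest branching node spells ana, whose two occurrences overlap. A string with no repeated symbol yields the empty string, since only the root branches.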
Complexity. Step 1 takes O(log n) time using n processors. Step 2 takes O(log log n) time using n/log log n processors.

PROBLEM 3. Approximate string matching. Suppose a string x, a pattern z, and a parameter k are given. (Let n (resp. m) be the length of x (resp. z).) Find occurrences of z in x with at most k differences. We distinguish three types of differences:
(a) A letter in z corresponds to a different letter in x.
(b) A letter in z corresponds to "no letter" in x.
(c) A letter in x corresponds to "no letter" in z.
SOLUTION. Both a serial and a parallel algorithm for the problem are known. This paper enables us to design an alternative parallel algorithm which essentially consists of parallelizing the known serial algorithm. The alternative parallel algorithm is based on both the parallel suffix tree construction of this paper and a parallel algorithm for answering Lowest Common Ancestor (LCA) queries. In order to keep this presentation short we refrain from describing this alternative algorithm in detail. This alternative parallel algorithm for the approximate string matching problem runs in time O(k + log n) using n + m processors. Note that the parallel algorithm of [LV-86] consists of two parts: (1) analysis of the pattern and (2) analysis of the text. Part 1 runs in O(log m) time using $m^2$ processors. Part 2 runs in O(k + log m) time using n processors. So, comparison of the performance of these two parallel algorithms depends on the relative values of n and m and also on whether the pattern is given in advance for preprocessing.
