Parallel Log-time Construction of Suffix Trees by Apostolico, Alberto & Iliopoulos, Costas
Purdue University 
Purdue e-Pubs 
Department of Computer Science Technical 
Reports Department of Computer Science 
1986 





Apostolico, Alberto and Iliopoulos, Costas, "Parallel Log-time Construction of Suffix Trees" (1986). 
Department of Computer Science Technical Reports. Paper 549. 
https://docs.lib.purdue.edu/cstech/549 
This document has been made available through Purdue e-Pubs, a service of the Purdue University Libraries. 






Parallel Log-time Construction of Suffix Trees
Alberto Apostolico
Department of Computer Sciences
Purdue University






Many string matching applications are based on suffix trees. Linear
time sequential algorithms are available for constructing such trees. A
CRCW Parallel RAM algorithm is presented here which takes O(log n)
time with n processors, n being the length of the input string. The algorithm
requires Θ(n^2) space, but only O(n log n) cells need to be initialized.
1. INTRODUCTION
Let x be a string of n = |x| symbols from a finite alphabet I and # a special symbol
not in I. Given a substring w of x, a descriptor of w is any pair (i, |w|) such that i is
the starting position in x of an occurrence of w. The suffix tree T_x associated with x is
the trie (digital search tree) with n leaves and at most n-1 internal nodes such that: (1)
each arc is labeled with a descriptor of some substring of x#, (2) each leaf is labeled with
a position of x, and (3) the concatenation of the labels on the path from the root to leaf i
describes the suffix of x# starting at position i (see Fig. 1 for an example). In practice, the
label of the arc connecting node μ to its parent node is stored in μ.
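The notion of a descriptor can be illustrated with a minimal Python sketch. The function names `descriptors` and `resolve` are hypothetical helpers introduced only for this illustration; positions are 1-based, as in the text.

```python
def descriptors(x: str, w: str) -> list[tuple[int, int]]:
    """Return every descriptor (i, |w|) of w in x, with 1-based positions i."""
    return [(i + 1, len(w))
            for i in range(len(x) - len(w) + 1)
            if x.startswith(w, i)]


def resolve(x: str, d: tuple[int, int]) -> str:
    """Return the substring of x that the descriptor d = (i, length) denotes."""
    i, length = d
    return x[i - 1:i - 1 + length]
```

For example, in the string abaab# the substring ab has two descriptors, and either of them can label an arc spelling ab.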
For fixed alphabet size, the sequential algorithm in [MC] constructs T_x in linear
time. The time bound becomes O(n log n) if the alphabet size is not a constant. References
to alternative equivalent structures and constructions as well as to several applications
can be found in [AA]. In the context of parallel computation, various open problems
revolve around T_x [GA].
Here we address the problem of constructing T_x in parallel, within the following
CRCW PRAM model of computation. We use n processors which can simultaneously
read from and write to a common memory with Θ(n^2) locations. The overall
processors × time cost of our algorithm is O(n log n), which is optimal for unbounded
alphabets. Although the algorithm requires quadratic space, only O(n log n) locations
need initialization. Our approach consists of two main parts. In the first part, an approximate
version of the tree is built, called the skeleton. This part of the construction is reminiscent
of an early approach to subquadratic pattern matching [KMR]. The second part




2. CONSTRUCTING THE SKELETON TREE
From now on, we will assume w.l.o.g. that n is a power of 2. We also extend x by
appending to it n-1 instances of symbol #. We still use x# to refer to this modified
string. The skeleton D_x of x is a tree with n leaves, each internal node of which has
at least two children. The links in D_x point from each node to its parent. Each leaf or
internal node of D_x is labeled with the descriptor of some substring of x# having starting
position in [1,n]. If vertex μ is labeled with descriptor (i,l), then l = 2^q for some
q, 0 ≤ q ≤ log n. If μ is a leaf then l = n. If μ is an internal node other than the root, then q is
the stagenumber of μ. If the label of μ corresponds to substring w of x, then we write
w = W(μ), and we call μ the locus of w. Figure 2 shows the skeleton for the string of Fig.
1. A constructive definition for D_x is as follows.
(i) The root of D_x is the locus of the empty word. The root has |I| sons, each one
being the locus of a distinct symbol of I.
(ii) Assume that all nodes of stagenumber up to q-1 ≥ 0 have been inserted in D_x. To
expand D_x to stagenumber q ≤ log n, consider the nodes of stagenumber q-1 one by
one. For the generic such node μ, let w = W(μ). Now do the following.
1 If w = z# then make μ the (unique) leaf labeled (i,n), where i is the first component
of the old label of μ.
2 If w ≠ z# and there are k ≥ 2 distinct substrings s_1, s_2, ..., s_k of x# all of which have w
as a prefix and such that |s_l| = 2|w|, l = 1,2,...,k, then create k sons of μ,
v_1, v_2, ..., v_k, and make v_l the locus of s_l, l = 1,2,...,k. Otherwise (i.e., if w ≠ z# but
k = 1), make μ the locus of s_1.
Observe that no two nodes of Dx can have the same label. The parallel construction
of D_x is an easy task. We describe it in detail so as to acquaint the reader with the basic
concurrent steps which are used throughout this paper.
We use n processors p_1, p_2, ..., p_n, where i is the serial number of processor p_i. At
the beginning, processor p_i is assigned to the i-th position of x, i = 1,2,...,n. It is convenient
to think of each processor as being assigned two segments of the common
memory, each segment consisting of log n + 1 cells. The segments assigned to p_i are
called ID_i and NODE_i, respectively. By the end of the computation, ID_i[q]
(i = 1,2,...,n; q = 0,1,...,log n) contains (the first component of) a descriptor for the substring
of x# of length 2^q which starts at position i in x#, with the constraint that all the
occurrences of the same substring of x get the same descriptor. If, for some value of
q < log n, NODE_i[q] is not empty, then it represents a node μ of stagenumber q in D_x, as
follows: the field NODE_i[q].LABEL is a replica of ID_i[q], and the field
NODE_i[q].PARENT points to the location of the parent of μ. Finally, NODE_i[log n]
stores the leaf labeled (i,n) and thus is nonempty for i = 1,2,...,n. For convenience, we
extend the notion of ID to all positions i > n through the convention: ID_i[q] = n+1 for
i > n. The computation makes crucial use of a bulletin board (BB) of n × n locations in
the common memory. All processors can simultaneously write to BB and simultaneously
read from it. If more than one processor tries to write to the same location, only one
succeeds. In the following, we call winner(i) the index of the processor which succeeds
in writing to the location of the common memory attempted by p_i.
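The concurrent-write primitive behind winner(i) can be mimicked sequentially. The sketch below is only an illustration under one assumption that the model does not require: among the processors competing for a location, the one with the lowest index wins. The helper name `concurrent_write` is hypothetical.

```python
from typing import Hashable


def concurrent_write(attempts: dict[int, Hashable]) -> dict[int, int]:
    """Simulate one concurrent write followed by a read of the same location.

    attempts maps a processor index i to the location p_i tries to write.
    The result maps each i to winner(i).
    Assumption: the lowest-indexed competitor succeeds; the CRCW model
    allows any competitor to succeed.
    """
    board: dict[Hashable, int] = {}
    for i in sorted(attempts):
        board.setdefault(attempts[i], i)   # only the first writer succeeds
    return {i: board[loc] for i, loc in attempts.items()}
```

With four processors facing the symbols a, b, a, b, processors 1 and 3 read back the same winner, and so do processors 2 and 4.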
Procedure Skeleton-Tree
input: x; the ROOT of D_x.
output: NODE_i[q]; ID_i[q] (i = 1,2,...,n; q = 0,1,...,log n).
begin
Each processor initializes its NODE and ID arrays. Next, processors facing the
same symbol of I attempt to write their serial number in the same location of BB.
Say, if x_i = s ∈ I, processor p_i attempts to write i in BB[1,s]. Through a second
reading from the same location, p_i reads j = winner(i) and sets ID_i[0] ← j. (Thus
(j,1) becomes the descriptor for every occurrence of symbol s.) For all i such that
winner(i) = i, processor p_i sets NODE_i[0].PARENT ← ROOT and copies
ID_i[0] = i into NODE_i[0].LABEL. Hence NODE_i[0] becomes the locus of s.
for q = 0 to log n - 1 pardo
begin
Processor p_i creates a composite label TID_i, as follows:
TID_i ← (ID_i[q], ID_{i+2^q}[q]).
Processor p_i attempts to write i in BB[TID_i] = BB[ID_i[q], ID_{i+2^q}[q]].
Thus all processors with the same TID attempt to write in the same location
of BB. The processors read: if BB[TID_j] = i, then p_j sets ID_j[q+1] to
i. Formally: ID_i[q+1] ← winner(i), i = 1,2,...,n.
The processors that are not winners become idle for the remainder of the
stage. The successful processors now first create new loci in their associated
NODE locations. Whenever a node μ is created that has no
siblings, the pointer from parent(μ) is removed and copied into μ.
This avoids the formation of chains of unary nodes. The condition that a
node has no siblings can be checked easily, as explained below. Formally,
each successful processor p_i performs the following.
NODE_i[q+1].PARENT ← NODE_{ID_i[q]}[q]
NODE_i[q+1].LABEL ← ID_i[q+1]
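The procedure can be followed on small inputs with a sequential Python sketch. It is a simulation under stated assumptions, not the parallel algorithm itself: the lowest-indexed competitor wins every concurrent write, BB is modelled by a dictionary, the sibling test is done by counting instead of the auxiliary-cell technique, and the names `skeleton_tree`, `elect` and `ROOT` are introduced only for this illustration. Node (i, q) stands for NODE_i[q].

```python
from collections import Counter

ROOT = "ROOT"


def skeleton_tree(x: str):
    """Return (ID, parent) for the skeleton of x; len(x) must be a power of 2."""
    n = len(x)
    assert n >= 1 and n & (n - 1) == 0, "n is assumed to be a power of 2"
    logn = n.bit_length() - 1
    # ID[i][q] for 1 <= i <= 2n (row 0 unused); rows i > n keep the value n+1.
    ID = [[n + 1] * (logn + 1) for _ in range(2 * n + 1)]

    def elect(keys):
        """keys[i-1] is the BB location p_i writes to; returns winner(i)."""
        board = {}
        for i, key in enumerate(keys, start=1):
            board.setdefault(key, i)       # assumption: lowest index wins
        return [board[key] for key in keys]

    parent = {}
    for i, w in enumerate(elect(list(x)), start=1):
        ID[i][0] = w
        if w == i:                         # p_i creates the locus of a symbol
            parent[(i, 0)] = ROOT

    for q in range(logn):
        step = 1 << q
        winners = elect([(ID[i][q], ID[i + step][q]) for i in range(1, n + 1)])
        new_nodes = []
        for i, w in enumerate(winners, start=1):
            ID[i][q + 1] = w
            if w == i:                     # successful processor: new node
                parent[(i, q + 1)] = (ID[i][q], q)
                new_nodes.append((i, q + 1))
        # A new node without siblings takes over its parent's pointer, and
        # the parent is removed, so no chain of unary nodes is formed.
        children = Counter(parent[v] for v in new_nodes)
        for v in new_nodes:
            p = parent[v]
            if children[p] == 1:
                parent[v] = parent.pop(p)
    return ID, parent
```

On x = abab the sketch yields two internal nodes, (1, 1) for ab and (2, 0) for b, each with two of the four leaves (i, 2) as children.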







The existence of the siblings can be checked as follows. Assume that for each row r
of BB, there is a distinct memory location, say AUX[r], known to all processors. At each
stage, there are siblings iff two or more successful processors write to different locations
of the same row of BB. To find out whether this is the case, all successful processors
writing in the same row r of BB attempt to write their index in AUX[r]. Next, all the
processors in that row except the winner write a special marker in AUX[r]. Finally, all
the processors in the same row check the status of AUX[r]. Clearly, processor p_i was the
only successful processor in row r iff, at the time of checking, AUX[r] = i.
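The three rounds of this test (write index, overwrite with a marker, read) can be sketched sequentially as follows. The function name `only_successful_in_row` and the marker value are hypothetical, and the lowest index is assumed to win the first concurrent write.

```python
MARK = "*"   # hypothetical value of the special marker


def only_successful_in_row(rows: dict[int, int]) -> dict[int, bool]:
    """rows maps each successful processor index i to the BB row r it used.

    Returns, for each i, whether p_i was the only successful processor
    in its row, i.e. whether the node it created has no siblings.
    """
    aux: dict[int, object] = {}
    for i in sorted(rows):                 # round 1: all write their index
        aux.setdefault(rows[i], i)         # assumption: lowest index wins
    winner = dict(aux)
    for i, r in rows.items():              # round 2: losers write the marker
        if winner[r] != i:
            aux[r] = MARK
    return {i: aux[r] == i for i, r in rows.items()}   # round 3: all read
```

With processors 1 and 3 successful in row 7 and processor 4 alone in row 9, only processor 4 still reads its own index at the end.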
The correctness of the procedure follows by straightforward induction. Since no two
n-symbol substrings of x# are identical, processor p_i (i = 1,2,...,n) must be occupying the
"leaf" NODE_i[log n] at the end of the computation. The time complexity is obviously
O(log n). Note that NODE_i[q].LABEL not empty implies NODE_i[q].LABEL = (i, 2^q),
that is, the label of a node, when defined, is nothing but the address of that node.
Although the LABEL fields are entirely redundant so far, assuming this node format from
the start simplifies the rest of our presentation. Finally, we remark that BB need not
be initialized.
3. REFINING Dx
In this section, we assume for simplicity that |I| = 2. We will show later that our
solution can be extended to alphabets of arbitrary size with no substantial penalty.
By the end of the construction of D_x, processor p_i will be occupying leaf
i, i = 1,2,...,n. Prior to starting the transformation of D_x into T_x, the labels of all vertices of
D_x have to be modified as follows. Recall that the current LABEL of a node μ is a starting
position of W(μ) in x# and also the address of this node. The modified label (mlabel)
to be constructed for μ is any pair (i,l) such that, letting W(μ) = W(parent(μ))·w, it is
l = |w| and i is the starting position of an occurrence of w in x#. Setting aside the orientation
of links, the main difference between T_x and the m-labeled skeleton D_x is that in T_x
there cannot be two sibling nodes such that their labels describe two substrings of x having
a common prefix (i.e., D_x is not a trie). However, the m-labeled D_x shares with T_x
the properties (1-3) listed in defining the latter.
A processor can trivially compute the mlabel of μ in constant time knowing the
LABEL of μ and the stagenumbers, say q and q', of μ and parent(μ), respectively. Formally,
if j is the LABEL of μ then (j + 2^{q'}, 2^q - 2^{q'}) is the mlabel of μ. The n processors
can produce all mlabels in log n parallel steps. Using the parent pointers, the processors
migrate towards ROOT with a synchronous pace based on stagenumbers: the mlabels of
all children of nodes with the same stagenumber are computed at the same time. (Recall
that the difference in stagenumber between a node and its parent is not necessarily 1.) At
the beginning, all processors occupying leaves which are children of nodes of
stagenumber log n - 1 change the labels of these nodes into mlabels. Next, the processors
compete for the common parent node, say, by attempting to simultaneously write on it
the labels (addresses) of the nodes which they currently occupy. The winners are marked
"free": they ascend to the parent node where they will perform the necessary label adjustment
at the appropriate stage. The losers simply take a record of the (old) label used by
the winner. The (q-1)-th iteration involves all free processors on nodes with a
stagenumber of q or higher. The operation is the same as above.
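The constant-time mlabel formula can be written out as a small Python sketch. The function name `mlabel` is hypothetical. The treatment of children of ROOT (passing `None` as the parent's stagenumber and leaving the descriptor unchanged, since W(ROOT) is the empty word) is an assumption of this sketch; the formula in the text covers parents with a stagenumber.

```python
from typing import Optional


def mlabel(j: int, q: int, q_parent: Optional[int]) -> tuple[int, int]:
    """mlabel of a node with LABEL j and stagenumber q.

    q_parent is the stagenumber of the parent, or None for a child of ROOT
    (assumption: such a node keeps the descriptor (j, 2**q)).
    """
    if q_parent is None:
        return (j, 2 ** q)
    # Skip the 2**q_parent symbols spelled by W(parent) and keep the rest.
    return (j + 2 ** q_parent, 2 ** q - 2 ** q_parent)
```

For x = abab, the leaf with LABEL 3 and stagenumber 2 under the node for ab (stagenumber 1) gets mlabel (5, 2), which describes the two symbols ## following ab at position 3 of abab###.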
A byproduct of the mlabel construction process is a mapping that assigns some
leaves and internal nodes to processors in such a way that the following property is met.
PROPERTY 1. If a node other than ROOT has k children, then precisely k-1 of the
children have been assigned a processor. Moreover, each one of the
k-1 processors knows the address of the unique sibling without a
processor.
The proof of Property 1 is straightforward. Let now (i,l) and (j,m) be the mlabels
of two sibling nodes μ and ν of D_x, and let q be the stagenumber of
parent(μ) = parent(ν).
FACT 1. The substrings of x# whose descriptors are the labels of μ and ν have a common
prefix of length at most 2^q - 1.
FACT 2. If k is the length of the longest common prefix of x#[i, i+l-1] and
x#[j, j+m-1], then ID_i[⌊log k⌋] = ID_j[⌊log k⌋].
Fact 1 follows from the definition of D_x. Fact 2 holds by the construction of the
ID's.
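Fact 2 can be checked by brute force on small strings. The sketch below is only a sanity check under simplifying assumptions: it names each substring of length 2^q by its first starting position instead of running the doubling procedure, and it compares whole suffixes of x# rather than the substrings given by mlabels. The names `id_table`, `lcp` and `fact2_holds` are hypothetical.

```python
def id_table(x: str) -> dict[tuple[int, int], int]:
    """Map (i, q) to a name shared by all equal substrings of length 2**q."""
    n = len(x)
    logn = n.bit_length() - 1
    padded = x + "#" * (n - 1)
    table = {}
    for q in range(logn + 1):
        first: dict[str, int] = {}
        for i in range(1, n + 1):
            w = padded[i - 1:i - 1 + 2 ** q]
            table[(i, q)] = first.setdefault(w, i)
    return table


def lcp(u: str, v: str) -> int:
    """Length of the longest common prefix of u and v."""
    k = 0
    while k < min(len(u), len(v)) and u[k] == v[k]:
        k += 1
    return k


def fact2_holds(x: str) -> bool:
    """Check ID_i[floor(log k)] == ID_j[floor(log k)] for all i, j with k >= 1."""
    n = len(x)
    padded = x + "#" * (n - 1)
    table = id_table(x)
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            k = lcp(padded[i - 1:], padded[j - 1:])
            if k >= 1:
                q = k.bit_length() - 1     # floor(log2 k)
                if table[(i, q)] != table[(j, q)]:
                    return False
    return True
```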
Assuming a binary alphabet, the transformation of the mlabeled version of D_x into
T_x is done in two steps. First, a tree is produced that is identical to T_x save the fact that
all arcs are directed upward, as in D_x. Next, the directions of all arcs are reversed.
The first and more important step is actuated by producing log n - 1 consecutive
refinements of D_x = D^(log n - 1). The q-th such refinement, denoted by D^(log n - q - 1), is a
labeled tree with n leaves and no unary nodes which has much the same structure as the
mlabeled D_x. In particular, properties (1-3) of the definition of T_x hold for any
refinement of D_x. Moreover, D^(0) is identical to T_x except for the arc directions. To
specify the labels in the generic D^(k), let a nest be any set formed by all children of some
node in D^(k), and let (i,l) and (j,m) be the labels of two nodes in some nest of D^(k). An
integer t is a refiner for (i,l) and (j,m) iff x#[i, i+t-1] = x#[j, j+t-1]. D^(k) is labeled in
such a way that no pair of labels of nodes in the same nest of D^(k) admits of a refiner of
size 2^k. This latter property is similar, though not identical, to the property in Fact 1.
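The refiner condition translates directly into a comparison of two substrings of x#. The following Python sketch uses hypothetical helper names (`is_refiner`, `nest_admits_refiner`) and 1-based positions; a nest is represented simply as a list of labels.

```python
from itertools import combinations


def is_refiner(padded: str, i: int, j: int, t: int) -> bool:
    """True iff padded[i .. i+t-1] equals padded[j .. j+t-1] (1-based)."""
    return padded[i - 1:i - 1 + t] == padded[j - 1:j - 1 + t]


def nest_admits_refiner(padded: str, labels: list[tuple[int, int]], t: int) -> bool:
    """True iff some pair of labels in the nest admits a refiner of size t."""
    return any(is_refiner(padded, a[0], b[0], t)
               for a, b in combinations(labels, 2))
```

For x# = abab###, t = 2 is a refiner for positions 1 and 3 (both spell ab), while t = 3 is not (aba against ab#).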
The update transforming D^(k) into D^(k-1) affects all and only the eligible nests of
D^(k), i.e., those nests which might admit of a refiner of size 2^(k-1). Let
(i_1,l_1), (i_2,l_2), ..., (i_m,l_m) be the set of all labels in some such nest of D^(k). The transformation
of the nest is performed in two steps.
STEP 1. Use the LABEL and ID tables to modify the nest rooted at v, as follows. With
Now partition the children of v into equivalence classes, putting in the same class all
nodes with the same first component of their split-labels. For each non-singleton class
which results, perform the following three operations.
(1) Create a new parent node μ for the nodes in that class, and make μ a son of v.
(2) Set the LABEL of μ to (i, 2^(k-1)), where i is the first component of the split-label of
all nodes in the class.
(3) Consider each child of μ. For the child whose current LABEL is (i_j, l_j), change
STEP 2. If more than one class resulted from the partition, then stop. Otherwise, let C be
the unique class resulting from the partition. It follows from the definition of D^(k) that C
cannot be a singleton class. Thus a new parent node μ as above was created for the nodes
in C during STEP 1. Make μ a child of the parent of v and set the LABEL of μ to
(i, l + 2^(k-1)), where (i,l) is the label of the parent of v.
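The structural effect of the two steps on one nest can be sketched sequentially. The sketch covers pointers only: the first components of the split-labels are taken as a given input, label updates are left out, and the names `refine_nest` and the tuple used for a new node are hypothetical.

```python
from collections import defaultdict


def refine_nest(parent: dict, v, split_first: dict) -> list:
    """Apply the pointer changes of STEP 1 and STEP 2 to the nest rooted at v.

    parent maps node -> parent node and is updated in place.
    split_first maps each child of v to the first component of its
    split-label (assumed to be computed elsewhere).
    The nest is assumed to have at least two children.
    Returns the list of newly created nodes.
    """
    classes = defaultdict(list)
    for child, key in split_first.items():
        classes[key].append(child)

    created = []
    for key, members in classes.items():
        if len(members) > 1:               # STEP 1: non-singleton class
            mu = ("new", v, key)           # hypothetical name for the new node
            parent[mu] = v
            for child in members:
                parent[child] = mu
            created.append(mu)

    if len(classes) == 1:                  # STEP 2: v has become unary
        mu = created[0]
        parent[mu] = parent.pop(v)         # mu replaces v under v's parent
    return created
```

If the children a, b, c of v have first components 5, 5, 8, a new node becomes the parent of a and b while c stays under v; if all children fall in one class, v is removed and the new node takes its place.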
Theorem 1. The synchronous application of Steps 1 and 2 to all eligible nests of D^(k)
correctly produces D^(k-1).
Proof. We prove that D^(k-1) is a tree with no unary nodes. The correctness of the labels
of D^(k-1) relies on Fact 2: we leave the details as an exercise.
Clearly, the nest of the children of the root is not eligible for any k > 0. Thus for any
parent node v of an eligible nest of D^(k), parent(parent(v)) is defined. By definition of
D^(k), v has more than one child, and so does parent(v). Let D̄^(k) be the structure resulting
from application of Step 1 to D^(k).
If, in D^(k), the nest of parent(v) is not eligible, then v is a node of D^(k-1), and v
may be the only unary node in D̄^(k) between any child of v in D̄^(k) and the parent of v in
D^(k). Node v is removed in STEP 2, unless v is a branching node in D̄^(k). Hence no
unary nodes result in this part of D^(k-1).
Assume now that, in D^(k), both the nest of v and that of parent(v) are eligible. We
claim that, in D̄^(k), either the parent of v has not changed and it is a branching node, or it
has changed but still is a branching node. Indeed, by definition of D^(k), neither the nest
of v nor that of parent(v) can be refined in only one singleton equivalence class. Thus,
by the end of STEP 1, the following alternatives are left.
1. The parent of v in D̄^(k) is identical to parent(v) in D^(k). Since the nest of parent(v)
could not have been refined into only one singleton class, then parent(v) must be a
branching node in D^(k-1). Thus this case reduces to that where the nest of parent(v) is
not eligible.
2. The parent of v in D̄^(k) is not the parent of v in D^(k). Then parent(v) in D̄^(k) is a
branching node, and also a node of D^(k-1). If v is a branching node in D̄^(k), then there is
no unary node between v and parent(v) in D̄^(k), and the same holds true between any
node in the nest of v and v. If v is a unary node in D̄^(k), then the unique child of v is a
branching node. Since the current parent of v is also a branching node by hypothesis,
then removing v in STEP 2 eliminates the only unary node existing on the path from any
node in the nest of v to the closest branching ancestor of that node. □
If the nest of D^(k) rooted at v had a row R of BB all to itself, then the transformation
undergone by this nest in Step 1 can be accomplished by m processors in constant
time, m being the number of children. Each processor handles one child node. It generates
the split-label for that node using its LABEL and the ID tables. Next, the processors
use the row of BB assigned to the nest and the split-labels to partition themselves
into equivalence classes: each processor in the nest whose split-label has first component
i competes to write the address of its node in the i-th location of R. A representative
processor is elected for each class in this way. Singleton classes can be trivially spotted
through a second concurrent write restricted to losing processors (after this second
write, a representative processor which still reads its node address in R knows that it is in a
singleton class). The representatives of each nonsingleton class now create the new
parent nodes, label them with the first component of their split-label, and make each new
node accessible by all other processors in the class. To conclude STEP 1, the processors
in the same class update the labels of their nodes.
For STEP 2, the existence of more than one equivalence class needs to be tested.
This is done through a competition of the representatives which uses the root of the nest
as a common write location, and follows the same mechanism as in the construction of
D_x. If only one equivalence class was produced in STEP 1, then its representative performs
the adjustment of label prescribed by STEP 2.
The above discussion suggests that, once each vertex of, say, D_x = D^(log n - 1) is
assigned to a distinct processor, D^(log n - 2) could be produced in constant time. The
difficulty, however, is now with how to assign the vertices (notably, the newly inserted
ones) of D^(log n - 2) in constant time. It turns out that bringing fewer processors into the
game leads to a crisp (re-)assignment strategy.
By definition, D^(k) does not have unary nodes. It is seen then that the manipulations
of Steps 1-2 can be operated in constant time by assigning m-1 processors, rather than m,
to a nest of m nodes. The only additional assumption to be made is that, at the beginning,
all m-1 processors have access to the unique node which lacks a processor of its own.
Before starting STEP 1, the processors elect one of them to serve as a substitute for the
missing processor. After each elementary step, this simulator "catches up" with the others.
In view of Property 1, this shows that n processors can achieve the first refinement
of D_x. As for which row of BB is assigned to which node of D^(k), simply assign the i-th
row to processor p_i. Then, whenever p_i is in charge of the simulation of the missing processor
in a nest, its BB row is used by all processors in that nest.
For any given value of k, let a legal assignment of processors to the nodes of D^(k)
be an assignment that enjoys Property 1.
Theorem 2. Given a legal assignment of processors for D^(k), a legal assignment of
processors for D^(k-1) can be produced from it in constant time.
Proof. We give first a constant-time policy that re-allocates the processors in each nest of
D^(k) on the nodes of D̄^(k). We show then that our policy leads to a legal assignment for
D^(k-1).
Let then v be the parent of a nest of D^(k). A node which has a processor assigned to
it will be called pebbled. By hypothesis, all children of v but one are pebbled. Also, all
children of v are nodes of D̄^(k). In the general case, some of the children of v in D^(k) are
still children of v in D̄^(k), whereas others became children of newly inserted nodes
μ_1, μ_2, ..., μ_t. Our policy is as follows. At the end of STEP 1, for each node μ_r of D̄^(k)
such that all children of μ_r are pebbled, one pebble (say, the representative processor) is
chosen among the children and passed on to the parent. In STEP 2, whenever a pebbled
node v is removed, then its pebble is passed down to the (unique) son μ of v in D̄^(k).
Our policy can be clearly implemented in constant time. To prove its correctness,
we need to show that it generates a legal assignment for D^(k-1).
It is easy to see that if node v is removed in the transition from D^(k) to D^(k-1), then
the unique son μ of v in D̄^(k) is unpebbled in D̄^(k). Thus, in STEP 2, it can never happen
that two pebbles are moved onto the same node of D^(k-1).
By definition of D^(k), the nest of node v cannot give rise to a singleton class. Thus
at the end of STEP 1, either (Case 1) the nest has been refined in only one (nonsingleton)
class, or (Case 2) it has been refined in more than one class, some of which are possibly
singleton classes.
Before analyzing these two cases, define a mapping f from the children in the nest
of the generic node v of D^(k) into nodes of D^(k-1), as follows. If node μ is in the nest of v
and also in D^(k-1) then set μ' = f(μ) = μ; if instead μ is not in D^(k-1), let μ' = f(μ) be
the (unique) son of μ in D̄^(k).
In Case 1, exactly one node μ is unpebbled in D^(k). All the nodes μ' are in a single
nest of D^(k-1) and, by our policy, μ' is pebbled in D^(k-1) iff μ is pebbled in D^(k).
In Case 2, node v is in D^(k-1). Any node μ in the nest of v is in D̄^(k). At the end of
STEP 2, the pebble of node μ will go untouched unless μ is in a nonsingleton
equivalence class. Each such class generates a new parent node, and a class passes a
pebble on to that node only if all the nodes in the class were pebbled. Thus, in D^(k-1), all
the children of v except one are pebbled by the end of STEP 1. Moreover, for each nonsingleton
equivalence class, all nodes in that class but one are pebbled. At the end of
STEP 2, for each node μ which was in the nest of v in D^(k), node μ' is pebbled iff μ was
pebbled at the end of STEP 1, which concludes the proof. □
For a binary alphabet, D^(0) is a ternary tree (due to the symbol #). Since the processors
are legally assigned to the vertices of D^(0) at the end of the computation, the
concurrent reversal of all arcs is straightforwardly achievable in constant time.
For general alphabets, each node of D^(0) must be changed into a binary tree before
arc reversal can take place. Such a change can be obtained through a series of log |I|
refinements of D^(0) quite similar to those already discussed. The major difference is that
now the ID tables are useless, since a more compact descriptor for a substring of x of
log |I| bits or less is the substring itself. We leave the details of this part as an exercise.
References
[AA] A. Apostolico, The Myriad Virtues of Suffix Trees, Combinatorial Algorithms on
Words (A. Apostolico and Z. Galil, eds.), pp. 85-96, Springer Verlag (1985).
[GA] Z. Galil, Open Problems in Stringology, Combinatorial Algorithms on Words (A.
Apostolico and Z. Galil, eds.), pp. 1-10, Springer Verlag (1985).
[KMR] R. M. Karp, R. E. Miller and A. L. Rosenberg, Rapid Identification of Repeated
Patterns in Strings, Trees and Arrays, Proceedings of the 4-th ACM STOC, pp.
125-136 (1972).











Fig. 1. An example of a suffix tree.







Fig. 2. The skeleton for the string of Fig. 1.
