An Efficient All-Parses Systolic Algorithm for General Context-Free Parsing by Ibarra, Oscar H & Palis, Michael A
University of Pennsylvania 
ScholarlyCommons 
Technical Reports (CIS) Department of Computer & Information Science 
October 1988 
An Efficient All-Parses Systolic Algorithm for General Context-Free 
Parsing 
Oscar H. Ibarra 
University of Pennsylvania 
Michael A. Palis 
University of Pennsylvania 
Recommended Citation 
Oscar H. Ibarra and Michael A. Palis, "An Efficient All-Parses Systolic Algorithm for General Context-Free 
Parsing", . October 1988. 
University of Pennsylvania Department of Computer and Information Science Technical Report No. MS-CIS-88-86. 
This paper is posted at ScholarlyCommons. https://repository.upenn.edu/cis_reports/768 
An Efficient All-Parses Systolic Algorithm for General Context-Free Parsing 
Abstract 
The problem of outputting all parse trees of a string accepted by a context-free grammar is considered. A 
systolic algorithm is presented that operates in O(mn) time, where m is the number of distinct parse 
trees and n is the length of the input. The systolic array uses n^2 processors, each of which requires at 
most O(log n) bits of storage. This is much more space-efficient than a previously reported systolic 
algorithm for the same problem, which required O(n log n) space per processor. The algorithm also 
extends previous algorithms that only output a single parse tree of the input. 
Oscar H. Ibarra 1 
Department of Computer Science 
University of Minnesota 
Minneapolis, MN 55455 
Michael A. Palis 2 
Department of Computer and Information Science 
University of Pennsylvania 
Philadelphia, PA 19104 
1 Research supported in part by NSF Grants DCR-8420935 and DCR-8604603. 
2 Research supported in part by ARO Grant DAA29-84-9-0027, NSF Grants MCS-8219116-CER, MCS-82-07294, 
DCR-84-10413, MCS-83-05221, and DARPA Grant N00014-85-K-0018. 
MS-CIS-88-86 
LINC LAB 137 
Department of Computer and Information Science 
School of Engineering and Applied Science 
University of Pennsylvania 
Philadelphia, PA 19104 
October 1988 
Acknowledgements: This research was supported in part by DARPA grant N00014-85-K-0018, 
NSF grants MCS-82-07294, MCS-8219196-CER, DCR-84-10413, MCS-83-05221, 
IR184-10413-A02 and U.S. Army grants DAA29-84-K-0061, DAA29-84-9-0027. 
1. Introduction 
General context-free language (CFL) recognition is an important problem with a wide range of 
applications: formal language theory, pattern recognition, natural language processing, and compiler 
design, to name a few. To date, the Cocke-Kasami-Younger (CKY) algorithm [YOUN67] and Earley's 
algorithm [EARL70] remain the best known practical methods for solving this problem, both having a 
worst-case time complexity of O(n^3) for inputs of length n. (In [VALI75], Valiant presented an 
asymptotically faster algorithm; however, the constant of proportionality is too large for practical 
applications.) 
Kosaraju [KOSA75] first considered the problem of parallel CFL recognition and presented a 
parallelization of the CKY algorithm on a two-dimensional iterative array of n^2 processors. The array 
operates in linear time and only requires finite-state processors (i.e., the processor stores information 
whose size is independent of the length of the input). Another algorithm, using a systolic array, is also 
implied by the work of Guibas, Kung and Thompson [GUIB79], who gave a parallel implementation of the 
dynamic programming algorithm (similar to the CKY algorithm) for computing the cost of an optimum 
binary search tree. Both algorithms are optimal; the speed-up is linear in the number of processors used. 
A parallel algorithm which has a faster running time (in fact, O(log^2 n)) has been presented by Rytter 
[RYTT85]; however, the algorithm is implemented on a parallel random-access machine (PRAM), a 
hypothetical model that ignores communication costs, and uses more processors (n^6). 
In [CHIA84], Chiang and Fu considered the more general problem of CFL parsing, which unlike 
recognition, also requires a parse tree as output. They gave a parallel implementation of Earley's 
algorithm on a systolic array of n^2 processors. Besides recognizing the input, the array also outputs a 
parse tree in linear time. However, the processors are no longer finite-state since each is required to 
store O(log n) bits of information. A fully finite-state systolic array for recognition and parsing was 
later given in [CHAN87]; the array uses n^2 processors and runs in linear time. 
An interesting extension to the CFL parsing problem is that of outputting all parse trees of the input 
string. In some applications such as natural language parsing, the underlying grammar is usually 
ambiguous. Typically, one would be interested in generating all parse trees of the given string, which 
later can be disambiguated by applying some semantic rules. In [LANG86], Langlois considered the 
all-parses problem and gave a systolic algorithm based on the systolic architecture of [GUIB79]. The 
systolic array uses O(n^2) processors. However, each processor is required to store O(n log n) bits of 
information, resulting in a total space complexity of O(n^3 log n). If the underlying grammar is 
unambiguous, the space complexity reduces to O(n^2 log n). Langlois posed indirectly the question of 
whether O(n^2 log n) space is sufficient to output all parses for an arbitrary CFL. In this paper, we settle 
this question in the affirmative. In particular, we give a systolic CFL parsing algorithm that outputs all 
parses in time O(mn) using n^2 processors, each of which requires only O(log n) bits of storage. Thus, 
the total space complexity is O(n^2 log n). The systolic algorithm is an extension of the one described in 
[CHAN87]. It should be pointed out that the algorithm in [CHAN87] does not give an explicit systolic 
array implementation, but rather gives an algorithm that runs on a sequential machine characterization 
of a systolic array. This paper gives the explicit "systolic version" of the algorithm in [CHAN87], and 
extends it to generate all parse trees of the input string with only a factor log n increase in the space 
complexity. 
The paper is organized as follows. In Section 2, we first describe a sequential parsing algorithm on 
which the systolic algorithm is based. In Section 3, we introduce the systolic array model that implements 
the algorithm. Sections 4 and 5 describe the two phases of the systolic algorithm: the recognition phase 
and the parse generation phase, respectively. Finally, Section 6 gives an analysis of the time and space 
complexity of the algorithm. 
2. A Sequential Context-Free Parsing Algorithm 
We first describe the sequential parsing algorithm on which the systolic parsing algorithm is based. 
We assume familiarity with context-free grammars (CFG's); see, e.g., [AHO72]. Let G = <VN, VT, P, S> 
be a CFG where VN and VT are finite sets of nonterminal and terminal symbols, respectively, S ∈ VN is 
the start symbol, and P is a finite set of productions in Chomsky normal form. That is, every production 
in P is either of the form A → BC or A → a, where A, B, C ∈ VN and a ∈ VT. The language generated 
by G is L(G) = {w ∈ VT+ | S ⇒* w}. 
Given an input string w = a_1 a_2 ... a_n, a_i ∈ VT, the sequential algorithm starts by constructing sets 
R(i,j), 1 ≤ i ≤ j ≤ n, such that 
\[ R(i,j) = \{\, [A \to \alpha] \in P \mid A \Rightarrow^{*} a_i \cdots a_j \,\}. \]
The sets R(i,j) are computed according to the following variant of the CKY dynamic programming 
algorithm [YOUN67]: 
\[ R(i,i) = \{\, [A \to a_i] \in P \,\}, \qquad 1 \le i \le n, \]
\[ R(i,j) = \bigcup_{i \le k < j} R(i,k) * R(k+1,j), \qquad 1 \le i < j \le n, \]
where R1 * R2 = {[A → BC] ∈ P | there are productions π1 ∈ R1 and π2 ∈ R2 such that LHS(π1) = B and 
LHS(π2) = C}. ('LHS' stands for 'left-hand side'.) Thus, w ∈ L(G) iff R(1,n) contains a production 
whose LHS is the start symbol S. 
An example of a CFG G and the corresponding matrix of R(i,j)'s for the string w = abaa is 
illustrated in Figure 2.1. Henceforth, the matrix R = (R(i,j) | 1 ≤ i ≤ j ≤ n) shall be referred to as the 
recognition matrix. For the given example, we see that abaa ∈ L(G) since R(1,4) contains a production 
whose LHS is S. 
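The recurrence above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the tuple encoding of productions, the names `BINARY`, `UNARY`, `convolve` and `recognition_matrix`, and the use of unordered sets (the paper later treats each R(i,j) as an ordered subset) are all assumptions made for the example. The grammar is the one listed in Figure 2.1.

```python
# Grammar of Figure 2.1 in Chomsky normal form.
# A binary production A -> BC is encoded as (A, B, C); A -> a as (A, a).
# This encoding is an assumption of the sketch, not notation from the paper.
BINARY = [("S", "A", "A"), ("S", "A", "B"), ("A", "A", "C"),
          ("A", "C", "B"), ("B", "B", "C"), ("C", "C", "C")]
UNARY = [("A", "a"), ("B", "b"), ("C", "a")]


def lhs_set(prods):
    """Left-hand sides of a set of productions."""
    return {p[0] for p in prods}


def convolve(r1, r2):
    """R1 * R2: all [A -> BC] in P with B an LHS in r1 and C an LHS in r2."""
    left, right = lhs_set(r1), lhs_set(r2)
    return {p for p in BINARY if p[1] in left and p[2] in right}


def recognition_matrix(w):
    """Build R(i, j) for 1 <= i <= j <= n, keyed by (i, j) with 1-based indices."""
    n = len(w)
    R = {}
    for i in range(1, n + 1):
        R[i, i] = {p for p in UNARY if p[1] == w[i - 1]}
    for span in range(2, n + 1):              # substring length
        for i in range(1, n - span + 2):
            j = i + span - 1
            R[i, j] = set()
            for k in range(i, j):             # split point, i <= k < j
                R[i, j] |= convolve(R[i, k], R[k + 1, j])
    return R


R = recognition_matrix("abaa")
accepted = "S" in lhs_set(R[1, 4])            # True: abaa is in L(G)
```

The triple loop makes the O(n^3) cost of building the matrix visible; the grammar size is treated as a constant.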
If w ∈ L(G) then w has one or more parse trees, where a parse tree is a binary tree of productions 
used in the derivation S ⇒* w. For the example in Figure 2.1, the string abaa has five distinct parse 
trees, as shown in Figure 2.2. For each production, the pair of numbers (i,j) denotes the matrix entry 
R(i,j) to which the production belongs. 
We now describe a procedure PARSE for generating all parse trees of the input string. PARSE is a 
recursive procedure that takes four arguments (A, i, j, tag), where A ∈ VN, 1 ≤ i ≤ j ≤ n and tag ∈ 
{FIRST, CURRENT, NEXT}. Informally, PARSE(A, i, j, tag) returns a parse tree for the derivation 
A ⇒* a_i ... a_j. The parse tree is represented as follows: if a production π in the parse tree belongs to 
R(i,j), then the occurrence of π in R(i,j) is "marked" by some special symbol, say *. (There is no 
ambiguity here since all productions in a parse tree belong to distinct R(i,j)'s.) For example, the first 
parse tree in Figure 2.2 would be represented as shown in Figure 2.3. Note that the actual tree can be 
retrieved since for every marked production, its left (right) child in the actual tree is simply the next 
marked production above it along the same column (diagonal). 
P = { [S → AA], [A → AC], [B → BC], [C → CC], 
      [S → AB], [A → CB], [B → b],  [C → a], 
      [A → a] } 
[Recognition matrix for w = abaa] 
Figure 2.1. A CFG G and the recognition matrix R for w = abaa. 
The argument tag dictates which parse tree is returned. If tag = FIRST, then PARSE(A, i, j, tag) 
returns an initial parse tree for A ⇒* a_i ... a_j. If tag = NEXT, then it returns the next (distinct) parse 
tree following the one last generated. Finally, if tag = CURRENT, then it returns the current parse tree. 
To keep track of the order of parse tree generation, the procedure makes use of a number of auxiliary 
variables. For each (i,j), 1 ≤ i ≤ j ≤ n, there are boolean variables done(i,j) and last_id(i,j), and an 
integer variable id(i,j). The variables are utilized as follows: Let t be the tree that results after a call to 
PARSE(A, i, j, tag). Then, 
(1) done(i,j) = true iff t is the last parse tree for A ⇒* a_i ... a_j. 
(2) id(i,j) = k, i ≤ k < j, iff the root of t has a left subtree whose root is a production in R(i,k) and a 
right subtree whose root is a production in R(k+1,j). (id stands for "index of decomposition".) 
(3) last_id(i,j) = true iff id(i,j) is the largest integer k satisfying (2). 
Procedure PARSE is given below. In the procedure, each R(i,j) is treated as an ordered subset of 
productions, so that we can refer to the first, second, etc., production in the set. 
Figure 2.2. Parse trees for S ⇒* abaa. 
Figure 2.3. Representing a parse tree in the recognition matrix. 
procedure PARSE(A, i, j, tag); 
begin 
  if (i = j) then 
    if R(i,i) has a marked production then UNMARK(i,i) endif; 
    mark the production [A → a_i] in R(i,i); 
    id(i,i) ← 0; done(i,i) ← last_id(i,i) ← true 
  else 
    case tag of 
      CURRENT: 
        /* there is a marked production in R(i,j) */ 
        let [A → BC] be the marked production in R(i,j); 
        k ← id(i,j); 
        PARSE(B, i, k, CURRENT); 
        PARSE(C, k+1, j, CURRENT); 
      FIRST: 
        if R(i,j) has a marked production then UNMARK(i,j) endif; 
        mark the first production π = [A → BC] in R(i,j) whose LHS = A; 
        (id(i,j), last_id(i,j)) ← MATCH(B, C, i, j, i); 
        PARSE(B, i, id(i,j), FIRST); 
        PARSE(C, id(i,j)+1, j, FIRST); 
      NEXT: 
        /* there is a marked production in R(i,j) */ 
        let [A → BC] be the marked production in R(i,j); 
        k ← id(i,j); 
        if not done(k+1,j) then 
          PARSE(B, i, k, CURRENT); 
          PARSE(C, k+1, j, NEXT) 
        elseif done(k+1,j) and not done(i,k) then 
          PARSE(B, i, k, NEXT); 
          PARSE(C, k+1, j, FIRST) 
        else /* done(i,k) and done(k+1,j) */ 
          UNMARK(i,k); UNMARK(k+1,j); 
          if not last_id(i,j) then 
            (id(i,j), last_id(i,j)) ← MATCH(B, C, i, j, k+1); 
            PARSE(B, i, id(i,j), FIRST); 
            PARSE(C, id(i,j)+1, j, FIRST) 
          else 
            unmark the currently marked production in R(i,j); 
            mark the next production π' = [A → DE] whose LHS = A; 
            (id(i,j), last_id(i,j)) ← MATCH(D, E, i, j, i); 
            PARSE(D, i, id(i,j), FIRST); 
            PARSE(E, id(i,j)+1, j, FIRST) 
          endif; 
        endif; 
    endcase; 
    temp ← done(i, id(i,j)) and done(id(i,j)+1, j) and last_id(i,j); 
    if (temp) and (the marked production is the last production in R(i,j) whose LHS = A) then 
      done(i,j) ← true 
    else 
      done(i,j) ← false 
    endif; 
  endif; 
end PARSE. 
In the procedure, subroutine UNMARK(i,j) deletes all marks on productions in the subset of entries 
{R(a,b) | i ≤ a ≤ b ≤ j}. This has the effect of deleting the subtree whose root is a production in R(i,j) 
(this subtree no longer belongs in the parse tree being generated). 
Subroutine MATCH(B, C, i, j, k) returns a pair of values (l, last), where l is an integer satisfying 
k ≤ l < j and last ∈ {true, false}. Specifically, MATCH does the following: It looks at the pairs 
[R(i,l), R(l+1,j)], k ≤ l < j, in increasing value of l, then returns the least l such that 
(*) there is some production in R(i,l) whose LHS = B and 
    there is some production in R(l+1,j) whose LHS = C. 
In addition, if there is no other integer > l satisfying (*), it returns last = true; otherwise, it returns 
last = false. 
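A small Python sketch of MATCH may help fix the indexing. It is illustrative only: the table `R_LHS`, which stores just the left-hand sides found in each entry for w = abaa (derived from the grammar of Figure 2.1), and the choice to return `None` when no index satisfies (*) are assumptions of this sketch; the text does not specify the no-match case.

```python
# Left-hand sides occurring in each recognition-matrix entry for w = abaa
# (a simplification: full productions are not needed to evaluate condition (*)).
R_LHS = {
    (1, 1): {"A", "C"}, (2, 2): {"B"}, (3, 3): {"A", "C"}, (4, 4): {"A", "C"},
    (1, 2): {"S", "A"}, (2, 3): {"B"}, (3, 4): {"S", "A", "C"},
    (1, 3): {"S", "A"}, (2, 4): {"B"},
    (1, 4): {"S", "A"},
}


def match(R, B, C, i, j, k):
    """Return (l, last) for the least l in k <= l < j satisfying (*).

    `last` is True when no larger index also satisfies (*).
    Returns None if no index qualifies (an assumption of this sketch).
    """
    hits = [l for l in range(k, j) if B in R[i, l] and C in R[l + 1, j]]
    if not hits:
        return None
    return hits[0], len(hits) == 1


first_split = match(R_LHS, "A", "A", 1, 4, 1)   # (2, False): l = 3 also qualifies
next_split = match(R_LHS, "A", "A", 1, 4, 3)    # (3, True)
```

Each call scans at most j - k split points, which is the O(n) cost per call used in the complexity analysis.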
The main program that calls PARSE is given below: 
begin 
  if there is a production in R(1,n) whose LHS = S then 
    PARSE(S, 1, n, FIRST) 
  endif; 
  while not done(1,n) do 
    PARSE(S, 1, n, NEXT); 
  endwhile; 
end. 
One can verify that running the main program using the recognition matrix of Figure 2.1 outputs the 
parse trees of w = abaa in the order shown in Figure 2.2. 
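As a cross-check on the count of parse trees, the enumeration can be sketched with a recursive Python generator. This is a deliberately simplified stand-in for PARSE, not a transcription of it: it builds explicit tree tuples instead of marking productions in the matrix, and the names `lhs_table` and `trees` as well as the production order in `BINARY` are assumptions. The loop nesting (production, then split index, then left subtree, then right subtree innermost) follows the order in which the NEXT case advances, so the right subtree changes fastest.

```python
# Grammar of Figure 2.1; (A, B, C) encodes A -> BC and (A, a) encodes A -> a.
BINARY = [("S", "A", "A"), ("S", "A", "B"), ("A", "A", "C"),
          ("A", "C", "B"), ("B", "B", "C"), ("C", "C", "C")]
UNARY = [("A", "a"), ("B", "b"), ("C", "a")]


def lhs_table(w):
    """T[i, j] = nonterminals deriving a_i ... a_j (1-based, inclusive)."""
    n = len(w)
    T = {(i, i): {a for a, t in UNARY if t == w[i - 1]} for i in range(1, n + 1)}
    for span in range(2, n + 1):
        for i in range(1, n - span + 2):
            j = i + span - 1
            T[i, j] = {a for k in range(i, j) for a, b, c in BINARY
                       if b in T[i, k] and c in T[k + 1, j]}
    return T


def trees(T, w, a, i, j):
    """Yield every parse tree for a =>* a_i ... a_j as nested tuples."""
    if i == j:
        if a in T[i, i]:
            yield (a, w[i - 1])
        return
    for lhs, b, c in BINARY:                   # productions in a fixed order
        if lhs != a:
            continue
        for k in range(i, j):                  # index of decomposition
            if b in T[i, k] and c in T[k + 1, j]:
                for left in trees(T, w, b, i, k):
                    for right in trees(T, w, c, k + 1, j):
                        yield (a, left, right)


w = "abaa"
T = lhs_table(w)
all_trees = list(trees(T, w, "S", 1, len(w)))  # five distinct trees
```

Unlike PARSE, this sketch keeps no done, id or last_id state; the generator's suspended frames play that role, which is why it says nothing about the O(log n)-bit storage bound.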
For the time complexity, it is clear that constructing the recognition matrix takes O(n^3) time. Each 
call to PARSE(S, 1, n, tag) in the main program takes O(n^2) steps. This follows from the fact that since 
the grammar is in Chomsky normal form, a parse tree has 2n-1 nodes (productions). For each production, 
at most one call to subroutine MATCH is performed to determine its children, and this takes O(n) time. 
Moreover, all calls to UNMARK within PARSE take at most O(n^2) steps. Thus, the total running time is 
O(n^3 + mn^2), where m is the number of distinct parse trees of the input string. Note that the second 
term dominates when m = Ω(n). 
3. The Systolic Array Model 
The systolic parsing algorithm is essentially a parallelization of the sequential algorithm described in 
the previous section. The systolic array that implements the algorithm is illustrated in Figure 3.1. It 
consists of two triangular arrays: the P-array (the square nodes) and the Q-array (the circular nodes). 
Both triangular arrays have n processors along each dimension, where n is the length of the input string 
to be parsed. The processors are assumed to be indexed as shown. For the P-array, P(i,j) denotes the 
processor in the i-th leftmost column of the j-th row. For the Q-array, Q(i,j) denotes the processor in the 
i-th rightmost column of the (j-i+1)-st row. For convenience, we call a processor of the P-array (Q-array) 
a P-processor (Q-processor). The processors are interconnected as shown in the figure. All communication 
links are assumed to be bi-directional (i.e., data can travel in either direction). 
The operation of the systolic array is synchronous, i.e., computations take place at distinct clock 
cycles. The input is the string a_1 a_2 ... a_n to be parsed, followed by an end-of-input marker $. This 
input is fed serially to processor P(1,1) of the P-array; a_i is input at clock cycle i, 1 ≤ i ≤ n, and $ at 
clock cycle n+1. The parse trees (if any) of the input string are generated in "stages". At the end of each 
stage, a new parse tree would be stored "on-the-fly" in the Q-array; more precisely, if the parse tree 
contains a production from R(i,j), then this production would be stored in processor Q(i,j). 
Figure 3.1. Systolic array model. 
Each processor has a local memory consisting of a fixed number of registers. In describing the systolic 
algorithm, it is convenient to give names to some of these registers, as shown in Figure 3.2. A 
P-processor has six registers r_pq and t_p (p, q ∈ {0,1}), each capable of holding an ordered subset of 
productions of the underlying grammar. In addition, it has four cells, C_pq (p, q ∈ {0,1}), where a cell is 
a collection of three registers: tag, sym, and pset. Register tag can hold a value from the set {FIRST, 
CURRENT, NEXT}, sym can hold a single nonterminal symbol, and pset can hold an ordered subset of 
productions. A Q-processor has six registers: p, done, ldone, rdone, id and last_id. Registers done, 
ldone, rdone and last_id can hold boolean values; p can hold a single production. Finally, id can hold 
values of the form (l, b) where l is an integer in the range 0 ≤ l ≤ n and b ∈ {0,1}. We shall explain the 
use of these registers in subsequent sections. 
As in the sequential case, the systolic parsing algorithm consists of two phases: a recognition phase 
which computes the recognition matrix, and a parse generation phase which outputs the parse trees. The 
recognition phase is similar to the one described in [CHAN87]; the difference is that the algorithm in 
[CHAN87] was given in terms of a sequential machine characterization of the systolic array. The 
algorithm presented here is the "systolic version" of the sequential machine in [CHAN87]. Using the same 
sequential machine, [CHAN87] also describes how to output a single parse tree of the input string. Here, 
we present a parse generation phase that outputs all such parse trees with only a small increase in the 
space complexity. 
4. The Systolic Recognition Phase 
The systolic recognition phase computes the recognition matrix R and determines whether the input 
string a_1 a_2 ... a_n is in the language generated by the grammar. During this phase, only the processors 
of the P-array take part in the computation; the Q-array is not used. 
Figure 3.2. Memory organization of (a) a P -processor and (b) a Q -processor. 
The recognition phase has the property that the movement of data in the P-array is only from 
lower-indexed to higher-indexed processors (i.e., from left to right and from top to bottom). We take 
advantage of the uniformity of the data flow by introducing the notion of a forward sweep, which simplifies 
the description of the computational steps involved. For a processor p of the P-array, let d_p be the 
rectilinear distance (i.e., counting only horizontal and vertical links) of p from processor P(1,1). Then, 
p is said to be at forward sweep s iff it is at clock cycle d_p + s. For example, forward sweep 1 is clock 
cycle 1 for P(1,1), clock cycle 2 for P(1,2), clock cycle 3 for P(2,2) and P(1,3), etc. The important thing 
to note is that in a given forward sweep, a processor is "viewed" one clock cycle earlier than the 
neighboring processors to its right or below it. Thus, a computation that takes place in the former 
processor can affect the latter processors also at the same forward sweep. 
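The relation between forward sweeps and clock cycles can be written down directly. The sketch below assumes that the rectilinear distance of P(i,j) from P(1,1) is (i-1) + (j-1), which is inferred from the worked examples in the text rather than stated there; the function names are hypothetical.

```python
def distance(i, j):
    """Rectilinear distance of P(i, j) from P(1, 1); assumed to be (i-1)+(j-1)."""
    return (i - 1) + (j - 1)


def clock_cycle(i, j, s):
    """Clock cycle at which P(i, j) is at forward sweep s."""
    return distance(i, j) + s


def forward_sweep(i, j, clock):
    """Inverse view: the forward sweep of P(i, j) at a given clock cycle."""
    return clock - distance(i, j)


# Forward sweep 1 for the processors named in the text.
sweep1 = {(i, j): clock_cycle(i, j, 1) for (i, j) in [(1, 1), (1, 2), (2, 2), (1, 3)]}
```

Under this assumption a right or lower neighbor of P(i,j) reaches the same sweep exactly one clock cycle later, which is the property the paragraph relies on.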
Conceptually, the recognition phase starts, for all processors, at forward sweep 1 and ends at forward 
sweep n+1. With respect to processor P(1,1), these correspond to the first n+1 clock cycles during which 
it reads the input a_1 a_2 ... a_n $. During each forward sweep, the P-array computes a new portion of 
the recognition matrix; in particular, at forward sweep s, 1 ≤ s ≤ n, only the set of entries 
{R(a,s) | 1 ≤ a ≤ s} is computed. 
Matrix entries are computed only at processors P(j,j), 1 ≤ j ≤ n, henceforth called primary 
processors. A primary processor computes one or more such entries but at different forward sweeps. More 
precisely, P(j,j) computes R(s-j+1,s) at forward sweep s, j ≤ s ≤ n. For example, P(3,3) computes 
R(1,3), R(2,4), ..., R(n-2,n) at forward sweeps 3, 4, ..., n. 
The secondary processors P(i,j), 1 ≤ i < j ≤ n, play a different role. Suppose that primary 
processor P(j,j) is assigned to compute entry R(a,b) at some forward sweep. Then, at the same forward 
sweep, the secondary processors to the left of P(j,j) would have stored in their local memory the set of 
"convolving pairs" {[R(a,c), R(c+1,b)] | a ≤ c < b} which are needed to compute the value of R(a,b). The 
mapping from convolving pairs to secondary processors is best explained by means of an example. Consider 
the case when processor P(5,5) wishes to compute R(2,6) at forward sweep 6. Then the required 
convolving pairs {[R(2,c), R(c+1,6)] | 2 ≤ c < 6} would be stored in processors P(1,5), ..., P(4,5) as 
shown in Figure 4.1-(a). Intuitively, the mapping is obtained by first listing the convolving pairs 
{[R(a,c), R(c+1,b)]} in increasing order of c, then "folding" the list about the middle as shown in Figure 
4.1-(b). (As we shall see later, this "folded" mapping guarantees that data can be routed among processors 
using only nearest-neighbor connections.) 
Figure 4.1. Mapping from convolving pairs of R (2,6) to secondary processors. 
The formal mapping is given by Invariants 4.1 and 4.2 below. The processors use the four r_pq 
registers to store the entries. The notation r_pq(i,j,s) means the contents of register r_pq of processor 
P(i,j) at forward sweep s. 
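Invariant 4.1. For 1 ≤ i < j ≤ s ≤ n, 
\[
\begin{aligned}
r_{00}(i,j,s) &= \begin{cases} \emptyset & \text{if } 2i < j \\ R(s-j+1,\ s-i) & \text{otherwise} \end{cases} \\
r_{01}(i,j,s) &= \begin{cases} \emptyset & \text{if } 2i < j \\ R(s-i+1,\ s) & \text{otherwise} \end{cases} \\
r_{10}(i,j,s) &= \begin{cases} \emptyset & \text{if } 2i \le j \\ R(s-j+1,\ s-j+i) & \text{otherwise} \end{cases} \\
r_{11}(i,j,s) &= \begin{cases} \emptyset & \text{if } 2i \le j \\ R(s-j+i+1,\ s) & \text{otherwise.} \end{cases}
\end{aligned}
\]
Invariant 4.2. For 1 ≤ j ≤ s ≤ n, 
\[ r_{00}(j,j,s) = r_{11}(j,j,s) = \emptyset, \qquad r_{01}(j,j,s) = r_{10}(j,j,s) = R(s-j+1,\ s). \]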
Invariants 4.1 and 4.2 specify the register values for secondary and primary processors, respectively. 
All registers are assumed to be initialized to the empty set ∅. Observe from Invariant 4.1 that some 
secondary processors may have some registers permanently set to ∅; this indicates that no matrix entry is 
mapped onto the register. Moreover, for primary processors (see Invariant 4.2), r_00 and r_11 are always 
∅, and r_01 and r_10 hold the computed entry. Although one register should be sufficient, this mapping 
simplifies the routing of data (to be explained later). Finally, the invariants define the register values of 
P(i,j) only for forward sweeps s ≥ j. If s < j, the registers of P(i,j) retain their initial values ∅. 
Figure 4.2 illustrates the register values for a 4 x 4 P-array at forward sweeps 1 through 4. 
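The folded mapping can be checked mechanically. The sketch below encodes one reading of Invariant 4.1, in which r_00 and r_01 are empty when 2i < j and r_10 and r_11 are empty when 2i ≤ j; this reading, the function names, and the representation of a matrix entry R(a,b) by its index pair (a, b) are assumptions of the sketch. It then collects the register pairs held in row j and compares them with the convolving pairs needed by P(j,j).

```python
EMPTY = None  # stands for the empty set in a register


def secondary_registers(i, j, s):
    """Index pairs (a, b) of the entries R(a, b) held by P(i, j), i < j <= s."""
    upper = 2 * i >= j          # r_00, r_01 are non-empty in this case
    lower = 2 * i > j           # r_10, r_11 are non-empty in this case
    return {
        "r00": (s - j + 1, s - i) if upper else EMPTY,
        "r01": (s - i + 1, s) if upper else EMPTY,
        "r10": (s - j + 1, s - j + i) if lower else EMPTY,
        "r11": (s - j + i + 1, s) if lower else EMPTY,
    }


def pairs_in_row(j, s):
    """All register pairs [r_p0, r_p1] stored in secondary processors of row j."""
    pairs = set()
    for i in range(1, j):
        reg = secondary_registers(i, j, s)
        for left, right in (("r00", "r01"), ("r10", "r11")):
            if reg[left] is not EMPTY:
                pairs.add((reg[left], reg[right]))
    return pairs


def convolving_pairs(a, b):
    """The pairs [R(a, c), R(c+1, b)], a <= c < b, as index pairs."""
    return {((a, c), (c + 1, b)) for c in range(a, b)}


# Example from the text: P(5,5) computes R(2,6) at forward sweep 6.
stored = pairs_in_row(5, 6)
needed = convolving_pairs(2, 6)
```

Under this reading the four pairs for R(2,6) sit in P(3,5) and P(4,5), two pairs per processor, while P(1,5) and P(2,5) hold empty registers; that is the "folding about the middle" described above.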
It is easy to see how Invariant 4.2 can be realized for every primary processor given that Invariant 
4.1 holds for secondary processors. For a given forward sweep, Invariant 4.1 states that all the convolving 
pairs required to compute the entry at the primary processor are available in the secondary processors to 
its left. Thus, the desired value is simply the union, over all secondary processors, of 
(r_00 * r_01) ∪ (r_10 * r_11). This value can be computed as follows: Each processor has a left input 
terminal IN and a right output terminal OUT (for a processor in the leftmost column other than P(1,1), 
IN is assumed to be permanently set to ∅). At the start of each forward sweep, the processor receives a 
value from IN, computes IN ∪ (r_00 * r_01) ∪ (r_10 * r_11), then sends the result to OUT. The output 
from OUT then travels with unit-delay to the IN terminal of the next processor. It is clear that the value 
that arrives at the primary processor is the desired matrix entry. The primary processor then stores this 
value in its r_01 and r_10 registers. Processor P(1,1) is a special case: we let IN be the terminal from 
which it receives the input string a_1 a_2 ... a_n $. At forward sweep i, 1 ≤ i ≤ n, P(1,1) reads a_i from 
IN, computes the set {[A → a_i] ∈ P}, then stores the result in its r_01 and r_10 registers. 
Once computed by a primary processor, an entry is routed to various secondary processors to 
participate in the computation of new entries. Invariant 4.1 gives the desired mapping. We now specify the 
required data routing steps. Each processor has four input terminals IN_pq and four output terminals 
OUT_pq (p, q ∈ {0,1}) connected to neighboring processors as shown in Figure 4.3. More precisely, the 
IN_00 and IN_11 terminals of processor P(i,j) receive data from the OUT_00 and OUT_11 terminals, 
respectively, of processor P(i-1,j-1), and the IN_01 and IN_10 terminals receive data from the OUT_01 and 
OUT_10 terminals, respectively, of processor P(i,j-1). (For processors with non-existent neighbors along 
the directions shown, the relevant inputs are assumed to be ∅.) Data items travel through the 
communication links at different speeds. In particular, outputs from terminals OUT_00, OUT_01, OUT_10 and 
OUT_11 reach their destinations 3, 1, 2, and 2 clock cycles later, respectively (indicated in the figure by 
the number of black squares in each link). 
Figure 4.3. The IN_pq and OUT_pq terminals of a P-processor and their interconnections. 
For a secondary processor, data arriving at the IN_pq terminals are used to update its local registers, 
as depicted in Figure 4.4. For processors P(i,j) satisfying 2i ≠ j, register r_pq is updated to the value 
received from IN_pq; similarly, OUT_pq gets the value of r_pq. For processors P(i,j) satisfying 2i = j, the 
input terminals are switched for r_00 and r_10, and the output terminals are switched for r_01 and r_11. 
For a primary processor, inputs (if any) arriving at the IN_pq terminals are ignored. After storing the 
newly computed entry in its registers, the processor routes the register contents to the associated output 
terminals the same way as described. 
Figure 4.4. Updating the r_pq registers of processor P(i,j) for the case (a) 2i ≠ j and (b) 2i = j. 
For processor P(i,j), the above data routing step (and the associated computational step which 
computes the convolutions) is performed at every forward sweep s ≥ j. For forward sweeps s < j, the 
processor is "inactive". The processors can be activated at the right forward sweeps as follows: At clock 
cycle 1 (when the first input symbol is read), processor P(1,1) generates a "start" control signal which 
travels downwards with 2-delay (i.e., hops from processor to processor every 2 clock cycles) and to the 
right with unit-delay. One can easily verify that the "start" signal reaches processor P(i,j) at forward 
sweep s = j. 
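The claim about the "start" signal is a short arithmetic check. The sketch assumes that moving from P(1,1) to P(i,j) takes j-1 downward hops and i-1 rightward hops, and that forward sweep s at P(i,j) is clock cycle (i-1) + (j-1) + s; both are readings of the indexing convention, and the function names are hypothetical.

```python
def start_arrival_clock(i, j):
    """Clock cycle at which the "start" signal reaches P(i, j).

    Emitted by P(1, 1) at clock cycle 1; each of the j-1 downward hops costs
    2 cycles and each of the i-1 rightward hops costs 1 cycle (assumed path).
    """
    return 1 + 2 * (j - 1) + (i - 1)


def forward_sweep(i, j, clock):
    """Forward sweep of P(i, j) at a clock cycle, with d = (i-1) + (j-1)."""
    return clock - ((i - 1) + (j - 1))


# Sweep at which each processor of a 6 x 6 triangular P-array is activated.
activation = {(i, j): forward_sweep(i, j, start_arrival_clock(i, j))
              for j in range(1, 7) for i in range(1, j + 1)}
```

Algebraically, 1 + 2(j-1) + (i-1) - (i-1) - (j-1) = j, so every processor in row j wakes up at forward sweep j, the first sweep for which the invariants define its registers.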
At this point, we explain the use of registers t_0 and t_1 in each processor (see Figure 3.2). At the 
clock cycle when a processor receives the "start" signal, it also copies into its t_0 and t_1 registers the 
updated contents of its r_01 and r_11 registers, respectively. In subsequent clock cycles, the contents of 
t_0 and t_1 are left unchanged. The information stored in these registers will be used later in the parse 
generation phase. 
The computational and data routing steps previously described guarantee that Invariants 4.1 and 4.2 
hold for all processors of the P-array. In particular, at the end of forward sweep n, processor P(n,n) 
would have computed the value of R(1,n). The proof is a straightforward induction (on the sweep number 
and processor index) and is left to the reader (see also [CHAN87]). 
Forward sweep n+1 (at which processor P(1,1) reads the end-of-input marker $) is used to terminate 
the recognition phase for all processors. When $ is read, processor P(1,1) issues a "halt" signal which 
travels downwards and to the right with unit-delay. When received by a processor other than P(n,n), the 
processor terminates its computation. Processor P(n,n) instead checks whether R(1,n) (which is stored in 
its r_01 and r_10 registers) contains a production whose LHS is the start symbol S. If there is no such 
production, it sends a "reject" signal back to processor P(1,1) and the systolic array halts. Otherwise, 
P(n,n) initiates the parse generation phase described in the next section. 
Remark 4.1. We have some final remarks about the recognition phase. If one wishes only to determine 
whether the input string is in the language generated by the grammar, then the systolic array need not 
execute the next phase. In this case, one gets the answer from processor P(n,n) at the end of forward 
sweep n+1, which corresponds to clock cycle 3n-1. Furthermore, observe that every processor stores in its 
registers values which are dependent only on the size of the grammar and not on the length of the input 
(i.e., the processor is finite-state). It is also a simple exercise to modify the systolic algorithm just 
described so that each processor does not need to know its index (e.g., as is required to distinguish 
processors P(i,j) such that 2i = j). 
5. The Systolic Parse Generation Phase 
The systolic parse generation phase is essentially a parallelization of procedure PARSE described in 
Section 2. During this phase, both the P-array and the Q-array take part in the computation. 
Conceptually, the phase is divided into m stages, where m is the number of distinct parse trees of the 
input string. At the end of each stage, a new parse tree is stored "on-the-fly" in the Q-array; more 
precisely, if the parse tree contains a production from R(i,j), then this production would be stored in 
processor Q(i,j). 
Every stage begins with processor P(n,n) issuing a "begin-parse" control signal which reaches all 
other processors of the P-array and Q-array by moving upwards and to the left with unit-delay. Thus, a 
processor a (rectilinear) distance d away from P(n,n) receives the signal d clock cycles later. For a 
processor, let reverse sweep 1 (of the current stage) be the clock cycle at which it receives the 
"begin-parse" signal. Then, reverse sweep 2 is the next clock cycle, reverse sweep 3 the clock cycle after 
reverse sweep 2, etc. A reverse sweep is just like a forward sweep, the only difference being that in a 
given reverse sweep, a processor is "viewed" one clock cycle earlier than the neighboring processors to its 
left and above it. 
The parse tree that is eventually stored in the Q-array at the end of each stage is output from the 
P-array. Informally, the P-array identifies and "marks" the productions making up the parse tree from the 
recognition matrix entries stored in its primary processors. The mapping described in the previous section 
is especially suited for carrying out this "marking" process since at every forward sweep, the convolving 
pairs of the entry computed at a primary processor are all stored in the secondary processors to its left. 
Thus, if a production, say [A → BC], has already been identified as part of the parse tree at some primary 
processor, then the children of this production in the parse tree can be obtained by performing a "search" 
of the convolving pairs stored in the secondary processors (i.e., find a register-pair [r_p0, r_p1] such 
that B is the LHS of some production in r_p0 and C is the LHS of some production in r_p1). To do this, 
however, the flow of information should now be from right-to-left (rather than from left-to-right as is the 
case for a forward sweep). Moreover, since at the end of forward sweep n the primary processors only hold 
the set of entries {R(a,n) | 1 ≤ a ≤ n}, the "lost" entries should somehow be recovered. 
The trick is to be able to "reconfigure" the P-array such that at reverse sweep 1, 2, ..., n, every processor
holds the same memory contents that it had at forward sweep n, n-1, ..., 1, respectively. That this
can be accomplished follows from the observation that during the recognition phase, every newly computed
entry starts from an r_{αβ} register of a primary processor then follows a unique directed path through the
P-array. Moreover, the path always ends either at a t_β register at some forward sweep s ≤ n (after which the
t_β register is no longer changed) or at an r_{αβ} register at forward sweep n. Thus, the r_{αβ} and t_β registers at
the end of the recognition phase contain all the entries computed in all n forward sweeps; in n reverse
sweeps these entries can be sent back to their previous locations by routing them along the paths opposite
to what they took during the recognition phase.
In order not to lose the information stored in the r_{αβ} and t_β registers at the end of the recognition
phase (they will be required at the start of each new stage), we instead use the cells of the P-array for storing
and routing the data (see Figure 3.2). In particular, we let register pset of cell C_{αβ} (or pset(C_{αβ}) for
short) take the place of register r_{αβ}. For example, for n = 4, the contents of the pset registers of the
P-array at reverse sweeps 1 through 4 would be identical to those shown in Figure 4.2, except that reverse
sweep 1 corresponds to forward sweep 4, reverse sweep 2 corresponds to forward sweep 3, etc.
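The correspondence between reverse and forward sweeps is a plain index reversal. As a minimal sketch (the function name `forward_sweep_for` is an illustrative assumption, not part of the paper), reverse sweep s restores the contents held at forward sweep n-s+1:

```python
def forward_sweep_for(reverse_sweep: int, n: int) -> int:
    """Forward sweep whose P-array contents are restored at the given reverse sweep."""
    if not 1 <= reverse_sweep <= n:
        raise ValueError("reverse sweep must lie in 1..n")
    return n - reverse_sweep + 1


# Pairing of reverse sweeps with forward sweeps for the n = 4 example.
mapping = {s: forward_sweep_for(s, 4) for s in range(1, 5)}
```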
The "routing scheme" for cells is essentially the reverse of that shown in Figure 4.4: simply replace
"r_{αβ}" by "C_{αβ}" and reverse the directions of all the arrows. The delays associated with the links (see Figure
4.3) remain the same. (To route a cell we mean to route the contents of the three registers tag, sym
and pset that make up the cell.) Processor P(i,j) performs the routing step for its cells at every reverse
sweep. There are two exceptions: The first is reverse sweep 1, when processor P(i,j) updates pset(C00)
and pset(C10) to r00 and r10, respectively, instead of getting the data as inputs (which turn out to be nonexistent
at reverse sweep 1). The second exception is reverse sweep n-j+1, when processor P(i,j) instead
updates pset(C01) and pset(C11) to t0 and t1, respectively; this has the opposite effect of copying r01 and
r11 into t0 and t1, respectively, at forward sweep j. (We shall explain later how processor P(i,j) would
know when it is at reverse sweep n-j+1).
We now describe the computational steps performed by the P-array. At each reverse sweep, a processor
carries out the computational steps only after it has updated its cells. The heart of the computation
is an "instruction" called MATCH which is issued by a primary processor to all secondary processors to its
left. MATCH can be thought of as the "systolic equivalent" of subroutine MATCH in procedure PARSE. In
general, this instruction has the form MATCH(π, (tag1, tag2), id, last-id) where
π is a production in P,
tag1, tag2 ∈ {FIRST, CURRENT, NEXT, NULL},
id = (l,b) where l is an integer such that 0 ≤ l ≤ n and b ∈ {0,1}, and
last-id ∈ {true, false}.
For a primary processor P(j,j), the cells of the secondary processors to its left can be thought of as a
"chain" of cell-pairs, as depicted in Figure 5.1. When P(j,j) issues a MATCH instruction, say,
MATCH([A → BC], (tag1, tag2), (l,b), last-id), the cell-pairs are "searched" in the order shown in the
figure starting at cell-pair [C_{b0}, C_{b1}] of processor P(l,j) (processors prior to P(l,j) simply propagate the
instruction unchanged to the next processor with unit-delay). Now, let [C1, C2] be the first cell-pair
satisfying the property that
(*) there is a production in pset(C1) whose LHS = B and
there is a production in pset(C2) whose LHS = C.
If [C1, C2] is cell-pair [C_{b'0}, C_{b'1}] of processor P(l',j) then:
(1) sym(C_{b'0}) and sym(C_{b'1}) are set to nonterminals B and C, respectively,
(2) tag(C_{b'0}) and tag(C_{b'1}) are set to tag1 and tag2, respectively, and
(3) processor P(l',j) modifies the instruction it sends to its left to MATCH([A → BC], (NULL,
NULL), (l',b'), last-id).
Figure 5.1. The cells of secondary processors to the left of P(j,j) depicted as a "chain" of cell-pairs.
Updating (tag1, tag2) to (NULL, NULL) indicates that a match has already been found; the place where the
match occurred is given in the new id = (l',b'). The rest of the cell-pairs following the one where the
match occurred continue to be tested for property (*), this time to determine whether last-id needs to be
updated. If another match occurs, then last-id is updated to false; otherwise, it retains its old value.
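The effect of one MATCH instruction on a chain of cell-pairs can be sketched sequentially as follows. This is only an illustration under stated assumptions: the class `Cell`, the helper `has_lhs`, the function `match`, and the representation of the chain as a list of `(id, C1, C2)` triples already arranged in search order are all hypothetical, and the unit-delay pipelining between processors is ignored.

```python
class Cell:
    """One cell of the P-array: a production set plus the sym and tag registers."""

    def __init__(self, pset):
        self.pset = set(pset)  # productions stored as (lhs, rhs) tuples
        self.sym = None
        self.tag = "NULL"


def has_lhs(cell, nonterminal):
    """True if some production in the cell's pset has the given left-hand side."""
    return any(lhs == nonterminal for lhs, _ in cell.pset)


def match(production, tags, start_id, last_id, chain):
    """Search `chain` (a list of (id, C1, C2) in search order) from `start_id`.

    Returns the (tags, id, last_id) arguments of the instruction leaving the chain.
    """
    _, (b_sym, c_sym) = production
    start = [cid for cid, _, _ in chain].index(start_id)
    out_id, matched = start_id, False
    for cid, c1, c2 in chain[start:]:
        if not (has_lhs(c1, b_sym) and has_lhs(c2, c_sym)):
            continue  # property (*) fails for this cell-pair
        if not matched:
            c1.sym, c2.sym = b_sym, c_sym  # step (1): record the nonterminals
            c1.tag, c2.tag = tags          # step (2): mark the cells
            tags, out_id, matched = ("NULL", "NULL"), cid, True  # step (3)
        else:
            last_id = False  # a later cell-pair also satisfies (*)
    return tags, out_id, last_id
```

If no cell-pair satisfies (*), the sketch returns the tags and id unchanged, mirroring an instruction that passes through the chain unmodified.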
The MATCH instructions leaving the leftmost column of the P-array serve as input to the Q-array.
The steps performed by a Q-processor are simple: at each reverse sweep, it shifts the contents of its local
registers p, id and last-id into the corresponding registers of the processor to its left, then updates its own
registers to those it receives from the processor to its right. For a Q-processor in the rightmost column,
the new contents of its p, id, and last-id registers are obtained from the production, id, and last-id arguments,
respectively, of the MATCH instruction (if any) it receives from the corresponding processor in the P-array.
(If no MATCH instruction is received, the Q-processor simply clears the three registers.)
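Seen one row at a time, the Q-array behaves like a shift register. The sketch below assumes a row is a Python list with the leftmost processor first and treats a reverse sweep as a single synchronous step, which ignores the one-cycle skew between neighboring processors; `EMPTY` and `q_row_step` are illustrative names only.

```python
EMPTY = (None, None, None)  # cleared (p, id, last_id) registers


def q_row_step(row, match_args=None):
    """Advance one row of Q-processors by one reverse sweep.

    row[0] is the leftmost processor; each entry is a (p, id, last_id) triple.
    match_args is the (production, id, last_id) part of a MATCH instruction
    arriving from the P-array, or None if no instruction arrives.
    Returns (new_row, triple_shifted_out_on_the_left).
    """
    shifted_out = row[0]
    incoming = match_args if match_args is not None else EMPTY
    return row[1:] + [incoming], shifted_out
```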
For processors of both the P-array and Q-array, the data routing steps and the computational steps
associated with the MATCH instruction are executed at every reverse sweep starting at reverse sweep 1
(which is when they receive the "begin-parse" signal). For all processors on the j-th row (from the top),
reverse sweep n-j+1 is the last reverse sweep when these steps are performed. A processor on the j-th
row can know when it is at reverse sweep n-j+1 as follows: At reverse sweep 1, processor P(n,n) issues
an "end-parse" control signal which travels upwards with 2-delay and to the left with unit-delay. A processor
receives this signal at reverse sweep n-j+1.
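The claim that the "end-parse" signal arrives exactly at reverse sweep n-j+1 can be checked with a little arithmetic. The sketch assumes that P(i,j) sits in column i and row j of a grid whose bottom-right corner is P(n,n), and that clock 0 is the cycle at which P(n,n) issues both signals; the function names are illustrative.

```python
def begin_parse_clock(n, i, j):
    """Arrival time of "begin-parse": unit-delay both upwards and to the left."""
    return (n - i) + (n - j)


def end_parse_clock(n, i, j):
    """Arrival time of "end-parse": 2-delay upwards, unit-delay to the left."""
    return (n - i) + 2 * (n - j)


def end_parse_reverse_sweep(n, i, j):
    """Local reverse-sweep number at which P(i,j) sees "end-parse".

    Reverse sweep 1 is the cycle at which "begin-parse" arrives, so the
    horizontal delays cancel and only the extra vertical delay n-j remains.
    """
    return end_parse_clock(n, i, j) - begin_parse_clock(n, i, j) + 1
```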
We now explain how the MATCH instructions are used to generate a parse tree of the input string.
The actions performed by the systolic array for the first stage are slightly different from those of the 
succeeding stages. We first describe what happens during stage 1. 
Stage 1. Stage 1 begins when the "halt" signal indicating the end of the recognition phase reaches processor
P(n,n). At this clock cycle, processor P(n,n) issues the "begin-parse" signal to all other processors to
start reverse sweep 1 of the stage. At the start of reverse sweep 1, the data routing steps described earlier
would place the value of R(1,n) into registers pset(C01) and pset(C10) of primary processor P(n,n).
Moreover, the cells of the secondary processors to its left would hold the convolving pairs of R(1,n). Suppose
that pset(C01) (or pset(C10)) has a production whose LHS is the start symbol S. Then, P(n,n) first
sets sym(C01) to nonterminal symbol S and tag(C01) to FIRST, then does the following:
(1) Locate the first production π ∈ pset(C01) such that LHS(π) = sym(C01). Moreover, if π is the last
such production, distinguish π by some special symbol, say π̄ (this information will be used later in
the Q-array);
(2) Send MATCH(π (or π̄), (FIRST,FIRST), (n-1,0), true) to the processor to its left.
The MATCH instruction would search for the first cell-pair [C1, C2] which contains a pair of productions
that match the right-hand side of π. The cell-pair is then "marked" by updating their sym and tag registers.
In addition, new id and last-id values would be computed and, together with production π (or π̄),
shifted into the Q-array. Now, the routing scheme for cells would eventually bring the marked cells to primary
processors (either as C01 or C10) at some reverse sweep. When this happens, the secondary processors
to the left of the primary processor would again hold the convolving pairs of the entry stored in the
pset register of the marked cell. The process then repeats. More precisely, a primary processor P(j,j)
receiving a marked cell C (at most one marked cell would arrive at any reverse sweep) checks tag(C) and
sym(C). If tag(C) = FIRST, then it performs step (1) above for C, and issues a MATCH instruction as in
step (2), except that the id argument is (j-1,0). For primary processors not receiving a marked cell, no
MATCH instruction is generated.
The marking process continues until the processors receive the "end-parse" signal. For the sample
grammar and input string given in Figure 2.1, the configurations of the P-array and Q-array for the n
reverse sweeps (n = 4) are shown in Figure 5.2. In particular, at the end of reverse sweep n, the Q-array
would have stored in its p registers a parse tree of the input string. This parse tree can then be read off
directly from the Q-array or pipelined out of the Q-array to a host computer. We omit the steps involved
as they are relatively straightforward.
The clock cycle at which the "end-parse" signal is received represents the end of the stage for each
processor of the P-array. On the other hand, the Q-array performs another step which involves the update
of the ldone, done and rdone registers of its processors. This is accomplished as follows: At reverse
sweep n (which is also when it receives the "end-parse" signal), processor Q(n,n) sends out an "update"
control signal to all other processors of the Q-array, this signal traveling diagonally downwards with 2-delay
and to the right with unit-delay. For processors on the top row of the Q-array (i.e., processors
Q(j,j), 1 ≤ j ≤ n), the following is performed when they receive the "update" signal: set
ldone = rdone = done = true and send the contents of done diagonally downwards with 2-delay and vertically
downwards with unit-delay. For a processor in a lower row, one diagonal input and one vertical
input would arrive at the time it receives the "update" signal. The processor then does the following:
(1) If its p register does not contain a production, then clear its done, ldone and rdone registers and
route the vertical and diagonal inputs to the next vertical and diagonal processors below it, respectively.
(2) If its p register contains a production, then set ldone to the value of the vertical input and rdone to
the value of the diagonal input. Set done to true iff (i) ldone = rdone = true, (ii) last-id = true,
and (iii) the p register contains a distinguished production π̄. Otherwise, set done to false. Route
the contents of done vertically and diagonally downwards.
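The two cases of the update step can be sketched as a pure function of one Q-processor's registers and its two inputs. The sketch assumes that the three-part condition in step (2) determines the done register (the step's closing sentences set and route done), and it represents cleared registers by `None`; the function and parameter names are illustrative.

```python
def update_q_processor(p, distinguished, last_id, vertical_in, diagonal_in):
    """Update step of one Q-processor below the top row.

    p             -- production held in the p register, or None if empty
    distinguished -- True if p carries the "last such production" mark
    Returns (ldone, done, rdone, vertical_out, diagonal_out).
    """
    if p is None:
        # Step (1): clear the registers and pass both inputs through unchanged.
        return None, None, None, vertical_in, diagonal_in
    # Step (2): latch the inputs, then combine the three conditions.
    ldone, rdone = vertical_in, diagonal_in
    done = bool(ldone and rdone and last_id and distinguished)
    return ldone, done, rdone, done, done
```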
For example, after the update step, the Q-array would have the configuration shown in Figure 5.3. After 
the update step for processor Q(l,n), it sends the contents of all of its local registers to processor P (n,n) 
of the P -array to begin the next stage. In addition, processor Q (1,n) sends a signal to all processors of the 
Q -array, this signal traveling upwards and to the left with unit-delay. When received by a Q -processor, it 
sends the contents of all its local registers to the processor to its left and receives the updated values from 
the processor to its right. The effect is that the entire parse tree is shifted out of the Q -array and pipelined 
into the primary processors of the P-array (using the toroidal connections; see Figure 3.1).
Stage k > 1. Each subsequent stage after stage 1 effectively starts at the clock cycle when processor
P(n,n) receives an input from processor Q(1,n). At this clock cycle, processor P(n,n) sends the "begin-parse"
signal to all other processors to start reverse sweep 1 of the new stage. The data routing and computational
steps performed during the stage are identical to those in stage 1, except for primary processors
which now receive inputs from the Q-array. For convenience, we assume that the input to a primary processor
is of the form I = (p, ldone, done, rdone, id, last-id). The MATCH instruction issued by a primary
processor now depends on this input. The main thing to note is that if a primary processor holds an
entry R(a,b) then input I represents the register contents of processor Q(a,b) after the update step of the
preceding stage. In particular, if argument p of the input holds a production π, then π is in R(a,b) and is
part of the parse tree last generated. The rest of the arguments of the input are used by the primary processor
to determine how the next parse tree would be generated, in a manner similar to that performed by
procedure PARSE.
The steps executed by a primary processor are as follows. (It is instructive to compare these steps
with procedure PARSE). At reverse sweep 1, processor P(n,n) sets sym(C01) = S as before. This time,
however, it checks the value of done from input I. If done = true, then the parse tree from the previous
stage is the last one and P(n,n) sends a signal to all processors to halt all computation. If done = false,
then there is a next parse tree, in which case processor P(n,n) sets tag(C01) to NEXT.
The following steps are also performed by every primary processor P(j,j) that has a marked cell C
(C is C01 for processor P(n,n)):
Figure 5.3. Register contents of the Q -array after the update step of stage 1. 
(1) If tag(C) = CURRENT, then output MATCH(p, (CURRENT,CURRENT), id, last-id), where p, id,
and last-id are from input I.
(2) If tag(C) = FIRST, then locate the first production π ∈ pset(C) such that LHS(π) = sym(C). If this is
also the last such production, distinguish π as π̄. Output MATCH(π (or π̄), (FIRST,FIRST), (j-1,0),
true).
(3) If tag(C) = NEXT, check input I and do the following:
(a) If rdone = false, output MATCH(p, (CURRENT,NEXT), id, last-id).
(b) If rdone = true and ldone = false, output MATCH(p, (NEXT,FIRST), id, last-id).
(c) If rdone = true and ldone = true, check last-id. If last-id = false, then output
MATCH(p, (FIRST,FIRST), id', true) where id' is defined as follows: if id = (l,0) then
id' = (l,1); if id = (l,1) then id' = (l-1,0) (this simply moves the starting point of the search
to the next cell-pair). If last-id = true, then locate the next production π in pset(C), after the
one stored in p, such that LHS(π) = sym(C). If this is also the last such production, distinguish
π as π̄. Output MATCH(π (or π̄), (FIRST,FIRST), (j-1,0), true).
If a primary processor does not receive a marked cell, then it ignores input I and does not issue a MATCH 
instruction; this produces the same effect as subroutine UNMARK in procedure PARSE. 
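The case analysis carried out by a primary processor in stages after the first can be summarized in a short sketch. It rests on several assumptions made only for illustration: pset is an ordered list of (lhs, rhs) tuples, the p argument is a (production, distinguished) pair, input I is a dictionary, and the names `next_id` and `issue_match` are hypothetical.

```python
def next_id(id_):
    """Move the starting point of the search to the next cell-pair."""
    l, b = id_
    return (l, 1) if b == 0 else (l - 1, 0)


def issue_match(tag, sym, pset, j, inp=None):
    """MATCH arguments issued by primary processor P(j,j) for a marked cell.

    pset -- ordered list of productions (lhs, rhs) held in the marked cell
    inp  -- dict with keys p, ldone, rdone, id, last_id (input I), where p is
            a (production, distinguished) pair; not needed when tag is FIRST
    Returns (p, (tag1, tag2), id, last_id), or None for an unmarked cell.
    """
    cands = [x for x in pset if x[0] == sym]

    def pick(k):
        # Pair the k-th candidate with a flag marking it as the last one.
        return (cands[k], k == len(cands) - 1)

    if tag == "CURRENT":
        return (inp["p"], ("CURRENT", "CURRENT"), inp["id"], inp["last_id"])
    if tag == "FIRST":
        return (pick(0), ("FIRST", "FIRST"), (j - 1, 0), True)
    if tag == "NEXT":
        if not inp["rdone"]:
            return (inp["p"], ("CURRENT", "NEXT"), inp["id"], inp["last_id"])
        if not inp["ldone"]:
            return (inp["p"], ("NEXT", "FIRST"), inp["id"], inp["last_id"])
        if not inp["last_id"]:
            return (inp["p"], ("FIRST", "FIRST"), next_id(inp["id"]), True)
        k = cands.index(inp["p"][0]) + 1  # next production after the one in p
        return (pick(k), ("FIRST", "FIRST"), (j - 1, 0), True)
    return None  # no marked cell: no MATCH instruction is issued
```

In the last NEXT case the sketch relies on the production in p not being the distinguished one, which holds whenever the done value for that node is false.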
Figure 5.4 illustrates the configurations of the systolic array for the n reverse sweeps of the second 
stage. At the end of reverse sweep n, a new parse tree would be stored in the Q-array. As in stage 1, an
update step is performed for the done, ldone and rdone registers of the Q-array; the result is shown in 
Figure 5.5. After this update step, the next stage is ready to begin. 
Remark 5.1. In general, the systolic algorithm generates the parse trees in an order different from pro- 
cedure PARSE. The reason is that, because of the "folded" mapping from convolving pairs to secondary 
processors, the pairs are considered in a different order. Nevertheless, each stage always generates a new 
parse tree. 
6. Complexity Analysis 
Since the underlying context-free grammar is in Chomsky normal form, every parse tree has size
(number of nodes) 2n-1, where n is the length of the input string. We show that the systolic parsing algorithm
runs in time O(m·n), where m is the number of distinct parse trees of the input string. The recognition
phase is completed after 3n-1 clock cycles (see Remark 4.1). One can also verify that the "begin-parse"
signal from processor P(n,n) that starts each stage occurs every 6n-3 clock cycles. Thus, the running
time of the systolic array is 3n-1 + m·(6n-3) = O(m·n).
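As a quick numerical check of this bound, the cycle count can be evaluated directly; for m ≥ 1 it never exceeds 9mn, since 3n-1 ≤ 3mn and m(6n-3) ≤ 6mn. The function name `total_clock_cycles` is an illustrative assumption.

```python
def total_clock_cycles(n: int, m: int) -> int:
    """Recognition phase (3n-1 cycles) plus m parse stages of 6n-3 cycles each."""
    return (3 * n - 1) + m * (6 * n - 3)


# Example: an input of length n = 4 with m = 2 distinct parse trees.
cycles = total_clock_cycles(4, 2)
```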
The systolic array has O(n²) processors. Each processor requires at most O(log n) space. Thus, the
total space complexity is O(n² log n). This is considerably more space-efficient than the systolic parsing
algorithm given in [LANG86], which uses O(n³ log n) space. In fact, one can do better for certain special
cases. As mentioned in Remark 4.1, each processor uses only constant space if only the recognition phase
is performed. This is in fact also true if only one parse tree is required as output. The id registers of the
Q-processors, which are the only registers that hold log n bits, are not necessary since the information
stored in these registers is only used after stage 1. Thus, to output the first parse tree, O(n²) total space
is sufficient.
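To see where the log n factor comes from, recall that id = (l,b) with 0 ≤ l ≤ n and b ∈ {0,1}. A rough sketch of the width of one id register under a plain binary encoding (an assumption; the paper does not fix an encoding, and `id_register_bits` is an illustrative name) is:

```python
def id_register_bits(n: int) -> int:
    """Bits for id = (l, b): l ranges over 0..n, b is a single bit."""
    # int.bit_length() gives the number of bits needed to write n in binary,
    # which is enough to hold every value in 0..n.
    return max(1, n.bit_length()) + 1
```

The width grows by one bit each time n doubles, which is the O(log n) per-processor cost that disappears when only the first parse tree is needed.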




Figure 5.5. Register contents of the Q -array after the update step of stage 2. 
References 
Aho, A. V. and J. D. Ullman, The Theory of Parsing, Translation and Compiling, Vol. 1,
Parsing, Prentice-Hall, Englewood Cliffs, NJ, 1972.
Chang, J. H., O. H. Ibarra, and M. A. Palis, "Parallel parsing on a one-way array of finite-state
machines", IEEE Transactions on Computers, 36:1 (1987), 64-75.
Chiang, Y. T. and K. S. Fu, "Parallel parsing algorithms and VLSI implementations for syntactic
pattern recognition", IEEE Transactions on Pattern Analysis and Machine Intelligence,
6:3 (1984), 302-314.
Earley, J., "An efficient context-free parsing algorithm", Communications of the ACM, 13:2
(1970), 94-102.
Guibas, L. J., H.-T. Kung, and C. D. Thompson, "Direct VLSI implementation of combinatorial
algorithms", Proceedings Caltech Conference on VLSI, 1979, 509-525.
Kosaraju, S. R., "Speed of recognition of context-free languages by array automata", SIAM
Journal on Computing, 4:3 (1975), 331-340.
Langlois, L., "Parallel parsing on an array of processors", Internal Report CSR-200-86,
Department of Computer Science, University of Edinburgh, July, 1986.
Rytter, W., The complexity of two-way pushdown automata and recursive programs, in Combinatorial
Algorithms on Words, A. Apostolico and Z. Galil (eds.), NATO ASI Series F:12,
Springer-Verlag: New York/Berlin.
Valiant, L., "General context-free recognition in less than cubic time", Journal of Computer
and Systems Sciences, 10:2 (1975), 308-315.
Younger, D. H., "Recognition and parsing of context-free languages in time n³", Information
and Control, 10:2 (1967), 189-208.
