This letter presents the design and analysis of a bit-sequential array for pattern matching applications. The architecture makes multiple use of each data sample, has built-in concurrency and pipelining, and is based on a highly modular design with only nearest neighbor connections between array modules. The array computes all occurrences of a pattern of length m, in the string of length n, in O(m + n) time and O(m) hardware. The pattern and the string are fed in sequentially and the match indicators come out in the same fashion, leading to a significant reduction in silicon area.
I. INTRODUCTION
Text-editing, visual processing, and signal reconstruction often require searching through a string of characters or bits, looking for instances of a given "pattern" string. The obvious way to search for a matching pattern is to begin at the first position in the text and proceed until a mismatch is found, in which case the starting position is advanced by one. This approach is very inefficient: if all possible matches in the string are to be determined, the worst-case time needed is O(mn), where m is the length of the pattern and n is the length of the string.
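The obvious search described above can be sketched in a few lines of Python. This is only an illustration of the O(mn) baseline; the function name `naive_match_positions` is hypothetical and does not come from the letter.

```python
def naive_match_positions(pattern, text):
    """Return every start index at which `pattern` occurs in `text`.

    Each start position is tried in turn, and the comparison stops at the
    first mismatch, so the worst case costs O(m * n) comparisons.
    """
    m, n = len(pattern), len(text)
    matches = []
    for start in range(n - m + 1):
        i = 0
        # Compare until a mismatch is found or the whole pattern is matched.
        while i < m and text[start + i] == pattern[i]:
            i += 1
        if i == m:
            matches.append(start)
    return matches
```

A pattern such as "aa" against the string "aaaa" forces nearly every comparison to run to completion, which is the behavior behind the O(mn) bound.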
Researchers [...] In this letter, a bit-sequential systolic pattern matching architecture is described. Based on a linear array, this structure can find all occurrences of a pattern of length m in a string of length n in O(m + n) time with O(m) hardware complexity. The characters in both the pattern and the string are assumed to be bits here, but the hardware can easily be replicated for parallel matching in case the characters are words and the same time efficiency is desired.
Manuscript received March 31, 1986. This work was partially supported by AT&T Information Systems.
The authors are with the Computer Science and Electrical Engineering Department, Lehigh University, Bethlehem, PA 18015-3084, USA.

Let the pattern P, the string S, and the output flag vector R be, respectively, represented as
[...] (2)
The $r_j$ values for $j = 0, 1, \ldots, m - 2$ are ignored and other $r_j$'s are re[...]. Without loss of generality we can assume m = 2M and express (2) as
[...] (3)
where $\oplus$ stands for exclusive-or, $\neg$ is the logical not, and $\prod$ is a multioperand AND. An architecture to evaluate (3) is presented in the next section.
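As a reference model for the flag computation, the sketch below evaluates each flag as the AND of two multioperand ANDs, one over the even pattern bits and one over the odd pattern bits, with each operand being the complement of an exclusive-or. The indexing is an assumption: a flag is attached to the position where a candidate match ends, which is one reading of the remark that the first m - 1 flags are ignored. The names `xnor` and `match_flags` are hypothetical.

```python
def xnor(a, b):
    """Complement of exclusive-or: 1 when the two bits agree, else 0."""
    return 1 - (a ^ b)


def match_flags(pattern_bits, string_bits):
    """Return one flag per string position (assumed: flag j marks a match ending at j)."""
    m, n = len(pattern_bits), len(string_bits)
    assert m > 0 and m % 2 == 0, "this sketch assumes m = 2M"
    M = m // 2
    flags = []
    for j in range(n):
        if j < m - 1:
            flags.append(0)  # too early for a full match; these flags are ignored
            continue
        start = j - m + 1
        # Multioperand AND over the even-indexed pattern bits.
        even = all(xnor(pattern_bits[2 * i], string_bits[start + 2 * i])
                   for i in range(M))
        # Multioperand AND over the odd-indexed pattern bits.
        odd = all(xnor(pattern_bits[2 * i + 1], string_bits[start + 2 * i + 1])
                  for i in range(M))
        flags.append(int(even and odd))
    return flags
```

Splitting the product into even and odd halves does not change the result; it only mirrors the two-bank organization that the architecture section builds on.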
II. BIT-SEQUENTIAL LINEAR ARRAY ARCHITECTURE
Let the input bit-string $\{x_j\}$ be fed into the architecture bit-serially, with $x_0$ first, and the output bit-string $\{r_j\}$ be extracted bit-serially, with $r_0$ first. At this point we assume that the pattern string $\{a_j\}$ is already loaded in the architecture and does not move as do the $\{x_j\}$ and $\{r_j\}$ strings. Later we will show that even $\{a_j\}$ could be loaded bit-serially without compromising on the time efficiency. The advantages of such "bit-sequential" architectures are that they can be pipelined to the bit level, permitting a very high clocking rate; they use repeated modules with nearest neighbor intermodular communication paths of width 1; and they have the fewest possible input and output pins. These characteristics make these architectures ideal for Very Large Scale Integration (VLSI) implementation [5], [6].
Let Z denote the delay operator associated with the data input rate. Then (3) may be transformed into
[...], with the multioperand ANDs taken over $i = 0, 1, \ldots, M - 1$.
Since the pattern string $\{a_j\}$ is time-invariant, i.e., it does not flow in the array with time, the delay operator can be distributed as
[...], again with $i = 0, 1, \ldots, M - 1$.
This expression shows that each flag F(j) is computed by ANDing the outputs of two multioperand ANDs. The ith operand for each multioperand AND is obtained, in turn, by delaying by i time units the result of the comparison of a(2i) (or a(2i + 1)) and x(j) delayed by i time units. This immediately suggests the architecture shown in Fig. 1. We now show that the output of this architecture provides us the desired flag vector.
Theorem: The output of this architecture O(j) at time j is related to the flags as O(j + 2M) = F(j).
Proof: Let the ith module of the upper bank of Fig. 1 hold a(2(M - i) - 2) and the lower one, a(2(M - i) - 1). Further, let
[...], $0 \le i \le M - 1$, $j \ge 0$, k = upper or lower,
be the value of the variable VAR for the ith module at the jth clock. Index k associates the variable with the upper or the lower array. The geometry of the architecture then yields
and similarly,
We now have
Using recursion on (4)
Finally (5)- (7) together yield
Q.E.D.
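The behavior of the two-bank array can be checked with a small register-level simulation. The sketch below is not the authors' circuit. The coefficient placement follows the proof (module i holds a(2(M - i) - 2) in the upper bank and a(2(M - i) - 1) in the lower bank), but the register timing, the one-clock skew on the upper bank's input, the zero initialization, and all names (`step_bank`, `simulate_array`) are assumptions made here because Fig. 1 is not available. Under these assumptions the output read after clock t reports a match starting at string position t - 2M, that is, a latency of 2M clocks.

```python
def xnor(a, b):
    """Complement of exclusive-or: 1 when the two bits agree, else 0."""
    return 1 - (a ^ b)


def step_bank(a, x_regs, f_regs, x_in):
    """Advance one bank by one clock and return the new (x_regs, f_regs).

    Each module compares its stationary pattern bit with the x bit it holds,
    ANDs the result with the flag held by its right neighbor, and latches it.
    The x bits shift one module to the right; flags shift one to the left.
    """
    M = len(a)
    new_f = []
    for i in range(M):
        incoming = f_regs[i + 1] if i + 1 < M else 1  # rightmost module sees 1
        new_f.append(xnor(a[i], x_regs[i]) & incoming)
    new_x = [x_in] + x_regs[:-1]
    return new_x, new_f


def simulate_array(pattern_bits, string_bits):
    """Return the start positions of all matches, as reported by the array."""
    m, n = len(pattern_bits), len(string_bits)
    assert m > 0 and m % 2 == 0, "this sketch assumes m = 2M"
    M = m // 2
    upper_a = [pattern_bits[2 * (M - i) - 2] for i in range(M)]  # even bits
    lower_a = [pattern_bits[2 * (M - i) - 1] for i in range(M)]  # odd bits
    ux, uf = [0] * M, [0] * M
    lx, lf = [0] * M, [0] * M
    skew = 0  # assumed one-clock delay on the upper bank's x input
    starts = []
    for t in range(n + 1):
        x_in = string_bits[t] if t < n else 0  # pad with 0 after the string
        ux, uf = step_bank(upper_a, ux, uf, skew)
        lx, lf = step_bank(lower_a, lx, lf, x_in)
        skew = x_in
        s = t - 2 * M  # start position reported at this clock
        # Outputs outside 0..n-m depend on initial or padding bits; skip them.
        if 0 <= s <= n - m and (uf[0] & lf[0]):
            starts.append(s)
    return starts
```

For example, `simulate_array([1, 0], [1, 0, 1, 0, 0])` reports matches starting at positions 0 and 2. The loop runs for n + 1 clocks and uses 2M modules, which matches the O(m + n) time and O(m) hardware figures.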
III. DISCUSSION AND CONCLUSIONS
The pattern matching architecture proposed here can be very easily implemented. The function f[a, b] is built as $\neg(a \oplus b)$, where $\neg$ represents a NOT and $\oplus$ an Exclusive OR. Each module will contain this logic to compute f besides three flip-flops (delays) to hold a component of a (stationary), a component of x (moving to the right), and a component of the flag (moving to the left). The architecture can thus be clocked at speeds determined by the delay in the computation of f and the flip-flop hold and setup times. Thus the clocking speed is independent of the size of the array.
The concept of "wildcards" (pattern characters that match with everything) can be incorporated in this architecture by using a 2-bit representation for each pattern component and modifying function f as
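The modified function from the source is not reproduced here, so the following is only one hypothetical way to realize the wildcard idea. It assumes each pattern component is stored as a pair (w, a), where w = 1 marks a wildcard and a is the ordinary pattern bit; both the encoding and the name `f_wildcard` are assumptions, not the letter's definition.

```python
def f_wildcard(component, b):
    """Compare a 2-bit pattern component with one string bit.

    component is an assumed pair (w, a): w = 1 means "match anything",
    otherwise the result is the complement of a XOR b, as in the plain f.
    """
    w, a = component
    return w | (1 - (a ^ b))


def pattern_matches_at(components, string_bits, start):
    """AND the per-position comparisons for one candidate start position."""
    return all(f_wildcard(c, string_bits[start + i])
               for i, c in enumerate(components))
```

With this encoding the per-module logic grows by a single OR gate, and the stationary pattern storage grows from one flip-flop to two per module.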
It was assumed earlier that the a-sequence is stationary and is already resident in the architecture when x(0) comes in. But this is not necessary. If one provides an additional path between modules, the a coefficients may be fed serially (even a components to the upper bank and odd to the lower) along with the first M components of x. This is an ideal situation because it does not affect the output delay (the first M clock periods cannot, even otherwise, be used to compute any useful flag component) but gets rid of any loading time for the array. Thus all the matches between an m-length pattern and an n-length string can be obtained in this O(m) hardware in O(m + n) time. Control of this architecture is very simple since no flip-flops need to be cleared or preset. The modularity, nearest neighbor connections, data buses of width 1, and the simple control make this architecture suitable for VLSI implementation.
