Sequence information signal processor by Chow, Edward T. et al.
US005964860A 
United States Patent [19] [11] Patent Number: 5,964,860 
Peterson et al. [45] Date of Patent: Oct. 12, 1999 
SEQUENCE INFORMATION SIGNAL 
PROCESSOR 
Inventors: John C. Peterson, Alta Loma; Edward 
T. Chow, San Dimas; Michael S. 
Waterman, Culver City; Timothy J. 
Hunkapillar, Pasadena, all of Calif. 
Assignee: California Institute of Technology, 
Pasadena, Calif. 
Appl. No.: 08/831,798 
Filed: Apr. 8, 1997 
Related U.S. Application Data 
Continuation of application No. 08/154,633, Nov. 18, 1993, 
Pat. No. 5,632,041, which is a continuation of application 
No. 07/518,562, May 2, 1990. 
Int. Cl.6 ...................................................... G06F 19/00 
U.S. Cl. ................................ 712/19; 712/898; 702/20 
Field of Search ................................... 702/20; 435/6, 
435/132.1; 364/495, 496; 395/800.19, 898 
References Cited 
U.S. PATENT DOCUMENTS 
4,698,751 10/1987 Parvin ..................................... 395/800 
4,760,523 7/1988 Yu et al. ................................. 395/800 
4,845,610 7/1989 Parvin ..................................... 395/800 
5,129,077 7/1992 Hillis ....................................... 395/500 
5,632,041 5/1997 Peterson et al. ................... 395/800.19 
OTHER PUBLICATIONS 
“A Systolic Array for Rapid String Comparison”, Lipton, 1985. 
“Identification of Common Molecular Subsequences”, Waterman and Smith. 
“BioScan: A VLSI-Based System for Biosequence Analysis”, White et al., 1991. 
“The Design of Special-Purpose VLSI Chips”, Foster et al., 1980. 
“On High-speed Computing with a Programmable Linear Array”, Lee et al., 1988. 
“Why Systolic Architecture”, Kung, 1982. 
“The BioScan Project: An Interdisciplinary Approach to Biosequence Analysis”, White et al., Oct. 1989. 
“A New Algorithm for Best Subsequence Alignment with Application to tRNA-rRNA Comparisons”, Waterman et al., 1987. 
Primary Examiner: Larry D. Donaghue 
Attorney, Agent, or Firm: Fish & Richardson P.C. 
[57] ABSTRACT 
An electronic circuit is used to compare two sequences, such 
as genetic sequences, to determine which alignment of the 
sequences produces the greatest similarity. The circuit 
includes a linear array of series-connected processors, each 
of which stores a single element from one of the sequences 
and compares that element with each successive element in 
the other sequence. For each comparison, the processor 
generates a scoring parameter that indicates which segment 
ending at those two elements produces the greatest degree of 
similarity between the sequences. The processor uses the 
scoring parameter to generate a similar scoring parameter 
for a comparison between the stored element and the next 
successive element from the other sequence. The processor 
also delivers the scoring parameter to the next processor in 
the array for use in generating a similar scoring parameter 
for another pair of elements. The electronic circuit deter- 
mines which processor and alignment of the sequences 
produce the scoring parameter with the highest value. 
14 Claims, 18 Drawing Sheets 
U.S. Patent Oct. 12, 1999 Sheet 1 of 18 5,964,860 
[Drawing sheets 1 through 18 (FIGS. 1-18) are not reproduced. Labels that survive extraction: Sheet 12 (MAX (0,E,F); MUX); FIG. 13 (MAX comparators); FIG. 14 (CONTROL LOGIC; PROCESSOR 1; MOD-16 COUNTER; signals CHAR IN, CHAR OUT, MAX EN, LOCATION, MAX, F, H; connections from dBUS and to all processors); FIG. 15 (PROCESSOR 2 through PROCESSOR 16; BLOCK COUNTER; SEQ COUNTER; SEQ THRES; threshold comparison; connections to and from dBUS); FIGS. 17 and 18 (arrays of graph nodes; signals H, F, b, LOC, MAX, MAXEN, PIPEN).] 
SEQUENCE INFORMATION SIGNAL 
PROCESSOR 
This is a continuation of application Ser. No. 08/154,633, filed Nov. 18, 1993, now U.S. Pat. No. 5,632,041, which is a continuation of Ser. No. 07/518,562, filed May 2, 1990. 
ORIGIN OF INVENTION 
The invention described herein was made in the performance of work under the following contracts: NASA contract NAS7-918; DOE contract DEFG03-88er60683; NSF contracts DIR-8809710 and DMS-8815106; and NIH contract GM 36230, and is subject to the provisions of Public Law 96-517 (35 USC 202) in which the Contractors have elected to retain title. 
1. Technical Field 
The present invention relates generally to an integrated circuit developed primarily in support of the human genome effort, which is a molecular genetic analysis for mapping and sequencing the human genome. The present invention relates specifically to an integrated circuit co-processor which may be used for carrying out an algorithm for identifying maximally similar sequences or subsequences and for locating highly similar segments of such sequences or subsequences. 
2. Background Art 
Release 63.0 of the national nucleic acid data base, Genbank, contains over forty million nucleotides representing about thirty-three thousand separate entries. Similarly, the current protein information resource (PIR) has close to six thousand entries with over one and one-half million amino acids. These data reflect primarily the efforts of the molecular biology community over the last decade. The rate at which new data are being added to this total demonstrates that the available computing resources are already inadequate for thorough and timely analysis of the data. 
Recently, an international commitment has been made to map and sequence the entire human genome in the next 10 to 20 years. Such a program will generate at least 3.4 billion nucleotides of final data and maybe ten times that amount of raw sequencing data. This constitutes about three orders of magnitude more data than has been collected to date. In addition, the sequences from other animal and plant genomes will also accumulate. In the near term, the 40 million nucleotides currently available, and already proving burdensome, will become trivial by comparison to the total. Novel computer resources must be developed if these data are to be adequately understood and their unique potential for enhancing our understanding of human genetics and diseases is to be realized. 
A required adjunct to any program designed to characterize the human genome is the development of computer hardware and software systems capable of maintaining and analyzing the vast amounts of information that will be generated. This information will consist of both nucleotide and amino acid sequence data as well as extensive annotation necessary to provide a biological context for these data. 
It is critical for the complete and timely analysis of new sequence data that they be thoroughly compared to the published data contained in the national data libraries. This analysis is important for determining and defining the functional and evolutionary relationships between sequences. Significantly, such sequence comparison is also critical to the task of constructing the complete genome sequence from millions of partially overlapping fragments, the so-called melding process. The computational load of this melding process will grow not only at the national level of coordinating the efforts of many researchers, but also at the level of individual laboratories that must deal with the increasing load of raw data generated by the development of automated sequencing. 
The ability of individual investigators to analyze their own data is limited by the power of the computers they have available, as well as the limited software tools capable of dealing with the entire sequence library. The amount of total sequence data generated to date is still less than 50 million character equivalents. However, this amount of data already taxes the ability of currently available algorithms and general use computers to conduct the needed comparative analysis of new data to the collected total. The data libraries have been doubling in size every year. The program that is envisioned to characterize complete genomes will soon cause the data libraries to increase exponentially. Such programs [illegible] change the basic nature of the data and the requirements for effective [illegible] for its [illegible]. 
In the latest Genbank the average length of an entry can span over one thousand bases. Many of the current methods of analyzing this data are based on the notion that each entry represents a discrete genetic element. However, this scenario does not adequately represent the diffuse and complex organization of a eukaryotic genome, where the coding and regulatory elements of a simple gene can span more than one million bases. More complex loci, such as those coding for the rearranging receptors of the immune system, can span over one million bases and include hundreds or thousands of identifiably related elements. As more and larger sequencing efforts are undertaken, the complexity of information contained in single entries will require a novel set of maintenance and analytical tools. 
The human beta globin locus is a good example. Its entry in Genbank is over 73 thousand bases long and has been constructed from over 70 overlapping contributions. This single entry contains the coding and regulatory information for at least 4 genes and 1 pseudogene. The repetitive nature of much of the genome will also severely complicate the alignment and melding problems. With megabase sequencing projects, the current concept of data entry will become obsolete. Not only will faster algorithms to compare sequences be needed as the amount of data increases, but these new tools will also have to be designed to better deal with longer strings of data that more directly reflect true genomic organization. Accordingly, novel schemes to handle and define these data and the biological information associated with them must be developed if this resource is to be useful to the scientific community. 
Of the many pressing and analytical needs concerning the current sequence data libraries, as well as the genome project, initially the most significant is the ability to survey the existing collection of data for sequences related to the new data. In its simplest form, this need is illustrated by searching the collection of gene or protein sequences for any that are “similar” to a discrete piece of new data. The comparative analyses possible between related sequences are critical for completely understanding the structural, functional and evolutionary characteristics of any sequence. Furthermore, in the case where large portions of the human genome are known, it will also be necessary to have the ability to find the precise genetic location of physiological markers in those cases where there may be only limited cDNA or protein sequence data available. 
Such searches are complicated by the fact that related sequences may be quite divergent. This means that it is 
essential to define some measure of similarity between pairs 
of sequences that can then be tested statistically. The explicit 
series of minimal evolutionary events (substitutions, 
deletions, insertions) between two sequences must be deter- 
mined; i.e., the sequences must be aligned. Traditionally, the 
most common method of alignment has been by eye, relying 
on the researcher’s ability to recognize conserved patterns. 
This method can be rapid and effective when the sequence 
distance is relatively small and/or the researcher has a priori 
information about the probable nature of the alignment. For 
example, many new members of the immunoglobulin gene 
superfamily have been identified and aligned to other mem- 
bers on the basis of a very limited, but well-defined set of 
conserved features. However, it is certainly no longer pos- 
sible for any investigator to reliably compare a novel 
sequence against a significant portion of the existent data 
base. 
It is possible in theory to generate every possible combi- 
nation of genetic events between two sequences, score each 
one and discover the most similar. In practice, however, this is 
impossible for all but the shortest sequences, as the number of 
combinations increases exponentially with the length of the 
sequences. Some investigators have implemented rule-based 
methods by which, given a reasonable starting alignment 
point, gaps and insertions are included according to a very 
restricted set of possibilities. These methods can be rela- 
tively rapid, but, like manual alignment, are non-rigorous 
methods as they cannot predictably guarantee that the results 
represent the optimal minimum distance, that is, the mini- 
mum evolutionary distance between two sequences or the 
series of events that provides the smallest weighted sum 
required to transform one sequence into the other. 
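To give a sense of the growth referred to above, the number of distinct alignments of two sequences can be counted with a short recurrence. The following Python sketch is illustrative only. It assumes that an alignment is built from three kinds of steps (a match or substitution, a deletion, an insertion) and that alignments differing only in the order of adjacent insertions and deletions are counted separately. The name count_alignments is hypothetical and does not appear in this specification.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_alignments(n: int, m: int) -> int:
    """Count alignments of sequences of lengths n and m (assumed step model)."""
    if n == 0 or m == 0:
        # Only one alignment remains: pair every leftover element with a gap.
        return 1
    return (
        count_alignments(n - 1, m - 1)   # last column is a match/substitution
        + count_alignments(n - 1, m)     # last column is a deletion
        + count_alignments(n, m - 1)     # last column is an insertion
    )


for length in (1, 2, 3, 10, 20):
    print(length, count_alignments(length, length))
```

Under these assumptions two sequences of only ten elements each already admit 8,097,453 alignments, which suggests why exhaustive scoring is impractical for sequences of realistic length.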
When the assumption is that two sequences are generally 
similar along their entire length, the alignment process is 
considered to be global in nature. However, an alignment 
proceeding from this premise can fail to recognize more 
limited regions of similarity between two otherwise unre- 
lated sequences. What is required then is the ability to find 
all regions of local alignment. For example, if an investi- 
gator has a new sequence related to a human beta globin 
gene, such as one from another species, the need is to be able 
to find the local alignment of that more limited sequence to 
some particular portion of the 73 thousand bases of the 
known beta globin locus. The same concerns are manifest in 
the melding problem. By definition, most overlapping 
sequences will only share a limited region of identity, 
illustrating a local alignment problem. 
In 1970, S. B. Needleman and C. D. Wunsch authored a 
paper entitled “A General Method Applicable To The Search 
For Similarities In The Amino Acid Sequence Of Two 
Proteins”, which was published in the Journal of Molecular 
Biology, Volume 48, Page 444. Their paper has had a great 
deal of influence in biological sequence alignment. Its 
particular advantage is that an explicit criterion for optimal- 
ity of alignment is stated and an efficient method of solution 
is given. Insertions, deletions and mismatches were allowed 
in the alignments. The method of Needleman and Wunsch fits 
into a broad class of algorithms, commonly referred to as 
dynamic programming. The general category of dynamic 
programming alignment of two sequences is discussed at 
length in a text entitled “Mathematical Methods for DNA 
Sequences” and particularly Chapter 3 thereof, entitled 
“Sequence Alignments” written by Michael S. Waterman, of 
the University of Southern California, a co-inventor of the 
present invention. 
In 1980, Dr. Waterman, then with the Los Alamos Sci- 
entific Laboratory, collaborated with T. F. Smith, then a 
Professor at Northern Michigan University, in publishing a 
letter entitled “Identification of Common Molecular Subsequences” 
which appeared in the Journal of Molecular 
Biology, Volume 147, pages 195-197, 1981. In this letter, 
Waterman and Smith defined a new algorithm, the intention 
of which was to find a pair of segments, one from each of 
two long sequences, such that there was no other pair of 
segments with greater similarity (or “homology”). The algorithm 
produced a similarity measure which allowed for 
arbitrary length deletions and insertions. 
In a more recent publication, entitled “A New Algorithm 
for Best Subsequence Alignments With Application to 
tRNA-rRNA Comparisons”, in the Journal of Molecular 
Biology, Volume 197, pages 723-728 (1987), Waterman and 
Mark Eggert describe the efficiency of the algorithm of 
Smith and Waterman for identification of maximally similar 
subsequences. The article describes a use of the algorithm 
in which the best alignment is produced first and small 
modifications are then made to the matrix to produce 
subsequent non-intersecting alignments. The 
algorithm is applied to comparisons of tRNA-rRNA 
sequences from Escherichia coli. A statistical analysis 
therein shows results which differ substantially from the 
results of an earlier analysis by others and, furthermore, that 
the algorithm is much simpler and more efficient than those 
previously in use. 
The need for low cost, high speed data sequence comparisons 
cannot be met even with current supercomputers 
because of existing data base size. There is therefore an 
existing need to provide an electronic circuit device for 
carrying out subsequence alignments of molecular 
sequences or global alignment thereof and more specifically 
for a sequence information signal processor designed to 
carry out a dynamic programming algorithm which is both 
effective and efficient in identifying subsequence or global 
alignments of molecular information. 
SUMMARY OF THE INVENTION 
The present invention comprises a sequence information 
signal processing integrated circuit chip designed to perform 
high speed calculation of a dynamic programming algorithm 
based upon Waterman and Smith. The signal processing chip 
of the present invention is designed to be a building block of 
a linear systolic array, the performance of which can be 
increased by connecting additional sequence information 
signal processing chips to the array. The chip provides a high 
speed, low cost linear array processor that can locate highly 
similar segments or contiguous subsequences from any two 
data character streams (sequences) such as different DNA or 
protein sequences. The chip is implemented in a preferred 
embodiment using CMOS VLSI technology to provide the 
equivalent of about 400,000 transistors or 100,000 gates. 
Each chip provides 16 processing elements, operating at a 
12.5 MHz clock frequency. The chip is designed to provide 
16 bit, two’s complement operation for maximum score 
precision of between -32,768 and +32,767. It is designed to 
provide a comparison between sequences as long as 
4,194,304 elements without external software and between 
sequences of unlimited numbers of elements with the aid of 
external software. 
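The figures quoted above can be checked with a few lines of arithmetic. The sketch below is in Python and is illustrative only. The per-chip throughput figure rests on an assumption not stated in this summary, namely that each processing element completes one element comparison per clock period, and the variable names are hypothetical.

```python
# Range of a 16 bit two's complement score register.
BITS = 16
score_min = -(2 ** (BITS - 1))       # lowest representable score
score_max = 2 ** (BITS - 1) - 1      # highest representable score

# Longest comparison supported without external software.
max_elements = 4_194_304
is_power_of_two = max_elements == 2 ** 22

# Assumed peak rate: one comparison per processing element per clock.
PROCESSORS_PER_CHIP = 16
CLOCK_HZ = 12_500_000
comparisons_per_second = PROCESSORS_PER_CHIP * CLOCK_HZ

print(score_min, score_max, is_power_of_two, comparisons_per_second)
```

Under the stated assumption a single chip would perform 200 million element comparisons per second at peak, and each additional chip in the array would add the same amount.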
The sequence information signal processor chip of the 
present invention permits local and global similarity 
searches, that is, subsequence and full sequence alignment. It 
provides user definable gaps/insertion penalties; user definable 
similarity table contents; user definable threshold values 
for score reporting; user definable character set of up to 
128 characters; user definable sequence control characters for streamlined data base processing; variable block size for low or high resolution similarity searches; makes possible unlimited sequence length and numbers of blocks; on-chip block maximum score calculation; and on-chip maximum score buffer to relieve control processor data collection. It provides linear speedup by being configured for cascading more such chips and it provides threshold control with boundary score reset. The chip also provides for programmable data base operation support; block maximum value and location calculation and buffering; user-definable query threshold and preload threshold; and built-in self test and fault bypass. 
It will be seen hereinafter that each of sixteen processor elements on a sequence information signal processing integrated circuit chip of the present invention provides the circuitry to compare the sequence characters of a matrix H, based upon a novel modification of the Smith and Waterman Algorithm for two sequences. Circuitry is also provided for defining the degrees of similarity of two sequences so that different linear deletion functions can be defined for each of the two sequences and different similarity weights can be defined for each character of the query sequence. 
In its preferred embodiment, the chip of the present invention is configured as a 208 pin, CMOS VLSI integrated circuit device. 
OBJECTS OF THE INVENTION 
It is therefore a principal object of the present invention to provide a sequence information signal processing system on a single integrated circuit chip for performing a best subsequence and global alignments algorithm at high speed, at low cost and with optimum parameter control. 
It is an additional object of the present invention to provide an integrated circuit chip having highly integrated VLSI technology for ascertaining the similarity between two segments of two different DNA or protein sequences by performing a best subsequence alignment algorithm. 
It is still an additional object of the present invention to provide an integrated circuit chip having a plurality of processors thereon, each such processor being designed to carry out an algorithm for providing scoring of the relative alignments of sequence segments for such uses as biological information signal processing, speech recognition, cryptology, geological strata analysis, handwriting recognition, large text database searches and other applications which require the comparison of multiple sequences of data. 
BRIEF DESCRIPTION OF THE DRAWINGS 
The aforementioned objects and advantages, as well as additional objects and advantages thereof, will be more fully understood hereinafter as a result of a detailed description of a preferred embodiment when taken in conjunction with the following drawings in which: 
FIG. 1 is a graphical illustration of the matrix elements of the algorithm of the present invention and illustrating a projection technique for reducing the number of real time processors for carrying out the algorithm; 
FIGS. 2-9 illustrate sequential snapshot representations of the algorithm steps of the present invention in a four-by-four exemplary matrix; 
FIG. 10 is a graphical schematic illustration of the manner in which the architecture of a processor of the present invention performs the algorithmic steps for a particular matrix element; 
FIG. 11 is a generalized, functional block diagram of a processor of the present invention; 
FIGS. 12 and 13, when taken together, represent a block diagram of an actual processor of the present invention; 
FIGS. 14 and 15, when taken together, represent a schematic block diagram of the chip circuit of the present invention; 
FIG. 16 is a layout schematic [illegible] the physical [illegible] of the processing chip of the invention; and 
FIGS. 17 and 18 taken together provide a dependence graph mapping for multiple chips representing a total of 34 processors of the present invention. 
DETAILED DESCRIPTION OF A PREFERRED EMBODIMENT 
The information signal processor integrated circuit chip of the present invention is designed to compare two sequences, such as two molecular sequences, and to determine their similarity by ascertaining the best score of any alignment between such sequences. A preferred embodiment of the invention illustrated herein is designed to perform this sequence comparison by carrying out the previously identified Smith and Waterman algorithm. Accordingly, the method and apparatus of the present invention may be best understood by first understanding the algorithm on which it is based and which comprises the following: 
For two sequences $A = a_1 a_2 \ldots a_n$ and $B = b_1 b_2 \ldots b_m$, the best (largest) score from aligning A and B is $S(A,B)$. $H_{i,j}$ is defined as the best score of any alignment ending at $a_i$ and $b_j$, or 0. So, 
$$H_{i,j} = \max\{0;\ S(a_x a_{x+1} \ldots a_i,\ b_y b_{y+1} \ldots b_j);\ 1 \le x \le i,\ 1 \le y \le j\}.$$
The similarity between sequence letters $a$ and $b$ is $s(a,b)$: 
$$s(a,b) > 0 \quad \text{if } a = b$$
$$s(a,b) < 0 \quad \text{for at least some cases of } a \ne b.$$
The similarity algorithm is started with: 
$$H_{i,0} = H_{0,j} = 0, \quad 1 \le i \le n,\ 1 \le j \le m.$$
Then: 
$$H_{i,j} = \max\{0,\ H_{i-1,j-1} + s(a_i, b_j),\ E_{i,j},\ F_{i,j}\}$$
where: 
$$E_{i,j} = \max\{H_{i,j-1} - (u_E + v_E),\ E_{i,j-1} - v_E\}$$
and 
$$F_{i,j} = \max\{H_{i-1,j} - (u_F + v_F),\ F_{i-1,j} - v_F\}$$
From the above, it will be seen that each processor for determining the best $H_{i,j}$ of an alignment ending at $a_i$ and $b_j$ must provide parameters for the calculation of $H_{i+1,j}$; $H_{i,j+1}$; and $H_{i+1,j+1}$. This requirement for generating parameters for subsequent best score calculation processes may be better understood by reference to FIG. 1, which for purposes of example, illustrates a four-by-four matrix of calculations for n=4 and m=4. It will be seen in FIG. 1 that each alignment comparison process is represented by a circle having within it elements of the two sequences, A and B, at which the respective alignments are being scored. It will also be seen in FIG. 1, that parameters are passed either from left to right or from top to bottom or diagonally from upper left to lower right from each alignment process circle to the others in the matrix in order to carry out the algorithm of the 
present invention. Thus for example, it will be seen in FIG. 
1, that the best score for the alignment ending at $a_2$ and $b_2$ 
receives the parameter $H_{1,1}$ from the $a_1,b_1$ comparison 
process; receives $H_{1,2}$ and $F_{1,2}$ from the $a_1,b_2$ comparison 
process; and receives the $H_{2,1}$ and $E_{2,1}$ parameters from the 
$a_2,b_1$ process. All of these parameters are, in accordance 
with the Waterman and Smith algorithm, required to generate 
$H_{2,2}$, which is defined as the best score of the alignment 
of the A and B sequences ending at $a_2$ and $b_2$. 
It will also be seen in FIG. 1, that as a result of the 
computation carried out by the process at $a_2,b_2$, parameters 
$H_{2,2}$, $E_{2,2}$ and $F_{2,2}$, all resulting from the best score 
alignment computation at $a_2,b_2$, are transferred as required to each 
of the three subsequent comparisons $a_2,b_3$; $a_3,b_2$; and $a_3,b_3$. 
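By way of illustration only, the recurrence for H, E and F set out above can be sketched in software as follows. This is a minimal Python sketch and not the circuit of the invention. The function name smith_waterman_affine, the border values of E and F (taken here as negative infinity, since the specification gives border values only for H) and the example scoring function are assumptions.

```python
NEG_INF = float("-inf")


def smith_waterman_affine(a, b, s, u_e, v_e, u_f, v_f):
    """Fill H, E and F for sequences a and b; return (best, location, H).

    s(x, y) is the similarity of two letters; (u_e, v_e) and (u_f, v_f)
    are the linear gap parameters applied along each sequence.
    """
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]        # H[i][0] = H[0][j] = 0
    E = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # assumed border value
    F = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # assumed border value
    best, location = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # E depends on the cell to the left, F on the cell above.
            E[i][j] = max(H[i][j - 1] - (u_e + v_e), E[i][j - 1] - v_e)
            F[i][j] = max(H[i - 1][j] - (u_f + v_f), F[i - 1][j] - v_f)
            H[i][j] = max(
                0,
                H[i - 1][j - 1] + s(a[i - 1], b[j - 1]),
                E[i][j],
                F[i][j],
            )
            if H[i][j] > best:
                best, location = H[i][j], (i, j)
    return best, location, H


def example_score(x, y):
    """Hypothetical similarity table: +2 for a match, -1 otherwise."""
    return 2 if x == y else -1


best, location, _ = smith_waterman_affine(
    "GATTACA", "TTAC", example_score, 1, 1, 1, 1
)
print(best, location)
```

With the hypothetical scores shown, the segment TTAC is found inside GATTACA with a best score of 8 ending at the sixth element of the first sequence and the fourth element of the second.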
Based upon the need for the generation of parameters for 
best score alignment comparisons for previous values of $a_i$ 
and $b_j$ in the sequences of A and B, it will be seen that not 
all of the best score alignment computation processes can be 
carried out simultaneously. Thus for example, best score 
computation for $a_1,b_2$ and $a_2,b_1$ must await the results of the 
computation process for $a_1,b_1$. Similarly, the computation 
process for $a_2,b_2$ must await the results of the computation 
processes for $a_1,b_1$, $a_1,b_2$ and $a_2,b_1$. Consequently, it would 
be entirely inefficient to perform the algorithm depicted in 
FIG. 1 for an exemplary four-by-four matrix with a separate 
processor for each combination of $a_i$ and $b_j$. On the contrary, 
it would be most efficient to use only that number of 
processors which equals the maximum number of processors 
being used at any one time, based upon the sequence 
of parameter generation required, as shown in FIG. 1. 
Accordingly, as seen in the right-most portion of FIG. 1, the 
Smith and Waterman algorithm for a four-by-four matrix, 
that is for $A = a_1,a_2,a_3,a_4$ and $B = b_1,b_2,b_3,b_4$, may be 
carried out by four computation processors with appropriate 
interconnections to assure the transfer of necessary parameters 
from processor to processor. 
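The ordering constraint described above can be made concrete with a short sketch. The Python below groups the cells of the matrix by the earliest step at which each can be computed, on the assumption that a cell may be computed one step after the cells above it, to its left and diagonally above and to the left. The function name wavefront_schedule is hypothetical.

```python
def wavefront_schedule(n, m):
    """Map each step number to the list of (i, j) cells computable at that step."""
    steps = {}
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Cell (i, j) needs (i-1, j), (i, j-1) and (i-1, j-1), all of
            # which fall on earlier anti-diagonals, so its step is i + j - 1.
            steps.setdefault(i + j - 1, []).append((i, j))
    return steps


schedule = wavefront_schedule(4, 4)
for step in sorted(schedule):
    print(step, schedule[step])

total_steps = len(schedule)
max_busy = max(len(cells) for cells in schedule.values())
print(total_steps, max_busy)
```

For the four-by-four example this yields seven steps with at most four cells active in any one step, which is consistent with the use of four processors rather than sixteen.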
In the language of VLSI array processor design, the 
left-most portion of FIG. 1 is referred to as a systolic parallel 
processor array and the right-most portion of FIG. 1 is 
referred to as a signal flow graph. The technique for mapping 
algorithms into systolic parallel processor arrays and the 
technique for projecting such graphs into signal flow graphs 
may be understood best by referring to the text entitled VLSI 
Array Processors by S. Y. Kung, published by the Signal and 
Image Processing Institute of the University of Southern 
California, Copyright 1986. 
The signal flow graph of the right side of FIG. 1 illustrates 
that the systolic processor array graph on the left side 
may be horizontally projected into a signal flow configuration 
which requires only four processor elements to carry out 
the four-by-four matrix algorithm. For the example 
shown in FIG. 1, each such processor on the right-most 
portion of FIG. 1 is permanently associated with an element 
of the A sequence, namely $a_1$, $a_2$, $a_3$ and $a_4$, respectively. On 
the other hand, the B sequence elements, namely $b_1$, $b_2$, $b_3$ 
and $b_4$, respectively, are sequentially applied in a serial 
manner through the elements so that the first alignment best 
score computation occurs at $a_1,b_1$. 
The lines with arrow heads associated with each of the 
elements in the right-most portion of FIG. 1 represent 
parameter values that are either transferred from element to 
element in series or are fed back and used in the same 
element for the next computation. More specifically, FIG. 2 
represents a combined systolic array graph and horizontal 
projection graph at a “snapshot” in time at which the $a_1,b_1$ 
alignment computation is taking place as represented by the 
dashed line through the $a_1,b_1$ processor in the left portion of 
FIG. 2. The $b_1$ signal has been applied to the first processor 
to permit the computation of the score ending at $a_1,b_1$. The 
parameter values emanating from this first sequence computation 
are represented by the arrow head lines emanating 
from the first processor element shown therein at the right-most 
portion of FIG. 2. As seen therein, $E_{1,1}$ and $H_{1,1}$ are 
both fed back into the $a_1$ element for the subsequent computation. 
In addition, the $H_{1,1}$, the $F_{1,1}$ and the $b_1$ signals are 
transferred to the next processor element with which $a_2$ is 
associated. 
The next subsequent snapshot of sequence operation is 
shown in FIG. 3, and as illustrated by the dashed line in the 
left-most portion of FIG. 3, this snapshot finds the top-most 
sequence processor in the right-most portion of FIG. 3 
operating on the $a_1,b_2$ computation while the processor below 
the first operates on the $a_2,b_1$ computation. Each of these first 
two element processors generates appropriate parameter 
signals required by computations in the next snapshot period, 
which is shown in FIG. 4, in which a new value of 
$b_j$ enters the top-most element and the value of $b_j$ previously 
processed by the top-most element is transferred to the next 
element along with the other required parameters for the 
algorithm. 
This process continues, snapshot after snapshot, as represented
by FIGS. 5, 6, 7 and 8. This example illustrates that
the four-by-four matrix of processors for calculating the best
score of any alignment between sequences A and B in the
Smith and Waterman algorithm can be achieved with only
four actual processors operating in an appropriate sequence.
It, of course, requires the appropriate signals representing 
parameters required by the algorithm to be transferred from 
processor to processor as illustrated in snapshot to snapshot 
sequence of FIGS. 2 to 8. 
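The snapshot sequence described above can be sketched in a few lines of Python. This is only an illustrative model, not the circuit itself: it assumes, from the description of FIGS. 2 and 3, that the processor holding a_i works on element b_j during snapshot i + j - 1. The function name systolic_schedule is hypothetical.

```python
def systolic_schedule(n_a, n_b):
    """Map each snapshot (clock period) to the (i, j) cells computed in it.

    Assumption for illustration: processor i (holding a_i) handles b_j
    during snapshot i + j - 1, so cells on one anti-diagonal run together.
    """
    schedule = {}
    for i in range(1, n_a + 1):
        for j in range(1, n_b + 1):
            snapshot = i + j - 1
            schedule.setdefault(snapshot, []).append((i, j))
    return schedule


sched = systolic_schedule(4, 4)
for snapshot in sorted(sched):
    # Each entry (i, j) means "processor i is computing the a_i, b_j cell".
    print(snapshot, sched[snapshot])
```

Under this assumption the four-by-four example needs seven snapshots, and no snapshot ever requires more than four processors to be active at once.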
The signal flow through four processors represented by
the right-most portion, or signal flow graph portion, of FIG.
9 may be used to carry out all the required steps of the
algorithm for a four-by-four matrix in seven snapshots or
clock periods, represented by the seven dashed lines of the
left-most portion, or systolic processor array portion, of FIG.
9. It will be understood, however, that the four-by-four
matrix of processors of FIGS. 2-9 is presented herein by
way of illustration only. It would be highly preferable to
provide many more than four processors in order to be able
to compare sequences having a great deal more than just four
elements. In fact, it will be seen hereinafter that the integrated
circuit (IC) of the present invention provides sixteen
such processors. In addition, the architecture of each such IC
permits the serial interconnection of the sixteen processors
on one chip with the sixteen processors on another chip, so
that a large number of such processors can be tied together
from chip to chip to provide a long sequence of interconnected
processors. In the present invention, up to 512 such
processors can be tied together to form a block and up to
8,192 such blocks, or 4,194,304 such processors, can be
effectively interconnected without external software. The IC
chip of the present invention, when operating in conjunction
with other such chips, can compare sequences as long as
4,194,304 elements without the aid of external software.
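The capacity figures quoted above follow from simple arithmetic, which the short Python check below reproduces. The constant names are hypothetical; the numbers themselves are taken from the text.

```python
PROCESSORS_PER_CHIP = 16     # processors on one IC, per the text
PROCESSORS_PER_BLOCK = 512   # largest block, per the text
MAX_BLOCKS = 8192            # largest number of blocks, per the text

# Number of chips that must be chained to build one full block.
chips_per_block = PROCESSORS_PER_BLOCK // PROCESSORS_PER_CHIP

# Total processors (and hence sequence elements) reachable without software.
max_processors = PROCESSORS_PER_BLOCK * MAX_BLOCKS

print(chips_per_block, max_processors)
```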
The logical operations actually carried out by each element
of the systolic processor array of FIGS. 2-9 may be
better understood by reference to FIG. 10. In FIG. 10 the
computations and parameter generation that occur within the
a_i,b_j processor are shown by way of example. As seen in
FIG. 10, in each such processor there are four subtractors, an
adder and three calculators of maximums. The relevant
equations are:

H_{i,j} = \max\{0,\ H_{i-1,j-1} + s(a_i, b_j),\ E_{i,j},\ F_{i,j}\}
E_{i,j} = \max\{H_{i,j-1} - (U_E + V_E),\ E_{i,j-1} - V_E\}
F_{i,j} = \max\{H_{i-1,j} - (U_F + V_F),\ F_{i-1,j} - V_F\}
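A minimal software sketch of these recurrences, as they are stated in claim 9, is given below. It is not the hardware data path: it fills the whole matrix sequentially, and it assumes that the E and F boundary values start at zero and that the similarity function is a fixed match/mismatch constant. The function names sw_cell and best_local_score, and the default penalty values, are hypothetical.

```python
def sw_cell(h_diag, h_left, e_left, h_up, f_up, sim, u_e, v_e, u_f, v_f):
    """One cell: four subtractions, one addition, three maximums."""
    e = max(h_left - (u_e + v_e), e_left - v_e)   # E_{i,j}
    f = max(h_up - (u_f + v_f), f_up - v_f)       # F_{i,j}
    h = max(0, h_diag + sim, e, f)                # H_{i,j}
    return h, e, f


def best_local_score(a, b, match=2, mismatch=-1, u=1, v=1):
    """Best H_{i,j} over all cells, computed row by row in software."""
    n, m = len(a), len(b)
    # Row 0 and column 0 are the H_{i,0} = H_{0,j} = 0 boundary.
    # Assumption: E and F boundaries are also initialised to zero.
    H = [[0] * (m + 1) for _ in range(n + 1)]
    E = [[0] * (m + 1) for _ in range(n + 1)]
    F = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sim = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j], E[i][j], F[i][j] = sw_cell(
                H[i - 1][j - 1], H[i][j - 1], E[i][j - 1],
                H[i - 1][j], F[i - 1][j], sim, u, v, u, v)
            best = max(best, H[i][j])
    return best


print(best_local_score("ACGT", "ACGT"))
```

In the systolic array the same cell function would be evaluated by one processor per element of A, with the row index fixed and the column index advancing once per clock period.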
In accordance with these equations, the input parameters for
the a_i,b_j processor comprise: H_{i,j-1}, E_{i,j-1}, H_{i-1,j},
F_{i-1,j} and H_{i-1,j-1}. The H_{i,j-1} parameter is applied to a
subtractor to which is also applied the value U_E+V_E, a constant
which may be stored within the processor. The parameter E_{i,j-1}
is applied to a subtractor to which is also applied the constant
value V_E. H_{i-1,j} is applied to a subtractor to which is also
applied the constant U_F+V_F, and the parameter F_{i-1,j} is
applied to a subtractor to which is also provided the value V_F.
The parameter H_{i-1,j-1} is applied to an adder to which is also
supplied a similarity function of a_i and b_j which, as previously
indicated, is a constant greater than zero if a_i is equal
to b_j and a constant less than zero for a_i not equal to b_j.
The outputs of the first two subtractors, that is, the subtractors
to which the parameters H_{i,j-1} and E_{i,j-1} are applied,
respectively, are applied to a maximum value calculator. The
output of this maximum value calculator is, by definition,
E_{i,j}, and the outputs of the other subtractors are applied to a
separate maximum value calculator, the output of which is,
by definition, the parameter F_{i,j}. E_{i,j} and F_{i,j} are applied
to a third maximum value calculator to which is also applied
the output of the adder and a zero signal. The output of this
third maximum calculator is by definition H_{i,j}, which is the
score of the alignment ending at a_i,b_j.
The functional block diagram of a processor of the present
invention for performing the subtractions, additions and
maximum calculator functions illustrated in FIG. 10 is
shown in FIG. 11. As seen in FIG. 11 at the upper left-hand
corner thereof, the input parameters are F_{i-1,j+1}, H_{i-1,j+1} and
the sequence element b_{j+1}. As also seen in FIG. 11, there are
a plurality of registers, namely a register into which the input
parameters are stored for one clock cycle, as well as registers
into which parameters generated within the processor of
FIG. 11 are stored for one clock cycle. The purpose of these
registers, as will be seen hereinafter, is to provide the
necessary delays in signal transfer to the adder, subtractors
and maximum calculators so that the processor carries out its
algorithmic steps in the proper sequence and at the appropriate
time and, furthermore, so that the various algorithm
parameters are available at the appropriate adder, subtractors
and maximum calculators when the addition, subtractions
and maximum calculations actually occur. More specifically,
it will be seen hereinafter that each register of FIG. 11
imparts the appropriate amount of time delay in signal flow
through the processor so that the input of any j parameter
occurs simultaneously with the output of a j-1 parameter.
Thus, for example, the F_{i-1,j+1} parameter is input to a register
10 which, because of its predetermined delay, outputs simultaneously
therewith the parameter F_{i-1,j}. Similarly, the input
to register 12, which is H_{i-1,j+1}, occurs substantially
simultaneously with the output, which is H_{i-1,j}. The outputs of
registers 10 and 12 are applied to subtractors 24 and 26,
respectively, to which are also supplied the constants V and
U+V, respectively. The output of register 12 is also applied
to a register 16, the output of which is H_{i-1,j-1}, which is
applied to an adder 28. Also applied to adder 28 is a signal
indicative of the similarity or lack thereof between a_i and b_j,
referred to previously in the algorithm as the function
s(a_i,b_j). This similarity value is generated by a similarity
table 14, based upon the a_i stored therein and the b_j input
therein from a character register 22, the input to which is
b_{j+1}.
The outputs of subtractors 24 and 26 are both applied to a
maximum calculator 34, the output of which by definition is
F_{i,j}, which is an output signal of the processor of FIG. 11 for
use in the subsequent processor. The output of maximum
calculator 34 is also applied to a maximum calculator 36. Other
inputs to maximum calculator 36 include the output of the
adder 28 and a zero signal. The output of maximum calculator
36 is, by definition, the score value signal H_{i,j}, which
constitutes the principal information desired from the comparison
of two sequences ending at a_i,b_j. The output of
maximum calculator 36 is also applied to register 18, the
output of which is thus H_{i,j-1} which is, in turn, applied to the
subtractor 30. Subtractor 30 also receives input U+V. The
output of subtractor 30 is applied to maximum calculator 38,
the output of which, it will be seen hereinafter, is E_{i,j}.
Parameter E_{i,j} is applied both to the maximum calculator 36
as an input thereto and also to register 20 in the right-most
portion of FIG. 11, as an input to that register. The output of
register 20 is thus E_{i,j-1}, which is applied to subtractor 32, to
which a second input is the constant V. The output of
subtractor 32 is also applied to maximum calculator 38 to
produce the E_{i,j} parameter.
Thus it will be seen that the architecture depicted in FIG.
11 carries out the various computations of a single processor
for comparing two elements of the sequences A and B in
accordance with the Waterman and Smith algorithm, including
providing the necessary time delay registers, subtractors,
adder and maximum calculators to receive the appropriate
parameters and to generate the parameters for the subsequent
processor which, in turn, computes the same type of
information for two sequence characters. It will be understood
that the block diagram of FIG. 11 is of a functional
nature only, to indicate the treatment of parameters that
occurs within one processor. However, the actual implementation
of a processor is illustrated in FIGS. 12 and 13 taken
in combination. Reference will now be made to FIGS. 12
and 13 for a more detailed understanding of the actual
architecture of a processor of the present invention.
The principal differences between the functional block
diagram of FIG. 11 and the actual block diagram of FIGS.
12 and 13 are the following: Subtractors of FIG. 11 are
actually adders with one of the inputs inverted prior to
application to the adder, so that the equivalent operation is
a subtraction. Another distinction is that maximum calculators
only accept two values; consequently, there are more
maximum calculators in the actual implementation of FIGS.
12 and 13 than there are in the functional block diagram of
FIG. 11. Still another distinction between the functional
block diagram and the actual block diagram of the processor
of the present invention is the fact that the latter must
incorporate signals which, in addition to the parameter
signals previously discussed in conjunction with FIG. 11,
must be input and output to permit proper interface from
processor to processor, as well as to facilitate appropriate
timing of operation. In addition, there are at least two
further differences in the actual block diagram of FIGS.
12 and 13 as compared to the functional block diagram of
FIG. 11. Specifically, in the actual block diagram, an additional
maximum calculator is provided which compares the
value of H_{i,j} to a preselected threshold value, permitting the
logic of the actual processor to ignore any scores which fall
below the preset threshold value. In addition, the actual
architecture of the processor of the present invention provides
an [illegible] path through the processors in a
block, as well as an additional maximum calculator in each
processor of a block, for comparing the maximum value of
each processor with a maximum value of every other
processor and propagating a signal which indicates when the
maximum value of this particular processor is in fact the
highest H_{i,j} of all of the processors in the block.
Furthermore, it will be seen that in the block diagram of
the actual processor of the present invention, the similarity
table of the functional block diagram of FIG. 11 comprises
a random access memory in which the data bus of the chip
brings the character data into the similarity RAM, where it
can be either written into the RAM or read out of the RAM,
and b_j is applied to the address terminal of the RAM. In
addition, the similarity RAM is provided with a chip select
signal and a read/write signal as well as a data output which
provides the similarity function output from a look-up table
in the similarity RAM. A table address signal (TA) is also
applied to the address terminal of the similarity RAM
through a multiplexer as a high order five byte address for
the similarity RAM table.
Other signals used in the block diagram of FIGS.
12 and 13 include location input and location output, which
provide an indication of the location of the current maximum
value in the block of processors. Maximum enable
input and maximum enable output signals enable the comparison
of the locally generated maximum value with the
input maximum value in each processor. A pipeline enable
signal is used and its state indicates when the F_{i,j} and H_{i,j}
values are valid data so that these values can be saved.
Synchronous clear signals are also input and output to each
processor. The synchronous clear input resets the H_{i,j} value
so that the maximum value does not exceed the threshold
value, and the synchronous clear output, under certain
conditions, namely when the maximum value generated is
greater than the threshold value, sets the H value of the next
processor to zero. However, it will be understood that except
for the timing control and logic control, the use of threshold
and maximum value transfer from processor to processor,
the functional effect of the actual architecture depicted in
FIGS. 12 and 13 is identical to that explained previously in
conjunction with FIG. 11.
The manner in which the processors are integrated in a
chip of the present invention and the other electronics
associated with each circuit chip of the present invention
will now be discussed in conjunction with FIGS. 14 and 15,
which together comprise a functional block diagram of the
biological information signal processor. Referring therefore
now to FIGS. 14 and 15, it will be seen that each integrated
circuit chip of the present invention comprises sixteen of the
aforementioned processors connected in a serial array configuration
in which a plurality of the aforementioned signals
used within each processor may be transferred from processor
to processor on this particular chip, as well as to
processors on other chips to which the present chip is
connected. As previously indicated, without the aid of
external software, up to 512 processors may be interconnected
to form what is called a block and up to 8,192 such
blocks may be interconnected without external software to
[illegible] one sequence.
[Illegible] of the other [illegible] of a processor of the
present invention are designed to provide the requisite
information, timing and flow input to and generated
by the processors. Thus, for example, in the upper left-hand
corner of FIG. 14, there is shown a plurality of registers
which are loaded from a data bus to provide the U+V and
V constants which are needed in all of the processors and
which represent various values of a linear function representing
scoring penalties for insertions and deletions in the
Smith and Waterman algorithm.
Also provided in the integrated circuit chip of the present
invention is a control logic device which controls the
application of timing and logic signals to the processors, as
well as signals which enable block and sequence counters,
the outputs of which are stored in a maximum memory
device shown in the upper right-hand corner of FIG. 15. The
control logic also controls pause input and output signals
which are used under certain conditions for temporarily
halting the operation of the processors, such as when maximum
memory is filled. The processor of the present invention
also provides means for loading a threshold into the chip
and for utilizing this threshold for enabling storage of
maximums into memory only when the threshold is
exceeded. The threshold registers are shown in the upper
left-hand corner of FIG. 15. There is a preload threshold
register which receives its input from the data bus and a
sequence threshold register which receives its input from the
character port when the chip is to be loaded with a query
sequence threshold. Also provided is an adder which adds
the sequence threshold and the preload threshold to provide
what is referred to as a real threshold against which the
scores of the respective processors are compared in a threshold
comparator. A pair of counters is also provided, namely
a block counter and a sequence counter. These counters
enable the maximum memory to correlate the maximum
score value with the sequence and the user defined block. A
physical representation of the layout of the integrated circuit
chip of the present invention is shown in FIG. 16.
The sixteen processors are arranged in a serial array
terminating in a pipeline register. The device in the upper
left-hand corner of FIG. 16 is a control block which comprises
the control logic, counters and registers previously
described in conjunction with FIGS. 14 and 15.
The interface between integrated circuit chips of the
present invention may be best understood by referring to
FIGS. 17 and 18, which provide an exemplary dependence
graph for 34 processors on three separate chips, the latter
being shown on the right side of FIG. 18. Each chip provides
16 processors and a pipeline register. In the dependence
graph the pipeline registers are shown as rectangles which
merely delay the operation between the last processor of one
chip and the first processor of the next chip.
The dependence graph of FIGS. 17 and 18 is generally a
larger matrix version of the graphs of FIGS. 1-9, except that
it includes a sufficient number of processors to demonstrate
the “block edge” behavior based upon a minimum block size
of 16 elements. This “block edge” behavior is designed to
prevent maximum score buffer overflow by resetting “H”
values in the a,,, b,, processor, the a,,, b,, processor, etc.
Only the “H” values which exceed the previously noted
threshold and which are output in the horizontal and diagonal
directions to the adjacent processors are reset.
This “block edge” resetting procedure constitutes a modi- 
fication to the Smith and Waterman algorithm which is 
unique to the present invention. It is implemented in each 
chip by means of a boundary set zero enable signal (ENZ 
flag) in the control logic of FIG. 14. If this bit is set and the
output H value is greater than the threshold value, then the
SISP chip will reset the internally fed-back E value and the
H_{i-1,j-1} value of the next SISP chip.
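The threshold handling described earlier (a real threshold formed by adding the sequence threshold to the preload threshold, with maximums stored only when that threshold is exceeded) can be sketched as follows. This is a simplified software model under stated assumptions: it does not model the ENZ flag or the block edge reset, and the function names and the (block, sequence, score) tuple layout are hypothetical.

```python
def real_threshold(sequence_threshold, preload_threshold):
    """Sum of the two threshold registers, as done by the on-chip adder."""
    return sequence_threshold + preload_threshold


def store_maxima(candidates, sequence_threshold, preload_threshold):
    """Keep only maxima whose score exceeds the real threshold.

    candidates: iterable of (block_count, sequence_count, score) tuples,
    an assumed layout standing in for the maximum memory entries.
    """
    limit = real_threshold(sequence_threshold, preload_threshold)
    # "Exceeded" is read here as strictly greater than the threshold.
    return [entry for entry in candidates if entry[2] > limit]


kept = store_maxima([(0, 1, 40), (0, 2, 75), (1, 2, 50)], 30, 20)
print(kept)
```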
It will now be understood that what has been disclosed 
herein comprises a sequence information signal processing 
integrated circuit chip designed to perform high speed 
calculation based upon the dynamic programming algorithm 
defined by Waterman and Smith. This chip is designed to be 
a building block of a linear systolic array. The performance 
of the systolic array can be increased by connecting addi- 
tional such chips to the array. Each such chip provides 
sixteen processor elements, a 128 word similarity table in 
each processor element, user definable query threshold and 
preload threshold and block maximum value and location 
calculation and buffering. The chip provides the equivalent 
of about 400,000 transistors or 100,000 gates. All numerical 
data are input in 16-bit two’s complement format and result
in comparison scores ranging from +32,767 to -32,768. A
control logic device in the chip performs the control and 
sequencing of the processor elements. It contains threshold 
logic for sequence and timing, as well as enabling counters 
for sequence and block counts. 
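The 16-bit two’s complement range quoted above, together with the earlier remark that each subtractor is really an adder with one input inverted, can be illustrated with a short Python sketch. It assumes the standard technique of adding the bitwise inverse plus a carry-in of one; the carry-in is an assumption, since the text mentions only the inversion. The function names are hypothetical.

```python
MASK16 = 0xFFFF  # sixteen data bits


def to_signed16(x):
    """Interpret the low 16 bits of x as a two's complement value."""
    x &= MASK16
    return x - 0x10000 if x & 0x8000 else x


def sub_via_adder(a, b):
    """Compute a - b with an adder: a + (inverted b) + carry-in of 1."""
    inverted_b = ~b & MASK16
    return to_signed16((a & MASK16) + inverted_b + 1)


print(to_signed16(0x7FFF), to_signed16(0x8000))  # extremes of the range
print(sub_via_adder(5, 3), sub_via_adder(3, 5))
```

Results outside the representable range wrap around in this model; whether the chip saturates or wraps is not stated in the text.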
Those having ordinary skill in the arts relevant to the 
present invention will now, as a result of applicants’ teach- 
ing herein, perceive various modifications and additions 
which may be made to the invention. By way of example, 
the particular algorithm as well as the architecture designed 
to perform the algorithm processes, may be altered while 
still providing a useful and accurate measure of the homol- 
ogy of two or more data sequences or subsequences thereof. 
Accordingly, all such modifications or additions are deemed 
to be within the scope of the invention which is to be limited 
only by the claims appended hereto. 
We claim: 
1. An electronic circuit for use in comparing two 
sequences of elements to determine which alignment of the 
sequences produces the greatest similarity between the 
sequences, the circuit comprising: 
multiple processors connected in series and individually 
configured to: 
compare an element in one of the sequences with 
successive elements in the other sequence, 
for each pair of elements compared, generate a scoring 
parameter indicating which of a plurality of seg- 
ments ending at those elements produces the greatest 
degree of similarity between the sequences, 
use the scoring parameter to generate another scoring 
parameter for the next pair of elements compared, 
and 
deliver the scoring parameter to another processor in 
the series for use in generating another scoring 
parameter for another pair of elements, 
threshold circuitry configured to determine which proces- 
sor produces the scoring parameter with the highest 
value, and 
alignment circuitry configured to determine which align- 
ment of the sequences is associated with the scoring 
parameter having the highest value. 
2. The electronic circuit of claim 1, wherein each proces- 
sor is configured to deliver the scoring parameter to the next 
processor in the series. 
3. The electronic circuit of claim 1, wherein all of the 
processors, except a final processor in the series, are con- 
figured to deliver the scoring parameter to another processor. 
4. The electronic circuit of claim 1, further comprising 
adjustment circuitry configured to adjust the scoring param- 
eters when two segments differ because one or more dele- 
tions appear in one of the segments. 
5. The electronic circuit of claim 4, wherein the adjustment
circuitry is configured to adjust the scoring parameters
by a value that depends on which of the segments contains 
the deletion. 
6. The electronic circuit of claim 1, further comprising 
adjustment circuitry configured to adjust the scoring param- 
eters when two segments differ because one or more inser- 
tions appear in one of the segments. 
7. The electronic circuit of claim 6, wherein the adjust- 
ment circuitry is configured to adjust the scoring parameters 
by a value that depends on which of the segments contains 
the insertions. 
8. The electronic circuit of claim 1, wherein the proces- 
sors are configured to generate scoring parameters concur- 
rently and each concurrently generated scoring parameter 
represents a comparison of segments ending at different 
elements in the sequences. 
9. The electronic circuit of claim 1, wherein the sequences
are represented as A = a_1, a_2, ..., a_n and B = b_1, b_2, ..., b_m,
and wherein each processor is configured to generate the
scoring parameter associated with any two elements a_i and
b_j, respectively, according to the following equations:
H_{i,j} = \max\{0,\ H_{i-1,j-1} + s(a_i, b_j),\ E_{i,j},\ F_{i,j}\}
where E_{i,j} = \max\{H_{i,j-1} - (U_E + V_E),\ E_{i,j-1} - V_E\}
F_{i,j} = \max\{H_{i-1,j} - (U_F + V_F),\ F_{i-1,j} - V_F\}
H_{i,0} = H_{0,j} = 0
s(a_i, b_j) > 0 if a_i = b_j
s(a_i, b_j) < 0 if a_i \neq b_j
and U_E, V_E, U_F and V_F are selected constants.
10. The electronic circuit of claim 9, wherein each processor
is configured to generate all three values H_{i,j}, E_{i,j} and
F_{i,j} for two elements a_i, b_j.
11. The electronic circuit of claim 9, wherein each processor
is configured to receive the values H_{i-1,j} and F_{i-1,j}
from a preceding processor in the series.
12. The electronic circuit of claim 9, further comprising a
memory device that stores a table from which the values for
s(a_i,b_j) are derived.
13. The electronic circuit of claim 1, wherein each pro- 
cessor stores a single element from one of the sequences and 
compares this element to all other elements in the other 
sequence. 
14. The electronic circuit of claim 13, wherein each 
processor generates a scoring parameter for each compari- 
son of the stored element with another element. 
* * * * *  
