High precision computing with charge domain devices and a pseudo-spectral method therefor by Fijany, Amir et al.
US005680515A 
United States Patent [19]    [11] Patent Number: 5,680,515
Barhen et al. [45] Date of Patent: Oct. 21,1997 
HIGH PRECISION COMPUTING WITH 
CHARGE DOMAIN DEVICES AND A 
PSEUDO-SPECTRAL METHOD THEREFOR 
Inventors: Jacob Barhen, La Crescenta; Nikzad Toomarian, Encino; Amir Fijany, Granada Hills; Michail Zak, Cypress, all of Calif.
Assignee: California Institute of Technology, Pasadena, Calif.
Appl. No.: 534,537 
Filed: Sep. 27, 1995
Related U.S. Application Data 
Division of Ser. No. 49,829, Apr. 19, 1993, Pat. No. 5,491,650.
Int. Cl.6 ...................................................... G06F 15/00
U.S. Cl. ................................................................ 395/24
Field of Search .................................. 395/24, 23, 27, 395/25, 800; 365/238; 382/14; 377/60; 364/606
References Cited
U.S. PATENT DOCUMENTS
4,464,726   8/1984   Chiang ................... 364/606
4,607,344   8/1986   Athale et al. ............ 364/841
4,893,355   1/1990   Tomlinson, Jr. ........... 364/513
5,008,833   4/1991   Agranat et al. ........... 364/513
5,054,040   10/1991  Yariv et al. ............. 377/60
5,089,983   2/1992   Chiang ................... 364/844
5,111,436   5/1992   Subotic et al. ........... 365/238
5,153,923   10/1992  Matsuba et al. ........... 382/14
5,220,642   6/1993   Takahashi et al. ......... 395/25
5,258,934   11/1993  Agranat et al. ........... 364/606
5,274,832   12/1993  Khan ..................... 395/800
5,283,855   2/1994   Motomura et al. .......... 395/27
5,428,710   6/1995   Toomarian et al. ......... 395/23
5,475,794   12/1995  Mashiko .................. 395/24
OTHER PUBLICATIONS
Barhen et al., New directions in massively parallel neurocomputing, Fifth international conference, Neural networks and their applications, pp. 543-554, Nov. 6, 1992.
Sage et al., MNOS/CCD circuits for neural network implementations, IEEE international symposium on circuits and systems, pp. 1207-1209, May 11, 1989.
Schwartz et al., A programmable analog neural network chip, IEEE journal of solid-state circuits, vol. 24, issue 2, pp. 313-319, Apr. 1989.
Coolen et al., Evolution equations for neural networks with arbitrary spatial structure, First IEEE International conference on artificial neural networks, pp. 238-241, Oct. 16, 1989.
Primary Examiner-Robert W. Downs
Assistant Examiner-Sanjiv Shah
Attorney, Agent, or Firm-Michaelson & Wallace
[57] ABSTRACT
The present invention enhances the bit resolution of a CCD/CID MVM processor by storing each bit of each matrix element as a separate CCD charge packet. The bits of each input vector are separately multiplied by each bit of each matrix element in massive parallelism and the resulting products are combined appropriately to synthesize the correct product. In another aspect of the invention, such arrays are employed in a pseudo-spectral method of the invention, in which partial differential equations are solved by expressing each derivative analytically as matrices, and the state function is updated at each computation cycle by multiplying it by the matrices. The matrices are treated as synaptic arrays of a neural network and the state function vector elements are treated as neurons. In a further aspect of the invention, moving target detection is performed by driving the soliton equation with a vector of detector outputs. The neural architecture consists of two synaptic arrays corresponding to the two differential terms of the soliton equation and an adder connected to the output thereof and to the output of the detector array to drive the soliton equation.
12 Claims, 10 Drawing Sheets 
https://ntrs.nasa.gov/search.jsp?R=20080004620 2019-08-30T02:42:08+00:00Z
[Drawing Sheets 1 through 10 are not reproduced here. Labels recoverable from the sheets: Sheet 3, FIG. 3A (PRIOR ART) and FIG. 3B (PRIOR ART); Sheet 4, FIG. 4; Sheet 6, FIG. 6.]
HIGH PRECISION COMPUTING WITH
CHARGE DOMAIN DEVICES AND A 
PSEUDO-SPECTRAL METHOD THEREFOR 
This is a division of application Ser. No. 08/049,829, filed Apr. 19, 1993, now U.S. Pat. No. 5,491,650.
ORIGIN OF THE INVENTION 
The invention described herein was made in the performance of work under a NASA contract, and is subject to the provisions of Public Law 96-517 (35 USC 202) in which the contractor has elected to retain title.
BACKGROUND OF THE INVENTION 
1. Technical Field 
The invention relates to charge coupled device (CCD) and charge injection device (CID) hardware applied to higher precision parallel arithmetic processing devices particularly adapted to perform large numbers of multiply-accumulate operations with massive parallelism and to methods for solving partial differential equations therewith.
2. Background Art 
“Grand Challenges” have been defined as fundamental problems in science or engineering, with broad economic and scientific impact, that could be advanced by applying high performance computing resources. While high speed digital computers with some level of parallelism are enabling the consideration of an ever growing number of practical applications, many very large-scale problems await the development of massively parallel hardware. To carry out the demanding computations involved in grand challenges, it is generally accepted that one needs to pursue a hybrid approach involving both novel algorithms development and revolutionary chip technology.
Many algorithms required for scientific modeling make frequent use of a few well defined, often functionally simple, but computationally very intensive data processing operations. Those operations generally impose a heavy burden on the computational power of a conventional general-purpose computer, and run much more efficiently on special-purpose processors that are specifically tuned to address a single intensive computation task only. A typical example among the important classes of demanding computations are vector and matrix operations such as multiplication of vectors and matrices, solving linear equations, matrix inversion, eigenvalue and eigenvector search, etc. Most of the computationally more complex vector and matrix operations can be reformulated in terms of basic matrix-vector and matrix-matrix multiplications. From a neural network perspective, the product of the synaptic matrix by the vector of neuron potentials is another good example.
An innovative hybrid, analog-digital charge-domain technology, for the massively parallel VLSI implementation of certain large scale matrix-vector operations, has recently been developed, as disclosed in U.S. Pat. No. 5,054,040. It employs arrays of Charge Coupled/Charge Injection Device (CCD/CID) cells holding an analog matrix of charge, which process digital vectors in parallel by means of binary, non-destructive charge transfer operations. FIG. 1 shows a simplified schematic of the CCD/CID array 100. Each cell 110 in the array 100 connects to an input column line 120 and an output row line 130 by means of a column gate 140 and a row gate 150. The gates 140, 150 hold a charge packet 160 in the silicon substrate underneath them that represents an analog matrix element. The matrix charge packets 160 are initially stored under the column gates 140. In the basic matrix-vector multiplication (MVM) mode of operation, for binary input vectors, the matrix charge packets 160 are transferred from under the column gates 140 toward the row gates 150 only if the input bit of the column indicates a binary ‘one’. The charge transferred under the row gates 150 is summed capacitively on each output row line 130, yielding an analog output vector which is the product of the binary input vector with the analog charge matrix. By virtue of the CCD/CID device physics, the charge sensing at the row output lines 130 is of a non-destructive nature, and each matrix charge packet 160 is restored to its original state simply by pushing the charge back under its column gate 140.
FIG. 2 is an illustration of the binary-analog MVM computation cycle for a single row of the CCD/CID array. In FIG. 2A, the matrix charge packet 160 sits under the column gate 140. In FIG. 2B, the row line 130 is reset to a reference voltage. In FIG. 2C, if the column line 120 receives a logic one input bit, the charge packet 160 is transferred underneath the row gate 150. In FIG. 2D, the transferred charge packet 160 is sensed capacitively by a change in voltage on the output row line 130. In FIG. 2E the charge packet 160 is returned under the column gate 140 in preparation for the next cycle. A bit-serial digital-analog MVM can be obtained from a sequence of binary-analog MVM operations, by feeding in successive vector input bits sequentially, and adding the corresponding output contributions after scaling them with the appropriate powers of two. A simple parallel array of divide-by-two circuits at the output accomplishes this task. Further extensions of the basic MVM scheme of FIG. 1 support full digital outputs by parallel A/D conversion at the outputs, and four-quadrant operation by differential circuit techniques.
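The bit-serial scheme just described can be illustrated with a short numerical sketch. It is only a software model under stated assumptions, not the circuit of FIG. 1: the function names are hypothetical, the input bits are assumed to be fed least significant bit first, and the divide-by-two stage is modeled as halving a running accumulator after each binary-analog cycle, so the final value carries an overall factor of 2^-k that is undone at the end.

```python
import numpy as np

def binary_analog_mvm(charge_matrix, input_bits):
    """One binary-analog cycle (hypothetical model).

    A charge packet moves under its row gate only when the input bit of
    its column is 1; each row line then sums the transferred charge.
    """
    transferred = charge_matrix * input_bits[np.newaxis, :]
    return transferred.sum(axis=1)

def bit_serial_mvm(charge_matrix, x, k):
    """Bit-serial digital-analog MVM for k-bit non-negative integers x.

    Assumption: bits are fed LSB first and the accumulator is halved
    after every cycle, mimicking a divide-by-two stage at the output.
    """
    acc = np.zeros(charge_matrix.shape[0])
    for j in range(k):
        bits = (x >> j) & 1
        acc = (acc + binary_analog_mvm(charge_matrix, bits)) / 2.0
    # After k cycles acc equals 2**-k times the full product.
    return acc * (1 << k)

rng = np.random.default_rng(0)
A = rng.random((4, 5))                  # analog "charge" matrix
x = rng.integers(0, 2**8, size=5)       # 8-bit digital input vector
y = bit_serial_mvm(A, x, 8)
```

Feeding the most significant bit first and doubling instead of halving would give the same product; the halving form is chosen here only because the text mentions divide-by-two circuits.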
FIG. 3 illustrates how the matrix charge packets are loaded into the array. In FIG. 3A, appropriate voltages are applied to each gate 170 in each cell 110 of the CCD/CID array 100 so as to configure each cell 110 as a standard 4-phase CCD analog shift register to load all of the cells 110 sequentially. In FIG. 3B, the same gates 170 are used for row and column charge transfer operations as described above with reference to FIG. 2.
The particular choice for this unusual charge-domain technology resulted from several considerations, not limited to issues of speed and parallelism. In comparison to other, more common parallel high-speed technology environments (digital CMOS, etc.), the distinct virtues of the foregoing charge-domain technology for large-scale special-purpose MVM operations are the following:
Very High Density: The compactness of the CCD/CID cell allows the integration of up to $10^6$ cells on a 1 cm² die (in a standard 2 µm CMOS technology), providing single-chip 100 GigaOPS computation power.
Very Low Power Consumption: The charge stored in the matrix is conserved along the computation process because of the non-destructive nature of the CCD/CID operation. Hence, the entire power consumption is localized at the interface of the array, for clocking, I/O and matrix refresh purposes. This enables the processor to operate at power levels in the mW/TeraOPS range.
Scalability: The scalable architecture of the CCD/CID array allows the interfacing of many individual processors in parallel, combining together to form effective processing units of higher dimensionality, still operating at nominal speed.
I/O Flexibility: Although an analog representation is used inside the array to obtain fast parallel computation, the architecture of the processor provides the flexibility of a full digital interface, eliminating the bandwidth and noise problems typical for analog interfacing.
Programming Flexibility: The architecture allows for either optical (parallel, sustained) or electronic (semi-parallel, periodical) loading of the charge matrix. The electronic loading method, described above with reference to FIG. 3, requires interrupts, of duration usually much shorter than the time interval in computation mode, for which the stored charge matrix remains valid before a matrix refresh is needed.
Preliminary results on a 128x128 working prototype, implemented on a single 4 mm x 6 mm die in 2 µm CCD-CMOS technology, indicate a performance of approximately $10^{10}$ 8-bit multiply-accumulate operations per second. Processors with 1024x1024 cells will be realizable in the near future.
NEW DIRECTIONS EMERGING FROM THE PRESENT INVENTION
An innovative approach to parallel design is a key enabling factor for high performance computing. In contrast to conventional approaches, one must develop computational schemes which exploit, from the outset, the concept of massive parallelism. The CCD/CID MVM approach discussed above represents a promising hardware technology, which offers both opportunities and challenges in the design of parallel algorithms. This new technology defies some of the most basic assumptions in the analysis of the complexity of parallel algorithms. To clarify this, a short discussion is now given. In what follows, Nx1 vectors and NxN matrices are assumed. Also, m and a stand for multiplication and addition, respectively. One of the most basic results regarding the time lower bound of parallel computation involves the summation of N scalars.
Theorem 1.
The summation of N scalars can be performed in O(log N)a operations with O(N) processors. The complexity of other operations follows from the above theorem as:
Corollary 1.1. A vector-dot product can be performed in O(log N)a+O(1)m operations with O(N) processors.
Corollary 1.2. A matrix-vector multiplication can be performed in O(log N)a+O(1)m operations with O(N²) processors.
Corollary 1.3. A matrix-matrix multiplication can be performed in O(log N)a+O(1)m operations with O(N³) processors.
The complexity of many other computational problems, including Fourier transforms and linear systems, can also be derived based on the above theorem and its corollaries. Currently, these results provide the framework for the design of efficient parallel algorithms. It should be mentioned that Corollaries 1.2 and 1.3 are more of a theoretical importance than a practical one, since the implementation of parallel algorithms achieving the above time-lower bounds requires rather complex two-dimensional processor interconnections. Interestingly, the result of Theorem 1 has been assumed to be independent of hardware technology. This reflects a basic feature of digital computers, which do not allow the simultaneous summation of N numbers.
The CCD/CID technology of FIGS. 1-3 defies this most basic assumption, since the summation of N numbers is just the summation of N charges, which can be performed simultaneously. This is a fundamental change, which leads to a new set of results regarding the complexity of parallel computation using CCD/CID architectures. In particular:
Theorem 2.
The summation of N scalars can be performed in O(1)a operations with O(N) “processors” (CCD/CID cells). The complexity of matrix-vector operations follows from the above theorem as:
Corollary 2.1. A vector-dot product can be performed in O(1)a+O(1)m operations with O(N) “processors” or CCD/CID cells.
Corollary 2.2. A matrix-vector multiplication can be performed in O(1)a+O(1)m operations with O(N²) “processors” or CCD/CID cells.
Corollary 2.3. A matrix-matrix multiplication can be performed in O(1)a+O(1)m operations with O(N³) “processors” or CCD/CID cells.
The complexity of these operations is independent of the problem size, assuming that the chip size matches the problem. The above results are not only significant from a theoretical point of view, since they improve the known time-lower bounds in these computations, but are also of practical importance. They show that the design and analysis of parallel algorithms based on CCD/CID architectures is drastically different from that for conventional parallel computers. The CCD/CID chip is considered for performing Matrix-Vector Multiplication (MVM), for which it achieves the computation time given in Corollary 2.2. However, in order to output the results, k clock cycles are needed, where k is the required precision, in number of bits. The CCD/CID chip is most efficient for MVM computations wherein the matrix is known beforehand, as well as for series of MVMs performed with the same matrix. Despite this seemingly narrow scope, the CCD/CID processor can be used for many applications, leading to new results. In the following, a few examples are given:
a. The Discrete Fourier Transform (DFT). The DFT can be presented as an MVM for which the time-lower bound can be derived from Corollary 2.2. The DFT represents an application for which the coefficient matrix is known beforehand. However, for both serial and conventional parallel computation, the FFT is always preferred due to its greater efficiency. In particular, for parallel computation, the asymptotic time lower bound of O(log₂N) can be achieved by using O(N) “processors”. In contrast, for the CCD/CID chip the DFT is more efficient for large problems, and can be performed in O(k) steps with O(N²) CCD/CID cells. The computational complexity of O(k) not only improves the previously assumed time-lower bound, but is also independent of the transform size.
b. Linear System Solution. For many applications, the solution of a linear system, Ax=b, is required, wherein the coefficient matrix A is known a priori. This is, for example, the case for the solution of partial differential equations (PDE) using a finite difference method, e.g., parabolic PDE, for which A is a tridiagonal matrix, or Poisson equation, for which A is block tridiagonal. For both cases, given the fact that A is positive definite, its inverse $A^{-1}$ can also be computed a priori. The linear systems solution can then be reduced to a simple MVM. However, this is not efficient for either serial or conventional parallel computation since, unlike A, the matrix $A^{-1}$ is dense. There are more efficient algorithms for both serial and parallel computation which exploit the sparse structure of A. For the CCD/CID chip, on the other hand, the unconventional and seemingly inefficient method is more suitable, since it solves the above linear system in O(k) steps with O(N²) CCD/CID cells.
c. Eigenvector Search. Given a matrix A having N linearly independent eigenvectors and associated eigenvalues, one often needs to find the eigenvector $v^{(1)}$ corresponding to the dominant eigenvalue $\lambda_1$. One available technique is the Power Method which, starting from an arbitrary vector $u^{(0)}$, performs successive MVMs $u^{(k)} = A u^{(k-1)}$. With a suitable scaling, this sequence converges to $c\,v^{(1)}$, for some constant c.
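The three applications listed above share one pattern: a matrix fixed in advance is applied to one or more vectors by plain matrix-vector multiplication. The following NumPy sketch illustrates that pattern in software only. The function names are hypothetical, the dense products stand in for the charge-domain array, and normalizing by the Euclidean norm is just one possible choice of the "suitable scaling" mentioned for the Power Method.

```python
import numpy as np

def dft_matrix(n):
    """Coefficient matrix of the n-point DFT, known before any data arrives."""
    idx = np.arange(n)
    return np.exp(-2j * np.pi * np.outer(idx, idx) / n)

def solve_with_stored_inverse(a_inv, b):
    """Solve Ax = b as a single MVM with an inverse computed a priori."""
    return a_inv @ b

def power_method(a, iterations=200, seed=0):
    """Repeated MVMs u <- A u with renormalization (assumed scaling)."""
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(a.shape[0])
    for _ in range(iterations):
        u = a @ u
        u /= np.linalg.norm(u)
    # Rayleigh quotient of the unit vector u estimates the dominant eigenvalue.
    return u @ a @ u, u

x = np.arange(8.0)
spectrum = dft_matrix(8) @ x                      # DFT as one MVM

a = np.array([[2.0, -1.0, 0.0],
              [-1.0, 2.0, -1.0],
              [0.0, -1.0, 2.0]])                  # tridiagonal, positive definite
a_inv = np.linalg.inv(a)                          # dense, computed once
solution = solve_with_stored_inverse(a_inv, np.array([1.0, 0.0, 1.0]))

dominant, vec = power_method(a)
```

On a conventional computer the dense inverse and the full DFT matrix are the slower choices, which is the contrast the text draws with an array that performs the whole product in a fixed number of steps.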
It is believed that charge-domain processors not only represent a new hardware technology, allowing high performance solutions for a large class of problems, but that they also affect many theoretical issues regarding the design and analysis of massively parallel algorithms. One area for future advances is the need to extend the device's dynamic range (accuracy).
The advances in analog VLSI discussed above provide a strong incentive to develop massively parallel algorithms for selected information processing problems. However, even though the hardware available to date is characterized by very high speed, its accuracy is typical of CCD/CID technology, i.e., of the order of 8 bits. Since each matrix element in FIG. 1 is represented by a charge packet, its accuracy is limited to the resolution of a CCD charge packet. A typical CCD charge packet is on the order of a million electrons, and this size limits its resolution to under 8 bits. Furthermore, voltage resolution (i.e., charge sensing) at the end of each row line is also of the order of 8 to 10 bits. Thus, both architectures and algorithms are needed, which not only are able to fully utilize the hardware's throughput, but also, ultimately, provide highly accurate results. Accordingly, it is an object of the present invention to extend the accuracy or resolution of CCD/CID MVM processors far beyond that currently available. It is a related object of the invention to do so without incurring a proportionate penalty in speed.
SUMMARY OF THE DISCLOSURE
The present invention enhances the bit resolution of a CCD/CID MVM processor by storing each bit of each matrix element as a separate CCD charge packet. The bits of each input vector are separately multiplied by each bit of each matrix element in massive parallelism and the resulting products are combined appropriately to synthesize the correct product. In one embodiment, the CCD/CID MVM array is a single planar chip in which each matrix element occupies a single column of b bits, b being the bit resolution, there being N rows and N columns of such single columns in the array. In another embodiment, the array constitutes a stack of b chips, each chip being a bit-plane and storing a particular significant bit of all elements of the matrix. In this second embodiment, an output chip is connected edge-wise to the bit-plane chips and performs the appropriate arithmetic combination steps. Such arrays are employed in a pseudo-spectral method of the invention, in which partial differential equations are solved by expressing each derivative analytically as matrices, and the state function is updated at each computation cycle by multiplying it by the matrices. The matrices are treated as synaptic arrays of a neural network and the state function vector elements are treated as neurons. Using such a neural architecture, both linear (e.g., heat) and non-linear (e.g., soliton) partial differential equations are solved. Moving target detection is performed by driving the soliton equation with a vector of detector outputs. The neural architecture consists of two synaptic arrays corresponding to the two differential terms of the soliton equation and an adder connected to the output thereof and to the output of the detector array to drive the soliton equation. A resonance phenomenon between the moving waves of the soliton equation and moving targets sensed by the detector array is observed which enhances the amplitude of the state function neuron elements at locations corresponding to the target locations.
In a preferred embodiment, the MVM processor of the invention includes an array of N rows and M columns of CCD matrix cell groups corresponding to a matrix of N rows and M columns of matrix elements, each of the matrix elements representable with b binary bits of precision, each of the matrix cell groups comprising a column of b CCD cells storing b CCD charge packets representing the b binary bits of the corresponding matrix element, the amount of charge in each packet corresponding to one of two predetermined amounts of charge. Each of the CCD cells includes a holding site and a charge sensing site, each charge packet initially residing at the respective holding site. The MVM processor further includes a device for sensing, for each row, an analog signal corresponding to a total amount of charge residing under all charge sensing sites of the CCD cells in the row, and an array of c rows and M columns of CCD vector cells corresponding to a vector of M elements representable with c binary bits of precision, each one of the M columns of CCD vector cells storing a plurality of c charge packets representing the c binary bits of the corresponding vector element, the amount of charge in each packet corresponding to one of two predetermined amounts of charge. A multiplying device operative for each one of the c rows of the CCD vector cells temporarily transfers to the charge sensing site the charge packet in each one of the M columns of matrix cells for which the charge packet in the corresponding one of the M columns and the one row of the CCD vector cells has an amount of charge corresponding to a predetermined binary value.
The preferred embodiment further includes an arithmetic processor operative in synchronism with the multiplying device including a device for receiving, for each row, the sensed signal, whereby to receive Nxb signals in each one of c operations of the multiplying device, a device for converting each of the signals to a corresponding byte of output binary bits, and a device for combining the output binary bits of all the signals in accordance with appropriate powers of two to generate bits representing an N-element vector corresponding to the product of the vector and the matrix.
In one embodiment, the array of matrix CCD cells is distributed among a plurality of b integrated circuits containing sub-arrays of the M columns and N rows of the matrix CCD cells, each of the sub-arrays corresponding to a bit-plane of matrix cells representing bits of the same power of two for all of the matrix elements. A backplane integrated circuit connected edgewise to all of the b integrated circuits includes a device for associating respective rows of the vector CCD elements with respective rows of the matrix CCD elements, whereby the multiplying device operates on all the rows of the vector CCD elements in parallel.
The invention further includes a neural architecture for solving a partial differential equation in a state vector consisting of at least one term in a spatial partial derivative of the state vector and a term in a partial time derivative of the state vector. The neural architecture includes a device for storing an array of matrix elements of a matrix for each partial derivative term of the equation, the matrix relating the partial derivative to the state vector, a device for multiplying all rows of the matrix elements by corresponding elements of the state vector simultaneously to produce terms of a product vector, and a device for combining the product
vector with a previous iteration of the state vector to produce a next iteration of the state vector. The equation can include plural spatial partial derivative terms wherein the neural architecture includes plural arrays of matrix elements of matrices corresponding to the partial spatial derivative terms and a device for combining the product vectors. The partial spatial derivative terms are definable over a spatial grid of points and the matrix elements are obtained in a pseudo-spectral analytical expression relating each matrix element to the distance between a corresponding pair of grid points.
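As a rough illustration of this update scheme, the sketch below builds a derivative matrix for a periodic grid and advances the linear heat equation $u_t = u_{xx}$ by repeated matrix-vector products. Several choices here are assumptions rather than the patent's own expressions: the grid is taken as periodic on $[0, 2\pi)$, the matrix is generated numerically from the Fourier symbol $(ik)^p$ instead of from a closed-form formula, the time step is explicit Euler, and all function names are hypothetical. The resulting matrix is circulant, so each element depends only on the separation between its pair of grid points.

```python
import numpy as np

def spectral_derivative_matrix(n, order):
    """Matrix of the order-th derivative on an n-point periodic grid.

    Assumption: Fourier pseudo-spectral differentiation on [0, 2*pi).
    """
    k = np.fft.fftfreq(n, d=1.0 / n)           # integer wavenumbers
    symbol = (1j * k) ** order
    if order % 2 == 1 and n % 2 == 0:
        symbol[n // 2] = 0.0                   # drop the unmatched Nyquist mode
    # Column j is the derivative of the j-th unit vector.
    d = np.fft.ifft(symbol[:, None] * np.fft.fft(np.eye(n), axis=0), axis=0)
    return d.real

def heat_step(u, d2, dt):
    """Next state = previous state + dt * (derivative matrix @ state)."""
    return u + dt * (d2 @ u)

n = 32
grid = 2.0 * np.pi * np.arange(n) / n
d2 = spectral_derivative_matrix(n, 2)

u = np.sin(grid)                               # initial state vector
for _ in range(100):
    u = heat_step(u, d2, 1e-3)                 # integrate to t = 0.1
```

The time step is kept small because an explicit update of this kind is only stable when the product of the step and the largest squared wavenumber stays below 2.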
In one embodiment of the neural architecture, moving targets are detected in the midst of noise in images sensed by an array of sensors, by the addition to the sum of product vectors of a vector derived from the outputs of the sensors, to produce a next iteration of the state vector. Preferably, the stored matrices implement a soliton equation having first and third order partial spatial derivatives of the state vector, so that the neural architecture includes a device for storing respective arrays corresponding to the first and third order derivatives. This produces two product vectors corresponding to the first and third order derivatives, respectively. This embodiment further includes a device for multiplying the product vector of the first order derivative by the state vector.
BRIEF DESCRIPTION OF THE DRAWINGS 
FIG. 1 is a simplified diagram of a CCD/CID MVM processor array of the prior art.
FIG. 2 includes FIGS. 2A, 2B and 2C-E illustrating a sequence of matrix-vector multiply operations in a unit cell of the array of FIG. 1.
FIG. 3 includes FIGS. 3A and 3B illustrating, respectively, electronic loading of the matrix elements and arithmetic operations in a unit cell of the array of FIG. 1.
FIG. 4 is a plan view of a CCD/CID MVM processor of the present invention.
FIG. 5 is a schematic diagram of a typical architecture for a higher precision arithmetic processor employed in combination with the CCD/CID processor of FIG. 4.
FIG. 6 is a diagram of a three-dimensional embodiment of the CCD/CID MVM processor of the present invention.
FIG. 7 is a schematic diagram of a general neural architecture for implementing the pseudo-spectral method of the present invention applied to linear partial differential equations.
FIG. 8 is a schematic diagram of a neural architecture for performing the pseudo-spectral method applied to a non-linear partial differential equation and for performing target detection therewith.
FIG. 9 includes data plots obtained in a computer simulation of the neural architecture of FIG. 8 in which the sensors only sense noise, of which FIG. 9A is a Fourier spectrum of the sensor input, FIG. 9B is a spatial plot of the sensor inputs, FIG. 9C is the network output and FIG. 9D is the filtered network output in which noise has been removed.
FIG. 10 includes data plots obtained in a computer simulation of the neural architecture of FIG. 8 in which the sensors sense a moving target with a per-pulse signal-to-noise ratio of 4 dB, of which FIG. 10A is a Fourier spectrum of the sensor input, FIG. 10B is a spatial plot of the sensor inputs, FIG. 10C is the network output and FIG. 10D is the filtered network output in which noise has been removed.
FIG. 11 includes data plots obtained in a computer simulation of the neural architecture of FIG. 8 in which the sensors sense a moving target with a per-pulse signal-to-noise ratio of -10 dB, of which FIG. 11A is a Fourier spectrum of the sensor input, FIG. 11B is a spatial plot of the sensor inputs, FIG. 11C is the network output and FIG. 11D is the filtered network output in which noise has been removed.
DETAILED DESCRIPTION OF THE 
PREFERRED EMBODIMENTS 
lo ENHANCED BIT-RESOLUTION CCD MVM PROCES- 
SOR 
In order to achieve high precision in a CCD/CID MVM
processor, the present invention provides an advanced
architecture which is illustrated schematically in FIGS. 4 and 5.
In the following, the invention is described in its basic form.
However, different architectures can be derived from this
basic form which are not discussed here. A key element of
the CCD/CID MVM array 200 of FIGS. 4 and 5 is the
encoding in each CCD/CID processor cell 210 of one bit of
the binary representation of each matrix element. As shown
in FIG. 4, if a matrix A is to be specified with b bits of
precision, each element A_ij of the matrix occupies a single
column 205 of b cells, namely cells corresponding to
[b(i-1)+l, j], for l=1, . . . , b, labelled in FIG. 4 as
A_ij^0, . . . , A_ij^(b-1).
Thus, the CCD/CID MVM processor array 200 of FIG. 4 is
an array of N columns and N rows of matrix elements, each
matrix element itself being a column 205 of b CCD/CID
cells 210. There are, therefore, a total of NxNxb cells 210
in the array 200. A vector of N elements representable with
a precision of b binary bits is stored in an array 230 of
CCD/CID cells 235, the array 230 being organized in N
columns 240 of b CCD/CID cells, each cell 235 in a given
column 240 storing the corresponding one of the b binary
bits of the corresponding vector element. Each CCD/CID
cell 210 in FIG. 4 is of the type described above with
reference to FIGS. 1-3 and is operated in the same manner
with N column input lines (coupled to a successive row of
N vector CCD/CID cells 235 in the vector CCD/CID array
230) and row output lines of the type illustrated in FIG. 1.
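The one-bit-per-cell encoding can be illustrated with a short numerical sketch. It is not the patented circuit: it assumes non-negative integer matrix and vector elements, and the function names `bit_planes` and `bit_serial_mvm` are hypothetical. It only shows that binary partial products, weighted by the appropriate powers of two, reproduce the ordinary matrix-vector product.

```python
import numpy as np

def bit_planes(x, bits):
    """Split non-negative integers into `bits` binary planes, LSB first."""
    x = np.asarray(x, dtype=np.int64)
    return [(x >> l) & 1 for l in range(bits)]

def bit_serial_mvm(A, u, bits):
    """Matrix-vector product assembled from one-bit partial products.

    Assumption: every entry of A and u fits in `bits` unsigned bits.
    """
    A_planes = bit_planes(A, bits)   # one binary matrix per matrix bit
    u_planes = bit_planes(u, bits)   # one binary vector per vector bit
    result = np.zeros(np.asarray(A).shape[0], dtype=np.int64)
    for k, u_k in enumerate(u_planes):        # one vector bit row per cycle
        for l, A_l in enumerate(A_planes):    # one matrix bit per cell
            partial = A_l @ u_k               # count of coinciding 1 bits
            result += partial << (k + l)      # weight by 2**(k + l)
    return result
```

Each `partial` value plays the role of a quantity sensed on a row of binary cells; in the sketch the bit offsets for the vector bit k and the matrix bit l are applied in a single shift for brevity.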
Matrix-vector multiplication will now be described with
reference to the example of a conventional matrix-vector
product obtained by adding the products of the elements in
the vector and each row of the matrix. Of course, the present
invention is not confined to a particular type of matrix-vector
product, and can be used to compute other types of
matrix-vector products. One example of another well-known type
of matrix-vector product is that obtained by adding the
products of a respective vector element multiplied by all
elements in a respective column of the matrix.
Computation proceeds as follows. At clock cycle one, the
matrix A, in its binary representation, is multiplied by the
binary vector labelled u_1^0, . . . , u_N^0, which contains the least
significant bits of u_1, . . . , u_N (i.e., by the top row of charge
packets in the array 230 of vector CCD/CID cells 235). By
virtue of the charge transfer mechanism, analog voltages
labelled (0)V_1, . . . , (0)V_bN are sensed at the output of each
one of the bxN rows of the matrix array 200. To keep track
of the origin of this contribution to the result, a left
superscript, (0)V, is utilized in the notation employed herein.
In the present example, all of the products computed in
the array 200 are synthesized together in accordance with
corresponding powers of two to form the N elements of the
transformed (output) vector. This is accomplished in an
arithmetic processor illustrated in FIG. 5. The arithmetic
processor is dedicated to the computation of the particular
type of matrix-vector product computed in the present
example. Processors other than that illustrated in FIG. 5
could be employed in making these same computations from
the products produced by the array 200, and the present
invention is not confined to the type of arithmetic processor
illustrated in FIG. 5. Of course, in order to compute
matrix-vector products different from the type computed in the
present example, an arithmetic processor different from that
illustrated in FIG. 5 would be employed in combination with
the array 200 of FIG. 4.
At clock cycle two, the voltages sensed at each of the N
rows are fed into respective pipelined A/D converters 300,
each converter 300 having b_d bits of precision
(where b_d=log2 N, and N denotes the number of columns of
A). Simultaneously, A is multiplied by u_1^1, . . . , u_N^1
(i.e., by the second row of charge packets in the array 230
of vector CCD/CID cells), yielding (1)V.
At clock cycle three, the representations of (0)V_1, . . . ,
(0)V_bN are fed into a register 305 with
an appropriate bit-offset toward the most significant bit
position. Specifically, the result or element (k)V_i^l obtained
during clock cycle k is offset in the appropriate register 305
by k bits toward the most significant bit position (toward the
leftmost bit) at clock cycle k. This offset is controlled by an
offset counter 310 (which is incremented by one at each new
clock cycle). Next, the voltages (1)V are fed into the A/D
converters 300, and the vector u_1^2, . . . , u_N^2 multiplies A to
yield (2)V.
Elements (k)V_i^l with the same row index i are then fed into
cascaded sum circuits 315 shown in FIG. 5, in parallel for all
i, and pipelined over k. The cascaded sum circuits 315 are
connected in a tree of the type
well-known in the art. The elements v_i of the product
are obtained after log2 b cycles, and the overall latency
is b+log2 b+3. If one needs to multiply a set of vectors u by
the same matrix A, this pipelined architecture will output a
new result every b clock cycles. Clearly, a far higher
precision has been achieved than was previously available.
An added benefit is that the refresh time overhead is
significantly reduced, since in the matrix representation of FIG.
4 each electron charge packet only refers to a binary
quantity.
While FIG. 4 indicates that the NxbxN cells 210 of the
array 200 are formed on a single planar silicon chip, the
array 200 can be implemented using plural chips
connected together, using Z-plane technology, as illustrated
in FIG. 6. In such an embodiment, it is preferable to have
each chip 400 assigned to a particular bit plane, in which
each chip is an N-by-N CCD/CID array of CCD/CID cells
210 of the type illustrated in FIG. 1, which stores NxN bits
or charge packets which in the present invention, however,
represent binary values only. The first chip 400-0 stores the
least significant bits of all matrix elements of the NxN
matrix, the second chip 400-1 stores the next least
significant bits of all matrix elements, and so forth, and the last
chip 400-(b-1) storing the most significant bits of all matrix
elements. A backplane chip 405 implements the array 230 of
N columns and b rows of vector CCD cells 235 of FIG. 4.
The backplane chip is connected edge-wise to the column
input lines of all of the bit-plane chips 400. This permits
every one of the b rows of vector CCD/CID cells 235 to be
input to column input lines of respective ones of the b
bit-plane chips 400, greatly enhancing performance and
reducing the latency of a given matrix-vector multiplication
operation. The arithmetic processor of FIG. 5 could also be
implemented on the backplane chip 405. The architecture of
the arithmetic processor would depend upon the type of
matrix-vector product to be computed. The Z-plane
embodiment of FIG. 6 permits all b bits of every matrix element to
be multiplied by a given vector element, and therefore is
potentially much faster than the embodiment of FIGS. 4 and
5.
PSEUDO-SPECTRAL METHOD OVERVIEW
The pseudospectral neural computing method of the
present invention is now described, as it applies generally to
a broad class of partial differential equations (PDE's),
involving partial derivatives of various orders d of a state
variable. For the sake of simplicity, the present discussion is
limited to one "spatial" dimension x. Extension to
multi-dimensional cases is straightforward. Thus, consider the
state variable u(x, t), which is periodic over some interval
which can, without loss of generality, be taken as an interval
between 0 and L. The first step in the method is to transform
the state variable u(x, t) into Fourier space with respect to x.
The main advantage of this operation is that the derivatives
with respect to x then become algebraic in the transformed
variable. Before proceeding, the spatial interval [0, L] is
normalized to the interval [0, 2π]. Certain classes of PDE's
can then be re-stated in terms of the new state variable v(x,
t) as
$$v_t = \sum_d a_d\, v_{dx} \tag{1}$$
where d is the order of the derivative of v with respect to x,
v_dx is the partial d-th derivative of v with respect to x and v_t
is the first derivative of v with respect to time. In
order to numerically solve Eq. (1), the interval [0, 2π] is
discretized by 2N equidistant points, with spacing Δx=π/N.
The function v(x, t), which is defined only at these points, is
approximated by v(x_n, t), where x_n=nΔx and n=0, 1, . . . ,
2N-1. The function v(x_n, t) is now transformed to discrete
Fourier space by
$$\hat{v}_k(t) = F_k[v] = \frac{1}{\sqrt{2N}} \sum_{n=0}^{2N-1} v(x_n,t)\, e^{-ikx_n} \tag{2}$$
where k takes the values k=0, ±1, . . . , ±N. The inversion
formula is
$$v(x_n,t) = F_n^{-1}[\hat{v}] = \frac{1}{\sqrt{2N}} \sum_k \hat{v}_k(t)\, e^{ikx_n} \tag{3}$$
This enables an efficient calculation of the derivatives of v
with respect to x. In particular the d-th partial derivative of
v(x, t) is
$$v_{dx} = F^{-1}\{(ik)^d * F[v]\} \tag{4}$$
where d is the order of the derivative and i=√-1. In the above
expression, the symbol * denotes the Schur-Hadamard
product of two Nx1 matrices. It has been shown that such a
representation of derivatives of periodic functions can be
related to central difference approximations of infinite order.
Combining Eq. (4) with a first order Euler approximation to
the temporal derivative, yields the pseudospectral scheme
for solving the PDE:
$$v(x_n, t+\Delta t) = v(x_n, t) + \sum_d a_d\, \Delta t\, F^{-1}\{(ik)^d * F[v(t)]\} \tag{5}$$
where Δt denotes the time step size. Simple stability analysis
reveals the constraint
[Eq. (6): constraint on the time step; expression not legible in the source text]
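Eqs. (4) and (5) can be sketched with a standard FFT. This is an illustrative sketch only: it uses NumPy's transform convention (unnormalized forward transform, 1/M on the inverse) rather than the normalization of Eqs. (2) and (3), which does not change the derivative, and the function names `spectral_derivative` and `euler_step` are hypothetical.

```python
import numpy as np

def spectral_derivative(v, d):
    """d-th x-derivative of samples of a 2*pi-periodic function, per Eq. (4)."""
    M = v.size                                # M = 2N grid points
    k = np.fft.fftfreq(M, d=1.0 / M)          # integer wavenumbers
    return np.real(np.fft.ifft((1j * k) ** d * np.fft.fft(v)))

def euler_step(v, coeffs, dt):
    """One first-order Euler step of Eq. (5).

    `coeffs` maps each derivative order d to its coefficient a_d.
    """
    rhs = sum(a_d * spectral_derivative(v, d) for d, a_d in coeffs.items())
    return v + dt * rhs
```

For example, with v = sin(x) sampled on 32 points, `spectral_derivative(v, 1)` agrees with cos(x) to rounding error, since the function is fully resolved by the grid.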
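In order to map the proposed numerical solution of the
PDE onto a MVM architecture, Eq. (4) is expanded by
substituting the appropriate terms from Eqs. (2) and (3), as
follows:
$$v_{dx}(x_n,t) = \frac{1}{2N} \sum_k (ik)^d \left[ \sum_{m=0}^{2N-1} v(x_m,t)\, e^{-ikx_m} \right] e^{ikx_n} \tag{7}$$
By rearranging the terms:
$$v_{dx}(x_n,t) = \sum_{m=0}^{2N-1} v(x_m,t)\, \frac{1}{2N} \sum_k (ik)^d\, e^{ik(x_n-x_m)} \tag{8}$$
The last term in Eq. (8) can be represented by a constant
matrix which depends solely on the distance between two
spatial grid points, i.e.,
$$T^d_{nm} = \frac{1}{2N} \sum_k (ik)^d\, e^{ik(x_n-x_m)} \tag{9}$$
Since the function v(x_n, t) is real, its derivatives should be
real as well. Thus, only the real part of the matrix in Eq. (9)
is needed for the computation of v_dx(x_n, t). Hence,
$$T^d_{nm} = \mathrm{Re}\left[ \frac{1}{2N} \sum_k (ik)^d\, e^{ik(x_n-x_m)} \right] \tag{10}$$
which is readily evaluated based upon the order d of each
derivative in the PDE.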
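The matrix of Eq. (10) can be built directly from its definition, as in the following sketch. One assumption is made loudly: the sum over k runs over k = -N+1, . . . , N so that each discrete mode of the 2N-point grid is counted once, rather than over both +N and -N. The function name `synaptic_matrix` is hypothetical.

```python
import numpy as np

def synaptic_matrix(N, d):
    """Real 2N-by-2N matrix T^d with entries depending only on x_n - x_m."""
    M = 2 * N
    x = np.pi * np.arange(M) / N                  # grid points x_n = n*pi/N
    k = np.arange(-N + 1, N + 1)                  # assumed wavenumber set
    diff = x[:, None] - x[None, :]                # x_n - x_m
    phase = np.exp(1j * k[None, None, :] * diff[:, :, None])
    return np.real(((1j * k) ** d * phase).sum(axis=2)) / M
```

Multiplying this matrix by the sampled state vector gives the d-th spatial derivative at every grid point in one matrix-vector product, which is the operation the MVM array is intended to perform.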
The neural architecture to be employed in solving the
PDE can now be specified using the foregoing results. Let
each grid point represent a neuron with activation function
v_n(t) equal to v(x_n, t). Combining Eqs. 1 and 9, the network
dynamics is readily obtained:
$$\frac{dv_n}{dt} = \sum_d a_d \sum_m T^d_{nm}\, v_m \tag{11}$$
Thus, the pseudo-spectral architecture consists of 2N
neurons, the dynamics of which is governed by a system of
coupled linear differential equations, i.e., Eq. (11). The
synaptic array, T, fully interconnects all neurons, and its
elements are calculated using Eq. (10).
For illustrative purposes only, a first order Euler scheme
is considered for the temporal dependence. The resulting
neural dynamics is then ready for implementation on the
CCD/CID architecture:
$$v_n(t+\Delta t) = v_n(t) + \Delta t \sum_d a_d \sum_m T^d_{nm}\, v_m(t) \tag{12}$$
An architecture for implementing Equation 12 is
illustrated in FIG. 7. In FIG. 7, each array 450 labelled T^1, T^2,
. . . , etc. is preferably a CCD/CID MVM processor of the type
illustrated in FIGS. 4 and 5 and stores the matrix elements
defined by Equation 10. Each array 450 receives the current
neuron vector v_n(t) from a common register 460 labelled
v_n(t). Multipliers 470 multiply the outputs of the arrays 450
(T^1, T^2, . . . etc.) by the appropriate coefficients from
Equation 12 and an adder 480 sums the products together. A
delay 485 implements the first term in Equation 12 and an
adder 490 provides the final result v_n(t+Δt) in accordance
with Equation 12. A second delay 495 provides the neuron
vector v_n(t) for the next computational cycle.
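The dataflow of FIG. 7 can be mimicked in software as a rough sketch. The mapping of code lines to the reference numerals is an interpretation, the names `derivative_matrix` and `network_step` are hypothetical, and the matrices are obtained here by applying an FFT derivative to the columns of the identity rather than by summing Eq. (10) term by term.

```python
import numpy as np

def derivative_matrix(M, d):
    """Real M-by-M matrix applying the d-th spectral derivative (cf. Eq. 10)."""
    k = np.fft.fftfreq(M, d=1.0 / M)                      # integer wavenumbers
    spectrum = np.fft.fft(np.eye(M), axis=0)              # transform each column
    columns = np.fft.ifft(((1j * k) ** d)[:, None] * spectrum, axis=0)
    return np.real(columns)

def network_step(v, matrices, coeffs, dt):
    """One update of Equation 12 for the stored neuron vector v."""
    products = [T @ v for T in matrices]                  # arrays 450
    weighted = [dt * a * p for a, p in zip(coeffs, products)]  # multipliers 470
    total = np.sum(weighted, axis=0)                      # adder 480
    return v + total                                      # delay 485, adder 490
```

Calling `network_step` repeatedly, feeding each result back in as `v`, corresponds to the role of the second delay in supplying the neuron vector for the next cycle.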
APPLICATION TO THE HEAT EQUATION
In order to provide a concrete framework for the proposed
formalism, we focus our attention on the one-dimensional
heat equation (Equation 13 below). This linear partial
differential equation has the advantage of both exhibiting
sufficient computational complexity and possessing
analytical solutions. This enables a rigorous benchmark of the
proposed neural algorithm. The heat equation is:
$$u_t = \alpha\, u_{xx} \tag{13}$$
where u denotes the temperature, and α is the thermal
diffusivity. Partial derivatives of u with respect to time and
space are denoted by u_t and u_x respectively. The following
working example concerns an infinite slab, of thickness L,
with an initial temperature distribution at time t=0 of
$$u(x, 0) = 1 - \cos(2\pi x / L), \qquad 0 \le x \le L$$
It is assumed that the slab is insulated at both ends, so that
no heat flows through the sides. Thus, the following periodic
boundary conditions apply:
$$[u_x]_{x=0} = 0, \qquad t \ge 0 \tag{14}$$
$$[u_x]_{x=L} = 0, \qquad t \ge 0 \tag{15}$$
PSEUDOSPECTRAL SOLUTION TO THE HEAT EQUATION
Application of the pseudospectral numerical method of
the invention to the heat equation will now be
described. The first step is to transform u(x, t) into Fourier
space with respect to x. The main advantage of this operation
is that the derivatives with respect to x then become
algebraic in the transformed variable. Before proceeding, the
spatial period is normalized to the interval [0, 2π]. The
scaled heat equation can then be expressed in terms of the
new state variable v(x, t) as
$$v_t = \alpha s^2\, v_{xx} \tag{16}$$
where s is defined as 2π/L. In order to numerically solve Eq.
(16), the interval [0, 2π] is discretized by 2N equidistant
points, with spacing Δx=π/N. The function v(x, t), which is
defined only at these points, is approximated by v(x_n, t),
where x_n=nΔx and n=0, 1, . . . , 2N-1. The function v(x_n, t)
is now transformed to discrete Fourier space by
$$\hat{v}_k(t) = F_k[v] = \frac{1}{\sqrt{2N}} \sum_{n=0}^{2N-1} v(x_n,t)\, e^{-ikx_n} \tag{17}$$
where k takes the values k=0, ±1, . . . , ±N. The inversion
formula is
$$v(x_n,t) = F_n^{-1}[\hat{v}] = \frac{1}{\sqrt{2N}} \sum_k \hat{v}_k(t)\, e^{ikx_n} \tag{18}$$
This enables an efficient calculation of the derivatives of v
with respect to x. In particular
$$v_{xx} = F^{-1}\{-k^2 * F[v]\} \tag{19}$$
In the above expression, the symbol * denotes the
Schur-Hadamard product of two Nx1 matrices. Combining Eq.
(19) with a first order Euler approximation to the temporal
derivative, yields the pseudospectral scheme for solving the
heat equation:
$$v(x_n, t+\Delta t) = v(x_n, t) - \alpha s^2 \Delta t\, F^{-1}\{k^2 * F[v(t)]\} \tag{20}$$
where Δt denotes the time step size. The following constraint
of Equation (6) applies:
[Eq. (21): constraint on the time step; expression not legible in the source text]
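The benchmark described above can be sketched numerically. The code below applies the scheme of Eq. (20) to the scaled initial profile 1 - cos(x) and compares it with the analytical solution of Equation 13 for that profile, 1 - exp(-α s² t) cos(x) in the scaled coordinate. The function names and parameter values are hypothetical. The time step in the usage note is chosen from the usual forward-Euler bound α s² k_max² Δt ≤ 2, which is an assumption of this sketch and is not taken from Eq. (21).

```python
import numpy as np

def heat_pseudospectral(N, alpha, L, dt, steps):
    """Integrate the scaled heat equation with Eq. (20) on 2N grid points."""
    M = 2 * N
    x = np.pi * np.arange(M) / N              # scaled grid on [0, 2*pi)
    s = 2.0 * np.pi / L                       # scaling factor of Eq. (16)
    k = np.fft.fftfreq(M, d=1.0 / M)          # integer wavenumbers
    v = 1.0 - np.cos(x)                       # scaled initial temperature
    for _ in range(steps):
        v_xx = np.real(np.fft.ifft(-(k ** 2) * np.fft.fft(v)))   # Eq. (19)
        v = v + alpha * s ** 2 * dt * v_xx                        # Eq. (20)
    return x, v

def heat_exact(x, alpha, L, t):
    """Analytical solution for the initial profile 1 - cos(x) (scaled x)."""
    s = 2.0 * np.pi / L
    return 1.0 - np.exp(-alpha * s ** 2 * t) * np.cos(x)
```

With N=16, alpha=1, L=2π, dt=0.001 and 500 steps, the remaining discrepancy against `heat_exact` at t=0.5 comes from the first-order time stepping rather than from the spatial derivative, which is exact for this single-mode profile.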
NEURAL NETWORK ARCHITECTURE FOR THE HEAT EQUATION
In order to map the proposed numerical solution of the 
heat equation onto the CCD/CID architecture of FIGS. 1 or 
4, the spatial derivatives are evaluated using the Fourier 
transform formula of Eq. (19), which is expanded by sub- 
stituting the terms from Eqs. (17) and (18) to obtain 
$$v_{xx}(x_n,t) = \frac{1}{2N} \sum_k (-k^2) \left[ \sum_{m=0}^{2N-1} v(x_m,t)\, e^{-ikx_m} \right] e^{ikx_n} \tag{22}$$
By rearranging the terms:
$$v_{xx}(x_n,t) = \sum_m v(x_m,t)\, \frac{1}{2N} \sum_k (-k^2)\, e^{ik(x_n-x_m)} \tag{23}$$
The last term in Eq. (23) can be represented by a constant
matrix which depends solely on the distance between two
spatial grid points, i.e.,
$$T_{nm} = -\frac{1}{2N} \sum_k k^2\, e^{ik(x_n-x_m)} \tag{24}$$
Since the function v(x_n, t) is real, its derivatives should be
real as well. Thus, only the real part of the matrix in Eq. (24)
is needed for the computation of v_xx(x_n, t). Hence,
$$T_{nm} = \mathrm{Re}\left[ -\frac{1}{2N} \sum_k k^2\, e^{ik(x_n-x_m)} \right] \tag{25}$$
or, simply
$$T_{nm} = -\frac{1}{2N} \sum_k k^2 \cos[k(x_n-x_m)] \tag{26}$$
The neural architecture can now be specified for the heat
equation. Let each grid point represent a neuron with
activation function v_n(t) equal to v(x_n, t). Combining Eqs. 16
and 26, the network dynamics is readily obtained:
$$\frac{dv_n}{dt} = \alpha s^2 \sum_m T_{nm}\, v_m \tag{27}$$
Thus, the architecture consists of 2N neurons, the dynamics
of which is governed by a system of coupled linear
differential equations of Eq. (27). The synaptic array, T, fully
interconnects all neurons, and its elements are calculated
using Eq. (26).
For illustrative purposes only, a first order Euler scheme
is considered for the temporal dependence. The resulting
neural dynamics is then ready for implementation on the
CCD/CID architecture:
$$v_n(t+\Delta t) = v_n(t) + \alpha s^2 \Delta t \sum_m T_{nm}\, v_m(t) \tag{28}$$
or
$$v_n(t+\Delta t) = v_n(t) + \sum_m W_{nm}\, v_m(t) \tag{29}$$
where
$$W_{nm} = \alpha s^2 \Delta t\, T_{nm} \tag{30}$$
denotes a synaptic matrix corresponding to the second
spatial derivative in the heat equation, and includes the
scaling factor, the time step, and the thermal diffusivity.
FAST NEURAL SOLUTION OF NON-LINEAR WAVE
EQUATIONS
In the foregoing, the pseudo-spectral method was applied
to solve linear partial differential equations such as the heat
equation. The method is also applicable to quasi-linear
approximations of non-linear partial differential equations
such as the Korteweg-deVries equation for the soliton. The
KdV or soliton equation was originally introduced to
describe the behavior of one-dimensional shallow water
waves with small but finite amplitudes. Since its discovery,
solitons have enabled many advances in areas such as
plasma physics and fluid dynamics. The following
description concerns the one-dimensional KdV equation, which is:
$$u_t + a\, u\, u_x + b\, u_{xxx} = 0 \tag{31}$$
where u_t and u_x denote partial derivatives of u with respect
to time and space, respectively. If a and b are set to 6 and 1
respectively, an analytical solution to Equation 31 can be
obtained for an infinite medium. This solution is:
$$u(x,t) = 2k^2\, \mathrm{sech}^2(kx - 4k^3 t + \eta_0) \tag{32}$$
where k and η_0 are constants, with k>0. The above
expression represents a solitary wave of amplitude 2k² initially
located at x=-η_0/k, moving at a velocity of 4k². In order to
numerically solve the KdV equation, the following boundary
condition is imposed: u(x+2L, t)=u(x, t) for t in the interval
[0, T] and x in the interval [-L, L].
As in the application of the pseudo-spectral method to the
heat equation, the spatial interval [-L, L] is normalized to
the interval [0, 2π] using the transformation x → (π/L)(x+L).
The KdV equation can then be re-stated as:
$$v_t + 6 s\, v\, v_x + s^3\, v_{xxx} = 0 \tag{33}$$
where s=π/L.
Using steps analogous to those of Equations 1-11,
Equation 33 becomes
$$\frac{dv_n}{dt} + 6 s\, v_n \sum_m T^1_{nm}\, v_m + s^3 \sum_m T^3_{nm}\, v_m = 0 \tag{34}$$
where the matrices T^1_nm and T^3_nm are computed from
Equation 10 as
$$T^1_{nm} = -\frac{1}{2N} \sum_k k \sin[k(x_n-x_m)] \tag{35}$$
$$T^3_{nm} = \frac{1}{2N} \sum_k k^3 \sin[k(x_n-x_m)] \tag{36}$$
The neural network architecture consists of 2N neurons, the
dynamics of which are governed by Equation (34) defining
a system of coupled non-linear differential equations. Two
overlapping arrays, T^1 and T^3, fully interconnect all neurons.
In analogy with Equation 12, Equation 34 is recast using
central difference temporal dependencies as follows:
$$v_n(t+\Delta t) = v_n(t-\Delta t) - 12 s \Delta t\, v_n(t) \sum_m T^1_{nm}\, v_m(t) - 2 s^3 \Delta t \sum_m T^3_{nm}\, v_m(t) \tag{37}$$
One may further rearrange the above equation to obtain
$$v_n(t+\Delta t) = v_n(t-\Delta t) - v_n(t) \sum_m W^1_{nm}\, v_m(t) - \sum_m W^3_{nm}\, v_m(t) \tag{38}$$
where
$$W^1_{nm} = -\frac{12 s \Delta t}{2N} \sum_k k \sin[k(x_n-x_m)] \tag{39}$$
and W^3_nm is generalized for large intervals (small s) as:
[Eq. (40): expression for W^3_nm; not legible in the source text]
Here, W^1 and W^3 are the synaptic matrices corresponding to
the first and third derivatives in the KdV equation,
and include the scaling factor as well as the time step. Their
matrix elements are individually stored in corresponding
matrix cells of an MVM processor such as the CCD/CID
MVM processors of FIGS. 1 or 4.
MOVING TARGET DETECTION
The detection of targets moving in an environment
dominated by "noise" is addressed from the perspective of
nonlinear dynamics. Sensor data are used to drive the
Korteweg-deVries (soliton) equation 33, inducing a
resonance-type phenomenon which indicates the presence of
hidden target signals. In this case, the right hand side of Eqn.
33 is not zero, but rather is set to equal the vector S_n of
derivatives of target sensor outputs.
Long-range detection of the motion of a target in an
environment dominated by noise and clutter is a formidable
challenge. Target detection problems are generally
addressed from the perspective of the theory of statistical
hypothesis testing. So far, existing methodologies usually
fail when the signal-to-noise ratio, in dB units, becomes
negative, notwithstanding the sophisticated but complex
computational schemes involved. The present invention uses
the phenomenology of nonlinear dynamics, not only to filter
out the noise, but also to provide precise indication on the
position and velocity of the target. Specifically, the invention
employs the pseudospectral method described above to
solve a driven KdV equation and achieve, by means of
resonances, a dramatic enhancement of the signal to noise
ratio.
To simplify the discussion, and with no loss of generality,
only motion in R¹ space is considered. An array of N
motion-detector "neurons" is fed signals S_n(t) derived from
an array of sensors. Both arrays are linear, with [. . .].
Let u_n denote the [. . .] of
the n-th neuron. The overlapping synaptic arrays W^1
and W^3 of Equations 39 and 40 fully interconnect the
network which obeys the following system of coupled
nonlinear differential equations:
$$v_n(t+\Delta t) = v_n(t-\Delta t) - v_n(t) \sum_m W^1_{nm}\, v_m(t) - \sum_m W^3_{nm}\, v_m(t) + S_n(t) \tag{41}$$
where W^1_nm and W^3_nm are the synaptic arrays defined in
Equations 39 and 40. As in Eqn. 37, spatial organization will
be considered over the interval [0, 2π]. The actual positions
of neurons and sensors are given by the discrete values
x_n=nΔx (n=0, . . . N-1), with resolution Δx=2π/N. Thus,
u(x_n, t) is written u_n(t). For convenience, N is even. The
sensor derivative inputs to the network are denoted by S_n(t).
The corresponding hardware architecture is illustrated in
FIG. 8. Synaptic arrays 500, 510 are each an integrated circuit
(chip) embodying CCD/CID arrays of the type illustrated in
FIG. 4 and store the matrix elements of W^1 and W^3
respectively. Arithmetic processors 515, 520 are each chips
embodying arithmetic processors of the type illustrated in
FIG. 5. Registers 525 and 530 hold the current values of u(t).
A detector array 535 furnishes the derivative signal vector
S_n(t). The second term of Equation 41 is implemented by the
CCD/CID array 500, the arithmetic processor 515 and a
multiplier 540. The third term of Equation 41 is
implemented by the CCD/CID array 510, the arithmetic processor
520 and an adder 545. The last term of Equation 41 is
implemented by the detector array 535 and an adder 550. A
delay 555 implements the first term of Equation 41.
The results of a computer simulation of FIG. 8 are
illustrated in FIGS. 9-11. Each drawing of FIGS. 9-11
includes four plots. The lower left corner (FIGS. 9A, 10A,
11A) shows the sensors input to the network. Since data are
simulated, the contributions from the target and background
noise are plotted separately, for illustrative purposes, even
though the network actually receives only their combined
value. The Fourier spectrum of the total signal is given in the
upper left corner (FIGS. 9B, 10B and 11B). The network's
output is plotted in the lower right corner (FIGS. 9C, 10C,
11C). Finally, the solution in absence of noise is shown in
the upper right corner (FIGS. 9D, 10D, 11D).
The results are plotted after 10 time steps (using
Δt=0.005), and were obtained using a [. . .]-element sensor
array. FIG. 9 indicates that when only noise is fed to the
network, no spurious result emerges. FIG. 10 illustrates the
spectral network's detection capability for a target moving
in a noise and clutter background characterized by a
per-pulse signal-to-noise (SNR) ratio of approximately 0 dB.
Conventional detection methods usually reach their
breaking point in the neighborhood of such SNR ratios. Finally,
FIG. 11 presents results for a case where the SNR drops
below -10 dB. The above SNR is by no way the limit, as can
be inferred from the still excellent quality of the detection
peak. Furthermore, multiple layers of the spectral neural
architecture provide a "space-time" tradeoff for additional
enhancement. The following presents a possible physical
explanation of the detection phenomenon.
In interpreting the observed detection phenomenon of
Eqn. 41, it is assumed that the sensor input to the network
consists of two parts: (1) η(x, t), space-dependent random
oscillations; and (2) a "target" θ(x, t), the value of which is
non-zero only over a few "pixels" of the sensor array. Since
the KdV equations are non dissipative, and energy is
pumped into the system via the η(x, t) term, active steps
have to be taken to avoid unbounded growth of the solutions.
If v denotes the target's velocity, then
$$\theta(x, t) = \theta(x - vt) \tag{42}$$
Thus, based upon the properties of the homogeneous KdV
equation (Eqn. 34), the target signal θ(x, t) "resonates" with
the "eigen-solitons" of the homogeneous equation. Hence, it
will be amplified, while the random components η(x, t) will
be dispersed. Furthermore, such a "resonance" should not
depend on the target velocity, since the velocity of the
"eigen-solitons" of the KdV equations are not prescribed. In
other words, the proposed methodology can detect targets
over a wide range of velocities.
While the invention has been described in detail by
specific reference to preferred embodiments, it is understood
that variations and modifications thereof may be made
without departing from the true spirit and scope of the
invention.
What is claimed is:
1. A neural architecture for solving a partial differential
equation in a state vector consisting of at least one term in
a spatial partial derivative of said state vector and a term in
a partial time derivative of said state vector, comprising:
means for storing an array of matrix elements of a matrix
for each partial derivative term of said equation, said
matrix relating said partial derivative to said state
vector;
means for multiplying all rows of said matrix elements by
corresponding elements of said state vector
simultaneously to produce terms of a product vector; and
means for combining said product vector with a previous
iteration said state vector to produce a next iteration of
said state vector.
2. The neural architecture of claim 1 wherein said
equation comprises plural spatial partial derivative terms and said
neural architecture comprises plural arrays of matrix
elements of matrices corresponding to said partial spatial
derivative terms and plural means for multiplying, said
neural architecture further comprising means for combining
the product vectors produced by said means for multiplying.
3. The neural architecture of claim 1 wherein said partial
spatial derivative terms are definable over a spatial grid of
points and said matrix elements are obtained in a
pseudo-spectral analytical expression relating each matrix element
to the distance between a corresponding pair of grid points.
4. The neural architecture of claim 3 wherein the matrix
element of column m and row n of the matrix is:
[equation not legible in the source text]
where x_n and x_m are the corresponding pair of grid points,
d is the order of the derivative and k is a one of a set of
integers.
5. The neural architecture of claim 1 wherein said means
for storing said array comprises:
an array of N rows and M columns of CCD matrix cell
groups corresponding to a matrix of N rows and M
columns of matrix elements, each of said matrix
elements representable with b binary bits of precision,
each of said matrix cell groups comprising a column of
b CCD cells storing b CCD charge packets representing
the b binary bits of the corresponding matrix element,
the amount of charge in each packet corresponding to
one of two predetermined amounts of charge.
6. The neural architecture of claim 5 wherein said means
for multiplying comprises:
an array of c rows and M columns CCD vector cells
corresponding to a vector of M elements representable
with c binary bits of precision, each one of said M
columns of CCD vector cells storing a plurality of c
charge packets representing the c binary bits of the
corresponding vector element, the amount of charge in
each packet corresponding to one of two predetermined
amounts of charge; and
multiplier means operative for each one of said c rows of
said CCD vector cells for generating, for each one of
said rows of matrix CCD cells, a signal corresponding
to the sum of all charge packets in both (a) said one row
and (b) columns of said matrix CCD cells for which the
corresponding vector CCD cell contains a charge
packet of a predetermined amount.
7. A neural architecture for detecting moving targets in
images sensed by an array of sensors, comprising:
means for storing an array of matrix elements of a matrix
for each partial spatial derivative term of a partial
differential equation in a state vector consisting of at
least one term in a spatial partial derivative of said state
vector and a term in a partial time derivative of said
state vector, each matrix relating the corresponding
spatial partial derivative to said state vector;
means for multiplying all rows of said matrix elements of
each matrix by corresponding elements of said state
vector simultaneously to produce a product vector for
each matrix; and
means for combining the product vectors with (a) a
previous iteration said state vector and (b) a vector
derived from the outputs of said sensors, to produce a
next iteration of said state vector.
8. The neural architecture of claim 7 wherein said
equation is a soliton equation having first and third order partial
spatial derivatives of said state vector, whereby said neural
architecture comprises means for storing respective arrays
corresponding to said first and third order derivatives,
whereby said means for multiplying produces two product
vectors corresponding to said first and third order
derivatives, respectively, said neural architecture further
comprising means for multiplying the product vector of said
first order derivative by said state vector.
9. The neural architecture of claim 8 wherein said partial
spatial derivative terms are definable over a spatial grid of
points corresponding to said sensor array and said matrix
elements are obtained in a pseudo-spectral analytical
expression relating each matrix element to the distance between a
corresponding pair of grid points.
10. The neural architecture of claim 9 wherein the matrix
element of column m and row n of the matrix is:
[equation not legible in the source text]
where x_n and x_m are the corresponding pair of grid points,
d is the order of the derivative and k is a one of a set of
integers.
11. The neural architecture of claim 8 wherein said means
for storing said array comprises:
an array of N rows and M columns of CCD matrix cell
groups corresponding to a matrix of N rows and M
columns of matrix elements, each of said matrix
elements representable with b binary bits of precision,
each of said matrix cell groups comprising a column of
b CCD cells storing b CCD charge packets representing
the b binary bits of the corresponding matrix element,
the amount of charge in each packet corresponding to
one of two predetermined amounts of charge.
12. The neural architecture of claim 11 wherein said
means for multiplying comprises:
an array of c rows and M columns CCD vector cells
corresponding to a vector of M elements representable
with c binary bits of precision, each one of said M
columns of CCD vector cells storing a plurality of c
charge packets representing the c binary bits of the
corresponding vector element, the amount of charge in
each packet corresponding to one of two predetermined
amounts of charge; and
multiplier means operative for each one of said c rows of
said CCD vector cells for generating, for each one of
said rows of matrix CCD cells, a signal corresponding
to the sum of all charge packets in both (a) said one row
and (b) columns of said matrix CCD cells for which the
corresponding vector CCD cell contains a charge
packet of a predetermined amount.
* * * * *
