Analog VLSI implementation for stereo correspondence between 2-D images by Erten, Gamze & Goodman, Rodney M.
266 IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 7, NO. 2, MARCH 1996 
Analog VLSI Implementation for Stereo 
Correspondence Between 2-D Images 
Gamze Erten, Member, IEEE, and Rodney M. Goodman, Member, IEEE 
Abstract-Many robotics and navigation systems utilizing stereopsis 
to determine depth have rigid size and power constraints 
and require direct physical implementation of the stereo algo- 
rithm. The main challenges lie in managing the communication 
between image sensor and image processor arrays, and in par- 
allelizing the computation to determine stereo correspondence 
between image pixels in real-time. This paper describes the first 
comprehensive system level demonstration of a dedicated low- 
power analog VLSI (very large scale integration) architecture 
for stereo correspondence suitable for real-time implementation. 
The inputs to the implemented chip are the ordered pixels from a 
stereo image pair, and the output is a two-dimensional disparity 
map. The approach combines biologically inspired silicon modeling 
with the necessary interfacing options for a complete practical 
solution that can be built with currently available technology in 
a compact package. Furthermore, the strategy employed consid- 
ers multiple factors that may degrade performance, including 
the spatial correlations in images and the inherent accuracy 
limitations of analog hardware, and augments the design with 
countermeasures. 
I. INTRODUCTION 
THE classical discussion of binocular stereopsis begins 
with the description of two cameras (or eyes) separated 
by a baseline obtaining slightly dissimilar views of the scene. 
The pair of images are then processed, and areas of interest 
(targets) are selected. Corresponding target pairs between the 
images are matched and their spatial relationships (disparities) 
are noted. Determining this correspondence between image 
points (stereo correspondence) is computationally the most 
challenging step of the binocular stereo problem. When appro- 
priate constraints are imposed, interpolation using the sparse 
values of disparities between corresponding target pairs yields 
a dense disparity field, from which the relative depth of each 
image point can be determined. We will focus our discussion 
on the stereo correspondence problem. 
The stereo correspondence problem is highly ambiguous: 
In determining correspondence between two images, one runs 
into many false targets. Fortunately, physical attributes of 
the scene constrain the behavior of surface position, and 
consequently define the properties that a correct match must 
Manuscript received February 11, 1994; revised December 9, 1994 and 
May 28, 1995. This work was supported in part by ONR and ARPA under 
Grant N00014-92-J-1860, the Department of the Army, NSF Grant 9461047, 
and an NSF Graduate Fellowship. 
G. Erten is with IC Tech, Inc., Okemos, MI 48864 USA. 
R. M. Goodman is with the Department of Electrical Engineering, California 
Institute of Technology, Pasadena, CA 91125 USA. 
Publisher Item Identifier S 1045-9227(96)01244-1. 
possess. These can be imposed on the computation as con- 
straints and prove most useful in reaching a solution to the 
ill-posed correspondence problem. Most commonly exploited 
constraints are compatibility of matching primitives, unique- 
ness of each match, and the continuity of disparities across 
the image. 
Conventional hardware methods of image processing which 
use microprogrammable systolic array implementations [1], [2] 
remain inadequate in offering a real-time solution to stereo 
correspondence, which, like many other early vision tasks, demands 
high throughput and high computational density alongside 
sophisticated algorithms. Recently, analog VLSI (very 
large scale integration) processing arrays [6] have emerged as 
an alternative to address these problems and matured over time 
to tackle retinal adaptation [7], motion [8], color constancy [9], 
and one-dimensional (1-D) stereo correspondence [10], [11]. 
Our work expands on the same tradition by implementing 
a hardware stereo correspondence algorithm to handle two- 
dimensional (2-D) images serially. 
The paper is organized into several sections. We start by 
describing the algorithm. We draw special attention to 
the specific design choices concerning metrics, procedures, 
and algorithmic enhancements for analog hardware imple- 
mentation. Simulation results from the algorithm’s hardware 
model follow this discussion. We then describe the prototype 
implementation, which uses a 1-D scan line matching strategy. 
Finally, we present test results from this first prototype to 
demonstrate proof-of-concept. 
II. DESCRIPTION OF THE STEREO 
CORRESPONDENCE ALGORITHM 
A. Procedure 
Image matching is carried out between the stereo pairs 
using an improved image block matching scheme. The region 
selected in one image is compared with candidate regions 
in the other image and exactly one region is selected as its 
match. 
Prior to processing, the images are filtered by an exponential 
filter to reduce the undesirable effects of noise. The use of the 
exponential filter may seem unusual, but this choice which is 
dictated by hardware turns out to be an adequate strategy for 
spatial scale adjustment, as well as noise reduction. Filtered 
pixel values are used to compare neighborhoods in each image. 
A comparison of the two filtered neighborhoods is made at 
1045-9227/96$05.00 © 1996 IEEE 
ERTEN AND GOODMAN ANALOG VLSI IMPLEMENTATION FOR STEREO CORRESPONDENCE 261 





Fig. 1. Image matching procedure. 
each possible disparity value. This process, which is illustrated 
in Fig. 1, can be summarized as follows: 
1) Select image region X in the left image. X can be 
viewed as a vector with N elements, $X_1, \ldots, X_N$. 
2) Compare X with K candidate match regions, or subimages 
$Y_1, \ldots, Y_k, \ldots, Y_K$, in the right image. Each of these 
regions Y is of the same size as X. 
3) Select one region Y as the match for X, based on 
value(s) of a similarity metric. 
4) Assess the computational confidence in the selected 
match, i.e., quantify the likelihood that it is indeed a 
correct match. 
5) Repeat for all regions X in the left image. 
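The five steps above can be sketched for a single scan line as follows. This is a minimal illustration, not the chip's implementation: the function names, the negated sum of absolute differences used as the similarity score, and the toy pixel values are all assumptions made for the example, and the confidence shown is simply the gap between the best and second-best scores.

```python
def match_regions(left, right, half_width, max_disparity, similarity):
    """Return {center: (disparity, confidence)} for one scan line.

    `similarity` is any function of two equal-length windows where a
    larger value means a better match (hypothetical interface).
    """
    results = {}
    n = len(left)
    margin = half_width + max_disparity
    for center in range(margin, n - margin):                  # step 5: all regions X
        x = left[center - half_width:center + half_width + 1]  # step 1: region X
        scores = {}
        for d in range(-max_disparity, max_disparity + 1):    # step 2: K candidates
            lo = center + d - half_width
            y = right[lo:lo + 2 * half_width + 1]
            scores[d] = similarity(x, y)
        best = max(scores, key=scores.get)                    # step 3: pick one match
        ordered = sorted(scores.values(), reverse=True)
        confidence = ordered[0] - ordered[1]                  # step 4: winner vs runner-up
        results[center] = (best, confidence)
    return results


def negative_sad(x, y):
    """Illustrative similarity: negated sum of absolute differences."""
    return -sum(abs(a - b) for a, b in zip(x, y))


# Toy scan lines: the right line is the left line shifted by two pixels.
left = [0, 0, 0, 0, 1, 5, 2, 7, 3, 0, 0, 0, 0, 0]
right = [0, 0] + left[:-2]
matches = match_regions(left, right, half_width=1, max_disparity=2,
                        similarity=negative_sad)
```

In featureless stretches of the toy line every candidate scores the same, so the confidence gap collapses to zero while textured windows recover the two-pixel shift with a clear margin.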
The following discussions will concentrate on the issues 
concerning the selection of the similarity metric and the 
methods used for the confidence assessment. 
B. Selecting the Correct Similarity Metric 
for Stereo Correspondence 
The selection of the similarity metric is pivotal when an 
analog hardware implementation is being considered. Tradi- 
tional metric implementations, such as those measuring the 
Euclidean distance between two vectors of pixel values (also 
known as the sum of squared differences-SSD), are not only 
difficult to implement, but also do not yield a reliable enough 
solution. The spatial correlations in an image dictate that the 
SSD operation be carried out using accuracy levels that analog 
hardware cannot deliver [12]. 
During the following analysis, presented to demonstrate the 
advantages of the proposed similarity metric, our discussion 
will be limited to a class of similarity metrics M which are 
summed values of pairwise pixel operators m. Thus, m is a 
function of two scalars, and M is a function of two vectors, 
and they are related by a simple summation 

$$M(X, Y) = \sum_{n=1}^{N} m(x_n, y_n). \tag{1}$$

There are many possible similarity metrics, m(x, y). We will 
examine two very common ones, namely the absolute difference 
and the squared difference functions, before presenting 
the metric used by the stereo correspondence chip. We will 
assess the appropriateness of all three metrics for a hardware 
implementation based on the level of ambiguity that they generate. 
Minimizing ambiguity of a match is especially important 
in a physical implementation because a physical medium can 
offer only limited precision. As previously mentioned, this is 
particularly true when the medium is the rather noise-prone 
analog VLSI domain, and not a digital system in which one 
can represent numbers with desired precision. 
A simple method of assessment is based on the probability 
distribution function of the specific similarity metric used, 
or $f_m(m)$. For best results, a similarity metric must be 
nearly uniformly distributed. If the values are distributed 
nonuniformly or, worse yet, are gathered around a single value 
with low variance, the match identification process is often 
very ambiguous, and the precision of the system must be very 
high to identify the extrema of the metric values. Since m 
is a function of x and y, $f_m(m)$ is a function of the joint 
probability distribution $f_{xy}(x, y)$. Due to significant spatial 
correlations within a region, pixels in the image ($y_i$) and x 
are not independent. This is particularly true around the area 
of the correct match. Assuming identical normal distributions 
$f_x(x)$ and $f_y(y)$, with means $\mu_x = \mu_y = 0$ and variances 
$\sigma_x^2 = \sigma_y^2 = \sigma^2$, we obtain the following for the correlation 
between pixels x and y 

$$\rho = \frac{E[xy]}{\sigma^2}. \tag{2}$$

We can further define the expected value of the product xy 

$$E[xy] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x\,y\,f_{xy}(x, y)\,dx\,dy \tag{3}$$

where the joint probability distribution, $f_{xy}$, is 

$$f_{xy}(x, y) = \frac{1}{2\pi\sigma^2\sqrt{1-\rho^2}} \exp\left(-\frac{x^2 - 2\rho x y + y^2}{2\sigma^2(1-\rho^2)}\right). \tag{4}$$

We will proceed to examine the level of ambiguity when 
using the two most popular similarity metrics, by assessing the 
probability distributions of their values. We will take as given 
the spatial statistics of the images that these metrics attempt 
to match. The probability distribution for (x, y) is assumed to 
be that in (4). 
Absolute difference: $m(x, y) = |x - y|$. 
The analytical solution for $f_m(m)$ is easily obtainable 

$$f_m(m) = \frac{U(m)}{\sigma\sqrt{\pi(1-\rho)}} \exp\left(-\frac{m^2}{4\sigma^2(1-\rho)}\right) \tag{5}$$

where U(m) is the step function 

$$U(x) = \begin{cases} 0 & \text{if } x < 0 \\ 1 & \text{otherwise.} \end{cases}$$

Squared differences: $m(x, y) = (x - y)^2$. 
The probability distribution $f_m(m)$ is 

$$f_m(m) = \frac{U(m)}{2\sigma\sqrt{\pi m (1-\rho)}} \exp\left(-\frac{m}{4\sigma^2(1-\rho)}\right). \tag{6}$$
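To make the ambiguity argument concrete, the sketch below evaluates the distributions of the two difference metrics under the stated model (zero-mean pixels with common variance and correlation $\rho$, so that $x - y$ is normal with variance $2\sigma^2(1-\rho)$). It then asks how much probability mass falls in the lowest tenth of each metric's range, taking the 99th percentile as the top of the range. The function names, the percentile convention, and the sample values $\sigma = 1$, $\rho = 0.9$ are illustrative assumptions, not values from the paper.

```python
import math


def spread(sigma, rho):
    """Standard deviation of d = x - y for correlated zero-mean pixels."""
    return sigma * math.sqrt(2.0 * (1.0 - rho))


def pdf_abs_diff(m, sigma, rho):
    """Density of m = |x - y| (a folded normal)."""
    if m < 0:
        return 0.0
    s = spread(sigma, rho)
    return 2.0 * math.exp(-m * m / (2.0 * s * s)) / (s * math.sqrt(2.0 * math.pi))


def pdf_sq_diff(m, sigma, rho):
    """Density of m = (x - y)^2; integrable singularity at m = 0."""
    if m <= 0:
        return 0.0
    s = spread(sigma, rho)
    return math.exp(-m / (2.0 * s * s)) / (s * math.sqrt(2.0 * math.pi * m))


def cdf_abs_diff(m, sigma, rho):
    return math.erf(m / (spread(sigma, rho) * math.sqrt(2.0))) if m > 0 else 0.0


def cdf_sq_diff(m, sigma, rho):
    return math.erf(math.sqrt(m) / (spread(sigma, rho) * math.sqrt(2.0))) if m > 0 else 0.0


def mass_in_lowest_tenth(cdf, sigma, rho):
    """Probability mass below 10% of the metric's 99th-percentile value."""
    lo, hi = 0.0, 1.0
    while cdf(hi, sigma, rho) < 0.99:      # bracket the 99th percentile
        hi *= 2.0
    for _ in range(100):                   # bisection on the monotone CDF
        mid = 0.5 * (lo + hi)
        if cdf(mid, sigma, rho) < 0.99:
            lo = mid
        else:
            hi = mid
    return cdf(0.1 * hi, sigma, rho)


abs_mass = mass_in_lowest_tenth(cdf_abs_diff, sigma=1.0, rho=0.9)
sq_mass = mass_in_lowest_tenth(cdf_sq_diff, sigma=1.0, rho=0.9)
print(f"absolute difference: {abs_mass:.3f}, squared difference: {sq_mass:.3f}")
```

Under these assumptions the squared difference packs a much larger share of its values into the bottom of its range than the absolute difference does, which is the sense in which minima of that metric are harder to separate at limited precision.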
Fig. 2. Experimental probability distributions for two common similarity metrics. 
For both the difference and the difference squared metrics, 
fm(m) is high for low values of m. Since both metrics need 
to be minimized to obtain the correct match, comparisons 
are likely to contain significant ambiguity when using these 
metrics, especially the difference squared metric. Experimental 
values obtained from a natural image (Fig. 10) are shown in 
Fig. 2. The analytical and experimental distributions compare 
rather favorably. Experimentally, though, we observe: 
1) An offset between the two images, most likely due to 
an average difference in illumination between the two 
images. 
2) A higher variance in the Gaussian distribution, which 
most likely means that a parameter adjustment is needed 
in plotting the analytically obtained function(s). 
3) Singularities in the distribution, a byproduct of the 
digitization process for the photograph. 
Hardware metric: $m(x, y) = \dfrac{1}{1 + \frac{4}{w}\cosh^2(x - y)}$. 
The parameter w is adjustable through change of design 
parameters. 
The metric has a clear upper bound for all (x, y). Its probability 
distribution $f_m(m)$ (Fig. 3) 
was obtained experimentally, again using the same natural 
image pair and procedure as for Fig. 2. The singularities arise 
from the singularities in the image itself and possibly from 
the nature and limitations of the numerical computation. The 
distribution is almost uniform in a range of values, leading us 
to conclude that it has a higher variance than the other metrics 
when scaled to cover the same range. Besides being easy to 
implement in analog VLSI, this metric is thus also far more 
suitable mathematically than the previous two discussed above. 
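The bounded behavior of the hardware metric can be checked numerically. The sketch below assumes the bump-like form $m(x, y) = 1 / (1 + \frac{4}{w}\cosh^2(x - y))$, which is consistent with the per-pixel ceiling $w/(w+4)$ implied by the bound quoted for the summed metric; any scaling of the argument inside the hyperbolic cosine is omitted, and the function names are hypothetical.

```python
import math


def hardware_metric(x, y, w):
    """Assumed pairwise hardware metric; peaks when x == y."""
    return 1.0 / (1.0 + (4.0 / w) * math.cosh(x - y) ** 2)


def summed_upper_bound(w, n_pixels):
    """Ceiling on the summed metric over a window of n_pixels pairs."""
    return w * n_pixels / (w + 4.0)


w = 4.0
peak = hardware_metric(0.3, 0.3, w)        # identical pixels give the ceiling
off_peak = hardware_metric(0.0, 1.0, w)    # mismatched pixels fall below it

# Summed over a 5-pixel window, the metric can never exceed the bound.
window_x = [0.1, 0.4, 0.2, 0.9, 0.5]
window_y = [0.1, 0.5, 0.0, 0.7, 0.5]
summed = sum(hardware_metric(a, b, w) for a, b in zip(window_x, window_y))
bound = summed_upper_bound(w, len(window_x))
```

Unlike the two difference metrics, which grow without limit as pixels diverge, this form saturates toward zero for large mismatches and is capped at $w/(w+4)$ per pixel, so raising $w$ lifts the ceiling toward one.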
Fig. 3. Probability distribution $f_m(m)$ for the hardware metric (vertical axis: number of instances). 
Now, let us put it all together and examine the formal 
mathematical description of the metric in the context of stereo 
correspondence. Assuming that the selected neighborhoods X 
and Y are 2-D, with width $2\sigma + 1$ and height $2\lambda + 1$, the value 
of the similarity metric function at image coordinates $(i, j)$ for 
a given horizontal disparity $\delta$ [in other words, $M(i, j, \delta)$] is 
the sum of the pairwise hardware metric over the neighborhood (7), 
where $w$ and $\kappa$ are hardware circuit parameters, $kT$ is a 
constant, $\delta$ is the disparity, and $X(i, j)$ and $Y(i, j)$ are 
central pixel values of the right and the left image regions, 
respectively. The region that generates the highest metric value 
is identified as the corresponding region. Assuming that the 
allowed disparity range is between $-\Delta$ and $\Delta$, this can be 
written as 

$$\mathrm{Disparity}(i, j) = \delta : \; M(i, j, \delta) = \max_{-\Delta \le \xi \le \Delta} M(i, j, \xi) \le \frac{w(2\sigma + 1)(2\lambda + 1)}{w + 4}. \tag{8}$$
The above inequality stems from the bounded nature of the 
hardware metric. Unlike many other metrics mentioned in the 
previous section, for any scalar value of the elements of X 
and Y, the metric always stays below a known maximum. 
C. Assessing the Confidence in the Computed Match 
Because ambiguity prohibits the hardware metric (or any 
other single similarity metric) from solving the image match- 
ing problem, a confidence metric is introduced. This is a 
significant algorithmic enhancement, that can be used in a 
variety of ways to add sophisticated postprocessing schemes. 
The confidence metric is extracted from the values that the 
similarity metric takes around the winning disparity. The peak 
value attained at the winning disparity is compared to the 
rest of the metric values. If this peak value is found to be 
a clear winner, the confidence in the corresponding disparity 
is deemed to be high. If, on the other hand, the peak value 
is almost equal to its runner(s)-up, then the confidence is 
deemed to be low. Quantitatively, we have explored two 
Fig. 4. Random dot stereogram pair. 
methods for assessing the peak that are easy to compute and 
implement. These are the derivative and the ratio 
methods. 
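1) Derivative method: The sharpness of the peak can be 
assessed from the first derivative 

$$\mathrm{Confidence}(x, y) = M(x, y, \delta) - \max_{\substack{-\Delta \le \xi \le \Delta \\ \xi \ne \delta}} M(x, y, \xi) \tag{9}$$

where 

$$M(x, y, \delta) = \max_{-\Delta \le \xi \le \Delta} M(x, y, \xi). \tag{10}$$

This operator computes the difference between the winning 
metric value and its runner-up. 
2) Ratio method: 

$$\mathrm{Confidence}(x, y) = \frac{M(x, y, \delta)}{\sum_{\xi = -\Delta}^{\Delta} M(x, y, \xi)} \tag{11}$$

where $M(x, y, \delta)$ is as described in (10). This operator 
computes the ratio of the winner (highest metric value) 
and the sum of all metric values within the window. 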
In 1-D step edge images, the peaks of both confidence 
metrics coincide with the peaks of image variance, as one 
would predict. Therefore, the confidence metric performs a 
function equivalent to that of an interest operator; except it 
acts after the matching computation. As previously mentioned, 
the traditional interest operator evaluates the entire image 
to identify high-variance neighborhoods before attempting to 
match corresponding regions. We will show how these two 
metrics perform in application. 
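As a small worked illustration of the two confidence measures, the sketch below applies them to a list of metric values, one per candidate disparity. The function names and the sample score lists are made up for the example; the definitions follow the verbal descriptions above (winner minus runner-up, and winner divided by the sum of all values).

```python
def derivative_confidence(scores):
    """Winning metric value minus its runner-up."""
    ordered = sorted(scores, reverse=True)
    return ordered[0] - ordered[1]


def ratio_confidence(scores):
    """Winning metric value divided by the sum of all metric values."""
    return max(scores) / sum(scores)


# One metric value per candidate disparity (illustrative numbers only).
textured = [0.1, 0.1, 0.9, 0.1, 0.1]      # one clear winner
featureless = [0.5, 0.5, 0.52, 0.5, 0.5]  # winner barely above the rest

for name, scores in (("textured", textured), ("featureless", featureless)):
    print(name, derivative_confidence(scores), ratio_confidence(scores))
```

Both measures rank the textured window well above the featureless one. The ratio form is bounded between 1/K and 1 for K nonnegative candidates, which makes it convenient to compare against a fixed threshold.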
D. Simulation Results 
We simulated our algorithm using random dot stereograms 
(RDS's), a synthesized image and two natural images. 
Random dot stereograms (RDS's) are image pairs composed 
of various gray level pixels arranged in a random pattern 
(Fig. 4). One of the images is usually a replica of the other, 
except for regions strategically displaced against those in the 
other image to create a sense of depth. When an RDS pair 
is presented to the two eyes, the observer gets the sensation 
of viewing surfaces at different depths because of these 
displacements, or disparities. 
The typical binary RDS contains 50% white and 50% black 
dots or pixels. As one increases the percentage of white or 
Fig. 5. Adjusting target density in RDS's. From left to right target density 
values are 50%, 30%, 20%, 10%, and 5%. Target density is adjusted by 
decreasing the probability of white pixels in the RDS image generation 
program. 
Fig. 6.  Simulating decreasing target density. As target density (dt) decreases, 
the ability of the algorithm to detect the raised square surface in the center 
of the RDS declines. The results above were obtained using the same target 
density values as in Fig. 5, from left to right 50%, 30%, 20%, 10%, and 5%. 
Darker pixels correspond to surfaces that are further away from the viewer. 
Fig. 7. The synthesized image pair courtesy of Prof. D. G. Jones of McGill 
University. 
black dots, target density decreases, leading to an increase in 
essentially featureless regions in the image. Decreased target 
density causes the image matching problem to become more 
ambiguous. An RDS made up of all white or all black pixels 
contains no information for image matching. We carried out 
simulation and hardware experiments to study the effects of 
adjusting the percentage of black dots (or targets) in an RDS. 
Fig. 5 shows RDS's with decreasing target density. Fig. 6 
shows the simulation results for the same target densities. 
Pixels that are lighter correspond to regions in the RDS that are 
closer to the viewer. Because the performance of the algorithm 
degrades with decreasing target density, the raised center in 
the RDS also becomes less obvious. Hardware test results are 
reported in Section IV. 
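A binary RDS pair with an adjustable target density can be generated along the following lines. This is a sketch of the general technique rather than the image generation program used for Figs. 5 and 6: the function name, the encoding of a target pixel as 1, the square geometry, and the random refill of the uncovered strip are all assumptions, and it requires a positive shift that keeps the square inside the image.

```python
import random


def make_rds(size, target_density, square, disparity, seed=0):
    """Return (left, right) binary images as lists of rows.

    Each pixel is a target (1) with probability `target_density`.
    The central square spanning rows/columns square[0]..square[1]-1 of
    the left image is shifted `disparity` pixels to the left in the
    right image. Assumes disparity > 0 and square[0] - disparity >= 0.
    """
    rng = random.Random(seed)

    def dot():
        return 1 if rng.random() < target_density else 0

    left = [[dot() for _ in range(size)] for _ in range(size)]
    right = [row[:] for row in left]
    lo, hi = square
    for r in range(lo, hi):
        for c in range(lo, hi):
            right[r][c - disparity] = left[r][c]   # displaced copy of the square
        for c in range(hi - disparity, hi):
            right[r][c] = dot()                    # strip uncovered by the shift
    return left, right


left_img, right_img = make_rds(size=64, target_density=0.3,
                               square=(20, 44), disparity=3)
density = sum(map(sum, left_img)) / (64 * 64)
```

Lowering `target_density` toward zero leaves long runs of identical pixels, which is the regime in which the matching problem described above becomes ambiguous.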
The synthesized image pair (Fig. 7) was previously used 
to evaluate the performance of the stereo matching algorithm 
by Jones and Malik [13]. This is a synthesized image with 
interesting features. The background is similar to a gray- 
level random dot stereogram. The geometry is convergent 
with significant vertical disparity at the corners. The image pair 
contains many occlusion points, some of which extend over 
many pixels. We used this image pair to assess the values of the 
confidence metric and to evaluate the benefits of including the 
second dimension. Simulation results in Fig. 8 show the values 
Fig. 8. Confidence metrics using the ratio (center) and derivative (right) 
methods. 
Fig. 9. 1-D (left) and 2-D (right) simulations compared. (Darker pixels signal 
surfaces further from the viewer.) 
of the two confidence metrics we described. The confidence 
values are based on the hardware metric and have been 
appropriately scaled to form informative pixel maps. Darker 
pixels are low confidence areas. Note that these overlap with 
featureless and occluded regions of the image. 
So far, only 1-D simulation results have been reported. We 
also simulated our image matching algorithm using 2-D image 
patches and a 2-D search space. A comparison of the 1-D and 
2-D simulations is shown in Fig. 9. Including the second 
dimension brings along three important improvements: First, 
the disparity results are accurate in the corners of the image 
since vertical disparity is corrected for. Second, the image 
matching region expands from being a string of pixels to a 
2-D region of pixels. The correct match is identified based on 
a wider range of support and the results are generally more 
accurate. Third, again with the aid of the second dimension, 
jagged disparity discontinuities are reduced. As one would 
expect, the 2-D version of the algorithm takes significantly 
longer to simulate. The hardware implementation described in 
Section III is limited to one dimension. 
Two natural image pairs, both of size 240 x 256 pixels, 
were simulated using our image matching algorithm. Both 
images have been used previously to evaluate other image 
matching algorithms [14]. Fig. 10 shows the rock image 
and the disparity map obtained with the hardware algorithm. 
Fig. 11 shows the same with the train image. 
III. ANALOG VLSI IMPLEMENTATION 
A. Design Goals 
In the design requirements of the chip, we stressed the 
features of simplicity, versatility, accuracy, compactness, and 
low power consumption. We projected that, even though 
the stereo correspondence problem is complex, the actual 
implementation should be very simple. It should be possible 
to process multiple sized images provided that the image 
disparity range is accommodated by the hardware disparity 
range. Moreover, chip outputs of disparity results must be 
accurate. Algorithmic enhancements should be well thought 
out to provide the best return for the silicon area investment 
without degrading the overall physical performance. 
We anticipated that the confidence metric would take us a 
long way toward compensating for the inherent ambiguities 
of the stereo correspondence problem. We noted that many 
systems that perform similar tasks for solving equivalent vision 
problems require a lot of hardware at very high cost. Our stereo 
correspondence architecture should be implementable using a 
single dedicated chip. Our main advantage for reducing power 
consumption was that analog VLSI chips operating below 
threshold consume far less power than large digital systems 
that emulate vision algorithms. This low-power feature would 
be instrumental in the miniaturization of the system. 
B. Architectural Overview 
The architectural overview of our overall system is shown 
in Fig. 12. All elements shown have been implemented in 
VLSI except the area enclosed inside the ellipse where the 
disparity is smoothed using a resistive grid. The prototype 
chip fits snugly into a 40-pin TINYCHIP package from the metal-oxide 
semiconductor implementation service (MOSIS) and incorporates 
most of the described algorithmic features. The 
functional units inside its 2 mm × 2 mm workspace, fabricated in a 
2 μm, n-well complementary metal-oxide semiconductor (CMOS) process, 
accommodate image preprocessing to adjust spatial 
scale and reduce noise effects, feedforward serial computation 
to handle any size image, and internal voting circuitry to report 
low confidence in ambiguous regions. The implementation 
uses a 1-D search to lower hardware and input-output (I/O) 
overhead. 
C. Design Details 
We will describe in this section the functional units of the 
stereo correspondence architecture, as they are implemented 
in the prototype chip. 
1) 1-D resistive grid for signal conditioning: Prior to the 
image matching step, the pixels of the image pair are 
input onto a 1-D resistive grid. This structure smooths 
the image to the appropriate spatial scale, and to some 
extent, reduces the undesirable effects of noise. The 
horizontal resistor (hRes) circuit [15] is used to form 
the horizontal component of the resistive grid. This 
circuit operates as a typical linear resistance for voltage 
differences of a few hundred millivolts, and is current 
limited beyond that range. Its resistance (R) is controllable 
over a range of values, typically on the order of 
megaohms. The vertical conductance (G) component of 
the grid, on the other hand, is formed by connecting a 
transconductance amplifier in the follower configuration. 
The controllable gain of the amplifier determines the 
conductance of the resistive element. The grid acts as 
an exponential filter that diffuses the current output from 
Fig. 10. The rocks image (a) and its disparity map as reported by simulation (b). Image pair courtesy of L. H. Matthies of Jet Propulsion Laboratory 
(JPL). This is a photograph of a scene outside JPL in Pasadena, CA. 
Fig. 11. The train image (a) and its disparity map as reported by simulation (b). Image pair courtesy of L. H. Matthies. It is a photograph of the 
maquette of a small town scene. 
the vertical conductance elements between neighboring 
pixels. The diffusion length of the resistive grid, which 
is analogous to the exponential filter value we used in 
simulations, is inversely related to the square root of the 
product (RG). For instance, if one increases both G and 
R at the same time, the diffusion length, or pictorially the 
smoothing effect of this filter is diminished. There are 19 
inputs onto the 1-D resistive grid from the right image 
and nine inputs from the left image (to terminals labeled 
E, in Fig. 13). The four end pixels of both images are 
not used in the comparison to avoid distortion effects. 
2)  Pixel comparison array: The chip employs the bump 
circuit [16] for measuring voltage similarities between 
pixel pairs. Bump circuits’ output currents can be con- 
nected together to measure similarities between groups 
of pixels. This current summing feature, illustrated in 
Fig. 14, makes it very straightforward to compare a 1-D 
window of the left image with several of the same size 
windows in the right image. This comparison is made 
at each possible disparity value. The similarity metric 
value for each disparity is the summed current of bump 
circuits that compare the pixel windows corresponding 




Fig. 12. Analog VLSI architecture for stereopsis. 
Fig. 13. 1-D resistive grid on chip. 
to that disparity. Assuming that the window is $(2\gamma + 1)$ 
pixels wide, the summed current at position $x$ and disparity $\delta$ is 

$$I_{out}(x, \delta) = \sum_{i = x - \gamma}^{x + \gamma} \frac{I_{bias}}{1 + \frac{4}{w}\cosh^2\left(\frac{\kappa}{2kT}\left(\mathrm{Image}_L(i) - \mathrm{Image}_R(i - \delta)\right)\right)} \tag{12}$$

where $I_{bias}$ is the current in the bias transistor of the bump 
circuit, $w$ and $\kappa$ are circuit parameters, $kT$ is a constant, 
$\delta$ is the disparity, and $\mathrm{Image}_R(\cdot)$ and $\mathrm{Image}_L(\cdot)$ are 
filtered pixel values of the right and the left image 
respectively. Note that the above equation is essentially 
the same as the hardware metric quantity, with the 
exception of a dimensional reduction. In our VLSI 
implementation, the window width is five pixels (i.e., 
$\gamma = 2$). 
Fig. 14. Bump window corresponding to disparity = d. 
3) Winner-take-all (WTA) circuit: There are 11 current 
sums of the kind shown in (12). These correspond to 
disparities in the range [-5, 5]. These currents are 
input to a WTA circuit [17]. The highest current sum is 
declared the winner, and it determines the disparity value 
at the current pixel. Assuming that the allowed disparity 
range is between $-\Delta$ and $\Delta$, this can be written as 

$$\mathrm{Disparity}(x) = \delta : \; I_{out}(x, \delta) = \max_{-\Delta \le \xi \le \Delta} I_{out}(x, \xi) \le \frac{w(2\gamma + 1)}{w + 4}\, I_{bias}. \tag{13}$$

In our VLSI implementation $\Delta = 5$. The last inequality 
is included to show that the current (and consequently 
the value of the metric) is limited to a fraction of 
$(2\gamma + 1)\, I_{bias}$. 
This property could be exploited in computing the 
confidence metric, as well as in determining monocular 
regions. The WTA circuit's common gate connection il- 
lustrated in Fig. 15 dictates that, when the bias transistor 
is operating below threshold, only the transistor sinking 
the highest current can be operating in the subthreshold 
saturation region. Thus, essentially a single one of the 
transistors configured to supply the bias current is active. 
Fig. 15. WTA circuit and disparity estimation. 
The maximum current typically generates a voltage 
between 2.0 and 2.8 V at the gate of this active transistor. 
The rest of the currents lead to voltages close to 0 V. 
Thus, these nodes can be used to set the conductances of 
a series of followers that are connected to a tilted voltage 
line, as shown in Fig. 15. The follower connected to 
the winning current will set the voltage on the common 
output node, which carries the disparity information. In 
our chip, the voltages assigned to disparities range 
between 1.25 V (for the maximum negative disparity, 
-5) and 3.75 V (for the maximum positive disparity, 
+5). 
4) Confidence circuit: In areas of the image with flat 
intensity values (i.e., no features), the comparison of 
bump circuit currents will not produce a clear maximum. 
Limitations and mismatches inherent to a physical 
implementation introduce further ambiguity. Therefore, the 
maximum current (and consequently the disparity) under 
such conditions may be arbitrary. 
Most computational approaches, in the absence of sufficient 
feature information (or targets), introduce window 
size adjustment to include enough targets for a meaningful 
match. To avoid this adjustment, which is difficult 
in hardware, we introduced a confidence metric in our 
algorithm. When this is below an adjustable threshold, at 
occlusion points or when the window size is inadequate 
for resolving ambiguities, low confidence is reported. 
The confidence metric value in our hardware implementation 
is determined by the ratio of the maximum bump 
current to the total current supplied by all bump circuits 

$$\mathrm{Confidence}(x) = \frac{\max_{\delta} I_{out}}{\sum_{\delta} I_{out}}. \tag{14}$$
We designed a current fractioning method specifically 
for this purpose. Since the value of the ratio we are trying 
to compute is always less than one, thresholding a fraction 
of $\sum_{\delta} I_{out}$ with the maximum current $\max_{\delta} I_{out}$ 
serves a similar function: Instead of trying to divide a 
current by another, we take an adjustable fraction of the 
larger current and compare it to the smaller one. Thus, 
not only are disparity and confidence values computed 
in parallel, but also the confidence circuit is an extension 
of the WTA structure (Fig. 16). 
The confidence value signal is near 0 V when the 
disparity output carries a high confidence and near 2.0 V 
when it carries a low confidence. In smooth areas of 
the image as well as at distinct occlusion points, low 
confidence is reported. Disparity output collected across 
the image can be viewed as a sparse map with gaps 
corresponding to the low confidence points. A dense map 
can be obtained from the sparse values by interpolating 
between the high confidence values. This operation is 
very suitable for a surface interpolating resistive grid, 
where the confidence determines the conductance (G) 
through which the disparity is input onto the grid. 
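The comparison, winner-take-all, and confidence stages described in this section can be mimicked in software for a single window instance. The sketch below is a behavioral illustration only, not a circuit model: the bump response uses an assumed form with placeholder values for the bias current, $w$, and the voltage gain; the mapping from disparity to output voltage is assumed to be linear between the two quoted endpoints (1.25 V at -5 and 3.75 V at +5); and the threshold fraction and the indexing of the right-image strip are hypothetical.

```python
import math


def bump_current(v1, v2, i_bias=1.0, w=4.0, gain=10.0):
    """Assumed bump response: largest when the two voltages agree.
    `gain` stands in for kappa / (2kT), in 1/V (placeholder value)."""
    return i_bias / (1.0 + (4.0 / w) * math.cosh(gain * (v1 - v2)) ** 2)


def chip_step(left_window, right_strip, max_disparity=5, fraction=0.3):
    """One window instance: returns (disparity, output_voltage, confident).

    left_window: the 5 filtered left pixels that are compared.
    right_strip: the 15 filtered right pixels covering all 11 shifts.
    """
    width = len(left_window)
    currents = []
    for d in range(-max_disparity, max_disparity + 1):
        start = d + max_disparity                     # hypothetical strip indexing
        window = right_strip[start:start + width]
        currents.append(sum(bump_current(a, b)        # summed bump currents
                            for a, b in zip(left_window, window)))
    winner = max(range(len(currents)), key=currents.__getitem__)  # WTA
    disparity = winner - max_disparity
    voltage = 2.5 + 0.25 * disparity                  # assumed linear tilted line
    # Current fractioning: compare the winner with a fraction of the total
    # instead of dividing one current by another.
    confident = currents[winner] > fraction * sum(currents)
    return disparity, voltage, confident


left_window = [2.0, 2.5, 3.0, 2.2, 2.8]
right_strip = [1.5] * 15
right_strip[8:13] = left_window                       # true match at disparity +3
textured_result = chip_step(left_window, right_strip)
flat_result = chip_step([2.0] * 5, [2.0] * 15)        # featureless input
```

Under the linear assumption each disparity step corresponds to 0.25 V on the output. With a featureless input all eleven currents are equal, so the reported disparity is arbitrary and the fraction test flags it as low confidence.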
IV. HARDWARE TEST RESULTS 
Fig. 16. Confidence circuit as part of the WTA structure. The confidence-estimate output is near 0 V for high confidence. 
A set of TINYCHIPs has been designed and fabricated in a 
2 μm, n-well CMOS process supported by MOSIS. The package 
accommodates 40 pins and provides approximately a 2 mm 
by 2 mm workspace. Pixels from the two images (19 from the 
right image and nine from the left) are input in parallel. Each 
input in time corresponds to a single window centered around 
a single pixel. The chip contains five adjustable parameters: 
R value: This sets the value of the horizontal resistors 
in the 1-D resistive grid (Fig. 13). 
G value: This sets the value of the vertical resistors in the 
1-D resistive grid. Its adjustment varies the gate voltage 
of the bias transistor of the followers (Fig. 13). 
WTA bias: This value determines the gate voltage on the 
transistor that biases the WTA circuit, and consequently 
its current capacity (Fig. 15). 
Bump circuit bias: This value determines the gate volt- 
age on the bump circuit bias, and consequently its 
current capacity, Ibias, in (12). 
Confidence bias: This value determines what fraction of 
the summed current will be compared to the maximum 
bump circuit window current (signal "sum ratio bias" in 
Fig. 16). 
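How the R and G settings shape the smoothing can be illustrated with an idealized linear model of the 1-D resistive grid: each node is tied to its input through a conductance G and to its neighbors through resistors R, and the node voltages follow from Kirchhoff's current law. The sketch below solves that tridiagonal system directly. It ignores the current-limiting behavior of the horizontal resistor circuit, treats R and G as plain numbers rather than bias voltages, and uses hypothetical function names.

```python
import math


def smooth_resistive_grid(v_in, r, g):
    """Node voltages of an ideal 1-D resistive grid (needs len(v_in) >= 2).

    Node i obeys: g*(v_in[i] - v[i]) + sum over neighbors (v[nbr] - v[i])/r = 0.
    Solved with the Thomas algorithm for tridiagonal systems.
    """
    n = len(v_in)
    off = -1.0 / r                          # coupling to each neighbor
    diag = [g + 2.0 / r] * n
    diag[0] = diag[-1] = g + 1.0 / r        # end nodes have one neighbor
    rhs = [g * v for v in v_in]
    for i in range(1, n):                   # forward elimination
        factor = off / diag[i - 1]
        diag[i] -= factor * off
        rhs[i] -= factor * rhs[i - 1]
    out = [0.0] * n
    out[-1] = rhs[-1] / diag[-1]
    for i in range(n - 2, -1, -1):          # back substitution
        out[i] = (rhs[i] - off * out[i + 1]) / diag[i]
    return out


def diffusion_length(r, g):
    """Length scale in pixels, inversely related to sqrt(R*G)."""
    return 1.0 / math.sqrt(r * g)


impulse = [0.0] * 41
impulse[20] = 1.0
wide = smooth_resistive_grid(impulse, r=1.0, g=0.25)    # RG = 0.25
narrow = smooth_resistive_grid(impulse, r=2.0, g=2.0)   # RG = 4
```

The impulse response decays geometrically away from the driven node, which is the discrete counterpart of an exponential filter. Raising both R and G shortens the diffusion length, so the `narrow` response stays concentrated on the driven pixel, while a large RG product in every setting leaves little room for the smoothing to change.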
Input data limits can also be adjusted. Reasonable values 
range between 0.75 and 4.25 V. Most tests, however, were 
carried out using a smaller range 1.5 to 3.5 V with the thought 
of accommodating the silicon photoreceptor [15]. 
For testing, the chip was connected to a custom board with 
a PC interface. The board converts digital input from the PC 
to analog input to the chip. Similarly, it also converts the 
analog chip output to digital representation for storage. Pixel 
values from the image pairs were presented to the chip a 
window instance at a time. A 64 x 64 image creates 2944 
such instances. 
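The instance count can be checked with a short calculation. The sketch below assumes that the 19-pixel right-image window slides along one image dimension and that only positions where the window lies fully inside the image are used; under that assumption a 64 × 64 image yields 64 × 46 = 2944 instances, which agrees with the figure quoted above. The function name and parameters are illustrative.

```python
def window_instances(lines, pixels_per_line, window_width=19):
    """Count window instances when a 1-D window slides along each line.

    Assumes only fully contained window positions are presented to the chip.
    Returns (lines, positions per line, total instances).
    """
    positions = pixels_per_line - window_width + 1
    if positions < 1:
        raise ValueError("line is shorter than the window")
    return lines, positions, lines * positions


print(window_instances(64, 64))  # (64, 46, 2944)
```

The same trimming of 18 pixels accounts for disparity maps that are 46 pixels wide in the scan direction.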
To provide a meaningful comparison between simulation 
and chip results, we used the exact same images that we simulated. 
We made both qualitative and quantitative comparisons 
between the two in the sections that follow. 
Fig. 17. RDS results with different diffusion lengths. G is fixed 
at 1.2 V while R values from left to right are 0.5 V, 0.75 V, and 1.0 V. The 
raised surface in the center (shown in lighter gray) is readily visible in all 
three tests. 
A. Random Dot Stereograms (RDS's) 
Various experiments were conducted to evaluate the per- 
formance of the hardware with random dot stereograms. The 
test results confirm expectations and compare favorably with 
simulation results of Section II. We used an RDS size of 70 × 
70. All regions are at either zero or a single constant negative 
disparity. Holding the G value constant at 1.2 V, we obtained 
results for three different R values. No significant change 
is noted because the RG value is quite high in all settings. 
We also compared the chip output to the correct disparities. 
Average error $\eta_e$ (V) is around 0.25 V for all three chip 
outputs. The variance of the error $\sigma_e^2$ (V) is also around 0.25 V. 
The average error can be viewed as a consistent offset that is 
an artifact of the circuit size mismatches that occur commonly 
in analog hardware. The error variance is more representative 
of the noise factor. 
Fig. 18. Adjusting the target density of an RDS, decreasing density from left to right. Simulation results from the same experiment were presented in Fig. 6. As before, the raised surface in the center (shown in lighter gray) becomes less visible as target density decreases. 
Fig. 19. Synthesized image pair results. The leftmost image is a simulation result. It depicts the disparity map from a scaled version of the synthesized pair. To its right are disparity outputs from two chips. In all three, darker pixels correspond to surfaces further from the viewer. 
Fig. 20. The rock image pair disparity output. The leftmost image is the original (left) photograph. To its right is the disparity map from simulation. The two rightmost images are the raw disparity outputs from two different chips. 
Our simulation results already demonstrated that decreasing 
the target density in an RDS causes the matching problem to 
become more ambiguous. We also ran hardware experiments 
to show the performance of the chip with target densities from 
50 to 5% (Fig. 18). As with the simulation results, chip 
performance clearly declines as target density decreases. 
Table I shows the results of error analysis. Both 
the average error and its variance increase as target density 
decreases. 
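The two error figures used in this analysis can be computed as follows. This is a minimal sketch under assumptions the text does not spell out: the error at each pixel is taken as chip output minus correct disparity, and the variance is the population variance of those errors. The function and variable names are illustrative, and the sample values are made up for the example rather than taken from the chip.

```python
from statistics import mean, pvariance


def error_statistics(chip_output, true_disparity):
    """Return (average error, error variance) over paired samples.

    Sign convention (chip minus true) and population variance are assumptions.
    """
    if len(chip_output) != len(true_disparity):
        raise ValueError("inputs must have equal length")
    errors = [c - t for c, t in zip(chip_output, true_disparity)]
    return mean(errors), pvariance(errors)


# Illustrative values only: a constant -0.25 offset and no noise.
truth = [1.25, 1.25, 2.25, 2.25]
chip = [1.0, 1.0, 2.0, 2.0]
avg_error, var_error = error_statistics(chip, truth)
```

A constant offset shows up only in the average error, while pixel-to-pixel scatter shows up only in the variance, which is why the two quantities are read as mismatch and noise respectively.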
B. The Synthesized Image Pair 
Simulation results with this image pair were presented in 
Section II. To accommodate hardware disparity limits, image 
dimensions were reduced from their original size of 128 × 
128 pixels to 64 × 64 pixels. Hardware disparity map outputs 
are of dimensions 64 × 46 because of the trimming effect 
in data input. Fig. 19 shows the disparity maps. The leftmost 
map, which is the simulation output from the scaled image, 
is included for reference. This simulation output is not as 
good as the one we reported previously (Fig. 9): both the 
reduced resolution and the reduction process itself combine to 
degrade the results. The two disparity maps on 
the right were obtained from two different chips. Error analysis 
indicates that the average error, $\eta_e$, is between 0.10 and 0.25 V. The 
variance of the error, $\sigma_e^2$, is between 0.2 and 0.4 V. 
C. The Rocks Image Pair 
We already reported simulation results from this image 
pair (Fig. 10). For the hardware test, the image dimensions were 
reduced to 60 × 64 pixels to accommodate the disparity 
limits of the hardware implementation. The resulting disparity 
map dimensions are 60 × 46 pixels. Figs. 20-22 show chip 
outputs. The two rightmost images in Fig. 20 show the raw 
disparity outputs from two chips. The two leftmost images 
are the actual image and the disparity results from simulation. 
Fig. 21 shows confidence values. Pixels shown in white are 
the high-confidence pixels. Note that areas with flat intensity 
are marked in black (i.e., low confidence). The original image 
is included for comparison. Fig. 22 shows processed disparity 
values in comparison with simulation results (leftmost image). 
Disparity is “interpolated” using only high-confidence disparity 
values already computed. 
TABLE I 
Target density (%)   Average error (V)   Error variance (V) 
50                   -0.25               0.25 
(illegible)          -0.34               0.38 
(illegible)          -0.38               0.35 
(illegible)          -0.45               0.38 
(remaining rows illegible) 
Fig. 21. The rock image pair confidence output. The leftmost image is the 
original (left) photograph. The two rightmost images are the confidence values 
obtained from two different chips. Black pixels signal low confidence. 
Fig. 22. The rock image pair processed disparities. The leftmost image is 
the disparity output from simulation. The 
two rightmost images are the processed disparity values from two different 
chips. Results shown have been obtained by using a simple interpolation 
scheme, using only high-confidence points. Darker pixels signal surfaces 
further from the viewer. 
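The interpolation scheme used for the processed disparities is described only as simple. As a hedged illustration of what such a step could look like, the sketch below fills low-confidence pixels in one row by linear interpolation between the nearest high-confidence neighbors and holds the nearest confident value at the row ends. This is an assumed stand-in, not the scheme used to produce Fig. 22; the function name and the boolean `confident` mask are hypothetical.

```python
def interpolate_row(disparity, confident):
    """Replace low-confidence disparities in one row by linear interpolation.

    disparity: raw disparity values for the row.
    confident: booleans, True where the value is trusted (assumed input).
    """
    if len(disparity) != len(confident):
        raise ValueError("inputs must have equal length")
    known = [i for i, ok in enumerate(confident) if ok]
    if not known:
        raise ValueError("row has no high-confidence points")

    out = list(disparity)
    # Outside the confident span, hold the nearest confident value.
    for i in range(known[0]):
        out[i] = disparity[known[0]]
    for i in range(known[-1] + 1, len(out)):
        out[i] = disparity[known[-1]]
    # Between consecutive confident points, interpolate linearly.
    for left, right in zip(known, known[1:]):
        for i in range(left + 1, right):
            t = (i - left) / (right - left)
            out[i] = (1 - t) * disparity[left] + t * disparity[right]
    return out


row = interpolate_row([1.0, 9.0, 9.0, 4.0, 9.0],
                      [True, False, False, True, False])
```

In this example the untrusted values of 9.0 are discarded and the row becomes a ramp from 1.0 to 4.0, with the last pixel held at 4.0.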
V. CONCLUSION 
We have described a stereo correspondence algorithm, 
its hardware implementation, and results obtained 
from simulation and hardware tests. All of these collectively 
show that the system is functional, expandable to solve real-world 
problems in real time, and implementable with existing 
technology. The system possesses many favorable features. 
Among them, the most prominent are simplicity, versatility, 
accuracy, and economy, both in cost and power use. 
Furthermore, the system described in this paper can be a 
precursor for a whole series of applications. We mention two 
specific areas of future work that will augment the capabilities 
of our stereo correspondence system significantly. First, for 
applications that require higher accuracy, the chip can be made 
part of a larger network of circuits that use its disparity output 
as a rough estimate or starting point. Iterative schemes that 
draw from a series of disparity maps obtained by slightly 
perturbing the camera positions can be utilized to obtain a far 
more accurate disparity map of the scene [18]. Second, a larger 
version of the chip using 2-D image patches and 2-D search 
areas can be built. Simulation results obtained using a 2-D 
matching region and a 2-D match search area were presented 
in Section II. These showed a significant improvement over 
the 1-D results. Including the second dimension in hardware 
does not require any extensive design change to the existing 
architecture, merely an increased number of already described 
computation units, more VLSI area, and more I/O pins. The I/O 
pin count limitation could be overcome by devising intelligent 
pixel scanning schemes. 
REFERENCES 
[1] I. N. Parker, “VLSI architecture,” in VLSI Image Processing, R. J. Offen, 
Ed. New York: McGraw-Hill, 1985. 
[2] J. A. Webb and T. Kanade, “Vision on a systolic array machine,” in 
Evaluation of Microcomputers for Image Processing, L. Uhr, K. Preston, 
S. Levialdi, and M. J. B. Duff, Eds. Orlando, FL: Academic, 1986. 
[3] J. M. Hakkarainen, J. J. Little, H. S. Lee, and J. L. Wyatt, “Interaction of 
algorithm and implementation for analog VLSI stereo vision,” in Proc. 
SPIE, Visual Inform. Processing: From Neurons to Chips, vol. 1473, pp. 
173-184, 1991. 
[4] A. K. Chhabra and T. A. Grogan, “Depth from stereo: Variational 
theory and a hybrid analog-digital network,” in Proc. SPIE, Image 
Understanding Man-Machine Interface II, vol. 1076, pp. 131-138, 1989. 
[5] A. Gruss, L. R. Carley, and T. Kanade, “Integrated sensor and range-finding 
analog signal processor,” IEEE J. Solid-State Circuits, vol. 26, 
no. 3, pp. 186-191, 1991. 
[6] M. Sivilotti, M. Mahowald, and C. Mead, “Real-time visual computations 
using analog CMOS processing arrays,” in Proc. Stanford Conf. 
VLSI, 1987. 
[7] C. Mead, “Adaptive retina,” in Analog VLSI Implementation of Neural 
Systems, C. Mead and M. Ismail, Eds. Boston, MA: Kluwer, 1989. 
[8] J. Hutchinson, C. Koch, J. Luo, and C. Mead, “Computing motion using 
analog and binary resistive networks,” IEEE Comput., vol. C-21, pp. 
53-63, 1988. 
[9] A. Moore, J. Allman, and R. Goodman, “A real-time neural system for 
color constancy,” IEEE Trans. Neural Syst., vol. 2, pp. 237-247, 1991. 
[10] M. Mahowald and T. Delbrück, “Cooperative stereo matching using 
static and dynamic image features,” in Analog VLSI Implementation of 
Neural Systems, C. Mead and M. Ismail, Eds. Boston, MA: Kluwer, 
1989. 
[11] M. Mahowald, “VLSI analogs of neuronal visual processing,” Ph.D. 
dissertation, California Inst. Technol., Pasadena, 1992. 
[12] G. Erten, “Analog VLSI architecture for stereo correspondence,” Ph.D. 
dissertation, California Inst. Technol., Pasadena, 1993. 
[13] D. G. Jones and J. Malik, “A computational framework for determining 
stereo correspondence from a set of linear spatial filters,” in Proc. 
European Vision Conf., 1992. 
[14] R. Szeliski, Bayesian Modeling of Uncertainty in Low-Level Vision. 
Boston, MA: Kluwer, 1989. 
[15] C. Mead, Analog VLSI and Neural Systems. Reading, MA: Addison-Wesley, 
1989. 
[16] T. Delbrück, “Bump circuits for computing similarity and dissimilarity 
of analog voltages,” California Inst. Technol., Comp. Neural Sci. Memo 
10, 1991. 
[17] J. P. Lazzaro, S. Ryckebusch, M. A. Mahowald, and C. A. Mead, 
“Winner-take-all networks of O(N) complexity,” Caltech Comput. Sci. 
Dep., Tech. Rep. Caltech-CS-TR-21-88, 1989. 
[18] L. Matthies, T. Kanade, and R. Szeliski, “Kalman-filter based algorithms 
for estimating depth from image sequences,” Int. J. Comput. Vision, vol. 
3, pp. 209-236, 1989. 
Gamze Erten (S’91-M’95) received the B.S. de- 
gree in electrical engineering from Stanford Uni- 
versity, CA, in 1985. After technical management 
training for an additional year, she joined the grad- 
uate program at California Institute of Technology, 
Pasadena, where she received the M.S. and Ph.D. 
degrees in electrical engineering in 1991 and 1993, 
respectively. 
She worked on microprocessor and conventional 
computer architectures as Digital VLSI Design En- 
gineer at the Engineering and Manufacturing facil- 
ities of NCR AT&T in San Diego, CA, between 1985 and 1989. She has 
consulted for General Motors, Warren, MI, and has taught workshops on neural 
networks, fuzzy logic, and real-time intelligent computing. Since 1993, she 
has headed IC Tech, a small research and development firm in Okemos, MI. 
The company’s mission is technology development and transfer in the areas 
of intelligent sensing and computing systems, including vision, speech, signal 
processing, and process control. She is currently the Principal Investigator of 
several projects in these areas. 
Dr. Erten serves on the Executive Committee and the Industrial Electronics 
Chapter of the IEEE Southeastern Michigan Section. She is the Chair 
of the Real-Time Process Control of the IEEE Control Society Technical 
Committee on Real-Time Computing and Signal Processing, and is presently 
Co-chair of the Robotics and Machine Vision session of the 1996 IEEE 
International Conference on Neural Networks (ICNN). She has been a 
reviewer of several Funding Agencies and several IEEE TRANSACTIONS 
including IEEE TRANSACTIONS ON NEURAL NETWORKS. 
Rodney M. Goodman (M’85) was born in London, 
England, on February 22, 1947. He received the 
B.Sc. degree in electrical engineering from Leeds 
University, Yorkshire, U.K., in 1968, and the Ph.D. 
degree in electronics at the University of Kent at 
Canterbury, U.K., in 1975. 
From 1975 to 1985 he was on the faculty of 
the University of Hull, U.K. In 1985 he joined 
the faculty of the California Institute of Technology 
where he is now Professor of Electrical Engineering, 
Computation, and Neural Systems. He is the Director 
of the National Science Foundation’s Center for Neuromorphic Systems 
Engineering at Caltech. He is the Founder of three advanced technology 
research and development companies in both the U.S. and the U.K. He is 
currently a Consultant for the Jet Propulsion Laboratory, and for Pacific 
Bell. His research interests include communications, information theory, 
neural networks, and expert systems, from both a theoretical and a VLSI 
implementation viewpoint. His research has also included error control coding 
for VLSI memories, and neural network VLSI implementations including 
neural associative memories with large capacity. He has also developed new 
expert system technologies that have been successfully transferred to industry. 
These include a new class of rule-based neural networks which feature explicit 
knowledge in the form of human understandable rules. He has published over 
150 technical papers in his areas of expertise. 
Dr. Goodman is a Chartered Electrical Engineer of the IEE in the U.K. He 
is a reviewer for IEEE TRANSACTIONS ON COMPUTERS, IEEE TRANSACTIONS 
ON INFORMATION THEORY, and IEEE TRANSACTIONS ON NEURAL NETWORKS, 
and has been actively involved in the organizing committees of many recent 
neural networks meetings including NIPS, IJCNN, and Snowbird. 
