Approaches to the implementation of binary relation inference network. by Tong, C. W. & Chinese University of Hong Kong Graduate School. Division of Systems Engineering and Engineering Management.
APPROACHES TO THE IMPLEMENTATION OF
BINARY RELATION INFERENCE NETWORK
By
C. W. TONG
A Thesis Submitted in Partial Fulfillment of 
the Requirements for the Degree of 
MASTER OF PHILOSOPHY 
in 
the Department of Systems Engineering 
The Chinese University of Hong Kong 
Approved as to style and content by: 
June 1994 

































In high performance computing, digital parallel processing machines have been
very successful, achieving great speedup in numerically intensive scientific and business
applications. However, in specific real-time applications like pattern recognition, in
which the data to be analyzed are noisy and distorted, the parallel machines have not
been able to provide satisfactory performance. There is active research into alternative
computing structures that solve these problems efficiently, and neural networks
have been found to be a promising architecture. Taking advantage of the compact
realization of analog circuits and the parallelism in neural networks, there has been
interest in applying analog-circuit-based neural networks to optimization problems.
However, the learning capability of neural networks requires programmability to
be built into these networks, and this complicates the circuitry of the analog
computing machine.
The binary relation inference network, a connectionist network, provides a platform
for continuous-time domain parallel processing in solving optimization problems. The
inference network is formed from the interconnection of self-contained computational
elements, in a form of parallel and distributed computation. The structure and operations
of the inference network are defined by the problem to be solved. In this
thesis, several inference network designs have been devised to solve path problems in
directed and undirected N-node graphs. The path problems solved are the shortest path
problem, the transitive closure problem and the minimum spanning tree problem. In
addition to using standard analog integrated circuit components, VLSI function blocks,
including a magnitude preserving maximizer and a magnitude preserving minimizer,
have been designed to build the required inference networks. The performance of the blocks
and the networks has been studied through analysis, simulations and prototypes. Assuming
a synchronous operating mode, the worst case solution time of the inference
network is shown to be $O(\log_2 N)$ [85]. However, a practical size independence is
promised by the parallel operations in the computational elements in an analog operating
inference network. Issues in building large networks from small networks are also
addressed.
The binary relation inference network has been found to be well matched to the 
closed semiring algebraic structure. Sites and units in the inference network correspond 
to the extension and summary operators in the closed semirings, respectively. With this 
correspondence and a dynamic programming algorithm, the closed semiring provides 
a systematic way of solving problems on the binary relation inference network. The
minimum spanning tree solution is obtained from a closed semiring representation of 
the problem. 
In dealing with problems that cannot be solved directly on the inference network, 
it is possible to have the inference network act as an embedded engine to solve
the involved subproblems. In particular, a framework has been devised for embedded 
application of the analog operating inference network for the "inverse shortest paths 
problem". The overhead in controlling the inference network with a host computer 
should be reduced by this framework, without which the potential speedup that can be 
obtained would be limited. 
ACKNOWLEDGMENTS 
I am greatly indebted to my supervisor, Dr. K. P. Lam, for the guidance provided.
This thesis would not have been possible without his insights and direction.
I would also like to thank the staff of the Department of Systems Engineering, The
Chinese University of Hong Kong, for providing a nice environment to work in. Financial
support from the Research Grants Council for the necessary VLSI chip fabrication and
for my conference attendance in Orlando, under Grant No. CUHK58/93E, is gratefully
acknowledged.
Contents
1 Introduction
1.1 The Availability of Parallel Processing Machines
1.1.1 Neural Networks
1.2 Parallel Processing in the Continuous-Time Domain
1.3 Binary Relation Inference Network
2 Binary Relation Inference Network
2.1 Binary Relation Inference Network
2.1.1 Network Structure
2.2 Shortest Path Problem
2.2.1 Problem Statement
2.2.2 A Binary Relation Inference Network Solution
3 A Binary Relation Inference Network Prototype
3.1 The Prototype
3.1.1 The Network
3.1.2 Computational Element
3.1.3 Network Response Time
3.2 Improving Response
3.2.1 Removing Feedback
3.2.2 Selecting Minimum with Diodes
3.3 Speeding Up the Network Response
3.4 Conclusion
4 VLSI Building Blocks
4.1 The Site
4.2 The Unit
4.2.1 A Minimum Finding Circuit
4.2.2 A Tri-state Comparator
4.3 The Computational Element
4.3.1 Network Performances
4.4 Discussion
5 A VLSI Chip
5.1 Spatial Configuration
5.2 Layout
5.2.1 Computational Elements
5.2.2 The Network
5.2.3 I/O Requirements
5.2.4 Optional Modules
5.3 A Scalable Design
6 The Inverse Shortest Paths Problem
6.1 Problem Statement
6.2 The Embedded Approach
6.2.1 The Formulation
6.2.2 The Algorithm
6.3 Implementation Results
6.4 Other Implementations
6.4.1 Sequential Machine
6.4.2 Parallel Machine
6.5 Discussion
7 Closed Semiring Optimization Circuits
7.1 Transitive Closure Problem
7.1.1 Problem Statement
7.1.2 Inference Network Solutions
7.2 Closed Semirings
7.3 Closed Semirings and the Binary Relation Inference Network
7.3.1 Minimum Spanning Tree
7.3.2 VLSI Implementation
7.4 Conclusion
8 Conclusions
8.1 Summary of Achievements
8.2 Future Work
8.2.1 VLSI Fabrication
8.2.2 Network Robustness
8.2.3 Inference Network Applications
8.2.4 Architecture for the Bellman-Ford Algorithm
Bibliography
Appendices
A Detailed Schematic
A.1 Schematic of the Inference Network Structures
A.1.1 Unit with Self-Feedback
A.1.2 Unit with Self-Feedback Removed
A.1.3 Unit with a Compact Minimizer
A.1.4 Network Modules
A.2 Inference Network Interface Circuits
B Circuit Simulation and Layout Tools
B.1 Circuit Simulation
B.2 VLSI Circuit Design
B.3 VLSI Circuit Layout
C The Conjugate-Gradient Descent Algorithm
D Shortest Path Problem on MasPar
List of Tables
3.1 A sample data set of a shortest path problem
3.2 Number of op-amps and comparators required in each CE
6.1 Run-time of a 19-cities problem
6.2 Run-time of a 19-cities problem on a parallel machine
List of Figures
1.1 Analog computer for solving the Van der Pol equation
1.2 Solving simultaneous differential equations with an analog computer
2.1 Interconnection in a general binary relation inference network
2.2 A simple binary relation inference network
3.1 The bus configuration in the prototype
3.2 A typical non-inverting adder
3.3 Unit function in a CE for 5-node graph inference network
3.4 Feedback with no hold circuit
3.5 Adding a hold circuit
3.6 Network response
3.7 Removing feedback in the unit circuit
3.8 Network response with modified units
3.9 Units using diodes to find minimum
3.10 Finding the active site in a CE
3.11 Limiting output level of inactive sites
3.12 Improved response when limiting output level of inactive sites
3.13 Network response with reduced operating ranges
3.14 Simulated network response with OP-27 op-amps used
4.1 An operational transconductance amplifier
4.2 An OTA based adder
4.3 Simulated adder response
4.4 A WTA-based magnitude preserving minimizer
4.5 A minimum selecting circuit
4.6 Simulation result of the minimum selecting circuit
4.7 A tri-state comparator
4.8 Simulated transfer curve, $V_{ref}$ = 2 V
4.9 Typical unit in the network
4.10 Chip layout
5.1 a. Computational element placed on a disk b. CEs placed in a torus
5.2 The layout of a module
5.3 The floorplan of a CE
5.4 Layout of a CE in a 6-object network
5.5 Cascading inference network chips into a larger network
6.1 Embedded approach for the inverse shortest paths problem
6.2 A system for solving shortest path problems
7.1 A map for the transitive closure problem
7.2 Measured network response in finding transitive closure
7.3 A computational element for the transitive closure problem
7.4 A two-input maximizer with saturation guard
7.5 A minimizer
7.6 A minimum spanning tree in a 4-node graph
7.7 Simulated network response
7.8 A simple 2-input maximizer
7.9 An erratic network response
7.10 Network response with fast op-amps in the maximizer
7.11 Maximizer with OTA
8.1 Photograph of the fabricated chip
A.1 Schematic of unit with self feedback
A.2 Schematic of unit with self feedback removed
A.3 Schematic of a CE with diodes used in maximizer
A.4 Schematic showing connection of CE modules
A.5 A computational element assembled in a module
A.6 Schematic of backplane - sheet 1 of 2
A.7 Schematic of backplane - sheet 2 of 2
B.1 A session with the Chipmunk digital simulator
B.2 Module hierarchy in the layout of the chip
B.3 A MAGIC session showing the stacking of site modules into a CE
D.1 The MasPar at initialization
Chapter 1 
Introduction 
Parallel processing has been considered as an alternative to computation on a single
processor, or sequential processing, for low cost, high performance computing [94, 79],
and sometimes it may be the only alternative. Scientific problems in vision, climate
model simulation, ocean circulation, fluid turbulence, etc., require computing speed
in the range of teraflops. Business applications like transaction processing in airline
reservation systems require processing rates in the range of 100,000 TPS (transactions
per second). These performance levels are only available from parallel processing
machines. For applications that require processing rates in the range of 10,000 TPS,
though a single processor machine may be sufficient, it is usually more cost effective
to use parallel processing machines. The proliferation of
various commercial offerings in parallel computers [95], which were previously limited to
machines in research laboratories, indicates the trend towards the widespread employment
of parallel processing machines. In the early days, the lack of software support and
the difficulty of programming parallel processing machines hampered the
development of the machines. People instead pursued high performance with
single processor machines, but soon found themselves pushing hardware technology to the
limits of chip building [35]. Even in single processor system designs, the application of
parallel processing is still a necessity. This is evident from the pipelined designs and
the recent employment of superscalar and superpipelined designs [20] in high performance
microprocessors.
One of the factors contributing to the proliferation of parallel computers is the
advance in VLSI technology. A parallel computer needs many processing elements
operating at the same time. The ease of implementing processing elements and memory
in VLSI allows parallel computers to be built on a more manageable scale despite the
complexity involved.
1.1 The Availability of Parallel Processing Machines 
Research and development in parallel computers can be dated back to the early 1970s
with the first vector computer, the Cray 1. Arithmetic operations in a vector computer
are divided into stages; for example, a floating point addition could be divided into
stages for comparing the exponents, shifting the operands and performing the addition.
If two floating point additions are required, successive additions can be overlapped to
increase the throughput. The parallelism exhibited in these operations is a form of low
level parallelism. Other possible forms of low level parallelism include overlapped
instruction execution, pipelining in arithmetic and logic units, etc.
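The throughput gain from overlapping successive additions can be illustrated with a small cycle count. This is only a sketch under the assumption that every stage takes exactly one clock cycle; the function names and the three-stage split are illustrative and not taken from any particular machine.

```python
def cycles_sequential(num_additions: int, num_stages: int) -> int:
    """Cycles needed when each addition finishes all stages before the next starts."""
    return num_additions * num_stages


def cycles_pipelined(num_additions: int, num_stages: int) -> int:
    """Cycles needed when a new addition enters the first stage every cycle."""
    if num_additions == 0:
        return 0
    # The first result appears after num_stages cycles (assumed one cycle per stage);
    # every further result follows one cycle later.
    return num_stages + (num_additions - 1)


if __name__ == "__main__":
    stages = 3  # compare exponents, shift operands, add (hypothetical split)
    for n in (1, 2, 100):
        print(n, cycles_sequential(n, stages), cycles_pipelined(n, stages))
```

Under these assumptions, two additions take 4 cycles instead of 6, and for long vectors the cost per addition approaches one cycle.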
The first truly parallel computers came with the pioneering efforts in the C.mmp [102]
and the Illiac IV [10]. There are many differences in the design of these two early parallel
computers, and a widely accepted classification of parallel computers is the scheme
described by Flynn [28]. Under the scheme, there are two major classes of parallel
computers. The first is the multiple instruction-stream multiple data-stream (MIMD)
machines. Each processor executes its own set of instructions on different pieces of data.
There is no global synchronization mechanism, and synchronization among the processors
is based on asynchronous messages sent among the processors. The other class is
the single instruction-stream multiple data-stream (SIMD) machines. Synchronization
in SIMD machines exists implicitly as each processor executes the same instruction at
the same time, but on a different piece of data. Each processor can execute or ignore an
instruction based on some data dependent flags. There is no clear evidence on which
system is better. MIMD machines are most suitable for problems that can be divided
into multiple heterogeneous tasks. Though a MIMD machine can emulate a SIMD
machine, the lack of implicit synchronization requires additional programming for the
synchronization control. This can lead to severe performance penalties. On the other
hand, owing to the structure of the SIMD machines, they are very good at some applications,
such as image processing algorithms that involve the same operations on each
pixel of the image. An example is the calculation of the weighted average of a pixel's value
and its neighboring pixels. In this case, each pixel could be mapped to a processor with
the operations carried out on all pixels at the same time.
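The pixel averaging example can be sketched as follows. The code is a sequential stand-in: the body of the two loops is what each processor of a SIMD machine would execute in lockstep on its own pixel. The 4-connected neighbourhood, the `center_weight` parameter and the edge handling are assumptions made for illustration only.

```python
def weighted_average(image, center_weight=0.5):
    """Return a new image where each pixel is a weighted average of itself
    and the mean of its 4-connected neighbours (edges use the neighbours that exist)."""
    rows, cols = len(image), len(image[0])
    result = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):        # on a SIMD machine these loops disappear:
        for c in range(cols):    # one processor per pixel runs the same body
            neighbours = [
                image[nr][nc]
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                if 0 <= nr < rows and 0 <= nc < cols
            ]
            if not neighbours:   # a 1x1 image has no neighbours
                result[r][c] = float(image[r][c])
                continue
            neighbour_mean = sum(neighbours) / len(neighbours)
            result[r][c] = (center_weight * image[r][c]
                            + (1.0 - center_weight) * neighbour_mean)
    return result


if __name__ == "__main__":
    spot = [[0, 0, 0], [0, 8, 0], [0, 0, 0]]
    for row in weighted_average(spot):
        print(row)
```

Because every output pixel depends only on the input image, the per-pixel computations are independent, which is what makes the one-processor-per-pixel mapping possible.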
Though the first generation machines were not successful in attaining supercomputer
performance, they laid the groundwork for many of the later general purpose
processors. The second generation of parallel computers, which is also the first generation
of commercial parallel computers, includes machines with the number of processors
ranging from 8 to 65536. These include machines from the Thinking Machines Corporation
[34], BBN [6], Intel [71], Sequent [80], etc. However, they still did not provide
supercomputer performance. Further, the difficulty in programming the parallelism
and the lack of substantial I/O caused many of these machines to be short-lived.
The third generation of parallel computers includes the Sequent 2000 series [81], the
MasPar MP-1 [22, 8], etc. Gaining experience from the previous generations, these
systems focus on the design of the complete system, balancing the system to match
the processor power. They are achieving mainframe or supercomputer levels of performance.
The lack of applications was once a big hurdle in the acceptance of
parallel computers. In these latest machines, this is overcome to a certain extent by the
reuse of existing codes from sequential computers. In the latest development of parallel
machines, the CM-5 [88] and the Paragon [38] promise processing power in the range
of teraflops.
It is evident from this brief overview of the commercial availability of parallel computers
that parallel processing has become a promising and well-received approach to high
performance computing.
One of the differences between parallel processing and sequential processing is the 
need for coordination among the processors in parallel processing. Since the multiple
processors in a parallel machine are working on the same problem, the processors need 
to pass data among themselves. At some point of execution, they need to know the 
states of other processors in order to proceed. There are two aspects in this coordination 
process: communication and synchronization. The following will describe another form 
of parallel processing that has a different approach in achieving coordination among 
the many processors. 
1.1.1 Neural Networks 
Although parallel computers are good at computation intensive problems, they
cannot provide satisfactory performance in applications like pattern recognition, visual
data processing, etc. There has been active research into alternative computational
structures for dealing with these problems.
Though parallel processing is not the main motivation of neural network studies [74],
neural networks provide a computational architecture with inherent parallelism.
Inspired by studies in the neurosciences, the structure of a neural network resembles
that of an undirected graph with the nodes being the neurons and edges being the
synapses. Two popular neural network models are the feedback and feedforward
models. In both models, the neurons perform a simple threshold function. Each input
to a neuron has a weight associated with it. The neuron sums up the products of
each input and its corresponding weight, then compares the sum with a threshold value.
The output of the neuron depends on whether the sum is greater than, equal to, or smaller
than the threshold value. There are many variations in the functions that can be used
in comparing the sum, such as the sigmoidal or the hyperbolic tangent functions. The
weights associated with each input to a neuron are determined with a learning process.
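The neuron computation described above can be sketched in a few lines. The three-valued output of the hard threshold (+1, 0, -1) and the way the threshold is subtracted before the sigmoidal and hyperbolic tangent functions are assumptions chosen for illustration; the text does not fix these details.

```python
import math


def weighted_sum(inputs, weights):
    """Sum of the products of each input and its corresponding weight."""
    return sum(x * w for x, w in zip(inputs, weights))


def threshold_neuron(inputs, weights, threshold):
    """Hard threshold: +1, 0 or -1 when the sum is above, equal to or below the threshold.
    The three output values are an illustrative assumption."""
    s = weighted_sum(inputs, weights)
    if s > threshold:
        return 1
    if s == threshold:
        return 0
    return -1


def sigmoid_neuron(inputs, weights, threshold):
    """Smooth variant: sigmoidal function of (sum - threshold), output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-(weighted_sum(inputs, weights) - threshold)))


def tanh_neuron(inputs, weights, threshold):
    """Smooth variant: hyperbolic tangent of (sum - threshold), output in (-1, 1)."""
    return math.tanh(weighted_sum(inputs, weights) - threshold)


if __name__ == "__main__":
    w = [0.5, 0.5, 0.5]
    print(threshold_neuron([1.0, 0.0, 1.0], w, 0.5))
    print(sigmoid_neuron([1.0, 0.0, 1.0], w, 0.5))
    print(tanh_neuron([1.0, 0.0, 1.0], w, 0.5))
```

In a learning network the `weights` list would be adjusted by the learning process; here it is simply a fixed argument.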
Different from the previous parallel computers, each processing element in a neural 
network performs computations that are much simpler than the ones performed by a 
microprocessor in a parallel computer node. Yet the neural network may outperform 
a supercomputer in some tasks. An example is the processing of visual information, 
e.g., object recognition under noisy, distorted data. It is the simple function
required in each processing element that has prompted the many developments of VLSI
architectures for neural networks. However, neural network applications are not as
general as those found in general computers and are usually limited to certain particular 
domains. 
Due to the precision, scalability and flexibility offered by digital circuits, there are
many digital VLSI implementations of neural networks. These implementations can be
divided into those based on multipliers and accumulators and those that use
pulse density modulation. Implementations based on multipliers and accumulators have
a SIMD structure [60, 4, 99, 25], in which the processors are optimized for fast multiply
accumulate operations. The size of the processors depends on the desired resolution of
the computation, which also determines the number of processors that can be placed
on a chip. The number of processors built on a single chip ranges typically from 8
to 64. The performance of the network can also be increased by optimizing
the network structure. For a fully connected large network, a broadcast bus can
communicate the state of a processor to the other processors, thus reducing the I/O
bandwidth of the network. Systolic array structures are particularly suitable for sparse
networks [59]. Another type of digital network uses pulse density modulation [66, 65].
This approach reduces the connectivity requirement of the network because the output
of a neuron only needs to be carried over a single wire. Another advantage is the
capability of asynchronous operation, which allows a higher cascadability.
1.2 Parallel Processing in the Continuous-Time Domain 
The problem with digital implementations of neural networks is the relatively large
size of the resulting neurons. This limits the number of neurons that can be placed
on a chip. Though the cascadability of the chips allows a larger network to be assembled
from a number of chips, it still would not be practical for large networks, which
require interconnecting a large number of chips. As the required functions in the processors
are simple and can be implemented in a compact form
with analog circuits, there is much research on the analog implementation of neural
networks [52, 58, 31]. There had been early attempts at using analog circuits for
parallel processing. The special purpose electronic analog computers developed during
the Second World War [39] provided a framework for the construction of general
purpose analog computers. One of the most important developments during that time
was the operational amplifier. The high gain, high input impedance and low output
impedance characteristics of operational amplifiers allow analog computers with an
acceptable accuracy to be built. In the following years, there were many developments
in analog computation. General purpose analog computers with 20 to
1000 amplifiers were built. Though it was not the key objective, these analog computers
were early attempts at parallel processing in the continuous-time domain. With basic
components like operational amplifiers, potentiometers, capacitors, etc., various building
blocks such as integrators, multipliers and summers can be built. The building blocks
are then connected into a network to solve problems. The interconnection pattern is
determined by the particular problem to be solved. Figure 1.1 shows a configuration for
solving the Van der Pol equation.
$$\ddot{x} - \epsilon (1 - x^2) \dot{x} + x = 0$$
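An analog computer solves such an equation by chaining two integrators: one produces the velocity from the acceleration, the other produces the position from the velocity, while multipliers and summers form the acceleration from the two integrator outputs. The sketch below imitates that arrangement numerically, assuming the usual form of the equation, $\ddot{x} - \epsilon (1 - x^2) \dot{x} + x = 0$. The parameter name `eps`, the initial conditions, the step size and the use of a fourth-order Runge-Kutta step are all illustrative choices; an analog machine integrates continuously instead of in steps.

```python
def van_der_pol_rhs(x, v, eps):
    """Inputs to the two integrators: dx/dt = v and dv/dt = eps*(1 - x^2)*v - x."""
    return v, eps * (1.0 - x * x) * v - x


def simulate(eps=1.0, x0=0.1, v0=0.0, dt=0.01, t_end=50.0):
    """Integrate the oscillator with a classical Runge-Kutta step; returns (t, x, v) samples."""
    steps = int(round(t_end / dt))
    x, v = x0, v0
    trajectory = [(0.0, x, v)]
    for n in range(1, steps + 1):
        k1x, k1v = van_der_pol_rhs(x, v, eps)
        k2x, k2v = van_der_pol_rhs(x + 0.5 * dt * k1x, v + 0.5 * dt * k1v, eps)
        k3x, k3v = van_der_pol_rhs(x + 0.5 * dt * k2x, v + 0.5 * dt * k2v, eps)
        k4x, k4v = van_der_pol_rhs(x + dt * k3x, v + dt * k3v, eps)
        x += dt * (k1x + 2.0 * k2x + 2.0 * k3x + k4x) / 6.0
        v += dt * (k1v + 2.0 * k2v + 2.0 * k3v + k4v) / 6.0
        trajectory.append((n * dt, x, v))
    return trajectory


if __name__ == "__main__":
    samples = simulate()
    late_peak = max(abs(x) for t, x, v in samples if t >= 30.0)
    print("late-time peak amplitude:", late_peak)
```

Starting from a small displacement, the oscillation grows and settles onto a limit cycle, which is the behaviour an oscilloscope on the analog computer output would display.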
Function blocks can also be connected to solve different parts of the problem at the 
same time. Figure 1.2 shows an analog computer configuration for solving the following two
simultaneous equations.
Figure 1.1: Analog computer for solving the Van der Pol equation
：：,,,. i + 3 冗 + 二 
y + 2y + 9y-bx= 0 
Analog computers were once widely used in solving systems of differential equations, or
in simulating physical systems. However, because of the flexibility and accuracy provided
by the advances in digital computing, there has not been much continued development
in using analog computers for general purpose computing.
Figure 1.2: Solving simultaneous differential equations with an analog computer
With the recent increased interest in neural networks, there are active developments
in analog computation based on the neural network architecture [87,
97, 24, 42, 15]. The analog networks have been applied in specific real-time applications,
such as classification [75], optimization problems [105], image processing [43, 54],
motion detection [37], etc. As the processing elements, the neurons, in a neural network
perform only simple operations, they can be built in a compact form with analog
circuits. The implementations can be classified according to the connectivity of
the neurons. In a network with highly interconnected neurons [64, 9, 77], the neurons
are typically required to perform a multiply-accumulate operation on the inputs from
the links and a nonlinear transformation on the sum to give the neuron output. The
accumulation is done by current or charge summing, which is an easy and compact way
to sum a large number of inputs. An analog multiplier is also more compact than a
digital multiplier. In these highly interconnected neuron networks, a large chip area is
reserved for routing the interconnections. In contrast to these designs, the second class
of networks has neurons with a low connectivity. The neurons in these networks are
connected to a few nearest neighbors only. The pioneering circuits are those from [57],
where the designs are usually close imitations of biological circuits [50, 56].
Compared with digital networks, analog networks are usually more compact
and operate at a higher speed (digital operations are spread over many time steps).
However, there are shortcomings in an analog network. In an analog computing network,
the variables are represented by physical quantities like node voltages and currents
in the system, and the functions implemented in the processing elements are based on
the physics of the devices realized. The performance of the network is then limited
by the process parameter variations. These variations also limit the precision and dynamic
range that can be achieved. Another disadvantage is that, as the interconnection
is physically fixed, it is generally not easy to have a scalable network. Despite these
disadvantages, analog networks show great promise in real-time applications that
require moderate precision in computations.
1.3 Binary Relation Inference Network
In the analog neural network implementations, weights in the links have to be determined
with a learning process. This requires a weight storage mechanism to be
implemented for each neuron. The binary relation inference network, also a connectionist
network, provides a parallel processing architecture for optimization problems.
The weight of each link in an inference network does not need to be determined with a
learning process. The topology of the binary relation inference network, the operations
required in the computational elements and the weight of the links can be defined by the
problem to be solved. The network has been applied in solving shortest path problems,
transitive closure problems and assignment problems. The inference network can be
implemented under various schemes like synchronous or asynchronous discrete-time domain
operations and continuous-time domain operations. The simple operations involved in
each processing element and the regular connections allow implementations with simple
hardware. In the following, continuous-time domain implementations of the binary relation
inference network will be studied. Problems involved in the implementations are
discussed and solutions are suggested.
Chapter 2 introduces the architecture of the binary relation inference network and
its application in solving the shortest path problem.
Chapter 3 describes a hardware prototype for solving the shortest path problem and
discusses factors which limit the network performance.
Chapter 4 presents VLSI circuits for building an inference network, which are necessary
in building networks to solve large scale problems.
Chapter 5 discusses the chip and system aspects of building a large scale network.
Chapter 6 describes one of the applications that would benefit from the chip built,
namely, the inverse shortest paths problem.
Chapter 7 describes other applications of the binary relation inference network.
The shortest path inference network is also shown to be able to solve transitive closure
problems; alternative circuit implementations are also discussed. The concepts of closed
semiring are then introduced to provide a systematic and elegant representation for
problems solved on the inference network.
Chapter 8 is a conclusion on implementations of the binary relation inference network.
Appendix A gives details of the inference network prototype circuits and the necessary
interface circuits for the inference network to operate under the control of a host
computer, the PC.
Appendix B describes the CAD tools used in the simulation and layout of the
circuits.
Appendices C and D discuss the algorithms involved in the inference network applications
that are described in the previous chapters.
Chapter 2 
Binary Relation Inference Network
The binary relation inference network has been shown to be a powerful computing platform
for many constrained optimization problems related to graphs and network
flows [85, 48]. The inherent parallelism, coupled with the asynchronous operating ability,
allows the inference network to be implemented in the continuous-time domain,
with all processing elements operating at the same time. This promises a potential
speedup in solving problems. In the following, the architecture of the binary relation
inference network is reviewed and a particular application of the inference
network, solving the shortest path problem, is discussed.
2.1 Binary Relation Inference Network 
An N-ary relation, $R_N(x_1, x_2, \ldots, x_N)$, is a relation that is defined on N distinct objects
$x_1, x_2, \ldots, x_N$. Instead of a single relation, $R_N$ can also be expressed in terms of a set
of related binary relations $R_2(x_i, x_j)$, $i, j = 1, 2, \ldots, N$, together with a set of inference
rules operating on the binary relations. In this way, a complex N-ary relation is replaced
with a set of simple relations, binary relations that are defined on 2 objects. As an
example, consider a graph with N vertices. A connectivity relation $C_N(v_1, v_2, \ldots, v_N)$
is to be defined to describe how vertices in the graph are interconnected. Instead of a
single relation, the connectivity can also be expressed by
1. a family of relations $C_2(v_i, v_j)$ describing the connectivity between two vertices $v_i$
and $v_j$ in the graph and
2. the inference rule
$$C_2(v_i, v_j) = C_2(v_i, v_k) \mathbin{\&} C_2(v_k, v_j)$$
which indicates that $(v_i, v_j)$ is connected if $(v_i, v_k)$ and $(v_k, v_j)$ are connected
pairs.
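The connectivity example can be sketched as a small program that starts from the directly connected pairs and applies the inference rule until no new pair can be derived. The boolean matrix representation, the function name and the sequential sweep are assumptions made for illustration; in the inference network the rule would be evaluated by all elements at the same time.

```python
def infer_connectivity(adjacent):
    """Apply C2(vi, vj) = C2(vi, vk) & C2(vk, vj) repeatedly until nothing changes.

    adjacent[i][j] is True when vi is directly connected to vj (hypothetical input format).
    """
    n = len(adjacent)
    connected = [row[:] for row in adjacent]
    changed = True
    while changed:
        changed = False
        for i in range(n):
            for j in range(n):
                if connected[i][j]:
                    continue
                # The pair (vi, vj) is inferred if any intermediate vk links them.
                if any(connected[i][k] and connected[k][j] for k in range(n)):
                    connected[i][j] = True
                    changed = True
    return connected


if __name__ == "__main__":
    # Directed chain v0 -> v1 -> v2 -> v3
    chain = [[False] * 4 for _ in range(4)]
    for a in range(3):
        chain[a][a + 1] = True
    for row in infer_connectivity(chain):
        print(row)
```

For the directed chain in the usage example, the pair (v0, v3) is inferred through intermediate vertices, while (v3, v0) stays unconnected.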
Generally, there are a number of possible types of inference rules. Unary inference rules
operate on a single relation like
$$R_2(x_i, x_j) = -R_2(x_j, x_i)$$
Binary inference rules like
$$R_2(x_i, x_j) = R_2(x_i, x_k) + R_2(x_k, x_j)$$
operate on two relations, unifying the relations in the operation. For a general N-ary
relation, a binary relation inference network can be defined with a set of binary inference
rules operating on a set of binary relations. The network serves as an alternative in
defining the N-ary relation. For each binary relation $R_2(x_i, x_j)$ in the network there
would be at most N inferences in defining its value, as described by:
$$R_2(x_i, x_k),\ R_2(x_k, x_j) \rightarrow R_2(x_i, x_j) \qquad (2.1)$$
Conflicts among the inference results are resolved to determine the value of $R_2(x_i, x_j)$.
2.1.1 Network Structure 
A binary relation inference network is formed by the interconnection of self-contained
computational elements. Each computational element contains two kinds of functional
units: a number of sites and one unit. Computations are performed at the sites and the
unit of each computational element. Computation results are communicated among the
computational elements through links connecting the unit output to sites in other computational
elements. Each computational element, denoted by $(i,j)$, represents the binary relation
$R_2(x_i, x_j)$ defined between objects $x_i$ and $x_j$. Generally, there are $N^2$ elements in
a network defined for $N$ objects. Figure 2.1 shows the structure of a computational





Figure 2.1: Interconnection in a general binary relation inference network 
element and the interconnection in the network. In each element, there are N sites to 
carry out the inference operations as described in Eq. (2.1). The different sites provide
alternative ways of deriving the value of the corresponding relation. In addition to
the sites, there is a unit in each element responsible for the inference operation that
determines the value of the corresponding binary relation. The unit resolves the conflict
among the site outputs to determine the value of the unit.
For a general $N$-ary relation, there are $N^2$ computational elements in the binary
relation inference network defined and in each computational element, there are $2N$
outgoing links and $2N$ incoming links. However, a number of simplifications are possible.
In many problems, the self relations $R_2(x_i, x_i)$, $i = 1, \ldots, N$, are not defined or do not
have any effects on the system. The number of elements needed would then be given
by $N(N-1)$. If the network is defined for reflective relations, $R_2(x_i, x_j) = R_2(x_j, x_i)$, the
number of computational elements needed would further be halved. Figure 2.2 shows
a binary relation inference network that is defined for a reflective ternary relation. For
the simplest network possible, the self relations $R_2(x_i, x_i)$ are assumed to have no effects
on the system and are left out in the inference network.
Figure 2.2: A simple binary relation inference network 
The architecture of the binary relation inference network is particularly suitable for 
constrained optimization problems. The various sites provide a mechanism to determine 
the value of a relation in different ways while the unit performs the optimizing function 
in determining the value of the relation from the many site outputs. The inference 
network has been applied in solving shortest path problems, transitive closure problems, 
assignment problems, etc. 
There are many possibilities in implementing a binary relation inference network. 
Though a sequential computer implementation is possible, it does not exploit the benefit 
offered in an inference network, i.e., the concurrency provided by the breakdown of a 
single operation into a group of simple operations. The inherent parallelism in the 
network promises a potential speedup in a parallel implementation. 
In parallel implementations, the network can be implemented in either the contin-
uous-time domain or the discrete-time domain. Discrete-time domain implementations 
on general purpose parallel computers provide flexibility and a scalable design through 
programming changes. In continuous-time domain implementations, the lack of general
purpose continuous-time computing elements limits the flexibility of a network.
The operations in the sites and units are fixed at the time of construction
and the resources in a computational element have to be physically predefined according
to the network size, which prevents a larger network from being built by simply cascading
small networks. Despite these limitations, computations in a continuous-time implementation
are carried out directly by the circuit structures defined, which could result in a fast
computing engine.
2.2 Shortest Path Problem 
2.2.1 Problem Statement 
Consider a directed graph, $G = (V, A)$, of $n$ vertices and $m$ arcs, where $V = \{v_1, v_2, \ldots, v_n\}$
is the set of vertices and $A = \{a_1, a_2, \ldots, a_m\}$ is the set of arcs. Associated with each arc
$a_i$ is a non-negative cost $c_i$. An arc $a_i$ has a source vertex, $v_{s(i)}$, and a destination vertex,
$v_{d(i)}$. A path from vertex $v_{s(p)}$ to vertex $v_{d(p)}$ is the set of consecutive arcs connecting
together the two vertices. A path $p$ is thus given by
$$p = \{a_{p_1}, \ldots, a_{p_{l(p)}}\}$$
where $v_{s(p_1)} = v_{s(p)}$, $v_{d(p_{l(p)})} = v_{d(p)}$ and $v_{d(p_i)} = v_{s(p_{i+1})}$ for all $1 \le i < l(p)$, and $l(p)$ is the
number of arcs in the path. The cost of a path is the sum of costs associated with each
arc in the path. The shortest path problem is to find a path from the source vertex to
the destination vertex with the minimum cost. The all pairs shortest path problem is to
find, for each ordered pair of vertices, a path with the minimum cost.
The shortest path problem arises in many applications. Direct applications
include finding the shortest route to travel in a traffic network or a communications
network. Other applications include compaction in VLSI layout [53], verification
of constraint satisfaction of interface circuits in microprocessor designs [11], etc.
Being a well-studied problem, the shortest path problem has many well established
algorithms. Among sequential algorithms, Dijkstra's algorithm [23]
finds the shortest path between a given pair of vertices. The algorithms of Yen [104],
Spira [82], and Moffat [63] find all the shortest paths originating from a particular vertex.
Yang [103], Farley [26], Floyd [27], and Dantzig [17] give algorithms for the all pairs
shortest path problem. In parallel algorithms, there are the systolic algorithms by 
Rote [73] and Robert [72]. The solution times of these algorithms exhibit dependence 
on the size of the problem [85]. 
2.2.2 A Binary Relation Inference Network Solution 
The shortest path problem can be solved on a binary relation inference network with
sites and unit in each computational element (CE) properly defined. Each CE $(i,j)$
computes the shortest path length between $v_i$ and $v_j$. The number of CEs needed is
$n(n-1)$, as the diagonal CEs, which correspond to the relations $R_2(x_i, x_i)$, $i = 1, \ldots, n$, are not
needed. Each site $k$ in CE $(i,j)$ will compute the cost of the path from $v_i$ to $v_j$ passing
through $v_k$, which is the sum of the two incoming signals $(i,k)$ and $(k,j)$ at each site.
To have the output of the CE representing the minimum cost path, the unit selects the
minimum from the site outputs. For convergence reasons [47], the CE output has to be
considered in the unit function, as a form of feedback. Denoting the unit function in
CE $(i,j)$ by $g(i,j)$ and the site functions by $S^k_{ij}$, the operations required in the sites
and units are
$$S^k_{ij} = g(i,k) + g(k,j)$$
$$g(i,j) = \min_k \{ S^k_{ij},\ g(i,j) \} \qquad (2.2)$$
where $k = 1, 2, \ldots, n$, $k \neq i$, $k \neq j$.
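The site and unit operations of Eq. (2.2) can be tried out with a short discrete-time sketch in Python. This is an illustrative sequential emulation, not the continuous-time network itself: every CE is updated synchronously from the previous outputs, and the names `relax_once` and `solve`, the 0-based numbering and the 4-vertex cost matrix are assumptions made for the example.

```python
import math


def relax_once(g):
    """One synchronous update of every CE (i, j) following Eq. (2.2)."""
    n = len(g)
    new_g = [row[:] for row in g]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # diagonal CEs are left out of the network
            # site k adds the two incoming signals (i, k) and (k, j)
            sites = [g[i][k] + g[k][j] for k in range(n) if k not in (i, j)]
            # the unit takes the minimum of the site outputs and the fed-back CE output
            new_g[i][j] = min(sites + [g[i][j]])
    return new_g


def solve(cost):
    """Repeat the update until no CE output changes."""
    g = [row[:] for row in cost]
    while True:
        nxt = relax_once(g)
        if nxt == g:
            return g
        g = nxt


INF = math.inf  # stands for "no direct arc" in this hypothetical example
cost = [
    [0, 4, INF, INF],
    [4, 0, 1, INF],
    [INF, 1, 0, 2],
    [INF, INF, 2, 0],
]
dist = solve(cost)
print(dist[0][3])  # 7, via the route 0 -> 1 -> 2 -> 3
```

Each pass lets every CE see the latest outputs of all other CEs, so the path lengths covered by the network can double from one pass to the next.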
From the principle of dynamic programming, all constituent paths in an optimal
path are optimal. The local optimization at each CE in the inference network will then
ensure the global optimality of the solution when the network stabilizes. As outlined
in [85], a measure could be devised to indicate the distance from the global minimum.
Let
$$E(t) = \sum_{i} \sum_{j} \left[ g_t(i,j) - \min_{\forall p}\{ g_t(i,j),\ S^p_{ij} \} \right]^2, \quad i \neq j$$
where $g_t(i,j)$ is the value of $g(i,j)$ at time $t$.
At the global minimum, E(t) = 0，and since {gt(ij) ~ minV p{^}} 2 > {9t(hj) ~ 
Assuming each CE is a first order system, the dynamics of the CE can then be 
described by 
^ g l i l 二 - K M h j ) + (2.3) 
at  vp 
where A is the open-loop system pole of the CE. 
The convergence rate of the network is then given by 
dE(t) ^sT" 
dt  = W dg八i,j) dt 
^ J 
From Eq. (2.3)， 





< 0 (2.6) 
This indicates that the network will always converge and the rate of change becomes 
zero when E(t) = 0, i.e., when the system reaches the global optimum solution. 
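The convergence argument can be explored numerically with a small Python sketch. It rests on an assumed reading of the first-order model, namely that each CE output relaxes toward its unit output as $dg/dt = -\lambda (g - \min\{g, S^p\})$, and it integrates this with a forward-Euler step. The function names, the step size, the value of `lam` and the finite stand-in `BIG` for unknown costs are all hypothetical choices for illustration.

```python
def unit_targets(g):
    """Unit output min{g(i,j), S_k} of every CE, computed from the current outputs."""
    n = len(g)
    target = [row[:] for row in g]
    for i in range(n):
        for j in range(n):
            if i != j:
                sites = [g[i][k] + g[k][j] for k in range(n) if k not in (i, j)]
                target[i][j] = min(sites + [g[i][j]])
    return target


def simulate(g0, lam=1.0, dt=0.01, steps=5000):
    """Forward-Euler integration of the assumed dynamics dg/dt = -lam * (g - target)."""
    n = len(g0)
    g = [row[:] for row in g0]
    energy = []
    for _ in range(steps):
        target = unit_targets(g)
        # E(t): summed squared distance between each CE output and its unit output
        energy.append(sum((g[i][j] - target[i][j]) ** 2
                          for i in range(n) for j in range(n) if i != j))
        g = [[g[i][j] - dt * lam * (g[i][j] - target[i][j]) for j in range(n)]
             for i in range(n)]
    return g, energy


BIG = 12.0  # finite stand-in for an unknown direct cost (hypothetical value)
g0 = [
    [0.0, 4.0, BIG, BIG],
    [4.0, 0.0, 1.0, BIG],
    [BIG, 1.0, 0.0, 2.0],
    [BIG, BIG, 2.0, 0.0],
]
g_final, energy = simulate(g0)
print(round(g_final[0][3], 3), energy[0], energy[-1] < 1e-9)
```

In this example the measure starts well above zero and ends close to zero, while the CE outputs approach the shortest path costs of the sample graph.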
In the following chapters, various implementations of the inference network in elec-
tronic circuits will be discussed. It will be shown that the solution time of an analog 
operating binary relation inference network is practically independent of the problem 
size. 
Chapter 3 
A Binary Relation Inference 
Network Prototype 
The design of a binary relation inference network for solving the shortest path problem
is described in Section 2.2.2. Besides the possible discrete-time domain implementations [85],
the network is also shown to be able to solve the problem in the continuous-time domain
with the assumption of a first-order system response in the transfer characteristic
of each computational element. A continuous-time domain implementation
makes it possible to build the network with analog circuits, which
allows a compact realization and promises a great potential speedup in solving optimization
problems. In this chapter, a prototype of the inference network for solving the
shortest path problem is described. The prototype is built with commercially available
components and serves as a vehicle for studying the behavior of the inference network.
3.1 The Prototype 
The major difference between the inference network based and neural network based
optimization networks is that the weights of links in an inference network are fixed
and do not have to be determined with a learning process. This simplifies the circuits
in building the inference network. For simplicity, the prototype implements
a small inference network, which has been attempted previously in [101]. It solves
shortest path problems that are defined on a 5-node graph with undirected arcs. As
described previously, the inference network is formed by the interconnection of self-
contained computational elements (CE). The operations of the network are defined by 
the sites and unit residing in the CEs. 
3.1.1 The Network 
For the inference network to solve the shortest path problem, each CE in the network 
is to determine the shortest path between a vertex pair. For a 5-node undirected 
graph, there would be 10 distinct vertex pairs. As a result, there are 10 CEs in the 
inference network prototype. In each CE, there are 6 input links and 6 output links. 
A bus configuration, Figure 3.1, is selected to connect the CEs. The bus configuration 
reduces the number of links in each CE from 12 to 7 and provides a regular connection
in building the network. Each CE is then built on a separate module to be plugged 
onto the bus. 
3.1.2 Computational Element 
For a 5-node undirected graph inference network, there are 3 sites and 1 unit in each 
CE. The operations in the sites and unit are defined in Eq. (2.2): a two-input adder is 
Figure 3.1: The bus configuration in the prototype 
required in the sites, while a minimum selecting circuit is needed in the unit. 
The Site 
In the prototype, a typical non-inverting adder is used to add the two incoming signals
at the site. Figure 3.2 is a schematic of the adder. The output of the adder, $V_{out}$, is
given by
$$V_{out} = \frac{A}{A+2}(V_{i1} + V_{i2}) \qquad (3.1)$$
where $A$ is the open loop gain of the op-amp. With typical values of $A > 10000$,
$V_{out} \approx V_{i1} + V_{i2}$.
Figure 3.2: A typical non-inverting adder 
The Unit 
The unit needs to find the minimum from the site outputs and the CE output. Fig-
ure 3.3 shows a block diagram of the unit. A bank of comparators is used to locate the 
minimum. Each comparator in the bank compares a pair of input signals at a time. 
Six comparators are then required to select the minimum from the 4 input signals, 3 
site outputs and a CE output. The comparator outputs form a code to indicate which 
Figure 3.3: Unit function in a CE for a 5-node graph inference network
signal is the smallest. With the comparator outputs driving an encoder, a binary code 
is obtained to drive a multiplexer to select the corresponding signal as the output. The 
multiplexer output is buffered before driving the network. The binary code from the 
encoder also serves as an indicator to the route taken in a shortest path. 
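The comparator bank, encoder and multiplexer chain can be mimicked behaviourally in Python. This is a sketch of the selection logic only, not of the circuit: the function name `unit_select`, the tie-breaking convention and the sample voltages are assumptions made for the example.

```python
from itertools import combinations


def unit_select(inputs):
    """Behavioural model: comparator bank -> encoder -> multiplexer."""
    n = len(inputs)
    # one comparator per pair of inputs; True when input a is not larger than input b
    comparators = {(a, b): inputs[a] <= inputs[b]
                   for a, b in combinations(range(n), 2)}
    # encoder: the selected input is the one that wins every comparison it takes part in
    for idx in range(n):
        wins_later = all(comparators[(idx, b)] for b in range(idx + 1, n))
        wins_earlier = all(not comparators[(a, idx)] for a in range(idx))
        if wins_later and wins_earlier:
            # multiplexer: pass the selected signal through, preserving its magnitude
            return idx, inputs[idx], len(comparators)


# Hypothetical inputs: site A, site B, site C and the fed-back CE output (volts)
code, value, n_comparators = unit_select([0.6, 0.28, 0.8, 0.6])
print(code, value, n_comparators)
```

With four inputs the model uses six pairwise comparisons, matching the six comparators quoted for the prototype unit, and the returned code identifies which input supplied the minimum.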
The problem with this approach is that when the output is the minimum among 
the inputs at the unit, the unit output has to drive itself through the multiplexer, as 
illustrated in Figure 3.4. Any dropouts in the loop will then force the unit output to 
Figure 3.4: Feedback with no hold circuit
decay to zero. To remedy this, a hold circuit is required to maintain the output level.
Figure 3.5 shows the addition of the hold circuit. (See Appendix A for details of the
hold circuit.)
Figure 3.5: Adding a hold circuit
Network Operation 
A PC, acting as the host computer, controls the operation of the inference network 
prototype. Appendix A shows details of the interface circuits. A data acquisition I/O 
card in the PC provides the control signals and the analog interface between the analog 
operating inference network and the host computer. S/H circuits are used to store the 
initial CE output voltages, otherwise each CE would require a separate D/A converter. 
Operation of the network is divided into 3 stages. 
1. Network initialization 
The inference network needs to be initialized to obtain the solution of a given 
problem. The output of the CEs are initialized to values corresponding to the 
cost of the direct path between the corresponding vertex pair. If $c$ is the cost of
the direct path from node $i$ to node $j$, the output of CE $(i,j)$ will be initialized to $c$.
CEs corresponding to vertex pairs with unknown direct path cost need
to be initialized to a value larger than the maximum cost of all shortest paths
in the network. Since this maximum cost would not be known until the problem
is solved, a worst case value of $(N-1)c_{max} = 4c_{max}$, where $c_{max}$ is the maximum of
the known costs, is assigned as the initial value for CEs that correspond to vertex
pairs with unknown costs. Network initialization is under the control of the signal
START in Figure 3.5. When START is at logic low, the minimum selecting circuit
is disconnected from the unit output and the unit output is forced to the
initialization voltage INIT.
2. Network converging 
When all the CEs are initialized, the network is ready to find the solution. When 
START is driven to logic high, the minimum selecting circuits are connected to the 
network. Interaction among the CEs will drive the network to a stabilized state 
with the output of the CEs representing the cost of the shortest path between the 
corresponding vertex pairs. 
3. Getting the results 
After all CEs in the network have settled, the cost of the shortest path between a
vertex pair is given by the output of the corresponding CE. The route taken for
a shortest path has to be traced from the binary code outputs of the units, as each
binary code will only indicate one of the intermediate nodes in the shortest path.
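The three stages can be sketched in Python as a sequential emulation. The sketch assumes an undirected graph with strictly positive known costs; the helper names `initialise`, `converge` and `route`, the 0-based node numbering and the sample costs are hypothetical. The `via` table plays the role of the binary code: it records, for each CE, the site that last supplied the minimum.

```python
def initialise(n, known):
    """Stage 1: known direct costs, (N-1)*c_max for pairs with unknown cost."""
    c_max = max(known.values())
    big = (n - 1) * c_max
    g = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                g[i][j] = known.get((i, j), known.get((j, i), big))
    return g


def converge(g):
    """Stage 2: synchronous updates until no CE output changes."""
    n = len(g)
    via = [[None] * n for _ in range(n)]  # contributing site of each CE, if any
    changed = True
    while changed:
        changed = False
        new_g = [row[:] for row in g]
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                for k in range(n):
                    if k in (i, j):
                        continue
                    s = g[i][k] + g[k][j]
                    if s < new_g[i][j]:
                        new_g[i][j] = s
                        via[i][j] = k
                        changed = True
        g = new_g
    return g, via


def route(via, i, j):
    """Stage 3: trace the route by following one intermediate node at a time."""
    k = via[i][j]
    if k is None:
        return [i, j]  # the direct arc is the shortest path
    return route(via, i, k)[:-1] + route(via, k, j)


# Hypothetical 5-node undirected example
known = {(0, 1): 10, (0, 2): 5, (1, 2): 2, (2, 3): 2, (3, 4): 20, (1, 4): 10}
g, via = converge(initialise(5, known))
print(g[0][4], route(via, 0, 4))  # 17 [0, 2, 1, 4]
```

Because each CE only names one intermediate node, the full route is recovered by repeating the lookup on the two sub-paths until only direct arcs remain.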
3.1.3 Network Response Time 
When the inference network begins converging from the initialized state, a step change 
would appear in the CE outputs. This step change is caused by START being driven 
from logic low to logic high, forcing the CE output to change from the initialization 
voltage INIT to the output of the minimum selecting unit. The time necessary for 
the CEs to settle is determined by the maximum current available in discharging the 
capacitances present in the circuit nodes. The critical signal path in a CE is shown as 
bold lines in Figure 3.5. The dominant factor in determining the step response would 
be the time needed to discharge the hold capacitor. Let $I_d$ be the current driving the
hold capacitor, $C_{hold}$ be the value of the hold capacitor and $V_s$ be the size of the step
change. The time, $\tau$, needed for the CE to settle to the final value is given by
$$\tau = \frac{V_s C_{hold}}{I_d} \qquad (3.2)$$
The worst case solution time of the inference network can be determined by assuming
that each CE will only respond when all signals at the input are stable. Let the network
start converging at $t = 0$. At $t = \tau$, paths containing 1 arc will be minimized. At
$t = 2\tau$, paths containing at most 2 arcs will be minimized. And at $t = N\tau$, paths
containing at most $2^{N-1}$ arcs will be minimized. In an $N$-node graph, the maximum
number of arcs in a path is $N-1$ and it would take $\lceil \log_2(N-1) \rceil + 1$ steps to reach the
optimum solution. The worst case solution time of an $N$-node shortest path problem
will then be given by
$$(1 + \lceil \log_2(N-1) \rceil)\,\tau \qquad (3.3)$$
Figure 3.6 shows the network behavior with the data set in Table 3.1. To cope 
Table 3.1: A sample data set of a shortest path problem
Unit    Cost   Scaled Input     Unit    Cost          Scaled Input
(1,2)   10     0.4              (2,4)   [illegible]   [illegible]
(1,3)   5      0.2              (2,5)   10            0.4
(1,4)   15     0.6              (3,4)   2             0.08
(1,5)   ∞      1.7              (3,5)   ∞             1.7
(2,3)   2      0.08             (4,5)   20            0.8
Figure 3.6: Network response
with the operating limits of the circuit, the data is scaled before applying to the circuit,
as indicated in the "Scaled Input" column. The solution time is around 3.2 ms. The
network responses show that the outputs are changing at a maximum rate of about
470 V/s. With the hold capacitor being a 47 µF capacitor, this implies a discharging
current of 22 mA, which matches the typical sink current of the LM324, 20 mA.
The network response speed is limited by the hold capacitor and the output drive
capability of the op-amp.
Considering the worst case solution time, given $I_d$ = 22 mA, $C_{hold}$ = 47 µF and
$V_s$ = 1.7 V, the worst case solution time of the prototype network will be around 10.9 ms.
In reality, there would be overlaps in the operations and the solution time would be 
much improved over the worst case solution time given by Eq. (3.3). 
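The worst case figure can be checked with a few lines of Python that evaluate Eq. (3.2) and Eq. (3.3) for the quoted values. The function names are hypothetical; the numbers are the ones given in the text.

```python
import math


def settle_time(v_step, c_hold, i_drive):
    """Eq. (3.2): time to move the hold capacitor through a step of v_step volts."""
    return v_step * c_hold / i_drive


def worst_case_time(n_nodes, tau):
    """Eq. (3.3): (1 + ceil(log2(N - 1))) settling periods."""
    return (1 + math.ceil(math.log2(n_nodes - 1))) * tau


tau = settle_time(1.7, 47e-6, 22e-3)             # Vs = 1.7 V, Chold = 47 uF, Id = 22 mA
print(round(tau * 1e3, 2))                       # 3.63 (ms)
print(round(worst_case_time(5, tau) * 1e3, 1))   # 10.9 (ms)
```

For the 5-node prototype the bound is three settling periods, which gives the 10.9 ms quoted above, compared with the measured solution time of around 3.2 ms.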
3.2 Improving Response 
3.2.1 Removing Feedback 
The response of the network could be improved by reducing the value of the hold 
capacitor or using op-amps with larger output current drive capability. However, it 
is noted that the hold circuit will be necessary only when the unit output is at the 
minimum among the inputs at the unit, and this indicates that the direct path is the 
shortest path (an active site indicates paths with intermediate vertices). The hold 
circuit could be removed by introducing an input at the unit to represent the direct 
path cost between the corresponding vertex pair. The modified unit function is given 
by 
$$g(i,j) = \min_k \{ S^k_{ij},\ init \} \qquad (3.4)$$
Figure 3.7 shows a block diagram of the modified circuit. The INIT input serves
both as the initial CE value and as an input in the unit function. As shown by the critical
path in the block diagram, the determining factor in the network response would be
the slew rate of the op-amp.
With the same data set in Table 3.1 driving the network, the response of the network
with the modified unit is shown in Figure 3.8. The solution time is around 7.5 µs, which
is limited by the op-amp large signal response, the slew rate. The slope of the responses
corresponds to the slew rate of the op-amp used, LM324, which is typically 0.25 V/µs.
Figure 3.7: Removing feedback in the unit circuit 
3.2.2 Selecting Minimum with Diodes 
The drawback of the previous implementations is the large component count. The 
number of comparators required in each unit is $(N-1)(N-2)/2$, where $N$ is the number
of nodes in the graph with the shortest path problem defined. With the modified circuit 
in Figure 3.7, Table 3.2 shows the required op-amps and comparators in each unit as the 
number of nodes increases. The complexity lies in the implementation of the minimum 
Table 3.2: Number of op-amps and comparators required in each CE
No. of nodes   No. of op-amps   No. of comparators
3              2                1
4              3                3
5              4                6
6              5                10
...
N              N-1              (N-1)(N-2)/2
selecting circuit, which requires a comparator to compare each pair of inputs. An 
alternative implementation of the minimum selecting circuit is shown in Figure 3.9. 
Figure 3.8: Network response with modified units 
This circuit uses the blocking behavior of diodes to replace the comparators, encoder
and multiplexer in finding a minimum.
The following describes the circuit operation. First, consider only the op-amps that
are responsible for the site function, $U_k$, $k = 1, \ldots, n$, $k \neq i$, $k \neq j$, and assume SW1
is open. The voltage level at the positive input of the op-amps is given by $V_{k+} = (V_{k1} + V_{k2})/2$
and the voltage level at the negative input is given by $V_{min}/2$. For op-amps
with inputs $V_{k1} + V_{k2} > V_{min}$, $V_{k+} > V_{k-}$ and $Y_k$ will be driven to the maximum output
level allowed due to the large open loop gain. For op-amps with inputs $V_{k1} + V_{k2} < V_{min}$,
$V_{k+} < V_{k-}$ and $Y_k$ will be driven towards the negative rail. When $Y_k$ falls below $V_{min}$,
$D_k$ becomes forward biased and closes the feedback loop. $Y_k$ will stay at $V_{k1} + V_{k2} - V_{DF}$,
where $V_{DF}$ is the forward drop of $D_k$. As a result, $V_{min} = \min_k(V_{k1} + V_{k2})$. With SW1
open, the unit output is given by the initialization voltage $V_{init}$.
When switch SW1 closes, a similar argument applies except that there is no summing
action at $U_I$, and $V_{out} = V_{min} = \min_k(V_{k1} + V_{k2}, V_{init})$.
Figure 3.9: Units using diodes to find minimum 
Using comparators to detect if the op-amp output is at the maximum output level 
provides a code to indicate the contributing site in a shortest path. Figure 3.10 shows 
the circuit in finding the route of a shortest path. 




Figure 3.10: Finding the active site in a CE 
The network response is similar to the response in Figure 3.8, which is also limited 
by the op-amp large signal response. 
3.3 Speeding Up the Network Response 
There are a number of further possible speedups, all aimed at reducing the time spent 
in the transitions.
1. Reducing the transit ranges 
The transit range can be reduced by limiting the output level of the non-contributing
sites. In the circuit of Figure 3.9, when sites are not active, the corresponding
op-amp saturates at the maximum output level. This can be avoided
by adding a diode as shown in Figure 3.11. For non-contributing sites, $D_{k2}$ will
Figure 3.11: Limiting output level of inactive sites 
be forward biased and keep the feedback loop closed. The output of the op-amp,
$Y_k$, will stay at $V_{k1} + V_{k2} - V_{DF}$. Figure 3.12 shows a simulation of the network
response with the modified circuit. The solution time is around
6.5 µs.
2. Reducing the node operating ranges 
Figure 3.12: Improved response when limiting output level of inactive sites 
Another way to reduce the transit time is to reduce the unit operating voltage 
by further scaling the voltages representing the costs of the paths. Figure 3.13 
shows the resulting response when the "Scaled Input" in Table 3.1 is halved. The
Figure 3.13: Network response with reduced operating ranges 
solution time is approximately halved. However, with a smaller operating range, 
there will be a high demand on the precision of the circuit components. 
3. Larger driving capability 
The transit time can also be reduced by driving the nodes with a larger output 
current. Figure 3.14 shows the improved network response when the original 
op-amp in the network is replaced with an op-amp with larger output driving 
capability. The solution time is approximately 5 µs.
Figure 3.14: Simulated network response with OP-27 op-amps used 
3.4 Conclusion 
The prototype developed demonstrates the feasibility of implementing a binary relation 
inference network with analog circuits to solve shortest path problems. Factors in lim-
iting the performance of the network are identified. As noted in the network responses, 
the performance is limited by the large signal response of the components used in the 
circuits. 
Chapter 4 
VLSI Building Blocks 
An analog circuit based binary relation inference network is shown to be able to solve 
shortest path problems. Although the worst case solution time is shown to be of
$O(\log_2 N)$ for an $N$-sized problem, the inherent parallel operation of the network promises
a practical size independence. However, the circuit employed in building the prototype
would not be appropriate for practical problems. A more compact structure is required 
for building large networks. 
Advances in VLSI circuits and structures provide a medium for implementing large 
scale structures and there have been reports in using analog VLSI circuit techniques in 
solving optimization problems [24, 14, 36]. Most of these approaches are neural network 
based and programmability in the circuit structures is required for the desired learning 
capability. This requirement complicates the design of the circuits. In contrast, the
weights of the links in a binary relation inference network are defined by the problem
definition, so a learning process is not required in applying the inference network to solve
problems. The simple operations involved in the computational elements, together
with the regular interconnection pattern, render the inference network very suitable for
a VLSI implementation. The following describes the function blocks that are necessary 
in a VLSI implementation of the inference network in solving shortest path problems. 
4.1 The Site
A number of configurations are possible to define an adder in the site function
as described in Eq. (2.2). Signals in the form of currents can be added by simply
connecting the signals together. However, current mode operations would require
additional circuits for voltage to current conversions, as most signals and references are 
more readily available in the form of voltages. A more convenient configuration would 
have the signals manipulated in voltage form. Another desirable feature of the adder 
is to have a size as small as possible, since this would allow a larger network to be 
implemented. 
Operational amplifiers are widely used in analog circuits. The high gain of the 
amplifier, when combined with a negative feedback configuration, could reduce the 
adverse effects caused by device imperfections or mismatches. Though simple, the 
operational transconductance amplifier (OTA) [3, 57] in Figure 4.1 provides sufficient 
gain as a building block in building the binary relation inference network. To a first 
Figure 4.1: An operational transconductance amplifier 
approximation, an unloaded OTA can be considered as an op-amp with a voltage gain, 
$A$, given by
$$A = \frac{g_{md}}{g_{d2} + g_{d4}}$$
where $g_{md}$ is the transconductance of the differential pair, and $g_{di}$ is the conductance
of $M_i$.
Unlike a discrete op-amp, resistive divider networks are not suitable for configuring 
the OTA into an adder as the small current drive capability and high output resistance 
of the OTA require a large resistive value to be used in the divider network. The low 
resistivity of the material in the VLSI process would demand a large area in implement-
ing the resistive devices, defeating the purpose of selecting the OTA for its small size. 
An alternative is to use switched-capacitor circuits [32], but the speed of the adder is 
then limited by the frequency of the clock driving the circuit. 
Adapted from a switched-capacitor design, Figure 4.2 shows the circuit of the adder 
that allows fast operation with capacitive dividers. Switches are used to initialize the 
Figure 4.2: An OTA based adder
Figure 4.3: Simulated adder response
charges on the capacitors. $\phi_1$ and $\phi_2$ are non-overlapping logic signals. At initialization,
$\phi_1$ is at logic high and $\phi_2$ is at logic low, and charges on the capacitors are removed.
$\phi_1$ then switches to logic low while $\phi_2$ stays at logic low. Following this, $\phi_2$ rises to
a logic high while $\phi_1$ stays at logic low. If the rate of change of signals at $V_1$ and $V_2$ is
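much slower than that of $\phi_2$, $V_1$ and $V_2$ can be assumed to be stationary when $\phi_2$ rises
to a logic high. Because of the conservation of charges,
$$(V_1 - V_+)C_1 = (V_+ - V_2)C_2$$
Solving for $V_+$,
$$V_+ = \frac{C_1 V_1 + C_2 V_2}{C_1 + C_2} \qquad (4.1)$$
$$\phantom{V_+} = (V_1 + V_2)/2 \quad \text{for } C_1 = C_2 = C$$
As a result, there is a step change of $(V_1 + V_2)/2$ at $V_+$. The output of the amplifier is given
by the product of the gain of the amplifier $A$ and the difference in the inputs.
$$V_{out} = A(V_+ - V_-) = A\left(\frac{V_1 + V_2}{2} - \frac{V_{out}}{2}\right)$$
Solving for $V_{out}$,
$$V_{out} = \frac{A}{A+2}(V_1 + V_2) \qquad (4.2)$$
$$\phantom{V_{out}} \approx V_1 + V_2$$
for $A \gg 2$. Figure 4.3 shows a simulation of the circuit response.1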
As noted in both Eq. (4.1) and Eq. (4.2), the accuracy of the adder depends on the 
accuracy of the capacitor ratios; and this depends on the value of parasitic capacitances 
at the V+ and V_ nodes. To improve the accuracy, it is needed to reduce the parasitic 
capacitances in the layout and C has to be large when compared with the parasitic 
capacitances. In the current design, C is 330fF while the parasitic capacitances are 
1 Simulation results are obtained from SPICE simulations with device parameters obtained from a
typical MOSIS 2 µm CMOS process run. $(W/L)_{NMOS} = (W/L)_{PMOS} = 1.5$, $(W/L)_{SWITCH} = 2$.
； CHAPTER 4. VLSI BUILDING BLOCKS  40 
around 20fF. The error could be further reduced by trimming the capacitor sizes for a 
first order compensation. 
The speed of the adder is determined by the time required to charge the capacitors
when $\phi_2$ rises to logic high. In an $N$-node network, there would be $2(N-2)$ inputs
connected to each CE output, which is the buffered unit output. Let $I_{MAX}$ be the
maximum output current of the buffered unit output, and $V_{MAX}$ be the maximum
voltage level assigned to the costs. The time required is then

$$\frac{(N-2)\,C\,V_{MAX}}{I_{MAX}}$$

As an example, with $N = 6$, $V_{MAX}$ = 2V, $C$ = 330fF and $I_{MAX}$ = 80µA, the propagation
delay in the adder would be 33ns. For large networks a larger $I_{MAX}$ is required to
reduce the worst case response time of the adder.
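The delay estimate can be reproduced with a few lines of code. This is a sketch only; it assumes the delay expression $(N-2)\,C\,V_{MAX}/I_{MAX}$ and a drive current of 80µA, the value that reproduces the 33ns quoted in the example. The function name is hypothetical.

```python
def adder_delay(n_nodes, cap, v_max, i_max):
    """Worst case charging time of the adder inputs, in seconds.

    Assumes the delay expression (N - 2) * C * V_MAX / I_MAX.
    """
    return (n_nodes - 2) * cap * v_max / i_max


if __name__ == "__main__":
    # 6-node network, C = 330 fF, V_MAX = 2 V, I_MAX = 80 uA (assumed unit)
    print(adder_delay(6, 330e-15, 2.0, 80e-6))
```

Since the delay grows linearly with $N$ for a fixed drive current, doubling the network size roughly doubles the current needed to keep the same adder delay.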
4.2 The Unit 
4.2.1 A Minimum Finding Circuit 
A minimum selecting function block is required in the unit output function (Eq. (3.4)). 
Several circuits are reported in finding the maximum or the minimum from a group of 
signals. Many of these circuits are based on the Winner-Take-All(WTA) network [51]. 
In [13, 83], the circuits aim to improve the precision and speed of the WTA circuit. 
These circuits find the maximum and work with current mode signals. In [41], the 
WTA circuit is modified to find the minimum. The implementation in [33] also finds 
the minimum but works with voltage signals. However, the outputs of these circuits only
serve as indicators identifying the signal that is at the minimum or the maximum;
they do not preserve the magnitude of the signals in the output. A magnitude preserving
minimum circuit block can be built with these circuits by using the circuit outputs to
drive a multiplexer, or a switch array, which selects the source that is at the minimum
to be the output of the magnitude preserving minimum block. A block diagram of
this approach is shown in Figure 4.4. The problem with this approach is that there
would be losses when signals pass through the multiplexer.

Figure 4.4: A WTA-based magnitude preserving minimizer
Some more compact magnitude preserving minimum selecting circuits are found 
in [76, 78, 29], but they work with current mode signals. 
Figure 4.5 shows the circuit of a 2-input magnitude preserving minimum selecting
block. For more inputs, the dotted region is repeated for each additional input. This
circuit is an adaptation of the standard-IC circuit in [89]. Its operation is
similar to that of the standard-IC circuit, in which the blocking behavior of
diode connected transistors is used to find the minimum. Consider a typical block as
enclosed in the dotted region. If $V_i > V_{MIN}$, the amplifier output $y_i$ will be driven
near to the positive rail because of the large open loop gain ($D_i$ is reverse biased). If
$V_i < V_{MIN}$, $y_i$ will be driven towards the negative rail. When $y_i$ falls below $V_{MIN}$, $D_i$
becomes forward biased, closing the feedback loop, and $y_i$ will be given by

$$y_i = A(V_i - V_{MIN})$$

where $A$ is the gain of the OTA, and

$$V_{MIN} = y_i + V_{DF}$$

where $V_{DF}$ is the forward drop on $D_i$. Solving for $y_i$,

$$y_i = \frac{A(V_i - V_{DF})}{A + 1}$$

From Eq. (4.3),

$$V_{MIN} = \frac{A V_i + V_{DF}}{A + 1}$$

and $V_{MIN} \approx V_i$ for $A \gg 1$. As a result, $V_{MIN} = \min_i V_i$. Figure 4.6 shows a simulation
of the response in a circuit with 2 inputs. The error in the output is given by

$$\frac{V_{DF} - V_i}{A + 1} \tag{4.4}$$

Figure 4.5: A minimum selecting circuit
Figure 4.6: Simulation result of the minimum selecting circuit
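The closed-loop expressions above can be evaluated directly to see how the gain and the forward drop affect the output. The sketch below models only the steady state of the active block (the one with the smallest input); the function names and the sample values are illustrative assumptions.

```python
def min_circuit_output(inputs, gain, v_df):
    """Steady-state V_MIN of the minimum selecting block.

    Only the block with the smallest input closes its feedback loop, so
    V_MIN = (A * V_i + V_DF) / (A + 1) with V_i the smallest input.
    """
    v_i = min(inputs)
    return (gain * v_i + v_df) / (gain + 1.0)


def min_circuit_error(inputs, gain, v_df):
    """Output error V_MIN - min(V_i), per Eq. (4.4)."""
    return (v_df - min(inputs)) / (gain + 1.0)


if __name__ == "__main__":
    signals = [1.2, 0.8]  # two input voltages, in volts
    print(min_circuit_output(signals, gain=100.0, v_df=0.7))
    print(min_circuit_error(signals, gain=100.0, v_df=0.7))
```

The error shrinks both with a larger gain and with a forward drop closer to the input level, which is the motivation for reducing $V_{DF}$.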
For a particular $A$, this error can be reduced by minimizing $V_{DF}$. The terminal behavior
of a drain-gate connected transistor is given by [3]

$$I_D = \begin{cases} \frac{\beta}{2}\,(v_{GS} - V_T)^2, & v_{GS} > V_T \\ 0, & v_{GS} < V_T \end{cases} \tag{4.5}$$

where $V_T$ is the threshold voltage, given by

$$V_T = V_{T0} + \gamma\left[\sqrt{|2\phi_F| + V_{SB}} - \sqrt{|2\phi_F|}\right] \tag{4.6}$$

and
$V_{T0}$ = threshold voltage at $V_{SB} = 0$
$\gamma$ = bulk threshold parameter
$\phi_F$ = strong inversion surface potential
$V_{SB}$ = voltage difference between source and bulk
From Eq. (4.5) and Eq. (4.6), a smaller $v_{DF} = v_{GS}$ can be accomplished by keeping
$V_{SB} = v_S - V_{EE}$ small, that is, by using the largest $V_{EE}$ possible. When compared with an
NMOS differential pair OTA, a PMOS differential pair has a smaller gain but it allows
an output level nearer $V_{EE}$. As a result, a larger $V_{EE}$ could be used in a PMOS differential
pair OTA. $V_{EE}$ needs to be negative to allow an output level of 0V at $V_{MIN}$ and is -1V
in the present implementation.
The step response of the circuit is determined by the time needed to charge the
parasitic capacitances at the node $V_{MIN}$. This is given by $(N-1)C_{MIN}/I_{OTA}$ in an
$N$-object network, where $I_{OTA}$ is the bias current of an OTA and $C_{MIN}$ is the parasitic
capacitance at the output of a typical block (dotted area in Figure 4.5). For a 6-object
network with $I_{OTA}$ = 80µA and $C_{MIN}$ = 50fF, the response time is less than 4ns.
4.2.2 A Tri-state Comparator 
To determine the route of a shortest path, the active site in each unit has to be
found. This can be determined by checking whether the corresponding op-amp is saturated, as
a saturated op-amp indicates that the site is not active. There are $N-2$ signals in each
unit to be fetched into the host computer to determine the route of the shortest path.
A bus configuration could reduce the number of connections between the host computer
and the inference network from $N(N-1)(N-2)$ to $(N-2) + N(N-1)$. To facilitate
reading the route information in a bus configuration, tri-state capability in the signals
is desired. Adding a tri-state buffer after each comparator would take up considerable
area. A more compact configuration is shown in Figure 4.7, which adds
tri-state control to a two-stage comparator [3].
Figure 4.7: A tri-state comparator
Figure 4.8: Simulated transfer curve, Vref =
When RD is at logic high, M1-M7 form a two stage comparator. Figure 4.8 shows
the simulated transfer curve of the comparator. With RD at logic low, both M6 and M7
will be turned off, resulting in a high impedance state at $V_{OUT}$.
The output $y_i$ of an inactive site in the minimum finding circuit is given by
$y_i = A(V_i - V_{MIN})$. If a fixed detecting level, $V_{REF}$, is employed in the comparators, the
minimum difference required between the minimum input $V_{MIN}$ and the other inputs
is given by

$$\min_i (V_i - V_{MIN}) > V_{REF}/A$$

where $A$ is the voltage gain of the OTA used in the minimum finding circuit. With
$V_{REF}$ = 3V and $A$ = 100, the minimum difference required is 30mV. This could be
improved by setting $V_{REF}$ to $V_{MIN}$; the minimum difference required will then be
limited by the offset in the comparator, which is typically less than 1mV.
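The resolution limit of a fixed detecting level can be illustrated with a small behavioral model. This is a sketch under simplifying assumptions: the active site output is taken as 0V (the diode drop is ignored) and the amplifier saturates at a hypothetical positive rail `v_rail`; neither value is taken from the circuit simulations.

```python
def min_resolvable_difference(v_ref, gain):
    """Smallest input difference detected with a fixed V_REF: V_REF / A."""
    return v_ref / gain


def inactive_flags(inputs, gain, v_ref, v_rail=5.0):
    """Comparator decisions for each site: True means 'detected as inactive'.

    y_i = A * (V_i - V_MIN), clipped at the assumed positive rail.
    """
    v_min = min(inputs)
    flags = []
    for v in inputs:
        y = min(gain * (v - v_min), v_rail)
        flags.append(y > v_ref)
    return flags


if __name__ == "__main__":
    print(min_resolvable_difference(3.0, 100.0))
    # The second input is only 20 mV above the minimum, below the 30 mV limit,
    # so it is not flagged as inactive in this model.
    print(inactive_flags([1.00, 1.02, 1.05], gain=100.0, v_ref=3.0))
```

In this model an input that is closer to the minimum than $V_{REF}/A$ cannot be told apart from the active site, which is why tying the detecting level to $V_{MIN}$ is attractive.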
4.3 The Computational Element 
With the building blocks defined, Figure 4.9 shows the circuit of a typical CE in the
network. For clarity, the route extracting comparators are not shown. Besides
the site and unit output functions, circuits for buffering the unit output and the
initialization voltage are included. A prototype chip has been designed to test the
performance of the circuits.

Figure 4.9: Typical unit in the network
Figure 4.10: Chip layout
4.3.1 Network Performances 
Figure 4.10 shows the layout of the prototype chip that has been sent for fabrication. 
A 6-object CE and a 3-object network are being implemented on the chip. 
The worst case propagation delay of a unit, $\tau_{unit}$, is given by the sum of the delays in the
adder, the minimum finding circuit and the output buffer. For a 6-object network, $\tau_{unit}$
is 37ns. To estimate the solution time, assume that each unit will start to
respond only when all inputs at the unit are stabilized. Let the circuit start operation
at $t = 0$. At $t = \tau_{unit}$, paths containing 1 arc will be minimized. At $t = 2\tau_{unit}$, paths
containing at most 2 arcs will be minimized. And at $t = n\tau_{unit}$, paths containing at
most $2^{n-1}$ arcs will be minimized.
In an $N$-node graph, the maximum number of arcs in a path is $N - 1$ and it would
take $\lceil \log_2(N-1) \rceil + 1$ steps to reach the optimum solution. The worst case solution
time of an $N$-node shortest path problem will then be given by

$$\left(1 + \lceil \log_2(N-1) \rceil\right)\tau_{unit} \tag{4.7}$$

The solution time for a 6-object problem would be 148ns. In reality, there would be
overlaps in the operations and the solution time would be much improved over the
worst case solution time given by Eq. (4.7).
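The step count in Eq. (4.7) follows from the doubling of the path length covered at each update. The sketch below evaluates the worst case time and mimics the synchronous update in software; it assumes that each unit takes the minimum over its current output and the sums formed at its sites, which is a simplification of the analog behavior. All names are illustrative.

```python
import math

INF = float("inf")


def worst_case_solution_time(n_nodes, tau_unit):
    """Worst case solution time of Eq. (4.7), in the units of tau_unit."""
    steps = 1 + math.ceil(math.log2(n_nodes - 1))
    return steps * tau_unit


def network_step(d):
    """One synchronous update of all units.

    Unit (i, j) is assumed to take the minimum over its current output and
    the sums d[i][k] + d[k][j] formed at its sites.
    """
    n = len(d)
    new = [row[:] for row in d]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # there are no units on the diagonal
            via = [d[i][k] + d[k][j] for k in range(n) if k != i and k != j]
            new[i][j] = min([d[i][j]] + via)
    return new


if __name__ == "__main__":
    print(worst_case_solution_time(6, 37e-9))
```

In a chain of five nodes with unit arc costs, one update resolves the 2-arc paths and a second update resolves the 4-arc path between the end nodes, matching the doubling argument.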
4.4 Discussion
Analog VLSI circuits for implementing a binary relation inference network to solve
shortest path problems are presented. The circuits presented allow practical problems
to be solved with the inference network. With the circuits, it is possible to implement a
6-node inference network in an area of 2mm x 2mm.
Operations of the circuits are verified with SPICE simulations. Errors in the circuits
have been reduced by employing the OTA in a negative feedback configuration. As
noted in Eq. (4.2) and Eq. (4.3), with A > 100, the error in the adder will be less than
2% and the error in the minimum selecting block will be less than 5mV.
Chapter 5
A VLSI Chip
With the computational elements defined, the interconnections among the elements
have to be defined in order to complete the inference network. As there would be a
large number of CEs in networks that are designed for solving practical problems,
the layout of the CEs has to be considered at both the chip level and the system
level. At the chip level, minimizing the area for the wires interconnecting the CEs allows
more CEs to be placed on a chip. However, there must be a limit on the number of CEs
that can be placed on a chip. If the inference network is to solve practical problems,
a multichip configuration is a must. Because the structure of a CE in the inference
network depends on the target size of the network, some techniques are required to
define a multichip configuration. In this chapter, problems in the layout of the chip and
in building an inference network chip for practical problems that involve a large number
of computational elements are considered.
5.1 Spatial Configuration 
A good placement of the CEs is necessary to reduce the area required for interconnecting
the CEs. Though not fully interconnected, there are still many interconnections defined
in an inference network. In addition, these interconnections are not limited to the
nearest neighbors if the CEs are placed in a plane. Generally, in an inference network for
a shortest path problem on an $N$-node graph, there would be $N(N-1)$ CEs with $2(N-2)$
input links and $2(N-2)$ output links in each CE. A good spatial configuration will
reduce the length of the interconnections, resulting in reduced propagation delay among
the CEs and reduced area for the wires running the interconnections. Another
desired characteristic of a good spatial configuration is to minimize the crossover paths,
thus reducing interference among the signals.
There are a number of possible spatial configurations.
• Torus
In a torus configuration, each CE is placed on a disk like structure as in Figure 5.1a.
The unit lies in the center of the disk with sites attached on the perimeter
of the unit. Connections to other CEs are to be terminated on the perimeter of
the disk. The disks are then stacked together with connections among the CEs
defined on a cylindrical surface wrapped on the disks.
As the torus configuration involves a 3D structure, it is not suitable for planar
implementations, such as a VLSI implementation.

Figure 5.1: a. Computational element placed on a disk b. CEs placed in a torus
• Bus
As depicted in the inference network prototype, for an inference network that is
defined for an $N$-node graph, a bus configuration reduces the number of links in a
CE from $4(N-2)$ to $2(N-2)+1$. In a bus configuration, each CE is built as a
module that is to be plugged onto the network bus.
However, the bus configuration is not appropriate for a large network. In a large
network, the number of I/O connections of a CE, $2(N-2)+1$, is much less than
the width of the network bus, $N^2$. As a result, a large area would be used in
routing the network bus, but without connecting the CEs.
• Matrix
In the inference network, there are no connections between CEs $(i,j)$ and $(k,l)$
if $i$, $j$, $k$ and $l$ differ. When the CEs are placed in a matrix configuration, the
interconnections would be limited to CEs in the same row or column.
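The row and column property of the matrix configuration can be checked by enumerating the connections of one CE. The sketch below assumes that site $k$ of CE $(i,j)$ adds the outputs of CEs $(i,k)$ and $(k,j)$, as in a shortest path update; the indexing and function names are illustrative.

```python
def ce_count(n):
    """Number of CEs in an N-node inference network: N * (N - 1)."""
    return n * (n - 1)


def ce_input_pairs(i, j, n):
    """Pairs of CEs feeding the sites of CE (i, j), one pair per site.

    Assumption: site k combines the outputs of CE (i, k) and CE (k, j).
    """
    return [((i, k), (k, j)) for k in range(n) if k != i and k != j]


if __name__ == "__main__":
    pairs = ce_input_pairs(0, 1, 6)
    print(ce_count(6), len(pairs), 2 * len(pairs))
    # Every source CE shares row 0 or column 1 with CE (0, 1).
    print(all(a[0] == 0 and b[1] == 1 for a, b in pairs))
```

Each CE has $N-2$ sites with two inputs each, giving the $2(N-2)$ input links, and every source lies in the same row or the same column as the CE it feeds.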
5.2 Layout 
5.2.1 Computational Elements 
The following are the guidelines in the layout of the CE:
• Minimum feature size transistors are used wherever possible to reduce the size of
the CEs in the network.
• As the number of sites in a CE varies with $N$, the size of the graph on which the
shortest path problem is defined, it is desirable to have the sites in a CE
modularized. With modularized sites, a CE for a different sized network
can be built by adding or removing site modules; otherwise, a custom layout has to
be performed for each size of the inference network to be built.
• Another goal of the layout is to reduce the random wiring. This allows easy
adaptation to larger networks. It would also be possible to have a program to
generate the desired chip.
Figure 5.2 is the layout of the module that is to be stacked for each additional site in 
the CE. In addition to the adder, a part of the unit, that corresponds to each additional 
site is also placed in the module. 
In Figure 5.2, it is noted that the capacitors are the most space demanding components.
This is due to the need to keep the capacitor ratios accurate in the
presence of parasitic or stray capacitances. Connections at the critical nodes, the inputs
of the OTA, are kept short to minimize the parasitic capacitances. Control signals are
brought to the perimeter of the module to facilitate stacking of the modules.

Figure 5.2: The layout of a module

In an $N$-sized inference network for the shortest path problem, there are $N-2$ sites
in each CE. Figure 5.3 is the proposed floorplan of the CE. Site modules are stacked
together with the inputs lined along one side. The common parts of the unit function,
including the control circuit, the initialization circuit and the output buffer, are placed
together in a block lying below the stacked site modules. Figure 5.4 shows the layout
of a 6-node CE.

Figure 5.3: The floorplan of a CE
Figure 5.4: Layout of a CE in a 6-object network
5.2.2 The Network 
The CEs are arranged in a matrix configuration so that the interconnections among the
CEs are limited to the same row or column. CEs in adjacent rows and columns
are placed in a mirror configuration, such that the network buses between two rows or
columns can be routed in the same routing channel, which reduces the area in wiring
the interconnections. In the layout of a 6-node network, the interconnections occupy 
_ _ 
5.2.3 I/O Requirements
For each CE in the network, there are one input for the initialization input, one output
for the CE output, and $N - 1$ outputs for reading the route information taken in a
shortest path. If all the input and output signals are to be brought off-chip in operating
the network, $N(N-1)(N+1)$ I/Os are required, which is too many to be feasible. The
number of I/Os required can be reduced by
• A route bus
As described in section 4.2.2, the use of tri-state circuits in route-reading allows
the route data of all CEs to be connected in a bus. An RD input in each CE
then controls whether the route bus will be driven with the route data from the
corresponding CE.
• A decoder for the readout control
Instead of bringing all RDs off-chip, an on-chip decoder could reduce the required
connections from $N(N-1)$ to $\lceil \log_2 N(N-1) \rceil$.
For simplicity, the decoder is not added in the present layout. For an $N$-sized
network, the number of I/Os required is then given by $3N(N-1) + N - 1$.
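The I/O counts can be tabulated for a given network size. The sketch below is an assumption-laden reading of this section: each CE is taken to have one initialization input, one output and $N-1$ route outputs, and the route bus is taken to be $N-1$ lines wide with one RD line per CE. The function names are hypothetical.

```python
import math


def io_all_offchip(n):
    """All CE signals brought off-chip: N(N-1) CEs with N+1 signals each."""
    return n * (n - 1) * (n + 1)


def io_with_route_bus(n):
    """Shared route bus, no decoder: three signals per CE plus N-1 bus lines."""
    return 3 * n * (n - 1) + (n - 1)


def decoder_select_lines(n):
    """Select lines needed by an on-chip decoder for the N(N-1) RD inputs."""
    return math.ceil(math.log2(n * (n - 1)))


if __name__ == "__main__":
    for size in (3, 6):
        print(size, io_all_offchip(size), io_with_route_bus(size),
              decoder_select_lines(size))
```

For a 6-node network this reading gives 210 I/Os without a bus and 95 with the route bus, with 5 select lines sufficing if a decoder were added.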
5.2.4 Optional Modules 
If the inference network is to be controlled with a digital computer, a number of modules
are necessary for a self contained inference network. These include
• D/A converter and S/H
A digital to analog (D/A) converter converts the digital code to an analog form to
drive the analog inference network. The analog inputs of the inference
network are for the initial outputs of the CEs. One sample and hold (S/H) circuit
is required in each CE for maintaining the initial CE output; otherwise a D/A
converter has to be included for each CE.
• A/D converter
An analog to digital (A/D) converter is needed for reading the CE outputs, which
correspond to the costs of shortest paths. A multiplexer is also required to read the
CE outputs one at a time.
However, these modules are not essential to the operation or behavior study of the 
inference network, and are being implemented with off-chip commercial components in 
the present design. 
5.3 A Scalable Design 
Regardless of the small size of the computational elements that can be achieved with 
VLSI techniques, there would still be a limitation on the number of elements that can 
be placed on a chip. The limitations may be imposed by the I/O pins that are available,
a reasonable process yield of a certain die size in a particular fabrication technology, 
etc. For solving problems that involve large networks and given the above limitations, 
it would then be desirable to be able to cascade small networks into a larger network. 
For the binary relation inference network, generally it is not possible to cascade
small networks into a larger network. This is because the number of sites in each
CE of an $N$-sized network is fixed at $N - 2$. A larger network will need more sites in
each CE, but the additional sites cannot be added without doing a new layout. This
limits the ability of the network to scale up. The same problem occurs in the number
of links required. However, it is possible to design a CE for an $M$-sized network to work
in an $N$-sized network, $M > N$. This is done by connecting the inputs of the unused
sites to Vcc, thus disabling the corresponding sites. I/O pins have to be reserved in the
chip for the connections. An $M$-sized network can then be cascaded from the $N$-sized
network chips as shown in Figure 5.5. The widths of the row and column buses also
have to be designed for an $M$-sized network.

Figure 5.5: Cascading inference network chips into a larger network

For large networks, though it may not be feasible to place all CEs on the same chip,
it is possible to construct the networks from chips with fewer CEs. When an
$M$-sized network is to be built from chips of an $N$-sized network, $(M/N)^2$ chips are required,
and each CE in the chips has to be built for an $M$-sized network, i.e., having $M - 2$ sites in
each CE.
As there are no CEs along the diagonal in the matrix layout, two types of chips are
needed:
• Diagonal chips These chips are to be placed along the diagonal of the matrix and
have no CEs placed along the diagonal on the chip.
• Non-diagonal chips These chips are to be placed off the diagonal and need to have
CEs placed at all positions in the matrix.
Some limitations are to be observed in cascading chips.
• $M/N$ must be an integer.
• CEs in a chip have to be designed for the $M$-sized network.
• The widths of the column and row buses have to be designed for an $M$-sized network.
• Fanout in a large network
When $N$ chips cascade to form a larger network, there would be an $N$-fold increase
in the loading of the drivers in each CE. This would also increase the response
time of the network by $N$ times since the speed of the inference network is mostly
determined by the large signal behavior of the output drivers. In order to maintain
the same operating speed when chips are cascaded, the output drive capabilities
of the output drivers have to be increased by $N$ times.
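The chip counts for a cascaded network follow directly from the matrix layout. The sketch below is a bookkeeping aid only, assuming square chips that each cover an $N \times N$ block of the $M \times M$ matrix; the function name and dictionary keys are illustrative.

```python
def cascade_plan(m, n):
    """Chip counts for building an M-sized network from N-sized chips."""
    if m % n != 0:
        raise ValueError("M/N must be an integer")
    k = m // n
    return {
        "chips": k * k,                # (M/N)^2 chips in total
        "diagonal_chips": k,           # chips on the diagonal of the matrix
        "non_diagonal_chips": k * k - k,
        "sites_per_ce": m - 2,         # every CE is built for the M-sized network
        "total_ces": k * n * (n - 1) + (k * k - k) * n * n,
    }


if __name__ == "__main__":
    print(cascade_plan(6, 3))
```

The total number of CEs over all chips equals $M(M-1)$, the number required by an $M$-sized network, which is a quick consistency check on the two chip types.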
Chapter 6 
The Inverse Shortest Paths 
Problem 
In the previous chapters, the binary relation inference network is shown to be able to solve
the shortest path problem in a time that is practically independent of the problem size.
Besides being a stand-alone problem, the shortest path problem also finds its existence 
as subproblems in many problem settings. In this chapter, an embedded application of 
the inference network is described. 
Shortest path problems, as a stand-alone problem, find presence in many applica-
tions, such as in traffic control, communications network routing, etc. As an example, 
in a fire services department, it is always the goal to have the fire-fighters arriving at 
the scene in the shortest time possible. The traffic planner has to choose a route from a 
usually densely populated road map. There are many well established algorithms, such 
as Floyd's, Dijkstra's [2], in finding the shortest path between two nodes in a graph if 
all the arc costs in the graph are known. 
However, in many practical situations, some of the arc costs are usually unknown. 
Consider the previous example again, the exact values of the road costs depend on 
factors like road length, maximum traveling speed, the frequency of traffic jams, etc.; 
and some of these factors may even be time varying. As a result, the shortest paths 
that are determined from the given set of road costs may not be the optimum. On 
the other hand, daily drivers on the road would have a priori knowledge (or expert's
knowledge) of some of the shortest routes available; but they have no idea on how the 
costs are being assigned to the roads. In such cases, it is desirable to modify the road 
costs, such that the a priori optimal routes are incorporated as the shortest paths in the 
map. The inverse shortest paths problem arises as the need to estimate the unknown 
arc costs in a graph, and hence to determine all of the possible shortest paths, based 
on the given expert's knowledge of some shortest routes. 
Reference [12] describes a similar situation in seismic tomography.
Seismic perturbations from earthquake observations are used to determine the
transmission characteristics between different points in the geologic zone. Different
geologic zones correspond to different nodes in a graph while the transmission
characteristics between the zones correspond to the costs of the paths among the nodes.
Similar to the previous situation, it is required to calculate the costs from the given
shortest paths, which correspond to the seismic perturbations from the earthquake
observations.
These problems are instances of the inverse shortest paths problem which can be 
formulated mathematically as follows. 
6.1 Problem Statement 
Let a graph with $N$ vertices $\{v_i\}_{i=1}^{N}$ and $m$ arcs $\{a_i\}_{i=1}^{m}$, each arc being associated with
a non-negative cost $\{c_i\}_{i=1}^{m}$, be denoted by the triple $G = (V, A, C)$ where $V$ is the set of
vertices, $A$ is the set of arcs and $C$ is the set of costs. Each arc $a_i$ is defined by its
source vertex $v_{s(i)}$ and destination vertex $v_{d(i)}$, denoted as $a_i = (v_{s(i)}, v_{d(i)})$. A path
$p_j$ in $G$ is given by a set of consecutive arcs $p_j = (a_{j_1}, a_{j_2}, \ldots, a_{j_{l(j)}})$ where $l(j)$ is the
number of arcs in path $p_j$ and $d(j_i) = s(j_{i+1})$ for $i = 1, \ldots, l(j) - 1$. The cost of path
$p_j$ is given by $\sum_{i \mid a_i \in p_j} c_i$.
Given a graph $G$, and a set of $p$ paths $P = \{p_i\}_{i=1}^{p}$, the inverse shortest paths
problem requires the costs to be modified such that all $p_i$s are to be the shortest
paths in the graph. In other words, find $c'_i$ such that

$$\sum_{j \mid a_j \in q_i} c'_j \ge \sum_{j \mid a_j \in p_i} c'_j, \quad (i = 1, \ldots, p) \tag{6.1}$$

and

$$c'_i \ge 0, \quad (i = 1, \ldots, m) \tag{6.2}$$

where $q_i$ is any path with the same source and destination vertices as in $p_i$.
There are a number of characteristics observed in the inverse shortest paths problem. 
• The solution is not unique. 
• The number of inequalities involved in Eq. (6.1) is generally very large. This 
depends on the number of given paths p and the connectivity of the graph, which 
determines the number of possible paths between a vertex pair in the graph. 
• There would also be a large number of redundancies in Eq. (6.1).
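The requirement in Eq. (6.1) can be tested for a candidate set of costs without enumerating the paths $q_i$: a given path is a shortest path exactly when its cost does not exceed the shortest distance between its end vertices. The sketch below uses the Floyd-Warshall algorithm in software for this check; the data layout (arcs as vertex pairs, paths as lists of arc indices) and the function names are assumptions made for illustration.

```python
INF = float("inf")


def all_pairs_shortest(n, arcs, costs):
    """Floyd-Warshall distances for a directed graph with n vertices."""
    d = [[0.0 if i == j else INF for j in range(n)] for i in range(n)]
    for (s, t), c in zip(arcs, costs):
        d[s][t] = min(d[s][t], c)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d


def violated_paths(n, arcs, costs, paths, tol=1e-12):
    """Indices of the given paths that are not shortest under `costs`.

    Each path is a list of arc indices forming consecutive arcs.
    """
    d = all_pairs_shortest(n, arcs, costs)
    bad = []
    for idx, path in enumerate(paths):
        cost = sum(costs[a] for a in path)
        source = arcs[path[0]][0]
        dest = arcs[path[-1]][1]
        if cost > d[source][dest] + tol:
            bad.append(idx)
    return bad


if __name__ == "__main__":
    arcs = [(0, 1), (1, 2), (0, 2)]
    print(violated_paths(3, arcs, [1.0, 1.0, 3.0], [[2]]))
    print(violated_paths(3, arcs, [1.0, 1.0, 2.0], [[2]]))
```

In the first call the direct arc costs more than the two-arc route, so the given path violates Eq. (6.1); raising the other costs or lowering the direct cost removes the violation, which also illustrates that the solution is not unique.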
The problem can be solved by formulating it into an optimization problem and 
applying the many available optimization techniques. 
In one of the previous approaches [12], minimum change in the set of costs is imposed
as an additional constraint. In that approach, the least square norm, $l_2$, is selected to
represent the change of costs. It is selected out of the many available representations,
$l_1$, $l_2$, $l_\infty$, etc., as it provides tractable computation methods, and is also widely used in
other problem formulations. With the least square norm chosen, the inverse shortest
paths problem becomes a quadratic programming problem and is formulated as follows.

$$\min_{c'} \frac{1}{2} \sum_{i=1}^{m} (c'_i - c_i)^2 \tag{6.3}$$

subject to the constraints in Eq. (6.1) and Eq. (6.2), where the $c'_i$s and $c_i$s are the new and
original path costs respectively. There are several approaches and numerous algorithms [30]
for solving quadratic programming problems. The Goldfarb-Idnani (GI) [30] algorithm
is employed by the authors to solve the problem because of its efficiency in dealing
with problems with a large number of constraints.
The GI algorithm is of the active set type. An active set is a subset of the constraints
that are being satisfied with the current solution $C'$. Suppose $K = \{1, \ldots, k\}$ denotes
the set of indices of the constraints, where $k$ is the number of constraints in Eq. (6.1)
and Eq. (6.2), and $A \subseteq K$ denotes the set of active constraints. At each iteration
of the GI algorithm, a subproblem $P(A)$ is defined to be the quadratic problem with
the objective function in Eq. (6.3) subject to the constraints indexed in $A$. A violated
constraint $h \in K - A$ is incorporated into $A$. The new subproblem $P(A \cup \{h\})$ is solved
for a new solution $C'$. When all the constraints are incorporated into the active set, the
optimal solution is obtained. The algorithm starts with the active set being the empty
set and this makes the unconstrained minimum of Eq. (6.3) the initial solution.
With some specialization resulting from the nature of the inverse shortest paths 
problem, several simplifications are possible in 
• detection of violating constraints 
• calculation of the dual step direction 
• determination of the new costs 
These improve the efficiency of the algorithm in solving the problem. 
In another approach [44], the inverse shortest paths problem is solved in two phases.
In phase I, a subgraph $\Omega = (V_P, A_P)$ is formed from the vertices and arcs in $P$, where $V_P$ is
the set of vertices in $P$ and $A_P = \{a_j \mid a_j \in p_i, i = 1, \ldots, p\}$ is the set of arcs in $P$.
Consider the set of paths in $\Omega$ with source at $v_{s(i)}$ and destination at $v_{d(i)}$, where
$s(i)$ and $d(i)$ correspond to the source vertex and destination vertex of path $p_i$
respectively. $C$ is revised to $C''$ by solving the following.

$$c''_i = c_i, \quad i \mid a_i \notin A_P$$

$$\min \sum_{i \mid a_i \in A_P} (c''_i - c_i)^2 \tag{6.4}$$

subject to

$$c''_i \ge 0$$

and

$$\sum_{j \mid a_j \in q_i} c''_j \ge \sum_{j \mid a_j \in p_i} c''_j + \epsilon, \quad (i = 1, \ldots, p),$$

where $q_i$ is any path in this set other than $p_i$, and $\epsilon$ is a positive and sufficiently small
value. In this phase, the objective is to revise the costs so as to make each path $p_i$ the
shortest path in $\Omega$.
In phase II, define $\psi$ to be the set of paths in $(V, A, C'')$ with the same
source vertex and destination vertex as $p_i$ and with cost less than that of $p_i$. Let
$A_\psi = \{a_i \mid a_i \in \psi\}$ be the set of arcs in $\psi$. The costs corresponding to the arcs in $A_\psi - A_P$
are then increased, such that the $p_i$s are the shortest paths in the graph with the revised
costs. The costs $C''$ are revised to $C'$ according to the following.

$$c'_i = c''_i, \quad i \mid a_i \notin A_\psi - A_P$$

$$\min_{C'} \sum_{i} (c'_i - c''_i)^2 \tag{6.5}$$

subject to

$$c'_i \ge c''_i, \quad i \mid a_i \in A_\psi - A_P$$

$$\sum_{j \mid a_j \in q_i} c'_j \ge \sum_{j \mid a_j \in p_i} c'_j + \epsilon$$

where $q_i$ is the set of paths in $\psi$ with source and destination vertices the same as $p_i$. $\epsilon$
is positive and sufficiently small.
The quadratic programming problems, Eq. (6.4) and Eq. (6.5), are solved with the
modified simplex method. Compared with the previous approach, the two phase approach
is able to reduce the size of the constraint set in the optimization problem formulated,
but at the expense of graph tracing. However, there could still be a large number of
constraints, depending on the details of the particular graph.
The objective of these approaches is to minimize the cost changes while satisfying
the shortest paths requirements in Eq. (6.1) and Eq. (6.2). The following describes
an alternative approach using an embedded connectionist network, the binary relation
inference network.
6.2 The Embedded Approach 
6.2.1 The Formulation 
In the proposed approach for solving the inverse shortest paths problem, the costs are
revised to drive the costs of the paths in $P$ to be smaller than the costs of the shortest
paths obtained from the graph $G$. As a result, the given paths in $P$ will become the
shortest paths between the corresponding vertex pairs. The objective function in this
formulation is

$$\min \sum_{i=1}^{p} \Big( \sum_{j \mid a_j \in p_i} c_j - \sum_{j \mid a_j \in \pi(s(i),d(i))} c_j + \epsilon \Big)^2 \tag{6.6}$$

subject to Eq. (6.2), where $\pi(s(i), d(i))$ is the shortest path between $v_{s(i)}$ and $v_{d(i)}$ in
$(V, A, C)$. The only constraint in this formulation is the non-negativeness of the costs. $\epsilon$
is a small positive constant, introduced to ensure the uniqueness of $p_i$ being the shortest
path in the solution. A widely used technique for solving general constrained nonlinear
programming problems, the penalty function transformation, is selected to solve the
problem. The transformed unconstrained problem is then solved with the General
Conjugate Gradient-Golden Section Acceptable Point (GCG-GSAP) [40] algorithm.
With the penalty function transformation, the constrained optimization problem is 
turned into the optimization of an unconstrained function. 
min / (c , r ) = m c i n | ^ ( ^ ^ c j ~ A (6.7) 
' [i=l jhjGpi ) i=l J 
where 
r o Ci>Q 
购=\ci Ci < 0 
The solution of the constrained problem is obtained from the solutions of a sequence of
unconstrained problems with decreasing values of $r$, as $r \to 0$.
In evaluating the objective function in Eq. (6.7), it is necessary to find the shortest
paths for all the source-destination vertex pairs of the paths in P. As the analog
inference network provides an efficient way of solving the all-pair shortest
path problem, it is incorporated in the algorithm. Figure 6.1 shows pictorially how the
inference network is embedded in the algorithm for solving the inverse shortest paths
problem. The embedded inference network finds the shortest paths needed in evaluating
the objective function.
Figure 6.1: Embedded approach for the inverse shortest paths problem (block labels: all-pair shortest path analog circuit; binary relation inference network)
6.2.2 The Algorithm 
The following describes the proposed algorithm in solving the inverse shortest path 
problem. For additional details on the conjugate gradient algorithm, see Appendix C 
and [69]. 
1. Initialization
With Eq. (6.7), transform the constrained optimization problem in Eq. (6.6) into
an unconstrained problem.
Set $r = 0.1$.
Set the initial solution $c$ to the given set of costs $C$.
2. Solving the unconstrained problem:
(a) Initialization
$k = 0$
Take the current set of costs to be $c^k$.
Set $s^0 = -g^0$, where $g$ is the gradient of $f(c, r)$.
(b) Let $c^{k+1} = c^k + \lambda^k s^k$, where $\lambda^k$ is determined by the GSAP algorithm.
(c) If $\|g^{k+1}\| < \epsilon$, the optimum is reached, with $c^* = c^{k+1}$ and $f^* = f(c^{k+1})$.
(d) To reduce the effect of roundoff errors, after a predetermined number of iterations,
RESET, $\omega^{k+1}$ is set to 0. This makes the search discard the information
accumulated during the previous iterations.
If $k >$ RESET, let $c^0 = c^{k+1}$, $s^0 = -g^{k+1}$, $k = 0$, and go to step (b); otherwise
go to step (e).
(e) Set $s^{k+1} = -g^{k+1} + \omega^{k+1} s^k$, where
$$\omega^{k+1} = \frac{(g^{k+1} - g^k) \cdot g^{k+1}}{(g^{k+1} - g^k) \cdot s^k}$$
(f) $k = k + 1$
(g) Go to step (b).
This solves for $c$ at a particular $r$.
3. $r = r \times 0.09$
If $r > 10^{-6}$, go to step 2. Otherwise stop.
As there is no explicit expression for the gradient, $g^k$, of the objective function
$f(c, r)$, the gradient is evaluated using forward differences of the objective function.
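The forward-difference gradient and the schedule of penalty parameters described above can be sketched as follows. This is only an illustrative sketch: the function names, the step size `h` and the quadratic test function are assumptions made here, and the actual objective of Eq. (6.7), which requires an all-pair shortest path solve per evaluation, is not reproduced.

```python
from typing import Callable, Iterator, List


def forward_difference_gradient(f: Callable[[List[float]], float],
                                c: List[float],
                                h: float = 1e-6) -> List[float]:
    """Approximate the gradient of f at c with forward differences.

    Uses len(c) + 1 evaluations of f: one at c and one per perturbed cost.
    The step size h is an assumed value, not taken from the thesis.
    """
    f0 = f(c)
    grad = []
    for j in range(len(c)):
        shifted = list(c)
        shifted[j] += h          # perturb one cost at a time
        grad.append((f(shifted) - f0) / h)
    return grad


def penalty_schedule(r0: float = 0.1, factor: float = 0.09,
                     r_min: float = 1e-6) -> Iterator[float]:
    """Yield each r for which the unconstrained problem is solved.

    Mirrors steps 1 and 3: start at r0, multiply by factor after each
    solve, and continue only while the new r is still above r_min.
    """
    r = r0
    while True:
        yield r
        r *= factor
        if r <= r_min:
            break
```

With the constants listed in the algorithm, the schedule visits five values of r (0.1, 0.009, 8.1e-4, 7.29e-5 and 6.561e-6) before stopping.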
6.3 Implementation Results
The data in the test run is generated as follows:
1. Generate all the shortest paths in the graph with the given costs.
2. Alter the set of costs in a randomized manner; as a result, some of the
previously generated paths will no longer be shortest paths.
As an example, consider a problem on a 5-node graph with bi-directional arcs. The
arc costs are:
c12 = 10    c14 = 4    c23 = 20    c25 = 2    c35 = 12
c13 = 8     c15 = 6    c24 = ∞     c34 = 15   c45 = ∞
with the known shortest paths being
p1 = 1,2      p6 = 2,1,4
p2 = 1,3      p7 = 2,5
p3 = 1,4      p8 = 3,4
p4 = 1,2,5    p9 = 3,5
p5 = 2,5,3    p10 = 4,3,5
Running on a PC, the solution time is 1.05 s with the following modified costs:
c12 = 7.60    c14 = 8.95    c23 = 20    c25 = 1.56    c35 = 7.78
c13 = 8.73    c15 = 9.16    c24 = ∞     c34 = 10.05   c45 = ∞
The updated arc costs indeed have the given paths incorporated as the shortest paths. 
There are 263 evaluations of the objective function, with 120 involved in determining 
the gradient. 
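The claim that the updated costs make the given paths shortest can be checked with a few lines of code. The sketch below uses a sequential Floyd-style all-pair computation rather than the inference network, takes the costs as printed to two decimals, and uses helper names that are assumptions of this sketch. A small tolerance is used because, at the printed precision, path 1,2,5 (7.60 + 1.56) ties with the direct arc cost c15 = 9.16.

```python
import math

INF = math.inf


def floyd_all_pairs(n, costs):
    """All-pair shortest path costs for an undirected graph on nodes 1..n."""
    d = [[0.0 if i == j else INF for j in range(n + 1)] for i in range(n + 1)]
    for (i, j), c in costs.items():
        d[i][j] = d[j][i] = c
    for k in range(1, n + 1):
        for i in range(1, n + 1):
            for j in range(1, n + 1):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d


def path_cost(path, costs):
    """Sum of the arc costs along a path given as a list of nodes."""
    return sum(costs[(min(u, v), max(u, v))] for u, v in zip(path, path[1:]))


def paths_are_shortest(n, costs, paths, tol=1e-9):
    """True if every given path costs no more than the shortest distance."""
    d = floyd_all_pairs(n, costs)
    return all(path_cost(p, costs) <= d[p[0]][p[-1]] + tol for p in paths)


# Arc costs as printed in the example (original and modified).
original = {(1, 2): 10, (1, 3): 8, (1, 4): 4, (1, 5): 6, (2, 3): 20,
            (2, 4): INF, (2, 5): 2, (3, 4): 15, (3, 5): 12, (4, 5): INF}
modified = {(1, 2): 7.60, (1, 3): 8.73, (1, 4): 8.95, (1, 5): 9.16,
            (2, 3): 20, (2, 4): INF, (2, 5): 1.56, (3, 4): 10.05,
            (3, 5): 7.78, (4, 5): INF}
given_paths = [[1, 2], [1, 3], [1, 4], [1, 2, 5], [2, 5, 3],
               [2, 1, 4], [2, 5], [3, 4], [3, 5], [4, 3, 5]]
```

Calling `paths_are_shortest(5, modified, given_paths)` checks the modified costs, while the same call with `original` shows that the altered starting costs do not satisfy the requirement.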
6.4 Other Implementations
As the present inference network hardware limits the problem size that can be handled, 
other ways to implement the algorithm have also been investigated. These serve as a 
study on the performance of the algorithm on problems with a large number of nodes. 
6.4.1 Sequential Machine 
In evaluating the objective function, it is necessary to determine the shortest path between
the vertex pairs of the paths in P. Floyd's all-pair shortest path algorithm is used to
find the shortest paths. If the number of given paths, p, is small (p < n), using a
single-pair shortest path algorithm for each vertex pair would be more efficient. Table 6.1
shows the program run-time on a 19-cities problem.
Table 6.1: Run-time of a 19-cities problem
Algorithm used            Dijkstra                 Floyd
Number of expert paths    10      90      171      10      90      171
Run-time                  16.5    268.3   510.5    53.9    104.7   109.5
6.4.2 Parallel Machine 
Since the solution time depends heavily on the execution time of the shortest path algorithm,
a fast shortest path solver would reduce the solution time in solving the
inverse shortest paths problem.
In another implementation, Floyd's algorithm is replaced by a systolic array
implementation [85] of the inference network for solving the shortest path problem. Table 6.2 shows
the run-time of the program on a 19-cities problem.
Table 6.2: Run-time of a 19-cities problem on a parallel machine
                          With parallel machine    Without parallel machine
Number of expert paths    10      90      171      10      90      171
Run-time (s)              91.9    142.4   146.4    62.3    155.7   343.3
6.5 Discussion 
The analog binary relation inference network is shown to be able to perform as an
embedded engine in solving the inverse shortest paths problem. It was found to be
an efficient way of finding the shortest paths for all vertex pairs in a graph, as required
during the evaluation of the objective function in the nonlinear optimization process.
In the present implementation, there are overheads in incorporating the analog 
operating inference network. The inverse shortest paths problem solution time can be 
improved by removing the overheads in interfacing the host computer with the inference 
network. There are two major overheads: 
1. Network operations control 
As the inference network operates with analog signals, the cost values have to be 
converted to analog values. In the present implementation, this is accomplished 
with a data acquisition add-on card on the host computer, a PC. The add-on card 
also provides the digital interface to the inference network in reading the route 
information. To cope with the operating range of the analog inference network,
the set of costs needs to be scaled before being converted to analog signals and applied
to the inference network. In the previously described test run, the measured time
needed for converting and initializing the network is 0.15 ms.
2. Interpretation of the results 
The route indices obtained from the inference network only indicate the index 
of the contributing site in a shortest path, and they need to be converted to the 
corresponding node index in the graph. The measured time for finding the routes 
of the shortest paths is 0.8ms. 
When compared with the running time of the network, 10 μs, these overheads lower
the anticipated performance that can be obtained from the inference network. A suggested
system for removing the overheads from the host computer is shown in Figure 6.2.
The system is interfaced to the host computer as 3 memory blocks: one for storing the
costs, one for the resulting costs of the shortest paths and one for the route indices.
Solving a shortest path problem would then require initializing the memory block with
the arc costs, waiting for the end of operation in the inference network, and reading the
results from the other two memory blocks. This will remove most of the overheads in
operating the inference network as an embedded engine.
Figure 6.2: A system for solving shortest path problems (block labels: Addr, Data, Write, Enable, Decoder, DeMux, Mux, S/H, Encoder, RAM, network)
Chapter 7
Closed Semiring Optimization Circuits
Though the previous chapters have been focused on the implementation of the binary 
relation inference network in solving shortest path problems with analog processing 
circuits, there are other possible inference network applications, as discussed in [85]. For 
some applications, the binary relation inference network can also be implemented with 
self-timed digital circuits; this is in contrast to the conventional use of digital circuits. 
Self-timed, or asynchronous, operation avoids the problem of keeping a large
number of processing elements in synchronous operation, which usually runs into difficulties
in distributing the master clock [100]. In this chapter, some further applications of
the inference network and its practical implementations in electronic circuits will be 
investigated. 
Many path problems in directed graphs, including shortest path problems, can be
defined with an algebraic structure, the closed semiring. With the closed semiring
representation and a dynamic programming solution, an inference network solution
can be easily devised for solving the problems. An example is presented in this chapter.
7.1 Transitive Closure Problem 
7.1.1 Problem Statement 
Given a directed graph $G = (V, A)$, where $V$ is the vertex set and $A$ is the arc set, the
transitive closure of $G$ is defined as the graph $G^* = (V, A^*)$, where
$$A^* = \{(i, j) : \text{there is a path from vertex } i \text{ to vertex } j \text{ in } G\}$$
In the transitive closure $G^*$, an arc exists from i to j if and only if there is a
directed path from i to j in $G$. $G^*$ indicates the connectivity of $G$.
The transitive closure of a graph can be computed with Floyd's algorithm [16]. Each
arc in A is assigned a weight of 1. After running Floyd's algorithm on the adjacency
matrix D, if there is a path from i to j, the shortest-path weight $d_{ij}$ will be smaller
than N, the number of nodes in the graph. Otherwise, $d_{ij} = \infty$.
As noted in the definition of the transitive closure, two states are sufficient for
the representation of weights in the closure graph, G*. With a binary representation,
the arithmetic operations in Floyd's algorithm can be replaced with the logic operations
AND and OR; this reduces the computational load.
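As an illustration of replacing the arithmetic operations with AND and OR, a minimal sequential sketch follows. The function name and the 0-based node numbering are assumptions of this sketch; it is the sequential computation only, not the inference network.

```python
def transitive_closure(n, arcs):
    """Boolean version of Floyd's algorithm on nodes 0..n-1.

    reach[i][j] ends up True exactly when a directed path of at least
    one arc leads from i to j. min is replaced by OR and + by AND.
    """
    reach = [[False] * n for _ in range(n)]
    for i, j in arcs:
        reach[i][j] = True
    for k in range(n):
        for i in range(n):
            for j in range(n):
                reach[i][j] = reach[i][j] or (reach[i][k] and reach[k][j])
    return reach
```

For example, with arcs 0→1, 1→2 and 3→0 on four nodes, node 2 becomes reachable from nodes 0, 1 and 3, while nothing is reachable from node 2.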
Besides the sequential algorithm, parallel algorithms have also been developed for
finding the transitive closure of a graph [46, 49, 68, 86, 93, 96]. The following describes
an inference network solution which provides a compact implementation and a solution
time that is practically independent of the problem size.
7.1.2 Inference Network Solutions 
Solving with Analog Processing Inference Network (as a special case of shortest path problems)
As described above, the transitive closure G* can be found by a shortest path prob-
lem algorithm. Similarly, the shortest path problem inference network developed in 
Chapter 3 can be used in solving the transitive closure problem. 
For the transitive closure problem, each computational element (i, j) in the inference
network is to compute the connectivity between nodes i and j.
With the shortest path problem inference network, each CE (i, j) computes the
length of the path between i and j. The longest path in an N-node graph has N − 1
arcs, and the corresponding CE will then have its output at (N − 1)Va if each arc is of
length Va. CEs that correspond to disconnected arcs will be assigned a value of Vsat,
where Vsat is the saturated output voltage of the op-amps used in the inference network.
To differentiate between CEs that correspond to disconnected and connected arcs, it is
then required that
$$(N - 1)V_a < V_{sat}$$
At initialization, CEs correspond to arcs in A are initialized to ^ f f , where Vmax 二 
Vsat 一 <Hs a small positive value. Other CEs, i.e., CEs that correspond to initially 
disconnected arcs, are initialized to Vmax-
When the network settles, a CE output at Vsat indicates that there is no path 
between the corresponding vertices. A CE output less than Vmax indicates that the 
corresponding vertex pair is connected. As an example, consider the 5-node graph
in Figure 7.1. Since the shortest path problem inference network is developed for
undirected path problems, only an undirected graph is considered. Figure 7.2 shows
the measured response of the inference network in finding the transitive closure of the
5-node graph.
Figure 7.1: A map for the transitive closure problem
Figure 7.2: Measured network response in finding transitive closure
A Self-Timed Digital Inference Network 
As noted in the transitive closure definition, two states are sufficient in representing
the elements in G*: let 1 indicate that there is a path and 0 indicate that there is no
path between the corresponding node pair. A binary representation would naturally
lead to a digital implementation [85] of the inference network.
With logical operators, the site function $S_{ij}^{k}$ in each $CE_{ij}$ is given by
$$S_{ij}^{k} = CE_{ik} \wedge CE_{kj}$$
which states that there would be a path between nodes i and j if there are arcs
connecting nodes i, k and k, j. The unit function is given by
$$CE_{ij} = CE_{ij} \vee \Big( \bigvee_{k} S_{ij}^{k} \Big)$$
for $0 < i \ne j \ne k < N$, which states that there would be a path between nodes i, j if
there are paths passing through the other nodes.
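The behaviour of the site and unit functions can be illustrated with a small software model. The sketch below is an assumption-laden simplification: it updates all CEs in synchronous sweeps, whereas the circuit described in this chapter is self-timed, and the names `settle` and `chain` are hypothetical helpers introduced here.

```python
def settle(ce):
    """Apply the site (AND) and unit (OR) functions until nothing changes.

    ce is an n x n Boolean matrix of CE outputs; diagonal entries are
    ignored. Returns the settled matrix and the number of sweeps that
    changed at least one output.
    """
    n = len(ce)
    sweeps = 0
    while True:
        new = [row[:] for row in ce]
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                # Site k of CE(i, j): AND of CE(i, k) and CE(k, j).
                sites = (ce[i][k] and ce[k][j]
                         for k in range(n) if k != i and k != j)
                # Unit: OR of the current output with all site outputs.
                new[i][j] = ce[i][j] or any(sites)
        if new == ce:
            return ce, sweeps
        ce = new
        sweeps += 1


def chain(n):
    """Undirected chain 0-1-...-(n-1) as an initial CE matrix."""
    ce = [[False] * n for _ in range(n)]
    for i in range(n - 1):
        ce[i][i + 1] = ce[i + 1][i] = True
    return ce
```

In this model the longest path covered doubles with each sweep, so a 5-node chain settles after two changing sweeps. Because the functions are non-decreasing, a settled output of 1 can never return to 0 without an explicit re-initialization.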
The Circuit 
The proposed circuit in [85] maps the AND and OR operations required in the site
and unit functions into NAND operations to simplify the implementation. However,
one problem was not considered there. The unit and site functions in the CEs are
non-decreasing functions: once the network has stabilized, it cannot be set to an
arbitrary state, and this prevents the network from solving further problems. As a result, the
network has to be initialized to a new state by breaking up the interconnection of the
network. Figure 7.3 shows the circuit with the necessary initialization control. With
the initialization control START active, the interconnection breaks up and the output of
each processing element is set at 0.
Figure 7.3: A computational element for the transitive closure problem (signals: INIT, START, CE output)
In the inference network, each CE works asynchronously; no master clock is
needed, unlike in synchronously operating networks. The absence of a master clock eliminates
the synchronization problems that are typically found in large networks, such as clock
latency, clock skew and wiring delay.
Network response is limited by the propagation delay of the gates. Similar to the
inference network for the shortest path problem, the worst case response time is given
by
$$T_{sol} = \left(1 + \lceil \log_2 (n - 1) \rceil\right) T_{unit}$$
where $T_{unit}$ is the propagation delay of a CE. In the above circuit
$$T_{unit} = 2T_{NAND} + T_{AND}$$
where $T_{NAND}$ is the propagation delay of a NAND gate and $T_{AND}$ is the propagation
delay of an AND gate.
Compared with the analog operating inference network solution, the network response
is faster because of the smaller propagation delay of each CE.
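As a worked instance of the worst-case bound, take the 5-node example, reading the bound as $T_{sol} = (1 + \lceil \log_2(n-1) \rceil) T_{unit}$ with $T_{unit} = 2T_{NAND} + T_{AND}$. The gate delays are left symbolic because no values are given in the text.

```latex
\begin{align*}
T_{sol} &= \left(1 + \lceil \log_2 (5 - 1) \rceil\right) T_{unit}
         = (1 + 2)\, T_{unit} = 3\, T_{unit} \\
        &= 3\,(2T_{NAND} + T_{AND}) = 6\,T_{NAND} + 3\,T_{AND}
\end{align*}
```

Since $\log_2 4 = 2$ exactly, this value does not depend on how the bracket around the logarithm is rounded.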
7.2 Closed Semirings 
The ability of the analog processing network in solving the shortest path problem and 
the transitive closure problem is not incidental. Both problems are instances of an 
algebraic structure, the closed semiring. 
A closed semiring [16, 1] is a system $(S, \oplus, \odot, \bar{0}, \bar{1})$ consisting of the following elements: (i) $S$, a
set of elements; (ii) $\oplus$, the summary operator, and $\odot$, the extension operator, both
binary operators on $S$; (iii) $\bar{0}$ and $\bar{1}$, elements in $S$; and with the following properties:
1. $(S, \oplus, \bar{0})$ is a monoid:
• $S$ is closed under $\oplus$: $a \oplus b \in S$ for all $a, b \in S$.
• $\oplus$ is associative: $a \oplus (b \oplus c) = (a \oplus b) \oplus c$ for all $a, b, c \in S$.
• $\bar{0}$ is an identity for $\oplus$: $a \oplus \bar{0} = \bar{0} \oplus a = a$ for all $a \in S$.
Similarly, $(S, \odot, \bar{1})$ is a monoid.
2. $\bar{0}$ is an annihilator: $a \odot \bar{0} = \bar{0} \odot a = \bar{0}$ for all $a \in S$.
3. $\oplus$ is commutative: $a \oplus b = b \oplus a$ for all $a, b \in S$.
4. $\oplus$ is idempotent: $a \oplus a = a$ for all $a \in S$.
5. $\odot$ distributes over $\oplus$: $a \odot (b \oplus c) = (a \odot b) \oplus (a \odot c)$ and $(b \oplus c) \odot a = (b \odot a) \oplus (c \odot a)$
for all $a, b, c \in S$.
6. If $a_1, a_2, a_3, \ldots$ is a countable sequence of elements of $S$, then $a_1 \oplus a_2 \oplus a_3 \oplus \cdots$
is well defined and in $S$.
7. Associativity, commutativity, and idempotence apply to infinite summaries.
8. $\odot$ distributes over infinite summaries: $a \odot (b_1 \oplus b_2 \oplus b_3 \oplus \cdots) = (a \odot b_1) \oplus (a \odot b_2) \oplus (a \odot b_3) \oplus \cdots$
and $(a_1 \oplus a_2 \oplus a_3 \oplus \cdots) \odot b = (a_1 \odot b) \oplus (a_2 \odot b) \oplus (a_3 \odot b) \oplus \cdots$.
With the properties defined above, the closed semiring can be applied in defining
a calculus of directed graphs. Consider a directed graph $G = (V, E)$ and a labeling
function $\lambda : V \times V \to S$, which maps all the ordered pairs of vertices in $V$ into a codomain $S$.
Let the label of edge $(u, v) \in E$ be denoted by $\lambda(u, v)$. With the extension operator
$\odot$ operating on the edge labels, the label of a path $p = \langle v_1, v_2, \ldots, v_k \rangle$ is given by
$$\lambda(p) = \lambda(v_1, v_2) \odot \lambda(v_2, v_3) \odot \cdots \odot \lambda(v_{k-1}, v_k)$$
The summary operator $\oplus$ summarizes the path labels, with semantics defined specifically
for each application. The problem is then to find the summary of all path labels from
i to j for all vertex pairs $i, j \in V$:
$$l_{ij} = \bigoplus_{i \overset{p}{\leadsto} j} \lambda(p) \qquad (7.1)$$
The commutativity and associativity of $\oplus$ allow the summary operation to be taken in
any order. Properties 6 and 7 allow complex graphs to be considered, as a countably
infinite number of paths may exist in these graphs.
With different operations assigned to the extension and summary operators, the
summary in Eq. (7.1) corresponds to a number of well-known problems defined on
graphs. Some examples of closed semirings are:
1. Shortest path problem
The semiring $(\mathbb{R}^{\ge 0} \cup \{\infty\}, \min, +, \infty, 0)$ corresponds to the shortest path problem.
The shortest path problem is defined on the set of nonnegative real numbers,
$\mathbb{R}^{\ge 0} \cup \{\infty\}$. The labeling function returns the cost of the arc, $\lambda(i, j) = c_{ij}$. The
extension operator is the arithmetic operator $+$. The summary operator is the
min operator.
2. Transitive closure problem
The semiring $(\{0, 1\}, \vee, \wedge, 0, 1)$ corresponds to the transitive closure problem. The
labeling function $\lambda(i, j)$ returns 1 if there is an arc between i and j; otherwise it
returns 0. The summary operator is the logical OR operator while the extension
operator is the logical AND operator.
3. Minimum Spanning Tree problem
The semiring $(\mathbb{R} \cup \{\infty, -\infty\}, \min, \max, \infty, -\infty)$ finds the minimax for each pair
of vertices in a graph. Arcs whose minimax value is equal to the arc cost are
in the minimum spanning tree.
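The following is a dynamic programming algorithm for computing the summaries $l_{ij}$ of
all the path labels.
1. $n \leftarrow |V|$
2. for $i \leftarrow 1$ to $n$
3.     do for $j \leftarrow 1$ to $n$
4.         do if $i = j$
5.             then $l_{ij}^{(0)} \leftarrow \bar{1} \oplus \lambda(i, j)$
6.             else $l_{ij}^{(0)} \leftarrow \lambda(i, j)$
7. for $k \leftarrow 1$ to $n$
8.     do for $i \leftarrow 1$ to $n$
9.         do for $j \leftarrow 1$ to $n$
10.             do $l_{ij}^{(k)} \leftarrow l_{ij}^{(k-1)} \oplus \big( l_{ik}^{(k-1)} \odot (l_{kk}^{(k-1)})^{*} \odot l_{kj}^{(k-1)} \big)$
11. return $L^{(n)}$
Here $a^{*} = \bar{1} \oplus a \oplus (a \odot a) \oplus \cdots$ denotes the closure of $a$.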
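The dynamic programming scheme can be sketched generically, with the semiring passed in as a parameter. The class and function names below are assumptions of this sketch. It also assumes that the closure of every element equals the identity of the extension operator, which holds whenever that identity absorbs under the summary operator (as it does for the three semirings listed above), so the closure factor is dropped.

```python
import math
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass(frozen=True)
class ClosedSemiring:
    summary: Callable[[Any, Any], Any]    # summary operator
    extension: Callable[[Any, Any], Any]  # extension operator
    zero: Any                             # identity of the summary operator
    one: Any                              # identity of the extension operator


def compute_summaries(sr: ClosedSemiring, label: List[List[Any]]):
    """Summaries of all path labels; label[i][j] is sr.zero if no arc."""
    n = len(label)
    l = [[sr.summary(sr.one, label[i][j]) if i == j else label[i][j]
          for j in range(n)] for i in range(n)]
    for k in range(n):
        prev = [row[:] for row in l]
        for i in range(n):
            for j in range(n):
                # Closure factor omitted: assumed equal to sr.one.
                via_k = sr.extension(prev[i][k], prev[k][j])
                l[i][j] = sr.summary(prev[i][j], via_k)
    return l


shortest_path = ClosedSemiring(min, lambda a, b: a + b, math.inf, 0.0)
closure = ClosedSemiring(lambda a, b: a or b, lambda a, b: a and b,
                         False, True)
minimax = ClosedSemiring(min, max, math.inf, -math.inf)
```

The same routine then serves all three problems: only the operator pair and the two identity elements change, which mirrors how the site and unit functions are swapped in the inference network.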
7.3 Closed Semirings and the Binary Relation Inference Network
Problems described by the closed semiring structure are particularly well suited to be
solved on the binary relation inference network. Each computational element (i, j) in
the inference network would be responsible for determining $l_{ij}$, which is the summary
of all path labels from i to j. The site function then corresponds to the extension
operator while the unit function corresponds to the summary operator. Since the
summary operator is associative, there is no fixed order on the summary operation and
it is performed at the same time in the inference network. Instead of computing the
label summaries one at a time as suggested in the dynamic programming algorithm, the
inference network computes all the label summaries at the same time.
The shortest path problem and the transitive closure problem have been solved
on the inference network as described in the previous sections. Although the network
operations were defined independently of their closed semiring representation, they match
closely with the definition of the operators in the corresponding closed semiring
representation. In the following, the minimum spanning tree problem will be solved on the
inference network, in which the network operations are derived from the closed semiring
representation of the problem.
7.3.1 Minimum Spanning Tree 
Given an undirected graph $G = (V, A)$ with $n = |V|$ nodes, $m = |A|$ arcs, and with
a cost $c_{ij}$ associated with each arc $(i, j) \in A$, a spanning tree T of the graph G is a
connected acyclic subgraph that spans all the nodes in G. The cost of the spanning
tree is the sum of the costs of the arcs in the spanning tree. The minimum spanning tree is
the spanning tree with the smallest total cost.
Two algorithms that solve the minimum spanning tree problem are Kruskal's al-
gorithm [45] and Prim's algorithm [70]. Both algorithms work with a greedy strategy 
when growing the spanning tree, in which one arc from a candidate list is added at each 
iteration. They differ in the way the candidate list is selected. 
With the closed semiring representation, the inference network can solve the minimum
spanning tree problem in a compact realization. In the following, an implementation using
commercially available components will first be discussed.
An Inference Network Solution 
As noted in the closed semiring representation, if the inference network is to solve 
the minimum spanning tree problem, the site function needs to implement a maximizer 
operator while the unit function needs a minimizer operator, which is the same as that in 
an inference network for solving the shortest path problem. A possible implementation 
with standard-IC components will be discussed first. 
1. The Site 
At each site in a CE, a maximum is to be selected from the two site input signals.
Figure 7.4 shows the proposed circuit. Operation of the circuit is similar
to the minimum finding circuit in the unit of the shortest path problem inference
network. For op-amps with input value Vi smaller than Vout, the corresponding
output Yi will be driven towards the negative rail. Yi will then be isolated
from the node Vout because of the reverse-biased diode Di. For op-amps with
input Vi larger than Vout, the forward-biased diode closes the feedback loop
and a voltage follower results, driving Vout to the value of the larger input. The diodes Dis
are added to prevent the op-amp associated with the smaller input from going
into saturation. Suppose V1 is the smaller input; when Y1 is forced towards Vee,
the forward-biased D1s brings the negative terminal of U1 to V1/2 with Y1 at
−VDF. This stops Y1 falling further towards Vee. As a result, U1 will not
saturate.
Figure 7.4: A two-input maximizer with saturation guard
Figure 7.5: A minimizer
2. The Unit 
The unit function, being a minimum operator, is required to find the minimum 
among the site outputs. This is the same as the unit function in solving the 
shortest path problem. The circuit is shown in Figure 7.5, it is similar to the unit 
in the shortest path problem inference network (Figure 3.9, but with the adders 
removed). 
Figure 7.7 shows the simulated response of the network, which corresponds to the 
problem defined on the graph in Figure 7.6. 
Figure 7.6: A minimum spanning tree in a 4-node graph
Figure 7.7: Simulated network response
With START driven active, the outputs of CEs that represent arcs in A are initialized
to the corresponding arc costs. Outputs of the other CEs, which represent initially
disconnected arcs, are initialized to an arbitrary value that is larger than all the known
values. After initialization, the network begins converging to the solution when
START is driven inactive at t = 1 μs. When the network settles at t ≈ 6.5 μs, CEs having output
values the same as the initial values indicate that the corresponding arcs are in the
minimum spanning tree. If there is more than one minimum spanning tree in the
graph, all CEs corresponding to arcs in the minimum spanning trees will have the
output value the same as the initial value. The settling time of the network, similar
to the previous circuits, is determined by the large signal behavior of the op-amps.
However, for the minimum spanning tree problem, the solution time can be made
arbitrarily fast since any output with its value changed is not in the minimum spanning
tree. It would only be necessary to determine which CEs have stable outputs,
which indicates that they are in the minimum spanning tree.
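The initialize-then-settle procedure can be mimicked in software with a maximizer at each site and a minimizer at each unit. The sketch below is a simplified model under stated assumptions: updates are applied in synchronous sweeps, infinity stands in for the "arbitrary value larger than all the known values", and the 4-node graph in the usage note is a hypothetical example, not the graph of Figure 7.6.

```python
import math


def mst_arcs(n, costs):
    """Arcs whose settled CE output equals the initial arc cost.

    costs maps undirected arcs (i, j) with i < j on nodes 0..n-1 to
    their costs. Sites take the max of two CE outputs, units take the
    min of the CE's own output and all of its site outputs.
    """
    ce = [[math.inf] * n for _ in range(n)]
    for (i, j), c in costs.items():
        ce[i][j] = ce[j][i] = c
    while True:
        new = [row[:] for row in ce]
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                sites = [max(ce[i][k], ce[k][j])
                         for k in range(n) if k != i and k != j]
                new[i][j] = min([ce[i][j]] + sites)
        if new == ce:          # network has settled
            break
        ce = new
    return {arc for arc, c in costs.items() if ce[arc[0]][arc[1]] == c}
```

For a graph with arc costs {(0,1): 1, (1,2): 2, (2,3): 3, (0,2): 4, (1,3): 5, (0,3): 6}, the outputs of the three cheapest arcs never change, while the other three outputs drop below their initial costs, so only the former are reported as tree arcs.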
If the op-amps in the maximizer are allowed to go into saturation, they will cause
errors in the network. This is because once the op-amps saturate, they need a long
recovery time to come out of saturation when the corresponding input switches to being the
maximum input. Figure 7.9 shows the erratic behavior when the simple maximizer
in Figure 7.8 is used in the network. The final value of CE (2,4) should be equal to
0.43 V (corresponding to 10 in the graph). With the simple maximizer, the output of the
site that finds the maximum of (2,3) and (3,4) fails to respond to the changes at (2,3).
Instead of settling at 0.43 V, the site output tracks the value of (2,3) and causes an error in
the output of CE (2,4). When the op-amps are stopped from entering saturation, the
response time of the maximizer improves but there are still errors during the time the
network converges. This is observed in the dip of the output of CE (2,4) in Figure 7.7.
A faster op-amp is needed in the maximizer circuit. Figure 7.10 shows the correct
network response when a fast op-amp is used in the maximizer circuits.
Figure 7.8: A simple 2-input maximizer
Figure 7.9: An erratic network response
Figure 7.10: Network response with fast op-amps in the maximizer
7.3.2 VLSI Implementation 
This section describes the VLSI building blocks needed for constructing an inference
network for solving large scale minimum spanning tree problems. The VLSI building
blocks provide a more compact inference network realization. Similar to other
implementations of the binary relation inference network, there are two key function blocks,
the site function and the unit function. Interconnection of CEs is the same for all
inference networks and has been described for the shortest path problem inference networks.
Since the minimum spanning tree problem is defined on undirected graphs, the number
of required CEs is N(N − 1)/2 for an N-node graph.
The Site Function 
Figure 7.11 is a proposed circuit for the maximizer to be used in the site function. The
operating principle of the circuit is similar to the one in Figure 7.4, with the op-amp
replaced with the OTA. The output $V_{MAX}$ is given by
$$V_{MAX} = \frac{A V_1 - V_{DF}}{A + 1}$$
where A is the voltage gain of the OTA and $V_{DF}$ is the forward drop of the diode-connected
transistor $D_1$. $V_1$ is assumed to be the larger input in the above expression.
Figure 7.11: Maximizer with OTA
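To give a feel for the size of the error implied by the expression $V_{MAX} = (A V_1 - V_{DF})/(A + 1)$, the following worked example uses purely illustrative values ($A = 1000$, $V_1 = 1$ V, $V_{DF} = 0.7$ V), none of which are taken from the thesis.

```latex
\begin{align*}
V_{MAX} &= \frac{1000 \times 1\,\mathrm{V} - 0.7\,\mathrm{V}}{1001}
         \approx 0.9983\,\mathrm{V} \\
V_1 - V_{MAX} &= \frac{V_1 + V_{DF}}{A + 1}
         = \frac{1.7\,\mathrm{V}}{1001} \approx 1.7\,\mathrm{mV}
\end{align*}
```

Under these assumed values the output tracks the larger input to within about 2 mV, and the error shrinks in inverse proportion to $A + 1$.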
The Unit Function 
The unit function used in solving the minimum spanning tree problem is the same as 
that used in solving the shortest path problem, finding the minimum among the site 
outputs. 
Arcs in a Minimum Spanning Tree 
A comparator is required to check if an arc is a member of the minimum spanning tree. 
If the output value of the CE is equal to the initial value, the corresponding arc is in 
the minimum spanning tree. 
7.4 Conclusion
The closed semiring structure provides a systematic way of formulating a problem to be
solved on the binary relation inference network. The extension operator $\odot$ in a closed
semiring corresponds to the site function while the summary operator $\oplus$ corresponds
to the unit function. As an example and a further application of the binary relation
inference network, the minimum spanning tree problem is solved with a closed semiring
structure on the inference network. The parallel operating inference network provides




8.1 Summary of Achievements 
The binary relation inference network provides a platform for implementing analog or 
digital processing networks in solving many constrained optimization problems. The 
network consists of self-contained computational elements interconnected in a regular 
pattern. In a digital processing network, the asynchronous operating ability of the infer-
ence network removes the need to keep all the computational elements in synchronous 
operation, which is a tough problem in the case of large networks. Continuous-time 
domain operation allows implementation in analog processing circuits, which usually 
provide more compact forms for computation realization. 
This thesis investigates the possibility of implementing the binary relation inference 
network solutions with practical electronic circuits. In each computational element, 
there are two function blocks that are vital to the operation of the network: the site 
and the unit. Operations of the site and unit are determined by the desired problem to 
be solved on the network. Although conceptual designs of some of the blocks have been 
CHAPTER 8. CONCLUSIONS  88 
discussed previously, practical considerations require initialization circuits to be
introduced for proper network operation. In this thesis, three function blocks are designed:
an adder, a magnitude-preserving maximizer and a magnitude-preserving minimizer.
Operation of the function blocks has been verified with simulation and testing on
prototypes. A compact design has been achieved with VLSI circuits. With the designed
blocks, the shortest path problem, the transitive closure problem and the minimum spanning
tree problem have been solved on the resulting inference network. While the worst case
solution time of the inference network is $O(\log_2 N)$, which is network-size dependent
as with most of the previous solutions to these problems, the overlapped operations in
the computational elements promise size independence in practical situations.
In general, for an N-object network, there are $N^2$ computational elements in the
network, and each computational element has 2N outgoing links and 2N
input links. Because of this complexity, a compact realization of the
computational elements is required for building large networks. However, despite the
compactness of the computational elements, there would still be a limit on the number
of computational elements that can be placed on a chip. The structure of the inference
network does not allow simple cascading of small networks into larger networks. A
scheme has been devised to allow cascading of networks; however, the computational
elements in the network have to be designed for the maximum network size desired.
In analog inference networks, the accuracy and dynamic range of the computations
depend on the device characteristics of the components used in the circuits,
and would be inferior to those of digital inference networks. However, analog inference
networks usually provide a more compact realization. For applications that require
a resolution of around 8 bits, the analog inference network provides a compact
and fast problem solver.
In embedded applications, the overhead in controlling an analog inference
network with a digital host computer limits the potential speedup that can
be obtained from the inference network. A framework has been introduced to reduce
the overhead for embedded applications of the inference network. The objective of the
framework is to offload the housekeeping functions in operating the inference network
from the host computer. Function blocks are defined to control the operation of the
inference network with dedicated circuits. A necessary interface to a host computer has
also been defined.
The introduction of closed semirings provides a systematic way of building an infer-
ence network for the many path problems in directed or undirected graphs. Based on a
dynamic programming algorithm, the solution process can be carried out on the binary
relation inference network. The extension operator and summary operator in the closed
semiring representation correspond to the site and unit functions in the binary relation
inference network, respectively. Instead of finding the solutions sequentially, the
inference network finds the solutions for all pairs at the same time. The
minimum spanning tree solution on the binary relation inference network is obtained
from a closed semiring representation of the problem.
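The correspondence between semiring operators and network functions can be sketched in software. The following Python sketch is illustrative only and does not reproduce the thesis circuitry: the function name `closure`, the synchronous update scheme and the 4-node cost matrix are assumptions. Each entry d[i][j] plays the role of one computational element, `extend` plays the role of the site function and `summarize` that of the unit function. The spanning tree step assumes distinct edge weights and uses the known property that an edge belongs to the minimum spanning tree exactly when its weight equals the minimax path cost between its endpoints; whether the thesis uses this exact formulation is not confirmed here.

```python
from itertools import product

INF = float("inf")

def closure(direct, extend, summarize):
    """Relax all pairs in lock step until no entry changes.

    d[i][j] <- summarize(d[i][j], extend(d[i][k], d[k][j])) over all k.
    """
    n = len(direct)
    d = [row[:] for row in direct]
    for _ in range(n):  # at most n sweeps are needed for these examples
        new = [row[:] for row in d]
        for i, j in product(range(n), repeat=2):
            if i == j:
                continue
            best = d[i][j]
            for k in range(n):
                if k != i and k != j:
                    best = summarize(best, extend(d[i][k], d[k][j]))
            new[i][j] = best
        if new == d:
            break
        d = new
    return d

# Hypothetical undirected 4-node graph; INF marks a missing link.
W = [[0, 1, 4, INF],
     [1, 0, 2, 7],
     [4, 2, 0, 3],
     [INF, 7, 3, 0]]

# Shortest paths: extension = addition, summary = minimum.
shortest = closure(W, lambda a, b: a + b, min)

# Transitive closure of a directed relation: extension = AND, summary = OR.
R = [[False, True, False],
     [False, False, True],
     [False, False, False]]
reach = closure(R, lambda a, b: a and b, lambda a, b: a or b)

# Minimax paths: extension = maximum, summary = minimum.
minimax = closure(W, max, min)
mst = [(i, j) for i in range(len(W)) for j in range(i + 1, len(W))
       if W[i][j] != INF and W[i][j] == minimax[i][j]]
```

Swapping the operator pair changes the problem being solved while the relaxation structure stays the same.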
8.2 Future Work 
8.2.1 VLSI Fabrication 
Though the VLSI circuits for implementing the inference networks have been designed, 
fabrication work is required to verify the network operation under practical situations. 
In a previous attempt at chip fabrication (see Figure 8.1), the chips did not function
as expected. One of the reasons for the failure is an error in the off-chip analog output
buffer designed into the fabricated chip. A buffer capable of both positive and negative swing
is required; however, a positive-swing-only output buffer was put in the fabricated chip.
This made it impossible to observe the network behavior inside the chip. Further work
is required to complete a VLSI inference network chip.
Figure 8.1: Photograph of the fabricated chip
With the use of voltage mode operation in the present design, errors in the circuits
caused by device imperfections are minimized. However, the required operational am-
plifiers consume much chip area and power. A more promising approach is the use
of current mode circuits [29, 5, 98]. Although interfacing to real world signals would
require additional voltage/current conversion circuits, the compact size achieved with
current mode circuits would be more pronounced in large networks.
8.2.2 Network Robustness 
As there are many computational elements and interconnections in an inference net-
work, the behavior of the network in the presence of faulty computational elements
or interconnections is worth considering. Depending on the failure mode, the inference
network may exhibit some robustness. In a shortest path inference network,
if the output of a site in a CE is shorted to ground, it would bias
the shortest paths toward passing through it, as it provides a minimum cost arc in the path.
However, if the output is shorted to Vcc, it will not affect the solution provided that
there are alternative paths in the network.
A simulation study of the network behavior under faulty operation is desirable.
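A first software approximation of such a study is sketched below. It is a behavioral model only, under stated assumptions: the function name `solve`, the stuck-at fault dictionary, the 4-node cost matrix and the value 10.0 standing in for Vcc are all hypothetical, and no circuit-level effect is modeled. A site is identified by (i, j, k), meaning the site in CE (i, j) that normally outputs the cost of going from i to j through k.

```python
INF = float("inf")

def solve(cost, stuck=None, max_sweeps=20):
    """Synchronous min-plus relaxation with optional stuck-at site outputs.

    stuck maps (i, j, k) to the fixed output value of the site in CE (i, j)
    that would otherwise compute d[i][k] + d[k][j].
    """
    stuck = stuck or {}
    n = len(cost)
    d = [row[:] for row in cost]
    for _ in range(max_sweeps):
        new = [row[:] for row in d]
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                sites = [stuck.get((i, j, k), d[i][k] + d[k][j])
                         for k in range(n) if k not in (i, j)]
                # The unit takes the minimum over its own output and all sites.
                new[i][j] = min([d[i][j]] + sites)
        if new == d:
            break
        d = new
    return d

# Hypothetical 4-node problem; INF marks a missing direct link.
W = [[0, 1, 4, INF],
     [1, 0, 2, 7],
     [4, 2, 0, 3],
     [INF, 7, 3, 0]]

healthy = solve(W)
stuck_low = solve(W, {(0, 3, 2): 0.0})    # site output shorted to ground
stuck_high = solve(W, {(0, 3, 2): 10.0})  # site output shorted to the supply
```

In this model the grounded site forces the cost from node 0 to node 3 to zero, and the error spreads to other pairs that can route through that CE, whereas the site stuck at the supply is simply ignored by the minimizer because an alternative path through node 1 exists.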
8.2.3 Inference Network Applications 
Path problems such as the shortest path problem, the transitive closure problem and the mini-
mum spanning tree problem arise in many problem settings. It is
interesting to observe how the inference network solutions would help in solving these
problems. The inverse shortest paths problem, an embedded application
of the shortest path problem inference network, has been reported in Chapter 6. Other
possible applications that would benefit from the inference network solutions include
the maximum flow problem, the minimax path problem, etc.
8.2.4 Architecture for the Bellman-Ford Algorithm 
The closed semiring provides an elegant representation for the all-pairs path problems.
An N-node problem requires N^2 computational elements in an inference network.
Alternatively, the Bellman-Ford algorithm [7] may be used to implement an inference
network to solve single-destination path problems. A potential advantage is the reduced
number of CEs required in the network, resulting in a much simplified design for the
chip. This approach requires N chips to solve an all-pairs problem, but there would be
only N CEs on a chip, a design much simpler than that of solving the all-pairs problem
on a single chip.
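The single-destination formulation can be sketched as follows. This is a minimal Python sketch of the standard synchronous Bellman-Ford iteration, not a description of any proposed chip: the function name `bellman_ford_to` and the 4-node cost matrix are hypothetical. Each list entry d[i] stands for one CE holding the current cost estimate from node i to the destination, and one list per destination stands for one chip.

```python
INF = float("inf")

def bellman_ford_to(cost, dest):
    """Cost from every node to dest using N estimates, one per node."""
    n = len(cost)
    d = [INF] * n
    d[dest] = 0
    for _ in range(n - 1):  # n - 1 sweeps suffice without negative cycles
        new = [0 if i == dest else
               min(cost[i][j] + d[j] for j in range(n) if j != i)
               for i in range(n)]
        if new == d:
            break
        d = new
    return d

# Hypothetical 4-node problem; INF marks a missing direct link.
W = [[0, 1, 4, INF],
     [1, 0, 2, 7],
     [4, 2, 0, 3],
     [INF, 7, 3, 0]]

to_node_3 = bellman_ford_to(W, 3)

# An all-pairs solution repeats the single-destination network once per
# destination; all_pairs[t][i] is the cost from node i to node t.
all_pairs = [bellman_ford_to(W, t) for t in range(len(W))]
```

The trade-off is visible in the data layout: each destination needs only N estimates instead of N^2, at the price of N separate networks for the all-pairs case.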
Bibliography 
[1] Alfred V. Aho, John E. Hopcroft, and Jeffrey D. Ullman. The Design and Analysis 
of Computer Algorithms. Addison-Wesley, 1974.
[2] Alfred V. Aho, John E. Hopcroft, and Jeffrey D. Ullman. Data Structures and 
Algorithms. Addison-Wesley, Reading, Mass., 1983. 
[3] P.E. Allen and Douglas R. Holberg. CMOS Analog Circuit Design. Holt, Rinehart
and Winston: New York, 1987.
[4] K. Asanovic, N. Morgan, and J. Wawrzynek. 'Using Simulations of Reduced
Precision Arithmetic to Design a Neuromicroprocessor'. Journal of VLSI Signal
Processing, 6:33-44, 1992.
[5] I. Baturone, J. L. Huertas, A. Barriga, and S. Sanchez-Solano. 'Current-mode
Multiple-input Max Circuit'. Electronics Letters, 30(9):678-680, 1994.
[6] BBN Laboratories. Butterfly Parallel Processor Overview, June 1986. 
[7] D. Bertsekas and R. Gallager. Data Networks, chapter 5. Prentice-Hall Interna-
tional, Inc., second edition, 1992. 
[8] T. Blank. 'The MasPar MP-1 Architecture'. Proceedings of IEEE Compcon Spring 
1990, February 1990. 
[9] B. Boser and E. Sackinger. 'An Analog Neural Network Processor with Pro-
grammable Network Topology'. Digest of technical papers - IEEE International 
Solid-State Circuits Conference, pages 184-185, 1991. 
[10] W. J. Bouknight, S. A. Denenberg, D. E. McIntyre, J. M. Randall, A. H. Sameh,
and D. L. Slotnick. 'The Illiac IV System'. Proceedings of the IEEE, pages 369-
388, April 1972.
[11] J. A. Brzozowski, T. Gahlinger, and F. Mavaddat. 'Consistency and Satisfiability 
of Waveform Timing Specifications'. Networks, 21:91-107, 1991. 
[12] D. Burton and Ph.L. Toint. 'On an instance of the inverse shortest paths problem'. 
Mathematical Programming, 53:45-61, 1992. 
[13] Joongho Choi and Bing J. Sheu. 'A High-Precision VLSI Winner-Take-All Cir-
cuit for Self-Organizing Neural Networks'. IEEE Journal of Solid-State Circuits, 
28(5):576-583, 1993. 
[14] L. O. Chua and L. Yang. 'Cellular Neural Networks: Theory'. IEEE Trans.
Circuits Syst., 35:1257-1272, 1988.
[15] Andrzej Cichocki and Rolf Unbehauen. 'Neural Networks for Solving Systems of
Linear Equations - Part II: Minimax and Least Absolute Value Problems'. IEEE
Transactions on Circuits and Systems-II: Analog and Digital Signal Processing,
39(9):619-633, 1992.
[16] Thomas H. Cormen, Charles E. Leiserson, and Ronald L. Rivest. Introduction to 
Algorithms. The MIT Press, 1990. 
[17] G. B. Dantzig. 'All Shortest Routes in a Graph'. Technical Report 66-3, Opera-
tions Research House, Stanford University, 1966.
[18] David Gillespie and John Lazzaro. Chipmunk User's manual, 1990. 
[19] Department of Electrical Engineering and Computer Sciences, University of Cal-
ifornia, Berkeley, Ca. SPICE3 Version 3f User's Manual, 1992.
[20] Digital Equipment Corporation. DECChip 21064-AA RISC Microprocessor Pre-
liminary Data Sheet, 1992. 
[21] Digital Equipment Corporation. DECmpp Sx Architecture Specification, 1993. 
[22] Digital Equipment Corporation. DECmpp Sx System Overview Manual, 1993. 
[23] E. W. Dijkstra. 'A Note on Two Problems in Connexion with Graphs'. Nu-
merische Mathematik, 1:269-271, 1959.
[24] R. Dominguez-Castro, A. Rodriguez-Vazquez, J.L. Huertas, and E. Sanchez-
Sinencio. 'Analog Neural Programmable Optimizers in CMOS VLSI Technolo-
gies'. IEEE Journal of Solid-State Circuits, 27(7):1110-1116, 1992.
[25] Moritoshi Yasunaga et al. 'A Self-Learning Neural Network Composed of 1152
Digital Neurons in Wafer-Scale LSIs'. Proc. Int. Joint Conf. Neural Networks,
III:1844-1849, 1991.
[26] B. A. Farbey, A. H. Land, and J. D. Murchland. 'The Cascade Algorithm for
Finding All Shortest Distances in a Directed Graph'. Management Science, 14:19-
28, 1967.
[27] R. W. Floyd. 'Algorithm 97 - Shortest Path'. Communications of the ACM, 
5:345, 1962. 
[28] M. J. Flynn. 'Very High-Speed Computing Systems'. Proceedings of the IEEE, 
54:901-909, December 1966. 
[29] Barrie Gilbert. 'Current-Mode Circuits From a Translinear Viewpoint: A Tuto-
rial'. In C. Toumazou, F.J. Lidgey, and D.G. Haigh, editors, Analogue IC Design:
the Current-Mode Approach, pages 11-91. Peter Peregrinus, 1991.
[30] D. Goldfarb and A. Idnani. 'A numerically stable dual method for solving strictly 
convex quadratic programs'. Mathematical Programming, 27:1-33, 1983. 
[31] H.P. Graf, E. Sackinger, and L.D. Jackel. 'Recent Developments of Electronic 
Neural Nets in North America'. Journal of VLSI Signal Processing, pages 19-31, 
1993. 
[32] Roubik Gregorian and Gabor C. Temes. Analog MOS Integrated Circuits for 
Signal Processing. Wiley, 1986. 
[33] Y. He and E. Sanchez-Sinencio. 'MIN-NET Winner-Take-All CMOS Implemen-
tation'. Electronics Letters, 29(14):1237-1239, 1993. 
[34] W. D. Hillis. The Connection Machine. MIT Press, 1985.
[35] B. Hoeneisen and C.A. Mead. 'Fundamental Limitations in Micro-electronics-I. 
MOS Technology'. Solid-State Electronics, 15:819-829, 1972. 
[36] J. J. Hopfield. 'Neurons with Graded Response Have Collective Computational
Properties Like Those of Two-state Neurons'. In Proc. Natl. Acad. Sci., volume 81,
pages 3088-3092, 1984.
[37] T. Horiuchi, J. Lazzaro, A. Moore, and C. Koch. 'A Delay-line Based Motion
Detection Chip'. In R. P. Lippmann, J. E. Moody, and D. S. Touretzky, edi-
tors, Neural Information Processing Systems, volume 3, pages 406-412. Morgan
Kaufmann, 1991.
[38] Intel Corporation. Paragon XP/S Product Overview, 1991.
[39] Albert S. Jackson. Analog Computation. McGraw-Hill, 1960.
[40] G.E. Johnson and M.A. Townsend. 'An Acceptable-Point algorithm for Design 
Optimization'. In R.C Johnson, editor, Optimum Design of Mechanical Elements, 
pages 488-507. Wiley-Interscience, 1980. 
[41] L. G. Johnson and S. M. S. Jalaleddine. 'MOS Implementation of Winner-Take-
All Network with Application to Content-Addressable Memory'. Electronics Let-
ters, 27(11):957-958, 1991.
[42] M. P. Kennedy and L. O. Chua. 'Neural Networks for Nonlinear Programming'.
IEEE Transactions on Circuits and Systems, 35:554-5??, 1988.
[43] H. Kobayashi, J. L. White, and A. A. Abidi. 'An Active Resistor Network for 
Gaussian Filtering of Images'. IEEE Journal of Solid-State Circuits, 26:738-748, 
1991. 
[44] Patrick S.L. Koh. On Computing Shortest Paths from known Shortest Paths. 
Technical Report AR-92-03, City Polytechnic of Hong Kong, 1992. 
[45] J. B. Kruskal. 'On the Shortest Spanning Subtree of a Graph and the Travelling
Salesman Problem'. Proceedings of the American Mathematical Society, 7:48-50,
1956.
[46] S. Y. Kung and S. C. Lo. 'A Spiral Systolic Architecture/Algorithm for Tran-
sitive Closure Problems'. IEEE International Conference on Computer De-
sign, ICCD '85, pages 622-626, 1985.
[47] K.P. Lam. 'A Continuous-time inference network for minimum cost path prob-
lems'. In Proc. of the IEEE/INNS International Joint Conference on Neural
Networks, volume 1, pages 367-372, Seattle, 1991.
[48] K.P. Lam and C.J. Su. 'On a Binary Relation Inference Network'. In Proc. of
the IEEE 5th International Parallel Processing Symposium, pages 250-255, 1991.
[49] Hans-Werner Lang. 'Transitive Closure on an Instruction Systolic Array'. In-
ternational Conference on Systolic Arrays : proceedings, May 25-27, 1988, pages
295-304, 1988.
[50] J. Lazzaro. 'A Silicon Model of an Auditory Neural Representation of Spectral
Shape'. IEEE Journal of Solid-State Circuits, 26:772-777, 1991.
[51] J. Lazzaro, R. Ryckebusch, M. A. Mahowald, and C. A. Mead. 'Winner-Take-
All networks of O(N) complexity'. In D. Touretzky, editor, Advances in Neural 
Information Processing Systems, volume 1, pages 703-711. Morgan Kaufmann, 
1989. 
[52] Bang W. Lee and Bing J. Sheu. 'Design and Analysis of Analog VLSI Neural 
Networks'. In Bart Kosko, editor, Neural Networks for Signal Processing, pages 
229-286. Prentice-Hall, 1992. 
[53] T. Lengauer. Combinatorial Algorithms for Integrated Circuit Layout. Wiley, 1990.
[54] A. Lumsdaine, J. L. Wyatt, and I. M. Elfadel. 'Nonlinear Analog Networks for
Image Smoothing and Segmentation'. Journal of VLSI Signal Processing, 3:53-68,
1991.
[55] Robert N. Mayo, Michael H. Arnold, Walter S. Scott, Don Stark, and Gordon T. 
Hamachi. '1990 DECWRL/Livermore Magic Release'. WRL Research Report 
90/7, Digital Western Research Laboratory, 1990. 
[56] C. A. Mead, X. Arreguit, and J. Lazzaro. 'Analog VLSI Model of Binaural 
Hearing'. IEEE Transactions on Neural networks, 2:230-236, 1991. 
[57] Carver Mead. Analog VLSI and Neural Systems. Addison Wesley, 1989.
[58] Carver Mead and Mohammed Ismail. Analog VLSI Implementation of Neural 
Systems. Kluwer Academic Publishers, Boston, 1989. 
[59] R. W. Means. 'A New Two-dimensional Systolic Array for Image Processing and
Neural Network Applications'. Proc. Int. Joint Conf. Neural Networks, II:A-925,
1991.
[60] R. W. Means and L. Lisenbee. 'Extensible Linear Floating Point SIMD Neuro-
computer Array Processor'. Proc. Int. Joint Conf. Neural Networks, 1:587-592, 
1991. 
[61] MicroSim Corporation. The Design Center: Analysis Reference Manual, 1992. 
[62] MicroSim Corporation. The Design Center: Analysis User's Guide, 1992. 
[63] A. Moffat and T. Takaoka. 'An All Pairs Shortest Path Algorithm with Expected
Time O(n^2 log n)'. SIAM Journal on Computing, 6:1023-1031, 1987.
[64] T. Morishita, Y. Tamura, and T. Otsuki. 'A BiCMOS Analog Neural Network
with Dynamically Updated Weights'. In Proceedings of IEEE ISSCC '90, pages
142-143, 1990.
[65] A. F. Murray, D. Del Corso, and L. Tarassenko. 'Pulse-stream VLSI Neural 
Networks Mixing Analog and Digital Techniques'. IEEE Transactions on Neural 
Networks, 2:193-204, 1991. 
[66] A. F. Murray and A. V. W. Smith. 'Asynchronous VLSI Neural Networks Using 
Pulse-stream Arithmetics'. IEEE Journal of Solid-State Circuits, 23:688-697, 
1988. 
[67] National Semiconductor Corporation. Linear databook, 1982. 
[68] F. J. Nunez and M. Valero. 'A Block Algorithm for the Algebraic Path Problem
and its Execution on a Systolic Array'. International Conference on Systolic 
Arrays : Proceedings, May 25-27, 1988, pages 265-274, 1988. 
[69] William H. Press. Numerical recipes in C: the art of scientific computing. Cam-
bridge, 1988. 
[70] R. C. Prim. 'Shortest Connection Networks and some Generalizations'. Bell 
System Technical Journal, 36:1389-1401, 1957. 
[71] J. Rattner. 'Concurrent Processing: A New Direction in Scientific Computing'. 
AFIPS Proceedings, 54, 1985. 
[72] Y. Robert and D. Trystram. 'Systolic Solution of the Algebraic Path Problem'.
In W. Moore, A. McCabe, and R. Urquhart, editors, Systolic Arrays: Paper
Presented at the First International Workshop on Systolic Arrays, Oxford, 2-4
July 1986, pages 171-180. Adam Hilger, 1986.
[73] G. Rote. 'A Systolic Array Algorithm for the Algebraic Path Problem (Shortest 
Paths; Matrix Inversion)'. Computing, 34:191-219. 
[74] David E. Rumelhart and James L. McClelland. Parallel distributed processing:
explorations in the microstructure of cognition, volume 1. MIT Press, Cam-
bridge, Mass., 1986.
[75] E. Sackinger, B. E. Boser, J. Bromley, Y. LeCun, and L. D. Jackel. 'Application
of the Anna Neural Network Chip to High-speed Character Recognition'. In
Proceedings of International Joint Conference on Neural Networks, pages 498-
505, 1992.
[76] Mamoru Sasaki, Takahiro Inoue, Yuji Shirai, and Fumio Ueno. 'Fuzzy Multiple-
Input Maximum and Minimum Circuits in Current Mode and Their Analyses Us-
ing Bounded-Difference Equations'. IEEE Transactions on Computers, 39(6):768-
774, 1990. 
[77] S. Satyanarayana, Y. Tsividis, and H. P. Graf. 'A Reconfigurable VLSI Neural 
Network'. IEEE Journal of Solid-State Circuits, 27:67-81, 1992. 
[78] E. Seevinck. Analysis and Synthesis of Translinear Integrated Circuits. Amster-
dam: Elsevier, 1988.
[79] Charles L. Seitz. 'Concurrent Architectures'. In Robert Suaya and Graham
Birtwistle, editors, VLSI and Parallel Computation, pages 1-84. Morgan Kauf-
mann, 1990. 
[80] Sequent Computer Systems, Inc. Balance 8000 System Technical Summary, 1985. 
[81] Sequent Computer Systems, Inc. Symmetry Technical Reference Manual, 1991. 
[82] P. M. Spira. 'A New Algorithm for Finding All Shortest Paths in a Graph of
Positive Arcs in Average Time O(n^2 log^2 n)'. SIAM Journal on Computing, 2:28-
32, 1973.
[83] J. A. Starzyk and X. Fang. 'CMOS Current Mode Winner-Take-All Circuit with 
both Excitatory and Inhibitory Feedback'. Electronics Letters, 29(10):908-910, 
1993. 
[84] C. J. Su and K. P. Lam. 'Systolic Mapping of Inference Network for the All-Pair 
Shortest Path Problem'. In Proc. IEEE/INNS International Joint Conference on 
Neural Networks, pages 917-922, Singapore, November 1991. 
[85] Crystal Jinghua Su. A Binary Relation Inference Network For Constrained Op-
timization. PhD thesis, University of British Columbia, 1992. 
[86] R. Tamassia and J. S. Vitter. 'Optimal Parallel Algorithms for Transitive Closure
and Point Location in Planar Structures'. Proc. of ACM Symposium on Parallel
Algorithms and Architectures, pages 399-408, 1989.
[87] D. A. Tank and J. J. Hopfield. 'Simple "Neural" Optimization Networks: An A/D
Converter, Signal Decision Circuit, and a Linear Programming Circuit'. IEEE
Transactions on Circuits and Systems, CAS-33(5):533-541, 1986.
[88] Thinking Machines Corporation. The Connection Machine CM-5 Technical Sum-
mary, October 1991.
[89] C. W. Tong and K. P. Lam. 'Analog Implementations of a Binary Relation Infer-
ence Network for Minimum Cost Path Problems'. In Proc. IEEE International 
Conference on Neural Networks, pages 1081-1085, San Francisco, 1993. 
[90] C. W. Tong and K. P. Lam. 'An Embedded Connectionist Approach for the In-
verse Shortest Paths Problem'. To appear in Proc. IEEE International Conference 
on Neural Networks, Orlando, 1994. 
[91] C. W. Tong and K. P. Lam. 'Closed Semiring Optimization Circuits for Par-
allel and Distributed Computation'. Submitted to International Conference on
Parallel and Distributed Systems, 1994. 
[92] C. W. Tong and K. P. Lam. 'VLSI Implementation of Binary Relation Inference
Network in Solving Shortest Path Problems'. To appear in Proc. IEEE Interna-
tional Conference on Neural Networks, Orlando, 1994.
[93] A. A. Toptsis. 'Parallel Transitive Closure Computation in Highly Scalable Mul-
tiprocessors'. In Advances in Computing and Information-ICCI '91 : Interna-
tional Conference on Computing and Information : Proceedings, pages 197-206.
Springer-Verlag, 1991.
[94] Arthur Trew and Greg Wilson. Past, Present, Parallel : A Survey of Available
Parallel Computer Systems. Springer-Verlag, 1991.
[95] John Van Zandt. Parallel Processing in Information Systems : With Examples 
and Cases. Wiley, 1992. 
[96] B. F. Wang and G. H. Chen. 'Constant Time Algorithms for the Transitive Clo-
sure and Some Related Graph Problems on Processor Array with Reconfigurable 
Bus Systems'. IEEE Transactions on Parallel and Distributed Systems, pages 
500-507, 1991. 
[97] J. Wang. 'Electronic Realization of Recurrent Neural Network for Solving Simul-
taneous Linear Equations'. Electronics Letters, 28(5):493-495, 1992.
[98] Z. Wang. 'Current-Mode CMOS Integrated Circuits for Analog Computation and
Signal Processing: A Tutorial'. Analog Integrated Circuits and Signal Processing,
1:287-295, 1991.
[99] T. P. Washburne, M. M. Okamura, D. F. Speche, and W. A. Fisher. 'The Lockheed
Probabilistic Neural Network Processor'. Proc. Int. Joint Conf. Neural Networks,
1:513-518, 1991.
[100] Neil H. E Weste and Kamran Eshraghian. Principles of CMOS VLSI design : a 
systems perspective. Addison-Wesley, Reading, Mass., 1993. 
[101] Kenny Wu. 'Analog Implementation of a Continuous-Time Inference Network for 
5 cities Shortest Path Problems'. Elec 475 Project Report, University of British 
Columbia, April 1991. 
[102] William Allan Wulf, Roy Levin, and Samuel P Harbison. HYDRA/C.mmp, An
Experimental Computer System. McGraw-Hill, New York, 1981.
[103] L. Yang and W. K. Chen. 'An Extension of the Revised Matrix Algorithm'. In 
IEEE Proceedings of International Symposium on Circuits and Systems, pages 
1996-1999, 1989. 
[104] J. Y. Yen. 'Finding the Lengths of All Shortest Paths in N-node Nonnegative-
distance Complete Networks Using (1/2)N^3 Additions and N^3 Comparisons'. Journal
of the Association for Computing Machinery, 19:423-424, 1972.
[105] Shengwei Zhang and A. G. Constantinides. 'Lagrange Programming Neural Net-
works'. IEEE Transactions on Circuits and Systems-II: Analog and Digital Signal 
Processing, 39(7):441-452, 1992. 
Appendix A 
Detailed Schematic 
A.1 Schematic of the Inference Network Structures
A.1.1 Unit with Self-Feedback
Figure A.1 is the schematic of the unit in which the unit output OUT appears as an input to
the unit function. A hold circuit is necessary to prevent the loop dropout from causing a
decaying CE output as described in 3.1.2. U1A, U1B, U1C and U1D in Figure A.1 form
the holding circuit. The hold capacitor C1, with the help of U1A and D1, will track the
negative peak, i.e., the minimum of the signal at A, the multiplexer output. U1B is used
to bootstrap the peak follower output in driving the multiplexer and this compensates
for any dropouts in the feedback path. The negative input bias current at the op-amp
inputs will charge the hold capacitor and a compensating circuit [67] is employed to
reduce the effect of the leakage current.
A.1.2 Unit with Self-Feedback Removed 
Figure A.2 is the unit with the feedback action of the CE removed. The corresponding
input at the unit function is replaced with the direct path cost INIT.
A.1.3 Unit with a Compact Minimizer
Figure A.3 is the circuit in which diodes are used in the minimizer, resulting in a compact
realization.
A.1.4 Network Modules 
Figure A.4 shows the configuration of the prototype inference network. CEs are built
into modules plugged onto a network bus. Figure A.5 is a photograph showing a
fabricated module, with the network in the background.
A.2 Inference Network Interface Circuits 
Figure A.6 and Figure A.7 show the necessary circuits in interfacing the analog oper-
Figure A.4: Schematic showing connection of CE modules 
Figure A.5: A computational element assembled in a module
Appendix B 
Circuit Simulation and Layout 
Tools 
A number of CAD tools have been employed in the development of the circuits presented 
in the previous chapters. 
B.1 Circuit Simulation
The behavior of some of the standard-IC circuits is studied with simulation results from
PSPICE [62, 61], in which models of commercial ICs are readily available. Following
is a listing of the input file for a simulation of the inference network in solving a 5-node
shortest path problem.
Circuit for finding shortest path among 5 cities 
* Main circuit 
X2 1003 1008 1004 1009 1005 1010 start init_2 1002 node 
V2 init_2 0 1.00 
X3 1002 1008 1004 1014 1005 1015 start init_3 1003 node
V3 init_3 0 0.98 
X4 1002 1009 1003 1014 1005 1020 start init_4 1004 node
V4 init_4 0 0.14
X5 1002 1010 1003 1015 1004 1020 start init_5 1005 node 
V5 init_5 0 1.70 
X8 1002 1003 1009 1014 1010 1015 start init_8 1008 node 
V8 init_8 0 1.70 
X9 1002 1004 1008 1014 1010 1020 start init_9 1009 node
V9 init_9 0 1.70
X10 1002 1005 1008 1015 1009 1020 start init_10 1010 node
V10 init_10 0 0.31
X14 1003 1004 1008 1009 1015 1020 start init_14 1014 node
V14 init_14 0 0.52
X15 1003 1005 1008 1010 1014 1020 start init_15 1015 node
V15 init_15 0 1.70 
X20 1004 1005 1009 1010 1014 1015 start init_20 1020 node 
V20 init_20 0 0.55 
Vin start 0 pwl(0 0.0 1u 0.0 1.01u 1.5)
VCC $G_VCC 0 5 
VEE $G_VEE 0 -5 
* 
.subckt node 100 101 102 103 104 105 lstart linit lout 
X100 100 101 dout non_inverting_adder
X101 102 103 dout non_inverting_adder
X102 104 105 dout non_inverting_adder
X50 linit 199 $G_VCC $G_VEE 190 LM324
X51 199 lout $G_VCC $G_VEE lout LM324
R11 dout $G_VCC 10k
R12 199 $G_VCC 10k
S1 dout 199 lstart 0 control
D11 199 190 D1N4148
.ends 
* 
.subckt non_inverting_adder 201 202 2out 
R21 201 203 100k 
R22 202 203 100k 
R23 0 204 100k 
R24 204 2out 100k 
X21 203 204 $G_VCC $G_VEE 205 LM324
D21 2out 205 D1N4148 
.ends 
* 
.model control vswitch 
.lib linear.lib 
.lib diode.lib 
.options nobias noecho 
.tran 1.0u 20u 
.probe/csdf v(1002) v(1003) v(1004) v(1005) 
+ v(1008) v(1009) v(1010) 
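As a cross-check on what the simulated network would be expected to settle to, the same 5-city problem can be solved in software. The sketch below is illustrative and rests on assumptions read off the listing rather than stated in it: node 1000 + 5(i-1) + j is taken to hold the cost between cities i and j, the V2 to V20 source voltages are taken as the direct costs in volts (including the 1.70 V entries), and the function name `settle` is hypothetical. Circuit effects such as diode drops and op-amp offsets are ignored.

```python
# Direct costs in volts, copied from the V2..V20 sources in the listing.
# Assumed numbering: node 1000 + 5*(i-1) + j holds the cost between cities i and j.
init = {(1, 2): 1.00, (1, 3): 0.98, (1, 4): 0.14, (1, 5): 1.70,
        (2, 3): 1.70, (2, 4): 1.70, (2, 5): 0.31,
        (3, 4): 0.52, (3, 5): 1.70, (4, 5): 0.55}

def key(a, b):
    """Costs are symmetric, so store each pair once with the smaller city first."""
    return (a, b) if a < b else (b, a)

def settle(costs, cities=range(1, 6)):
    """Repeat d(i,j) <- min(d(i,j), d(i,k) + d(k,j)) until nothing changes."""
    d = dict(costs)
    changed = True
    while changed:
        changed = False
        for (i, j) in list(d):
            for k in cities:
                if k in (i, j):
                    continue
                via = d[key(i, k)] + d[key(k, j)]
                if via < d[(i, j)] - 1e-12:  # small margin against rounding noise
                    d[(i, j)] = via
                    changed = True
    return {pair: round(v, 2) for pair, v in d.items()}

expected = settle(init)
for (i, j), volts in sorted(expected.items()):
    print(f"node {1000 + 5 * (i - 1) + j}: cities {i}-{j} -> {volts:.2f} V")
```

Under these assumptions, the values printed per node number are the levels one would compare against the probed node voltages at the end of the transient run.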




For VLSI circuits, SPICE [19] simulations are used to verify the operation of the
proposed circuits. As an example, the input file for the minimizer is listed
below:
M1 min min node98 100 nmos1 L=4UM W=6UM
Vmc2 node1 node98 0
M2 node2 site_2 node3 1 pmos1 L=4UM W=6UM
M3 node3 min node1 1 pmos1 L=4UM W=6UM
M4 1 Vbias node3_b 1 pmos1 L=4UM W=6UM
Vmb node3 node3_b 0
M5 100 node2 node1 100 nmos1 L=4UM W=6UM
M6 node2 node2 100 100 nmos1 L=4UM W=6UM
M7 min min node99 100 nmos1 L=4UM W=6UM
Vmc1 node4 node99 0
M8 node5 site_1 node6 1 pmos1 L=4UM W=6UM
M9 node6 min node4 1 pmos1 L=4UM W=6UM
M10 1 Vbias node6 1 pmos1 L=4UM W=6UM
M11 100 node5 node4 100 nmos1 L=4UM W=6UM
M12 node5 node5 100 100 nmos1 L=4UM W=6UM
M16 min min node97 100 nmos1 L=4UM W=6UM
Vmc3 node8 node97 0
M17 node9 site_3 node10 1 pmos1 L=4UM W=6UM
M18 node10 min node8 1 pmos1 L=4UM W=6UM
M19 1 Vbias node10 1 pmos1 L=4UM W=6UM
M20 100 node9 node8 100 nmos1 L=4UM W=6UM
M21 node9 node9 100 100 nmos1 L=4UM W=6UM
M22 min min node96 100 nmos1 L=4UM W=6UM
Vmc4 node11 node96 0
M23 node12 site_4 node13 1 pmos1 L=4UM W=6UM
M24 node13 min node11 1 pmos1 L=4UM W=6UM
M25 1 Vbias node13 1 pmos1 L=4UM W=6UM
M26 100 node12 node11 100 nmos1 L=4UM W=6UM
M27 node12 node12 100 100 nmos1 L=4UM W=6UM
M13 node7 99 0 100 nmos1 L=4UM W=6UM
M14 1 node7 node7 1 pmos1 L=4UM W=6UM
M15 node999 node7 1 1 pmos1 L=4UM W=6UM
Vmob min node999 0
Vdd 1 0 5
Vee 100 0 -1
Vin1 site_1 0 0 sin(1.0 0.5 1000)
Vin2 site_2 0 0 sin(1.5 1 1000 0.75ms)
Vin3 site_3 0 0 pulse(2 0.69 0.3ms 100ns 100ns 0.2ms 10ms)
Vin4 site_4 0 0.7
Vbias Vbias 0 4.12
Vob 99 0 0.8
.tran 0.05us 2ms
.model nmos1 nmos (level=2 ld=0.25u tox=400e-10 nsub=1.85e+16 vto=.80
+uo=650 uexp=0.13 ucrit=7.0e+4 delta=1.5 vmax=5.0e+4 xj=0.2u neff=3.0
+rsh=34 cgdo=2.25e-10 cgso=2.25e-10 cj=2.88e-04 mj=0.65 cjsw=4.70e-10
+mjsw=0.3 pb=0.7)
.model pmos1 pmos (level=2 ld=0.25u tox=400e-10 nsub=6.0e+15 vto=-.80
+uo=245 uexp=0.35 ucrit=9.0e+4 delta=1.0 vmax=3.0e+4 xj=0.1u neff=1.5
+rsh=121 cgdo=2.15e-10 cgso=2.15e-10 cj=2.88e-04 mj=0.5 cjsw=4.00e-10
+mjsw=0.3 pb=0.7)
.END
B.2 VLSI Circuit Design 
Besides SPICE, the Chipmunk tool set [18] has also been used in verifying the oper-
ation of the circuits. Chipmunk provides a quick turnaround time in verifying design 
concepts through its integrated environment. However, the capability in simulating 
analog circuits is limited to circuits with positive power supply only. 
Figure B.1 shows a snapshot of a Chipmunk digital simulator session in verifying the
functionality of an inference network for the transitive closure problem (section 7.1.2).
Figure B.1: A session with the Chipmunk digital simulator
B.3 VLSI Circuit Layout 
The layout of the chip is done with MAGIC [55]. To ease the layout job, a modular-
ized approach is employed. Figure B.2 shows the module hierarchy in the layout of 




Figure B.2: Module hierarchy in the layout of the chip (computational elements built from functional modules: adder, minimizer, etc.)
modules in building the computational elements. The CE modules are then used to
build the network. Figure B.3 shows how site modules are stacked together to form a
computational element.




Figure B.3: A MAGIC session showing the stacking of site modules into a CE 





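Appendix C 
The Conjugate-Gradient Descent Algorithm 
The conjugate gradient algorithm is one of the general descent methods motivated by
a first order Taylor series expansion of a function $f(\mathbf{x})$ about a point $\mathbf{x}^k$. The 
expansion is given by 
$$f(\mathbf{x}^{k+1}) = f(\mathbf{x}^k) + \mathbf{g}^k \cdot \Delta\mathbf{x}^k \qquad \text{(C.1)}$$
where $\mathbf{g}^k$ is the gradient vector. 
Let 
$$\mathbf{x}^{k+1} = \mathbf{x}^k + \lambda^k \mathbf{s}^k \qquad \text{(C.2)}$$
with $\Delta\mathbf{x}^k = \lambda^k \mathbf{s}^k$, $\mathbf{s}^k$ being the search direction to be determined. From Eq. (C.1) and 
Eq. (C.2), 
$$\lim_{\lambda^k \to 0} \frac{f(\mathbf{x}^{k+1}) - f(\mathbf{x}^k)}{\lambda^k} = \mathbf{g}^k \cdot \mathbf{s}^k$$
This shows that if $\mathbf{g}^k \cdot \mathbf{s}^k < 0$, then $\exists\, \lambda^k > 0$ such that $f(\mathbf{x}^{k+1}) < f(\mathbf{x}^k)$. Repeated 
application of Eq. (C.1) will lead to the optimum $f(\mathbf{x}^*) < f(\mathbf{x})$ in $\| \mathbf{x} - \mathbf{x}^* \| < \delta$ for 
some $\delta > 0$. 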
Two directions $\mathbf{s}^i$ and $\mathbf{s}^j$ are conjugate with respect to a matrix $A$ if 
$$(\mathbf{s}^i)^T A\, \mathbf{s}^j = 0$$




$$[\ldots]^{k+1} = \frac{(\mathbf{g}^{k+1} - \mathbf{g}^k) \cdot \mathbf{g}^{k+1}}{(\mathbf{g}^{k+1} - \mathbf{g}^k) \cdot [\ldots]}$$
From convergence proofs, the step length is given by 
$$\text{local min}\; f(\mathbf{x}^k + \lambda \mathbf{s}^k), \quad \lambda > 0 \qquad \text{(C.4)}$$
An exact solution to Eq. (C.4) would involve excessive computations, so $\lambda$ is determined
by a golden section acceptable point search (GSAP). The GSAP search finds an 
acceptable point in the direction $\mathbf{s}^k$ by a golden section search with an initial bracketing 
search along the line originating at $\mathbf{x}^k$. 
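As a rough illustration of the procedure described above, the following Python sketch combines the update of Eq. (C.2) with a golden section line search for the step length of Eq. (C.4). It is not the GSAP routine used in this work. The doubling bracket, the tolerances, the function names (`bracket`, `golden_section`, `conjugate_gradient`), and the Polak-Ribiere form of the direction coefficient with a restart to steepest descent are all assumptions made for the example.

```python
import math


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def bracket(phi, step=1e-3, max_doublings=60):
    """Assumed bracketing rule: double the step until phi stops decreasing."""
    hi = step
    for _ in range(max_doublings):
        if phi(2.0 * hi) >= phi(hi):
            break
        hi *= 2.0
    return 0.0, 2.0 * hi


def golden_section(phi, a, b, tol=1e-8, max_iter=200):
    """Golden section search for a minimum of a unimodal phi on [a, b]."""
    r = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = b - r * (b - a), a + r * (b - a)
    fc, fd = phi(c), phi(d)
    for _ in range(max_iter):
        if b - a <= tol:
            break
        if fc < fd:                      # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - r * (b - a)
            fc = phi(c)
        else:                            # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + r * (b - a)
            fd = phi(d)
    return 0.5 * (a + b)


def conjugate_gradient(f, grad, x0, iters=50, gtol=1e-8):
    """Descent with x <- x + lam * s (Eq. C.2) and a line search for lam."""
    x = list(x0)
    g = grad(x)
    s = [-gi for gi in g]                # first direction: steepest descent
    for _ in range(iters):
        if math.sqrt(dot(g, g)) < gtol:
            break
        if dot(g, s) >= 0.0:             # g.s < 0 is needed for descent
            s = [-gi for gi in g]
        phi = lambda lam: f([xi + lam * si for xi, si in zip(x, s)])
        lo, hi = bracket(phi)
        lam = golden_section(phi, lo, hi)
        x = [xi + lam * si for xi, si in zip(x, s)]
        g_new = grad(x)
        # Assumed coefficient (Polak-Ribiere, clipped at zero as a restart).
        diff = [a - b for a, b in zip(g_new, g)]
        beta = max(0.0, dot(diff, g_new) / dot(g, g))
        s = [-gn + beta * si for gn, si in zip(g_new, s)]
        g = g_new
    return x
```

For a quadratic test function such as f(x) = (x0 - 1)^2 + 10 (x1 + 2)^2, calling `conjugate_gradient(f, grad, [0.0, 0.0])` with the matching gradient moves the iterate toward the minimizer (1, -2).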
Appendix D 
Shortest Path Problem on 
MasPar 
The MasPar [22, 21] is a massively parallel system with a single instruction multiple 
data (SIMD) architecture. The processing elements (PEs) in the MasPar are arranged in a 
two-dimensional toroidal-wrapped matrix. In a SIMD machine, all processing elements 
execute the same instruction stream on individual data. Each processing element can 
either execute or ignore an instruction based on a data-dependent condition code.
Communications among processing elements in the MasPar are provided via XNet routers 
and a global router. The XNet router allows communication with neighboring PEs, 
which can be performed on any number of PEs at the same time, but the communication
is limited to straight lines, and each sending PE must send data from the same 
internal address to a neighbor in the same relative geographical position. The global 
router allows communication from one PE to any other PE. 
In a matrix configuration of the inference network, each computational element 
has to communicate with all CEs that lie in the same row or the same column. A direct 
implementation of the inference network on the MasPar would involve routing across all 
PEs in the same row or column. Instead of a direct implementation, a systolic-like [84] 
mapping is used. This limits data to be routed to the nearest neighbors only and allows 
using the XNet communication capability in the MasPar. 
Each processing element in the MasPar corresponds to a computational element 
in the binary relation inference network. Shift chains are formed along the rows and 
columns of the PE matrix. The chains wrap around at the border of the matrix. 
In each PE, three variables are defined: one is the output of the processing element, one 
holds the value in the row shift chain, and the last holds the value in the column shift 
chain. At initialization, the output of each processing element is set to the direct path 
cost if it is known, or to an arbitrarily large value if it is unknown. Outputs of 
diagonal PEs are set to zero. The row and column shift chains are initialized as shown 
in Figure D.1. 
[Figure: a 4 x 4 array of processing elements labeled 0,0 to 3,3, with the row shift chain and the column shift chain marked; legend: v = h = (2n - x - y) % n, n = number of nodes in graph] 
Figure D.1: The MasPar at initialization 
During each iteration step, the current output is compared with the 
sum of the current values in the row and column shift chains. The output is updated 
if the sum is smaller than the current output. The chains are then shifted right and 
down by one element. This shift and compare operation is repeated N times, which 
then completes one iteration step. At each shift and compare, the operation in each 
processing element is described by 
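$$g(i,j) = \min\bigl(g(i,j),\; r(i,j) + c(i,j)\bigr), \qquad 1 \le i, j \le N$$
where $r(i,j)$ and $c(i,j)$ are the current row and column shift chain values held in PE $(i,j)$. 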
When N loops are completed in each iteration step, the operations involved are the 
same as those defined in the inference network. The iteration is stopped when there are 
no changes in the output of the PEs or log(N - 1) steps are finished, whichever comes 
first. The output value of each PE is the cost of the shortest path of the corresponding 
vertex pair. The steps of the algorithm are as follows. 
Initialize PE output 
Initialize row shift chain 
Initialize column shift chain 
do 
    change=0 
    repeat N times 
        for each PE 
            if PE output>row+column 
                PE output = row+column 
                increment change 
        Shift row chain right 
        Shift column chain downward 
while change>0 and fewer than log(N - 1) steps are finished 
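The shift and compare scheme can be imitated sequentially. The Python sketch below is only a simulation under stated assumptions, not MasPar code. The function name `shortest_paths_systolic` is hypothetical, the chains are assumed to be reloaded from the PE outputs at the start of every iteration step, floating-point infinity stands in for the arbitrarily large value, and the logarithm in the stopping rule is taken to base 2. The skew index (2n - x - y) % n is the one shown in Figure D.1.

```python
import math

INF = float("inf")  # stands in for the "arbitrarily large value"


def shortest_paths_systolic(cost):
    """Simulate the row/column shift-chain scheme on an n x n grid of PEs.

    cost[i][j] is the direct path cost from vertex i to vertex j,
    or INF if no direct path is known.
    """
    n = len(cost)
    # PE outputs: direct costs, with zeros on the diagonal.
    out = [[0 if i == j else cost[i][j] for j in range(n)] for i in range(n)]
    # Assumed base-2 bound on the number of iteration steps.
    max_steps = max(1, math.ceil(math.log2(max(n - 1, 1))))
    for _ in range(max_steps):
        # Assumption: chains are reloaded from the current outputs.
        # PE (x, y) starts with out[x][k] and out[k][y], k = (2n - x - y) % n.
        row = [[out[x][(2 * n - x - y) % n] for y in range(n)] for x in range(n)]
        col = [[out[(2 * n - x - y) % n][y] for y in range(n)] for x in range(n)]
        change = 0
        for _ in range(n):                       # N shift-and-compare loops
            for x in range(n):
                for y in range(n):
                    total = row[x][y] + col[x][y]
                    if out[x][y] > total:
                        out[x][y] = total
                        change += 1
            # Right shift with wrap-around: PE (x, y) receives from (x, y-1).
            row = [[row[x][(y - 1) % n] for y in range(n)] for x in range(n)]
            # Down shift with wrap-around: PE (x, y) receives from (x-1, y).
            col = [[col[(x - 1) % n][y] for y in range(n)] for x in range(n)]
        if change == 0:
            break
    return out
```

With this skew, both chains in PE (x, y) always carry values for the same intermediate vertex k, and k runs through all N vertices over the N shifts, so only nearest-neighbor transfers are needed.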












































