An evaluation of the directed flow graph methodology by Rajala, S. A. & Snyder, W. E.
Produced by the NASA Center for Aerospace Information (CASI) 
https://ntrs.nasa.gov/search.jsp?R=19840018256 2020-03-20T22:05:00+00:00Z
AN EVALUATION OF THE DIRECTED FLOW GRAPH METHODOLOGY
By
Wesley E. Snyder
Sarah A. Rajala
Final Report
to
National Aeronautics & Space Administration
Grant NAG 1-20
(NASA-CR-173593) AN EVALUATION OF THE DIRECTED FLOW GRAPH METHODOLOGY
Final Report (North Carolina State Univ.) 61 p HC A04/MF A01 CSCL 09B
N84-26324 Unclas G3/61 19444
Department of Computer and Electrical Engineering
North Carolina State University
Box 7911
Raleigh, NC	 27695-7911
May 1984
Table of Contents
1. Introduction
2. Design of the Image Labeling System
2.1 Algorithm Description
2.2 Circuit Description
2.2.1 CAM1 Chip
2.2.2 CAM2 Chip
2.2.3 System Description
3. DGM Description of System
3.1 Description of DGM
3.2 Using DGM To Construct A Data Flow Graph
3.3 Modeling the Region Labeling System Using DGM
4. Evaluation
5. Conclusion
1. Introduction
The purpose of this project was to evaluate the applicability of the
Directed Graph Methodology (DGM) to the design and analysis of special purpose
image and signal processing hardware. To this end, a special purpose image
processing system was designed and described using DGM. The design, suitable
for VLSI, implements an innovative region labeling technique. The utility of
DGM was evaluated using this design.
Two chips were designed, both using NMOS technology, as well as a
functional system utilizing those chips to perform real-time region
labeling. The system was described in terms of DGM primitives.
As a result of this effort, it was concluded that DGM, as it is currently
implemented, is inappropriate for describing synchronous, tightly coupled,
special purpose systems. Instead, the nature of the DGM formalism lends
itself much more readily to modeling of networks of general-purpose
processors.
Section 2 of this report describes the image labeling system, including
the two custom chips which were designed.
Section 3 provides an overview of DGM, and then shows how the special
purpose design may be described using DGM.
Section 4 describes and justifies the conclusion that DGM is inappropriate
for describing special purpose signal processing systems.
Details are contained in the appendices.
2. Design of the Image Labeling System
DGM was evaluated in the design of a hardware system for region labeling.
The purpose of this circuit is to partition an image into a set of meaningful
regions, and to do so "on the fly" with a single pass over the data. 	 These
partitioned regions are composed of all pixels that have similar attributes
and have a four-neighborhood connectivity.
One technique for assigning pixels to regions is known as "region
growing." The region growing technique is initiated by choosing a pixel which
meets some criteria (e.g. grey level above threshold) for inclusion in a
region. The algorithm then proceeds by examining all adjacent neighbors of
the pixel and comparing that pixel with the neighbor in question. Typical
measures of similarity include the magnitude of the neighboring pixel's grey
level or the relative contrast between the pixel and its neighbor under
consideration for inclusion in the region. This process is repeated
recursively for all newly accepted pixels until no new pixels can be added to the
region. Since the region-growing technique always results in closed regions,
this technique is often preferable to other techniques which are based on edge
detection or line fitting.
The algorithm for region labeling incorporated into the system
architecture described in this report differs from traditional region growing in that
it performs the assignment of pixels in a sequential, raster-scan fashion
rather than using a recursion. For this reason, it is potentially orders of
magnitude faster than recursive region growing. It is a technique based on
the concept of equivalence relationships between pixels of the image.	 The
regions are labeled in a single pass over the image by utilizing a
content-addressable memory.	 Appendix 1 provides the theoretical foundation
for the algorithm described herein.
2.1 Algorithm Description
Two pixels a and b are defined to be equivalent (designated R(a,b)) if
they belong to the same region of an image. This relationship can be shown to
be reflexive (R(a,a)), symmetric (R(a,b)=>R(b,a)) and transitive (R(a,b) AND
R(b,c)=>R(a,c)).
The transitive property enables all pixels in a region to be determined by
considering only local adjacency properties. In this algorithm, each pixel
will be compared with each adjacent pixel in a left-to-right, top-to-bottom
raster scan fashion. Pixels in a simple binary image are labeled in
raster scan order.
The system in this report assigns labels to pixels maintained in a table
of equivalence relationships. Figure 2 shows that this hardware resides
between the image memory and a host computer.
If two pixels meet some criterion (in the case of a binary image, both
pixels are at logic 1) and they are adjacent, then they are in the same
region. By definition, if two pixels are in the same region, then R(a,b)
holds.
That is,
ADJACENT(<x,y>, <x',y'>) AND |I(x,y) - I(x',y')| < T <=> R(<x,y>, <x',y'>).
The transitive property of R cannot be used to infer
R(<x,y>, <x',y'>) => |I(x,y) - I(x',y')| < T
without also considering the adjacency property.
As the region partitioning proceeds in real-time (i.e. synchronously with
the raster scan), two activities must be performed. First, the M memory must
be loaded with the region label number of each pixel under consideration, and
second, the CAM memory must be updated with all equivalence relationships
discovered. For example, if region 4 is actually identical to region 2, then
both CAM(2) and CAM(4) will contain 2 (the lower numbered region label takes
precedence). Hence, when the host computer interrogates pixel (x,y) of the M
memory, the interface/processor interprets M(x,y) in terms of the CAM memory
and returns CAM(M(x,y)) to the computer. Whenever an equivalence relationship
is detected, all locations in the K memory containing the larger region label
number are loaded with the smaller region label number. 	 While the execution
of this step in real time is not within the capabilities of conventional
random access memories, it is within the capability of the content-addressable
memories.
The architecture used to implement the algorithm is shown in figure 1.
The architecture contains four major components: image memory (I), region label
memory (M), equivalence CAM memory (K), and an interface/processor. The region
labels assigned to individual pixels are contained in the region label memory.
However, the contents of the M memory also include all intermediate region
labels for which equivalence labels were determined.
The M memory is a conventional random access memory. 	 However, the
equivalence memory has two modes of operation.	 It may be used as a
conventional RAM where the address in corresponds to the region table, and
data out is the equivalent table. 	 In the associative memory mode, it is used
to update that table. 	 In this mode, two activities occur in synchronism with
a 2-phase clock:
Phase 1--all memory cells whose contents match the contents of the data
bus, set their corresponding enable flip-flops. (see figure 5)
Phase 2--all memory cells whose enable flip-flops are set, read the
contents of the data bus.
This operation effectively updates the equivalence table in parallel
during the scan.
Thus when two regions are found to be identical (step 6 below), all
locations in the CAM memory containing the larger region number are changed to
the smaller region label number, thus allowing regions to be grown in a single
pass over the image.
Algorithm: Region Growing
C - current pixel
N - previous pixel to C on current scan line
A - pixel from previous scan line which is "topographically" above
    current pixel
P - previous pixel to A on previous scan line

    P A
    N C

Square template for region growing
Let the initial label number, K=1
Scan the image from left to right and top to bottom. f(i) refers to the
image brightness at point i. In this description only binary-valued images
are considered. The extension to grey-valued images is straightforward.
In the conditions and pixel layouts below, x (or X) marks a pixel whose
value does not matter.

                                                       Pixel Layout
1. If f(C) = 0                                         X X
   then label(C) = 0                                   X 0
   else
   begin
2.   If f(N) = f(C) = 1 and f(A) = 0 and f(P) = x      X 0
     then label(C) = label(N)                          1 1
3.   If f(P) = f(A) = f(N) = f(C) = 1                  1 1
     then label(C) = label(N)                          1 1
4.   If f(A) = f(C) = 1, and f(P) = x, and f(N) = 0    X 1
     then label(C) = label(A)                          0 1
5.   If f(C) = 1 and f(A) = f(N) = 0 and f(P) = x      X 0
     then label(C) = K; CAM(K) = K                     0 1
          K = K+1              ; A new region
6.   If f(C) = f(A) = f(N) = 1 and f(P) = 0            0 1
     then                                              1 1
7.     If label(A) < label(N)
       then
         label(C) = label(A)
         CAM(N) = CAM(A) (update)
       else
         label(C) = label(N)
         CAM(A) = CAM(N) (update)
   Continue till finished
   END
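The steps above can be sketched in software. The following Python sketch is an illustration only, not the hardware design: the function name label_regions, the list-based stand-ins for the M memory and the CAM, and the choice to compare CAM-resolved labels (rather than raw labels) in step 7 are all assumptions made for the example. Out-of-image neighbors are assumed to be 0.

```python
def label_regions(image):
    """Single-pass, raster-scan labeling of a binary image (4-connectivity).

    image is a list of rows of 0/1 values. Returns the resolved label map,
    i.e. CAM(M(x,y)) for every pixel. All names here are illustrative.
    """
    rows, cols = len(image), len(image[0])
    m = [[0] * cols for _ in range(rows)]  # stand-in for region label memory M
    cam = [0]                              # stand-in for the equivalence table; entry 0 is background

    for y in range(rows):
        for x in range(cols):
            if image[y][x] == 0:           # step 1: label(C) = 0
                continue
            above = m[y - 1][x] if y > 0 else 0   # label of A
            prev = m[y][x - 1] if x > 0 else 0    # label of N
            if above == 0 and prev == 0:   # step 5: a new region
                k = len(cam)
                cam.append(k)              # CAM(K) = K
                m[y][x] = k
            elif above == 0:               # step 2: copy label from N
                m[y][x] = prev
            elif prev == 0:                # step 4: copy label from A
                m[y][x] = above
            else:                          # steps 3, 6 and 7: A and N both labeled
                small, large = sorted((cam[above], cam[prev]))
                if small != large:
                    # Associative update: every entry holding the larger
                    # label is rewritten with the smaller one.
                    cam = [small if v == large else v for v in cam]
                m[y][x] = small

    # The host reads CAM(M(x,y)), not M(x,y) directly.
    return [[cam[v] for v in row] for row in m]
```

For a U-shaped region, the two arms first receive labels 1 and 2 and are merged when the scan reaches the bottom row, so every foreground pixel reads back as region 1.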
2.2 Circuit Description
2.2.1 CAM1 Chip
This content-addressable memory contains the equivalencies between
regions and has two modes of operation. In the first mode, it behaves like a
conventional RAM and is used in this mode when a new region is encountered.
The first pixel in a new region cannot be equivalent to any other region.
Therefore, each cell in the CAM is initialized to contain its own address.
This is illustrated in step 5 of the algorithm. CAM(i) refers to the contents
of address i in the CAM. Thus, initially, CAM(i)=i.
In the associative memory mode, the CAM updates the equivalencies. When
the chip is in this mode, two functions occur in synchronism. The word to be
updated is placed on the data bus of the CAM. All memory cells whose contents
match this word set their flip-flops. Next, the replace word is placed on the
data bus and all memory cells whose flip-flops were set are now changed to the
replace word. This operation has now merged all regions which were found to be
equivalent. An individual cell in the CAM may be found at different times to
be equivalent to many different regions and be updated several times as a
result.
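The two modes of operation can be illustrated with a small software model. This is a minimal sketch under stated assumptions: the class name EquivalenceCAM and its method names are hypothetical, and the per-cell enable flip-flops are modeled as a list of booleans. It is not a description of the chip's circuitry.

```python
class EquivalenceCAM:
    """Illustrative software model of a CAM with RAM and associative modes."""

    def __init__(self, size):
        self.cells = list(range(size))   # initially CAM(i) = i
        self.flags = [False] * size      # one enable flag per cell

    # RAM mode
    def read(self, address):
        return self.cells[address]

    def write(self, address, value):
        self.cells[address] = value

    # Associative mode, first function: cells matching the bus set their flags.
    def match(self, word):
        self.flags = [cell == word for cell in self.cells]

    # Associative mode, second function: flagged cells load the replace word.
    def replace(self, word):
        self.cells = [word if flag else cell
                      for cell, flag in zip(self.cells, self.flags)]
        self.flags = [False] * len(self.cells)

    def union(self, old_label, new_label):
        """Merge: every cell holding old_label is rewritten with new_label."""
        self.match(old_label)
        self.replace(new_label)
```

For example, after `cam = EquivalenceCAM(5)` and `cam.union(3, 1)`, cell 3 holds 1; a later `cam.union(1, 0)` rewrites both cell 1 and cell 3, which shows how one cell can be updated several times.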
Figure 2 shows a block diagram of the CAM 1 chip, illustrating the use of the
common data bus and enable flip flop. Appendix 2 contains a complete
description of the CAM 1 chip, as well as simulation and performance analysis
results.
2.2.2 CAM2 Chip
The purpose of the CAM2 chip is to update the current scan line when an
equivalence is found; as a result, this will eliminate the time consuming read
to CAM1.
In steps 6 and 7 of the algorithm, an equivalence between two regions is
found. Here, CAM1 has to be told, for instance, that region 3 is equivalent to
region 1. That is, at cell 3 in the CAM1, a data 1 needs to be written. Also,
before the next pixel can be interrogated, M memory will be written with the
smaller of these two labels. (In this case, a 1 is written into M.)
If all pixels on the current scan line that have been labeled as region 3
have not already been changed to region label 1, a read to CAM1 will be
necessary to find out if region 3 is equivalent to any other region. Instead
of having to read cell 3 of CAM1 (a slow process), CAM2 was designed to change
all region labels that were labeled as a 3 to region label 1 on the current
scan line. The CAM2 chip needs only to hold one raster scan line of labeled
regions to perform this function. Figure 3 shows a block diagram of the CAM2
chip, and figure 4 shows the circuit layout.
The CAM2 chip (Figure 3) consists of eight input pins called VLN, and
two more sets of eight input pins called Replace and Compare. The chip has an
output port called VLA and three control lines: latch, replace, and VLA
enable. The chip behaves as a regular shift register except when it is given a
replace control signal.
When the replace signal is high, every word on the previous raster scan
line is bit by bit compared with the eight bit compare register. Every word
which is "true" to this compare operation will at the trailing edge of p+At be
replaced by the contents of the replace register. If the replace control line
was not high the words are not clocked again. The inputs replace and compare
are not latched by the CAM2 package and are assumed to be valid throughout the
duration of the replace command.
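The behavior described above, a shift register with a parallel compare-and-replace, can be sketched as follows. This is an illustrative model only: the class name ScanLineCAM and its methods are hypothetical, clocking is ignored, and the line width is a parameter chosen for the example.

```python
class ScanLineCAM:
    """Illustrative model of a one-scan-line label store with replace."""

    def __init__(self, width):
        self.words = [0] * width         # one label per pixel of a scan line

    def shift(self, label_in):
        """Shift a new label in; return the oldest label (the one 'above')."""
        label_out = self.words.pop(0)
        self.words.append(label_in)
        return label_out

    def replace(self, compare, replacement):
        """Rewrite every stored word equal to compare with replacement."""
        self.words = [replacement if word == compare else word
                      for word in self.words]
```

With a four-pixel line holding labels 3, 1, 3, 2, calling `replace(3, 1)` leaves 1, 1, 1, 2, so the next pixel sees the merged label without a read of CAM1.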
2.2.3 System Description
The four pixels (binary valued) to be tested by the hardware are defined
as follows:
Previous Line    P    A
Current Line     N    C
C - Current pixel [ any pixel to the right of C is currently undefined ]
N - Previous pixel to C on current scan line
A - pixel from previous scan line which is "topographically" above current
pixel
P - previous pixel to A on previous scan line
The following six test conditions satisfy all possible logical combinations
for a four neighbor connectivity and serve as appropriate control
signals.
(NOT C)   C N (NOT A)   A C N P   C A (NOT N)   (NOT A) C (NOT N)   A C N (NOT P)
Only one condition will be true at any given pixel evaluation.
Refer to figure 5 for the system block diagram.
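As a check on the claim that exactly one condition holds for any pixel, the six test conditions can be written as a small decision function. This is a sketch for illustration; the function name classify and the integer case numbers as return values are assumptions, and the hardware derives these conditions as control signals rather than by sequential tests.

```python
def classify(p, a, n, c):
    """Return the case number (1 to 6) for the four binary test pixels.

    p, a, n, c are the values (0 or 1) of pixels P, A, N and C.
    """
    if not c:
        return 1          # (NOT C): current pixel is 0
    if n and not a:
        return 2          # C N (NOT A): copy label from N
    if a and n and p:
        return 3          # A C N P: copy label from N
    if a and not n:
        return 4          # C A (NOT N): copy label from A
    if not a and not n:
        return 5          # (NOT A) C (NOT N): new region
    return 6              # A C N (NOT P): possible merge of two regions
```

Because the tests are mutually exclusive and cover all 16 input combinations, the order of the tests does not change the result.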
Case 1: (NOT C)
    X X
    X 0
Whenever the current pixel isn't a logical 1, that pixel is to be
unconditionally labeled as a zero.
Bus Connections for Case 1:
1. Zeros are placed on the data bus, VLN, and VLN-1.
2. Latch signals are sent to VLN, VLN-1, and a write signal is sent
to M-memory.
3. The address counter to M-memory is incremented.
Case 2: C N (NOT A)
    X 0
    1 1
This condition arises when the current and previous pixel are at logic 1.
The current pixel is to be labeled identically as the previous pixel.
Bus Connections for Case 2:
1. The contents of VLn is gated onto VLN and the data bus.
2. A latch signal is sent to VLn and a write signal is sent to M-memory.
3. The address counter to M-memory is incremented.
Case 3: A C N P
    1 1
    1 1
Here, all four of the test pixels are at logic 1. The current pixel is
to be labeled identically as the previous pixel.
Bus Connections for Case 3:
Same as for CNA.
Case 4: C A (NOT N)
    X 1
    0 1
Here the current pixel and the above pixel are at logic 1. Current pixel
is to be labeled identically to its above pixel.
Bus Connections for Case 4:
1. Wait for VLA to propagate through CAM 2 package.
2. The contents of VLA is gated to the data bus, VLN-1, and VLN.
3. A latch signal is sent to VLN-1, VLN, and a write signal is sent to
M-memory.
4. The address counter for M-memory is incremented.
Figure 3: Organization of CAM2
Case 5: (NOT A) C (NOT N)
    X 0
    0 1
The current pixel is at logic 1, but none of its test pixels are true.
This condition shows the appearance of a new label region. The label counter
is to be incremented and the current pixel is labeled from the incremented
label counter.
Bus Connections for Case 5:
1. The label counter is gated onto the data bus, CAM buses, VLN-1, and
VLN.
2. A write signal is sent to M-memory and to the CAM.
3. The address counter to the M-memory is incremented.
Case 6: A C N (NOT P)
    0 1
    1 1
The current, previous, and above pixels are at logic 1, while the
previous pixel to A is at logic 0. The contents of VLA contain the above
label and VLN-1 holds the previous label. These two labels are compared and
the current pixel is labeled from the smaller of the two. The CAM and the
CAM 2 chip are updated accordingly.
Bus Connections for Case 6:
1. Wait for VLA to propagate through CAM 2 package.
2. The contents of VLA is gated onto the comparator inputs and latched
for future access.
3. The contents of VLN-1 is gated onto the comparator.
4. The comparator is enabled.
When VLA < VLN-1
a. The contents of VLN-1 is gated onto the CAM 2 compare inputs, and onto
the CAM address bus.
b. The contents of VLA is gated onto the CAM 2 replace inputs.
c. A replace signal is sent to the CAM 2 package and a union signal to
the CAM.
d. After one CAM delay, the contents of VLA is gated to CAM data inputs.
e. VLA is placed onto the data bus and latch signals are sent to VLN-1,
VLN and a write signal is sent to M-memory.
When VLA > VLN-1
a. The contents of VLA is gated onto the CAM 2 compare inputs, and onto
the CAM address bus.
b. The contents of VLN-1 is gated onto the CAM 2 replace inputs.
c. A replace signal is sent to the CAM 2 package and a union signal to
the CAM.
d. After one CAM delay, the contents of VLN-1 is gated to CAM data
inputs.
e. VLN-1 is placed onto the data bus and latch signals are sent to VLN
and a write signal is sent to M-memory.
3. DGM Description of System
In this section, an overview of DGM is provided, followed by a
description of this system in DGM format, and a discussion of the
effectiveness of the representation.
3.1 Description of DGM
The DGM software, as supplied, consists of two parts:
	 a directed graph
editor (DGMED) and an ADA package library manager (DGMLM).	 Both are written
in VAX (VMS) Pascal.
DGM is intended to be a hierarchical system design and analysis tool.
	 A
system is represented as a directed graph. 	 Each vertex in the graph
represents a system function and arcs designate data flows between vertices.
Arcs have attributes such as produce, consume, threshold and capacity.
	 These
attributes are related to the amount of data at a node input that must be
present before a node can fire, and to the amount of data that is produced and
consumed when a node does fire.
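The role of the arc attributes can be illustrated with a small firing-rule sketch. The exact semantics used by the DGM software are not given in this report, so the rule below is an assumption made for illustration: a node may fire when every input arc holds at least its threshold of data and every output arc has room for the amount produced. The names Arc, can_fire and fire are hypothetical, and threshold is assumed to be at least consume.

```python
from dataclasses import dataclass


@dataclass
class Arc:
    """Illustrative arc carrying the four attributes named in the text."""
    threshold: int   # data that must be present before the consumer can fire
    consume: int     # data removed by the consumer when it fires
    produce: int     # data added by the producer when it fires
    capacity: int    # maximum data the arc can hold
    tokens: int = 0  # data currently on the arc


def can_fire(inputs, outputs):
    """Assumed rule: inputs meet their thresholds and outputs have room."""
    inputs_ready = all(arc.tokens >= arc.threshold for arc in inputs)
    outputs_free = all(arc.tokens + arc.produce <= arc.capacity for arc in outputs)
    return inputs_ready and outputs_free


def fire(inputs, outputs):
    """Fire a node once if allowed; return True when it fired."""
    if not can_fire(inputs, outputs):
        return False
    for arc in inputs:
        arc.tokens -= arc.consume
    for arc in outputs:
        arc.tokens += arc.produce
    return True
```

Note that nothing in this rule refers to a clock: a node fires whenever its data conditions are met.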
Vertex functions are implemented by ADA packages assigned to the vertices
from a library of packages. 	 A set of processor assignments can be specified
for each package as an aid in mapping the flow graph onto an architecture.
The methodology supports a top down design strategy. A design is refined
by expanding higher level nodes into more detailed subgraphs until the desired
level of refinement is reached. Each node in the graph has an ADA package
assigned which performs the node function. 	 The use of flow graphs at all
levels of the hierarchy provides a uniform, consistent representation of the
system and can provide a convenient mechanism for moving up and down the
hierarchy.
3.2 Using DGM To Construct A Data Flow Graph
The process of constructing a flow graph begins by using DGMLM, the
library manager, to enter the ADA package definitions of vertex functions into
the package library.
	 DGMLM maintains a library of functions, so only new
functions need to be entered.
Information required for a package is its name and the specification of
its inputs, outputs and data types. Produce, consume and threshold attributes
can also be specified for each package. 	 Only ADA package header information
is kept by the library manager. The actual code bodies would be included when
the graph description was compiled.
DGMLM itself is a menu driven program which allows for addition,
deletion, modification and display of package definitions. The most serious
shortcoming of DGMLM is that although a list of packages currently in the
library is available, it is difficult to tell what function a particular
package performs. The package name and input and output data descriptions
are available, but there is no provision for a text description of what the
package does. Clearly a package name can provide some indication of function
as can knowledge of the inputs and outputs, but this is not sufficient. A
text description capability would be a useful addition.
The lack of such descriptions makes the use of package definitions already
in the library very difficult, and requires the entry of new definitions and
much external bookkeeping to keep track of what each package does for each
new flow graph.
The next step is the entry of the graph description using DGMED. 	 DGMED, also
a menu driven program, allows for the creation and modification of flow
graphs.	 Vertex name and function definitions are entered as well as the
connectivity and attribute information provided by the arcs.
	
ADA package
assignments are also made to each node.
The major shortcoming of DGMED is its lack of a graphic data entry and
poor display capability.	 While the menu driven approach is simple to use, it
makes verification of the correct construction of a flow graph difficult.
Verification must be done by examining a text description of the graph and
comparing it to a mental picture or a hand drawn prototype.
	 The graphic
display capability provided is very primitive and not very useful.
DGMED also makes it difficult to maintain more than one graph at a time
in the same directory.
	 The creation of a new graph destroys the old graph,
since the same files are used for the graph description. 	 To maintain
different graphs requires renaming files or moving files to another directory
and starting over. This must be done by the user.
3.3 Modeling The Region Labeling System Using DGM
A data flow graph of the system is shown in figure 6 and a block diagram
is shown in figure 5. 	 Appendix 3 contains a tabular summary of the circuit
flow graph.	 Appendix 4 contains the ADA package definitions and Appendix 5
contains a description of the graph in DGM notation.
4. Evaluation
The basic thrust of DGM, that of representing a system as a data flow
graph, has significant potential as a design tool.
	 However, the utility of a
design aid is directly related to the information that can be extracted from
the design representation.	 The DGM software, as it exists at NCSU, is
primarily for the entry and maintenance of data flow graphs and the package
library. Few graph analysis tools currently exist.
The ability to obtain information from the graph at all levels of the
hierarchy is important.	 This information can be then used to analyze and
improve the design.	 The information required can change at different stages
of the design.
In the initial stages of a design, functional correctness will be
important.	 Later stages may put the emphasis on other considerations such as
performance.	 These differing requirements mandate a variety of analysis
tools.
The ability to assign ADA packages to graph nodes and the existence of
graph control variables imply that some type of functional simulator is
planned, but it is currently not available. This capability would be very
useful in establishing functional correctness of a design and for generating
test data.
DGM, as it currently stands, seems to be primarily concerned with
software system design. Support for ADA software packages and processor
assignments is provided, as is the ability to create new data types. In
addition, data flow graphs are inherently asynchronous, while hardware systems
are usually considered to be synchronous.
In the early stages of a hardware system design, a functional simulation
based on software function modules could be useful. However, at some point in
the design, this is no longer adequate. Hardware notions such as clocks,
registers and propagation delays are probably better represented in a hardware
description language and simulator than in a general purpose language such as
ADA.	 Thus the ability to assign both hardware and software function modules
to graph nodes would be an important addition to DGM.
5. Conclusion
Our basic conclusion is that DGM has the potential to be a valuable
design tool for both hardware and software system design. Flow graphs can
provide a convenient and useful representation of a system hierarchy.
However, the asynchronous nature of data flow graphs is poorly suited to
modeling tightly coupled, synchronous hardware systems.
The ultimate utility of any design aid depends on the information it
provides the designer.	 In the case of DGM, this requires the further
development of tools which can extract such information from the flow graph
representation.
A similar design system, based on many of the ideas of DGM, is under
development at the Research Triangle Institute in North Carolina. This system
has a color graphics data input and display, and a variety of analysis tools.
These include a dynamic graph simulator, an analyzer based on a Petri net
model of a graph and a hardware description language interface.
Figure 1: Block diagram of the system: Video Image Memory (I), Host Computer, Interface Processor, Region Label Memory (M), and Equivalence Memory (K)
Figure 2: Organization of the K Memory
Appendix 1
Content-Addressable Read/Write Memories for Image Analysis
by
Wesley E. Snyder
Carla D. Savage
IEEE Transactions on Computers, October 1982.
Appendix 2
Design of a Content-Addressable RAM
by
Robert Tyszcenko
1. Introduction
This chapter has two major components: 1) a decoder and 2) a memory cell
with attached logic. These two components have been designed and, to some
extent, tested. Figure 1 shows what one word of memory looks like at its
highest level.
The three major operations consist of two that are fairly
straightforward, the Read and Write of a memory cell. The third, Union, requires the extra
logic in the "smart memory." Because of the variety of operations being
performed, a 4-phase clock is used, rather than pipelining. Before an
operation begins, the previous operation is completely over.
To complete the chip, some logic and pass transistors need to be designed
to regulate the flow of data & addresses from pads to their destination. 	 In
particular, the fact that input and output is done with the same pad and
drivers causes a problem on and between the Read & Union operations. 	 A
solution is proposed later in this report.
The basic operation of the circuit is best understood by reading the
"Timing Conventions" data, and the "Mixed Notation" illustration in
conjunction with the following explanation.
Since this circuit uses mostly nor logic, inputs to indicate a Read,
Write, or Union, are active when low. Note also that the decoder which
selects a given data word requires two phases for operation. For a Read or
Write, a memory location is specified by the decoder. Dropping the
appropriate control (Read, Write) line completes the operation. The Union operation is
not done with decoder assistance. It occurs because a "flag" was set (by xor
logic) to indicate that one or more memory locations match a data register's
contents. All cells that have their "flag" set will be rewritten with the new
data placed in the data register on φ2.
In what follows, in a filename such as xor.ab, the .ab tells ABCD that
the file contains ABCD text. Wires are frequently labelled with something
like wire_n at the top and wire_s at the bottom. This facilitates
simulation because qrs assumes that they are one node. Labels are required
whenever a wire at the periphery of a cell is to connect to another cell or to
a wire outside of the present cell.
2. Description of Cells
2.1 mcell.ab (fig. 6)
This is the memory itself. This design was chosen because of the simple
refresh control, performed by clocking a pass transistor on φ1, and the
requirement that both the true and complement form of the memory cell be
available at all times.
Notice that reading is controlled by ren_a/ren_w. The signal on this
line is generated by a read enable logic cell called rencell.ab. Writing to
memory is more complicated since it can occur as: 1) a simple RAM write or 2) a
Union operation write. Writing is controlled by a signal on union_a/union_w
(from uenable.ab) or by a signal on ram_a/ram_w (from wencell.ab).
2.2 xor.ab (fig. 7)
Performs the xor function. If the contents of memory match the contents
on the data bus then xor_out will go to Vss. Note that the pulldowns (pd.)
appear to form two legs, one to the left and the other to the right of the
pullup (pu.). Since at most one leg will have a path to Vss:
    pu.  l/w = 4/1
    pd.  l/w = 2/2    => small devices
and pass transistors are avoided.
2.3 pulldn.ab (fig. 8)
This cell is essential for the Union operation. The wire labelled
pwr_w/pwr_e is precharged on φ1. Assume the contents of memory match the
contents of a data register to which it is compared. The cell xor.ab does the
compare. Since the two are equal, xin_n is at Vss, and pwr_w/pwr_e stays
high. This is the "flag" that indicates that a write should occur for this
memory cell on φ3. The logic to generate the enable signal is in uenable.ab.
The cell ctl.ab is affected too.
2.4 slice.ab (fig. 9)
The constituents of this cell are 1) mcell.ab; 2) xor.ab and 3)
pulldn.ab.
2.5 connect.ab (fig. 10)
This cell is composed simply of wires. The following wires come from
off-chip:
1) penable_n/penable_s    to ctl.ab
2) gndenab_s/gndenab_n
3) Vss_n/Vss_s            to mcell.ab
4) Mbar_n/Mbar_s          to uenable.ab
5) Wbar_n/Wbar_s          to wencell.ab
6) Rbar_s/Rbar_n          to rencell.ab
The following wires are generated on chip: (actually the signals on them
are generated on-chip)
renable_w/renable_s - from rencell.ab to mcell.ab
uenable_w/uenable_s - from uenable.ab to mcell.ab
wenable_w/wenable_s - from wencell.ab to mcell.ab
2.6 ctl.ab (fig. 11)
This cell is used during Union operations. During φ1, the upper pass
transistor is on which charges the wire labelled pwr_w/pwr_e. The charge is
stored on an inverter attached to pwr_e and resides in uenable.ab. The lower
pass transistor is off and means that the charge remains even if the previous
state of pulldn.ab would have allowed it to discharge. After the output of
xor.ab settles (by φ2 hopefully) the lower pass transistor is turned on by φ2.
If the memory cell (all 10 bits) differs from the data that it was compared
to, pwr_w/pwr_e and the gate in uenable.ab will discharge.
2.7 rencell.ab (fig. 12) and wencell.ab (fig. 13)
Both cells perform the nor function. Both are used when operating in the
RAM mode. Both share an active low input from the decoder. Either
Wbar_n/Wbar_s or Rbar_n/Rbar_s can go to Vss if their respective operations
(Write, Read) are being performed. They should not both be low at the same
time. Their outputs enable the Read or Write by activating pass transistors
in mcell.ab.
2.8 uenable.ab (fig. 14)
Basically an inverter and a nor gate. If the inverter has a low input,
this implies that a mismatch between the memory cell and the data register
occurred, causing xor.ab to output a high signal which discharged pulldn.ab
and the gate of this inverter. In that case, even though Mbar_n/Mbar_s may be
at Vss (for a Union operation), nothing will happen. Similar reasoning will
reveal that the Union operation will occur if the memory contents match the
data register contents.
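A behavioural sketch of this match-then-enable rule follows. It models only the logical outcome (a precharged node that survives when no bit differs), not the circuit timing; the function names and the integer encoding of a 10-bit word are assumptions made for illustration.

```python
WORD_WIDTH = 10  # bits per memory word, as stated for the memory cell


def word_matches(stored: int, data: int, width: int = WORD_WIDTH) -> bool:
    """True when every bit slice's XOR output is low (no mismatch)."""
    mask = (1 << width) - 1
    return ((stored ^ data) & mask) == 0


def union_enabled(stored: int, data: int, mbar: int) -> bool:
    """Union proceeds only if the precharged node is still high
    (contents match) and the active-low Mode line Mbar is at Vss (0)."""
    node_high = word_matches(stored, data)  # any mismatch discharges the node
    return node_high and mbar == 0
```

With this model a mismatching word never takes part in a Union, whatever the level of Mbar, which is the behaviour the paragraph above describes.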
2.9 Decoder: in general
The decoder was designed so that it dissipates no static power, which
justifies its larger size.
This decoder can have 256 outputs and yet be built with little more than
a proper arrangement of:
1) dec00.ab
2) dec01.ab
3) dec11.ab	 and
4) decout.ab attached to provide the outputs.
For example, let us look at how to arrive at the arrangement in figure 3.
We want 4 outputs.
Count in binary:	 0 0
			 0 1
			 1 0
			 1 1
This is easily extended (but tedious).
I allow for 10 inputs even though log2 256 = 8 seems sufficient because the 2
high order bits can, effectively, act as chip select inputs. (Recall that 4
chips each with 256 locations are expected in the final configuration.)
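The addressing scheme above can be illustrated with a short Python sketch. It assumes, as the text suggests, that the 2 high-order bits of the 10-bit address pick one of 4 chips and the low 8 bits pick one of 256 words; the function names are hypothetical, and the active-low output list mirrors the "low-going" decoder behaviour rather than any particular cell arrangement.

```python
ADDRESS_BITS = 10
WORD_BITS = 8  # log2(256) word-select bits per chip


def split_address(addr: int):
    """Split a 10-bit address into (chip_select, word_index)."""
    if not 0 <= addr < (1 << ADDRESS_BITS):
        raise ValueError("address must fit in 10 bits")
    return addr >> WORD_BITS, addr & ((1 << WORD_BITS) - 1)


def decode_active_low(word: int, n_bits: int):
    """1-of-2**n_bits decoder: the selected output is 0, all others 1."""
    return [0 if i == word else 1 for i in range(1 << n_bits)]
```

For the 4-output example, `decode_active_low(2, 2)` corresponds to the binary count `1 0` and drives only the third output low.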
3. Timing Conventions
To Write:
φ1: Latch data. Latch address to decoder. Refresh memory.
φ2: Let decoder select a word.
φ3: Drop Write control line.
φ4: Raise Write control line.
To Read:
φ1: Latch address to decoder. Refresh memory. Precharge data lines if
desired by placing Vdd on I/P pads.
φ2: Let decoder select a word. Drop Read control line.
φ3: Latch O/P to pads.
φ4: Raise Read control line.
To Union:
φ1: Precharge pulldn.ab. Refresh memory. Latch I/P data.
φ2: Enable ground in pulldn.ab and ctl.ab cells.
φ3: Latch new data. Lower Mode control line.
φ4: Raise Mode control line.
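The drop/raise events in the three schedules above imply a level for each active-low control line in every clock phase. The sketch below derives those levels under two assumptions that the text does not state: each line idles high, and a change takes effect at the start of the phase in which it is listed. The table and function names are illustrative only.

```python
# (phase in which the line is dropped, phase in which it is raised)
DROP_RAISE = {
    "write": (3, 4),  # Write control line
    "read": (2, 4),   # Read control line
    "union": (3, 4),  # Mode control line
}


def control_line_levels(op: str) -> dict:
    """Map each phase 1..4 to the level (0 or 1) of the operation's line."""
    drop, rise = DROP_RAISE[op]
    return {phase: 0 if drop <= phase < rise else 1 for phase in (1, 2, 3, 4)}
```

Under these assumptions the Read line is low for two phases (φ2 and φ3), while the Write and Mode lines are low only during φ3.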
4. Testing
4.1 Decoder Test: (figs. 20)
Dectest.ab (not capitalized) represents the decoder that was tested (fig.
2). As above, ats required that I create a file called decoid.ab. In either
case, what was tested could be called a low-going 1-of-4 decoder. Even though
the pu/pd ratio was about 2 instead of 4, a successful simulation is depicted
in figure 3.
For grs: the spicefile is: spfiledec
         the clockfile is: clkfiledec
5. Pincount and Estimate of Transistor Count
Pins:   A0 - A9         10
        D0 - D9         10
        Vdd & Vss        2
        4-phase clk      4
        MODE             1
        WRITE            1
        READ             1
        penable          1
        genable         +1
Read
latch output
latch I/P
mcell.ab, xor.ab, pulldn.ab     14/slice => 140/word
total control logic                      +  13/word
                                           153/word
153 * 256 =                     39,168
(Decoder, CMOS type)             5,120
                                44,288
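The transistor estimate can be re-derived with a few lines of arithmetic. The per-slice, per-word, and decoder figures are taken from the table above; the 10 slices per word follow from the 10-bit word width. The pin total is computed here from the listed pins because no total is legible in the source, so it should be read as a derived figure rather than a quoted one.

```python
SLICES_PER_WORD = 10       # one slice per bit of the 10-bit word
WORDS = 256
TRANSISTORS_PER_SLICE = 14  # mcell.ab + xor.ab + pulldn.ab
CONTROL_PER_WORD = 13
DECODER_TRANSISTORS = 5120  # CMOS-type decoder

per_word = TRANSISTORS_PER_SLICE * SLICES_PER_WORD + CONTROL_PER_WORD
array_total = per_word * WORDS
chip_total = array_total + DECODER_TRANSISTORS

PINS = {"A0-A9": 10, "D0-D9": 10, "Vdd & Vss": 2, "4-phase clk": 4,
        "MODE": 1, "WRITE": 1, "READ": 1, "penable": 1, "genable": 1}
pin_total = sum(PINS.values())

print(per_word, array_total, chip_total, pin_total)
```

The per-word figure of 153 is what makes the product 39,168; a multiplier of 152 would not reproduce the printed subtotal.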
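Solution to problem posed in introduction
φ3 conflict occurs between action for Read and for Union.
To Read: we need something like this:
[circuit sketch not legible in source; surviving labels: Vdd, Vdd]
To Union:
[circuit sketch not legible in source; surviving labels: φ3, Union]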
ORIGINAL PAGE IS
OF POOR QUALITY
FIGURE 2
[Plot: dectest simulation output; axis tick values and figure number not legible in source]
[Layout plot; legible cell labels: mcell, xor, pulldn]
FIGURE 7
[Layout plot; legible labels, read vertically: penable, genable]
FIGURE	 9
Appendix 3
Tabular Representation of a Data Flow Graph
Summary of graph CAMCHIP
QUEUE  THRESHOLD  READ  CONSUME  CAPACITY  PRODUCE  DATA-TYPE  INIT  SOURCE  SINK
L1     1          1     1        1         1        *          F     LABELC  VLN
L2     1          1     1        1         1        *          F     LABELC  CAM
L3     1          1     1        1         1        *          F     LABELC  MEMM
L4     1          1     1        1         1        *          F     LABELC  VLSI
Z1     1          1     1        1         1        *          F     ZERO    MEMM
Z2     1          1     1        1         1        *          F     ZERO    VLN
Z3     1          1     1        1         1        *          F     ZERO    VLSI
A1     1          1     1        1         1        *          F     ADDCNT  MEMM
V1     1          1     1        1         1        *          F     VLN     CAM
V2     1          1     1        1         1        *          F     VLN     COMPAR
V3     1          1     1        1         1        *          F     VLN     VLSI
VL1    1          1     1        1         1        *          F     VLSI    VLN
VL2    1          1     1        1         1        *          F     VLSI    COMPAR
VL3    1          1     1        1         1        *          F     VLSI    CAM
VL4    1          1     1        1         1        *          F     VLSI    MEMM
MEMOU  1          1     1        1         1        *          F     MEMM
CAMOU  1          1     1        1         1        *          F     CAM

NODE       PACKAGE    1ST PROCESSOR  2ND PROCESSOR  EXCLUDES  SHARE
LABELCNTR  LABELCNTR  1                                       FALSE
COMPARE    COMPARE    3                                       FALSE
VLSI       VLSI       4                                       FALSE
ZERO       ZERO       5                                       FALSE
VLN        VLN        6                                       FALSE
ADDCNTR    ADDCNTR    7                                       FALSE
MEMMEM     MMEM       10                                      FALSE
CAM        CAM        VL8                                     FALSE
End of graph CAMCHIP
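The queue table is enough to recover each node's input and output lists, which is a convenient cross-check against the per-node listings in the DGM notation. The Python sketch below does this; the queue tuples are transcribed from the table, with the three VLN output queues taken to be V1, V2 and V3 (an assumption where the source is hard to read) and node names left truncated as the summary prints them. The helper name `node_ports` is hypothetical.

```python
from collections import defaultdict

# (queue, source node, sink node); None marks a queue with no listed sink.
QUEUES = [
    ("L1", "LABELC", "VLN"), ("L2", "LABELC", "CAM"),
    ("L3", "LABELC", "MEMM"), ("L4", "LABELC", "VLSI"),
    ("Z1", "ZERO", "MEMM"), ("Z2", "ZERO", "VLN"), ("Z3", "ZERO", "VLSI"),
    ("A1", "ADDCNT", "MEMM"),
    ("V1", "VLN", "CAM"), ("V2", "VLN", "COMPAR"), ("V3", "VLN", "VLSI"),
    ("VL1", "VLSI", "VLN"), ("VL2", "VLSI", "COMPAR"),
    ("VL3", "VLSI", "CAM"), ("VL4", "VLSI", "MEMM"),
    ("MEMOU", "MEMM", None), ("CAMOU", "CAM", None),
]


def node_ports(queues):
    """Group queue names by the node that consumes or produces them."""
    inputs, outputs = defaultdict(list), defaultdict(list)
    for name, source, sink in queues:
        outputs[source].append(name)
        if sink is not None:
            inputs[sink].append(name)
    return dict(inputs), dict(outputs)


inputs, outputs = node_ports(QUEUES)
```

The resulting port counts agree with the Ada package definitions: for example, MEMM has four input queues and one output queue, and LABELC has four outputs and no inputs.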
Appendix 4
ADA Package Definitions
package LABELCNTR is
procedure GO_LABELCNTR
(
-- output queues in package
OUT_QUEUE_1: out array(1..1) of INTEGER ;
OUT_QUEUE_2: out array(1..1) of INTEGER ;
OUT_QUEUE_3: out array(1..1) of INTEGER ;
OUT_QUEUE_4: out array(1..1) of INTEGER ;
end LABELCNTR
package CAM is
procedure GO_CAM
-- input queues in package
IN_QUEUE_1: in array(1..1) OF INTEGER
IN_QUEUE_2: in array(1..1) OF INTEGER
IN_QUEUE_3: in array(1..1) OF INTEGER
--output queues in package
OUT_QUEUE_1: out array(1..1) OF INTEGER
end CAM
package ADDCNTR is
procedure GO_ADDCNTR
-- output queues in package
OUT_QUEUE_1: out array(1..1) OF INTEGER
end ADDCNTR
package MMEM is
procedure GO_MMEM
-- input queues in package
IN_QUEUE_1: in array(1..1) OF INTEGER
IN_QUEUE_2: in array(1..1) OF INTEGER
IN_QUEUE_3: in array(1..1) OF INTEGER
IN_QUEUE_4: in array(1..1) OF INTEGER
-- output queues in package
OUT_QUEUE_1: out array(1..1) OF INTEGER
end MMEM
package ZERO is
procedure GO_ZERO
-- output queues in package
OUT_QUEUE_1: out array(1..1) of INTEGER
OUT_QUEUE_2: out array(1..1) of INTEGER
OUT_QUEUE_3: out array(1..1) of INTEGER
end ZERO
package VLSI is
procedure GO_VLSI
-- input queues in package
IN_QUEUE_1: in array(1..1) OF INTEGER
IN_QUEUE_2: in array(1..1) OF INTEGER
IN_QUEUE_3: in array(1..1) OF INTEGER
--output queues in package
OUT_QUEUE_1: out array(1..1) OF INTEGER ;
-- output queues in package
OUT_QUEUE_1: out array(1..1) of INTEGER ;
OUT_QUEUE_2: out array(1..1) of INTEGER ;
OUT_QUEUE_3: out array(1..1) of INTEGER ;
OUT_QUEUE_4: out array(1..1) of INTEGER ;
end VLSI
package COMPARE is
procedure GO_COMPARE
-- input queues in package
IN_QUEUE_1: in array(1..1) OF INTEGER ;
IN_QUEUE_2: in array(1..1) OF INTEGER ;
end COMPARE
package VLN is
procedure GO_VLN
-- input queues in package
IN_QUEUE_1: in array(1..1) OF INTEGER ;
IN_QUEUE_2: in array(1..1) OF INTEGER ;
IN_QUEUE_3: in array(1..1) OF INTEGER ;
--output queues in package
OUT_QUEUE_1: out array(1..1) OF INTEGER
OUT_QUEUE_1: out array(1..1) of INTEGER ;
OUT_QUEUE_2: out array(1..1) of INTEGER ;
OUT_QUEUE_3: out array(1..1) of INTEGER ;
end VLN
Appendix 5
Data Flow Graph in DGM Notation
graph CAMCHIP	 contains;
package LABELCNTR	 has
output =
L1
threshold = 1
read	 = 1
consume = 1
capacity	 = 1
produce	 = 1
data type = INTEGER
L2
threshold = 1
read = 1
consume	 = 1
capacity	 = 1
produce	 = 1
data type = INTEGER
L3
threshold = 1
read	 = 1
consume	 = 1
capacity = 1
produce	 = 1
data type = INTEGER
L4
threshold = 1
read	 = 1
consume	 = 1
capacity	 = 1
produce	 = 1
data type = INTEGER
package CAM	 has
input =
VL3
threshold = 1
read = 1
consume = 1
capacity = 1
produce = 1
data type = INTEGER
L2
threshold = 1
read = 1
consume = 1
capacity = 1
produce	 = 1
data type = INTEGER
V1
threshold = 1
read = 1
consume = 1
capacity = 1
produce = 1
data type = INTEGER
output =
CAMOUT
threshold = 1
read	 = 1
consume	 = 1
capacity	 = 1
produce	 = 1
data type = INTEGER
package ADDCNTR	 has
output =
A1
threshold = 1
read = 1
consume = 1
capacity = 1
produce = 1
data type = INTEGER
package MMEM	 has
input =
A2
threshold = 1
read = 1
consume = 1
capacity = 1
produce = 1
data type = INTEGER
L3
threshold = 1
read = 1
consume = 1
capacity = 1
produce = 1
data type = INTEGER
VL4
threshold = 1
read = 1
consume = 1
capacity = 1
produce = 1
data type = INTEGER
A1
threshold = 1
read = 1
consume = 1
capacity = 1
produce = 1
data type = INTEGER
Z1
threshold = 1
read = 1
consume	 = 1
capacity = 1
produce	 = 1
data type = INTEGER
output =
MEMOUT
threshold = 1
read = 1
consume = 1
capacity = 1
produce	 = 1
data type = INTEGER
package ZERO	 has
output =
Z1
threshold = 1
read = 1
consume = 1
capacity = 1
produce = 1
data type = INTEGER
Z2
threshold = 1
read = 1
consume = 1
capacity = 1
produce = 1
data type = INTEGER
Z3
threshold = 1
read = 1
consume = 1
capacity = 1
produce = 1
data type = INTEGER
package VLSI	 has
input =
V3
threshold = 1
read = 1
consume = 1
capacity = 1
produce = 1
data type = INTEGER
L4
threshold = 1
read = 1
consume = 1
capacity = 1
produce = 1
data type = INTEGER
Z3
threshold = 1
read = 1
consume = 1
capacity = 1
produce = 1
data type = INTEGER
output =
VL1
threshold = 1
read = 1
consume = 1
capacity = 1
produce = 1
data type = INTEGER
VL2
threshold = 1
read = 1
consume = 1
capacity = 1
produce = 1
data type = INTEGER
VL3
threshold = 1
read = 1
consume = 1
capacity = 1
produce = 1
data type = INTEGER
VL4
threshold = 1
read = 1
consume = 1
capacity = 1
produce = 1
data type = INTEGER
package COMPARE	 has
input =
V1
threshold = 1
read = 1
consume = 1
capacity = 1
produce = 1
data type = INTEGER
VL2
threshold = 1
read = 1
consume = 1
capacity = 1
produce = 1
data type = INTEGER
package VLN	 has
input =
L1
threshold = 1
read = 1
consume = 1
capacity = 1
produce = 1
data type = INTEGER
VL1
threshold = 1
read = 1
consume = 1
capacity = 1
produce = 1
data type = INTEGER
Z2
threshold = 1
read = 1
consume = 1
capacity = 1
produce = 1
data type = INTEGER
output =
V3
threshold = 1
read	 = 1
consume	 = 1
capacity	 = 1
produce	 = 1
data type = INTEGER
V2
threshold = 1
read = 1
consume = 1
capacity = 1
produce = 1
data type = INTEGER
queue L1 has type = DATA :=0
queue L2 has type = DATA :=0
queue L3 has type = DATA :=0
queue L4 has type = DATA :=0
queue Z1 has type = DATA :=0
queue Z2 has type = DATA :=0
queue Z3 has type = DATA :=0
queue A1 has type = DATA :=0
queue V1 has type = DATA :=0
queue V2 has type = DATA :=0
queue V3 has type = DATA :=0
queue VL1 has type = DATA :=0
queue VL2 has type = DATA :=0
queue VL3 has type = DATA :=0
queue VL4 has type = DATA :=0
queue MEMOUT has type = DATA :=0
queue CAMOUT has type = DATA :=0
node LABELCNTR has package LABELCNTR with
processor = 1
priority = 1
sharable = FALSE
output = L1 , L2 , L3 , L4
node COMPARE has package COMPARE with
processor = 3
sharable = FALSE
input = V2 , VL2
node VLSI has package VLSI with
processor = 4
sharable = FALSE
input = Z3 , V3 , L4
output = VL1 , VL2 , VL3 , VL4
node ZERO has package ZERO with
processor = 5
sharable = FALSE
output = Z1 , Z2 , Z3
node VLN has package VLN with
processor = 6
sharable = FALSE
input = L1 , VL1 , Z2
output = V1 , V2 , V3
node ADDCNTR has package ADDCNTR with
processor = 7
sharable = FALSE
output = A1
node MEMMEM has package MMEM with
processor = 10
sharable = FALSE
input = L3 , A1 , Z1 , VL4
output = MEMOUT
node CAM has package CAM with
processor = VL8
sharable = FALSE
input = VL3 , V1 , L2
output = CAMOUT
