Equivalence relations of synchronous schemes by Cirovic, Branislav




Equivalence Relations of Synchronous Schemes
BY BRANISLAV CIROVIC
A thesis submitted to the School of Graduate Studies in
partial fulfilment of the requirements for the degree of
Doctor of Philosophy
Department of Computer Science
Memorial University of Newfoundland
April 2000
St. John's Newfoundland
Abstract
Synchronous systems are single purpose multiprocessor computing machines, which
provide a realistic model of computation capturing the concepts of pipelining,
parallelism and interconnection. The syntax of a synchronous system is specified by
the synchronous scheme, which is in fact a directed, labelled, edge-weighted
multigraph. The vertices and their labels represent functional elements (the combinational
logic) with some operation symbols associated with them, while the edges represent
interconnections between functional elements. Each edge is weighted and the
nonnegative integer weight (register count) is the number of registers (clocked memory)
placed along the interconnection between two functional elements. These notions
allowed the introduction of transformations useful for the design and optimization of
synchronous systems: retiming and slowdown.
Two synchronous systems are strongly equivalent if they have the same stepwise
behavior under all interpretations. Retiming a functional element in a synchronous
system means shifting one layer of registers from one side of the functional element
to the other. Retiming equivalence is obtained as the reflexive and transitive closure
of this primitive retiming relation. Slowdown is defined as follows: for any system
G = (V, E, w) and any positive integer c, the c-slow system cG = (V, E, w') is the
one obtained by multiplying all the register counts in G by c. Slowdown equivalence
is obtained as the symmetric and transitive closure of this primitive slowdown
relation. Strong retiming equivalence is the join of two basic equivalence relations on
synchronous systems, namely strong equivalence and retiming equivalence. Slowdown
retiming equivalence is the join of retiming and slowdown equivalence. Strong
slowdown retiming equivalence is the join of strong, slowdown and retiming equivalence.
It is proved that both slowdown retiming and strong slowdown retiming equivalence
of synchronous systems (schemes) are decidable.
According to [Leiserson and Saxe, 1983a, 1983b], synchronous systems S and S'
are equivalent if for every sufficiently old configuration c of S, there exists a
configuration c' of S' such that when S is started in configuration c and S' is started in
configuration c', the two systems exhibit the same behavior. It is proved that two
synchronous systems (schemes) are Leiserson equivalent if and only if they are strong
retiming equivalent.
The semantics of synchronous systems can be specified by algebraic structures
called feedback theories. Synchronous schemes and feedback theories have been
axiomatized equationally in [Bartha, 1987]. Here we extend the existing set of axioms
in order to capture strong retiming equivalence.
One of the fundamental features of synchronous systems is synchrony, that is,
the computation of the network is synchronized by a global (basic) clock. Other,
slower clocks can be defined in terms of boolean-valued flows. In order to describe
the behavior of schemes with multiple regular clocks, we extend the existing algebra
of schemes to include multiclocked schemes, and study the general idea of Leiserson
equivalence in the framework of this algebra.
Contents
Abstract
List of Figures and Tables
Acknowledgments
1 Introduction
2 Preliminaries
2.1 Systolic Arrays
2.2 The Structure of Systolic Arrays
2.3 Synchronous Systems
2.4 Flowchart and Synchronous Schemes
3 Equivalence Relations of Synchronous Schemes and their Decision Problems
3.1 Slowdown Retiming Equivalence
3.2 Decidability of Slowdown Retiming Equivalence
3.3 Strong Slowdown Retiming Equivalence
3.4 Decidability of Strong Slowdown Retiming Equivalence
4 Leiserson's Equivalence vs. Strong Retiming Equivalence
5 Retiming Identities
5.1 The Algebra of Synchronous Schemes
5.2 Equational Axiomatization of Synchronous Schemes
6 The Algebra of Multiclocked Schemes
6.1 The LUSTRE Programming Language
6.2 The Algebra of Schemes with Multiple Regular Clocks
Conclusion
References
Index
List of Figures and Tables
Figure
2.1 Mesh-connected systolic arrays
2.2 Geometry for the inner product step processor
2.3 Multiplication of a vector by a band matrix with p = 2 and q = 3
2.4 The linearly connected systolic array for the matrix vector multiplication
problem in Figure 2.3
2.5 The first seven pulsations of the linear systolic array in Figure 2.4
2.6 The difference between Moore and Mealy automata
2.7 Semi-systolic array and its communication graph
2.8 (a) The communication graph G1 of a synchronous system S1
(b) Retiming transformation
2.9 Slowdown transformation
2.10 The constraint graph G1 - 1 of a synchronous system S1 in Figure 2.8(a)
2.11 The systolic system obtained from the 2-slow synchronous system S3 by retiming
2.12 Flowchart scheme
2.13 The tree representation of a term
2.14 Unfolding the flowchart scheme
2.15 The difference between flowchart and synchronous schemes
2.16 Synchronous scheme S and its (infinite) unfolding tree T(S)
2.17 Unfolding as the strong behavior of schemes
3.1 Example of two slowdown retiming equivalent schemes
3.2 Retiming equivalent schemes after appropriate slowdown transformation
3.3 The proof of Theorem 3.4.3 in a diagram
3.4 Example of two strong slowdown retiming equivalent schemes
3.5 Construction of product of fl(utr(S1)) and fl(utr(S2))
3.6 Scheme H as a product of fl(utr(S1)) and fl(utr(S2))
3.7 Schemes H1 and H2 are slowdown retiming equivalent
3.8 Schemes c1H1 and c2H2 are retiming equivalent
4.1 Example of two Leiserson equivalent schemes
4.2 Translation of a tree by the finite state top-down tree transducer
5.1 The interpretation of operations
5.2 The interpretation of constants
5.3 σ ∈ Σ(p, q) as an atomic scheme
5.4 Examples of mappings lL'p(q) ....(n,p) and J#s
5.5 Retiming identity
5.6 Proof of identity R in a diagram
5.7 Identity R alone is not sufficient to capture the retiming equivalence
relation of synchronous schemes: counterexample
5.8 Congruence Φ(R) as the retiming equivalence relation
5.9 Proof of Theorem 5.2.4 in a diagram
5.10 The axiom C for n = 3, l = 2 and p = q = 1
6.1 Ground schemes belong to 6J.(E)
Table
6.1 Boolean clocks and flows
6.2 The "previous" operator
6.3 The "followed by" operator
6.4 Sampling and interpolating
6.5 The behavior of (V, 2), (V2, 2) and (V3, 2) during first eleven pulsations
Acknowledgements
First of all, I would like to express my deep and sincere gratitude to my supervisor,
Dr. Miklós Bartha, for his willingness and determination to supervise my work for
more than three years, for his encouragement, expertise, understanding and patience.
It was he who led me through the beautiful world of theoretical computer science and
taught me to express my ideas in an organized and precise language of mathematics.
I would like to thank the other members of my committee, Dr. Krishnamurthy
Vidyasankar, Dr. Paul Gillard and Dr. Anthony Middleton, for their assistance and
the services they provided.
I would also like to thank Dr. Nicolas Halbwachs from IMAG Research Lab,
Grenoble, who generously provided the LUSTRE compiler and related tools.
The Department of Computer Science at Memorial University of Newfoundland
has not only provided the warm and stimulative environment for my research but
has also given me the opportunity to teach several undergraduate courses, which I
appreciate very much.
I am grateful to Memorial University of Newfoundland Graduate School and the
Department of Computer Science at Memorial University of Newfoundland for
providing me with the financial assistance. I recognize that this research would not have
been possible without that support.
1 Introduction
The increasing demands of speed and performance in modern signal and image
processing applications necessitate a revolutionary super-computing technology.
According to [Kung, 1988], sequential systems will be inadequate for future real-time
processing systems and the additional computational capability available through
VLSI concurrent processors will become a necessity. In most real-time digital signal
processing applications, general-purpose parallel computers cannot offer satisfactory
processing speed due to severe system overheads. Therefore, special-purpose array
processors will become the only appealing alternative. Synchronous systems are such
multiprocessor structures which provide a realistic model of computation capturing
the concepts of pipelining, parallelism and interconnection. They are single purpose
machines which directly implement as low-cost hardware devices a wide variety of
algorithms, such as filtering, convolution, matrix operations, sorting etc.
The concept of a synchronous system [Leiserson and Saxe, 1983a] was derived
from the concept of a systolic system [Kung and Leiserson, 1978], which has turned
out to be one of the most attractive current concepts in massive parallel computing.
In recent years many systolic systems have been designed, some of them
manufactured; transformation methodologies for the design and optimization of systolic
systems have been developed, and yet the rigorous mathematical foundation of a theory
of synchronous systems has been missing. Important equivalence relations of
synchronous systems such as slowdown retiming and strong slowdown retiming still lack
decision algorithms. Some of the fundamental concepts, like Leiserson's definition of
equivalency of synchronous systems, are still informal and operational.
A more sophisticated model of synchronous systems was introduced in [Bartha,
1987]. In that model the graph of a synchronous system becomes a flowchart scheme
in the sense of [Elgot, 1975], with the only difference that all edges are weighted and
reversed. For this reason, such graphs are called synchronous schemes. With that
approach it becomes possible to study synchronous systems in an exact algebraic
(and/or category theoretical) framework, adopting the sophisticated techniques and
constructions developed for flowchart schemes and iteration theories.
This thesis addresses the problems stated above, which, to the best of our
knowledge, are unsolved so far. The thesis is organized as follows: in Chapter 2 we
introduce the concepts of systolic arrays, synchronous systems, flowchart and synchronous
schemes, and give a short summary of the most important definitions and results in
the field. In Chapter 3 we define the slowdown retiming and strong slowdown retiming
equivalence relations of synchronous systems and show that both relations are
decidable. In Chapter 4 we compare Leiserson's definition of equivalency of synchronous
systems with strong retiming equivalence of synchronous schemes, and show them
to be identical. Chapter 5 deals with the equational axiomatization of synchronous
schemes. The goal is to define the retiming identity, which together with the feedback
theory identities captures the strong retiming equivalence of synchronous schemes.
Finally, in Chapter 6 we introduce the generalized algebra of multiclocked schemes,
which is intended to describe the behavior of synchronous schemes with multiple
clocks, motivated by the clock analysis of the Synchronous Dataflow Programming
Language LUSTRE.
2 Preliminaries
2.1 Systolic Arrays
In [Kung and Leiserson, 1978] the authors proposed multiprocessor structures called
systolic arrays (systems), which provide a realistic model of computation, capturing
the concepts of pipelining, parallelism and interconnection. The goal was to design
multiprocessor machines which have simple and regular communication paths, employ
pipelining and can be implemented directly as low-cost hardware devices. Systolic
systems are not general purpose computing machines. A systolic computing system is
a subsystem that performs its computations on behalf of a host, which can be viewed
as a Turing-equivalent machine that provides input to and receives output from the
systolic system. Kung [1988] defined a systolic array as follows:
DEFINITION 2.1 A systolic array is a computing network possessing the following
features:
• Synchrony The data are rhythmically computed (timed by a global clock) and
passed through the network.
• Modularity and Regularity The array consists of modular processing units
with homogeneous interconnections. Moreover, the computing network can be
extended indefinitely.
• Spatial locality and temporal locality The array manifests a locally
communicative interconnection structure, i.e., spatial locality. There is at least one
unit-time delay allotted so that signal transactions from one node to the next
can be completed, i.e., temporal locality.
• Pipelinability (O(n) execution-time speedup) The array exhibits a linear rate
pipelinability, i.e., it should achieve an O(n) speedup in terms of processing
rate, where n is the number of processing elements. The efficiency of the array
is measured by the following:
$$\text{speedup factor} = \frac{T_s}{T_p},$$
where $T_s$ is the processing time in a single processor, and $T_p$ is the processing
time in the array processor.
A systolic device is typically composed of many interconnected processors. Two
processors that communicate must have a data path between them, and free global
communication is disallowed. The farthest a datum can travel in unit time is from one
processor to an adjacent processor. Figure 2.1 illustrates several mesh-connected
network configurations.
Figure 2.1: Mesh-connected systolic arrays: (a) linearly connected, (b) orthogonally
connected (ILLIAC IV), (c) hexagonally connected.
Many algorithms such as filtering, convolution, matrix operations and sorting can
be implemented as systolic arrays. The following example (from [Kung and Leiserson,
1978]) demonstrates matrix-vector multiplication in a linear systolic array.
EXAMPLE 2.1
Consider the problem of multiplying a matrix $A = (a_{ij})$ with a vector $x = (x_1, \ldots, x_n)^T$.
The elements in the product $y = (y_1, \ldots, y_n)^T$ can be computed by the following
recurrences:
$$y_i^{(1)} = 0,$$
$$y_i^{(k+1)} = y_i^{(k)} + a_{ik} x_k,$$
$$y_i = y_i^{(n+1)}.$$
The single operation common to all the computations for matrix vector
multiplication is the inner product step, C = C + A·B. A processor which implements the inner
product step has three registers $R_A$, $R_B$ and $R_C$. Each register has two connections,
one for input and one for output. Figure 2.2 shows the geometry for this processor.
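The recurrences and the inner product step can be illustrated with a small sequential sketch. This is not the systolic schedule itself, only the arithmetic it performs; the function names `inner_product_step` and `matvec_by_recurrence` and the sample matrix are assumptions made for illustration.

```python
def inner_product_step(c, a, b):
    """One inner product step: return C + A * B."""
    return c + a * b


def matvec_by_recurrence(A, x):
    """Compute y = A x by iterating y_i^(k+1) = y_i^(k) + a_ik * x_k."""
    y = []
    for row in A:
        acc = 0  # y_i^(1) = 0
        for a_ik, x_k in zip(row, x):
            acc = inner_product_step(acc, a_ik, x_k)
        y.append(acc)  # y_i = y_i^(n+1)
    return y


# Hypothetical 3 x 3 band matrix and vector, chosen only for the demonstration.
A = [[1, 2, 0],
     [3, 4, 5],
     [0, 6, 7]]
x = [1, 1, 2]
y = matvec_by_recurrence(A, x)
```

In the systolic array each processor performs exactly one such step per pulsation, so the sequential loop above is spread over the processors and over time.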
Figure 2.2: Geometry for the inner product step processor.
Suppose A is an n × n band matrix with band width w = p + q − 1. (See Figure 2.3
for the case when p = 2 and q = 3.) Then the above recurrences can be evaluated by
pipelining the $x_i$ and $y_i$ through a systolic array consisting of w linearly connected
processors which compute the inner product step y = y + A·x. The linearly connected
systolic array for the band matrix-vector multiplication problem in Figure 2.3 has four
inner product step processors. See Figure 2.4.
Figure 2.3: Multiplication of a vector by a band matrix with p = 2 and q = 3.
Figure 2.4: The linearly connected systolic array for the matrix vector
multiplication problem in Figure 2.3.
The general scheme of computation can be viewed as follows. The $y_i$, which are
initially zero, are pumped to the left while the $x_i$ are pumped to the right and the $a_{ij}$ are
marching down. All the moves are synchronized.
Figure 2.5 illustrates the first seven pulsations of the systolic array. Observe
that at any given time alternate processors are idle. Indeed, by coalescing pairs of
adjacent processors, it is possible to use w/2 processors in the network for a general
band matrix with band width w.
Figure 2.5: The first seven pulsations of the linear systolic array in Figure 2.4
(columns: pulse number, configuration, comments).
We now specify the operation of the systolic array more precisely. Assume that
the processors are numbered by integers 1, 2, ..., w from the left end processor to the
right end processor. Each processor has three registers $R_A$, $R_x$ and $R_y$, which hold
entries in A, x and y, respectively. Initially, all registers contain zeros.
Each pulsation of the systolic array consists of the following operations, but for odd
numbered pulses only odd numbered processors are activated and for even numbered
pulses only even numbered processors are activated.
• Shift
1. $R_A$ gets a new element in the band of matrix A.
2. $R_x$ gets the contents of register $R_x$ from the left neighboring node. (The
$R_x$ in processor $P_1$ gets a new component of x.)
3. $R_y$ gets the contents of register $R_y$ from the right neighboring node.
(Processor $P_1$ outputs its $R_y$ contents and the $R_y$ in processor $P_w$ gets zero.)
• Multiply and Add
$R_y \leftarrow R_y + R_A \cdot R_x$
Using the inner product step processor, the three shift operations in step 1 can
be done simultaneously, and each pulsation of the systolic array takes a unit of time.
Suppose the bandwidth of A is w = p + q − 1. It is readily seen that after w units of
time the components of the product y = Ax are pumped out from the left processor
at the rate of one output every two units of time. Therefore, using the proposed
systolic network all the n components of y can be computed in 2n + w time units,
as compared to the O(wn) time needed for a sequential algorithm on a uniprocessor
computer.
2.2 The Structure of Systolic Arrays
Processors in a systolic system are composed of a constant number of Moore automata.
Recall that a finite state Moore automaton is defined as a six-tuple $A = (S, l, q, p, \delta, \lambda)$,
where $S$ is a finite set, $l$, $p$, $q$ are nonnegative integers; $\delta : S^{l+q} \to S^l$ is the state
transition function, and $\lambda : S^{l+q} \to S^p$ is the output function. Considering $A$ as
an ordinary automaton, $S^l$ is the set of states, and $S^q$ and $S^p$ are the input and
output alphabet, respectively. The standard graphical representation of $A$ is given in
Figure 2.6(a), where the triangles symbolize the $l$ state components (registers), and
$f = (\delta, \lambda) : S^{l+q} \to S^{l+p}$ is the combinational logic. This type of automaton has the
property that its outputs are dependent upon its state but not upon its inputs.
Figure 2.6: The difference between Moore and Mealy automata: (a) Moore automaton,
(b) Mealy automaton.
In this mathematical model, time can be regarded as an independent variable which
takes on integer values and is a count of the number of clock cycles or state changes.
The states $S^l(t+1)$ and outputs $S^p(t+1)$ of a Moore automaton at time $t+1$ are
uniquely determined by its states $S^l(t)$ and its inputs $S^q(t)$ at time $t$ by
$$S^l(t+1) = \delta(S^l(t), S^q(t)),$$
$$S^p(t+1) = \lambda(S^l(t), S^q(t)).$$
A Mealy automaton is similarly defined as a six-tuple $A = (S, l, q, p, \delta, \lambda)$, where all
is the same as in Moore automata except that the output at time $t$ is dependent on
input at time $t$, that is
$$S^l(t+1) = \delta(S^l(t), S^q(t)),$$
$$S^p(t+1) = \lambda(S^l(t), S^q(t+1)).$$
The standard graphical representation of a Mealy automaton is shown in Figure 2.6(b).
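The timing difference between the two kinds of automata can be made concrete with a short simulation. The sketch below transcribes the two pairs of update equations literally, under the assumption that the Mealy output at time t + 1 reads the state from time t and the input from time t + 1; the function names and the sample `delta` and `lam` are hypothetical and not taken from the thesis.

```python
def run_moore(delta, lam, state, inputs):
    """Outputs at times 1..T-1: out(t+1) = lam(state(t), in(t))."""
    outputs = []
    for t in range(len(inputs) - 1):
        outputs.append(lam(state, inputs[t]))
        state = delta(state, inputs[t])  # state(t+1) = delta(state(t), in(t))
    return outputs


def run_mealy(delta, lam, state, inputs):
    """Outputs at times 1..T-1: out(t+1) = lam(state(t), in(t+1))."""
    outputs = []
    for t in range(len(inputs) - 1):
        outputs.append(lam(state, inputs[t + 1]))  # same-tick input reaches the output
        state = delta(state, inputs[t])
    return outputs


# Hypothetical automaton: the single register stores the last input,
# and the combinational logic adds the register value to an input.
delta = lambda s, x: x
lam = lambda s, x: s + x

inputs = [1, 2, 3, 4]
moore_out = run_moore(delta, lam, 0, inputs)
mealy_out = run_mealy(delta, lam, 0, inputs)
```

With these choices the Mealy outputs already reflect the input that arrives at the same tick, while the Moore outputs only reflect inputs from the previous tick.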
In both automata the state is clocked through registers, but since the input signals
of a Mealy automaton are allowed to propagate through to the output unconstrained,
a change in the signal on an input can affect the output without an intervening clock
tick. When Mealy machines are connected in series, signals ripple through the
combinational logic of several machines between clock ticks. If the signals feed back on
themselves before being stopped by a register, they can latch or oscillate. Even if the
problems associated with feedback have been precluded, the settling of combinational
logic can make the clock period long in systems with rippling logic. Systolic systems
contain only Moore automata, while semisystolic systems may contain both Moore and
Mealy automata. The exclusion of Mealy automata guarantees that the clock period does
not grow with system size, and makes the number of clock ticks a measure of time that is
largely independent of system size.
A systolic system can be simply viewed as a set of interconnected Moore automata.
The structure of such a system S(n) is given by a communication graph G = (V, E)
of n interconnected automata, where the vertices in V represent the automata and
the edges in E represent interconnections between the automata. The
weights of edges in systolic systems are strictly positive, while the weights of edges
in semisystolic systems may be zero. An example of a semisystolic system and its
communication graph is shown in Figure 2.7.
Figure 2.7: Semi-systolic array and its communication graph.
The automata operate synchronously by means of a common clock, and time in the
system is measured as the number of clock cycles. All the automata in V are Moore
automata (Moore and Mealy in the case of a semisystolic system) with the exception
of one called the host, which can be viewed as a Turing equivalent machine that
provides input to and receives output from the system. Based on the communication
graph, the neighborhood of an automaton $v \in V$ is the set of automata with which
it communicates:
$$N(v) = \{ w \mid (v, w) \in E \text{ or } (w, v) \in E \}.$$
For S(n) to be systolic, it is further required that the Moore machines be small
in the following sense. There must exist constants $c_1$, $c_2$, $c_3$ and $c_4$ such that for all
$n$ and all $v \in V - \{host\}$,
• $|S_v^l| \le c_1$. The number of states of each Moore (Mealy) automaton is bounded.
• $|S_v^q| \le c_2$. The number of input symbols is bounded.
• $|S_v^p| \le c_3$. The number of output symbols is bounded.
• $|N(v)| \le c_4$. The number of neighbors of each automaton is bounded, i.e., the
communication graph has bounded degree.
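The neighborhood N(v) and the bounded-degree condition can be checked mechanically. The following is a minimal sketch assuming the communication graph is given as a list of directed (u, v) pairs; the function names and the small linear array are hypothetical.

```python
def neighborhood(edges, v):
    """N(v): all w such that (v, w) or (w, v) is an edge."""
    nbrs = set()
    for a, b in edges:
        if a == v:
            nbrs.add(b)
        if b == v:
            nbrs.add(a)
    return nbrs


def has_bounded_degree(vertices, edges, c4, host="host"):
    """Check |N(v)| <= c4 for every vertex other than the host."""
    return all(len(neighborhood(edges, v)) <= c4
               for v in vertices if v != host)


# Hypothetical linearly connected array of three processors plus the host.
vertices = ["host", "p1", "p2", "p3"]
edges = [("host", "p1"), ("p1", "p2"), ("p2", "p3"),
         ("p3", "p2"), ("p2", "p1"), ("p1", "host")]
```

For a linear array the bound 2 suffices regardless of the number of processors, which is the sense in which the degree stays bounded as n grows.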
The "smallness" conditions help ensure that the number of clock cycles is a good
measure of time in the systolic model. A problem arises, however, when the time
required to propagate a signal between machines becomes longer than the time
required for the longest combinational-logic delay through a machine. The period of
the clock must be at least as long as the longest propagation delay between machines,
which means that the independence of the clock period from system size will not
be realized for systems with long interconnections. Systolic arrays, which have only
nearest-neighbor connections, are especially attractive for VLSI because propagation
delay is insignificant.
2.3 Synchronous Systems
The systolic design methodology manages communication costs effectively because
the only communication permitted during a clock cycle is between a processing
element and its neighbors in the communication graph of the system. This constraint is
in direct contrast with, for example, the propagation of a carry signal which ripples
down the length of an adder. Such combinational rippling and global control such as
broadcasting are forbidden in systolic designs. Global communication is more easily
described in terms of rippling logic. In a systolic system the effect of broadcasting
must be achieved by multiple local communications. The primary reason for
introducing the concept of a synchronous system was the design issue. In [Leiserson and
Saxe, 1983a] the authors demonstrated how a synchronous system can be designed
with rippling logic, and then converted through the Systolic Conversion Theorem to a
systolic implementation that is functionally equivalent to the original system, the
principal difference being the shorter clock period of the systolic implementation.
A synchronous system can be modelled as a finite, rooted, edge-weighted, directed
multigraph $G = (V, E, v_h, w)$. The vertices V of the graph model the functional
elements (combinational logic) of the system. Every functional element is assumed
to have a fixed primitive operation associated with it. These operations are designed
to manipulate some simple data in the common algebraic sense. Each vertex $v \in V$
is weighted with its numerical propagation delay d(v). A distinguished root vertex
$v_h$, called the host, is included to represent the interface with the external world,
and it is given zero propagation delay. The directed edges E of the graph model
interconnections between functional elements. Each edge e in E is a triple of the form
(u, v, w), where u and v are (possibly identical) vertices of G connecting an output of
some functional element to an input of some functional element and w = w(e) is the
nonnegative integer weight of the edge. The weight (register count) is the number
of registers (clocked memory) along the interconnection between the two functional
elements. If e is an edge in the graph that goes from vertex u to vertex v, we shall
use the notation $u \xrightarrow{e} v$. For a graph G, we shall view a path p in G as a sequence of
vertices and edges. If a path p starts at vertex u and ends at a vertex v, we use the
notation $u \overset{p}{\leadsto} v$. A simple path contains no vertex twice, and therefore the number
of vertices exceeds the number of edges by exactly one. We extend the register count
function w in a natural way from single edges to arbitrary paths. For any path
$p = v_0 \xrightarrow{e_0} v_1 \xrightarrow{e_1} \cdots \xrightarrow{e_{k-1}} v_k$, we define the path weight as the sum of the weights of
the edges of the path:
$$w(p) = \sum_{i=0}^{k-1} w(e_i).$$
Similarly, the propagation delay function d can be extended to simple paths. For any
simple path $p = v_0 \xrightarrow{e_0} v_1 \xrightarrow{e_1} \cdots \xrightarrow{e_{k-1}} v_k$, we define the path delay as the sum of the
delays of the vertices of the path:
$$d(p) = \sum_{i=0}^{k} d(v_i).$$
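Path weight and path delay are simple sums, which a few lines of code make explicit. The sketch below assumes a path is given as a non-empty list of edge triples (u, v, w) and that the delays are stored in a dictionary; both representations and the function names are illustrative choices, not notation from the thesis.

```python
def path_weight(path):
    """w(p): sum of the register counts of the edges of the path."""
    return sum(w for _, _, w in path)


def path_delay(path, d):
    """d(p): sum of the propagation delays of the vertices v_0, ..., v_k."""
    for (_, v, _), (u_next, _, _) in zip(path, path[1:]):
        if v != u_next:
            raise ValueError("edges do not form a path")
    vertices = [path[0][0]] + [v for _, v, _ in path]
    return sum(d[v] for v in vertices)


# Hypothetical path v0 -> v1 -> v2 with register counts 0 and 2.
path = [("v0", "v1", 0), ("v1", "v2", 2)]
d = {"v0": 3, "v1": 3, "v2": 3}
```

Note that a path with k edges contributes k weights but k + 1 delays, matching the different upper limits of the two sums.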
In order that a graph $G = (V, E, v_h, w)$ has well-defined physical meaning as a circuit,
we place the following restrictions on propagation delays d(v) and register counts w(e):
D. The propagation delay d(v) is nonnegative for each vertex $v \in V$.
W. In any directed cycle of G, there is some edge with strictly positive register count.
We define a synchronous system as a system that satisfies conditions D and W.
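Conditions D and W are easy to test on a concrete graph. Condition W holds exactly when the subgraph formed by the zero-weight edges is acyclic, so a topological sort (Kahn's algorithm, used here as one possible technique) decides it. The sketch assumes nonnegative integer weights and edges given as (u, v, w) triples; the function names and example graphs are hypothetical.

```python
def satisfies_D(d):
    """Condition D: every propagation delay is nonnegative."""
    return all(delay >= 0 for delay in d.values())


def satisfies_W(vertices, edges):
    """Condition W: no directed cycle consists only of zero-weight edges."""
    succ = {v: [] for v in vertices}
    indeg = {v: 0 for v in vertices}
    for u, v, w in edges:
        if w == 0:  # keep only the register-free interconnections
            succ[u].append(v)
            indeg[v] += 1
    ready = [v for v in vertices if indeg[v] == 0]
    removed = 0
    while ready:
        u = ready.pop()
        removed += 1
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    # Every vertex can be removed iff the zero-weight subgraph is acyclic.
    return removed == len(vertices)


vertices = ["host", "a", "b"]
good = [("host", "a", 1), ("a", "b", 0), ("b", "host", 0)]
bad = [("host", "a", 0), ("a", "b", 0), ("b", "host", 0)]
```

The graph `good` has one register on its only cycle and passes; `bad` has a register-free cycle and fails.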
The reason for including condition W is that whenever an edge e between two vertices
u and v has zero weight, a signal entering vertex u can ripple unhindered through
vertex u and subsequently through vertex v. If the rippling can feed back upon
itself, problems of asynchronous latching, oscillation and race conditions can arise. By
prohibiting zero-weight cycles, condition W prevents these problems from occurring,
provided that the system clock runs slowly enough to allow the outputs of all the
functional elements to settle between each two consecutive ticks. The following definitions
are adopted from [Leiserson and Saxe, 1983a, 1983b].
DEFINITION 2.2 A synchronous system S is systolic if for each edge (u, v, w) in the
communication graph of S, the weight w is strictly greater than zero.
DEFINITION 2.3 A configuration of a system is some assignment of values to all its
registers. With each clock tick, the system maps the current configuration into a new
configuration. If the weight of an edge happens to be zero, no register impedes the
propagation of a signal along the corresponding interconnection.
DEFINITION 2.4 Let c be a configuration of a synchronous system S and let c' be
a configuration of a synchronous system S'. The system S started in configuration c
has the same behavior as the system S' started in configuration c' if for any sequence
of inputs to the system from the host, the two systems produce the same sequence of
outputs to the host.
DEFINITION 2.5 Let S and S' be synchronous systems. Suppose that for every
sufficiently old configuration c of S, there exists a configuration c' of S' such that when
S is started in configuration c and S' is started in configuration c', the two systems
exhibit the same behavior. Then system S' can simulate S. If two synchronous
systems can simulate each other, then they are equivalent.
Two synchronous systems are strongly equivalent, or, in other words, have the
same strong behavior if they have the same behavior under all interpretations. The
interpretation of a functional element labeled with $\sigma$ from some alphabet $\Sigma$, with $p$
input channels and $q$ output channels, is a mapping $D^p \to D^q$, where the set $D$
consists of certain data elements.
DEFINITION 2.6 For any synchronous circuit G, the minimum feasible clock period
$\Phi(G)$ is the maximum amount of propagation delay through which any signal must
ripple between clock ticks. Condition W guarantees that the clock period is well
defined by the equation $\Phi(G) = \max\{ d(p) \mid w(p) = 0 \}$.
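The clock period of Definition 2.6 is the largest delay of a register-free path, which can be computed as a longest path in the acyclic subgraph of zero-weight edges. The sketch below is one way to do this, assuming condition W holds, edges are (u, v, w) triples and delays are stored in a dictionary; the function name and the example circuit are hypothetical.

```python
def clock_period(vertices, edges, d):
    """Phi(G) = max{ d(p) | w(p) = 0 } via a longest-path pass in topological order."""
    succ = {v: [] for v in vertices}
    indeg = {v: 0 for v in vertices}
    for u, v, w in edges:
        if w == 0:
            succ[u].append(v)
            indeg[v] += 1
    # best[v]: largest delay of a zero-weight path ending at v
    # (a single vertex is a path with no edges, hence weight zero).
    best = {v: d[v] for v in vertices}
    ready = [v for v in vertices if indeg[v] == 0]
    done = 0
    while ready:
        u = ready.pop()
        done += 1
        for v in succ[u]:
            best[v] = max(best[v], best[u] + d[v])
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    if done != len(vertices):
        raise ValueError("condition W violated: zero-weight cycle")
    return max(best.values())


# Hypothetical circuit: three elements of delay 3 in a ring through the host.
vertices = ["host", "a", "b", "c"]
d = {"host": 0, "a": 3, "b": 3, "c": 3}
rippling = [("host", "a", 1), ("a", "b", 0), ("b", "c", 0), ("c", "host", 1)]
pipelined = [("host", "a", 1), ("a", "b", 0), ("b", "c", 1), ("c", "host", 1)]
```

In `rippling` a signal passes through a, b and c within one tick, giving a period of 9; placing a register between b and c in `pipelined` shortens the longest register-free path to 6.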
These notions allowed the introduction of transformations useful for the design and
the optimization of synchronous systems: retiming and slowdown.
Retiming transformations can alter the computations carried out in one clock cycle
of the system by relocating registers, that is, shifting one layer of registers from one
side of a functional element to the other. Two systems are retiming equivalent if they
can be joined by a sequence of such primitive retiming steps. Retiming is an important
technique which can be used to optimize clocked circuits by relocating registers so as
to reduce combinational rippling.
Consider the communication graph $G_1$ in Figure 2.8(a). Suppose, for instance,
that each vertex has a propagation delay of 3 esec. Then the clock period of $S_1$ must
be at least 9 esec, the time for a signal to propagate from $v_3$ through $v_6$ to $v_5$.
Figure 2.8: Retiming transformation.
(a) The communication graph $G_1$ of a synchronous system $S_1$.
(b) The communication graph $G_2$ of a system $S_2$, which is equivalent to the system
$S_1$ from Figure 2.8(a), as viewed from the host. Internally, the two systems differ
in that vertex $v_3$ lags by one clock tick in $S_1$ with respect to $S_2$.
Retiming the vertex $v_3$ in $G_1$, that is, decreasing the number of registers by one
on all incoming edges and increasing the number of registers by one on all outgoing
edges, results in a communication graph $G_2$ of a synchronous system $S_2$ in Figure
2.8(b) which is, intuitively, equivalent to $S_1$ but with a shorter clock period: 6 esec.
Formally, the retiming transformation is defined as follows: let S be a synchronous
system, V(G) the set of vertices of the underlying graph G and R a function from
V(G) into the set of all integers. We say that R is a legal retiming vector if for
every edge (u, v, w) in G the value w + R(v) − R(u) is nonnegative and R(host) = 0.
Applying R to S simply means replacing the weight w(e) of each edge $e : u \to v$ by
$$w'(e) = w(e) + R(v) - R(u).$$
In our example, the legal retiming vector R which takes $S_1$ into $S_2$ is:
$$R(host, v_1, v_2, v_3, v_4, v_5, v_6, v_7) = (0, 0, 0, -1, 0, 0, 0, 0).$$
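Applying a retiming vector is a one-line transformation of the edge weights, and legality is a matching one-line check. The following sketch assumes edges are (u, v, w) triples and R is a dictionary from vertices to integers; the function names and the four-vertex ring are hypothetical and do not reproduce the graph of Figure 2.8.

```python
def is_legal_retiming(edges, R, host="host"):
    """R is legal if R(host) = 0 and every retimed weight is nonnegative."""
    return R[host] == 0 and all(w + R[v] - R[u] >= 0 for u, v, w in edges)


def retime(edges, R):
    """Replace each weight w(e), e: u -> v, by w(e) + R(v) - R(u)."""
    return [(u, v, w + R[v] - R[u]) for u, v, w in edges]


# Hypothetical ring host -> a -> b -> c -> host.
edges = [("host", "a", 1), ("a", "b", 1), ("b", "c", 0), ("c", "host", 0)]
R = {"host": 0, "a": 0, "b": -1, "c": 0}
retimed = retime(edges, R)
```

Setting R(b) = −1 removes one register from the edge entering b and adds one to the edge leaving it, so the register count around the cycle is unchanged. Choosing R(b) = −2 instead would drive the weight of the edge from a to b negative, and the legality check rejects it.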
The impact of retiming on the behavior of synchronous systems is expressed by
the so-called Retiming Lemma in [Leiserson and Saxe, 1983a]:
LEMMA 2.1 (Retiming Lemma) Let S be a synchronous system with communication
graph G, and let R be a function that maps each vertex v of G to an integer and the
host to zero. Suppose that for every edge (u, v, w) in G the value w + R(v) − R(u) is
nonnegative. Let S' be the system obtained by replacing every edge e = (u, v, w) in
S with e' = (u, v, w + R(v) − R(u)). Then the systems S and S' are equivalent.
Slowdown is defined as follows: for any circuit G = (V, E, w) and any positive
integer c, the c-slow circuit cG = (V, E, w') is the circuit obtained by multiplying
all the register counts in G by c. That is, w'(e) = cw(e) for every edge $e \in E$. All
the data flow in cG is slowed down by a factor of c, so that cG performs the same
computations as G, but takes c times as many clock ticks and communicates with the
host only on every c-th clock tick. In fact, cG acts as a set of c independent versions
of G, communicating with the host in round-robin fashion. For example, the 2-slow
circuit $S_3$ of $S_1$ is shown in Figure 2.9.
Figure 2.9: Slowdown transformation. The communication graph $G_3 = 2G_1$ of a system $S_3$ obtained by multiplying all the register counts in $G_1$ from Figure 2.8(a) by 2. All the data flow in $S_3$ is slowed down by a factor of 2, so that $S_3$ performs the same computations as $S_1$, but takes 2 times as many clock ticks and communicates with the host only on every second tick.
The impact of slowdown on the behavior of synchronous systems is the following: the main advantage of $c$-slow circuits is that they can be retimed to have shorter clock periods than any retimed version of the original. For many applications, throughput is the issue, and multiple, interleaved streams of computation can be effectively utilized. A $c$-slow circuit that is systolic offers maximum throughput. Another interesting observation is that not every synchronous system can be retimed to get an equivalent systolic system. According to the Systolic Conversion Theorem [Leiserson and Saxe, 1983a], a synchronous system $S$ with communication graph $G$ can be retimed to a systolic system $S'$ if the constraint graph $G - 1$, which is the graph obtained from $G$ by replacing every edge $(u, v, w)$ with $(u, v, w - 1)$, has no cycles of negative weight. However, for any synchronous system that cannot be directly retimed to get a systolic system, there might be a slowdown transformation such that, after this transformation is applied, one gets a synchronous system that can be retimed to get an equivalent systolic system. It can be proved that such a slowdown transformation is possible only if the underlying automaton is a Moore automaton. The retiming vector $R(v)$ is defined for every vertex $v$ as the weight of the shortest path from $v$ to the host in $G - 1$. Consider the constraint graph $G_1 - 1$ in Figure 2.10 of a synchronous system $S_1$. Since $G_1 - 1$ contains a cycle $\mathit{host} \to v_1 \to v_7 \to \mathit{host}$ of negative weight, $S_1$ cannot be directly retimed to get an equivalent systolic system.
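The negative-cycle test on $G - 1$ and the shortest-path retiming vector can be sketched with a Bellman-Ford pass over distances to the host. This is only an illustrative sketch: the function name `systolic_retiming`, the edge-list representation and the two-vertex example are assumptions, and the code assumes that every vertex has a directed path to the host.

```python
from typing import Dict, List, Optional, Tuple

Edge = Tuple[str, str, int]  # (u, v, w): edge u -> v carrying w registers


def systolic_retiming(edges: List[Edge], host: str = "host") -> Optional[Dict[str, float]]:
    """Return R(v) = weight of the shortest path from v to host in G - 1,
    or None if G - 1 contains a cycle of negative weight."""
    vertices = {host} | {u for u, _, _ in edges} | {v for _, v, _ in edges}
    dist: Dict[str, float] = {v: float("inf") for v in vertices}
    dist[host] = 0
    # Bellman-Ford on distances *to* the host: relax edge u -> v backwards.
    for _ in range(len(vertices) - 1):
        for u, v, w in edges:
            if dist[v] + (w - 1) < dist[u]:
                dist[u] = dist[v] + (w - 1)
    # Any further improvement reveals a negative cycle in G - 1.
    for u, v, w in edges:
        if dist[v] + (w - 1) < dist[u]:
            return None
    return dist


# Hypothetical two-vertex system and its 2-slow version.
one = [("host", "a", 1), ("a", "host", 0)]
two = [(u, v, 2 * w) for u, v, w in one]
R_one = systolic_retiming(one)  # G - 1 has a negative cycle
R_two = systolic_retiming(two)  # 2G - 1 does not
```

In this toy case the original system cannot be made systolic, while its 2-slow version can: after applying `R_two`, every edge carries at least one register.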
Figure 2.10: The constraint graph $G_1 - 1$ of a synchronous system $S_1$ in Figure 2.8(a).
On the other hand, the constraint graph $G_3 - 1 = 2G_1 - 1$ does not have cycles of negative weight. Consequently, there exists a legal retiming vector which transforms synchronous system $S_3$ into systolic system $S_3'$ in Figure 2.11:
Figure 2.11: The systolic system $S_3'$ obtained from the 2-slow synchronous system $S_3$ by retiming.
2.4 Flowchart and Synchronous Schemes
Two major objections must be made about the model of synchronous systems presented in the previous subsection:
[1] According to Definition 2.1, a synchronous system is an infinite edge-weighted directed multigraph represented by its finite approximations that must be regular in a certain sense. Therefore the single finite graph $G$ should be called a finite system only, or rather a scheme.
[2] The multigraph representation of a synchronous scheme is inadequate in the sense that it does not relate the two endpoints of a given edge to designated labelled input-output "ports" of the corresponding vertices. This question is clearly important, because i/o ports of the functional elements (processors) behave differently in general. Also, it is advantageous to replace the host by a fixed (finite) number of input-output channels as distinguished vertices, thus avoiding the unnecessary constraint that those cycles closed only by the host should contain an edge with positive weight.
These two criticisms suggest reconsidering synchronous systems in the framework of Elgot's [1975] well-known model of monadic computations (flowchart algorithms). This standpoint motivated the definition of synchronous flowchart schemes in [Bartha, 1987], or simply synchronous schemes. Since synchronous schemes are defined in terms of flowchart schemes, we introduce the fundamental definitions and properties of flowchart schemes that will play an important role in the sequel.
DEFINITION 2.7 A signature or ranked alphabet is a set $\Sigma$, whose elements are called operation symbols, together with a mapping $ar : \Sigma \to \mathbb{N}$, called the arity function, assigning to each operation symbol a natural number, called its finite arity. If the operation symbols are grouped into subsets according to their arity, $\Sigma_n = \{\sigma \in \Sigma \mid ar(\sigma) = n\}$, then the signature $\Sigma$ is uniquely determined by the family $(\Sigma_n \mid n \in \mathbb{N})$.
A realization of an $n$-ary operation symbol in a set $A$ is an $n$-ary operation on $A$. Given a signature $\Sigma$, a $\Sigma$-algebra $\mathfrak{A}$ is a pair $\mathfrak{A} = (A, \Sigma^{\mathfrak{A}})$ consisting of a set $A$, called the carrier of $\mathfrak{A}$, and a family $\Sigma^{\mathfrak{A}} = (\sigma^{\mathfrak{A}} \mid \sigma \in \Sigma)$ of realizations $\sigma^{\mathfrak{A}}$ of operation symbols $\sigma$ from $\Sigma$.
DEFINITION 2.8 A $\Sigma$-flowchart scheme (F$\Sigma$-scheme) $F$ is a finite directed graph augmented by the following data.
(1) A subset $X \subseteq F$ of vertices of outdegree 0. The elements of $X$ are called exits of $F$.
(2) A labeling function, by which every nonexit vertex $v$ is assigned a symbol $\sigma \in \Sigma$ in such a way that the rank of $\sigma$ equals the outdegree of $v$.
(3) For each vertex $u$, a linear order of the set of edges leading out of $u$. By the notation $u \to_i v$ we shall indicate that the target of the $i$th edge leaving $u$ is vertex $v$.
(4) A begin function, which maps some finite set $B$ into the set of vertices of $F$. The begin function specifies a set of marked entries into $F$.
For simplicity, the marking set $B$ above will be identified with the set $[n] = \{1, \ldots, n\}$. Similarly, the exit vertices will be labeled by the numbers in $[p] = \{1, \ldots, p\}$. An $n$-entry and $p$-exit F$\Sigma$-scheme $F$ is denoted $F : n \to p$. If $F : n \to p$ and $G : p \to q$ are F$\Sigma$-schemes, then one can form their composite $F \cdot G : n \to q$ by identifying the exits of $F$ with the entries of $G$ in a natural way, assuming that $F$ and $G$ are disjoint graphs. This kind of composition gives rise to a category with all nonnegative integers as objects and with all F$\Sigma$-schemes as morphisms. The category obtained in this way is known as the horizontal structure of flowchart schemes.
The vertical structure of F$\Sigma$-schemes [Bloom and Ésik, 1993] is the category $\mathrm{Fl}_\Sigma$ constructed as the coproduct (disjoint union) of the categories $\mathrm{Fl}_\Sigma(n, p)$, $n, p \in \mathbb{N}$, defined below.
• For each pair $(n, p) \in \mathbb{N} \times \mathbb{N}$, $\mathrm{Fl}_\Sigma(n, p)$ has as objects all F$\Sigma$-schemes $n \to p$.
• A morphism $F \to F'$ between F$\Sigma$-schemes $F, F' : n \to p$ is a mapping $\alpha$ from the set of vertices of $F$ into that of $F'$ which preserves:
1. the sequence of entry and exit vertices;
2. the labeling of the boxes;
3. the edges in the sense that if $u \to_i v$ holds in $F$, then $\alpha(u) \to_i \alpha(v)$ will hold in $F'$.
• Composition of morphisms is defined in $\mathrm{Fl}_\Sigma(n, p)$ as that of mappings, and the identity morphisms are the identity maps.
Sometimes it is useful to consider an F$\Sigma$-scheme $F : n \to p$ as a separate partial algebraic structure over the set of vertices of $F$ [Grätzer, 1968]. In this structure there are $n$ constants, namely the entry vertices of $F$. Furthermore, for each $\sigma \in \Sigma_q$ there are $q$ unary operations $(\sigma, i)$, $i \in [q]$, if $q \geq 1$, and one unary operation $(\sigma, 0)$ if $q = 0$. If $i \geq 1$, then the operation $(\sigma, i)$ is defined on vertex $u$ of $F$ if and only if $u$ is labeled by $\sigma$, and in that case $(\sigma, i)(u)$ is the unique vertex $v$ for which $u \to_i v$. The operation $(\sigma, 0)$ is interpreted as if there was a loop around each vertex labeled by the constant symbol $\sigma$, i.e. $(\sigma, 0)$ is an appropriate restriction of the identity function. No operation is defined on the set of exit vertices.
A strong congruence relation of $F$ (as a partial algebra) by which the exit vertices form singleton groups is called a scheme congruence of $F$. Clearly, every scheme morphism $\alpha : F \to F'$ induces a scheme congruence $\theta$ on $F$. By the homomorphism theorem, if $\alpha$ is onto then $F/\theta \cong F'$, where the isomorphism and the factor scheme $F/\theta$ have their usual algebraic meaning [Grätzer, 1968]. In the sequel we shall not distinguish between isomorphic F$\Sigma$-schemes.
Let $v$ be a vertex of an F$\Sigma$-scheme $F : n \to p$. Starting from $v$, $F$ can be unfolded into a possibly infinite $\Sigma$-tree $T(F, v)$. Recall from [Bloom and Ésik, 1993] that an infinite $\Sigma$-tree has all of its nodes labeled by the symbols of $\Sigma$ in such a way that the number of descendants of each node $u$ is equal to the arity of the symbol labeling $u$. The branches of the tree $T(F, v)$ correspond to maximal walks in $F$ starting from $v$, where a maximal walk is one that ends at a vertex of outdegree zero or proceeds to infinity. The walk is allowed to return to itself arbitrarily many times. The nodes of $T(F, v)$, being copies of the vertices of $F$, are labeled either by the symbols of $\Sigma$ or, in the case of exit vertices, by the variable symbols $x_1, \ldots, x_p$ chosen from the fixed variable set $X = \{x_1, \ldots, x_n, \ldots\}$.
EXAMPLE 2.2 Consider the flowchart scheme in Figure 2.12.
Figure 2.12: Flowchart scheme.
A syntactical description of $F$ can be given by the equation $y = \tau(\sigma(y), x_1)$. The term on the right-hand side has the tree representation shown in Figure 2.13.
Figure 2.13: The tree representation of a term.
Solving the equation means replacing $y$ by $\tau(\sigma(y), x_1)$ as many times as possible. The process results in the infinite labelled tree shown in Figure 2.14.
Figure 2.14: Unfolding the flowchart scheme.
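The unfolding of Example 2.2 can be approximated to any finite depth. The sketch below is illustrative only: the dictionary encoding of a scheme, the vertex names `a` and `b`, the symbol names `tau` and `sigma` (our reading of the equation), and the cut-off marker `"..."` are all assumptions.

```python
from typing import Dict, List, Tuple, Union

# vertex -> (label, ordered successors); exit vertices have no successors
Scheme = Dict[str, Tuple[str, List[str]]]
Tree = Union[str, tuple]


def unfold(F: Scheme, v: str, depth: int) -> Tree:
    """Finite approximation of T(F, v): expand `depth` levels, then cut with '...'."""
    label, succs = F[v]
    if not succs:
        return label  # exit vertex (or constant symbol): a leaf of the tree
    if depth == 0:
        return "..."  # the tree continues below this point
    return (label,) + tuple(unfold(F, s, depth - 1) for s in succs)


# Scheme for y = tau(sigma(y), x1): vertex a carries tau, vertex b carries sigma.
F: Scheme = {
    "a": ("tau", ["b", "x1"]),
    "b": ("sigma", ["a"]),
    "x1": ("x1", []),
}
approx = unfold(F, "a", 3)
```

Increasing `depth` corresponds to substituting the right-hand side for $y$ one more time.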
For two vertices $u, v$ of $F$, we say that $u$ and $v$ have the same strong behavior if $T(F, u) = T(F, v)$. Unfolding $F$ starting from each entry vertex simultaneously yields an $n$-tuple of trees, which is called the strong behavior of $F$, denoted $T(F)$. By definition, if $\theta$ is a scheme congruence of $F$ and $u \equiv v\ (\theta)$, then $u$ and $v$ have the same strong behavior. Consequently, if $\alpha : F \to F'$ is a morphism in $\mathrm{Fl}_\Sigma$, then $T(F) = T(F')$.
An F$\Sigma$-scheme $F$ is called accessible if every nonexit vertex of $F$ can be reached from at least one entry vertex by a directed path. In the algebraic setting $F$ is accessible if, with the exception of the exit vertices, $F$ is generated by its constants. For an accessible F$\Sigma$-scheme $F$, define the equivalence $\mu_F$ on the set of vertices of $F$ in the following way:
$$u \equiv v\ (\mu_F) \quad \text{if} \quad T(F, u) = T(F, v).$$
Obviously, $\mu_F$ is a scheme congruence, and it is the largest among all the scheme congruences of $F$. The scheme $F/\mu_F$ is therefore called minimal.
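For a finite scheme, the congruence $\mu_F$ can be computed without building infinite trees. The sketch below uses partition refinement, which is our choice of technique rather than a procedure given in the text: vertices start in blocks by label, and blocks are split until vertices in the same block have successors in matching blocks. The encoding and the name `largest_congruence` are assumptions, and exit vertices are assumed to carry pairwise distinct labels.

```python
from typing import Dict, List, Tuple

Scheme = Dict[str, Tuple[str, List[str]]]  # vertex -> (label, ordered successors)


def largest_congruence(F: Scheme) -> Dict[str, int]:
    """Map each vertex to a block id; u, v share a block iff T(F, u) = T(F, v)."""
    block: Dict[str, object] = {v: label for v, (label, _) in F.items()}
    while True:
        # Signature of v: its own block plus the blocks of its ordered successors.
        sig = {v: (block[v], tuple(block[s] for s in F[v][1])) for v in F}
        ids = {s: i for i, s in enumerate(sorted(set(sig.values()), key=repr))}
        new_block = {v: ids[sig[v]] for v in F}
        # Each round refines the previous partition; stop when nothing splits.
        if len(set(new_block.values())) == len(set(block.values())):
            return new_block
        block = new_block


# Hypothetical scheme: the loop tau -> sigma -> tau duplicated once.
G: Scheme = {
    "a1": ("tau", ["b1", "x1"]),
    "b1": ("sigma", ["a2"]),
    "a2": ("tau", ["b2", "x1"]),
    "b2": ("sigma", ["a1"]),
    "x1": ("x1", []),
}
mu = largest_congruence(G)
```

Here `a1` and `a2` unfold into the same tree, so the minimal scheme has three vertices.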
Let $G$ be a graph and denote by $V(G)$ the set of vertices of $G$. A subset $S \subseteq V(G)$ is strongly connected if for every $u, v \in S$ there exists a directed path in $G$ from $u$ to $v$ going through vertices of $S$ only. A strong component of $G$ is a strongly connected subset $S$ which is maximal in the sense that if $S'$ is strongly connected and $S \subseteq S'$, then $S = S'$. An $n$-entry F$\Sigma$-scheme $F$ is tree-reducible if $F$ is accessible and the graph obtained from $F$ by deleting its exit vertices and contracting each of its strong components into a single vertex consists of $n$ disjoint trees. Every accessible F$\Sigma$-scheme $F$ can be unfolded into a tree-reducible scheme by finite means. To this end, it is sufficient to unfold the partial order of the strong components of $F$ with its exit vertices deleted into a set of disjoint trees. The resulting tree-reducible F$\Sigma$-scheme will be denoted by $\mathit{utr}(F)$. The unfolding determines a morphism $\mathit{utr}(F) \to F$ in the category $\mathrm{Fl}_\Sigma$.
DEFINITION 2.9 A synchronous scheme $S$ (S$\Sigma$-scheme for short) consists of a finite underlying F$\Sigma$-scheme, denoted $\mathrm{fl}(S)$, and a weight function by which every edge of $S$ is assigned a nonnegative integer. We assume that the direction of an edge $e$ in $S$ is the opposite of the direction of the same edge in $\mathrm{fl}(S)$.
Let us point out the semantical differences between flowchart and synchronous schemes. A flowchart scheme of sort $p \to q$ is interpreted as a flowchart algorithm called monadic computation [Elgot, 1975] with $p$ entries and $q$ exits. Accordingly, the flow of information in the flowchart scheme follows the direction of the arrow between $p$ and $q$. For example, the scheme $\varepsilon : 2 \to 1$ should be interpreted as a join of two different paths in the flowchart. In a logical circuit, however, the meaning of $\varepsilon$ is a branch: thus, in this case the information flows in the opposite direction. Reasoning from the point of view of category theory, the difference is the following. Concerning flowchart schemes, the object $n$ in the theory $T$ is treated as the $n$th copower of the object 1 ($n = \sum_{i=1}^{n} 1$), while in the case of synchronous schemes $n$ would rather be the $n$th power of 1 ($n = \prod_{i=1}^{n} 1$), as in the original definition of algebraic theories in [Lawvere, 1963]. See also Figure 2.15. However, if we followed the product formalism, then the sort of a mapping $[p] \to [q]$ would confusingly become $q \to p$. Therefore, we rather adhere to the coproduct formalism and express the product-like (functional) semantics only by designing our schemes in an upside-down fashion.
Figure 2.15: The difference between flowchart and synchronous schemes. (a) Flowchart scheme $F : D \to 2D$. (b) Synchronous scheme $S : D^2 \to D$.
The category $\mathrm{Syn}_\Sigma$ of S$\Sigma$-schemes consists of the following. The objects are all accessible S$\Sigma$-schemes. A morphism $S \to S'$ in $\mathrm{Syn}_\Sigma$ is a morphism $\mathrm{fl}(S) \to \mathrm{fl}(S')$ in $\mathrm{Fl}_\Sigma$ that preserves the weight of the edges. Accordingly, a scheme congruence of $S$ is one of $\mathrm{fl}(S)$ that is compatible with the weight function.
Categories $\mathrm{TFl}_\Sigma$ and $\mathrm{TSyn}_\Sigma$ are full subcategories of $\mathrm{Fl}_\Sigma$ and $\mathrm{Syn}_\Sigma$, respectively, determined by the subset of tree-reducible schemes.
We define the signature $\Sigma_\nabla$ as $\Sigma \cup \{\nabla\}$, where $\nabla$ is a unary operation symbol (register). With any S$\Sigma$-scheme $S$ we then associate the F$\Sigma_\nabla$-scheme $\mathrm{fl}_\nabla(S)$, which is obtained from $\mathrm{fl}(S)$ by replacing every edge $e$ in it by a chain of $n$ $\nabla$-labeled vertices, where $n$ is the weight of $e$. As in the case of F$\Sigma$-schemes, we include the infinite unfoldings of S$\Sigma$-schemes in $\mathrm{Syn}_\Sigma$. Obvious details of this procedure are omitted. See Figure 2.16.
Figure 2.16: Synchronous scheme $S$ and its (infinite) unfolding tree $T(S)$.
The importance of the concept of tree unfolding is that it captures the strong behavior of synchronous (flowchart) schemes. Two schemes can be syntactically different and yet exhibit the same strong behavior, as shown in Figure 2.17.
Transformations of retiming and slowdown are defined for synchronous schemes in the same way as for synchronous systems.
If $R$ is a legal retiming vector for $S$ and $S'$ is the scheme obtained by applying $R$ to $S$, then we shall write $R : S \to S'$. Retiming count vectors thus define a category on the set of S$\Sigma$-schemes as objects. The composition of two arrows $R$ and $R'$ is $R + R'$ and the identities are the zero vectors $\mathbf{0}$.
If $c$ is a positive integer such that $S' = cS$, i.e., $S'$ is obtained from $S$ by $c$-slowdown, then we shall write $c : S \to S'$. Slowdown transformations also define a category on the set of S$\Sigma$-schemes as objects. The composition of two arrows $c_1$ and $c_2$ is $c_1 c_2$ and the identities are designated by $c = 1$.
Figure 2.17: Schemes $S_1$ and $S_2$, while different, represent the same computational process. They can be unfolded into the same tree $T(S_1) = T(S_2)$.
The following Definition and Lemma are adopted from [Bartha, 1994].
DEFINITION 2.10 The relation of strong retiming equivalence on the set $\mathrm{Syn}_\Sigma$ is the smallest equivalence relation containing $\to_s$ and $\to_r$, where $\to_s$ denotes the binary relation induced by reduction (unfolding) and $\to_r$ denotes the binary relation induced by retiming transformation. Strong retiming equivalence is denoted by $\sim$.
LEMMA 2.11 $\sim\ =\ \leftarrow_s \circ \sim_r \circ \to_s$
FACT: The relation of strong retiming equivalence is decidable for synchronous schemes [Bartha, 1994].
3 Equivalence Relations of Synchronous Schemes
and their Decision Problems
In this section we introduce the equivalences of slowdown retiming and strong slowdown retiming as the join of slowdown equivalence and retiming equivalence, and the join of these two plus strong equivalence, respectively, on the set of S$\Sigma$-schemes. Concerning slowdown, $\to_{sl}$ will stand for the partial order induced on $\mathrm{Syn}_\Sigma$ by slowdown constants. We shall use the preorders defined by the categories $\mathrm{Fl}_\Sigma$ and $\mathrm{Syn}_\Sigma$ as simple binary relations over the sets $\mathrm{Fl}_\Sigma$ and $\mathrm{Syn}_\Sigma$ of all finite accessible F$\Sigma$-schemes and S$\Sigma$-schemes, respectively. In both cases, this preorder will be denoted by $\to_s$. Concerning retiming, $\sim_r$ will stand for the equivalence relation induced on $\mathrm{Syn}_\Sigma$ by legal retiming count vectors.
3.1 Slowdown Retiming Equivalence
DEFINITION 3.1.1 The relation of slowdown retiming equivalence on the set $\mathrm{Syn}_\Sigma$ is the smallest equivalence relation containing $\to_{sl}$ and $\sim_r$.
Slowdown retiming equivalence of synchronous schemes will be denoted by $\sim_{SR}$. The relations $\sim_{sl}\ = (\to_{sl} \cup \leftarrow_{sl})^*$ (symmetric and transitive closure) and $\sim_r$ are called slowdown equivalence and retiming equivalence, respectively. In order to decide the relation $\sim_{SR}$ we are going to prove the following equation:
$$\sim_{SR}\ =\ \to_{sl} \circ \sim_r \circ \leftarrow_{sl} \qquad (1)$$
Equation (1) says that if two S$\Sigma$-schemes $S_1$ and $S_2$ are slowdown retiming equivalent, then they can be slowed down into appropriate schemes $S_1'$ and $S_2'$ that are already retiming equivalent.
LEMMA 3.1.2 $\sim_r \circ \to_{sl}\ \subseteq\ \to_{sl} \circ \sim_r$
PROOF. Let $S$, $P$ and $Q$ be S$\Sigma$-schemes such that $S \sim_r P$ and $P \to_{sl} Q$. Then there exist a legal retiming vector $R : S \to P$ and a positive integer $c$ such that $Q$ is the $c$-slow scheme $cP$ obtained by multiplying all the register counts in $P$ by $c$. Scheme $S$ can be slowed down by multiplying all the register counts in $S$ by $c$, i.e., $S \to_{sl} cS = P_1$. Define a legal retiming vector $R_1$ as $R_1(v) = cR(v)$ for all vertices $v$ in $P_1$. We claim that $R_1$ takes $P_1$ to $Q$. Indeed, for any edge $e : u \to v$ in $S$, the weight $w_R(e)$ of the corresponding edge in $P$ after retiming $R$ is defined by the equation
$$w_R(e) = w(e) + R(v) - R(u).$$
The slowdown $c : P \to Q$ transforms $w_R(e)$ into
$$c\,w_R(e) = c\,w(e) + cR(v) - cR(u)$$
in $Q$. On the other hand, for any edge $e : u \to v$ in $S$, slowdown transforms $w(e)$ into $c\,w(e)$ in $P_1$, and retiming $R_1$ takes this number to
$$(cw)_{R_1}(e) = c\,w(e) + R_1(v) - R_1(u) = c\,w(e) + cR(v) - cR(u). \qquad \blacksquare$$
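The computation in this proof can be checked on a small instance. The following sketch is only an illustration under assumed names (`retime`, `slow`) and a hypothetical three-vertex cycle; it compares retiming followed by slowdown with slowdown followed by the scaled retiming $R_1 = cR$.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str, int]  # (u, v, w)


def retime(edges: List[Edge], R: Dict[str, int]) -> List[Edge]:
    """w'(e) = w(e) + R(v) - R(u); vertices missing from R count as 0."""
    return [(u, v, w + R.get(v, 0) - R.get(u, 0)) for u, v, w in edges]


def slow(edges: List[Edge], c: int) -> List[Edge]:
    """c-slow scheme: multiply every register count by c."""
    return [(u, v, c * w) for u, v, w in edges]


S = [("a", "b", 2), ("b", "c", 0), ("c", "a", 1)]  # hypothetical scheme
R = {"b": -1, "c": 1}                               # legal for S
c = 3

Q_via_P = slow(retime(S, R), c)                     # S ~r P, then P ->sl Q
R1 = {v: c * r for v, r in R.items()}               # R1(v) = c * R(v)
Q_via_P1 = retime(slow(S, c), R1)                   # S ->sl P1, then P1 ~r Q
```

Both routes yield the same weights, which is the content of the lemma for this instance.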
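LEMMA 3.1.3 $\leftarrow_{sl} \circ \to_{sl}\ \subseteq\ \to_{sl} \circ \leftarrow_{sl}$
PROOF. Let $S$, $P$ and $Q$ be S$\Sigma$-schemes such that $P \to_{sl} S$ and $P \to_{sl} Q$. Then there exist positive integers $c_1$ and $c_2$ such that $S$ and $Q$ are the slow schemes $c_1 P$ and $c_2 P$ obtained by multiplying all the register counts in $P$ by $c_1$ and $c_2$, respectively. Then the following diagram commutes, since $c_2 S = c_1 Q = c_1 c_2 P$:
[Diagram: commutative square with slowdowns $c_1 : P \to S$, $c_2 : P \to Q$, $c_2 : S \to c_1 c_2 P$ and $c_1 : Q \to c_1 c_2 P$.] ∎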
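LEMMA 3.1.4 $\to_{sl} \circ \leftarrow_{sl}\ \subseteq\ \leftarrow_{sl} \circ \to_{sl}$
PROOF. Let $S$, $P$ and $Q$ be S$\Sigma$-schemes such that $S \to_{sl} P$ and $Q \to_{sl} P$. Then there exist positive integers $c_1$ and $c_2$ such that $P = c_1 S = c_2 Q$. Let $e$ be an edge in $P$. Then $w_P(e) = c_1 w_S(e) = c_2 w_Q(e)$ and
$$\frac{\mathrm{lcm}(c_1, c_2)}{c_2}\, w_S(e) = \frac{\mathrm{lcm}(c_1, c_2)}{c_1}\, w_Q(e),$$
where $\mathrm{lcm}(c_1, c_2)$ is the least common multiple of $c_1$ and $c_2$. Therefore, there exists a scheme $P'$ such that $S = \frac{\mathrm{lcm}(c_1, c_2)}{c_1} P'$ and $Q = \frac{\mathrm{lcm}(c_1, c_2)}{c_2} P'$, i.e., the following diagram commutes:
[Diagram: commutative square with slowdowns $P' \to S$, $P' \to Q$, $c_1 : S \to P$ and $c_2 : Q \to P$.] ∎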
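The construction of the common scheme $P'$ can be sketched as follows. The function name `common_root` and the dictionaries of edge weights are assumptions made for illustration; the two multipliers are $\mathrm{lcm}(c_1, c_2)/c_1$ and $\mathrm{lcm}(c_1, c_2)/c_2$, as in the proof.

```python
from math import gcd
from typing import Dict, Tuple


def common_root(
    w_S: Dict[str, int], w_Q: Dict[str, int], c1: int, c2: int
) -> Tuple[int, int, Dict[str, int]]:
    """Given c1 * w_S[e] == c2 * w_Q[e] for every edge e, return (a, b, w_P1)
    such that S = a * P' and Q = b * P', with a = lcm/c1 and b = lcm/c2."""
    if any(c1 * w_S[e] != c2 * w_Q[e] for e in w_S):
        raise ValueError("c1 * S and c2 * Q must be the same scheme P")
    lcm = c1 * c2 // gcd(c1, c2)
    a, b = lcm // c1, lcm // c2
    # a and b are coprime and b * w_S = a * w_Q, so a divides every w_S[e].
    w_P1 = {e: w_S[e] // a for e in w_S}
    return a, b, w_P1


# Hypothetical weights with 4 * S == 6 * Q.
w_S = {"e1": 3, "e2": 6}
w_Q = {"e1": 2, "e2": 4}
a, b, w_P1 = common_root(w_S, w_Q, 4, 6)
```

For these weights the sketch finds $S = 3P'$ and $Q = 2P'$.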
PROOF. Follows directly from Lemmas 3.1.3 and 3.1.4. ∎
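COROLLARY 3.1.6 $\sim_{SR}\ =\ \to_{sl} \circ \sim_r \circ \leftarrow_{sl}$
PROOF. It is sufficient to prove that the relation $\rho\ =\ \to_{sl} \circ \sim_r \circ \leftarrow_{sl}$ is transitive. We have
$$\rho \circ \rho\ =\ \to_{sl} \circ \sim_r \circ \leftarrow_{sl} \circ \to_{sl} \circ \sim_r \circ \leftarrow_{sl}$$
$$\subseteq\ \to_{sl} \circ \sim_r \circ \to_{sl} \circ \leftarrow_{sl} \circ \sim_r \circ \leftarrow_{sl} \quad \text{(by Lemma 3.1.3)}$$
$$\subseteq\ \to_{sl} \circ \to_{sl} \circ \sim_r \circ \sim_r \circ \leftarrow_{sl} \circ \leftarrow_{sl} \quad \text{(by Lemma 3.1.2)}$$
$$=\ \rho. \qquad \blacksquare$$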
3.2 Decidability of Slowdown Retiming Equivalence
PROPOSITION 3.2.1 The relations $\sim_{sl}$ and $\sim_r$ are decidable.
PROOF. For any two S$\Sigma$-schemes $S$ and $S'$, $S \sim_{sl} S'$ if and only if there exists a scheme $P$ such that $S \to_{sl} P$ and $S' \to_{sl} P$, i.e. there exist positive integers $c$ and $c'$ such that $c\,w(e) = c'\,w'(e')$ for all edges $e$ in $S$ and corresponding edges $e'$ in $S'$. Therefore, in order to decide $\sim_{sl}$ it is sufficient to verify that the ratio $\frac{w(e)}{w'(e')}$ is the same for all corresponding edges $e, e'$.
As to the relation $\sim_r$, recall from [Murata, 1977] that a fundamental circuit of a directed graph is a cycle of the corresponding undirected graph. Let $S$ and $S'$ be two S$\Sigma$-schemes with weight functions $w$ and $w'$, and assume that $S$ and $S'$ share a common underlying graph $G$. For every fundamental circuit $z$ of this graph, let us fix a cyclic order of the vertices of $z$ in order to distinguish a positive and a negative direction of the edges belonging to $z$. For every edge $e \in z$, let $\mathrm{sign}_z(e) = 1$ ($-1$) if the direction of $e$ is positive (respectively, negative) with respect to the given order.
LEMMA 3.2.2 We claim that $S$ and $S'$ are retiming equivalent, i.e. there exists a legal retiming count vector taking $S$ to $S'$, if and only if:
(1) For every fundamental circuit $z$ of $G$,
$$\sum_{e \in z} \mathrm{sign}_z(e) \cdot w(e) = \sum_{e \in z} \mathrm{sign}_z(e) \cdot w'(e).$$
(2) All simple paths from an entry to an exit vertex have the same weight in $S$ and $S'$,
where by simple path we mean an alternating sequence of vertices and edges $v_0 \xrightarrow{e_0} v_1 \xrightarrow{e_1} \cdots \xrightarrow{e_{l-1}} v_l$ in which no vertex is repeated.
By [Murata, 1977, Theorem 1], (1) is necessary and sufficient for the existence of a retiming vector $R$ that satisfies the condition of being legal, except that $R(v)$ need not be zero for all exit vertices. Suppose that such an $R$ exists, and let $p$ be a simple path composed of vertices and edges $v_0 \xrightarrow{e_0} v_1 \xrightarrow{e_1} \cdots \xrightarrow{e_{l-1}} v_l$, where $v_0$ is an entry and $v_l$ is an exit vertex. Then we have:
$$w'(p) = \sum_{i=0}^{l-1} w'(e_i) = \sum_{i=0}^{l-1} \bigl(w(e_i) + R(v_{i+1}) - R(v_i)\bigr) = \sum_{i=0}^{l-1} w(e_i) + \sum_{i=0}^{l-1} \bigl(R(v_{i+1}) - R(v_i)\bigr) = w(p) + R(v_l) - R(v_0).$$
Then $w(p) = w'(p)$ if and only if $R(v_0) = R(v_l)$. Obviously, one of $R(v_0)$ and $R(v_l)$ can be chosen freely, so that the condition $R(v_0) = R(v_l)$ is equivalent to saying that the assignment $R(v_0) = R(v_l) = 0$ is possible. ∎
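As an alternative to enumerating fundamental circuits, a retiming vector can be searched for directly. The sketch below is not the circuit-based test of the lemma but a different technique with the same aim: it propagates the required differences $R(v) - R(u) = w'(e) - w(e)$ along the edges by breadth-first search and reports failure on any inconsistency. The function name `find_retiming`, the parallel lists of weights and the example path are assumptions.

```python
from collections import deque
from typing import Dict, List, Optional, Sequence, Tuple


def find_retiming(
    edges: List[Tuple[str, str]],
    w: Sequence[int],
    w2: Sequence[int],
    fixed: Sequence[str],
) -> Optional[Dict[str, int]]:
    """Return R with w2[i] = w[i] + R(v) - R(u) on every edge (u, v) and
    R = 0 on the vertices in `fixed` (entries and exits), or None."""
    adj: Dict[str, List[Tuple[str, int]]] = {}
    for i, (u, v) in enumerate(edges):
        d = w2[i] - w[i]  # required value of R(v) - R(u)
        adj.setdefault(u, []).append((v, d))
        adj.setdefault(v, []).append((u, -d))
    R: Dict[str, int] = {}
    # Visit fixed vertices first so that each component is anchored at one of them.
    for root in list(fixed) + list(adj):
        if root in R:
            continue
        R[root] = 0
        queue = deque([root])
        while queue:
            x = queue.popleft()
            for y, d in adj.get(x, []):
                if y not in R:
                    R[y] = R[x] + d
                    queue.append(y)
                elif R[y] != R[x] + d:
                    return None  # weights disagree around some circuit
    return R if all(R[v] == 0 for v in fixed) else None


# Hypothetical path in -> a -> b -> out with two weight functions.
path = [("in", "a"), ("a", "b"), ("b", "out")]
R_ok = find_retiming(path, [1, 0, 1], [0, 1, 1], ["in", "out"])
R_bad = find_retiming(path, [1, 0, 1], [0, 0, 1], ["in", "out"])
```

In the second call the entry-to-exit path has weight 2 in one scheme and 1 in the other, so no retiming vector vanishing on both endpoints exists.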
COROLLARY 3.2.3 Let $S$ and $S'$ be synchronous schemes such that $S \sim_r S'$. Then for any directed cycle $c$ in $S$ and $S'$, we have $w(c) = w'(c)$.
PROOF. Follows immediately from Lemma 3.2.2 (2). ∎
If $S$ and $S'$ are tree-reducible schemes, the relation $\sim_r$ is yet simpler to decide. Since every fundamental circuit must remain in some strong component, in condition (1), it suffices to check that every directed cycle of a common underlying graph $G$ has the same total weight by the weight functions of $S$ and $S'$. ∎
THEOREM 3.2.4 The relation of slowdown retiming equivalence is decidable for synchronous schemes.
PROOF. Let $G$ and $G'$ be S$\Sigma$-schemes. By Corollary 3.1.6, $G$ and $G'$ are slowdown retiming equivalent if and only if there exist S$\Sigma$-schemes $S$ and $S'$ such that $G \to_{sl} S$, $G' \to_{sl} S'$ and $S \sim_r S'$. Since $S = c_1 G$ and $S' = c_2 G'$, we must have $c_1 G \sim_r c_2 G'$, i.e. there exists a legal retiming vector $R$ such that, according to Lemma 3.2.2, all fundamental circuits $z$ and simple entry-to-exit paths $p$ have the same total weight. In other words, the ratios $\frac{w(z)}{w'(z)}$ and $\frac{w(p)}{w'(p)}$ must be the same, where $w$ and $w'$ are the weight functions of $G$ and $G'$ respectively.
According to the argument above, one can decide the slowdown retiming equivalence of $G$ and $G'$ by the following algorithm.
Algorithm A.
Compute $w(z_i)$ and $w'(z_i)$ for every fundamental circuit $z_i$, $1 \leq i \leq n$, and $w(p_j)$ and $w'(p_j)$ for every simple entry-to-exit path $p_j$, $1 \leq j \leq m$, in $G$ and $G'$, respectively, and check the ratios $\frac{w(z_i)}{w'(z_i)}$ and $\frac{w(p_j)}{w'(p_j)}$. Then $G$ and $G'$ are slowdown retiming equivalent if and only if these ratios are the same for all $1 \leq i \leq n$ and $1 \leq j \leq m$. ∎
Let us briefly discuss the complexity of Algorithm A. It is easy to see that the expensive part of Algorithm A is the finding of all fundamental circuits and entry-to-exit paths. Johnson's algorithm [Johnson, 1975] for finding all the elementary circuits of a directed graph has a time bound of $O((n + e)(c + 1))$ on any graph with $n$ vertices, $e$ edges and $c$ elementary circuits. In order to find all entry-to-exit paths the Floyd-Warshall algorithm might be used. It is well known that the Floyd-Warshall algorithm runs in $O(|V|^3)$ time on any graph with vertex set $V$.
EXAMPLE 3.1 Consider the schemes in Fig. 3.1.
Figure 3.1.
Schemes $S_1$ and $S_2$ are not retiming equivalent since both directed cycles and both entry-to-exit paths have different total weights. In order to decide the slowdown retiming equivalence relation for the given schemes it suffices to solve the following system of linear equations:
$$c_1 w_1(z_1) = c_2 w_2(z_1), \quad c_1 w_1(z_2) = c_2 w_2(z_2), \quad c_1 w_1(p_1) = c_2 w_2(p_1), \quad c_1 w_1(p_2) = c_2 w_2(p_2),$$
where $z_1$ is the directed cycle $v_1 \to v_2 \to v_5 \to v_1$; $z_2$ is the directed cycle $v_1 \to v_2 \to v_3 \to v_4 \to v_5 \to v_1$; $p_1$ is the entry-to-exit path $v_0 \to v_1 \to v_2 \to v_5 \to oc_1$ and $p_2$ is the entry-to-exit path $v_0 \to v_1 \to v_3 \to oc_1$, with $w_1(z_1) = 3$, $w_2(z_1) = 2$, $w_1(z_2) = 6$, $w_2(z_2) = 4$, $w_1(p_1) = 3$, $w_2(p_1) = 2$, $w_1(p_2) = 3$ and $w_2(p_2) = 2$. Since
$$c_1 = c_2 \frac{w_2(z_1)}{w_1(z_1)} = c_2 \frac{w_2(z_2)}{w_1(z_2)} = c_2 \frac{w_2(p_1)}{w_1(p_1)} = c_2 \frac{w_2(p_2)}{w_1(p_2)} = c_2 \cdot \frac{2}{3},$$
the solution exists: $c_1 = 2$, $c_2 = 3$. By multiplying all the register counts in $S_1$ and $S_2$ by $c_1$ and $c_2$, respectively, one gets schemes $S_1'$ and $S_2'$. See Figure 3.2.
Figure 3.2.
It is trivial to check that schemes $S_1'$ and $S_2'$ are retiming equivalent. Consequently, the original schemes $S_1$ and $S_2$ are slowdown retiming equivalent.
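The ratio test of Algorithm A can be sketched on the total weights quoted in Example 3.1. The function name `slowdown_constants` and the list encoding are assumptions; the sketch takes the circuit and path weights as given and does not enumerate circuits or paths itself.

```python
from fractions import Fraction
from typing import List, Optional, Tuple


def slowdown_constants(w1: List[int], w2: List[int]) -> Optional[Tuple[int, int]]:
    """Smallest positive (c1, c2) with c1 * w1[k] == c2 * w2[k] for all k, or None."""
    if any((a == 0) != (b == 0) for a, b in zip(w1, w2)):
        return None  # a zero total weight can only be matched by another zero
    ratios = {Fraction(b, a) for a, b in zip(w1, w2) if a > 0}
    if not ratios:
        return (1, 1)  # every weight is zero, so no slowdown is needed
    if len(ratios) > 1:
        return None  # the ratios differ: no common pair of constants exists
    r = ratios.pop()
    return (r.numerator, r.denominator)


# Total weights quoted in Example 3.1, in the order z1, z2, p1, p2.
w1 = [3, 6, 3, 3]
w2 = [2, 4, 2, 2]
constants = slowdown_constants(w1, w2)
```

For the weights of the example the sketch returns the constants $c_1 = 2$ and $c_2 = 3$ found in the text.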
3.3 Strong Slowdown Retiming Equivalence
DEFINITION 3.3.1 The relation of strong slowdown retiming equivalence on the set $\mathrm{Syn}_\Sigma$ is the smallest equivalence relation containing $\to_{sl}$, $\to_s$ and $\sim_r$.
Strong slowdown retiming equivalence of synchronous schemes will be denoted by $\sim_{SSR}$. The relation $\sim_s\ = (\to_s \cup \leftarrow_s)^*$ (symmetric and transitive closure) is called strong equivalence. For the definitions of slowdown and retiming equivalence see Definition 3.1.1. In order to decide the relation $\sim_{SSR}$, we are going to prove the following equation:
$$\sim_{SSR}\ =\ \leftarrow_s \circ \to_{sl} \circ \sim_r \circ \leftarrow_{sl} \circ \to_s\ =\ \leftarrow_s \circ \sim_{SR} \circ \to_s \qquad (2)$$
Equation (2) says that if two accessible S$\Sigma$-schemes $S_1$ and $S_2$ are strong slowdown retiming equivalent, then they can be unfolded into appropriate schemes $S_1'$ and $S_2'$ that are already slowdown retiming equivalent.
LEMMA 3.3.2 $\to_s \circ \to_{sl}\ \subseteq\ \to_{sl} \circ \to_s$
PROOF. Let $S$, $U$ and $S'$ be S$\Sigma$-schemes such that $S \to_s U$ and $U \to_{sl} S'$. Then there exist a scheme morphism $\alpha : S \to U$ and a positive integer $c$ such that $S'$ is the $c$-slow scheme $cU$ obtained by multiplying all the register counts in $U$ by $c$. Then the following diagram commutes:
[Diagram: commutative square with $\alpha : S \to U$, $c : U \to S'$, $c : S \to cS$ and $\alpha : cS \to S'$.] ∎
LEMMA 3.3.3 $\sim_r \circ \leftarrow_s\ \subseteq\ \leftarrow_s \circ \sim_r$ [Bartha, 1994]
PROOF. Let $S$, $S'$ and $U$ be S$\Sigma$-schemes such that $S \sim_r U$ and $S' \to_s U$. Then there exist a legal retiming count vector $R : S \to U$ and a scheme morphism $\alpha : S' \to U$. Since $\mathrm{fl}(U) = \mathrm{fl}(S)$, $S$ can be unfolded into a scheme $U'$ for which $\mathrm{fl}(U') = \mathrm{fl}(S')$ and $\alpha : U' \to S$. For every vertex $v$ of $U'$, define $R'(v) = R(\alpha(v))$. It is now easy to check that the retiming $R'$ takes $U'$ to $S'$. ∎
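COROLLARY 3.3.4 $\sim_{SSR}\ =\ \leftarrow_s \circ \to_{sl} \circ \sim_r \circ \leftarrow_{sl} \circ \to_s$
PROOF. It is sufficient to prove that the relation $\rho\ =\ \leftarrow_s \circ \to_{sl} \circ \sim_r \circ \leftarrow_{sl} \circ \to_s$ is transitive. Observe that $\to_s \circ \leftarrow_s\ \subseteq\ \leftarrow_s \circ \to_s$ and $\to_{sl} \circ \leftarrow_s\ \subseteq\ \leftarrow_s \circ \to_{sl}$, because the category $\mathrm{Syn}_\Sigma$ has all pullbacks and pushouts [MacLane, 1971]. By applying Lemmas 3.3.2, 3.3.3, 3.1.3 and 3.1.2 we have:
$$\rho \circ \rho\ =\ \leftarrow_s \circ \to_{sl} \circ \sim_r \circ \leftarrow_{sl} \circ \to_s \circ \leftarrow_s \circ \to_{sl} \circ \sim_r \circ \leftarrow_{sl} \circ \to_s$$
$$\subseteq\ \leftarrow_s \circ \to_{sl} \circ \sim_r \circ \leftarrow_s \circ \leftarrow_{sl} \circ \to_{sl} \circ \to_s \circ \sim_r \circ \leftarrow_{sl} \circ \to_s$$
$$\subseteq\ \rho. \qquad \blacksquare$$
Repeating the proof of Corollary 3.3.4 working in the subset $\mathrm{TSyn}_\Sigma$ of tree-reducible S$\Sigma$-schemes, we obtain the following result.
COROLLARY 3.3.5 $\sim_{SSR}\ =\ \mathit{utr} \circ \leftarrow_s \circ \to_{sl} \circ \sim_r \circ \leftarrow_{sl} \circ \to_s \circ \mathit{utr}^{-1}$, where the relation $\to_s$ is restricted to the subset of tree-reducible schemes.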
3.4 Decidability of Strong Slowdown Retiming Equivalence
PROPOSITION 3.4.1 The relations $\sim_{sl}$, $\sim_r$ and $\sim_s$ are decidable.
PROOF. See Proposition 3.2.1 and Proposition 5.2 [Bartha, 1994]. ∎
THEOREM 3.4.2 Let $F$ and $F'$ be tree-reducible S$\Sigma$-schemes such that $F \sim_{SR} F'$, and assume that $\theta$ is a tree-preserving scheme congruence of $F$. Then $F/\theta \sim_{SR} F'/\theta$, provided that $\theta$ is a scheme congruence of $F'$, too.
PROOF. Since slowdown transformations preserve the congruence $\theta$, Theorem 6.2.5 [Bartha, 1994] directly applies. ∎
THEOREM 3.4.3 The relation of strong slowdown retiming equivalence is decidable for synchronous schemes.
PROOF. Let $G$ and $G'$ be strong slowdown retiming equivalent S$\Sigma$-schemes. By Corollary 3.3.5 there exist some schemes $F$ and $F'$ such that $F \to_s \mathit{utr}(G)$, $F' \to_s \mathit{utr}(G')$ and $F \sim_{SR} F'$. See Figure 3.3a. Thus in the category $\mathrm{TSyn}_\Sigma$ there are morphisms $F \to \mathit{utr}(G)$ and $F' \to \mathit{utr}(G')$, which determine two morphisms $\mathrm{fl}(F) \to \mathrm{fl}(\mathit{utr}(G))$ and $\mathrm{fl}(F') \to \mathrm{fl}(\mathit{utr}(G'))$ in $\mathrm{TFl}_\Sigma$. Let $\phi$ and $\phi'$ denote the scheme congruences of $\mathrm{fl}(F)$ induced by these two morphisms.
Figure 3.3: The proof of Theorem 3.4.3 in a diagram.
Now construct the product of $\mathrm{fl}(\mathit{utr}(G))$ and $\mathrm{fl}(\mathit{utr}(G'))$ as a tree-reducible F$\Sigma$-scheme $H$. Then there exists a morphism $\mathrm{fl}(F) \to H$ that makes the diagram of Figure 3.3b commute. For the scheme congruence $\theta$ induced by this morphism, we thus have $\theta \subseteq \phi$ and $\theta \subseteq \phi'$. On the other hand, $\phi$ and $\phi'$ are also S$\Sigma$-scheme congruences of $F$ and $F'$, respectively, for which $F/\phi = \mathit{utr}(G)$ and $F'/\phi' = \mathit{utr}(G')$. It follows that $\theta$, too, is an S$\Sigma$-scheme congruence of both $F$ and $F'$. Theorem 3.4.2 then implies that $H = F/\theta \sim_{SR} F'/\theta = H'$.
According to the argument above, one can decide the strong slowdown retiming equivalence of $G$ and $G'$ by the following algorithm.
Algorithm B.
Step 1. See if $\mathrm{fl}(G) \sim_s \mathrm{fl}(G')$. If not, then $G$ and $G'$ are not strong slowdown retiming equivalent. Otherwise go to Step 2.
Step 2. Construct schemes $H$ and $H'$, which are the unfoldings of $G$ and $G'$ to the extent determined by the product of $\mathrm{fl}(\mathit{utr}(G))$ and $\mathrm{fl}(\mathit{utr}(G'))$ in $\mathrm{TFl}_\Sigma$, and test whether $H$ and $H'$ are slowdown retiming equivalent.
The schemes $G$ and $G'$ are strong slowdown retiming equivalent if and only if the result of the test performed in Step 2 of Algorithm B is positive. ∎
EXAMPLE 3.2 Consider the schemes in Fig. 3.4.
Figure 3.4
Since $\mathrm{fl}(S_1) \sim_s \mathrm{fl}(S_2)$ we construct the product $H$ of $\mathrm{fl}(\mathit{utr}(S_1))$ and $\mathrm{fl}(\mathit{utr}(S_2))$ as follows:
[1] The set of vertices $V(H) = V(S_1) \times V(S_2)$ with the restriction that $(u, v) \in V(H)$ if and only if $T(S_1, u) = T(S_2, v)$, for some $u \in V(S_1)$ and $v \in V(S_2)$. This restriction implies that $u$ and $v$ have the same label in $S_1$ and $S_2$ respectively. This common label becomes the label of vertex $(u, v)$ in the product scheme.
[2] The entry (exit) vertices of $H$ are those pairs consisting of an entry (exit) vertex in $S_1$ and the corresponding vertex in $S_2$.
[4] Make the scheme $H$ accessible by deleting non-accessible vertices.
Observe that $S_1 = \mathit{utr}(S_1)$ and $S_2 = \mathit{utr}(S_2)$. We have
Figure 3.5: Construction of the product of $\mathrm{fl}(\mathit{utr}(S_1))$ and $\mathrm{fl}(\mathit{utr}(S_2))$.
That is
Figure 3.6: Scheme $H$ as a product of $\mathrm{fl}(\mathit{utr}(S_1))$ and $\mathrm{fl}(\mathit{utr}(S_2))$.
Now we construct schemes $H_1$ and $H_2$, which are the unfoldings of $S_1$ and $S_2$ to the extent determined by the product scheme $H$.
Figure 3.7: Schemes $H_1$ and $H_2$ are slowdown retiming equivalent.
It is trivial to verify that the slowdown constants are $c_1 = 2$ and $c_2 = 1$. Hence $c_1 H_1 \sim_r c_2 H_2$.
Figure 3.8: Schemes $c_1 H_1$ and $c_2 H_2$ are retiming equivalent.
Therefore, schemes $S_1$ and $S_2$ are strong slowdown retiming equivalent.
Let us briefly discuss the complexity of Algorithm B. The product scheme $H$ can be constructed in $O(|V|^2)$. In order to construct schemes $H_1$ and $H_2$ one has to insert $\nabla$ nodes into the product scheme $H$ with reversed flow. This can be done starting from the exit vertices of $H_1$ and $H_2$ and following the unfolding trees $T(S_1)$ and $T(S_2)$ in $O(|V|)$. At this point one has to decide whether or not $H_1$ and $H_2$ are slowdown retiming equivalent. Algorithm A from Section 3.2, whose complexity is $O(|V|^3)$, applies.
4 Leiserson's Equivalence vs.
Strong Retiming Equivalence
In this section the object of study is the relationship between Leiserson's (intuitive) definition of equivalence of synchronous systems (Definition 2.5) and strong retiming equivalence of synchronous schemes introduced in [Bartha, 1994].
We assume that the initial content of the registers associated with the weights is $\bot$ (undefined datum).
EXAMPLE 4.1 The synchronous systems in Figure 4.1 are equivalent in the sense of Leiserson, that is, $S_1$ and $S_2$ can simulate each other.
Figure 4.1.
The first three pulsations of the synchronous scheme S1 are:
Input: x1
Output: g(⊥, ⊥)
r1 = r3 = g(⊥, ⊥), r2 = f(⊥, x1)
Input: x2
Output: g(f(⊥, x1), g(⊥, ⊥))
r1 = r3 = g(f(⊥, x1), g(⊥, ⊥)), r2 = f(g(⊥, ⊥), x2)
Input: x3
Output: g(f(g(⊥, ⊥), x2), g(f(⊥, x1), g(⊥, ⊥)))
r1 = r3 = g(f(g(⊥, ⊥), x2), g(f(⊥, x1), g(⊥, ⊥))), r2 = f(g(f(⊥, x1), g(⊥, ⊥)), x3)
The first three pulsations of the synchronous scheme S2 are:
Input: x1
Output: ⊥
r2 = r3 = r4 = g(f(⊥, x1), ⊥), r1 = ⊥
Input: x2
Output: g(f(⊥, x1), ⊥)
r2 = r3 = r4 = g(f(⊥, x2), g(f(⊥, x1), ⊥)), r1 = g(f(⊥, x1), ⊥)
Input: x3
Output: g(f(⊥, x2), g(f(⊥, x1), ⊥))
r2 = r3 = r4 = g(f(g(f(⊥, x1), ⊥), x3), g(f(⊥, x2), g(f(⊥, x1), ⊥))),
r1 = g(f(⊥, x2), g(f(⊥, x1), ⊥))
To demonstrate that S2 can simulate S1, let S1 proceed one cycle from its initial
configuration, then set r1 = g(⊥, ⊥) and r2 = r3 = r4 = g(f(⊥, x1), g(⊥, ⊥)) in
S2. From then on, for any sequence of inputs x2, x3, ... scheme S2 exhibits the same
behavior as scheme S1.
Similarly, after the first cycle of S2, define r1 = ⊥, r2 = f(⊥, x1) and r3 = ⊥ in S1.
Then, for any sequence of inputs x2, x3, ... scheme S1 will exhibit the same behavior
as scheme S2.
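The simulation argument of Example 4.1 can be replayed symbolically. The following Python sketch is only an illustration: since Figure 4.1 is not reproduced here, the register update rules below are an assumption inferred from the listed pulsations (S1 outputs g(r2, r3) and loads r1 = r3 with that output and r2 with f(r1, x); S2 outputs the common value of r2 = r3 = r4, loads r1 with it, and loads r2 = r3 = r4 with g(f(r1, x), r4)). The function names are hypothetical.

```python
BOT = "⊥"  # undefined datum

def f(a, b):
    return f"f({a},{b})"

def g(a, b):
    return f"g({a},{b})"

def step_s1(state, x):
    """One pulsation of S1 (update rules inferred from Example 4.1)."""
    r1, r2, r3 = state
    out = g(r2, r3)
    return out, (out, f(r1, x), out)

def step_s2(state, x):
    """One pulsation of S2; r stands for the common value r2 = r3 = r4."""
    r1, r = state
    return r, (r, g(f(r1, x), r))

def run(step, state, inputs):
    """Feed the inputs one per clock tick; return the outputs and final state."""
    outputs = []
    for x in inputs:
        out, state = step(state, x)
        outputs.append(out)
    return outputs, state

# S1 proceeds one cycle; S2 is then configured as described in the text.
_, s1_state = run(step_s1, (BOT, BOT, BOT), ["x1"])
s2_state = (g(BOT, BOT), g(f(BOT, "x1"), g(BOT, BOT)))
later_inputs = ["x2", "x3", "x4", "x5"]
s1_out, _ = run(step_s1, s1_state, later_inputs)
s2_out, _ = run(step_s2, s2_state, later_inputs)
print(s1_out == s2_out)
```

Under these assumed update rules the two output sequences coincide from the second cycle on, which is the behavior the example claims.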
Let Y = {y1, ..., yn, ...} be a fixed set of variables. For a ranked alphabet Σ, T_Σ
will denote the set of finite Σ-trees. If S is any set of variable symbols then T_Σ(S)
denotes the set of Σ-trees over S, that is, T_Σ(S) = T_Σ(S)', where Σ(S)' = Σ(S) is the ranked
alphabet obtained from Σ by adding all the elements of S as symbols of rank 0 to it.
DEFINITION 4.3 A finite state top-down tree transducer M [Engelfriet, 1975] is a
quintuple (Σ, Δ, Q, Q0, R), where
Σ is a ranked alphabet (of input symbols),
Δ is a ranked alphabet (of output symbols),
Q is a finite set of states, such that Q ∩ (Σ ∪ Δ) = ∅,
Q0 is a subset of Q (of designated initial states), and
R is a finite set (of rules) such that R ⊆ (Q × Σ) × T_Δ(Q × Y).
A rule of R will be written in the form α → ρ, where α = (q, σ) with q ∈ Q, σ ∈ Σ_n,
and ρ ∈ T_Δ(Q × Y). In this rule, however, only the variables y1, ..., yn are allowed to
occur at the leaves of tree ρ. To emphasize this restriction, the above rule will rather
be specified as
qσ(y1, ..., yn) → ρ    (1)
Intuitively, the transducer works as follows. It starts processing an input tree t ∈ T_Σ
at its root in any of the designated initial states. Processing a node u labelled by
σ ∈ Σ_n is carried out by first finding a rule of the form (1), then replacing u by the
tree ρ and continuing to process the n subtrees under u in states q1, ..., qn, attaching
them to the leaves of ρ labelled by q1y1, ..., qnyn, respectively. Note that the rules
are allowed to be nondeterministic. The relation on T_Σ × T_Δ induced by M will be
denoted by ℜ(M) [Engelfriet, 1975], i.e., two trees t1 and t2 are related with respect
to ℜ(M) if M maps t1 into t2.
In our transducers we shall allow the input tree to be infinite, which makes the
processing of the tree also infinite, but still well-defined. Moreover, we shall augment
the input alphabet Σ by the variable symbols X = {x1, ..., xn, ...} of rank 0.
DEFINITION 4.4 For a fixed n ∈ N, the finite state top-down transducer T_n is defined
by the following data:
Input and output alphabets are the same: Σ_∇;
Q = [−n, n];
Q0 = {0};
R is the set of rules defined below:
(1) iσ(y1, ..., yk) → ∇^l(σ((i+l)y1, ..., (i+l)yk)) for σ ∈ (Σ_∇)_k, l ≥ 0, i + l ≤ n
(2) i∇(y1) → (i − 1)y1
(3) 0x_i → x_i, for i ∈ N.
EXAMPLE 4.2 The finite state top-down tree transducer T_1 can translate the tree
∇h(∇f∇x1, ∇g∇f∇x2) into ∇∇h(f∇x1, g∇f∇x2) as follows (see also Figure 4.2):
0∇h(∇f∇x1, ∇g∇f∇x2) ⇒ ∇0h(∇f∇x1, ∇g∇f∇x2)    rule (1)
⇒ ∇∇1h(∇f∇x1, ∇g∇f∇x2)    rule (1)
⇒ ∇∇h(1∇f∇x1, 1∇g∇f∇x2)    rule (1)
⇒ ∇∇h(0f∇x1, 0g∇f∇x2)    rules (2),(2)
⇒ ∇∇h(f0∇x1, g0∇f∇x2)    rules (1),(1)
⇒ ∇∇h(f∇0x1, g∇0f∇x2)    rules (1),(1)
⇒ ∇∇h(f∇x1, g∇f0∇x2)    rules (3),(1)
⇒ ∇∇h(f∇x1, g∇f∇0x2)    rule (1)
⇒ ∇∇h(f∇x1, g∇f∇x2)    rule (3)
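The nondeterministic rules of T_n can be explored mechanically on finite trees. The Python sketch below is an illustration under stated assumptions, not the thesis's construction: trees are encoded as (label, children) tuples, leaves whose label starts with "x" are treated as variables, rule (2) is only applied while the state stays inside [−n, n], and the function name `translates` is hypothetical. It decides whether some run of T_n maps a source tree onto a given target tree.

```python
REG = "∇"

def node(label, *children):
    return (label, tuple(children))

def reg(t):
    """Put one register (∇ node) on top of tree t."""
    return node(REG, t)

def is_var(t):
    # Assumption of this sketch: variables are leaves named x1, x2, ...
    return not t[1] and t[0].startswith("x")

def strip_regs(t, count):
    """Remove `count` leading ∇ nodes from t, or return None if impossible."""
    for _ in range(count):
        if t[0] != REG:
            return None
        t = t[1][0]
    return t

def translates(n, state, src, dst):
    """True if T_n, processing src in `state`, can produce exactly dst."""
    if is_var(src):                        # rule (3): only state 0 may read a variable
        return state == 0 and src == dst
    label, kids = src
    if label == REG and state - 1 >= -n:   # rule (2): absorb a register
        if translates(n, state - 1, kids[0], dst):
            return True
    for l in range(0, n - state + 1):      # rule (1): emit l registers, then the symbol
        core = strip_regs(dst, l)
        if core is None:
            break
        if core[0] == label and len(core[1]) == len(kids):
            if all(translates(n, state + l, s, d) for s, d in zip(kids, core[1])):
                return True
    return False

x1, x2 = node("x1"), node("x2")
src = reg(node("h", reg(node("f", reg(x1))), reg(node("g", reg(node("f", reg(x2)))))))
dst = reg(reg(node("h", node("f", reg(x1)), node("g", reg(node("f", reg(x2)))))))
print(translates(1, 0, src, dst))
```

The recursion always descends into a proper subtree of the source, so it terminates on finite trees; regular infinite trees would need a memoised search over the underlying scheme instead.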
If X = {x1, ..., xn, ...} is a set of variable symbols, then T_Σ^∞(X) denotes the set
of (infinite) Σ-trees over X. An infinite tree t ∈ T_Σ^∞(X) is called regular if it has
a finite number of different subtrees. Obviously, t is regular if and only if it can be
obtained as the unfolding of an appropriate FΣ-scheme F, i.e. t = T(F).
Figure 4.2.
DEFINITION 4.5 Two regular infinite trees t1, t2 ∈ T_Σ∇^∞(X) are retiming equivalent,
in notation t1 ~ t2, if there exist SΣ-schemes S1, S2 such that t1 = T(S1), t2 = T(S2)
and S1 ~ S2.
THEOREM 4.6 The relation of retiming equivalence on regular infinite Σ_∇-trees can
be characterized as:
~ = ⋃_{n≥0} ℜ(T_n).
PROOF SKETCH. (⇒) Let t1 = T(S1) and t2 = T(S2) for some strong retiming
equivalent SΣ-schemes S1 and S2. Then S1 and S2 can be unfolded into schemes
S1' and S2' that are already retiming equivalent, that is, there exists a legal retiming
vector R taking S1' into S2'. The number of states [−n, n] of the finite state top-down
tree transducer T_n which takes t1 into t2 is determined by the maximum absolute
value of R(v), i.e., n = max{|R(v)| : v ∈ V(S1')}.
(⇐) Let t1 = T(S1) and t2 = T(S2) for some synchronous schemes S1 and S2, and
let T_n be a finite state top-down tree transducer which maps t1 into t2 with the
following feature: non-∇ nodes are additionally labeled with the state in which they
are processed. We will denote the resulting tree as t1'. It is obvious that transducer
T_n forces the common underlying flowchart scheme structure on both schemes S1 and
S2. Let H denote the product of fl(utr(S1)) and fl(utr(S2)). Construct schemes H1
and H2, which are the unfoldings of S1 and S2 determined by the product scheme H, as
follows: reverse the flow of H and, starting from its exit vertices, insert ∇ nodes into
it following the structure of t1 and t2 respectively. Now, starting from the exit vertices
of H1 and following the structure of t1', label non-∇ nodes in H1 with labels from the
corresponding nodes in t1'. Since T_n maps t1 into t2, the corresponding nodes in H1
and t1' have the same labels and these labels determine the legal retiming vector
which maps H1 into H2. Therefore, S1 and S2 are strong retiming equivalent. ∎
The following is an example where it is not possible to translate one tree into another
using the finite state top-down tree transducer T_n for any n ∈ N. Notice the difference
between the number of ∇ nodes along the corresponding paths.
EXAMPLE 4.3 The input tree is ∇h(∇f∇x1, ∇g∇f∇x2) and the goal output tree is
h(∇f∇x1, ∇g∇f∇x2).
0∇h(∇f∇x1, ∇g∇f∇x2) ⇒ (−1)h(∇f∇x1, ∇g∇f∇x2)    rule (2)
⇒ h((−1)∇f∇x1, (−1)∇g∇f∇x2)    rule (1)
⇒ h(∇(−1)f∇x1, ∇(−1)g∇f∇x2)    rules (1),(1)
⇒ h(∇f(−1)∇x1, ∇g(−1)∇f∇x2)    rules (1),(1)
⇒ h(∇f∇(−1)x1, ∇g∇(−1)f∇x2)    rules (1),(1)
⇒ h(∇f∇(−1)x1, ∇g∇f(−1)∇x2)    crash, rule (1)
⇒ h(∇f∇(−1)x1, ∇g∇f∇(−1)x2)    crash, rule (1)
⇒ h(∇f∇(−1)x1, ∇g∇f∇(−1)x2)    crash
DEFINITION 4.7 We define the finite state top-down tree transducer O_n which takes
as input t ∈ T_Σ∇^∞(X) such that t = T(S) for some SΣ-scheme S, and translates it into
the output of scheme S at the n-th clock tick, assuming an initial configuration with
⊥'s assigned to all registers, as follows:
the input alphabet is Σ_∇(X);
Δ = Σ ∪ {⊥} ∪ {x^i | i ≥ 1, x ∈ X};
Q = {0, 1, ..., n − 1};
Q0 = {n − 1};
R is the set of rules defined below:
(1) iσ(y1, ..., yk) → σ(iy1, ..., iyk) for σ ∈ Σ_k
(2) i∇(y1) → (i − 1)y1 if i ≥ 1
(3) 0∇(y1) → ⊥
(4) ix_j → x_j^i for j ∈ N.
Notice that O_n is deterministic. The variable symbol x_j^i stands for the input arriving
from input channel j in the i-th clock cycle.
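Because O_n is deterministic, its effect on the unfolding of a scheme can be computed by a direct recursion over the scheme graph. The Python sketch below is a hedged illustration: the dictionary encoding, the function name `output_at_tick` and the scheme used as data are assumptions of this sketch (the scheme is a reading of S1 inferred from the pulsations in Example 4.1, since the figure is not reproduced). One further assumption is made explicit in the code: a variable reached in state i is printed with index i + 1, so that the first clock tick shows x1 as in Example 4.1.

```python
REG, BOT = "∇", "⊥"

def output_at_tick(scheme, root, n):
    """Symbolic output at the n-th clock tick (n >= 1), all registers initially ⊥.

    `scheme` maps a vertex name to (label, list of child vertex names); the
    unfolding tree is never built, it is explored lazily through the graph.
    """
    def run(v, i):
        label, kids = scheme[v]
        if label == REG:
            # rules (2) and (3): a register lowers the state or yields ⊥
            return BOT if i == 0 else run(kids[0], i - 1)
        if not kids:
            # rule (4), with the indexing assumption i + 1 described above
            return f"{label}{i + 1}" if label.startswith("x") else label
        # rule (1): copy the symbol and process the children in the same state
        return f"{label}({','.join(run(k, i) for k in kids)})"
    return run(root, n - 1)   # the initial state is n - 1

# Assumed reading of S1: out = g(r2, r3), r2 holds f(r1, x), r1 and r3 hold out.
S1 = {
    "out": ("g", ["r2", "r3"]),
    "r2": (REG, ["f"]),
    "f": ("f", ["r1", "x"]),
    "r1": (REG, ["out"]),
    "r3": (REG, ["out"]),
    "x": ("x", []),
}
for tick in (1, 2, 3):
    print(tick, output_at_tick(S1, "out", tick))
```

The recursion terminates as long as every cycle of the scheme passes through at least one register, since each register strictly lowers the state.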
If the starting configuration c is different, then introduce unary symbols of the
form (∇, p), where p ∈ T_Σ is a finite tree representing the contents of a register
according to c. Modify the above rules (2) and (3) as:
(2') i(∇, p)(y1) → (i − 1)y1 if i ≥ 1
(3') 0(∇, p)(y1) → p
Call this transducer O_n(c), where c is the starting configuration.
Let H_n(t) denote the net output height of an infinite tree t ∈ T_Σ∇^∞(X) in the
n-th step, i.e. the height of O_n(t), and let H_n(c)(t) denote the total output height
of an infinite tree t ∈ T_Σ∇^∞(X) in the n-th step starting in configuration c, i.e. the
height of O_n(c)(t). Note: H_n(c)(t) − H_n(t) ≤ k_c for a fixed bound k_c depending on
configuration c.
LEMMA 4.8 Let S and S' be Leiserson equivalent SΣ-schemes. Then S and S' are
strong retiming equivalent.
PROOF. Recall the definition of Leiserson equivalence (Definition 2.5). Assume, by
way of contradiction, that S and S' are Leiserson equivalent, but S ≁ S'. Then
(1) fl(T(S)) = fl(T(S'))
(2) T(S) ≁ T(S').
Condition (1) is necessary for two schemes to be Leiserson equivalent. For if fl(T(S))
≠ fl(T(S')) then, no matter what the configurations of S and S' are, they will never
exhibit the same behavior, that is, produce the same sequence of outputs for the same
sequence of inputs.
By virtue of Lemma 3.2.2 and Lemma 2.11 (characterization of strong retiming
equivalence), (2) can only happen if
(i) There exists a finite branch in fl(T(S)) leading to a variable x_j such that the
corresponding branches in T(S) and T(S') have a different number of registers along
them; or
(ii) There exists an infinite branch in fl(T(S)) such that the absolute difference of
the number of registers along the corresponding branches in T(S) and T(S') is ∞,
i.e., lim_{n→∞} |H_n(T(S)) − H_n(T(S'))| = ∞.
If (i) is the case, then it is easy to see that the input x_j will always appear in
different clock cycles in the output sequences of S and S'. Therefore the equation
O_n(c)(T(S)) = O_n(c')(T(S'))
will not hold for every n ≥ 0, no matter how the configurations c and c' are chosen.
This contradicts the hypothesis that S and S' are Leiserson equivalent.
In case (ii), according to our hypothesis, there exist configurations c and c' for S
and S' respectively, such that
O_n(c)(T(S)) = O_n(c')(T(S'))
for all n ≥ 0. Therefore H_n(c)(T(S)) = H_n(c')(T(S')). On the other hand, by
assumption we also have:
lim_{n→∞} |H_n(T(S)) − H_n(T(S'))| = ∞.
This is a contradiction since there exists a fixed bound k_c such that H_n(c)(t) − H_n(t) ≤
k_c for all infinite trees t ∈ T_Σ∇^∞(X). ∎
LEMMA 4.9 Let S1 and S2 be strong retiming equivalent SΣ-schemes. Then S1 and
S2 are Leiserson equivalent.
PROOF. According to Lemma 2.11, there exist SΣ-schemes S1' and S2' such that Si and
Si' are strongly equivalent for i = 1, 2, and S1' ~ S2'. By definition, if two SΣ-schemes
are strongly equivalent, then they are Leiserson equivalent. On the other hand,
Lemma 2.1 (Retiming Lemma) assures that S1' and S2' are Leiserson equivalent. ∎
THEOREM 4.10 Two synchronous schemes S1 and S2 are Leiserson equivalent if
and only if they are strong retiming equivalent.
PROOF. Follows directly from Lemmas 4.8 and 4.9. ∎
5 Retiming Identities
5.1 The Algebra of Synchronous Schemes
It was observed in [Elgot and Shepherdson, 1979] that flowchart schemes can be
treated as morphisms in a strict monoidal category [MacLane, 1971] over the set
of objects N = {0, 1, 2, ...}. Arnold and Dauchet [1978, 1979] reformulated these
categories as N × N sorted algebras called magmoids. In a magmoid M, we have an
underlying set M(p, q) corresponding to each pair (p, q) of nonnegative integers, and
the basic operations are the following:
• Composition: maps M(p, q) × M(q, r) into M(p, r) for each triple p, q, r ∈ N,
denoted by ·. See Figure 5.1(a).
• Sum: maps M(p1, q1) × M(p2, q2) into M(p1 + p2, q1 + q2) for every choice of
the nonnegative integers p1, p2, q1, q2, denoted by +. See Figure 5.1(b).
• Feedback: maps M(1 + p, 1 + q) into M(p, q) for each pair (p, q) ∈ N × N, denoted
by ↑. See Figure 5.1(c). The application of ↑ creates triangles (boxes of sort
1 → 1) which represent registers.
Figure 5.1: The interpretation of operations. (a) Composition, f1 · f2 : p → r.
(b) Sum, f1 + f2 : p1 + p2 → q1 + q2. (c) Feedback, ↑f : p → q.
There are two constants in M, 0 and 1, standing for the identity arrows 1_0 and
1_1 respectively. By the strict monoidal property, 1_p (p ≥ 1) then corresponds to
the element Σ_{i=1}^{p} 1 in M(p, p). We use the notation p for Σ_{i=1}^{p} 1, and adopt the
categorical terminology f : p → q to mean that f is an element (morphism) of
sort (p, q) in M. The operations and constants are subject to the obvious identities
M1, ..., M5 below.
The magmoid operations are, however, not sufficient to express even the most
elementary schemes, i.e., mappings. For this reason, some further constants are to
be introduced. Usually the constants π_p^i for all p ∈ N and i ∈ [p] = {1, 2, ..., p}
are chosen. The constant π_p^i : 1 → p represents the mapping [1] → [p] which sends
1 to i. This choice is natural, because the semantics of flowchart schemes is defined
in algebraic theories [Lawvere, 1963], and the constants π_p^i are included in the type
of the corresponding N × N sorted algebras. However, regarding the pure syntax of
schemes only, the choice of the constants π_p^i is not the simplest one. Indeed, every
mapping can be expressed with the help of the transposition x : 2 → 2, the join (or
branch) ε : 2 → 1, and the zero 0_1 : 0 → 1 using the magmoid operations. These
constants are also natural for us, even from the semantic point of view, because we
consider schemes to be logical circuits. In this case the constants x, ε and 0_1 are
interpreted as the simplest switching elements in the circuits, see Figure 5.2.
Figure 5.2: The interpretation of constants.
In accordance with [Bartha, 1987], S denotes the type consisting of the operations
·, + and ↑ and constants 0, 1, x, ε and 0_1, and D is the subtype of S not
containing ↑. This way we have defined the S-algebra Sf(Σ), where Sf(Σ)(p, q) is the
set of all SΣ-schemes of sort p → q over a doubly ranked alphabet Σ. Recall that
Σ = {Σ(p, q) | (p, q) ∈ N × N} where the sets Σ(p, q) are pairwise disjoint. With each
σ(p, q) ∈ Σ(p, q) we associate an atomic SΣ-scheme with p + q + 1 vertices (2p + 2q
ports) shown in Figure 5.3.
Figure 5.3: σ ∈ Σ(p, q) as an atomic scheme.
The following mappings will play an important role in the sequel:
• ε_k : k → 1 is the unique one of its sort.
• w_p(q) : p·q → q. For any p, q ∈ N, w_p(q) takes a number of the form (j − 1)·q + i
(j ∈ [p], i ∈ [q]) to i. See Figure 5.4a.
• κ(n, p) : p·n → n·p is the permutation (sometimes called a perfect shuffle)
which rearranges p blocks of length n into n blocks of length p, i.e., κ(n, p) takes
(j − 1)·n + i (j ∈ [p], i ∈ [n]) to (i − 1)·p + j. See Figure 5.4b.
• β#s. If β : r → r is any permutation and s is a sequence (n1, ..., nr) of
nonnegative integers with n = Σ_{i=1}^{r} n_i, then β#s : n → n is the block by block
performance of β on s, i.e., β#s sends j + Σ_{i=1}^{k} n_i, where j ∈ [n_{k+1}], to the
number y + j, where y is the sum of the numbers n_i such that β(i) < β(k + 1). See
Figure 5.4c.
a) mapping w_2(3) b) mapping κ(3, 2) c) mapping x#(2, 2)
Figure 5.4: Examples of mappings w_p(q), κ(n, p) and β#s.
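The three index formulas above are easy to tabulate. The following Python sketch is an illustration only; the function names `omega`, `kappa` and `block_perm` and the dictionary encoding of mappings (with 1-based indices) are assumptions of this sketch. It computes the three mappings named in Figure 5.4.

```python
def omega(p, q):
    """w_p(q): sends (j - 1) * q + i to i, for j in [p] and i in [q]."""
    return {(j - 1) * q + i: i
            for j in range(1, p + 1) for i in range(1, q + 1)}

def kappa(n, p):
    """Perfect shuffle: sends (j - 1) * n + i to (i - 1) * p + j."""
    return {(j - 1) * n + i: (i - 1) * p + j
            for j in range(1, p + 1) for i in range(1, n + 1)}

def block_perm(beta, s):
    """beta#s: move the blocks of sizes s = (n_1, ..., n_r) as beta dictates.

    `beta` is a dict on {1, ..., r}; an element j of block b is sent to y + j,
    where y sums the sizes of all blocks that beta places before block b.
    """
    result, offset = {}, 0
    for b, size in enumerate(s, start=1):
        y = sum(s[i - 1] for i in beta if beta[i] < beta[b])
        for j in range(1, size + 1):
            result[offset + j] = y + j
        offset += size
    return result

print(omega(2, 3))                         # w_2(3)
print(kappa(3, 2))                         # κ(3, 2)
print(block_perm({1: 2, 2: 1}, (2, 2)))    # x#(2, 2), x being the transposition
```

In this encoding x#(2, 2) swaps the two blocks of length two, and κ(3, 2) is a bijection on {1, ..., 6}, in line with the descriptions in the list above.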
5.2 Equational Axiomatization of Synchronous Schemes
The syntactical and semantical features of synchronous systems can be conveniently
separated. The syntax is specified by a synchronous scheme. The semantics is then
specified by an algebra, which associates a fixed operation with each operation symbol.
The set of identities SF has been developed in [Bartha, 1987]. In this section we
augment SF with a new axiom R, intended to capture the retiming equivalence of
synchronous schemes, and develop the system of identities FrT to serve as a basis of
identities of feedback theories being the semantics of synchronous schemes. The first
set of identities towards the axiomatization of schemes is MG:
1. MG = {M1, ..., M5} is the set of magmoid identities, where
M1: f · (g · h) = (f · g) · h if f : p → q, g : q → r, h : r → s;
M2: f + (g + h) = (f + g) + h if f : p1 → q1, g : p2 → q2, h : p3 → q3;
M3: p · f = f · q if f : p → q;
M4: f + 0 = 0 + f = f if f : p → q;
M5: (f1 · g1) + (f2 · g2) = (f1 + f2) · (g1 + g2) if fi : pi → qi, gi : qi → ri, i = 1, 2.
2. DF = MG ∪ {P, D1, D2, D3}, where
P: f1 + f2 = x#(p1, p2) · (f2 + f1) · x#(q2, q1) if fi : pi → qi, i = 1, 2.
P is the block permutation axiom introduced by Elgot and Shepherdson [1980].
This axiom postulates a symmetry [MacLane, 1971] for the strict monoidal category
determined by the axioms MG.
D1: (ε + 1) · ε = (1 + ε) · ε;
D2: x · ε = ε;
D3: (1 + 0_1) · ε = 1.
3. SF = DF ∪ {S1, S2, ..., S9}, where
S1: ↑(f1 + f2) = ↑f1 + f2 if f1 : 1 + p1 → 1 + q1, f2 : p2 → q2;
S2: ↑^2((x + p) · f) = ↑^2(f · (x + q)) if f : 2 + p → 2 + q;
S3: ↑(f · (1 + g)) = (↑f) · g if f : 1 + p → 1 + q, g : q → r;
S4: ↑((1 + g) · f) = g · ↑f if f : 1 + q → 1 + r, g : p → q;
S5: ↑1 = 0;
S6: ε · ⊥ = ⊥ + ⊥, where ⊥ = ↑ε;
S7: ↑(f · (ε + q)) = ↑^2((ε + p) · f) if f : 1 + p → 2 + q;
S8: 0_1 · ∇ = 0_1, where ∇ = ↑x;
S9: ↑(ε · ∇^n) = ⊥ for all n ∈ N, where ∇^n denotes the n-fold composite of ∇.
4. RF = SF ∪ {R}, where
R: ↑^{p1}(f · (g + q2)) = ↑^{q1}((g + p2) · f) if f : p1 + p2 → q1 + q2 and g : q1 → p1.
For the interpretation of axiom R see Figure 5.5.
Figure 5.5: Retiming identity.
CLAIM: The following identity is provable from RF (see also Figure 5.6):
R.:~V'f=f tv for f:p-+q
PROOF. tv ';! l'(f.(p+q)) J; l'(f·z#(p,q))
,.,
~ !,(z#(q,p)·f) J; !'((q+p)'f)
~ I·tV •
Figure 5.6: Proof of identity R* in a diagram.
Note, however, that identity R* alone is not sufficient to capture the retiming
equivalence of synchronous schemes. Consider S1 = ↑(ε · f · g) and S2 = ↑((g + 1) · ε · f).
See Figure 5.7.
Figure 5.7.
Schemes S1 and S2 obviously exhibit the same behavior, yet the equation S1 = S2 is not
provable from SF ∪ R*. The only axiom that interchanges the composition is S7,
which is not applicable in this case. On the other hand, ↑(ε · f · g) = ↑((g + 1) · ε · f)
follows directly from R.
Let Q be a type of N × N sorted algebras and Σ be a doubly ranked alphabet. If E
is a set of Q-identities, then we denote by K_Q(E) the variety of all Q-algebras in which
the identities E are valid. If A is a Q-algebra, then Θ_A(E), or simply Θ(E), denotes
the congruence relation of A induced by E, i.e., the smallest congruence relation for
which A/Θ(E) (the quotient of A by Θ(E)) becomes an algebra in K_Q(E).
THEOREM 5.2.1 The congruence relation Θ(R) induced by axiom R in the algebra
Sf(Σ) is the retiming equivalence relation of synchronous schemes.
PROOF. As retiming equivalence is the smallest equivalence containing the primitive
retiming relation (retiming one box only), and Θ(R) is also an equivalence, it is
sufficient to show that if an SΣ-scheme S' is obtained from S via one primitive retiming
step, then R ⊢ S = S' in the algebra Sf(Σ).
Let S, S' : p → q be SΣ-schemes such that S' is obtained from S by retiming a
single box. S can be represented as ↑^{q1}((g + p) · F), with F : p1 + p → q1 + q representing the
"surroundings" and g : q1 → p1 representing the single box. Then S' = ↑^{p1}(F · (g + q))
follows from S by a single application of axiom R. See also Figure 5.8. ∎
Figure 5.8: Congruence Θ(R) as the retiming equivalence relation.
THEOREM 5.2.2 [Bartha, 1987] Sf(Σ) is freely generated by Σ in K_S(SF).
THEOREM 5.2.3 Sf(Σ)/Θ(R) is freely generated by Σ in K_S(RF).
PROOF. It is well known that if a free algebra over an equational class of algebras exists
then it is isomorphic to a quotient algebra of terms, where the quotient is taken with
respect to the congruence induced by the set of axioms (equations). Therefore:
Sf(Σ) ≅ T-SΣ/Θ(SF)
where T-SΣ denotes the term algebra over Σ and Θ(SF) denotes the congruence
relation induced by the set of axioms SF. Let Θ(R) denote the congruence relation
induced by the retiming axiom R. Then, by the second isomorphism theorem:
Sf(Σ)/Θ(R) ≅ (T-SΣ/Θ(SF))/Θ(R) ≅ T-SΣ/Θ(SF ∪ {R}). ∎
In our axiomatic treatment, algebraic theories can be introduced with the help of the
identities TH = {T1, T2}, where
T1: 0_1 · f = 0_q for f : 1 → q
T2, w,(p)· f~ (t,f) .w,(q) [0' f' P'" q
We define the identity R1 as follows:
R1: ↑^{p1}(f · (g + q2)) = ↑((g + p2) · f) if f : p1 + p2 → 1 + q2 and g : 1 → p1.
THEOREM 5.2.4 In the presence of the theory axioms TH it is sufficient to consider
axiom R1 rather than R.
PROOF. We have to prove that axiom R
is a consequence of SF ∪ TH ∪ {R1}. The proof is an induction argument on q1. If
q1 = 1 then axiom R is of the form
↑^{p1}(f · (g + q2)) = ↑^{1}((g + p2) · f) = ↑((g + p2) · f),
that is, R reduces to R1. Now assume that q1 ≥ 1 and that the theorem is true for
q1 = n. Then for q1 = n + 1 we have
t"U (9+"'))
~ f'1(J ((1 + --+0\ +- -+0. + --+ 1) -wfl(Pd-g+lh))
~ f"(J-((I+ --+Ot+ --+01 + ·-+1) Eg-{w,,(PI)+/h)))
i::t
S;~IG t"l"((w,,(pd+P2)-j-((I+ --+Ot)-g+ --+(Ot+···+1)-g)+/h))
i~ i"'(i"(((l+ --+O\+Ot)-g+ --+(Ot+ ·-+l+Otl
9+p,)·W,,(p,j·j)·(n+(0,+ ,,+0,+1)'9+"'))
iPI(i"(.c#{(l+ --+Ot+Ot)-g+ -·+(0\+- -+1+0d g+p:!)
w,,(p,j·f)«#(O,+ "+0,+1)'9+")+"'))
~ i"{:r#((O\+ --+Ot+1)-g+(I+ --+Ot+Od g+ .. +
1:, t"(((I+ ··+0,) 9+ ··+(Ot + ,,·+1)'9)+1'2)' W,.(PI)· f)
~ t"((I+· -+Ot + .-+01 + ,,+ 1)' w,,(P,)· 9 + 1'2). J)
~ t"«(g+p,).J).
See also Figure 5.9.
Figure 5.9: Proof of Theorem 5.2.4 in a diagram.
Concerning feedback theories, we introduce the well-known commutative identity
[Ésik, 1980] in the following alternative way:
C:w.(p)·f'!=f'(f'lp" .. ,p,j)·w.(q) if !:I+p .... l+q,
for all n ∈ N under every choice of mappings ρ1, ..., ρl : n → n, where
!. Ip" .. ,,,) = Q(I,n,p)-'· (to!) ·Q(I,n,q)· (t" +n. q)
and a{l, n, m) = (1'1:(2, n)#(l, m)") . (11:(1, n) + n m). See Figure 5.10 for an instance
of C in the case n = 3, l = 2 and p = q = 1.
Figure 5.10: The axiom C for n = 3, l = 2 and p = q = 1.
DEFINITION 5.2.5 We define the strong retiming feedback theory FrT as
FrT = SF ∪ THC ∪ {R1}
where THC = TH ∪ C.
COROLLARY 5.2.6 Strong retiming equivalence of synchronous schemes can be
characterized as a congruence relation Θ(FrT) induced on the set of SΣ-schemes by
the axiom set FrT.
PROOF. Follows immediately from Theorem 5.2.1 and Definition 5.2.5. ∎
COROLLARY 5.2.7 The free algebra in K_S(FrT) generated by Σ has a characterization
by equivalence classes of infinite Σ_∇-trees according to their retiming equivalence.
PROOF. Follows immediately from the fact that the free algebra in K_S(FT), where
FT = SF ∪ THC is a feedback theory, generated by Σ has a characterization by
equivalence classes of infinite Σ_∇-trees, and from Theorem 4.6. ∎
6 The Algebra of Multiclocked Schemes
In this chapter we study the general case of multiclocked synchronous schemes. The
motivation comes from the synchronous dataflow programming language LUSTRE
[Halbwachs, Caspi, Raymond and Pilaud, 1991], proposed as a tool for programming
reactive systems as well as for describing hardware and program verification.
6.1 The LUSTRE Programming Language
Reactive systems have been defined as computing systems which continuously interact
with a given physical environment, when this environment is unable to synchronize
logically with the system. This class of systems has been proposed [Harel and Pnueli,
1985; Berry, 1989] to distinguish them from transformational systems (i.e., classical
programs whose data are available at their beginning and which provide results
when terminating) and from interactive systems, which interact continuously with
environments that possess synchronization capabilities. The dataflow aspect of
LUSTRE makes it very close to the usual description tools in these domains (block-diagrams,
networks of operators, dynamical sample-systems, ...), and its synchronous
interpretation makes it well suited for handling time in programs.
In LUSTRE, any constant, variable and expression denotes a flow, i.e., a pair
made of a possibly infinite sequence of values and a clock, representing a sequence
of times. A flow takes the n-th value of its sequence of values at the n-th clock tick.
A LUSTRE program describes a network of operators controlled by a global (basic)
clock. When executing, this network receives, at each clock tick, a set of inputs
and calculates the set of outputs. The language is based on the perfect synchrony
hypothesis, which means that all computations or communications take no time and
that the net is supposed to react instantaneously and to produce its outputs at the
same time it receives its inputs. Other, slower clocks can be defined in terms of
boolean flows. The clock defined by a boolean flow is the sequence of times at which
the flow takes the value true. For example, Table 6.1 shows the time scales defined
by the flow C whose clock is the basic clock, flow C1 whose clock is defined by C and
flow C2 whose clock is defined by C1.
basic time scale   1      2      3      4      5      6      7      8
C flow             false  true   true   false  false  true   false  true
C time scale              1      2                    3             4
C1 flow                   false  true                 true          true
C1 time scale                    1                    2             3
C2 flow                          true                 false         true
C2 time scale                    1                                  2
Table 6.1: Boolean clocks and flows.
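The nesting of clocks in Table 6.1 can be recomputed from the boolean flows alone. The short Python sketch below is illustrative; the helper name `ticks` is hypothetical, and the flow values are the ones read from the table. Each clock is represented by the list of basic-clock instants at which it ticks.

```python
def ticks(parent_ticks, flow):
    """Instants (of the basic clock) at which the clock defined by `flow` ticks.

    `flow` carries one boolean per tick of its own clock `parent_ticks`.
    """
    return [t for t, value in zip(parent_ticks, flow) if value]

basic = list(range(1, 9))
C = [False, True, True, False, False, True, False, True]   # on the basic clock
C1 = [False, True, True, True]                              # on the clock defined by C
C2 = [True, False, True]                                    # on the clock defined by C1

c_ticks = ticks(basic, C)
c1_ticks = ticks(c_ticks, C1)
c2_ticks = ticks(c1_ticks, C2)
print(c_ticks, c1_ticks, c2_ticks)
```

Each derived clock is a subsequence of the one it is defined on, which is why slower clocks can only be obtained by sampling faster ones.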
LUSTRE has only a few elementary basic types (boolean, integer, real) and one type
constructor: tuple. Complex types can be imported from a host language and handled
as abstract types. Constants are those of the basic types and those imported from the
host language. The corresponding flows have constant sequences of values and their clock
is the basic one. Variables must be declared with their types, and variables which do
not correspond to inputs should be given one and only one definition, in the form of
equations (expressions). The equation "X = E;" defines variable X as being identical
to expression E in the sense that E denotes a flow e1, e2, e3, ... of values of the same
type as X, and x_i = e_i for all i ≥ 1, where x1, x2, x3, ... denotes the flow X with the
same clock as E.
The usual operators over basic types are available (arithmetic: +, -, *, /, div, mod;
boolean: not, and, or; relational: =, <, <=, >, >=; conditional: if then else) and
functions can be imported from the host language. These are called data operators
and only operate on operands sharing the same clock.
What follows is the description of the context-free syntax of LUSTRE using a
simple variant of Backus-Naur Form (BNF). Words in italic type style enclosed in
angle brackets are used to denote the syntactic categories, and words or characters in
typewriter type style are used to denote reserved words, delimiters or lexical elements
of the language other than identifiers. ε denotes the empty string.
<LUSTRE_program> ::= <sequence_of_nodes>
<sequence_of_nodes> ::= <node> | <node><sequence_of_nodes>
<node> ::= node <identifier> (<input_decl>) returns (<output_decl>);
           <declaration_sequence>
           let
           <block>
           tel.
<input_decl> ::= <variable_list> : <type> | <variable_list> : <type>; <input_decl> |
                 (<input_decl>) when <variable>B; <variable>B : bool
<output_decl> ::= <input_decl>
<variable_list> ::= <variable> | <variable>, <variable_list>
<type> ::= int | bool | real
<declaration_sequence> ::= ε | <declaration><declaration_sequence>
<declaration> ::= var <variable_list> : <type>;
<block> ::= <command>; | <command>; <block>
<command> ::= <variable> = <expression> | <tuple> = <expression> |
              <assertion>
<expression> ::= <constant> | <variable> | <integer_expr> | <boolean_expr> |
                 <conditional_expr> | <temporal_expr> | <node_call>
<constant> ::= <numeral> | <boolean_constant>
<numeral> ::= <integer> | <real>
<integer> ::= <digit> | <digit><integer>
<real> ::= <integer>.<integer>
<boolean_constant> ::= true | false
<integer_expr> ::= <term> | <integer_expr><arithmetic_op><term>
<term> ::= <numeral> | <variable>
<arithmetic_op> ::= + | - | * | / | div | mod
<boolean_expr> ::= <boolean_term> | not <boolean_expr> |
                   <boolean_expr><boolean_op><boolean_term>
<boolean_term> ::= <boolean_constant> | <variable> | <comparison>
<boolean_op> ::= and | or | xor
<comparison> ::= <integer_expr>1 <relation> <integer_expr>2
<relation> ::= = | <> | < | <= | > | >=
<conditional_expr> ::= if <boolean_expr> then <expression>1 else <expression>2
<temporal_expr> ::= pre <expression> | <expression>1 -> <expression>2 |
                    <expression>1 when <expression>2 | current <expression>
<node_call> ::= <identifier>(<variable_list>)
<assertion> ::= assert <boolean_expr>
<variable> ::= <identifier>
<tuple> ::= <variable_list>
<identifier> ::= <letter> | <identifier><letter> | <identifier><digit>
<digit> ::= 0 | 1 | ... | 9
<letter> ::= a | ... | z | A | ... | Z
LUSTRE's specific operators are the "temporal" operators pre, ->, when and current,
which operate specifically on flows. A flow of values from a data domain D is a pair
(d, τ) where d is a sequence over D and τ = [τ1, ..., τn] is a clock of d. The basic
data domains consist of finite and infinite sequences of integers and boolean values
extended with the value ⊥ to represent the absence of a value, which is treated like
any other value; in particular, it is not smaller than other values in the domain
ordering. The clock element [τ1, ..., τn] represents a clock that ticks as defined by the
simple clock τ1 and has been sampled using the clocks τ2, ..., τn. The last element
of this sequence, τn, is always the basic clock. An element (d, [τ1, ..., τn]) represents
the flow that produces the i-th element of d at the instant when the i-th tick of τ1
appears.
The operator pre is the delay operator. It memorises the last value of a flow and
outputs it when it receives a new value, transforming a sequence e1e2... with clock
τ into the sequence ⊥e1e2... with the same clock.
Table 6.2 shows the behavior of the pre operator in schematic form.
E        e1  e2  e3  e4  e5  e6  e7  e8
pre(E)   ⊥   e1  e2  e3  e4  e5  e6  e7
Table 6.2: The "previous" operator.
The initialization operator -> maps flows E = (e1e2..., τ) and F = (f1f2..., τ)
to the flow (e1f2f3..., τ). The -> operator only gives well-defined output as long as
the input flows have the same clock. Table 6.3 shows the behavior of the -> operator
in schematic form.
E        e1  e2  e3  e4  e5  e6  e7
F        f1  f2  f3  f4  f5  f6  f7
E -> F   e1  f2  f3  f4  f5  f6  f7
Table 6.3: The "followed by" operator.
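On finite prefixes of flows that share a clock, the two operators just described reduce to simple list manipulations. The Python sketch below is only an illustration of the definitions; the function names `pre` and `arrow` and the use of the string "⊥" for the undefined value are assumptions of this sketch, and clocks are left implicit.

```python
BOT = "⊥"  # the undefined value

def pre(e):
    """Delay: ⊥ followed by the flow shifted by one tick (same length prefix)."""
    return [BOT] + list(e[:-1])

def arrow(e, f):
    """Initialization E -> F: first value from e, all later values from f."""
    if len(e) != len(f):
        raise ValueError("both prefixes must cover the same clock ticks")
    return list(e[:1]) + list(f[1:])

E = ["e1", "e2", "e3", "e4"]
F = ["f1", "f2", "f3", "f4"]
print(pre(E))
print(arrow(E, F))
print(arrow(E, pre(F)))   # a typical idiom: initial value, then the delayed flow
```

Combining the two as in the last line removes the ⊥ that pre introduces at the first tick, which is the usual reason the operators appear together.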
The expression E when B samples values from E when B is true. Here E and B
must be on the same clock and B must be a boolean flow. The clock of the flow
defined by E when B consists of those instants when B is true. Formally, if E = (e, τ)
and B = (b, τ), where e = e1e2... and b = b1b2..., then E when B = (e when b, [b τ]),
where e when b is the sequence e_{i1}e_{i2}... such that the numbers i_j are exactly the ones
in increasing order for which b_{i_j} is true.
The current operator performs up-sampling, or interpolation, of a flow. For
E = (e, [b τ]), current(E) = (cur(e, b), τ'), where cur(e, b) is the sequence e' for
which
e'_i = e_i        if b_i is true
e'_i = e'_{i−1}   if b_i is false
Note that, according to the above recursive definition of e', e'_0 = ⊥, by definition. As
to the sequence τ',
τ' = τ     if τ is not empty
τ' = [b]   if τ is empty
Table 6.4 shows the behavior of the when and current operators in schematic form.
B   false  true  true  false  false  true  false  true
Table 6.4: Sampling and interpolating.
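Sampling and interpolation can be tried out on a finite prefix using the boolean flow B of Table 6.4. The Python sketch below is illustrative: the names `when` and `current` mirror the operators but are plain functions of this sketch, the values e1, ..., e8 are placeholders, and it is assumed that at each instant where b is true the next unread element of the sampled sequence is taken.

```python
BOT = "⊥"

def when(e, b):
    """Keep the values of e at the instants where b is true."""
    return [value for value, flag in zip(e, b) if flag]

def current(e, b):
    """Up-sample e (which lives on the clock b) back to the clock of b.

    The last sampled value is held while b is false; before the first
    sample the result is ⊥.
    """
    samples = iter(e)
    held, out = BOT, []
    for flag in b:
        if flag:
            held = next(samples)
        out.append(held)
    return out

B = [False, True, True, False, False, True, False, True]
E = [f"e{i}" for i in range(1, 9)]
X = when(E, B)
print(X)
print(current(X, B))
```

The round trip current(E when B) is not the identity: values of E at instants where B is false are lost and replaced by the most recent sample.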
A LUSTRE program is a finite sequence of nodes, each of which consists of a declaration of
input/output variables and a set of equations defining the output flows. The following
nodes are the standard example of how to define the basic clock counter (COUNTER) and its
application in defining the regular clock which ticks on every third tick of the basic
clock (REGULAR_CLOCK_3).
node COUNTER(val_init, val_incr: int; reset: bool) returns (n: int);
let
  n = val_init -> if reset then val_init else pre(n) + val_incr;
tel.
node REGULAR_CLOCK_3() returns (clock_3: bool);
var n_3: int;
let
  n_3 = COUNTER(1, 1, pre(n_3) = 3);
  clock_3 = if (n_3 = 1) then true else false;
tel.
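The behavior of the two nodes can be checked by unrolling them tick by tick. The Python sketch below is a hand-written simulation under stated assumptions, not output of a LUSTRE compiler: the function name `regular_clock` and its `period` parameter are hypothetical, the counter is specialised to val_init = 1 and val_incr = 1, and the undefined value of pre(n_3) at the first tick is never inspected because the -> operator selects val_init there.

```python
def regular_clock(period, ticks):
    """Simulate REGULAR_CLOCK_3 generalised to an arbitrary period."""
    clock = []
    prev_n = None                       # pre(n): undefined before the first tick
    for t in range(ticks):
        if t == 0:
            n = 1                       # val_init -> ...
        else:
            reset = (prev_n == period)  # pre(n_3) = 3 in the original node
            n = 1 if reset else prev_n + 1
        clock.append(n == 1)            # clock_3 = (n_3 = 1)
        prev_n = n
    return clock

print(regular_clock(3, 7))
```

The counter cycles through 1, 2, 3, so the resulting boolean flow is true on every third tick of the basic clock, starting with the first.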
6.2 The Algebra of Schemes with Multiple Regular Clocks
In this subsection, motivated by the clock analysis of LUSTRE, we develop the algebra
Sf_g(Σ) of synchronous schemes with multiple regular clocks, i.e., clocks that tick
every first, second, third etc. instant of the basic clock. Arbitrary clocks are
intentionally omitted since the issue becomes technically too complex.
DEFINITION 6.1 The algebra Sf_g(Σ) of generalized synchronous schemes consists of:
• Objects: (S, n) of sort p → q, where S : p → q in Sf(Σ) and n ∈ N.
(S, n) stands for a generalized scheme. Each input signal is repeated n times and
outputs are read in cycles kn + 1 only, where k = 0, 1, 2, ...
• Constants: 1 = (1, 1), x = (x, 1), 0 = (0, 1), ε = (ε, 1), 0_1 = (0_1, 1)
• Operations:
1. Composition: (f, m) · (g, n) = (SLOW_{n'}(f) · SLOW_{m'}(g), lcm(m, n))
if f : p → q, g : q → r, where lcm(m, n) is the least common multiple of m and n
with m' = lcm(m, n)/n, n' = lcm(m, n)/m, and SLOW_c(S) is the c-slow of S.
2. Sum: (f, m) + (g, n) = (SLOW_{n'}(f) + SLOW_{m'}(g), lcm(m, n))
if f : p1 → q1, g : p2 → q2.
if f : 1+ p ~ 1 +q, where t n means feedback with interjecting n registers.
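The clock bookkeeping behind composition and sum can be sketched in a few lines of Python. This is an illustration only: a scheme is reduced here to a bare list of register counts, which is an assumption of the sketch, and the names `lcm`, `slow_factors` and `slow` are hypothetical helpers rather than notation from the text.

```python
from math import gcd


def lcm(m, n):
    """Least common multiple of two positive integers."""
    return m * n // gcd(m, n)


def slow_factors(m, n):
    """Slowdown factor for f, slowdown factor for g, and the common clock
    used when (f, m) and (g, n) are composed or added."""
    common = lcm(m, n)
    return common // m, common // n, common


def slow(register_counts, c):
    """c-slow of a scheme given only by its register counts."""
    return [c * w for w in register_counts]


factor_f, factor_g, common = slow_factors(2, 3)
slowed = slow([1, 0, 2], factor_f)
```

For clocks 2 and 3 the common clock is 6, so the first operand is slowed by 3 and the second by 2; both then behave as schemes of clock 6.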
THEOREM 6.2 The algebra $Sf_\gamma(\Sigma)$ satisfies scheme identities RF.
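PROOF.
M1 $(f, x) \cdot ((g, y) \cdot (h, z)) = ((f, x) \cdot (g, y)) \cdot (h, z)$
if $f : p \to q$, $g : q \to r$, $h : r \to s$ and $x, y, z \in \mathbb{N}$
\[
\begin{aligned}
&(f, x) \cdot ((g, y) \cdot (h, z)) \\
&= (f, x) \cdot (\mathrm{SLOW}_{\mathrm{lcm}(y,z)/y}(g) \cdot \mathrm{SLOW}_{\mathrm{lcm}(y,z)/z}(h),\ \mathrm{lcm}(y, z)) \\
&= (\mathrm{SLOW}_{\mathrm{lcm}(x,\mathrm{lcm}(y,z))/x}(f) \cdot \mathrm{SLOW}_{\mathrm{lcm}(x,\mathrm{lcm}(y,z))/\mathrm{lcm}(y,z)}(\mathrm{SLOW}_{\mathrm{lcm}(y,z)/y}(g) \cdot \mathrm{SLOW}_{\mathrm{lcm}(y,z)/z}(h)),\ \mathrm{lcm}(x, \mathrm{lcm}(y, z))) \\
&= (\mathrm{SLOW}_{\mathrm{lcm}(x,\mathrm{lcm}(y,z))/x}(f) \cdot (\mathrm{SLOW}_{\mathrm{lcm}(x,\mathrm{lcm}(y,z))/y}(g) \cdot \mathrm{SLOW}_{\mathrm{lcm}(x,\mathrm{lcm}(y,z))/z}(h)),\ \mathrm{lcm}(x, \mathrm{lcm}(y, z))) \\
&= (\mathrm{SLOW}_{\mathrm{lcm}(\mathrm{lcm}(x,y),z)/\mathrm{lcm}(x,y)}(\mathrm{SLOW}_{\mathrm{lcm}(x,y)/x}(f) \cdot \mathrm{SLOW}_{\mathrm{lcm}(x,y)/y}(g)) \cdot \mathrm{SLOW}_{\mathrm{lcm}(\mathrm{lcm}(x,y),z)/z}(h),\ \mathrm{lcm}(\mathrm{lcm}(x, y), z)) \\
&= (\mathrm{SLOW}_{\mathrm{lcm}(\mathrm{lcm}(x,y),z)/x}(f) \cdot \mathrm{SLOW}_{\mathrm{lcm}(\mathrm{lcm}(x,y),z)/y}(g),\ \mathrm{lcm}(\mathrm{lcm}(x, y), z)) \cdot (h, z) \\
&= ((f, x) \cdot (g, y)) \cdot (h, z)
\end{aligned}
\]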
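The associativity argument rests on two arithmetic facts: the least common multiple is associative, and slowing twice multiplies the factors, i.e. SLOW_a(SLOW_b(S)) = SLOW_ab(S), which follows from the definition of the c-slow as multiplication of all register counts by c. The Python sketch below merely spot-checks this bookkeeping on small clock values; the function names are hypothetical and the check is not a substitute for the proof.

```python
from itertools import product
from math import gcd


def lcm(a, b):
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)


def m1_factors_agree(x, y, z):
    """Compare the total slowdown applied to g on both sides of M1."""
    right = lcm(x, lcm(y, z))      # clock of (f, x) . ((g, y) . (h, z))
    left = lcm(lcm(x, y), z)       # clock of ((f, x) . (g, y)) . (h, z)
    g_right = (right // lcm(y, z)) * (lcm(y, z) // y)
    g_left = (left // lcm(x, y)) * (lcm(x, y) // y)
    return right == left and g_right == right // y and g_left == left // y


all_ok = all(m1_factors_agree(x, y, z)
             for x, y, z in product(range(1, 13), repeat=3))
```

The same computation, with the sum in place of composition, covers the clock arithmetic of the corresponding identity for the sum.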
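M2 $(f, x) + ((g, y) + (h, z)) = ((f, x) + (g, y)) + (h, z)$
if $f : p_1 \to q_1$, $g : p_2 \to q_2$, $h : p_3 \to q_3$ and $x, y, z \in \mathbb{N}$
\[
\begin{aligned}
&(f, x) + ((g, y) + (h, z)) \\
&= (f, x) + (\mathrm{SLOW}_{\mathrm{lcm}(y,z)/y}(g) + \mathrm{SLOW}_{\mathrm{lcm}(y,z)/z}(h),\ \mathrm{lcm}(y, z)) \\
&= (\mathrm{SLOW}_{\mathrm{lcm}(x,\mathrm{lcm}(y,z))/x}(f) + \mathrm{SLOW}_{\mathrm{lcm}(x,\mathrm{lcm}(y,z))/\mathrm{lcm}(y,z)}(\mathrm{SLOW}_{\mathrm{lcm}(y,z)/y}(g) + \mathrm{SLOW}_{\mathrm{lcm}(y,z)/z}(h)),\ \mathrm{lcm}(x, \mathrm{lcm}(y, z))) \\
&= (\mathrm{SLOW}_{\mathrm{lcm}(x,\mathrm{lcm}(y,z))/x}(f) + (\mathrm{SLOW}_{\mathrm{lcm}(x,\mathrm{lcm}(y,z))/y}(g) + \mathrm{SLOW}_{\mathrm{lcm}(x,\mathrm{lcm}(y,z))/z}(h)),\ \mathrm{lcm}(x, \mathrm{lcm}(y, z))) \\
&= (\mathrm{SLOW}_{\mathrm{lcm}(\mathrm{lcm}(x,y),z)/\mathrm{lcm}(x,y)}(\mathrm{SLOW}_{\mathrm{lcm}(x,y)/x}(f) + \mathrm{SLOW}_{\mathrm{lcm}(x,y)/y}(g)) + \mathrm{SLOW}_{\mathrm{lcm}(\mathrm{lcm}(x,y),z)/z}(h),\ \mathrm{lcm}(\mathrm{lcm}(x, y), z)) \\
&= (\mathrm{SLOW}_{\mathrm{lcm}(\mathrm{lcm}(x,y),z)/x}(f) + \mathrm{SLOW}_{\mathrm{lcm}(\mathrm{lcm}(x,y),z)/y}(g),\ \mathrm{lcm}(\mathrm{lcm}(x, y), z)) + (h, z) \\
&= ((f, x) + (g, y)) + (h, z)
\end{aligned}
\]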
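M3 $(p, 1) \cdot (f, y) = (f, y) \cdot (q, 1)$ if $f : p \to q$ and $y \in \mathbb{N}$
\[
\begin{aligned}
(p, 1) \cdot (f, y) &= (\mathrm{SLOW}_y(p) \cdot f,\ y) \\
&= (p \cdot f,\ y) \\
&= (f \cdot q,\ y) \\
&= (f \cdot \mathrm{SLOW}_y(q),\ y) \\
&= (f, y) \cdot (q, 1)
\end{aligned}
\]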
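M4 $(f, x) + (0, 1) = (0, 1) + (f, x)$ if $f : p \to q$ and $x \in \mathbb{N}$
\[
\begin{aligned}
(f, x) + (0, 1) &= (f + \mathrm{SLOW}_x(0),\ x) \\
&= (f + 0,\ x) \\
&= (0 + f,\ x) \\
&= (\mathrm{SLOW}_x(0) + f,\ x) \\
&= (0, 1) + (f, x)
\end{aligned}
\]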
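M5 $((f_1, x_1) \cdot (g_1, y_1)) + ((f_2, x_2) \cdot (g_2, y_2)) = ((f_1, x_1) + (f_2, x_2)) \cdot ((g_1, y_1) + (g_2, y_2))$
\[
\begin{aligned}
&((f_1, x_1) \cdot (g_1, y_1)) + ((f_2, x_2) \cdot (g_2, y_2)) \\
&= (\mathrm{SLOW}_{\mathrm{lcm}(x_1,y_1)/x_1}(f_1) \cdot \mathrm{SLOW}_{\mathrm{lcm}(x_1,y_1)/y_1}(g_1),\ \mathrm{lcm}(x_1, y_1)) + (\mathrm{SLOW}_{\mathrm{lcm}(x_2,y_2)/x_2}(f_2) \cdot \mathrm{SLOW}_{\mathrm{lcm}(x_2,y_2)/y_2}(g_2),\ \mathrm{lcm}(x_2, y_2)) \\
&= (\mathrm{SLOW}_{\mathrm{lcm}(\mathrm{lcm}(x_1,y_1),\mathrm{lcm}(x_2,y_2))/\mathrm{lcm}(x_1,y_1)}(\mathrm{SLOW}_{\mathrm{lcm}(x_1,y_1)/x_1}(f_1) \cdot \mathrm{SLOW}_{\mathrm{lcm}(x_1,y_1)/y_1}(g_1)) \\
&\qquad + \mathrm{SLOW}_{\mathrm{lcm}(\mathrm{lcm}(x_1,y_1),\mathrm{lcm}(x_2,y_2))/\mathrm{lcm}(x_2,y_2)}(\mathrm{SLOW}_{\mathrm{lcm}(x_2,y_2)/x_2}(f_2) \cdot \mathrm{SLOW}_{\mathrm{lcm}(x_2,y_2)/y_2}(g_2)), \\
&\qquad \mathrm{lcm}(\mathrm{lcm}(x_1, y_1), \mathrm{lcm}(x_2, y_2))) \\
&= ((\mathrm{SLOW}_{\mathrm{lcm}(\mathrm{lcm}(x_1,y_1),\mathrm{lcm}(x_2,y_2))/x_1}(f_1) \cdot \mathrm{SLOW}_{\mathrm{lcm}(\mathrm{lcm}(x_1,y_1),\mathrm{lcm}(x_2,y_2))/y_1}(g_1)) \\
&\qquad + (\mathrm{SLOW}_{\mathrm{lcm}(\mathrm{lcm}(x_1,y_1),\mathrm{lcm}(x_2,y_2))/x_2}(f_2) \cdot \mathrm{SLOW}_{\mathrm{lcm}(\mathrm{lcm}(x_1,y_1),\mathrm{lcm}(x_2,y_2))/y_2}(g_2)), \\
&\qquad \mathrm{lcm}(\mathrm{lcm}(x_1, y_1), \mathrm{lcm}(x_2, y_2))) \\
&= ((\mathrm{SLOW}_{\mathrm{lcm}(\mathrm{lcm}(x_1,x_2),\mathrm{lcm}(y_1,y_2))/x_1}(f_1) + \mathrm{SLOW}_{\mathrm{lcm}(\mathrm{lcm}(x_1,x_2),\mathrm{lcm}(y_1,y_2))/x_2}(f_2)) \\
&\qquad \cdot (\mathrm{SLOW}_{\mathrm{lcm}(\mathrm{lcm}(x_1,x_2),\mathrm{lcm}(y_1,y_2))/y_1}(g_1) + \mathrm{SLOW}_{\mathrm{lcm}(\mathrm{lcm}(x_1,x_2),\mathrm{lcm}(y_1,y_2))/y_2}(g_2)), \\
&\qquad \mathrm{lcm}(\mathrm{lcm}(x_1, x_2), \mathrm{lcm}(y_1, y_2))) \\
&= (\mathrm{SLOW}_{\mathrm{lcm}(\mathrm{lcm}(x_1,x_2),\mathrm{lcm}(y_1,y_2))/\mathrm{lcm}(x_1,x_2)}(\mathrm{SLOW}_{\mathrm{lcm}(x_1,x_2)/x_1}(f_1) + \mathrm{SLOW}_{\mathrm{lcm}(x_1,x_2)/x_2}(f_2)) \\
&\qquad \cdot \mathrm{SLOW}_{\mathrm{lcm}(\mathrm{lcm}(x_1,x_2),\mathrm{lcm}(y_1,y_2))/\mathrm{lcm}(y_1,y_2)}(\mathrm{SLOW}_{\mathrm{lcm}(y_1,y_2)/y_1}(g_1) + \mathrm{SLOW}_{\mathrm{lcm}(y_1,y_2)/y_2}(g_2)), \\
&\qquad \mathrm{lcm}(\mathrm{lcm}(x_1, x_2), \mathrm{lcm}(y_1, y_2))) \\
&= (\mathrm{SLOW}_{\mathrm{lcm}(x_1,x_2)/x_1}(f_1) + \mathrm{SLOW}_{\mathrm{lcm}(x_1,x_2)/x_2}(f_2),\ \mathrm{lcm}(x_1, x_2)) \cdot (\mathrm{SLOW}_{\mathrm{lcm}(y_1,y_2)/y_1}(g_1) + \mathrm{SLOW}_{\mathrm{lcm}(y_1,y_2)/y_2}(g_2),\ \mathrm{lcm}(y_1, y_2)) \\
&= ((f_1, x_1) + (f_2, x_2)) \cdot ((g_1, y_1) + (g_2, y_2))
\end{aligned}
\]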
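P $(f_1, x_1) + (f_2, x_2) = x\#((p_1, 1), (p_2, 1)) \cdot ((f_2, x_2) + (f_1, x_1)) \cdot x\#((q_2, 1), (q_1, 1))$
if $f_i : p_i \to q_i$, $i = 1, 2$ and $x_1, x_2 \in \mathbb{N}$
\[
\begin{aligned}
&(f_1, x_1) + (f_2, x_2) \\
&= (\mathrm{SLOW}_{\mathrm{lcm}(x_1,x_2)/x_1}(f_1) + \mathrm{SLOW}_{\mathrm{lcm}(x_1,x_2)/x_2}(f_2),\ \mathrm{lcm}(x_1, x_2)) \\
&= (x\#(p_1, p_2) \cdot (\mathrm{SLOW}_{\mathrm{lcm}(x_1,x_2)/x_2}(f_2) + \mathrm{SLOW}_{\mathrm{lcm}(x_1,x_2)/x_1}(f_1)) \cdot x\#(q_2, q_1),\ \mathrm{lcm}(x_1, x_2)) \\
&= (x\#((p_1, 1), (p_2, 1)) \cdot (\mathrm{SLOW}_{\mathrm{lcm}(x_1,x_2)/x_2}(f_2) + \mathrm{SLOW}_{\mathrm{lcm}(x_1,x_2)/x_1}(f_1)) \cdot x\#((q_2, 1), (q_1, 1)),\ \mathrm{lcm}(x_1, x_2)) \\
&= x\#((p_1, 1), (p_2, 1)) \cdot ((f_2, x_2) + (f_1, x_1)) \cdot x\#((q_2, 1), (q_1, 1))
\end{aligned}
\]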
D1 $((\varepsilon, 1) + (1, 1)) \cdot (\varepsilon, 1) = ((1, 1) + (\varepsilon, 1)) \cdot (\varepsilon, 1)$
Follows directly from SF and the definition of constants.
D2 $(x, 1) \cdot (\varepsilon, 1) = (\varepsilon, 1)$
Follows directly from SF and the definition of constants.
D3 $((1, 1) + (0_1, 1)) \cdot (\varepsilon, 1) = (1, 1)$
Follows directly from SF and the definition of constants.
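S1 $\uparrow((f_1, x_1) + (f_2, x_2)) = \uparrow(f_1, x_1) + (f_2, x_2)$
if $f_1 : 1 + p_1 \to 1 + q_1$, $f_2 : p_2 \to q_2$ and $x_1, x_2 \in \mathbb{N}$
\[
\begin{aligned}
&\uparrow((f_1, x_1) + (f_2, x_2)) \\
&= \uparrow(\mathrm{SLOW}_{\mathrm{lcm}(x_1,x_2)/x_1}(f_1) + \mathrm{SLOW}_{\mathrm{lcm}(x_1,x_2)/x_2}(f_2),\ \mathrm{lcm}(x_1, x_2)) \\
&= (\uparrow_{\mathrm{lcm}(x_1,x_2)}(\mathrm{SLOW}_{\mathrm{lcm}(x_1,x_2)/x_1}(f_1) + \mathrm{SLOW}_{\mathrm{lcm}(x_1,x_2)/x_2}(f_2)),\ \mathrm{lcm}(x_1, x_2)) \\
&= ((\uparrow_{\mathrm{lcm}(x_1,x_2)} \mathrm{SLOW}_{\mathrm{lcm}(x_1,x_2)/x_1}(f_1)) + \mathrm{SLOW}_{\mathrm{lcm}(x_1,x_2)/x_2}(f_2),\ \mathrm{lcm}(x_1, x_2)) \\
&= (\uparrow_{x_1} f_1,\ x_1) + (f_2, x_2) \\
&= \uparrow(f_1, x_1) + (f_2, x_2)
\end{aligned}
\]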
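S2 $\uparrow^2(((x, 1) + (p, 1)) \cdot (f, c)) = \uparrow^2((f, c) \cdot ((x, 1) + (q, 1)))$
if $f : 2 + p \to 2 + q$ and $c \in \mathbb{N}$
\[
\begin{aligned}
&\uparrow^2(((x, 1) + (p, 1)) \cdot (f, c)) \\
&= \uparrow^2((x + p,\ 1) \cdot (f, c)) \\
&= \uparrow^2((\mathrm{SLOW}_c(x) + \mathrm{SLOW}_c(p)) \cdot f,\ c) \\
&= (\uparrow^2_c((x + p) \cdot f),\ c) \\
&= (\uparrow^2_c(f \cdot (x + q)),\ c) \\
&= \uparrow^2(f \cdot (\mathrm{SLOW}_c(x) + \mathrm{SLOW}_c(q)),\ c) \\
&= \uparrow^2((f, c) \cdot ((x, 1) + (q, 1)))
\end{aligned}
\]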
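S3 $\uparrow((f, x) \cdot ((1, 1) + (g, z))) = \uparrow(f, x) \cdot (g, z)$
if $f : 1 + p \to 1 + q$, $g : q \to r$ and $x, z \in \mathbb{N}$
\[
\begin{aligned}
&\uparrow((f, x) \cdot ((1, 1) + (g, z))) \\
&= \uparrow((f, x) \cdot (\mathrm{SLOW}_z(1) + g,\ z)) \\
&= \uparrow(\mathrm{SLOW}_{\mathrm{lcm}(x,z)/x}(f) \cdot (\mathrm{SLOW}_{\mathrm{lcm}(x,z)}(1) + \mathrm{SLOW}_{\mathrm{lcm}(x,z)/z}(g)),\ \mathrm{lcm}(x, z)) \\
&= (\uparrow_{\mathrm{lcm}(x,z)}(\mathrm{SLOW}_{\mathrm{lcm}(x,z)/x}(f) \cdot (1 + \mathrm{SLOW}_{\mathrm{lcm}(x,z)/z}(g))),\ \mathrm{lcm}(x, z)) \\
&= ((\uparrow_{\mathrm{lcm}(x,z)} \mathrm{SLOW}_{\mathrm{lcm}(x,z)/x}(f)) \cdot \mathrm{SLOW}_{\mathrm{lcm}(x,z)/z}(g),\ \mathrm{lcm}(x, z)) \\
&= (\uparrow_x f,\ x) \cdot (g, z) \\
&= \uparrow(f, x) \cdot (g, z)
\end{aligned}
\]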
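S4 $\uparrow(((1, 1) + (g, y)) \cdot (f, z)) = (g, y) \cdot \uparrow(f, z)$
if $f : 1 + q \to 1 + r$, $g : p \to q$ and $y, z \in \mathbb{N}$
\[
\begin{aligned}
&\uparrow(((1, 1) + (g, y)) \cdot (f, z)) \\
&= \uparrow((\mathrm{SLOW}_y(1) + g,\ y) \cdot (f, z)) \\
&= \uparrow((\mathrm{SLOW}_{\mathrm{lcm}(y,z)}(1) + \mathrm{SLOW}_{\mathrm{lcm}(y,z)/y}(g)) \cdot \mathrm{SLOW}_{\mathrm{lcm}(y,z)/z}(f),\ \mathrm{lcm}(y, z)) \\
&= (\uparrow_{\mathrm{lcm}(y,z)}((1 + \mathrm{SLOW}_{\mathrm{lcm}(y,z)/y}(g)) \cdot \mathrm{SLOW}_{\mathrm{lcm}(y,z)/z}(f)),\ \mathrm{lcm}(y, z)) \\
&= (\mathrm{SLOW}_{\mathrm{lcm}(y,z)/y}(g) \cdot (\uparrow_{\mathrm{lcm}(y,z)} \mathrm{SLOW}_{\mathrm{lcm}(y,z)/z}(f)),\ \mathrm{lcm}(y, z)) \\
&= (g, y) \cdot (\uparrow_z f,\ z) \\
&= (g, y) \cdot \uparrow(f, z)
\end{aligned}
\]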
S5 $\uparrow(1, 1) = (0, 1)$
Follows directly from SF and the definition of constants.
S6 $(\varepsilon, 1) \cdot (\bot, 1) = (\bot, 1) + (\bot, 1)$ where $\bot = \uparrow\varepsilon$
Follows directly from SF and the definition of constants.
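S7 $\uparrow((f, x) \cdot ((\varepsilon, 1) + (q, 1))) = \uparrow^2(((\varepsilon, 1) + (p, 1)) \cdot (f, x))$
if $f : 1 + p \to 2 + q$ and $x \in \mathbb{N}$
\[
\begin{aligned}
&\uparrow((f, x) \cdot ((\varepsilon, 1) + (q, 1))) \\
&= \uparrow((f, x) \cdot (\varepsilon + q,\ 1)) \\
&= \uparrow(f \cdot (\mathrm{SLOW}_x(\varepsilon) + \mathrm{SLOW}_x(q)),\ x) \\
&= \uparrow(f \cdot (\varepsilon + q),\ x) \\
&= (\uparrow_x(f \cdot (\varepsilon + q)),\ x) \\
&= (\uparrow^2_x((\varepsilon + p) \cdot f),\ x) \\
&= \uparrow^2((\mathrm{SLOW}_x(\varepsilon) + \mathrm{SLOW}_x(p)) \cdot f,\ x) \\
&= \uparrow^2(((\varepsilon, 1) + (p, 1)) \cdot (f, x))
\end{aligned}
\]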
S8 $(0_1, 1) \cdot (\nabla, 1) = (0_1, 1)$ where $\nabla = \uparrow x$
Follows directly from SF and the definition of constants.
S9 $\uparrow((\varepsilon, 1) \cdot (\nabla, 1)^n) = (\bot, 1)$
where $(\nabla, 1)^n$ denotes the $n$-fold composite of $(\nabla, 1)$
Follows directly from SF and the definition of constants.
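R $\uparrow^l((f, x) \cdot ((g, y) + (q_1, 1))) = \uparrow^l(((g, y) + (p_2, 1)) \cdot (f, x))$
\[
\begin{aligned}
&\uparrow^l((f, x) \cdot ((g, y) + (q_1, 1))) \\
&= \uparrow^l((f, x) \cdot (g + \mathrm{SLOW}_y(q_1),\ y)) \\
&= \uparrow^l(\mathrm{SLOW}_{\mathrm{lcm}(x,y)/x}(f) \cdot (\mathrm{SLOW}_{\mathrm{lcm}(x,y)/y}(g) + \mathrm{SLOW}_{\mathrm{lcm}(x,y)}(q_1)),\ \mathrm{lcm}(x, y)) \\
&= (\uparrow^l_{\mathrm{lcm}(x,y)}(\mathrm{SLOW}_{\mathrm{lcm}(x,y)/x}(f) \cdot (\mathrm{SLOW}_{\mathrm{lcm}(x,y)/y}(g) + q_1)),\ \mathrm{lcm}(x, y)) \\
&= (\uparrow^l_{\mathrm{lcm}(x,y)}((\mathrm{SLOW}_{\mathrm{lcm}(x,y)/y}(g) + p_2) \cdot \mathrm{SLOW}_{\mathrm{lcm}(x,y)/x}(f)),\ \mathrm{lcm}(x, y)) \\
&= \uparrow^l((\mathrm{SLOW}_{\mathrm{lcm}(x,y)/y}(g) + \mathrm{SLOW}_{\mathrm{lcm}(x,y)}(p_2)) \cdot \mathrm{SLOW}_{\mathrm{lcm}(x,y)/x}(f),\ \mathrm{lcm}(x, y)) \\
&= \uparrow^l(((g, y) + (p_2, 1)) \cdot (f, x)) \qquad \blacksquare
\end{aligned}
\]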
DEFINITION 6.3 We define the L-equivalence of generalized synchronous schemes as
follows. Let $(F, m)$ and $(G, n)$ be generalized synchronous schemes. Suppose that for
every sufficiently old configuration $c$ of $(F, m)$, there exists a configuration $d$ of $(G, n)$
such that when $(F, m)$ is started in configuration $c$ with each input signal repeated $m$
times and $(G, n)$ is started in configuration $d$ with each input signal repeated $n$ times,
the two schemes exhibit the same behavior, i.e., the outputs in cycles $km + 1$ and
$kn + 1$, $k = 0, 1, 2, \ldots$, are the same. Then scheme $(G, n)$ can simulate $(F, m)$. If two
generalized synchronous systems can simulate each other, then they are L-equivalent.
Unfortunately, not all $(S, n)$ schemes are suitable. Consider the following example:
Basic clock      1    2    3    4    5    6    7    8    9    10   11
Input            x1   x1   x2   x2   x3   x3   x4   x4   x5   x5   x6
(∇,2) Output     ⊥         x1        x2        x3        x4        x5
(∇²,2) Output    ⊥         x1        x2        x3        x4        x5
(∇³,2) Output    ⊥         ⊥         x1        x2        x3        x4
Table 6.5: The behavior of (∇,2), (∇²,2) and (∇³,2) during the first eleven pulsations.
Then
$(\nabla, 2) \sim (\nabla, 2)$
$(\nabla, 2) \sim (\nabla^2, 2)$ but
$(\nabla, 2) \cdot (\nabla, 2) = (\nabla^2, 2) \not\sim (\nabla^3, 2) = (\nabla, 2) \cdot (\nabla^2, 2)$
where $\sim$ denotes the L-equivalence relation. In other words, L-equivalence is not
preserved by all $(S, n)$ schemes. For this reason we introduce the following restriction
to $Sf_\gamma(\Sigma)$.
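The rows of Table 6.5 can be reproduced with a small simulation. The Python sketch below models a chain of k registers as a pure delay of k basic clock cycles with undefined initial contents (written None); this modelling choice, the function name `sampled_outputs` and the symbolic inputs are assumptions made for illustration.

```python
def sampled_outputs(registers, n, inputs):
    """Outputs of a chain of `registers` registers when each input is
    repeated n times and outputs are read in cycles 1, n + 1, 2n + 1, ..."""
    stream = [x for x in inputs for _ in range(n)]   # each input repeated n times
    outputs = []
    for cycle in range(1, len(stream) + 1, n):
        source = cycle - registers                   # cycle whose input is visible now
        outputs.append(stream[source - 1] if source >= 1 else None)
    return outputs


xs = ["x1", "x2", "x3", "x4", "x5", "x6"]
one = sampled_outputs(1, 2, xs)
two = sampled_outputs(2, 2, xs)
three = sampled_outputs(3, 2, xs)
```

With n = 2 the one-register and two-register chains cannot be told apart at the sampled cycles, while the three-register chain lags by one further sample, which is the phenomenon the example exploits.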
DEFINITION 6.4 The algebra $\mathcal{S}f_\gamma(\Sigma)$ consists of all $(S, n)$ schemes such that $S$ is
strong retiming equivalent to some appropriate $n$-slow S$\Sigma$-scheme $nS'$.
It is now obvious that $(\nabla, 2) \notin \mathcal{S}f_\gamma(\Sigma)$ since there is no 2-slow scheme which is
Leiserson equivalent to $(\nabla, 2)$. On the other hand, $(\nabla^2, 2) \in \mathcal{S}f_\gamma(\Sigma)$ since $(\nabla^2, 2) \sim
\mathrm{SLOW}_2(\nabla) = \nabla^2$.
THEOREM 6.5 The characteristic function
\[
\chi(S, n) = \begin{cases} 1 & \text{if } (S, n) \in \mathcal{S}f_\gamma(\Sigma) \\ 0 & \text{otherwise} \end{cases}
\]
is a recursive function.
PROOF. (a) Biaccessible schemes. Recall from [Bloom and Tindell, 1979] that a
flowchart scheme is biaccessible if it is accessible and every vertex is the starting
point of some path whose endpoint is an exit vertex. An S$\Sigma$-scheme $S$ is biaccessible
if the F$\Sigma$-scheme $fl(S)$ is such. In other words, $S$ is biaccessible if it is accessible and
every vertex can be reached from some input channel by a directed path.
Let $(S, n)$ be a biaccessible generalized scheme. If $(S, n) \in \mathcal{S}f_\gamma(\Sigma)$ then there
exists an S$\Sigma$-scheme $S'$ such that $S \sim nS'$. According to Theorem 6.1.5 [Bartha, 1994],
$S_{max}$ is strongly equivalent to $nS'_{max}$,
where $S_{max}$ and $S'_{max}$ are S$\Sigma$-schemes such that $R_{max} : S \to S_{max}$, $R'_{max} : nS' \to
nS'_{max}$ and
\[
R_{max}(v) = \min\{w(p) \mid p \text{ is an input path leading to } v\}
\]
is a legal retiming vector. Since $S_{max}$ is strongly equivalent to $nS'_{max}$, $S_{max}$ is an $n$-slow of
some S$\Sigma$-scheme $F$, i.e., $S_{max} = nF$. Therefore, in order to decide whether $(S, n) \in \mathcal{S}f_\gamma(\Sigma)$ or not,
it is sufficient to compute the total weights of all entry-to-exit paths $w(p)$ and directed
cycles $w(z)$ in $S$, and check the ratios $w(p)/n$ and $w(z)/n$. If these ratios are integers then
$(S, n) \in \mathcal{S}f_\gamma(\Sigma)$, otherwise $(S, n) \notin \mathcal{S}f_\gamma(\Sigma)$.
(b) The general case. First of all, observe that ground schemes, i.e., schemes without
input channels, belong to $\mathcal{S}f_\gamma(\Sigma)$. This is indeed true by Theorem 4.6, since in infinite
trees without variables it is always possible to rearrange the registers ($\nabla$-nodes) from
any regular pattern to any other regular pattern by using the transformation "'.
See also Figure 6.1. Hence, given a generalized scheme $(S, n)$, it is sufficient to isolate
ground subschemes and test only the biaccessible part. ∎
Figure 6.1: Ground schemes belong to $\mathcal{S}f_\gamma(\Sigma)$.
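The divisibility test from part (a) of the proof can be sketched as follows. This is an illustration under stated assumptions, not the procedure of the thesis: a scheme is given as a list of weighted edges (source, target, register count), only simple entry-to-exit paths and simple directed cycles are enumerated (every other path weight is a simple path weight plus cycle weights), and the input is assumed to be biaccessible. All function names are hypothetical.

```python
def _adjacency(edges):
    """Map each vertex to its outgoing (target, weight) pairs."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, [])
    return adj


def simple_path_weights(edges, entries, exits):
    """Total weights of all simple paths from an entry to an exit vertex."""
    adj, exits, weights = _adjacency(edges), set(exits), []

    def walk(v, total, seen):
        if v in exits:
            weights.append(total)
        for nxt, w in adj.get(v, []):
            if nxt not in seen:
                walk(nxt, total + w, seen | {nxt})

    for s in entries:
        walk(s, 0, {s})
    return weights


def simple_cycle_weights(edges):
    """Total weights of all simple directed cycles, each found once,
    rooted at its first vertex in insertion order."""
    adj, weights = _adjacency(edges), []
    order = {v: i for i, v in enumerate(adj)}

    def walk(start, v, total, seen):
        for nxt, w in adj[v]:
            if nxt == start:
                weights.append(total + w)
            elif nxt not in seen and order[nxt] > order[start]:
                walk(start, nxt, total + w, seen | {nxt})

    for s in adj:
        walk(s, s, 0, {s})
    return weights


def passes_ratio_test(edges, entries, exits, n):
    """True if every enumerated path and cycle weight is a multiple of n."""
    weights = simple_path_weights(edges, entries, exits) + simple_cycle_weights(edges)
    return all(w % n == 0 for w in weights)


scheme = [("in", "a", 2), ("a", "b", 0), ("b", "a", 4), ("b", "out", 2)]
```

The exhaustive enumeration is exponential in the worst case and is meant only to make the criterion concrete; for example, a single edge carrying one register fails the test for n = 2, in line with the example of (∇,2) above.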
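THEOREM 6.6 The algebra $\mathcal{S}f_\gamma(\Sigma)$ is a subalgebra of $Sf_\gamma(\Sigma)$.
PROOF.
1. Composition
Let $(F, m), (G, n) \in \mathcal{S}f_\gamma(\Sigma)$. Then $(F, m) \sim mF'$ and $(G, n) \sim nG'$ for some
appropriate schemes $F'$ and $G'$. We have:
\[
(F, m) \cdot (G, n) = (\mathrm{SLOW}_{\mathrm{lcm}(m,n)/m}(F) \cdot \mathrm{SLOW}_{\mathrm{lcm}(m,n)/n}(G),\ \mathrm{lcm}(m, n)) \sim \mathrm{SLOW}_{\mathrm{lcm}(m,n)}(F' \cdot G')
\]
Hence $(F, m) \cdot (G, n) \in \mathcal{S}f_\gamma(\Sigma)$.
2. Sum
Let $(F, m), (G, n) \in \mathcal{S}f_\gamma(\Sigma)$. Then $(F, m) \sim mF'$ and $(G, n) \sim nG'$ for some
appropriate schemes $F'$ and $G'$. We have:
\[
(F, m) + (G, n) = (\mathrm{SLOW}_{\mathrm{lcm}(m,n)/m}(F) + \mathrm{SLOW}_{\mathrm{lcm}(m,n)/n}(G),\ \mathrm{lcm}(m, n)) \sim \mathrm{SLOW}_{\mathrm{lcm}(m,n)}(F' + G')
\]
Hence $(F, m) + (G, n) \in \mathcal{S}f_\gamma(\Sigma)$.
3. Feedback
Let $(F, m) \in \mathcal{S}f_\gamma(\Sigma)$. Then $(F, m) \sim mF'$ for some appropriate scheme $F'$. We
have:
\[
\uparrow(F, m) = (\uparrow_m F,\ m) \sim \mathrm{SLOW}_m(\uparrow F')
\]
Hence $\uparrow(F, m) \in \mathcal{S}f_\gamma(\Sigma)$. ∎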
COROLLARY 6.7 $\mathcal{S}f_\gamma(\Sigma)$ satisfies scheme identities RF.
PROOF. Follows directly from Theorem 6.2 and Theorem 6.6. ∎
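DEFINITION 6.8 We define the relation $\theta_L$ on $\mathcal{S}f_\gamma(\Sigma)$ as follows. Let $(F, m), (G, n) \in
\mathcal{S}f_\gamma(\Sigma)$. Then
\[
(F, m) \equiv (G, n)(\theta_L) \iff \mathrm{SLOW}_{\mathrm{lcm}(m,n)/m}(F) \sim \mathrm{SLOW}_{\mathrm{lcm}(m,n)/n}(G)
\]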
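For a very restricted class of schemes the relation can be evaluated by plain arithmetic. The Python sketch below considers only chains of registers with one input and one output, and it assumes that two such chains are strong retiming equivalent exactly when they carry the same number of registers; both the restriction and that assumption belong to the sketch, and the function name is hypothetical.

```python
from math import gcd


def theta_related_chains(j, m, k, n):
    """Compare a chain of j registers with clock m against a chain of
    k registers with clock n after slowing both to the common clock."""
    common = m * n // gcd(m, n)                      # lcm(m, n)
    return j * (common // m) == k * (common // n)    # equal register counts


same = theta_related_chains(2, 2, 3, 3)      # two registers at clock 2 vs three at clock 3
different = theta_related_chains(2, 2, 4, 2)
```

In the first call both operands are slowed to clock 6 and end up with six registers, so they are related; in the second call the register counts 2 and 4 already differ at the shared clock 2.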
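THEOREM 6.9 The relation $\theta_L$ is a congruence relation of $\mathcal{S}f_\gamma(\Sigma)$.
PROOF. (a) $\theta_L$ is an equivalence relation.
1. Reflexivity
Let $(F, m) \in \mathcal{S}f_\gamma(\Sigma)$. Since $F \sim F$ we have $(F, m) \equiv (F, m)(\theta_L)$.
2. Symmetry
Let $(F, m), (G, n) \in \mathcal{S}f_\gamma(\Sigma)$ and $(F, m) \equiv (G, n)(\theta_L)$. Then
\[
\begin{aligned}
&(F, m) \equiv (G, n)(\theta_L) \Rightarrow \\
&\mathrm{SLOW}_{\mathrm{lcm}(m,n)/m}(F) \sim \mathrm{SLOW}_{\mathrm{lcm}(m,n)/n}(G) \Rightarrow \\
&\mathrm{SLOW}_{\mathrm{lcm}(m,n)/n}(G) \sim \mathrm{SLOW}_{\mathrm{lcm}(m,n)/m}(F) \Rightarrow \\
&(G, n) \equiv (F, m)(\theta_L)
\end{aligned}
\]
3. Transitivity
Let $(F, x), (G, y), (H, z) \in \mathcal{S}f_\gamma(\Sigma)$ and $(F, x) \equiv (G, y)(\theta_L)$, $(G, y) \equiv (H, z)(\theta_L)$.
Then
\[
\begin{aligned}
&(F, x) \equiv (G, y)(\theta_L) \text{ and } (G, y) \equiv (H, z)(\theta_L) \Rightarrow \\
&\mathrm{SLOW}_{\mathrm{lcm}(x,y)/x}(F) \sim \mathrm{SLOW}_{\mathrm{lcm}(x,y)/y}(G) \text{ and } \mathrm{SLOW}_{\mathrm{lcm}(y,z)/y}(G) \sim \mathrm{SLOW}_{\mathrm{lcm}(y,z)/z}(H) \Rightarrow \\
&\mathrm{SLOW}_{\mathrm{lcm}(x,y,z)/x}(F) \sim \mathrm{SLOW}_{\mathrm{lcm}(x,y,z)/y}(G) \text{ and } \\
&\mathrm{SLOW}_{\mathrm{lcm}(x,y,z)/y}(G) \sim \mathrm{SLOW}_{\mathrm{lcm}(x,y,z)/z}(H) \Rightarrow \\
&\mathrm{SLOW}_{\mathrm{lcm}(x,y,z)/x}(F) \sim \mathrm{SLOW}_{\mathrm{lcm}(x,y,z)/z}(H) \Rightarrow \\
&(F, x) \equiv (H, z)(\theta_L)
\end{aligned}
\]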
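(b) $\theta_L$ satisfies the substitution property.
1. Composition
Let $(F_1, m_1) \equiv (G_1, n_1)(\theta_L)$ and $(F_2, m_2) \equiv (G_2, n_2)(\theta_L)$. Then
\[
\begin{aligned}
&(F_1, m_1) \cdot (F_2, m_2) = \\
&(\mathrm{SLOW}_{a/m_1}(F_1) \cdot \mathrm{SLOW}_{a/m_2}(F_2),\ a) \equiv_{\theta_L} \\
&(\mathrm{SLOW}_{\mathrm{lcm}(a,b)/m_1}(F_1) \cdot \mathrm{SLOW}_{\mathrm{lcm}(a,b)/m_2}(F_2),\ \mathrm{lcm}(a, b)) = \\
&(\mathrm{SLOW}_{\mathrm{lcm}(a,b)/c}\mathrm{SLOW}_{c/m_1}(F_1) \cdot \mathrm{SLOW}_{\mathrm{lcm}(a,b)/d}\mathrm{SLOW}_{d/m_2}(F_2),\ \mathrm{lcm}(a, b)) \sim \\
&(\mathrm{SLOW}_{\mathrm{lcm}(a,b)/c}\mathrm{SLOW}_{c/n_1}(G_1) \cdot \mathrm{SLOW}_{\mathrm{lcm}(a,b)/d}\mathrm{SLOW}_{d/n_2}(G_2),\ \mathrm{lcm}(a, b)) = \\
&(\mathrm{SLOW}_{\mathrm{lcm}(a,b)/n_1}(G_1) \cdot \mathrm{SLOW}_{\mathrm{lcm}(a,b)/n_2}(G_2),\ \mathrm{lcm}(a, b)) \equiv_{\theta_L} \\
&(\mathrm{SLOW}_{b/n_1}(G_1) \cdot \mathrm{SLOW}_{b/n_2}(G_2),\ \mathrm{lcm}(n_1, n_2)) = (G_1, n_1) \cdot (G_2, n_2)
\end{aligned}
\]
where $a = \mathrm{lcm}(m_1, m_2)$, $b = \mathrm{lcm}(n_1, n_2)$, $c = \mathrm{lcm}(m_1, n_1)$ and $d = \mathrm{lcm}(m_2, n_2)$.
2. Sum
Let $(F_1, m_1) \equiv (G_1, n_1)(\theta_L)$ and $(F_2, m_2) \equiv (G_2, n_2)(\theta_L)$. Then
\[
\begin{aligned}
&(F_1, m_1) + (F_2, m_2) = \\
&(\mathrm{SLOW}_{a/m_1}(F_1) + \mathrm{SLOW}_{a/m_2}(F_2),\ a) \equiv_{\theta_L} \\
&(\mathrm{SLOW}_{\mathrm{lcm}(a,b)/m_1}(F_1) + \mathrm{SLOW}_{\mathrm{lcm}(a,b)/m_2}(F_2),\ \mathrm{lcm}(a, b)) = \\
&(\mathrm{SLOW}_{\mathrm{lcm}(a,b)/c}\mathrm{SLOW}_{c/m_1}(F_1) + \mathrm{SLOW}_{\mathrm{lcm}(a,b)/d}\mathrm{SLOW}_{d/m_2}(F_2),\ \mathrm{lcm}(a, b)) \sim \\
&(\mathrm{SLOW}_{\mathrm{lcm}(a,b)/c}\mathrm{SLOW}_{c/n_1}(G_1) + \mathrm{SLOW}_{\mathrm{lcm}(a,b)/d}\mathrm{SLOW}_{d/n_2}(G_2),\ \mathrm{lcm}(a, b)) = \\
&(\mathrm{SLOW}_{\mathrm{lcm}(a,b)/n_1}(G_1) + \mathrm{SLOW}_{\mathrm{lcm}(a,b)/n_2}(G_2),\ \mathrm{lcm}(a, b)) \equiv_{\theta_L} \\
&(\mathrm{SLOW}_{b/n_1}(G_1) + \mathrm{SLOW}_{b/n_2}(G_2),\ b) = (G_1, n_1) + (G_2, n_2)
\end{aligned}
\]
where $a = \mathrm{lcm}(m_1, m_2)$, $b = \mathrm{lcm}(n_1, n_2)$, $c = \mathrm{lcm}(m_1, n_1)$ and $d = \mathrm{lcm}(m_2, n_2)$.
3. Feedback
Let $(F, m) \equiv (G, n)(\theta_L)$. Then
\[
\uparrow(F, m) = (\uparrow_m F,\ m) \equiv_{\theta_L} \mathrm{SLOW}_{\mathrm{lcm}(m,n)/m}(\uparrow_m F,\ m) \sim
\mathrm{SLOW}_{\mathrm{lcm}(m,n)/n}(\uparrow_n G,\ n) \equiv_{\theta_L} (\uparrow_n G,\ n) = \uparrow(G, n) \qquad \blacksquare
\]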
THEOREM 6.10 Two generalized synchronous schemes $(S_1, n_1)$ and $(S_2, n_2)$ are
L-equivalent if and only if they are $\theta_L$-equivalent.
PROOF. Follows directly from Definition 6.8, Theorem 6.9 and Theorem 4.10. ∎
Conclusion
The notion of a synchronous system allowed the introduction of transformations useful
for the design and optimization of such systems: slowdown and retiming. Retiming is
an important transformation which can be used to optimize clocked circuits by relocating
registers so as to reduce combinational rippling. It has the interesting property that if
two systems can be joined by a series of primitive retiming steps, i.e., shifting one layer
of registers from one side of a functional element to the other, then those two systems
exhibit the same behavior, as proved in [Leiserson and Saxe, 1983a]. Concerning the
slowdown transformation, the main advantage of c-slow circuits, i.e., circuits obtained
from the original circuit by multiplying all the register counts by some positive integer
c, is that they can be retimed to have shorter clock periods than any retimed version of
the original. The slowdown transformation does not preserve the equivalence of schemes
in the strictest sense. A c-slow circuit performs the same computation as the original
circuit, but takes c times as many clock ticks and communicates with the host only on
every cth clock tick. The impact of slowdown on the behavior of synchronous schemes
is the following: not every two synchronous schemes are retiming equivalent. However,
for two synchronous schemes that cannot be directly retimed to each other, there
might be appropriate slowdown transformations such that, after these transformations
are applied, one gets synchronous schemes that are retiming equivalent. A
new relation is obtained by taking the join of the retiming and slowdown relations
and is called slowdown retiming equivalence. One of the contributions of Chapter 3 is
the proof of the fact that the slowdown retiming equivalence relation is decidable for
synchronous schemes. Two synchronous schemes are said to be strongly equivalent
if they exhibit the same behavior under all interpretations, that is, if they can be
unfolded into exactly the same tree. A new equivalence relation can be obtained
as the join of strong and retiming equivalence. In [Bartha, 1994] it was proved that
the strong retiming equivalence relation is decidable for synchronous schemes. The next
major contribution of Chapter 3 is the proof that the strong slowdown retiming
equivalence relation, which is the join of strong, slowdown and retiming equivalence,
is also decidable.
The concept of equivalency of synchronous systems was introduced in [Leiserson
and Saxe, 1983a] in a rather intuitive and informal manner. The most important
contribution of the Thesis is the proof in Chapter 4 that the two notions, Leiserson
equivalence and strong retiming equivalence, coincide. The very same notion of the
equivalency of synchronous schemes has also been characterized in terms of finite
state top-down tree transducers.
The syntax of a synchronous scheme is specified by a directed, labelled, edge-weighted
multigraph. The semantics of a synchronous scheme can then be specified
by the algebraic structures called feedback theories. Synchronous schemes have been
axiomatized equationally in [Bartha, 1987], capturing their strong behavior. The
major contribution of Chapter 5 is the introduction of retiming identities and the
construction of the feedback theory capturing the strong retiming behavior of
synchronous schemes.
The motivation for the results of Chapter 6 stems from two sources: multiphase
clocking (clocking schemes that use more phases and consequently offer more
flexibility in adjusting the relative timings of the functional elements), which has been left as a
further topic in [Leiserson and Saxe, 1983a], and the notion of multiple clocks defined
in terms of boolean-valued flows of the synchronous dataflow programming language
LUSTRE. The major contribution of Chapter 6 is the construction of the general
algebra of multiclocked schemes. For simplicity, only schemes with multiple regular
clocks, i.e., clocks that tick every first, second, third, etc. instant of the basic clock,
have been considered. Arbitrary clocks are intentionally omitted since the issue
becomes too complex technically. Also, the intuitive notion of L-equivalency between
two generalized schemes is introduced and shown to coincide with the formal
characterization of $\theta_L$-equivalency.
References
[1] ARNOLD, A. AND DAUCHET, M. (1978, 1979), Théorie des magmoïdes,
RAIRO Inform. Théor. Appl. 12, 235-257 and 13, 135-154.
[2] BARTHA, M. (1987), An equational axiomatization of systolic systems,
Theoretical Computer Science, 55, 265-289.
[3] BARTHA, M. (1989), Interpretations of synchronous flowchart schemes, in
"Proceedings, 7th Conference on the Fundamentals of Computation Theory, Szeged"
(Edited by J. Csirik, J. Demetrovics and F. Gécseg), Lecture Notes in Computer
Science, 380, 25-34, Springer-Verlag, Berlin.
[4] BARTHA, M. (1992a), Foundations of a theory of synchronous systems,
Theoretical Computer Science, 100, 325-346, Elsevier.
[5] BARTHA, M. (1992b), An algebraic model of synchronous systems, Information
and Computation, 97, 97-131.
[6] BARTHA, M. AND GOMBAS, E. (1994), Strong retiming equivalence of
synchronous systems, Technical Report No. 9404, MUN.
[7] BERRY, G. (1989), Real time programming: Special purpose or general purpose
languages, in IFIP World Computer Congress, San Francisco.
[8] BILSTEIN, J. AND DAMM, W. (1981), Top-down tree-transducers for infinite
trees, in "Proceedings, 6th Colloquium on Trees in Algebra and Programming,
Genoa" (Edited by E. Astesiano and C. Böhm), Lecture Notes in Computer
Science, 112, 97-131, Springer-Verlag, Berlin.
[9] BLOOM, S. L. AND ÉSIK, Z. (1993), Iteration Theories, The Equational Logic
of Iterative Processes, Springer-Verlag, Berlin.
[10] BLOOM, S. L. AND TINDELL, R. (1979), Algebraic and graph theoretic
characterizations of structured flowchart schemes, Theoretical Computer Science, 9,
265-286, Elsevier.
[11] ELGOT, C. C. (1975), Monadic computations and iterative algebraic theories,
in "Logic Colloquium '73, Studies in Logic and the Foundations of
Mathematics" (H. E. Rose and J. C. Shepherdson, Eds.), 175-230, North-Holland,
Amsterdam.
[12] ELGOT, C. C., BLOOM, S. L. AND TINDELL, R. (1978), On the algebraic
structure of rooted trees, Journal of Computer and System Sciences, 16,
228-242.
[13] ELGOT, C. C. AND SHEPHERDSON, J. C. (1979), A semantically meaningful
characterization of reducible flowchart schemes, Theoretical Computer Science,
8(3), 325-357.
[14] ELGOT, C. C. AND SHEPHERDSON, J. C. (1980), An Equational Axiomatization
of the Algebra of Reducible Flowchart Schemes, IBM Research Report RC
8221.
[15] ENGELFRIET, J. (1975), Bottom-up and Top-down Tree Transformations - A
Comparison, Mathematical Systems Theory, 9(3), 198-231.
[16] ÉSIK, Z. (1980), Identities in iterative and rational theories, Computational
Linguistics and Computer Languages, 14, 183-207.
[17] GRÄTZER, G. (1968, 1979), Universal Algebra, Springer-Verlag, Berlin.
[18] HALBWACHS, N., CASPI, P., RAYMOND, P. AND PILAUD, D. (1991), The
synchronous dataflow programming language LUSTRE, Proceedings of the IEEE,
79(9), 1305-1320.
[19] HAREL, D. AND PNUELI, A. (1985), On the development of reactive systems, in
"Logic and Models of Concurrent Systems", NATO, Advanced Study Institute
on Logics and Models for Verification and Specification of Concurrent Systems,
Springer-Verlag.
[20] JENSEN, T. P. (1995), Clock Analysis of Synchronous Dataflow Programs,
in "Proc. of ACM Symposium on Partial Evaluation and Semantics-Based
Program Manipulation", San Diego CA, 156-167.
[21] JOHNSON, D. B. (1975), Finding all the elementary circuits of a directed graph,
SIAM Journal on Computing, 4(1), 77-84.
[22] KUNG, H. T. AND LEISERSON, C. E. (1978), Systolic arrays for VLSI, in
"Sparse Matrix Proceedings", SIAM, Philadelphia, 256-282.
[23] KUNG, S. Y. (1988), VLSI array processors, Prentice Hall, Englewood Cliffs,
N.J.
[24] LAWVERE, F. W. (1963), Functional semantics of algebraic theories, Proc.
Nat. Acad. Sci. U.S.A., 50(5), 869-872.
[25] LEISERSON, C. E. (1982), Area-efficient VLSI Computation, ACM-MIT Press
Doctoral Dissertation Award Series 1, ACM-MIT, New York.
[26] LEISERSON, C. E. AND SAXE, J. B. (1983a), Optimizing Synchronous
Systems, Journal of VLSI and Computer Systems, 1(1), 41-67.
[27] LEISERSON, C. E., ROSE, F. M. AND SAXE, J. B. (1983b), Optimizing
Synchronous Circuitry by Retiming, Proceedings of 3rd Caltech Conference on
VLSI (Edited by Randal Bryant), Computer Science Press, 87-116.
[28] MACLANE, S. (1971), Categories for the Working Mathematician, Springer-Verlag,
Berlin.
[29] MURATA, T. (1977), Circuit theoretic analysis and synthesis of marked graphs,
IEEE Trans. on Circuits and Systems vol. CAS-24 7, 400-405.
[30] WRIGHT, J. B., THATCHER, J. W., WAGNER, E. G. AND GOGUEN, J. A.
(1976), Rational algebraic theories and fixed-point solutions, in "Proceedings,
17th IEEE Symposium on Foundations of Computer Science, Houston, Texas",
147-158.




