A global routing technique for wave-steered design methodology by Funabiki, Nobuo et al.
Okayama University Year 2001
This paper is posted at eScholarship@OUDIR : Okayama University Digital Information
Repository.
http://escholarship.lib.okayama-u.ac.jp/electrical engineering/71
A Global Routing Technique for Wave-Steered Design Methodology 
Nobuo Funabiki¹, Amit Singh², Arindam Mukherjee², Malgorzata Marek-Sadowska²
¹Department of Communication Network Engineering, Okayama University, Japan
²Department of Electrical and Computer Engineering, University of California, Santa Barbara, USA
Abstract 
Wave-Steering is a new circuit design methodology to realize
high-throughput circuits by embedding layout-friendly structures in
silicon. Latches guarantee correct signal arrival times at the inputs
of synthesized modules and maintain the high throughput of operation.
This paper presents a global routing technique for networks of
wave-steered blocks. Latches can be distributed along interconnects.
Their number depends on net topologies and signal ordering at the
inputs of wave-steered blocks. Here, we route nets using Steiner tree
heuristics and determine signal ordering and latch positions on
interconnects. The problem of minimizing the total number of latches
is solved using a SAT formulation. Experimental results on benchmark
circuits show the efficiency of our technique. We achieve on average a
40% latch reduction at minimum latency over un-optimized circuits
operating at 250 MHz in 0.25 µm CMOS technology.
1. Introduction 
Wave-steering is a layout-friendly design methodology to realize
high-throughput circuits by embedding synthesized modules in silicon.
These synthesized modules are variants of Binary Decision Diagrams
(BDDs) [2]. Since the first work in [4], the wave-steering design
methodology has been extended to ASIC structures [5], reprogrammable
FPGAs [6][7], and finite state machines [8][9]. In each instance,
throughput has been improved by as much as 3-4 times over
non-wave-steered circuits. A circuit composed of wave-steered blocks
inherently utilizes latches to guarantee correct arrival times at
their inputs. Latches are used for signal skewing and for
high-frequency operation on the interconnects between blocks. In this
paper we present a global routing technique for wave-steered circuits
that determines routing topologies using rectilinear Steiner trees.
Next we find an optimal assignment of signals to block levels and
latch positions on interconnects for maximal latch reduction.
The ability to guarantee a specified system clock period by
construction is an attractive feature of the wave-steering design
methodology. To operate at a specified clock frequency, we utilize a
variant of pipelining within the blocks and on interconnects. Unlike
classical micro-pipelining schemes, this system does not introduce
logic redundancy.
Our global routing and latch assignment technique is performed in four
steps. In the first step, the rectilinear routing path for each net is
found by using the 1-Steiner tree algorithm [10]. In the second step,
latches are assigned on interconnects to satisfy delay constraints. In
the third step, signal arrival times to logic blocks (LBs) are
evaluated for the minimum achievable latency. At that time the lowest
and highest LB input levels assignable to each signal are calculated.
Finally, in the last step, signal input levels are determined by
formulating this problem as a satisfiability problem (SAT) and
applying an existing SAT solver [11]. To simplify the formulation we
do not consider routing resource constraints, congestion, or circuits
with feedback. We also assume that all LBs are complete binary trees,
which is the case in Field Programmable Gate Arrays (FPGAs) [6][7].
This assumption prevents any LB deformation due to input assignment.
We organize the rest of the paper as follows. Section 2 briefly
introduces the underlying ideas of the Wave-Steering design
methodology. We formulate the problem in Section 3. Section 4
describes the four stages of our global routing and latch assignment
technique. Section 5 shows experimental results on benchmark circuits.
Finally, concluding remarks and future work are provided in Section 6.
2. Wave-Steering design methodology 
2.1 Wave-Steered LBs 
All LBs in a Wave-Steered circuit are variants of a Binary Decision
Diagram (BDD) [2]. A BDD's physical implementation is made of one node
type, the 2:1 multiplexer. We use Pass-Transistor-Logic (PTL) mapped
nodes built of 2 nFETs [1][3]. Because a straightforward PTL-mapped
BDD implementation has too much delay to be acceptable, the
wave-steering methodology introduces a fine pipeline granularity to
drastically decrease the delay. A single level, controlled by one
signal, becomes a pipeline stage. The inputs to an LB are spatially
distributed along the pipeline stages. To allow the co-existence of
multiple data waves corresponding to different input vectors in a
single design, we must skew input signals properly in application
time.
0-7695-1239-9/01 $10.00 © 2001 IEEE
Figure 1: Wave-Steering principle. Panels (a)-(c) show an LB, part of
a skewing flip-flop chain with a basic master-slave skewing cell, and
the a-level, b-level, and c-level clocks.
Figure 1.a shows an LB in the wave-steering methodology, implementing
a function f of 3 signals a, b, c. Each variable must be assigned to a
particular level in the LB according to its arrival time. If several
signals arrive at the same time, dynamic latches are used to skew them
(Figure 1.b) [4][5][6]. Figure 1.a shows that c is assigned to the
lowest level, followed by b and a. Figure 1.b shows a 2-phase clocking
scheme with a ‘φ1 clock’ and its non-overlapping complement, the
‘φ2 clock’. Both clocks have a φ1 phase followed by a φ2 phase. The φ1
clock is “on” during the φ1 phase and “off” during the φ2 phase, and
vice versa. Signals a and c are clocked by the φ1 clock, and b by the
φ2 clock. During the φ1 phase the a-level and c-level are computing
while the b-level is isolating the computations, and during the φ2
phase, the b-level is computing while the other levels are shut off.
With the clocking period denoted by φ, each level of an LB can be
clocked in a single phase of duration φ/2. As a result, since the
inputs are spatially spread along the pipeline stages, multiple data
waves coexist in the LB at the same time. Since each stage is a single
level of multiplexer cells, the pipelining granularity is very small.
The dynamic node output capacitances of the multiplexer cells act as
natural capacitors; therefore no latches are needed at the outputs of
the multiplexer cells to explicitly hold the signal values constant.
In general, if there are n levels (corresponding to n input signals)
in an LB, there will be n pipeline stages, and data waves co-exist
inside this LB.
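As a small illustration of the two-phase scheme described above, the following Python sketch lists which LB levels compute in each clock phase. It assumes that levels are numbered from 1 at the lowest level and that odd levels are driven by the φ1 clock, which matches the a, b, c example; the function name `active_levels` is hypothetical and not taken from the paper.

```python
def active_levels(num_levels, phase):
    """Return the LB levels that compute during the given clock phase.

    Assumption (for illustration only): levels are numbered 1..num_levels
    from the lowest level, odd levels use the phi1 clock, even levels phi2.
    """
    if phase not in (1, 2):
        raise ValueError("phase must be 1 (phi1) or 2 (phi2)")
    wanted_parity = phase % 2  # phase 1 -> odd levels, phase 2 -> even levels
    return [lvl for lvl in range(1, num_levels + 1) if lvl % 2 == wanted_parity]


# Three-level LB of Figure 1.a: c is level 1, b is level 2, a is level 3.
names = {1: "c", 2: "b", 3: "a"}
for phase in (1, 2):
    print(f"phi{phase}:", [names[lvl] for lvl in active_levels(3, phase)])
```

Under these assumptions, the c-level and a-level are active in the φ1 phase and the b-level in the φ2 phase, as in the text.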
2.2 Network of Wave-Steered LBs 
A larger circuit may be built from a network of wave-steered blocks.
An example of a small network is shown in Figure 2. Each LB may have
an arbitrary number of levels. In the wave-steering methodology,
efficient latch assignment to interconnects is an essential task, not
only to guarantee synchronized operation of the circuit at high
frequency, but also to reduce the hardware cost and power consumption
arising from latch requirements.
Figure 2: A network of Wave-Steered LBs (annotated with a long
interconnect, interconnect delay, and logic stage delay)
Depending on the role a latch plays, it is classified as either a hard
latch (interconnect latch) or a soft latch (input skewing latch). Hard
latches fragment interconnects into fixed-delay segments. They must be
assigned such that the signal propagation time between two consecutive
latches on the interconnect is less than half of the user-specified
clock period. We refer to this requirement as the interconnect delay
constraint. Soft latches, on the other hand, are assigned such that an
LB can operate properly without a stall. Since the wave-steering
methodology utilizes a two-phase non-overlapped clocking scheme
[4][5], input signals at two consecutive input levels of an LB have to
arrive half a clock period apart. Thus, soft latches are introduced on
interconnects to satisfy the timing constraints on LB input signals.
The number of soft latches required at an LB input is equal to the
difference between the requested arrival time and the actual arrival
time of the corresponding signal. The actual arrival time is evaluated
by considering the output time from the preceding LB and the number of
hard latches on the interconnect. The requested arrival time is given
by the signal output time from the LB, which is uniquely determined on
the assumption of the minimum latency for the circuit operation.
However, if several sinks of one net are assigned the same number of
soft latches, they can be shared by allocating the latches before the
branching point. The total number of soft latches is uniquely
determined for a given hard latch assignment under the minimum latency
operation. Therefore the input signals to LBs should be assigned to
proper levels such that as many soft latches as possible can be
shared.
Note that in this paper, we assume that the circuit decomposition and
placement are done by other algorithms. The next section formally
defines the global routing problem for the wave-steering design
methodology.
3. Global routing problem formulation 
The LB placement and the interconnect netlist are given as inputs. The
goal of the global routing problem is to find a path for every net
with a hard/soft latch assignment such that no pipeline hazards exist,
and the circuit latency and the total number of assigned latches are
minimized. Here, for simplification, we restrict the routing paths
such that 1) any path is rectilinear, 2) no routing congestion is
considered (no limitation on the number of nets passing through the
same channel), and 3) the total path length is minimized.
Hard latches should be assigned to routing paths or interconnects such
that the interconnect delay between any two consecutive latches is no
more than half of the clock period. Because the signal delay time
usually depends on the number of loading gates [10], for a net driving
k gates, the maximum interconnect length between two consecutive
latches is given by a function of k, A[k].
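The interconnect delay constraint can be checked mechanically once latch positions are known. The sketch below is a minimal illustration for a single unbranched wire; the table `A` and its values are hypothetical stand-ins for A[k], and treating the source and the sink as span end points is an assumption made for this example.

```python
def violates_delay_constraint(latch_positions, wire_length, max_span):
    """Return True if some span between consecutive latches exceeds max_span.

    latch_positions: distances of hard latches from the source.
    max_span: stand-in for A[k], the longest allowed latch-to-latch length.
    Assumption: the source (0) and the sink (wire_length) end the spans.
    """
    points = [0] + sorted(latch_positions) + [wire_length]
    return any(b - a > max_span for a, b in zip(points, points[1:]))


# Hypothetical A[k] values, indexed by the number of driven gates k.
A = {1: 4.0, 2: 3.0}
print(violates_delay_constraint([4.0, 8.0], 10.0, A[1]))  # spans are 4, 4, 2
print(violates_delay_constraint([4.0, 8.0], 10.0, A[2]))  # spans of 4 exceed 3
```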
After the hard latch assignment, we can calculate signal 
arrival times at the LB inputs, the system latency, and the 
upper/lower LB level limits assignable to each input signal 
in topological order from primary inputs to primary outputs. 
Inputs to LBs are assigned such that no two of them share 
the same level and the output is available as soon as possible 
(ASAP).  
The final input signal levels are determined such that the number of
shared soft latches is maximized (Figure 3). This subproblem is solved
by transforming it into a satisfiability problem (SAT).
4. Global routing and assignment 
The global routing and latch assignment technique is divided into 4
stages. Sections 4.1-4.4 discuss these subproblems in detail. We
denote the number of levels, or height, of the j-th LB as h_j and its
output evaluation time as out_j. An input i to the j-th LB has the
arrival time arrival_ij. These notations are used throughout the
remainder of the paper.
4.1 Routing path selection 
To solve this subproblem, we adopt the “1-Steiner algorithm” by Robins
[10]. This algorithm finds a near-optimal rectilinear routing path for
a given set of terminal locations by repeatedly adding one Steiner
point until no further improvement can be made. In our technique, the
routing paths for all nets are found sequentially without considering
congestion possibly caused by the already routed nets.
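The idea of repeatedly adding one Steiner point can be sketched as follows. This is a simplified illustration rather than the implementation used in the paper: it assumes candidate points are taken from the grid formed by the terminals' x and y coordinates, measures tree cost with a Manhattan-distance minimum spanning tree, and omits refinements such as removing Steiner points that later become redundant. The names `mst_cost` and `iterated_one_steiner` are hypothetical.

```python
from itertools import product


def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def mst_cost(points):
    """Cost of a minimum spanning tree (Prim) under the Manhattan metric."""
    pts = list(points)
    if len(pts) < 2:
        return 0
    # best[i] = cheapest connection from point i to the tree built so far
    best = {i: manhattan(pts[0], pts[i]) for i in range(1, len(pts))}
    total = 0
    while best:
        i = min(best, key=best.get)
        total += best.pop(i)
        for j in best:
            best[j] = min(best[j], manhattan(pts[i], pts[j]))
    return total


def iterated_one_steiner(terminals):
    """Greedily add the single grid point that most reduces the MST cost."""
    points = list(terminals)
    xs = sorted({x for x, _ in terminals})
    ys = sorted({y for _, y in terminals})
    steiner = []
    cost = mst_cost(points)
    while True:
        best_gain, best_pt = 0, None
        for cand in product(xs, ys):
            if cand in points:
                continue
            gain = cost - mst_cost(points + [cand])
            if gain > best_gain:
                best_gain, best_pt = gain, cand
        if best_pt is None:  # no candidate improves the tree any further
            return steiner, cost
        points.append(best_pt)
        steiner.append(best_pt)
        cost -= best_gain


print(iterated_one_steiner([(0, 0), (2, 0), (1, 2)]))
```

For the three terminals above, the spanning tree costs 5 and a single added point at (1, 0) lowers the cost to 4.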
4.2 Hard latch assignment 
Hard latches are assigned to each net sequentially. In order to obtain
the minimum latency, hard latches are first assigned to each
interconnect from the source to the sinks by the procedure
forward-assignment. There, after the interconnect segments have been
sorted topologically from the source to the sinks, hard latches are
assigned at the most distant locations from the previous latches. Then
they are shared as much as possible by moving them backward before net
branches by the procedure backward-sharing (Figure 4).
Figure 3: Soft latch sharing
Figure 4: Hard latch backward movement
procedure forward-assignment
1. Initialize the point list on the path as Q = {source}.
2. If Q is null, terminate the procedure.
3. Pick up the first point, z, in Q.
4. Assign (hard) latches on the path section between z and its first
   branch or sink with A[1].
5. If it reaches a sink, go to 2.
6. Assign a latch just before the branch on the path if the distance
   between the last latch in 4. and the branch is larger than A[2].
7. Find the most distant point from z such that the latch at z can
   cover outgoing wires:
   1) Sort branches on outgoing wires from z in ascending order of
      distance from z.
   2) Count the number of outgoing wires if a latch is assigned just
      after the branch node.
   3) Find the largest branch to satisfy distance <= A[fanout].
8. Assign one latch to each outgoing wire with A[fanout].
9. Insert points where latches are assigned into Q and go to 2.

procedure backward-sharing
1. Initialize the point list on the path as Q = {sink-1, sink-2, ...,
   sink-m}.
2. If Q is null, terminate the procedure.
3. Pick the first point z in Q.
4. Starting from z, move latches backward as much as possible until a
   branch is reached.
5. If a branch has latches on all of its k outgoing wires placed just
   after the branch point and each of their neighboring latches in the
   sink direction is located within A[k] from the branch point, merge
   those latches which are just after the branch point and move the
   resulting latch before the branch point.
6. Insert this merged latch position into Q, and go to 2.
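The core of forward-assignment, placing each latch as far as possible from the previous one, can be illustrated on the simplest case of one unbranched wire. The sketch below deliberately leaves out branches, the fanout-dependent bound A[fanout], and backward-sharing; `forward_assign_unbranched` and its parameters are hypothetical names, and `max_span` stands in for A[1].

```python
def forward_assign_unbranched(wire_length, max_span):
    """Greedy latch placement on a wire from source (0) to sink (wire_length).

    Each latch is placed max_span after the previous one, so no span is
    longer than max_span. No latch is added once the remaining distance to
    the sink already fits within max_span.
    """
    positions = []
    last = 0.0
    while wire_length - last > max_span:
        last += max_span
        positions.append(last)
    return positions


print(forward_assign_unbranched(10, 4))  # latches at 4.0 and 8.0
print(forward_assign_unbranched(3, 4))   # short wire, no latch needed
```

Because every latch sits at the farthest admissible position, moving any of them further toward the sink would create a span longer than `max_span`, which is the argument behind Property 1.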
Property 1: forward-assignment provides the hard latch assignment for
the minimum system latency.
Proof: Any hard latch except the last one to a sink covers the longest
interconnect section. Thus, any forward shift of a latch creates
interconnect sections which are not covered by a hard latch, which
violates the delay constraints.
Property 2: backward-sharing may reduce the number of hard latches.
Here, we note that for simplification, backward-sharing only considers
the hard latch reduction under the condition that the total number of
hard latches is not increased on any path from a primary input to a
primary output. If we allow a subset of k hard latches on the outgoing
wires from one branch to be merged into one latch, the number of hard
latches may be further reduced without increasing the system latency.
However, in this case, every time hard latches are merged, the
increase of the system latency must be checked to avoid it. Thus,
backward-sharing does not guarantee the minimality of the total number
of hard latches.
4.3 Signal arrival time evaluation
Knowing the hard latch assignment, we calculate for each net the
signal arrival times at its sinks and the output times from its
source. The nets are sorted topologically from primary inputs to
primary outputs. The signal arrival time at a sink is determined by
adding up the signal output time from the net's origin and the number
of hard latches assigned on the interconnect path. When the arrival
times of all inputs of an LB are calculated, the LB's output time is
determined.
procedure signal-arrival-time
0. Calculate signal arrival times at the net sinks (LB inputs) when
   the net's source arrival time has been already determined.
   Initially, LB inputs which are net sinks originating at primary
   inputs are calculated.
       arrival_ij = out_k + num-of-latch_kj
   where
       arrival_ij: arrival time of the i-th input to the j-th LB
       out_k: output time from the k-th fanin LB
       num-of-latch_kj: number of hard latches on the path from the
       k-th LB to the j-th LB.
   Here, we assume the i-th input to the j-th LB has a source at the
   k-th LB.
1. Calculate signal output times for LBs in which all input arrival
   times are obtained.
   1) Sort arrival times of LB inputs (assume m inputs) in ascending
      order: time_0, time_1, ..., time_{m-1}.
   2) Initialize output time by: out_k = time_0.
   3) Initialize the pointer by p = 0.
   4) If p = m - 1, terminate the procedure.
   5) Update the output time by: out_k = MAX(out_k + 1, time_{p+1}).
   6) Increment p by 1, and go to 4).
Property 3: signal-arrival-time calculates the earliest possible
output time from an LB for given LB input signal arrival times.
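The output-time recurrence of signal-arrival-time is short enough to express directly. The following sketch assumes times are integers counted in latch stages, so that each hard latch adds one unit; the function names are hypothetical.

```python
def arrival_time(source_out, hard_latches_on_path):
    """Arrival time at a sink: source output time plus hard latches passed."""
    return source_out + hard_latches_on_path


def lb_output_time(arrival_times):
    """Earliest output time of an LB for the given input arrival times.

    Inputs are taken in ascending order of arrival. Each later input needs
    its own level, one time unit after the previous one at the earliest.
    """
    times = sorted(arrival_times)
    out = times[0]
    for t in times[1:]:
        out = max(out + 1, t)
    return out


# Two inputs arrive together at 3 and a third at 5.
print(lb_output_time([3, 3, 5]))
# Four inputs arrive simultaneously at 2 and must be spread over four levels.
print(lb_output_time([2, 2, 2, 2]))
```

In both examples the output time is 5: in the first case the late input dominates, in the second the inputs are serialized one level apart.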
Once signal arrival times for all LB inputs are found, we determine
the lowest and the highest permissible levels for each input signal to
a given LB. These bounds enclose all the levels to which a particular
signal can be assigned without increasing
procedure highest-assignable-level
1. Sort input signals to the LB in descending order of low_ij.
2. Initialize highest permissible level, high_ij = h_j.
3. Initialize pointer p = h_j, and the number of signals to be
   assigned in higher levels by num-signal = 0.
4. Update num-signal by adding the number of signals where low_ij = p.
5. If num-signal = high_ij - p + 1 (current assignable space is
   occupied by such signals), update high_ij of signals where
   low_ij <= p by high_ij = p - 1, and initialize num-signal = 0.
6. If p = 2, terminate the procedure, else decrement p by 1 and go
   to 4.
4.4 Input level assignment 
procedure SAT-optimization
1. Compose a SAT instance by including the clauses for the two
   constraints.
2. Sort the sets of objective clauses in descending order of the
   number of reduced soft latches if they are satisfied.
3. If an unprocessed set of objective clauses remains, add the first
   such set into the SAT instance, else terminate the procedure.
4. Solve the SAT instance by a SAT solver.
5. If this SAT instance is not solved, remove the added set of
   objective clauses.
6. Go to 3.
The LB input level assignment subproblem is formulated as a variant of
the satisfiability problem (SAT) [11], and is solved using an existing
SAT solver. The goal of a SAT instance is to find a variable
assignment such that every clause becomes true. A clause is a union
(logical OR) of literals, which are variables or their negations.
First, we define a variable:
    x_{ij}^{k} = 1 : the i-th input of the j-th LB is assigned to the k-th level,
    x_{ij}^{k} = 0 : it is not assigned there.    (2)
We construct clauses to represent the constraints and the objective
function of this subproblem. The first constraint is to assign every
LB input signal to one of its permissible levels. The clause
corresponding to the i-th signal of the j-th LB is described by:
    \sum_{k = low_{ij}}^{high_{ij}} x_{ij}^{k}    (3)
where \sum represents a logical OR function.
The second constraint says that no two signals can be assigned to the
same level. The clause is written in conjunctive normal form (CNF) as:
    \prod_{k} \prod_{i, l \in J, i \ne l} ( \overline{x_{ij}^{k}} + \overline{x_{lj}^{k}} )    (4)
where \prod represents a logical AND operation and J is the set of all
inputs to LB j.
The objective of this subproblem is to maximize the 
number of soft latches which can be shared by different LB 
input signals on the same net. If each of b outgoing signals 
from one branch of a net is assigned p soft latches at LB 
inputs, these p soft latches are moved backward before the 
branch and are shared by all of them. As a result, a total of 
p(b-1) soft latches can be removed. 
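The saving p(b-1) is easy to check numerically. The sketch below assumes a single branch point feeding all sinks and takes p as the smallest soft latch count among them, since p latches can only be moved before the branch if every sink needs at least p; the function name is hypothetical.

```python
def shared_soft_latch_savings(soft_counts):
    """Soft latches removed by sharing before one branch point.

    soft_counts: soft latches required at each of the b sinks of the branch.
    The p = min(soft_counts) latches common to all sinks are kept once
    instead of b times, which removes p * (b - 1) latches.
    """
    b = len(soft_counts)
    p = min(soft_counts)
    return p * (b - 1)


print(shared_soft_latch_savings([2, 3, 2]))  # p = 2, b = 3
```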
Now, we consider the i-th signal to the j-th LB. If it is assigned to
the k-th level, the number of soft latches assigned there is given by:
    soft_{ij}^{k} = out_j - (h_j - k) - arrival_{ij}    (5)
In order to move p soft latches, this number must be p or larger:
    soft_{ij}^{k} \ge p    (6)
By solving these equations, the lowest level with p soft latch
assignment is obtained:
    lowp_{ij} = p - out_j + h_j + arrival_{ij}    (7)
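A worked instance of Equations (5) and (7) may help. The numbers below are made up for illustration, and the function names are hypothetical.

```python
def soft_latches(out_j, h_j, k, arrival_ij):
    """Equation (5): soft latches needed when the signal is put on level k."""
    return out_j - (h_j - k) - arrival_ij


def lowest_level_with(p, out_j, h_j, arrival_ij):
    """Equation (7): lowest level k at which at least p soft latches occur."""
    return p - out_j + h_j + arrival_ij


# Illustrative values: a 4-level LB with output time 10 and an input
# that arrives at time 5.
out_j, h_j, arrival = 10, 4, 5
k_min = lowest_level_with(3, out_j, h_j, arrival)
print(k_min, soft_latches(out_j, h_j, k_min, arrival))
```

Here the lowest level giving p = 3 soft latches is level 2, and one level lower the count drops to 2, so the bound is tight.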
Thus, the set of clauses representing the condition that every sink
signal departing from one branch is assigned p soft latches is
described as follows:
    ( \sum_{k = lowp_{ij}}^{high_{ij}} x_{ij}^{k} ) \wedge ( \sum_{k = lowp_{lm}}^{high_{lm}} x_{lm}^{k} ) \wedge \cdots \wedge ( \sum_{k = lowp_{ts}}^{high_{ts}} x_{ts}^{k} )    (8)
where it is assumed that the i-th signal of the j-th LB, the l-th
signal of the m-th LB, and the t-th signal of the s-th LB come from
the same branch of a net. In the example in Figure 5, the set of
objective clauses for p=1 is given by:
Note that some of these objective clauses may not be satisfiable
simultaneously in conventional designs. In such situations, we adopt a
heuristic approach that selects satisfiable objective clauses
sequentially in descending order of the number of soft latches shared.
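The sequential selection heuristic can be sketched as a loop around any SAT solver. In the example below a brute-force search stands in for the solver of [11], which is only practical for a handful of variables; clauses are lists of signed integers, and all function names are hypothetical.

```python
from itertools import product


def solve(num_vars, clauses):
    """Brute-force SAT check (a stand-in for a real solver).

    Returns a tuple of booleans (index 0 is variable 1) or None.
    """
    for bits in product([False, True], repeat=num_vars):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            return bits
    return None


def greedy_objectives(num_vars, constraints, objective_sets):
    """Add objective clause sets one by one, keeping those that stay satisfiable.

    objective_sets: list of (saved_latches, clauses) pairs.
    Returns (total_saved, satisfying_assignment).
    """
    accepted = list(constraints)
    saved = 0
    # Try the sets that would save the most soft latches first.
    for gain, clauses in sorted(objective_sets, key=lambda s: -s[0]):
        if solve(num_vars, accepted + clauses) is not None:
            accepted += clauses
            saved += gain
    return saved, solve(num_vars, accepted)


constraints = [[1, 2]]
objectives = [(3, [[1]]), (2, [[-1]]), (1, [[2]])]
print(greedy_objectives(2, constraints, objectives))
```

In this toy instance the second objective set conflicts with the first and is dropped, so the accepted sets save 3 + 1 = 4 latches. As in the procedure, a set that has been accepted is never reconsidered.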
Figure 5: Example of soft latch sharing
Property 4: Any feasible level assignment of LB input signals requires
the same number of soft latches if no latch sharing is done.
Proof: The number of soft latches for the j-th LB is given by:
    soft_j = \sum_{i=0}^{h_j - 1} ( out_j - h_j - arrival_{ij} + level_{ij} )
where level_{ij} is the level assigned to the i-th signal of the j-th
LB. Because h_j, out_j, arrival_{ij}, and
    \sum_{i=0}^{h_j - 1} level_{ij} = h_j (h_j - 1) / 2
are all constants, soft_j is constant regardless of the assignment
level_{ij}.
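Property 4 can be checked by enumeration on a small LB. The sketch below uses the per-signal count of Equation (5), numbers the levels 0 to h_j - 1 to match the sum in the proof, and keeps only assignments in which no signal needs a negative number of soft latches; these conventions and the function name are assumptions made for the example.

```python
from itertools import permutations


def totals_over_assignments(out_j, arrivals):
    """Set of total soft latch counts over all feasible level assignments.

    arrivals[i] is the arrival time of input i. Every input gets a distinct
    level in 0..h_j-1, where h_j equals the number of inputs.
    """
    h_j = len(arrivals)
    totals = set()
    for levels in permutations(range(h_j)):
        per_input = [out_j - (h_j - k) - a for a, k in zip(arrivals, levels)]
        if min(per_input) >= 0:  # feasible: no signal arrives too late
            totals.add(sum(per_input))
    return totals


print(totals_over_assignments(7, [2, 3, 4]))
print(totals_over_assignments(6, [1, 5, 2]))
```

Each call yields a set with a single value, since the levels are always a permutation of the same numbers and their sum does not depend on which signal receives which level.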
5. Experimental results 
We tested our global routing technique on two types of benchmarks:
MCNC benchmarks (with no feedback loops) and 2 array multipliers. We
considered two different circuit decompositions so that we can see how
the decomposition and the related interconnect delay affect the
routing results. In Case 1, the heights of LBs do not exceed 4 levels.
The resistance and capacitance of a wire between two neighboring LBs
are 11.67 Ω and 35.67 pF respectively (assuming all LBs have aspect
ratio 1 and dimensions of 100 µm x 100 µm in 0.25 µm CMOS). In Case 2,
the height is up to 8 levels and the delay parameters are 19.50 Ω and
65.00 pF per LB span respectively. All delay parameters are obtained
based on a 0.25 µm TSMC technology. The clock frequency is fixed at
250 MHz. The LBs are placed using a commercial placer, where every LB
is placed on a grid point.
Tables 1 and 2 show experimental results for these circuits for each
case. The tables list each circuit's name (Circuit), the number of
nets (Nets), the maximum number of latches on a path from a primary
input to a primary output (Latency), the total number of hard latches
(Hard-L), the total number of soft latches if they are not shared
(Soft-L), the reduced number obtained by sharing (Reduction), and its
percentage (%) of the total.
When we compare the required numbers of hard and soft latches in Cases
1 and 2, Case 1 (2359.4 on average) provides better results than Case
2 (3775.0). Case 1 also gives better average latency. While this may
appear counter-intuitive, this difference can be explained by the fact
that in Case 1, the decomposition is such that nearly all LBs are of
equal height (h = 4). This results in smaller interconnect delays
between LBs, and fewer latches are required to balance interconnect
delays between LBs of different heights. A better decomposition for
Case 2 will lead to substantially better latency results.
Table 1: Experimental results (4 levels)
When we compare the numbers of hard and soft latches in the tables,
before soft latch sharing the latter is approximately 25 times greater
than the former for Case 1 and seven times greater for Case 2. The
reduction of soft latches is therefore critical for efficient design
in the Wave-Steering design methodology. Our technique can reduce the
total number of soft latches by about 40% on average.
In order to reduce soft latches further, a proper circuit
decomposition is critical. Figure 6 depicts the relationship between
the average number of soft latches per sink and the number of sinks
per net for “alu4” and “C7552”. This graph suggests that nets with
more sinks usually require more soft latches per sink. In such nets,
some sinks may request relatively earlier arrival times and some may
request later times. Because these soft latches cannot be shared, more
soft latches have to be assigned. Decomposition into nets with smaller
fanout is desirable for wave-steered circuits.
Table 2: Experimental results (8 levels)
Circuit | Nets  | Latency | Hard-L | Soft-L | Reduction | %
16bm    | 303   | 57      | 225    | 9145   | 5587      | 61.09
Avg.    | 362.2 | 46.2    | 765.0  | 5377.9 | 2367.9    | 40.08
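The figures quoted in the text for Case 2 can be cross-checked against the two surviving rows of Table 2. The short sketch below reads the averaged row as Hard-L = 765.0, Soft-L = 5377.9, and Reduction = 2367.9; the variable names are chosen for this example only.

```python
# Averaged row of Table 2 (8 levels).
hard_avg, soft_avg, reduction_avg = 765.0, 5377.9, 2367.9

# Latches that remain after sharing: hard latches plus unshared soft latches.
remaining = hard_avg + soft_avg - reduction_avg
print(round(remaining, 1))

# Soft-to-hard latch ratio before sharing.
print(round(soft_avg / hard_avg, 2))

# Reduction percentage for the 16bm row.
print(round(100 * 5587 / 9145, 2))
```

The first value reproduces the Case 2 total of 3775.0 quoted in the comparison of the two cases, the ratio of about 7 matches the stated soft-to-hard relation for Case 2, and the last value matches the 61.09% entry of the 16bm row.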
Figure 6: Sinks/net vs. avg. # soft latches/sink
6. Conclusions and future work 
In this paper we have presented a global routing technique for the
wave-steering design methodology. Given a circuit decomposition and
its placement, we automatically find a rectilinear routing path and a
hard-latch assignment. After the signal arrival times and the
permissible levels of LB input signals are calculated, an optimal
signal level assignment is determined by solving the appropriate
satisfiability problem. Current results indicate that the number of
soft latches is approximately ten times that of hard latches, and
their reduction is an essential task in the wave-steering methodology.
In our future work, the algorithm to maximize the number of shared
soft latches will be refined so as to further reduce the burden of
soft latches. We adopt a heuristic in this paper, where sets of
objective clauses are satisfied on a one-by-one basis, and sets of
clauses once satisfied are never removed. More sophisticated
scheduling of objective clause sets and/or dynamic removal of
satisfied sets will be our future objective. Furthermore, we will
consider rerouting paths after we obtain the hard/soft latch
assignments such that latches are shared as much as possible.
7. Acknowledgment 
The last three authors were supported in part by NSF grant CCR 9811528
and in part by the California MICRO program through Xilinx.
8. References 
[1] V. Bertacco et al., “Decision Diagrams and Pass Transistor Logic
Synthesis”, Proc. ACM/IEEE Int'l Workshop on Logic Synthesis, pp. 1-5,
May 1997.
[2] R. E. Bryant, “Graph-Based Algorithms for Boolean Function
Manipulation”, IEEE Trans. Computers, Vol. C-35, pp. 677-691, Aug.
1986.
[3] P. Buch, A. Narayan, A. R. Newton, and A. Sangiovanni-Vincentelli,
“Logic Synthesis for Large Pass Transistor Circuits”, Proc. ICCAD'97,
November 1997.
[4] A. Mukherjee, R. Sudhakar, M. Marek-Sadowska, and S. I. Long,
“Wave Steering in YADDs: A Novel Non-iterative Synthesis and Layout
Technique”, Proc. DAC'99, pp. 466-471, 1999.
[5] A. Mukherjee, M. Marek-Sadowska, and S. I. Long, “Wave Pipelining
YADDs - A Feasibility Study”, Proc. IEEE Custom Integrated Circuits
Conf. '99, pp. 559-562, 1999.
[6] A. Singh, L. Macchiarulo, A. Mukherjee, and M. Marek-Sadowska, “A
Novel High-Throughput FPGA Architecture”, Proc. Eighth ACM
International Symp. FPGAs, pp. 22-27, 2000.
[7] A. Singh, A. Mukherjee, and M. Marek-Sadowska, “Interconnect
Pipelining in a Throughput Intensive FPGA Architecture”, Proc. 9th
International Symposium on FPGAs, pp. 153-160, Feb. 2001.
[8] L. Macchiarulo, S. M. Shu, and M. Marek-Sadowska, “Wave Steered
FSMs”, Proc. DATE 2000, pp. 270-276, 2000.
[9] L. Macchiarulo and M. Marek-Sadowska, “Wave-Steering One-hot
Encoded FSMs”, Proc. DAC'2000, pp. 357-360, 2000.
[10] S. H. Gerez, “Algorithms for VLSI Design Automation”, Wiley,
1998.
[11] N. Funabiki and T. Higashino, “A Minimal-State Processing Search
Algorithm for Satisfiability Problems”, to appear in Proc. IEEE SMC
2001.
[12] M. R. Garey and D. S. Johnson, “Computers and Intractability: A
Guide to the Theory of NP-Completeness”, Freeman, 1979.
