A novel asynchronous control unit and the application to a pipelined multiplier by 江正雄
A NOVEL ASYNCHRONOUS CONTROL UNIT AND THE APPLICATION 
TO A PIPELINED MULTIPLIER* 
Jen-Shiun Chiang and Jun-Yao Liao # 
Department of Electrical Engineering 
Tamkang University 
Tamsui, Taipei, Taiwan 
ABSTRACT 
This paper discusses a technique for asynchronous
circuit design based on a novel asynchronous control unit.
We use a very common device, the pass-transistor
multiplexer, to design and implement the asynchronous
control unit. Although the architecture of the control
unit is simple, it is efficient. A multiplier with a
pipelined structure has been designed to verify the
usefulness of this technique. We use TSMC's 0.6 µm
SPDM process to design and implement an 8-b x 8-b
pipelined multiplier. The HSPICE simulation shows that
the input data rate can be as high as 250 MHz.
1. INTRODUCTION 
Owing to advances in VLSI technology, the size of
integrated circuit (IC) chips has increased considerably,
and hence clock distribution has become a major problem
in synchronous systems. In traditional synchronous
circuits, blocks of combinational logic are triggered by a
globally distributed clock signal. Because of the large
chip area, the global clock route becomes very long and
its capacitive load becomes significant, which may cause
clock skew and increase the power consumption. Clock
skew may degrade the performance of the system
considerably and makes it very hard to keep the whole
system synchronized.
On the other hand, asynchronous technology does not
have the problems mentioned above. The events of
asynchronous circuits are triggered by the input signals
rather than by a global clock [1]. Since there is no global
clock, there is no dynamic clock power consumption, and
theoretically asynchronous circuits consume less power
than synchronous circuits do [1]. Because the activation of
an asynchronous circuit is triggered by the input signals,
the asynchronous system operates at average-case
performance instead of the worst-case performance of
the synchronous system. Therefore, the asynchronous
system is potentially faster than the synchronous
design [2], and has a longer lifetime as technology
improves [1].
Since asynchronous circuits have these characteristics,
we use the asynchronous design approach to design and
implement an asynchronous pipelined multiplier in this
paper. The core of this pipelined multiplier is a novel
asynchronous control unit. The key point of asynchronous
design is the controller. Several approaches have been
proposed by researchers [1-3]. However, some of them
are too complicated, and some are impractical to
implement in a real circuit [1]. Here a new design of the
asynchronous control unit is proposed; it has a simple
architecture and can be implemented very easily in a real
VLSI circuit. Even a conventional synchronous system can
be modified slightly to work in asynchronous mode by
using this control unit.
In Section 2, the asynchronous control unit and the
asynchronous pipelined scheme are described. Section
3 discusses the architecture of the multiplier. In Section 4,
we describe the design of an 8-b x 8-b asynchronous
pipelined multiplier in the manner described in Sections 2
and 3. Finally, we conclude this paper in Section 5.
* This research is supported by the National Science Council of
the Republic of China under grant NSC 87-2215-E-032-004.
## Jun-Yao Liao is with the Holtek Corporation, Science Based
Industrial Park, Hsin-Chu, Taiwan.
0-7803455-3/98/$10.00 © 1998 IEEE
Authorized licensed use limited to: Tamkang University. Downloaded on March 24,2010 at 03:15:20 EDT from IEEE Xplore.  Restrictions apply.
2. A NOVEL CONTROL UNIT AND THE
PIPELINED SCHEME
In asynchronous circuit design, the control unit is the soul
of the circuit. Recent asynchronous designs use event
logic to control the system [1,3]. The Muller C-element
is one example [3]. This control unit is used to enable or
disable the function of internal logic blocks. It is useful;
however, it is not practical [1]. We propose another
architecture for the asynchronous control unit. The
architecture of this novel control unit is not only simple
but also feasible.
The basic structure of the asynchronous circuit is
composed of several stages. Except for the first and last
stages, each stage is connected to two stages, the
preceding stage and the succeeding stage. Each stage has
a control part and an executing logic part. The executing
logic part is composed of latches and differential logic
such as CPL circuits [4]. The executing logic part
generates a “generate complete” (GC) signal to indicate
that the function evaluation in that stage has finished.
Our proposed control unit exploits the characteristics of
the GC signal. In the asynchronous system, the GC signal
of the current stage (GCi) is sent to both the previous
stage and the succeeding stage to indicate that the
current stage has finished its job. When the previous
stage receives GCi, this signal acts like an acknowledge
signal, telling the previous stage that the current stage
has finished its job and that the previous stage can send
new data to the current stage. When the succeeding stage
receives GCi, the signal acts like a request signal, telling
the succeeding stage that the current stage has finished
its job and that the data are on the way, and it also
triggers the executing logic of the succeeding stage.
Therefore, a proper handshaking protocol (GCi as the
acknowledge signal to the previous stage, and GCi as the
request signal to the next stage) is formed, and the
circuit functions normally and smoothly.
Based on the protocol described in the previous
paragraph, a control unit is designed as shown in
Figure 1. In Figure 1, GCi is the “generate complete”
signal of the current stage; GCi-1 is the “generate
complete” signal from the previous stage, and GCi+1 is
the “generate complete” signal of the succeeding stage.
This control unit operates by the protocol described in
the previous paragraph. GCi-1, GCi, and GCi+1 together
activate the “enable” signal that triggers the latches in
the current stage. The architecture of the asynchronous
control unit is very simple. The components of this
control unit are only transmission gates and inverters,
and it can be implemented very easily.
Let us discuss in more detail the function of this control
unit when it is embedded in an asynchronous circuit.
Figure 2 shows part of an asynchronous circuit; let us
focus on stage i. In each stage, there are two multiplexers
which form the control unit. The circle on the upper
multiplexer denotes the complement of GCi+1. The
logic part is implemented in differential logic. Initially
both outputs of the differential logic are reset to ‘0’
(or ‘1’). The GC circuit is a two-input exclusive-or gate,
and both outputs of the differential logic circuit are fed
to its inputs. When the logic function is finished, the
differential logic produces two complementary outputs,
which make GC equal to ‘1’ and thus generate the
complete signal. Initially we set all GC signals to ‘0’, and
the enable signal is reset. As soon as GCi-1 is set to ‘1’,
stage i is enabled and the latch in the logic part latches
the data from stage i-1, and at the same time both
differential outputs are reset to ‘0’ (or ‘1’).
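The completion detection described above can be sketched
in Python as follows. This is only an illustration: the names
generate_complete and differential_and are hypothetical, and
the AND gate merely stands in for an arbitrary differential
(for example CPL) logic block. Only the exclusive-or of the
two differential outputs is taken from the text.

```python
from typing import Tuple


def generate_complete(out_true: int, out_false: int) -> int:
    """Return the GC bit: exclusive-or of the two differential outputs."""
    return out_true ^ out_false


def differential_and(a: int, b: int) -> Tuple[int, int]:
    """Hypothetical differential gate: returns (a AND b, NOT (a AND b))."""
    y = a & b
    return y, 1 - y


RESET_LOW = (0, 0)   # both outputs reset to '0'
RESET_HIGH = (1, 1)  # alternative reset level, both outputs at '1'

# While reset, the two outputs are equal, so GC stays 0.
gc_during_reset = [
    generate_complete(*RESET_LOW),
    generate_complete(*RESET_HIGH),
]

# After evaluation the outputs are complementary, so GC is 1
# for every input combination.
gc_after_eval = [
    generate_complete(*differential_and(a, b))
    for a in (0, 1)
    for b in (0, 1)
]

print(gc_during_reset)  # [0, 0]
print(gc_after_eval)    # [1, 1, 1, 1]
```

Because GC depends only on whether the two outputs differ,
the same detector works for either reset level.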
Figure 1. The Novel Asynchronous Control Unit
After the logic part finishes the execution, GCi
becomes ‘1’, which makes the upper multiplexer select
GCi+1, enables stage i+1 to accept data from stage i,
and activates stage i+1. Stage i cannot be activated again
by stage i-1 until stage i+1 finishes its operation. In this
manner the asynchronous circuit operates stage by stage
and functions normally.
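The stage-by-stage behaviour can be illustrated with a small
behavioural model in Python. This is a sketch under stated
assumptions, not the gate-level circuit of Figure 1: the
firing rule (stage i latches when GCi-1 is 1 and GCi is 0, and
firing stage i clears GCi-1) is one reading of the text, the
names run_pipeline and stage_fns are hypothetical, and the
round-based update ignores analog timing.

```python
from typing import Callable, List, Optional


def run_pipeline(stage_fns: List[Callable[[int], int]],
                 inputs: List[int]) -> List[int]:
    """Push inputs through the stages using GC-based handshaking."""
    n = len(stage_fns)
    gc = [0] * n  # gc[i] == 1: stage i holds a finished, untaken result
    data: List[Optional[int]] = [None] * n
    pending = list(inputs)
    outputs: List[int] = []

    while pending or any(gc):
        # Decide which stages fire from the GC values at the
        # start of the round only.
        fire = []
        for i in range(n):
            request = bool(pending) if i == 0 else gc[i - 1] == 1
            free = gc[i] == 0  # assumed: result already taken by stage i+1
            if request and free:
                fire.append(i)
        sink_takes = gc[n - 1] == 1

        # Apply the round. Adjacent stages never fire together,
        # because stage i needs gc[i-1] == 1 while stage i-1
        # needs gc[i-1] == 0.
        if sink_takes:
            outputs.append(data[n - 1])
            gc[n - 1] = 0
        for i in fire:
            src = pending.pop(0) if i == 0 else data[i - 1]
            data[i] = stage_fns[i](src)  # latch and evaluate
            gc[i] = 1                    # request to stage i+1
            if i > 0:
                gc[i - 1] = 0            # acknowledge to stage i-1
    return outputs


# Three hypothetical stages computing ((x + 1) * 2) - 3.
stages = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
outputs = run_pipeline(stages, [1, 2, 3, 4])
print(outputs)  # [1, 3, 5, 7]
```

In this model a stage that still holds an untaken result
refuses new data, so results leave the pipeline in the order
in which the inputs entered.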
3. THE ARCHITECTURE OF THE 
PIPELINE MULTIPLIER 
Because asynchronous circuits work stage by stage, they
are very similar to a pipelined architecture. In a
pipelined architecture, we need latches to separate the
stages. Here we use CPL to implement the logic blocks of
the asynchronous circuit. Owing to the characteristics of
CPL, the output signals are not full swing, and the
non-full-swing signals are restored by the latches to
full-swing waveforms. The block diagram of an
asynchronous circuit with latches is shown in Figure 3.
From Figure 3 we find that the GC signal is detected from
the latch. In this manner a pipelined multiplier [5-7] is
designed to verify the control unit proposed in the
previous section.
Figure 2. The architecture of system with asynchronous control units
Figure 3. Pipelined Architecture of the Asynchronous Circuits
Figure 4. Block diagram of multiplier
The architecture of the pipelined multiplier is shown in
Figure 4. The Booth algorithm [8] is applied to this
multiplier. The multiplier has several components: a
Booth decoder, 4-2 compressors [9], and a conditional-sum
adder [10]. The Booth decoder reduces the hardware
cost. The 4-2 compressors increase the speed of the
multiplier. Since this is a pipelined design, the
conditional-sum adder is very suitable [10]. The
architecture shown in Figure 4 can be extended to a
pipelined multiplier of any word length. For simplicity
we design an 8-b x 8-b multiplier.
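As an illustration of what a Booth decoder computes, the
following Python sketch recodes a signed 8-bit multiplier
into radix-4 Booth digits and sums the corresponding partial
products. The radix-4 (modified Booth) form and the
two's-complement operands are assumptions; the text only
states that the Booth algorithm [8] is applied. The function
names are hypothetical, and the plain sum stands in for the
4-2 compressors and the conditional-sum adder.

```python
from typing import List


def booth_radix4_digits(multiplier: int, bits: int = 8) -> List[int]:
    """Recode a signed `bits`-bit value into radix-4 digits in -2..2."""
    assert bits % 2 == 0
    assert -(1 << (bits - 1)) <= multiplier < (1 << (bits - 1))
    u = multiplier & ((1 << bits) - 1)  # two's-complement bit pattern
    digits = []
    prev = 0                            # implicit bit b[-1] = 0
    for k in range(bits // 2):
        b0 = (u >> (2 * k)) & 1
        b1 = (u >> (2 * k + 1)) & 1
        # digit = -2*b[2k+1] + b[2k] + b[2k-1]
        digits.append(-2 * b1 + b0 + prev)
        prev = b1
    return digits


def booth_multiply(multiplicand: int, multiplier: int, bits: int = 8) -> int:
    """Sum the partial products selected by the Booth digits."""
    partial_products = [
        d * multiplicand * (4 ** k)
        for k, d in enumerate(booth_radix4_digits(multiplier, bits))
    ]
    return sum(partial_products)


print(booth_radix4_digits(93))  # [1, -1, 2, 1]
print(booth_multiply(-57, 93))  # -5301
```

With radix-4 recoding an 8-bit multiplier yields four partial
products instead of eight, which is consistent with the
hardware saving attributed to the Booth decoder.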
4. CIRCUIT DESIGN AND THE 
IMPLEMENTATION 
The multiplication procedure is divided into ten stages
to fit the pipelined architecture. The Booth decoder is
divided into two stages, the 4-2 compressors are
separated into two stages, and the conditional-sum adder
is implemented in five stages. In total there are ten
stages in this pipelined multiplier. We use CPL circuits to
design and implement each stage. Since the design is
asynchronous, we do not need to worry about the clock
frequency. The multiplier keeps working as long as the
multiplicand and multiplier inputs have data. On average,
another set of data can be fed to the multiplier every
1.6 ns.
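The timing figures can be related to one another with a
short calculation. The Python sketch below uses the values
of Table 1 and the ten-stage partition. It assumes that the
"Delay" column is the end-to-end latency of one
multiplication, which the text does not state explicitly.
The energy per multiplication is a derived quantity, not a
result reported in this paper.

```python
STAGES = 10  # number of pipeline stages stated in Section 4

# (input data rate in MHz, delay in ns, power in mW), from Table 1
table1 = [(100, 16.4, 43.75), (200, 16.3, 65.35), (250, 16.1, 73.5)]

rows = []
for rate_mhz, latency_ns, power_mw in table1:
    # Time between successive operand pairs at this data rate.
    input_period_ns = 1000.0 / rate_mhz
    # Average delay of one stage, assuming "Delay" is total latency.
    per_stage_ns = latency_ns / STAGES
    # mW / MHz gives nJ; multiply by 1000 for pJ per multiplication.
    energy_pj = power_mw / rate_mhz * 1000.0
    rows.append((rate_mhz, input_period_ns, per_stage_ns, energy_pj))

for rate_mhz, period_ns, stage_ns, energy_pj in rows:
    print(rate_mhz, "MHz:", period_ns, "ns period,",
          round(stage_ns, 2), "ns per stage,",
          round(energy_pj, 1), "pJ per multiplication")
```

Under this assumption the average per-stage delay is about
1.6 ns, while an input data rate of 250 MHz corresponds to a
new operand pair every 4 ns.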
This test chip is fabricated in TSMC’s 0.6 µm SPDM
CMOS technology. We use the full-custom design method
to complete the circuit layout. The layout is shown in
Figure 5. The die size is about 970 µm x 1048 µm, and
6428 transistors are used. The multiplication speed of the
asynchronous pipelined multiplier is equivalent to up to
250 MHz. The results of the HSPICE simulation are shown
in Figure 6 and Figure 7. Other simulation results are
shown in Table 1.
5. CONCLUSION
Theoretically, asynchronous circuits consume less power
and are faster than synchronous circuits. However, the
control of an asynchronous circuit is difficult, which
reduces the expected performance. In this paper, we
propose an asynchronous control unit with a simple
architecture composed of transmission gates and
inverters. With this approach, the asynchronous circuit
can achieve good performance. We also verify the control
unit by designing and implementing an 8-b x 8-b
multiplier. This pipelined multiplier can accept new data
every 4 ns and works very well. The asynchronous circuit
design is shown to be feasible, and the approach can be
further applied to other circuits that have already been
implemented.
Figure 5. Layout of the 8-b x 8-b Asynchronous Pipelined Multiplier
Figure 6. The HSPICE Simulation for the Input Data Rate of 100 MHz
Figure 7. The HSPICE Simulation for the Input Data Rates of 100/200/250 MHz
Table 1. The simulation results
Frequency | Delay   | Power
100 MHz   | 16.4 ns | 43.75 mW
200 MHz   | 16.3 ns | 65.35 mW
250 MHz   | 16.1 ns | 73.5 mW
6. REFERENCES
[1] Scott Hauck, “Asynchronous design methodologies:
an overview”, Proceedings of the IEEE, vol. 83, no. 1,
pp. 69-93, Jan. 1995.
[2] E. de Angel and E. Swartzlander, Jr., “A new
asynchronous multiplier using enable/disable CMOS
differential logic”, ISSN: 1063-6404, pp. 302-305,
1994.
[3] Ivan E. Sutherland, “Micropipelines”, Commun. ACM,
vol. 32, no. 6, pp. 720-736, June 1989.
[4] Bellaouar and M. I. Elmasry, Low-Power Digital VLSI
Design: Circuits and Systems, Kluwer Academic
Publishers, 1995.
[5] Ohkubo and M. Suzuki, “A 4.4 ns CMOS 54-b x 54-b
multiplier using pass-transistor multiplexer”, IEEE J.
Solid-State Circuits, vol. 30, no. 3, pp. 251-256, March
1995.
[6] Mori, M. Nagamatsu, M. Hirano, S. Tanaka, M. Noda,
Y. Toyoshima, K. Hashimoto, and K. Maeguchi, “A
10-ns 54-b x 54-b parallel structured full array
multiplier with 0.5-µm CMOS technology”, IEEE J.
Solid-State Circuits, vol. 26, no. 4, pp. 600-605, April
1991.
[7] Nagamatsu, S. Tanaka, J. Mori, K. Hirano, T.
Noguchi, and K. Hatanaka, “A 15-ns 32-b x 32-b
CMOS multiplier with an improved parallel structure”,
IEEE J. Solid-State Circuits, vol. 25, no. 2, pp. 494-497,
April 1990.
[8] Israel Koren, Computer Arithmetic Algorithms,
Prentice-Hall, Inc., 1993.
[9] Weste and K. Eshraghian, Principles of CMOS VLSI
Design: A Systems Perspective, 2nd ed., Addison-Wesley
Publishing Company, 1993.
[10] Kai Hwang, Computer Arithmetic: Principles,
Architecture, and Design, John Wiley & Sons, Inc.,
1979.
