Implementation of Static and Semi-Static Versions of a Bit-wise Pipelined Dual-rail NCL 2S Complement Multiplier by Sankar, R. et al.
Missouri University of Science and Technology 
Scholars' Mine 
Electrical and Computer Engineering Faculty 
Research & Creative Works Electrical and Computer Engineering 
01 Jan 2007 
Recommended Citation 
R. Sankar et al., "Implementation of Static and Semi-Static Versions of a Bit-wise Pipelined Dual-rail NCL 
2S Complement Multiplier," Proceedings of the IEEE Region 5 Technical Conference, 2007, Institute of 
Electrical and Electronics Engineers (IEEE), Jan 2007. 
The definitive version is available at https://doi.org/10.1109/TPSD.2007.4380386 
Implementation of Static and Semi-Static Versions
of a Bit-Wise Pipelined Dual-Rail
NCL 2S Complement Multiplier
R. Sankar, V. Kadiyala, R. Bonam, S. Kumar, S. Mohan, F. Kacani, W. K. Al-Assadi¹, and S. C. Smith²
University of Missouri - Rolla, Department of Electrical and Computer Engineering
1870 Miner Circle, Rolla, MO 65409
Email: waleed@umr.edu¹ and smithsco@umr.edu²
Abstract: This paper focuses on implementing a 2s complement 8×8 dual-rail bit-wise pipelined multiplier using the asynchronous NULL Convention Logic (NCL) paradigm. The design utilizes a Wallace tree for partial product summation, and is implemented and simulated in VHDL, at the transistor level, and at the physical level, using a 1.8V 0.18 µm TSMC CMOS process. The multiplier is realized using both static and semi-static versions of the NCL gates; and these two implementations are compared in terms of area, power, and speed.

I. INTRODUCTION

For the past few decades, the development of synchronous circuits has dominated the semiconductor design industry. However, as we see a growing need for more power efficient, higher performance, and more noise tolerant chips, the advantages offered by asynchronous logic paradigms, such as NULL Convention Logic (NCL) [1], should not be ignored.

This paper addresses the use of industry-standard CAD tools for designing NCL circuits. The multiplier was first designed at the gate level using standard NCL design techniques. VHDL simulation was then performed to ensure functional correctness. We utilized an NCL VHDL library, with delays based on physical-level simulations of static gates designed using TSMC's 1.8V, 0.18 µm CMOS technology [2]. Next, we converted the static NCL transistor-level library into a semi-static version [3], and directly converted the semi-static transistor-level library to the physical level, using a Schematic Driven Layout tool. Finally, we implemented the multiplier components at the physical level using an HDL driven layout scheme, and simulated these to obtain power, speed, and area comparisons for the static versus semi-static implementations.

The paper is organized as follows: Section II provides a brief overview of NCL; Section III describes the multiplier design and implementation at the various levels of abstraction; Section IV includes the simulation results and comparisons; and Section V concludes the paper.

The authors gratefully acknowledge the support from the National Science Foundation under CCLI grant DUE-0536343.

II. NCL OVERVIEW

NCL is a delay-insensitive asynchronous paradigm, which means that NCL circuits will operate correctly regardless of when circuit inputs become available; therefore NCL circuits are said to be correct-by-construction (i.e., no timing analysis is necessary for correct operation). NCL circuits utilize dual-rail logic to achieve delay-insensitivity. A dual-rail signal, $D$, consists of two wires, $D^0$ and $D^1$, which may assume any value from the set {DATA0, DATA1, NULL}. The DATA0 state ($D^0 = 1$, $D^1 = 0$) corresponds to a Boolean logic 0, the DATA1 state ($D^0 = 0$, $D^1 = 1$) corresponds to a Boolean logic 1, and the NULL state ($D^0 = 0$, $D^1 = 0$) corresponds to the empty set, meaning that the value of $D$ is not yet available. The two rails are mutually exclusive, such that both rails can never be asserted simultaneously; this state is defined as an illegal state.

NCL uses threshold gates as its basic logic elements [3]. The primary type of threshold gate, shown in Fig. 1, is the THmn gate, where $1 \le m \le n$. THmn gates have n inputs, where at least m of the n inputs must be asserted before the output will become asserted. In a THmn gate, each of the n inputs is connected to the rounded portion of the gate; the output emanates from the pointed end of the gate; and the gate's threshold value, m, is written inside of the gate.

Fig. 1. THmn threshold gate.

Another type of threshold gate is referred to as a weighted threshold gate, denoted as THmnW$w_1 w_2 \ldots w_R$. Weighted threshold gates have an integer value, $m \ge w_R > 1$, applied to input $R$. Here $1 \le R < n$; where n is the number of inputs; m is the gate's threshold; and $w_1, w_2, \ldots, w_R$, each $> 1$, are the integer weights of input 1, input 2, ..., input $R$, respectively. For example, consider the TH34w2 gate shown in Fig. 2, whose n = 4 inputs are labeled A, B, C, and D. The weight of input A, W(A), is therefore 2. Since the gate's threshold, m, is 3, this implies that in order for the output to be asserted, either inputs B, C, and D must all be asserted, or input A must be asserted along with any other input, B, C, or D.
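The dual-rail encoding and the weighted-threshold rule described above can be sketched in a few lines of Python. This is an illustrative model only: the names `encode`, `decode`, `threshold_set`, and `th34w2` are hypothetical, and the sketch covers just the condition under which a gate becomes asserted, not its state-holding behavior.

```python
from itertools import product

# Dual-rail states written as (D0, D1) rail pairs, following the text's encoding.
NULL = (0, 0)
DATA0 = (1, 0)
DATA1 = (0, 1)


def encode(bit):
    """Map a Boolean value to its dual-rail DATA state."""
    return DATA1 if bit else DATA0


def decode(signal):
    """Return 0 or 1 for a DATA state, None for NULL; reject the illegal state."""
    if signal == (1, 1):
        raise ValueError("illegal dual-rail state: both rails asserted")
    if signal == NULL:
        return None
    return 1 if signal == DATA1 else 0


def threshold_set(m, weights, inputs):
    """Assertion condition of a weighted threshold gate: weighted sum >= m."""
    return sum(w * x for w, x in zip(weights, inputs)) >= m


def th34w2(a, b, c, d):
    """TH34w2: threshold 3, input A weighted 2, inputs B, C, D weighted 1."""
    return threshold_set(3, (2, 1, 1, 1), (a, b, c, d))


# The weighted rule should agree with Z = AB + AC + AD + BCD on all 16 inputs.
for a, b, c, d in product((0, 1), repeat=4):
    expected = bool((a and b) or (a and c) or (a and d) or (b and c and d))
    assert th34w2(a, b, c, d) == expected
```

Writing the gate as a weighted sum makes it easy to see why A paired with any single other input reaches the threshold of 3, while B, C, and D must all be asserted when A is not.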
1-4244-1280-3/07/$25.00 ©2007 IEEE 2007 IEEE Region 5 Technical Conference, April 20-21, Fayetteville, AR
Fig. 2. TH34w2 threshold gate: Z = AB + AC + AD + BCD.

NCL threshold gates are designed with hysteresis state-holding capability, such that all asserted inputs must be deasserted before the output will be deasserted. Hysteresis ensures a complete transition of inputs back to NULL before asserting the output associated with the next wavefront of input data. Therefore, a THnn gate is equivalent to an n-input C-element and a TH1n gate is equivalent to an n-input OR gate. There are 27 fundamental NCL gates, constituting the set of all functions consisting of four or fewer variables [3], as shown in Table I. Since each rail of an NCL signal is considered a separate variable or value, a four variable function is not the same as a function of four literals, which would normally consist of eight variables or values, assuming dual-rail signals.

TABLE I
27 FUNDAMENTAL NCL GATES

NCL Gate    Boolean Function                Transistors (Static)    Transistors (Semi-Static)
TH12        A + B                           6                       6
TH22        AB                              12                      8
TH13        A + B + C                       8                       8
TH23        AB + AC + BC                    18                      12
TH23w2      A + BC                          14                      10
TH33w2      AB + AC                         14                      10
TH14        A + B + C + D                   10                      10
TH24        AB + AC + AD + BC + BD + CD     26                      16
TH34        ABC + ABD + ACD + BCD           24                      16
TH44        ABCD                            20                      12
TH24w2      A + BC + BD + CD                20                      14
TH34w2      AB + AC + AD + BCD              22                      15
TH44w2      ABC + ABD + ACD                 23                      15
TH34w3      A + BCD                         18                      12
TH44w3      AB + AC + AD                    16                      12
TH24w22     A + B + CD                      16                      12
TH34w22     AB + AC + AD + BC + BD          22                      14
TH44w22     AB + ACD + BCD                  22                      14
TH34w32     A + BC + BD                     17                      12
TH54w32     AB + ACD                        20                      12
TH44w322    AB + AC + AD + BC               20                      14
TH54w322    AB + AC + BCD                   21                      14
THxor0      AB + CD                         20                      12
THand0      AB + BC + AD                    19                      13

NCL threshold gate variations include resetting THnn and inverting TH1n gates. Circuit diagrams designate resettable gates by either a d or an n appearing inside the gate, along with the gate's threshold. d denotes the gate as being reset to logic 1; n, to logic 0. Both resettable and inverting gates are used in the design of delay-insensitive registers [1].

NCL systems contain at least two delay-insensitive registers, one at the input and one at the output. Two adjacent register stages interact through their request and acknowledge signals, $K_i$ and $K_o$, respectively, to prevent the current DATA wavefront from overwriting the previous DATA wavefront, by ensuring that the two DATA wavefronts are always separated by a NULL wavefront. The acknowledge signals are combined in the Completion Detection circuitry to produce the request signal(s) to the previous register stage. NCL registration is realized through cascaded arrangements of single-bit dual-rail registers or single-signal quad-rail registers. These registers consist of TH22 gates that pass a DATA value at the input only when $K_i$ is request for data (rfd) (i.e., logic 1) and likewise pass NULL only when $K_i$ is request for null (rfn) (i.e., logic 0). They also contain a NOR gate to generate $K_o$, which is rfn when the register output is DATA and rfd when the register output is NULL.

An N-bit register stage, comprised of N single-bit dual-rail NCL registers, requires N completion signals, one for each register. The NCL completion component uses these $K_o$ lines to detect complete DATA and NULL sets at the output of every register stage and request the next NULL and DATA set, respectively. In full-word completion, the single-bit output of the completion component is connected to all $K_i$ lines of the previous register stage. Since the maximum input threshold gate is the TH44 gate, the number of logic levels in the completion component for an N-bit register is given by $\lceil \log_4 N \rceil$. On the other hand, bit-wise completion only sends the completion signal from bit b in register $i$ back to the bits in register $i-1$ that took part in the calculation of bit b. This method may therefore require fewer logic levels than that of full-word completion, thus increasing throughput.

III. DESIGN AND IMPLEMENTATION

The multiplier implementation utilized the Modified Baugh-Wooley algorithm [4] for partial product generation and a Wallace tree for partial product summation. The multiplier was then pipelined utilizing bit-wise completion in order to maximize throughput, reducing the average DATA-DATA cycle time, $T_{DD}$, from 46 gate delays for the original non-pipelined design to 4 gate delays. This required an additional 14 internal register stages of decreasing width. Note that at the last stage of the Wallace tree, a Ripple Carry Adder (RCA) was used instead of a Carry Look-Ahead Adder (CLA), since the performance of asynchronous circuits depends on average-case delay, not worst-case delay, as in synchronous circuits; and both CLAs and RCAs have the same average-case delay of O(log N) for an N-bit adder. Furthermore, an NCL RCA is much more area efficient than its equivalent CLA, and can be pipelined with less difficulty. Hence, an RCA is the preferred choice [5].

A block diagram of the multiplier is shown in Fig. 3, and its components are listed and described below:
1) HA: This component is an NCL half adder. It is inherently input-complete, has a delay of 2 gates, and utilizes 7 gates.
2) FA: This component is an NCL full adder. It is also inherently input-complete, has a delay of 2 gates, and utilizes 12 gates.
3) FA1: This is a specialized full adder component, where one of the inputs is always logic 1. It is inherently input-complete, has a delay of 2 gates, and utilizes 7 gates.
4) HA1: This is a specialized half adder component, where one of the inputs is always logic 1. Its carry output is equal to its input and its sum output is the inverse of its input; [...] inverter is realized by simply swapping rails.
5) AND2c: This is an input-complete two-input AND function, used for partial product generation. It has a delay of 2 gates, and utilizes 5 gates.
6) NAND2c: This is an input-complete two-input NAND function, used for partial product generation. It has a delay [...]







Fig. 3. Multiplier block diagram: NCL register stages of decreasing width alternate with Wallace tree CSA stages and RCA full-adder stages (2 gate delays each), with bit-wise completion (1 gate delay) between adjacent registers.
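Section III names the modified Baugh-Wooley algorithm for partial product generation but does not write out the partial product array. As a minimal sketch, assuming one common textbook formulation (ordinary AND terms, inverted terms wherever exactly one operand bit is a sign bit, and constant ones added at bit positions n and 2n-1), the arithmetic can be checked exhaustively for the 8×8 case. The function names are hypothetical, and the mapping of this arrangement onto the AND2c, NAND2c, FA1, and HA1 components is an assumption rather than something the paper states.

```python
def to_signed(value, bits):
    """Interpret an unsigned bit pattern as a two's complement integer."""
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def baugh_wooley(a, b, n=8):
    """Signed product of two n-bit two's complement patterns.

    a and b are unsigned bit patterns in [0, 2**n). Partial products with
    exactly one sign-bit operand are inverted (NAND); all others are ANDs.
    """
    total = 0
    for i in range(n):
        for j in range(n):
            pp = ((a >> i) & 1) & ((b >> j) & 1)
            if (i == n - 1) != (j == n - 1):
                pp ^= 1  # NAND term: one operand bit is a sign bit
            total += pp << (i + j)
    # Constant ones that compensate for the inverted terms.
    total += (1 << n) + (1 << (2 * n - 1))
    return to_signed(total & ((1 << (2 * n)) - 1), 2 * n)


def check_exhaustive(n=8):
    """Compare against ordinary signed multiplication for every input pair."""
    return all(
        baugh_wooley(a, b, n) == to_signed(a, n) * to_signed(b, n)
        for a in range(1 << n)
        for b in range(1 << n)
    )


if __name__ == "__main__":
    print(check_exhaustive())
```

The inverted terms and the two constants come from rewriting each subtracted partial product $-x \cdot 2^k$ as $(1 - x) \cdot 2^k - 2^k$ and collecting the $-2^k$ terms modulo $2^{2n}$, so that only additions remain for the Wallace tree to sum.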
A. VHDL Implementation

Each of the above components was implemented as a gate-level structural design; and these were combined with generic registration and completion components [2] to form a hierarchical design of the entire 8×8 multiplier. The complete gate-level structural VHDL version of the multiplier was then simulated using an exhaustive VHDL testbench, and was verified to be functionally correct. The simulation yielded a $T_{DD}$ of 2.6 ns, using gate delays based on physical-level simulations with TSMC's 1.8V, 0.18 µm CMOS technology.

B. Transistor-Level Implementation

As explained in Section II, NCL threshold gates are designed with hysteresis state-holding capability, such that after the output is asserted, all inputs must be deasserted before the output will be deasserted. Therefore, NCL gates have both set and hold equations, where the set equation determines when the gate will become asserted and the hold equation determines when the gate will remain asserted once it has been asserted. The set equation determines the gate's functionality as one of the 27 NCL gates, as listed in Table I, whereas the hold equation is the same for all NCL gates, and is simply all inputs ORed together. The general equation for an NCL gate with output $Z$ is: $Z = \mathit{set} + (Z^{-} \cdot \mathit{hold})$, where $Z^{-}$ is the previous output value and $Z$ is the new value. Take the TH23 gate for example. The set equation is AB + AC + BC, as given in Table I, and the hold equation is A + B + C; therefore the gate is asserted when at least 2 inputs are asserted and it then remains asserted until all inputs are deasserted.

Fig. 4. Static CMOS implementation of a TH23 gate: Z = AB + AC + BC.
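The set and hold equations for the TH23 gate can be exercised with a small behavioral model. This is a sketch at the logic level only, not a transistor-level description; the class name `ThresholdGate`, the helper `th23_set`, and the assumption that the gate starts deasserted are all illustrative choices.

```python
class ThresholdGate:
    """Behavioral model of an NCL gate: Z = set + (Z_prev * hold)."""

    def __init__(self, set_fn):
        self.set_fn = set_fn
        self.z = 0  # assumption: the gate starts in the deasserted state

    def update(self, *inputs):
        set_term = self.set_fn(*inputs)
        hold_term = any(inputs)  # hold is all inputs ORed together
        self.z = int(bool(set_term or (self.z and hold_term)))
        return self.z


def th23_set(a, b, c):
    """Set equation of the TH23 gate: AB + AC + BC."""
    return (a and b) or (a and c) or (b and c)


gate = ThresholdGate(th23_set)
vectors = [(1, 0, 0), (1, 1, 0), (1, 0, 0), (0, 0, 1), (0, 0, 0), (0, 1, 0)]
trace = [gate.update(*v) for v in vectors]

# One input is not enough to assert; once asserted, the output holds until
# every input returns to 0, and a single input cannot reassert it.
assert trace == [0, 1, 1, 1, 0, 0]
```

The third and fourth vectors show the hysteresis: the set condition is false, but the output stays asserted because at least one input is still high.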
To implement an NCL gate using CMOS technology, an equation for the complement of $Z$ is also required, which in general form is: $Z' = \mathit{reset} + (Z^{-\prime} \cdot \mathit{set}')$, where reset is the complement of hold (i.e., the complement of each input, ANDed together), such that the gate is deasserted when all inputs are deasserted and remains deasserted while the gate's set condition is false. For the TH23 gate, the reset equation is A'B'C' and the simplified set' equation is A'B' + B'C' + A'C'. Directly implementing these equations for $Z$ and $Z'$, after simplification, yields the static transistor-level implementation of an NCL gate, as shown in Fig. 4 for the TH23 gate. This requires the output, $Z$, to be fed back as an input to the NMOS and PMOS logic to achieve hysteresis behavior.

NCL gates can also be implemented in a semi-static fashion, where a weak feedback inverter is used to achieve hysteresis behavior, which only requires the set and reset equations to be implemented in the NMOS and PMOS logic, respectively. The semi-static TH23 gate is shown in Fig. 5. In general, the semi-static implementation requires fewer transistors, but is slightly slower because of the weak inverter. Note that TH1n gates are simply OR gates and do not require any feedback, such that their static and semi-static implementations are exactly the same.

Fig. 5. Semi-static CMOS implementation of a TH23 gate: Z = AB + AC + BC.

Transistor-level libraries have been created for both the static and semi-static versions of all the NCL gates used in the design, utilizing the Mentor Graphics Design Architect (DA) tool. All of the DA gates were then simulated with Accusim to verify the design by observing the I/O waveforms. For the static version, minimum widths were used for all transistors to maximize gate speed; however, for the semi-static version, larger transistors were required to overcome the weak feedback inverter to obtain proper gate functionality and reduce propagation delay.

According to the TSMC018 rules, the ratio of widths of the PUN and PDN networks should be around 2.42. Therefore, for the semi-static gates, the size of the strong inverter is set as Wp=7, Wn=3, L=2, and the weak inverter is sized as Wp=7, Wn=3, L=7. The rest of the gate's transistors are sized according to [...]

C. Physical-Level Implementation

Using Mentor Graphics IC Station, a physical layout was
created for both static and semi-static versions of all NCL gates used in the design, using the Schematic Driven Layout (SDL) tool. All the NCL gate layouts were checked for coherency using Design Rules Check (DRC) and Layout Versus Schematic (LVS), and then made into standard cells.

The gate layouts were then standardized so that they can be used in hierarchical layout generation for the multiplier components. Standardization means that a gate is defined at the block level, such that it is understood by the tool as a standard block. To do this, the cell is first set as a standard cell, and the site type is specified as '1' so that the blocks can be placed side by side. Next, a floor plan is created to enclose the standard cell with floor plan blocks; the cell is aligned at the origin of the sheet; and VDD and GND are implemented as ports using Metal1, with all other ports implemented using Metal2, without using Metal#.port layers, and with the ports placed close to the cell boundary.

Once the gates are standardized, they are utilized to implement the physical-level design of the basic multiplier components, HA, FA, HA1, FA1, AND2c, and NAND2c. The SDL tool used to implement the NCL gates becomes more difficult to use as the complexity of the circuit increases; hence, a Verilog driven layout method is utilized instead to perform automated layout (i.e., using the $autoplace_standard_cells() command), where the Verilog netlist is generated from the gate-level structural VHDL file, using Leonardo Spectrum. The most complex task is routing between instances, which show up as overflows in IC Station, and which increase with the complexity of the design. These can be routed using the Auto Route option as follows:
1) Set the peek-on-view option to eliminate shorts between metals.
2) Enable metal routing in both directions to provide the router with more flexibility.
3) Execute the RIP command to start routing. This might increase the number of overflows, but it helps yield better routing.
4) Execute the RUN command to reduce the number of overflows.

Steps 3 and 4 are repeated until the number of overflows is reduced to between 10 and 20. The remaining 10-20 overflows are then manually routed; and the VDD and GND ports are made using Metal1, and the remaining ports made using Metal2.

Now, a netlist file is generated from the component layout, and is changed to a CIR file by including information for input waveform generation and plotting the results. Also, the TSMC018 library needs to be included using the .lib command.

After generating the corresponding CIR file, the component can then be simulated using the Eldo tool, which creates a COU file. Eldo simulation provides information about floating gates or improperly connected nets, and is therefore very useful for debugging the CIR file. Eldo also calculates the component's power dissipation. After the Eldo simulation is complete without any errors or warnings, the simulation results can be viewed using EZwave to check the I/Os for functional correctness, and measure a variety of parameters, such as Rise Time, Fall Time, Propagation Delay, etc.

D. Design Flow

The overall design flow is depicted in Fig. 6. Note that the transistor-level and physical-level NCL gate libraries were only created for the semi-static gates, because these libraries are available on the web for static NCL gates [6].

Fig. 6. NCL multiplier design flow. Steps shown: design and verify the multiplier using VHDL; create semi-static NCL gates at the transistor level using DA and Accusim; create physical layouts of the semi-static gates using SDL in ICstation; verify the semi-static gate layouts using Eldo and EZwave; standardize layouts as standard-cell layouts for both semi-static and static gates. The flow then runs in two parallel branches, one for the flat multiplier and one for each multiplier block: create a flattened Verilog netlist from the VHDL code using Leonardo Spectrum; create the Verilog-driven physical layout using ICstation; extract the transistor-level netlist with capacitances using Calibre-PEX; simulate the extracted netlist (using ADMS for the multiplier, using Eldo for each multiplier block).

IV. RESULTS AND COMPARISON

The physical-level simulation results for the static and semi-static multiplier components, designed using a 1.8V 0.18 µm TSMC CMOS process, are shown below in Table II. Power is the average power reported by Eldo for an exhaustive test of all input combinations, using 10 ns for both the DATA and NULL wavefront widths. This shows that power dissipation and area are less for the semi-static implementation, as expected, since far fewer transistors are required because the gate feedback logic is replaced with a weak inverter for semi-static gate implementation. This reduces power dissipation because fewer transistors are switching and there are fewer nodes to contribute to charge-sharing. Propagation delay varies depending on the size of the
component. Small components, such as AND2c and NAND2c, are faster when implemented using semi-static gates than with static gates. However, for larger circuits, such as HA, FA1, and FA, the static implementation is faster.

TABLE II
ANALYSIS OF MULTIPLIER COMPONENTS

             Power (µW)             Delay (ps)             Area (µm²)
Component    Static   Semi-static   Static   Semi-static   Static   Semi-static
AND2c        201      184           497      376           1844     1543
NAND2c       201      184           497      376           1844     1458
HA           321      312           529      668           1941     1669
FA1          321      312           511      635           1941     1675
FA           682      630           645      755           3164     3006

The overall system-level layout of the multiplier has been completed, requiring 0.330836 mm² for the static version and 0.305261 mm² for the semi-static version. ADVance MS (ADMS), an extension of the Mentor Graphics Eldo simulator, has been utilized to simulate the static and semi-static physical-level designs using a VHDL testbench, as described in [7], resulting in an average DATA-to-DATA cycle time ($T_{DD}$) of 5.642 ns for the semi-static design and 5.638 ns for the static version, averaged over 9 random input vectors, showing that at the system level, the two designs have almost the same speed. ADMS also automatically calculates the average energy per operation, which was 84.007 pJ for the static design and 159.30 pJ for the semi-static design, averaged over 10 random input vectors, showing that the static version required almost half the amount of energy compared to the semi-static version, which contradicts what we expected, based on the power results of the multiplier components shown in Table II.

This VHDL-controlled physical-level simulation method is absolutely necessary for asynchronous circuits because the inputs do not change relative to a periodic clock pulse, but instead change value at various times based on handshaking signals. However, it is very time-consuming, requiring approximately 15 hours for each simulation, running on a 900 MHz Sun machine.

V. CONCLUSION

We designed and implemented an 8×8 bit-wise pipelined dual-rail NCL 2s complement multiplier at all levels of abstraction, from VHDL to layout, using both static and semi-static gates. The gate-level structural VHDL model of the entire system was successfully simulated and verified to be functionally correct. Furthermore, all of the major system components were implemented, simulated, and verified at the transistor level and physical level. Additionally, the full system-level implementation at the physical level was completed for both the static and semi-static versions; and these system-level designs were simulated using ADMS to calculate energy per operation and average propagation delay.

From the physical-level simulation results of the various multiplier components, the semi-static implementation was found to be better in terms of area and power dissipation, whereas the static implementation was faster for larger components, but slower for the small components. The overall system-level design also showed that the semi-static version required less area than the static version, but the system-level simulations showed that the two versions had approximately the same average cycle time, and that the static version required almost half the amount of energy compared to the semi-static version, which contradicts what we expected, based on the power results of the multiplier components.

Future work includes using ADMS to calculate the energy per operation for the static and semi-static multiplier components to help investigate why the system-level semi-static design requires almost twice the energy per operation compared to the static version. This may be due to the weak feedback inverter in the semi-static gates, which required larger N-network and P-network transistors to obtain proper gate switching. The design could also be optimized to increase throughput and reduce area. This can be achieved by designing the multiplier components using the Threshold Combinational Reduction method [8] for designing optimized NCL components. Doing so will substantially reduce the number of gates required to implement each component, and will also decrease the first stage's combinational delay (i.e., from 2 gates to 1 gate), thus increasing throughput for the entire pipeline [9] (i.e., $T_{DD}$ will decrease from 4 gates to 3 gates), since the first stage is the throughput-limiting stage.

REFERENCES

[1] K. M. Fant and S. A. Brandt, "NULL Convention Logic: A Complete and Consistent Logic for Asynchronous Digital Circuit Synthesis," International Conference on Application Specific Systems, Architectures, and Processors, pp. 261-273, 1996.
[2] (available April 2007).
[3] G. E. Sobelman and K. M. Fant, "CMOS Circuit Design of Threshold Gates with Hysteresis," IEEE International Symposium on Circuits and Systems (II), pp. 61-65, 1998.
[4] B. Parhami, Computer Arithmetic Algorithms and Hardware Designs, Oxford University Press, New York, 2000.
[5] S. C. Smith, "Development of a Large Word-Width High-Speed Asynchronous Multiply and Accumulate Unit," Elsevier's Integration, the VLSI Journal, Vol. 39/1, pp. 12-28, September 2005.
[6] http://web.umr.edu/~smithsco/VLSI.html (available April 2007).
[7] A. Singh and S. C. Smith, "Using a VHDL Testbench for Transistor-Level Simulation and Energy Calculation," The 2005 International Conference on Computer Design, pp. 115-121, June 2005.
[8] S. C. Smith, R. F. DeMara, J. S. Yuan, D. Ferguson, and D. Lamb, "Optimization of NULL Convention Self-Timed Circuits," Elsevier's Integration, the VLSI Journal, Vol. 37/3, pp. 135-165, August 2004.
[9] S. C. Smith, R. F. DeMara, J. S. Yuan, M. Hagedorn, and D. Ferguson, "Delay-Insensitive Gate-Level Pipelining," Elsevier's Integration, the VLSI Journal, Vol. 30/2, pp. 103-131, October 2001.