Evaluation and Analysis of NULL Convention Logic Circuits by Brady, John Davis
University of Arkansas, Fayetteville 
ScholarWorks@UARK 
Theses and Dissertations 
12-2019 
Evaluation and Analysis of NULL Convention Logic Circuits 
John Davis Brady 
University of Arkansas, Fayetteville 
Citation 
Brady, J. D. (2019). Evaluation and Analysis of NULL Convention Logic Circuits. Theses and Dissertations 
Retrieved from https://scholarworks.uark.edu/etd/3466 
Evaluation and Analysis of NULL Convention Logic Circuits 
 
 
A dissertation submitted in partial fulfillment 
of the requirements for the degree of  







University of Arkansas 
Bachelor of Science in Computer Engineering, 2012 
University of Arkansas 














       





              
James Parkerson, Ph.D.    Dale Thompson, Ph.D. 




       







Integrated circuit (IC) designers face many challenges in utilizing state-of-the-art technology 
nodes, such as the increased effects of process variation on timing analysis and heterogeneous 
multi-die architectures that span across multiple technologies while simultaneously increasing 
performance and decreasing power consumption. These challenges provide opportunity for 
utilization of asynchronous design paradigms due to their inherent flexibility and robustness.  
 
While NULL Convention Logic (NCL) has been implemented in a variety of applications, current 
literature does not fully encompass the intricacies of NCL power performance across a variety of 
applications, technology nodes, circuit scale, and voltage scaling, thereby preventing further 
adoption and utilization of this design paradigm. 
 
This dissertation evaluates the nominal dynamic energy, voltage-scaled dynamic energy, and static 
power consumption of NCL across variations in circuit type, circuit scale, and technology node, 
including 130 nm, 90 nm, and 45 nm processes. These results are compared with synchronous 
counterparts and analyzed for a range of trends in order to identify and quantify advantages and 
disadvantages of NCL across a variety of applications. By providing an evaluation of a broad range 
of circuits and characteristics, an IC designer may effectively predict the advantages or 









First, I would like to thank my advisor, Dr. Di. Dr. Di gave me the opportunity to join his lab during my 
undergraduate education and has continued to push my development and provide opportunities for 
me throughout my graduate education. Dr. Di’s commitment to both my technical and non-technical 
development has exceeded my expectations and is greatly appreciated. Without his 
mentorship, I might not have pursued this degree. 
 
I would like to thank the members of our lab. The open learning environment established for 
collaboration and the building and sharing of a knowledge base was greatly beneficial and will 
hopefully continue to be so for others. I would also like to thank many of my fellow lab members 
for their contributions to the libraries required for the analysis in this dissertation and to many 
previous projects. 
 
I thank my family for their support in pursuing the opportunities given to me. I thank Cara for her 

















Table of Contents 
 
 
1. Introduction
2. Background
     2.1 NULL Convention Logic (NCL)
     2.2 Unified NCL Environment (UNCLE) Toolset
3. Methodology
     3.1 Semiconductor Processes
     3.2 Libraries
          3.2.1 Synchronous
          3.2.2 NCL
     3.3 Designs
          3.3.1 Registration Design
          3.3.2 Multiplier Design
          3.3.3 Finite State Machine Design
          3.3.4 Pipelined Combinational Design
          3.3.5 Universal Asynchronous Receiver Transmitter Design
     3.4 Synthesis
          3.4.1 Synchronous
          3.4.2 NCL
     3.5 Transistor-Level Simulation
     3.6 Multiple Vdd
     3.7 Voltage Scaling
          3.7.1 Synchronous
          3.7.2 NCL
4. Results
     4.1 Registration Design
          4.1.1 Dynamic Energy – Nominal Vdd
          4.1.2 Dynamic Energy – Scaled Vdd
          4.1.3 Static Power
     4.2 Multiplier Design
          4.2.1 Dynamic Energy – Nominal Vdd
          4.2.2 Dynamic Energy – Scaled Vdd
          4.2.3 Static Power
     4.3 Finite State Machine Design
          4.3.1 Dynamic Energy – Nominal Vdd
          4.3.2 Dynamic Energy – Scaled Vdd
          4.3.3 Static Power
     4.4 Pipelined Combinational Design
          4.4.1 Dynamic Energy – Nominal Vdd
          4.4.2 Dynamic Energy – Scaled Vdd
          4.4.3 Static Power
     4.5 Universal Asynchronous Receiver Transmitter Design
          4.5.1 Dynamic Energy – Nominal Vdd
          4.5.2 Dynamic Energy – Scaled Vdd
          4.5.3 Static Power
5. Conclusion













































List of Tables 
 
 
Table 2.1: All NCL gates and Boolean functions
Table 3.1: Transistor information by semiconductor process
Table 3.2: Registration design specifications
Table 3.3: Multiplier design specifications
Table 3.4: FSM design specifications
Table 3.5: Pipelined combinational design specifications
Table 4.1: Register design specifications
Table 4.2: Dynamic energy per operation for all registration circuits
Table 4.3: Dynamic energy per operation data for voltage-scaled registration circuits
Table 4.4: Static power for registration circuits
Table 4.5: Multiplier design specifications
Table 4.6: NCL and synchronous multiplier dynamic energy per operation at nominal Vdd
Table 4.7: NCL and synchronous multiplier circuit Vdd-scaled dynamic energy per operation
Table 4.8: Static power data for multiplier circuits
Table 4.9: FSM design specification
Table 4.10: NCL and synchronous FSM dynamic energy data per operation at nominal Vdd
Table 4.11: NCL and synchronous FSM voltage-scaled dynamic energy per operation (1/2)
Table 4.12: NCL and synchronous FSM voltage-scaled dynamic energy per operation (2/2)
Table 4.13: NCL and synchronous FSM static power data
Table 4.14: Pipelined combinational design specifications
Table 4.15: NCL and sync. pipelined comb. circuit energy per operation data at nominal Vdd
Table 4.16: NCL and sync. pipelined comb. circuit energy per operation at scaled Vdd (1/2)
Table 4.17: NCL and sync. pipelined comb. circuit energy per operation at scaled Vdd (2/2)
Table 4.18: Static power data for pipelined combinational circuits
Table 4.19: Dynamic energy results for UART circuits at nominal Vdd
Table 4.20: Dynamic energy results for Vdd-scaled UART circuits (1/2)
Table 4.21: Dynamic energy results for Vdd-scaled UART circuits (2/2)

































List of Figures 
 
 
Figure 2.1: NCL gate symbol
Figure 2.2: NCL pipeline
Figure 2.3: NCL Moore machine
Figure 3.1: NCL registration circuit
Figure 4.1: NCL vs. synchronous registration circuit dynamic energy trends by process
Figure 4.2: NCL registration circuit trends by process
Figure 4.3: NCL and synchronous scaled-Vdd dynamic energy trends for registration circuits
Figure 4.4: NCL vs. synchronous multiplier circuit dynamic energy trends at nominal Vdd
Figure 4.5: NCL multiplier circuits dynamic energy trends by process
Figure 4.6: NCL vs. synchronous multiplier circuit voltage-scaled dynamic energy trends
Figure 4.7: NCL vs. synchronous multiplier static power trend comparison
Figure 4.8: NCL FSM input logic scaling dynamic energy trends at nominal Vdd
Figure 4.9: NCL vs. synchronous FSM dynamic energy trends at nominal Vdd
Figure 4.10: NCL vs. synchronous FSM voltage-scaled dynamic energy trends
Figure 4.11: NCL vs. synchronous FSM circuit static power trends
Figure 4.12: NCL vs. synchronous FSM circuit static power trends when scaling input logic
Figure 4.13: NCL vs. sync. pipelined comb. circuit dynamic energy trends at nominal Vdd
Figure 4.14: NCL vs. synchronous pipeline circuit registration dynamic energy trends
Figure 4.15: NCL vs. synchronous voltage-scaled pipelined comb. circuit energy trends
Figure 4.16: NCL pipelined combinational circuit static power trends across each process
Figure 4.17: NCL vs. sync. pipelined comb. circuits static power trends across each process
Figure 4.18: NCL vs. synchronous UART dynamic energy trend at nominal Vdd
Figure 4.19: NCL and synchronous UART combinational and registration energy trends
Figure 4.20: Vdd-scaled UART dynamic energy trend








New challenges continue to arise as improving deep-submicron CMOS technologies 
progressively increase the transistor density and performance capabilities of digital integrated 
circuits (ICs), and as system architecture design further diverges from monolithic IC 
design toward heterogeneous multi-die architectures encompassing multiple technology nodes. The 
major challenges in utilizing such technologies include meeting power constraints while providing 
high performance; reliably and efficiently adapting performance based on available power; and 
flexibly integrating multiple technologies dedicated to distinct tasks and levels of performance. 
 
One method utilized to meet power requirements in synchronous ICs is dynamic voltage and 
frequency scaling (DVFS). DVFS provides the ability to decrease power consumption by lowering 
the power supply voltage of an IC. However, in accomplishing this feature, synchronous ICs must 
adapt their clock frequency to match the resulting slower performance. In combination with a trend 
of increased transistor density while simultaneously retaining fabrication of reticle-limited die, 
these conditions are challenging for synchronous ICs due to effects such as increased clock skew 
and voltage drop in low voltage operation. The culmination of these conditions and effects 
increasingly challenges synchronous IC design in verifying and validating a system for correct 
operation.  
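
The voltage dependence behind DVFS can be illustrated with the standard first-order CMOS switching model, in which energy per operation scales with C·Vdd² and dynamic power is that energy multiplied by the operating frequency. The sketch below is a textbook approximation and not the simulation methodology of this dissertation; the capacitance, activity factor, voltages, and frequency are hypothetical values chosen only for illustration.

```python
def dynamic_energy(c_eff_farads, vdd_volts, activity=1.0):
    """First-order switching energy per operation: E = activity * C * Vdd^2."""
    return activity * c_eff_farads * vdd_volts ** 2


def dynamic_power(c_eff_farads, vdd_volts, freq_hz, activity=1.0):
    """First-order dynamic power: P = E * f (only meaningful for a clocked circuit)."""
    return dynamic_energy(c_eff_farads, vdd_volts, activity) * freq_hz


# Hypothetical effective switched capacitance (not a value from this work).
C_EFF = 1e-12  # 1 pF

e_nominal = dynamic_energy(C_EFF, 1.2)   # energy at an assumed nominal supply
e_scaled = dynamic_energy(C_EFF, 0.6)    # energy with the supply halved
energy_ratio = e_scaled / e_nominal      # quadratic dependence on Vdd

print(f"energy ratio at half Vdd: {energy_ratio:.2f}")
```

In this model, halving the supply voltage cuts switching energy per operation to roughly one quarter, which is why voltage scaling is attractive despite the slower operation it causes.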
 
With the increasing importance of addressing these challenges and others such as process variation 
and design reuse, quasi-delay-insensitive (QDI) asynchronous logic paradigms provide increased 
reliability and modularity, lower noise/emission, etc. [1-5]. These advantages in NULL Convention 
Logic (NCL) designs are valuable in applications facing the increasingly-complex challenges of 
current and future designs.  However, there is a lack of complete information, and likewise 
contradicting information, in power-consumption comparison and analysis between asynchronous 
circuits and their synchronous counterparts. On one hand, designs such as the IBM TrueNorth 
neuromorphic chip (>60% asynchronous logic) demonstrate the low-power features of 
asynchronous logic; on the other hand, disparate design, simulation, and analysis practices 
can often lead to observations that are inconsistent or even contradictory. (Note that since 
asynchronous circuits do not have a clock, energy is a more appropriate metric than power.) 
 
In one example, three scaling implementations of NCL and synchronous ripple carry adders were 
simulated in a 180 nm process [6], resulting in a synchronous-to-NCL power ratio trend increasing 
from 2.56 to 3.04 as the scale of the circuit increases, thereby indicating a strong trending 
advantage for NCL. However, this significant NCL advantage may be misleading to a circuit 
designer because of the lack of comparison with equal circuit frequencies between logic paradigms 
and other factors such as designing the synchronous circuit simulations to demonstrate the worst-
case power consumption. 
 
In another example, an NCL and a synchronous 32-bit floating-point co-processor were 
implemented in the IBM 130 nm Bulk CMOS Process [7]. Analyzed for low-voltage scaling 
applications, the NCL design simulated an increase of 148% in active energy consumption and an 
increase of 61% in static power consumption over the synchronous counterpart when scaling the 
Vdd to 0.5 V. A synchronous active energy and static power advantage over NCL continued in 
trend as Vdd was decreased. This data indicates a significant NCL disadvantage and conflicts with 
the previous 180 nm ripple carry adder analysis, despite being one technology node step smaller. 
Additionally, by not including data such as a nominal Vdd comparison, questions remain as to 
whether the technology node, voltage, circuit scaling, circuit type, etc., created the contradicting 
trend. 
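
To see how far apart the two cited results are, it helps to put them on a common scale. The sketch below converts the figures quoted above into NCL-to-synchronous ratios; the helper names are hypothetical, and the two studies measure different quantities (power in [6], active energy in [7]) under different conditions, so the numbers are comparable only in direction, not in magnitude.

```python
def percent_increase_to_ratio(percent_increase):
    """An increase of X% over a baseline means (1 + X/100) times the baseline."""
    return 1.0 + percent_increase / 100.0


def invert_ratio(sync_to_ncl_ratio):
    """Convert a synchronous-to-NCL ratio into an NCL-to-synchronous ratio."""
    return 1.0 / sync_to_ncl_ratio


# Ripple carry adders in 180 nm [6]: synchronous-to-NCL power ratio of 2.56 to 3.04.
adder_small = invert_ratio(2.56)   # NCL draws about 0.39x the synchronous power
adder_large = invert_ratio(3.04)   # NCL draws about 0.33x the synchronous power

# Floating-point co-processor in 130 nm [7] at Vdd = 0.5 V:
# NCL active energy is 148% higher, static power is 61% higher.
fpu_energy = percent_increase_to_ratio(148)   # NCL uses 2.48x the energy
fpu_static = percent_increase_to_ratio(61)    # NCL uses 1.61x the static power

for label, ratio in [("adder, small", adder_small), ("adder, large", adder_large),
                     ("FPU active energy", fpu_energy), ("FPU static power", fpu_static)]:
    print(f"{label:>18}: NCL / synchronous = {ratio:.2f}")
```

Ratios below 1 favor NCL and ratios above 1 favor the synchronous design, so the two studies point in opposite directions.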
 
The state of available data and analysis not only creates confusion for IC designers, but more 
importantly, it hinders the adoption of asynchronous logic by the semiconductor IC design 
industry. Therefore, a comprehensive evaluation of the energy-consumption comparison between 
asynchronous and synchronous circuits along multiple dimensions is needed to resolve these 
contradictions and provide guidance in choosing between asynchronous and synchronous 
paradigms when designing a specific circuit. This evaluation is needed to show how each of these 
logic paradigms performs in energy and power consumption for a specific type and size of circuit 
in a specific application scenario that uses a specific semiconductor technology.  
 
In this dissertation, NCL and synchronous circuits are analyzed across a multitude of design 
characteristics, including circuit type, circuit scale, and technology node, in order to quantify the 
advantages and disadvantages for NCL and synchronous approaches in each situation. Each circuit 
is analyzed for dynamic energy consumption at nominal Vdd and scaled Vdd, and static power 
consumption. The results are evaluated to determine trends across technology nodes and circuit 
characteristics, thereby modeling the expected trade-off for an asynchronous design based on the 










2.1 NULL Convention Logic (NCL) 
 
NCL is a quasi-delay-insensitive asynchronous circuit design methodology that uses dual-rail 
logic [8]. In NCL, dual-rail signals allow for three valid states: NULL (rail0 = 0, rail1 = 0), DATA0 
(rail0 = 1, rail1 = 0), and DATA1 (rail0 = 0, rail1 = 1). The two rails are mutually exclusive, 
meaning that the remaining combination (rail0 = 1, rail1 = 1) is an invalid state for all signals. A value of 
DATA0 or DATA1 constitutes DATA. 
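
The dual-rail encoding described above can be sketched as a small lookup. This is an illustrative model only; the names DualRail, decode, encode, and is_data are hypothetical and do not come from any NCL toolset.

```python
from enum import Enum


class DualRail(Enum):
    """The three valid dual-rail states, stored as (rail0, rail1)."""
    NULL = (0, 0)
    DATA0 = (1, 0)
    DATA1 = (0, 1)


def decode(rail0, rail1):
    """Map a pair of rail values to its state; both rails asserted is invalid."""
    if rail0 and rail1:
        raise ValueError("invalid dual-rail state: rail0 = 1 and rail1 = 1")
    return DualRail((int(bool(rail0)), int(bool(rail1))))


def encode(bit):
    """Encode a Boolean value as DATA0 or DATA1."""
    return DualRail.DATA1 if bit else DualRail.DATA0


def is_data(state):
    """DATA0 and DATA1 both count as DATA; NULL does not."""
    return state is not DualRail.NULL


print(decode(0, 0), decode(1, 0), decode(0, 1))
```

Because the rails are mutually exclusive, a receiver can tell whether a signal carries DATA or NULL from the signal itself, without a clock.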
 
NCL circuits are implemented with a combination of 27 fundamental gates that follow the 
naming convention THmn. The fundamental gates represent a complete set of Boolean functions, 
allowing any Boolean function to be represented with NCL gates. For most circuits, a separate 
function composed of a combination of the 27 fundamental gates is created for each rail of a 
dual-rail signal. 
 
For a THmn gate (Figure 2.1) to assert its output, at least m of the n inputs must be asserted. Due 
to hysteresis within all NCL gates, all n inputs must first have a value of zero before the output is 
deasserted. As a result, once all gates in a set of combinational logic have propagated DATA, each 
input of each gate must receive a ‘0’ to ensure that previous DATA retained by the logic is erased 





Figure 2.1: NCL gate symbol. 
 
In addition to the THmn notation, each input can have additional weight over other inputs to the 
gate, which is denoted as THmnWw1w2..wn (w1 refers to A, w2 refers to B, etc.). An example of 
this type of gate is the TH23w2, which assigns a weight of 2 (all inputs have a weight of 1 unless 
specified otherwise) to the first input of the gate, A. In this case, the assertion of A would fulfill 
the threshold of 2 in order for the TH23w2 gate to assert its output. If A is not asserted, both B and 
C must be asserted before the output will be asserted, as they have a combined weight of 2. Table 
























Table 2.1: All NCL gates and Boolean functions. 
NCL Gate      Boolean Function
TH12          A + B
TH22          AB
TH13          A + B + C
TH23          AB + AC + BC
TH33          ABC
TH23w2        A + BC
TH33w2        AB + AC
TH14          A + B + C + D
TH24          AB + AC + AD + BC + BD + CD
TH34          ABC + ABD + ACD + BCD
TH44          ABCD
TH24w2        A + BC + BD + CD
TH34w2        AB + AC + AD + BCD
TH44w2        ABC + ABD + ACD
TH34w3        A + BCD
TH44w3        AB + AC + AD
TH24w22       A + B + CD
TH34w22       AB + AC + AD + BC + BD
TH44w22       AB + ACD + BCD
TH54w22       ABC + ABD
TH34w32       A + BC + BD
TH54w32       AB + ACD
TH44w322      AB + AC + AD + BC
TH54w322      AB + AC + BCD
THxor0        AB + CD
THand0        AB + BC + AD
TH24comp      AC + BC + AD + BD
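
The set and hold behavior of the weighted threshold gates in Table 2.1 can be sketched behaviorally as follows. This is a minimal model under stated assumptions, not a transistor-level or library description; the class name ThresholdGate is hypothetical. It covers the THmn and THmnW gates only, since THxor0, THand0, and TH24comp are not plain threshold functions and would need their set functions written out explicitly.

```python
class ThresholdGate:
    """Behavioral model of a THmn / THmnW gate with hysteresis."""

    def __init__(self, threshold, weights):
        self.threshold = threshold
        self.weights = tuple(weights)
        self.output = 0  # assumed to start deasserted

    def update(self, inputs):
        if len(inputs) != len(self.weights):
            raise ValueError("expected one value per gate input")
        # Sum the weights of the asserted inputs.
        total = sum(w for w, x in zip(self.weights, inputs) if x)
        if total >= self.threshold:
            self.output = 1   # set: weighted threshold met
        elif not any(inputs):
            self.output = 0   # reset: every input has returned to zero
        # Otherwise hold the previous output (hysteresis).
        return self.output


# TH23w2: threshold 2, three inputs, input A carries a weight of 2.
th23w2 = ThresholdGate(threshold=2, weights=(2, 1, 1))
print(th23w2.update((1, 0, 0)))  # A alone meets the threshold
print(th23w2.update((0, 1, 0)))  # below threshold, but the output is held
print(th23w2.update((0, 0, 0)))  # all inputs zero, so the output resets
print(th23w2.update((0, 1, 1)))  # B and C together meet the threshold
```

For TH23w2 the set condition reduces to A + BC, matching the table entry, while the hold branch models the requirement that all inputs return to zero before the output deasserts.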
 
 
The NCL pipeline architecture (Figure 2.2) contains a register set, combinational logic, and 
completion logic for each stage. NCL circuits contain at least two sets of asynchronous delay-
insensitive registers to create a single stage design or pipelined design. Each register uses 
handshaking signals, Ko and Ki, to indicate to the previous set of registers when it is ready for the 
next NULL or DATA wave. A NULL wave is a set of data where all values of the dual-rail signals 
equal NULL. A DATA wave is a set of data where all values of the dual-rail signals equal DATA, 
which is any combination of DATA0 and DATA1. Each DATA wave in an NCL stage is followed 
by a NULL wave to ensure that DATA previously propagated through a stage does not mix with 
the data in the next DATA wave.  
 
 
Figure 2.2: NCL pipeline. 
 
All registers in a register set must have the same Ko value before the register set can request a 
different wave from the previous register set. This function is executed by the completion logic. If 
a DATA wave has completed, and all registers output a Ko of 0, a request-for-NULL (rfn) is sent 
to the previous register set allowing the next NULL wave to be released. Likewise, if a NULL 
wave has completed, and all registers output a Ko of 1, a request-for-DATA (rfd) is sent to the 
previous register set. 
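
The completion behavior described above can be sketched as a small state-holding function. This is a behavioral illustration under assumptions: the class name CompletionLogic is hypothetical, the encoding rfn = 0 and rfd = 1 follows the Ko values given in the text, and the initial request value after reset is an assumption rather than something specified here.

```python
RFN = 0  # request-for-NULL
RFD = 1  # request-for-DATA


class CompletionLogic:
    """Combine the Ko outputs of one register set into a single request."""

    def __init__(self, initial_request=RFD):
        # Assumption: the stage starts out requesting DATA.
        self.request = initial_request

    def update(self, ko_values):
        if not ko_values:
            raise ValueError("a register set needs at least one register")
        if all(ko_values):
            self.request = RFD      # NULL wave complete: every Ko is 1
        elif not any(ko_values):
            self.request = RFN      # DATA wave complete: every Ko is 0
        # Mixed Ko values: the wave is still arriving, so hold the request.
        return self.request


stage = CompletionLogic()
print(stage.update([0, 0, 0]))  # all registers hold DATA, request NULL
print(stage.update([1, 0, 1]))  # NULL wave partially arrived, request held
print(stage.update([1, 1, 1]))  # NULL wave complete, request DATA
```

Holding the request while the Ko values disagree is what keeps the previous register set from releasing a new wave before the current one has fully arrived.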
 
In addition to the completion logic, modified registers, and dual-rail functions required to create a 
single-stage NCL design, a finite state machine (FSM) requires the insertion of two additional 
stages. The structure for this type of circuit can be seen in Figure 2.3.  
 
 




As seen in Figure 2.3, two of the stages contain completion logic (C/L) but do not contain 
combinational logic. These two stages serve to store and propagate DATA waves without 
manipulating the DATA. The addition of two register sets is implemented in order to create a 3-
stage pipeline with feedback, which is required in order to isolate each DATA wave from the 
previous or following DATA waves. The 3-stage pipeline allows one DATA wave, which is both 
preceded and followed by a NULL wave, to propagate through the feedback loop. 
 
2.2 Unified NCL Environment (UNCLE) Toolset 
 
 
One of the challenges in implementing asynchronous designs is developing a synthesis flow for 
each semiconductor process. Unified NCL Environment (UNCLE) is an academia-developed 
toolset for synthesizing Verilog RTL designs into NCL circuits [9]. UNCLE leverages commercial 
tools, i.e., Synopsys Design Compiler and Cadence Encounter RTL Compiler, to synthesize a 
Verilog RTL design to an NCL gate-level netlist containing registers, latches, and combinational 
logic.  
UNCLE begins by synthesizing the Verilog RTL design into a single-rail synchronous netlist. 
After synthesizing the netlist, UNCLE converts the single-rail netlist into a dual-rail netlist. Once 
the dual-rail combinational logic and registration are implemented, a series of modifications and 
verification steps are executed, including adding an acknowledge network in order to facilitate the 
required handshaking between each stage of the circuit. It should be noted that UNCLE does not 
properly synthesize all styles of RTL design and that attention must be paid to the specifications 
in the user manual and to the type of designs provided as input. 
 
The resulting NCL gate-level netlist from UNCLE can be simulated in a functional simulator, such 
as ModelSim, or imported into Cadence for transistor-level verification. UNCLE includes a built-
in functional simulator, allowing the user to import a custom asynchronous library complete with 
characterization files for synthesis and analysis. In addition to functional verification, UNCLE 
offers various functionalities, including inputting random or user-specified data for verification 
and tracking gate switching activity within the circuit. These functions can serve as preliminary 
analysis in order to reduce the required investment from a user when determining the value of 

























The focus of this research is to comprehensively evaluate the advantages and disadvantages of 
NCL along multiple dimensions, including circuit type, circuit scale, and semiconductor 
technology node. In selecting the designs for evaluation, the intention was to avoid optimized 
application-specific designs representative of specific advantages for synchronous or NCL 
circuits. Instead, the designs were selected to avoid bias towards either paradigm in order to extract 
advantages and disadvantages that can be applied to a broad set of applications. In the pursuit of 
generalized applications and characteristics, the resulting analysis from this work is applicable to 
many situations in which a designer is trying to determine whether synchronous or NCL design is 
appropriate. 
 
3.1 Semiconductor Processes 
 
In developing a methodology for evaluation of NCL circuits, a range of technology nodes was 
selected, including 130 nm Bulk CMOS, 90 nm Bulk CMOS, and 45 nm fully-depleted 
silicon-on-insulator (FD-SOI) CMOS processes. Table 3.1 shows semiconductor process information, 
transistor threshold voltages, and data by process. 
 















130 nm 1.2 Std. Vt Std. Vt -404.1 334.1 - - 
90 nm 1.2 Std. Vt Low Vt -402.1 370.7 -355 459.7 
45 nm  1.0 Std. Vt Std. Vt -439.4 278.2 - - 
 
 
It is important to understand how the process and transistor type affect the data and the corresponding 
analysis when considering how each type relates to each application. While the 90 nm and 130 nm processes 
implement Bulk CMOS, the 45 nm process included for evaluation implements FD-SOI CMOS. 
This contrast creates mismatches when considering trends, but it is also valuable to evaluate 
circuits that are affected by both the challenges that deeper submicron semiconductor processes 
face, including process variation and leakage current, and the solutions that enable smaller 
semiconductor nodes, specifically FD-SOI. 
 
The goal when developing libraries or selecting standard cell libraries was to select transistor types 
to add consistency across circuit designs, methodologies, and semiconductor nodes. The standard 
threshold voltage libraries were selected in all available cases. Using the standard 
threshold voltage transistors for each process avoids biasing the circuits toward applications such as low 
power or high performance. For results containing multiple transistor types, it is important to note 
the effects that each library’s transistor type may place on performance data such as transistor 







For each process in this dissertation, the standard cell libraries were utilized for synchronous circuit 
synthesis and simulation. The standard cell libraries for the 130 nm, 90 nm, and 45 nm processes 
were developed by the corresponding process's foundry and were included in the process design 
kits. These libraries were not modified in developing any of the circuits presented in this 
dissertation. Due to the commercial development of each synchronous library, these libraries do 




Two of the main advantages of the well-developed standard cell libraries are the range of gate 
drive strengths and cell functions. For each gate, there is a corresponding set of drive strengths (up 
to 8) that offer increased drive strength depending on the load capacitance and timing constraints 
of the output. This allows more buffering options for the synthesis process because the synthesis 
tool may elect to replace the gate with a different size, especially if the required drive strength is a 
minor increase over the available drive strength. In some cases, this may not require an overall 
increase in the footprint of the layout because the required modifications can be implemented 
without incurring increases to the height or width of the cell. Also, for cell functionality, it is 
typical practice for these developed standard cell libraries to implement commonly used functions, 
such as an adder, as an optimized cell. 
 
Due to the level of optimization and development for standard cell libraries, there is also more 
variation between the standard cell libraries in each process based on the level of granularity in 
scaling drive strength sizes of each gate. A wider range of gate sizes allows the synthesis tool to 
insert a more closely matched gate size to drive a particular node. The differences are most 
prominent when compared to NCL synthesis where multiple gate sizes are not typically available; 




Each of the NCL libraries was developed based on balancing each gate to meet average timing 
constraints for a typical capacitive load. The NCL libraries consist of the fundamental NCL gates, 
as discussed in section 2.1, and do not contain other optimized functions. The libraries were 
developed to have one gate drive strength, which means a buffer must be inserted at the output if 
the maximum drive strength is exceeded. This contrasts with the Boolean standard cell libraries that 
have a range of drive strengths for each gate as a more efficient buffering option before inserting 
buffer gates. Due to this factor, and others such as the previously mentioned cell functionality, the 
optimization provided by each NCL library is less than that of the corresponding Boolean standard 
cell library. Each NCL library was sized based on an estimated minimum drive strength requirement. 
The goal was to prevent unnecessarily large drive strengths for average cases which may result in 
increased power consumption and area. 
 
Based on the capacitance of the minimum drive strength requirement, four buffer sizes were created. 
Each buffer size meets the rise and fall time requirements while driving double the capacitance of 
the next-smaller buffer. The smallest buffer drives double the capacitance set for each gate in the 
library. As a result of the design methodology for these libraries, the smallest buffer is typically 




To evaluate NCL advantages and disadvantages, a set of circuits was selected to represent 
prominent characteristics found in most circuits. These circuits represent control-based, 
feedback-oriented, data-driven, and pipelined characteristics. Because the circuits were selected 
according to these characteristics, a reader can relate the data to the characteristics of their own design 
application. The designs selected include a registration design, a multiplier, a pipelined 









3.3.1 Registration Design 
 
The registration design was chosen in order to represent the effects of scaling register sets for 
increased data widths. Effectively, this design represents a single stage without combinational 
logic between the two register sets. For the registration design, three differently sized sets of 
registers are implemented. These register set sizes are implemented with varying input widths, as 
shown in Table 3.2. 
 
Table 3.2: Registration design specifications. 
Circuit Name Input Bit Width 





Two register sets, rather than one, are used in each circuit to ensure that synthesis of the 
synchronous designs is constrained by timing data and, when appropriate, increases the size or 
strength of the clock tree. As with the other designs, the timing constraints are based on 
simulation of the NCL registration circuits. The diagram for the NCL registration circuit is shown 
in Figure 3.1. 
 
 





The synchronous registration circuit contains two register sets, but it does not contain completion 
logic or any other type of combinational logic between them. 
 
3.3.2 Multiplier Design 
 
The multiplier implemented in this research serves multiple purposes: 1) it represents a non-pipelined 
design; and 2) it is a purely combinational logic circuit for comparison between NCL 
and synchronous design. Creating a purely combinational logic circuit for evaluation allows 
the effects of combinational logic to be isolated. By isolating these effects, trends 
can be analyzed across different circuits with varying combinations of registration circuitry, 
feedback, etc., in order to better understand the source of advantages and disadvantages. 
Additionally, three multipliers were implemented to understand the effects of scaling. Table 3.3 
lists each multiplier and the corresponding bit-width of the inputs. 
 
Table 3.3: Multiplier design specifications. 






3.3.3 Finite State Machine Design 
 
The finite state machine (FSM) was selected in order to represent control logic and feedback-based 
design characteristics. When deciding which type of FSM model to implement, the Moore machine 
was selected due to its more predictable and stable nature as compared to a Mealy machine. In 
addition to instability, the asynchronous nature of a Mealy machine’s output creates challenges for 




In implementing the Moore machine, two aspects of the FSM are scaled: the size of the input logic 
and the number of states. Two input-bit sizes were implemented in combination with three state 
sizes. The result is a total of six FSMs for comparison, which are shown in Table 3.4. The variation 
in input bit size represents the effect of increasing combinational logic for a control-based circuit, 
and the variation in the number of states represents the effect of increasing the number of registers. 
 
Table 3.4: FSM design specifications. 
Circuit Name Number of States Number of Input Bits 
FSM 8-1 8 1 
FSM 16-1 16 1 
FSM 32-1 32 1 
FSM 8-3 8 3 
FSM 16-3 16 3 
FSM 32-3 32 3 
 
 
Each FSM circuit was designed with generic state information. However, the states and transitions 
were modified to ensure that every state was reachable. If any state is unreachable, the synthesis 
tool removes the logic for that state, thereby optimizing the netlist. This optimization 
would result in an inaccurate representation for purposes of analysis. 
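
The reachability requirement described above can be checked before synthesis with a breadth-first search over the state graph. The following is a minimal sketch, not the procedure used in this work; the transition table (a hypothetical 8-state machine with one input bit, where an input of 0 holds the state and an input of 1 advances it) is an assumption made purely for illustration.

```python
from collections import deque

def reachable_states(transitions, start):
    """Return the set of states reachable from `start`.

    `transitions` maps each state to the set of states it can move to
    under any input value.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for nxt in transitions.get(state, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

# Hypothetical 8-state machine: input 0 holds the state, input 1 advances it.
transitions = {s: {s, (s + 1) % 8} for s in range(8)}
unreachable = set(transitions) - reachable_states(transitions, start=0)
print(sorted(unreachable))  # an empty list means no state logic would be pruned
```

Any state left in `unreachable` is a candidate for removal by the synthesis tool and would need an added incoming transition.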
 
3.3.4 Pipelined Combinational Design 
 
In order to represent data-driven and pipelining design characteristics, a modified pipelined FIR 
filter was implemented which herein is referred to as the pipelined combinational design. Due to 
the targeted circuit characteristics for this design, the function of the circuit was not important. 
Instead, the requirements were that the design was traditionally pipelined without communication 
between stages external to each register set separating stages. A pipelined combinational design 
allows evaluation of the effects of clock tree distribution for synchronous circuits when compared 
to the localized nature of NCL acknowledge networks.  
 
In order to represent the effect of increasing the number of stages and the scale of the circuit, 8-
stage, 16-stage, and 32-stage pipelined combinational circuits were implemented for comparison. 
Table 3.5 contains each circuit implemented and clarifies the naming scheme. 
 
Table 3.5: Pipelined combinational design specifications. 





3.3.5 Universal Asynchronous Receiver-Transmitter Design 
 
For representing communication characteristics and the synchronization mechanisms required, the 
Universal Asynchronous Receiver-Transmitter (UART) communication design was selected. The 
implemented design has 8 bits per character, does not use parity, and requires one stop bit. It also 
implements a configurable baud rate at runtime via a divisor register. 
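
As an illustration of the character format described above (8 data bits, no parity, one stop bit), the sketch below builds the sequence of line levels for one character. It assumes the conventional UART ordering of a low start bit followed by the data bits, least significant bit first; the source does not state the bit order, and the function name is hypothetical. The baud-rate divisor is not modeled because its relationship to the clock is design-specific.

```python
def frame_8n1(byte):
    """Return the line levels for one 8N1 character, in transmission order."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("an 8N1 character carries exactly 8 data bits")
    # Assumption: data bits are sent least significant bit first.
    data_bits = [(byte >> i) & 1 for i in range(8)]
    return [0] + data_bits + [1]  # start bit, 8 data bits, stop bit

frame = frame_8n1(0x41)
print(frame, len(frame))
```

Each character therefore occupies ten bit periods on the line: one start bit, eight data bits, and one stop bit.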
 
As expected from the experience detailed in section 2.2, UNCLE was unable to synthesize the 
RTL design implementing the synchronous UART due to the multi-faceted feedback controls and 
the synchronization mechanism. As a result, the NCL UART design was the most modified and 
manually implemented of the designs included in this work. While the NCL UART functions 
correctly and implements most of the functionality included in the synchronous UART, it is 
important to note the difference when considering the results and corresponding analysis. As a 
result, the data collected from the synchronous and NCL UART will provide more value in trends 







The synthesis process for synchronous circuits utilized the Cadence Genus Synthesis Solution 
software, synthesizing Verilog RTL designs into a gate-level netlist for each of the 45 nm, 90 nm, 
and 130 nm processes. In addition to the Verilog RTL code, timing libraries and gate 
characterization files are required inputs into Genus for synthesis. These timing libraries and gate 
characterization files were developed by the foundry and were not modified at any point for 
synthesizing the synchronous circuits.  
  
The last required input for synthesizing synchronous circuits was the target clock period. The clock 
period for each synchronous circuit was based on the simulated data-to-data period (Tdd) for the 
corresponding NCL circuit, meaning that each NCL circuit was simulated before synthesizing the 
final corresponding synchronous circuit. This method was chosen in order to provide the most 
equality in circuit performance for analysis and comparison. If a synchronous design was 
synthesized to stricter or looser timing constraints, relative to the simulated timing characteristics 
of a corresponding NCL circuit, this could result in a synthesized netlist that misrepresents the 
energy consumption for a circuit application. 
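
The relationship between the simulated NCL data-to-data period and the synchronous clock constraint is a simple reciprocal. The sketch below is illustrative only: it converts between a period in nanoseconds and a frequency in megahertz, using the 588 MHz operating frequency reported for the 130 nm Reg-32 circuit in Table 4.2 as the example value. The function names are hypothetical and are not part of any synthesis tool.

```python
def clock_period_ns(freq_mhz):
    """Convert an operating frequency in MHz to a period in ns."""
    return 1e3 / freq_mhz

def frequency_mhz(t_dd_ns):
    """Convert a data-to-data period (Tdd) in ns to a frequency in MHz."""
    return 1e3 / t_dd_ns

# 130 nm Reg-32 operating frequency from Table 4.2.
period = clock_period_ns(588)
print(f"target clock period: {period:.2f} ns")
```

Under this convention, the period computed from the NCL simulation is the value supplied to the synthesis tool as the synchronous clock period.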
 
After synthesizing the netlist through Genus, clock tree synthesis is executed for each of the 
synchronous circuits to ensure that the overhead required in correctly distributing the clock signal 
is represented in the analysis and comparisons between synchronous and NCL circuits. Cadence 
Innovus Implementation System software was used for clock tree synthesis and corresponding 
timing analysis. For the clock tree synthesis process, files dictating physical layout characteristics, 
layout abstracts, and process rules and constraints were required. Clock tree synthesis requires the 
synchronous circuit to be placed and routed in order to determine how much effort is required to 
distribute the clock signal across the circuit while meeting timing constraints. The clock period 




UNCLE was used to synthesize full NCL designs, or sections of NCL designs, followed by manual 
modifications and additions. These designs were not intended to be optimized for 
performance or efficiency; instead, the intention was to implement designs representing a wide 
range of general use cases in order to provide the most widely applicable results and analysis. 
 
UNCLE supports various syntaxes of Verilog in the RTL, but often the synthesized netlist 
malfunctions during simulation. If the syntax and design styles programmed into UNCLE are not 
implemented in the RTL design, design aspects within the RTL design may be missing or incorrect 
in the netlist. Many of the issues in synthesis reside in the acknowledge network within the NCL 
netlist. This includes a variety of issues, such as missing acknowledgement logic or incorrect logic. 
These synthesis issues become more prominent when the RTL design is not a data-driven standard 
pipeline, and the errors can be difficult to detect and diagnose because the output is a gate-level 
netlist. Iterating on the original RTL design in search of UNCLE-accepted methods and syntax is 
therefore time-consuming.  
 
Buffering each NCL netlist was accomplished through custom software that calculated the 
required drive strength for each output in the netlist. The input capacitances for each NCL gate 
were acquired through simulation. Once the input capacitances were simulated for each gate, 
custom software calculated the capacitive load for each node in the netlist. If a capacitive load for 
a given node exceeded the drive strength of the gate driving that node, a buffer, or set of buffers, 
was inserted based on required drive strength. As expected in a functional buffering process, this 
method allowed each node for each circuit to retain the rise and fall time requirements 
implemented for each gate across all NCL libraries.  
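
The buffering rule described above can be sketched as follows. This is not the custom software used in this work; it is a simplified model that assumes a single per-gate load limit, takes the buffer capacities from the doubling rule given for the NCL libraries (the smallest buffer drives twice the gate limit, and each larger buffer drives twice the previous one), and ignores the input capacitance that the inserted buffer itself presents to the driving gate. All names and the capacitance values are hypothetical.

```python
def node_load(fanout_input_caps):
    """Capacitive load on a node: sum of the input capacitances it drives."""
    return sum(fanout_input_caps)

def buffer_capacities(gate_max_load, sizes=4):
    """Maximum load for each buffer size under the doubling rule."""
    return [gate_max_load * 2 ** (k + 1) for k in range(sizes)]

def choose_buffer(load, gate_max_load, sizes=4):
    """Return None if the gate can drive `load` directly, otherwise the index
    of the smallest single buffer that can drive it."""
    if load <= gate_max_load:
        return None
    for index, capacity in enumerate(buffer_capacities(gate_max_load, sizes)):
        if load <= capacity:
            return index
    # A set of buffers would be needed here; that case is not modeled.
    raise ValueError("load exceeds the largest single buffer")

# Hypothetical values in fF: a 10 fF gate limit and seven 5 fF gate inputs.
load = node_load([5.0] * 7)
print(load, choose_buffer(load, gate_max_load=10.0))
```

With these example values the 35 fF node exceeds both the gate limit and the smallest buffer, so the second buffer size (index 1) is selected.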
 
The design requiring the most manual changes after synthesis was the NCL UART transceiver. 
While the synchronous UART transceiver synthesized correctly without modifications to the 
design, the NCL UART transceiver was not synthesizable as a complete circuit. When synthesized 
in larger sections, the design was not functional post-synthesis and contained many issues related 
to the acknowledgement network due to the feedback within both the receiver and transmitter. The 
solution for this design was incrementally synthesizing sections for the transmitter and receiver 
separately, as well as adding manual modifications for the acknowledgement network and 
interconnect. Functionally, the NCL UART transceiver transmits and receives data correctly. 
However, it should be noted that this design potentially has the most variation from another NCL 
UART design due to the amount of required modifications.  
 
3.5 Transistor-Level Simulation 
 
Cadence Virtuoso was used for synchronous and NCL schematic development, transistor-level 
simulation, and analysis.  
 
For synchronous circuit simulation, several input data characteristics were modified in order to 
provide average use case results. Variation in the set of inputs for a given operation and the 
previous state of each gate within the circuit have a noticeable effect on the energy consumption. 
In one case within this work, a synchronous FSM with a higher number of states, and 
therefore more registers, measured lower registration energy than 
a synchronous FSM with fewer state-holding registers. Effectively, this means that energy 
consumption varies considerably from one operation to the next, so selecting a single clock cycle 
for measuring energy consumption is insufficient. 
 
In order to represent average case energy consumption of synchronous circuits, the input data was 
created such that half of the data switches per clock cycle. Additionally, a method of averaging the 
energy over 20 clock cycles was selected in order to more accurately represent each synchronous 
circuit. Through simulation, averaging energy data over 20 clock cycles was found to be sufficient 
in removing misrepresentation of energy consumption by outlying cases.  
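
One way to construct such input data is sketched below. It interprets "half of the data switches per clock cycle" as exactly half of the input bits toggling between consecutive cycles, which is an assumption; the source does not specify how the vectors were generated. The per-cycle energy values would come from transistor-level simulation and are not modeled here.

```python
import random

def half_toggle_sequence(width, cycles, seed=0):
    """Yield `cycles` input words in which exactly width // 2 bits flip
    between consecutive words."""
    rng = random.Random(seed)
    word = rng.getrandbits(width)
    for _ in range(cycles):
        yield word
        flip_mask = 0
        for bit in rng.sample(range(width), width // 2):
            flip_mask |= 1 << bit
        word ^= flip_mask

def average_energy(per_cycle_energy):
    """Mean energy per operation over the measured clock cycles."""
    return sum(per_cycle_energy) / len(per_cycle_energy)

words = list(half_toggle_sequence(width=32, cycles=20))
toggles = [bin(a ^ b).count("1") for a, b in zip(words, words[1:])]
print(toggles)
```

Averaging the simulated energy of the 20 cycles with `average_energy` then gives a single energy-per-operation figure for each circuit.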
 
For each NCL circuit simulation, a Verilog-A controller was implemented in order to simulate an 
asynchronous test environment. Because each controller responds asynchronously to the 
feedback from its NCL circuit, each circuit behaves as it would in an 
asynchronous system. Similar to the synchronous test setup, the data timing requirements of the 
Verilog-A controller were modified per process in order to represent the correct rise and fall time 
requirements.  
 
3.6 Multiple Vdd 
 
 
Each of the design types analyzed in this work has a unique proportion of registration logic and 
combinational logic. In order to quantify the source of advantages or disadvantages for NCL or 
synchronous circuits regarding the effect of this proportion, the energy consumption is categorized 
by its source: registration circuitry or combinational logic. Each circuit therefore contains two 
power networks in order to distinguish energy consumption from registration circuitry and 
combinational logic.  
 
For synchronous circuits, registration circuitry includes registers and all control circuitry required 
to control registers. More specifically, the register control circuitry includes reset and clock buffer 
trees. For NCL circuits, registration circuitry includes all registers, acknowledge network logic, 
and buffers required to distribute control signals to this logic. 
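
With separate power networks, the energy attributed to each category can be obtained from the current drawn from each supply. The sketch below assumes the standard relation E = Vdd × ∫ i(t) dt for a constant supply voltage and integrates sampled current with the trapezoidal rule. The source does not state how energy was extracted from the simulator, so this is one plausible post-processing step, and the sample waveforms are hypothetical.

```python
def supply_energy(vdd, times, currents):
    """Energy in joules delivered by a constant supply: Vdd times the
    trapezoidal integral of the sampled supply current."""
    charge = 0.0
    for i in range(1, len(times)):
        charge += 0.5 * (currents[i] + currents[i - 1]) * (times[i] - times[i - 1])
    return vdd * charge

def energy_breakdown(vdd, times, i_registration, i_combinational):
    """Split total energy into registration and combinational contributions."""
    e_reg = supply_energy(vdd, times, i_registration)
    e_comb = supply_energy(vdd, times, i_combinational)
    return {
        "registration": e_reg,
        "combinational": e_comb,
        "registration_share": e_reg / (e_reg + e_comb),
    }

# Hypothetical triangular current pulses sampled at 0, 1, and 2 ns.
times = [0.0, 1e-9, 2e-9]
result = energy_breakdown(1.2, times, [0.0, 1e-3, 0.0], [0.0, 3e-3, 0.0])
print(result)
```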
 
The multiplier and registration circuits aid in this effort by illustrating the isolated effects of scaling 
combinational circuits and registration circuits. Therefore, these two circuits, and their variations, 
were used as a baseline when compared to other circuit types containing different circuit 
characteristics and proportions of combinational logic and registration circuitry. 
 
3.7 Voltage Scaling 
 
In addition to evaluating the energy consumption of each NCL and synchronous circuit at the 
nominal Vdd per process, each circuit was evaluated after scaling Vdd to the minimum voltage 




When voltage scaling was applied to each synchronous circuit, the functionality of the circuits was 
evaluated at 0.1 V decrements. No modifications were made to input data except the duration of 
reset and the rise and fall times of the input data. The rise and fall times were simulated based on 




Voltage scaling increases the sensitivity of each synchronous circuit to input 
characteristics such as reset timing (a reset that is too short can change the initial output values) 
and input data timing. For example, in the simulation results of the 130 nm 32-tap synchronous 
FIR filter at a Vdd of 0.95 V, the second data output changed depending on the duration of 
the reset. The synchronous circuits were also sensitive to the input signal rise and fall 
times.  
 
This means that for each synchronous circuit, it is not only a question of which voltages retain 
correct circuit functionality, but in some cases, other aspects, such as reset timing, must be 
modified relative to their characteristics at nominal Vdd. In other words, it was not as simple as 
changing the power supply voltage. 
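
The search for the minimum functional voltage can be sketched as a downward sweep in 0.1 V decrements that stops at the first failing voltage. In this sketch the `is_functional` callback stands in for a full transistor-level simulation, including any per-voltage adjustments to reset duration and input rise and fall times; the 1.2 V nominal supply and the 0.7 V pass threshold are hypothetical values chosen for illustration.

```python
def minimum_functional_vdd(nominal_vdd, is_functional, step=0.1, floor=0.1):
    """Lower Vdd from nominal in `step` decrements and return the lowest
    voltage at which `is_functional(vdd)` still holds, or None if the
    circuit fails at nominal. The sweep stops at the first failure."""
    lowest = None
    steps = 0
    while True:
        vdd = round(nominal_vdd - steps * step, 10)  # avoid float drift
        if vdd < floor or not is_functional(vdd):
            break
        lowest = vdd
        steps += 1
    return lowest

# Stand-in for simulation: pretend the circuit works down to 0.7 V.
def works_down_to_0v7(vdd):
    return vdd >= 0.7 - 1e-9

print(minimum_functional_vdd(1.2, works_down_to_0v7))
```

Stopping at the first failure assumes that a circuit which fails at one voltage also fails at every lower voltage.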
 
Additionally, the output data may be correct after a few invalid cycles, which means the circuit 
could still be useful. For a given voltage and operation, however, the circuit’s external control 
would need to know which outputs from the circuit are valid and which are invalid. The first 
operation would need to be executed multiple times, depending on how many of the 
initial outputs are expected to be invalid. This may also create problems for applications 
where the circuit is powered on to process a single data item: the time required 
increases by a number of clock cycles equal to the 
number of invalid outputs at the start. 
 
However, for the purposes of this research, it was assumed that each design does not contain a 







A similar methodology was used to scale the Vdd of NCL circuits. Scaling the voltage required 
modifying the rise and fall times of the input data to simulate inputs 
received from an external voltage-scaled controller. The rise and fall times were based on 
simulation of a typical gate with a typical capacitive load at each voltage applied when scaling 
Vdd.  
 
When simulating voltage-scaled NCL circuits, the main modifications for the test setup were Vdd 
and the input data rise and fall times. In order to find the rise and fall times, one gate from each 
NCL library meeting the nominal Vdd rise and fall time requirements was simulated. Each NCL 
circuit’s Verilog-A controller was then modified to correctly represent data time requirements that 

















This chapter presents NCL and synchronous circuit simulation results for the registration design, 
multiplier design, finite state machine design, pipelined combinational design, and UART 
communication design under various conditions. For each design, results on nominal-Vdd dynamic 
energy per operation, scaled-Vdd dynamic energy per operation, and static power consumption are 
discussed for comparison between NCL and synchronous designs. These comparisons are 
analyzed for trends across variations in circuits and semiconductor nodes. 
 
4.1 Registration Design 
 
 
The registration design consists of three variations across the 130 nm, 90 nm, and 45 nm processes. 
Each registration circuit contains two sets of registers where the width of each register set is 
specified by the Input Bit Width column in Table 4.1 below.  
 
Table 4.1: Registration design specifications. 
Circuit Name Input Bit Width 





4.1.1 Dynamic Energy - Nominal Vdd 
 
 
Table 4.2 shows the dynamic energy per operation for each registration circuit across the three 
semiconductor processes. The Frequency column is the frequency for both the NCL and 
synchronous circuits since each synchronous circuit was synthesized based on the speed of the 
NCL counterpart. The last column in the table is the ratio between the NCL and synchronous 
dynamic energy for each registration circuit. Additionally, NCL-to-NCL circuit data comparisons 
across each process are utilized when evaluating and illustrating circuit scaling and semiconductor 
process trends throughout this chapter. 
 
As indicated by the Ratio column in Table 4.2, the 130 nm process is the most favorable for NCL 
registration circuits relative to their synchronous counterparts, with the NCL Reg-32 and Reg-64 
circuits consuming roughly 30% of the synchronous dynamic energy. Overall, the NCL-to-synchronous 
ratio indicates that NCL consumes relatively more energy than 
synchronous as circuits are implemented in smaller technology nodes.  
 
Table 4.2: Dynamic energy per operation for all registration circuits. 
Process Circuit Freq. (MHz) NCL (pJ) Synchronous (pJ) Ratio 
 
130 nm 
Reg-32 588 0.88 2.92 0.30 
Reg-64 541 1.78 5.78 0.31 




Reg-32 1000 0.83 0.61 1.36 
Reg-64 909 1.65 1.30 1.27 




Reg-32 3,448 0.43 0.34 1.26 
Reg-64 3,333 0.87 0.70 1.24 
Reg-128 2,500 1.84 1.35 1.36 
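
The Ratio column can be reproduced directly from the two energy columns. The sketch below recomputes it for the rows of Table 4.2 listed above; a ratio below 1.0 means the NCL circuit consumes less dynamic energy per operation than its synchronous counterpart. The process labels for the second and third groups of rows are inferred from the 130 nm, 90 nm, 45 nm ordering used throughout this chapter.

```python
# (process, circuit, NCL energy in pJ, synchronous energy in pJ, listed ratio)
TABLE_4_2 = [
    ("130 nm", "Reg-32", 0.88, 2.92, 0.30),
    ("130 nm", "Reg-64", 1.78, 5.78, 0.31),
    ("90 nm", "Reg-32", 0.83, 0.61, 1.36),
    ("90 nm", "Reg-64", 1.65, 1.30, 1.27),
    ("45 nm", "Reg-32", 0.43, 0.34, 1.26),
    ("45 nm", "Reg-64", 0.87, 0.70, 1.24),
    ("45 nm", "Reg-128", 1.84, 1.35, 1.36),
]

def ncl_to_sync_ratio(ncl_pj, sync_pj):
    """Ratio of NCL to synchronous dynamic energy per operation."""
    return ncl_pj / sync_pj

for process, circuit, ncl_pj, sync_pj, listed in TABLE_4_2:
    ratio = ncl_to_sync_ratio(ncl_pj, sync_pj)
    verdict = "NCL lower" if ratio < 1.0 else "synchronous lower"
    print(f"{process} {circuit}: {ratio:.2f} ({verdict}), table lists {listed:.2f}")
```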
 
 
The chart in Figure 4.1 illustrates how the dynamic energy consumption trends between the NCL 
and synchronous variants of each registration circuit. Notably, the dynamic energy consumption 
for the 45 nm process is the only trend to increase as the scale of the circuit increases, signifying 
that as larger-scale registration circuits are implemented, the dynamic energy consumption of an 





Figure 4.1: NCL vs. synchronous registration circuit dynamic energy trends by process. 
 
 
The data in Table 4.2 and the trends in Figure 4.1 indicate that the clock buffer tree does not 
contribute a relatively increasing share of energy 
consumption in the synchronous registration circuits in any process. The relationship between 
synchronous clock tree and NCL acknowledge network scaling is further addressed in the results 
of the pipelined combinational circuits in Section 4.4. 
 
Figure 4.2 illustrates the dynamic energy consumption trends for each set of NCL registration 
circuits per process where the Reg-32 circuit serves as the baseline for each ratio (e.g., the NCL 
130 nm Reg-64 circuit has a ratio of 2.02 when compared to the dynamic energy consumption of 
the NCL 130 nm Reg-32 circuit). This process of normalizing data is utilized throughout the results 
chapter as a means of clarifying trends. 
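
The normalization amounts to dividing each circuit's energy by the energy of the baseline circuit in the same process. A minimal sketch using the 130 nm NCL values from Table 4.2 follows; the function name is hypothetical.

```python
def normalize_to_baseline(energies, baseline):
    """Divide every energy by the baseline circuit's energy."""
    base = energies[baseline]
    return {name: value / base for name, value in energies.items()}

# NCL dynamic energy per operation (pJ), 130 nm rows of Table 4.2.
ncl_130nm = {"Reg-32": 0.88, "Reg-64": 1.78}
normalized = normalize_to_baseline(ncl_130nm, baseline="Reg-32")
print({name: round(value, 2) for name, value in normalized.items()})
```

The Reg-64 entry evaluates to 1.78 / 0.88, which rounds to the 2.02 quoted in the example above.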
 
As the scale of the registration circuits is increased, the 90 nm and 130 nm processes trend nearly 



















128 circuits. Overall, the energy consumption trend of NCL registration circuits will increase 
superlinearly for all technology nodes as the scale of each circuit is increased. 
 
 
Figure 4.2: NCL registration circuit trends by process.  
 
 
4.1.2 Dynamic Energy – Scaled Vdd 
 
 
For each NCL and synchronous registration circuit in each process, Vdd was scaled to identify the 
minimum voltage where each circuit retained correct functionality. Table 4.3 below lists the results 
for each circuit, identifying the minimum Vdd which is shown in the NCL Vdd and Sync Vdd 
columns. As described in Section 3.7.1, the clock frequency of each synchronous circuit was not 
modified when scaling Vdd to determine the minimum functional-circuit voltage. The minimum 























Table 4.3: Dynamic energy per operation data for voltage-scaled registration circuits. 














Reg-32 0.6 0.7 105 588 0.32 1.26 
Reg-64 0.6 0.7 100 541 0.65 2.43 




Reg-32 0.7 0.7 333 1000 0.26 0.17 
Reg-64 0.7 0.7 286 909 0.51 0.43 




Reg-32 0.4 0.5 200 3,448 0.06 0.07 
Reg-64 0.4 0.5 167 3,333 0.12 0.14 
Reg-128 0.4 0.6 154 2,500 0.24 0.45 
 
 
For each process, the dynamic energy of the NCL registration circuits increases linearly as the size 
of the register sets is doubled. The trend for the synchronous dynamic energy consumption is similar 
across all processes, but it is not always consistent within each process due to variations in 
voltage scaling. The synchronous Reg-128 circuit in the 45 nm process did not function correctly at 
the minimum Vdd of the Reg-32 and Reg-64 variants. Its higher minimum Vdd raised its energy 
consumption to roughly 3.2 times that of the Reg-64 circuit (0.14 pJ to 0.45 pJ), as opposed to the 
doubling in energy consumption between the Reg-32 and Reg-64 circuits. It should also be noted that 
for both the 130 nm and 45 nm processes, each NCL registration circuit achieved both a lower 
and more consistent minimum Vdd.  
 
The trends between the NCL and synchronous voltage-scaled registration circuits are shown below 
in Figure 4.3. The value for each circuit is the ratio between NCL and synchronous energy 





Figure 4.3: NCL and synchronous scaled-Vdd dynamic energy trends for registration circuits. 
 
 
For the 130 nm process, the dynamic energy consumption trends nearly the same for both the NCL 
and synchronous registration circuits, while the 90 nm and 45 nm processes trend in favor of the 
NCL registration circuit as the scale of the circuit is increased. Again, the energy trend between 
the 45 nm Reg-64 and Reg-128 is noticeably different due to the higher minimum Vdd required by 
the synchronous Reg-128 circuit (0.6 V). 
 
A dynamic energy comparison between the trends in Figure 4.3 and the previous chart (Figure 4.1) 
should also be examined. In Figure 4.1, each trend is either neutral or in favor of the synchronous 
nominal-Vdd circuit as the scale of the technology node decreases, but the voltage-scaled trends 


























4.1.3 Static Power 
 
 
Table 4.4 displays the static power for each registration circuit. The last column, Ratio, 
compares each NCL registration circuit’s static power to that of its synchronous 
counterpart. Additionally, each of the Area columns specifies the total transistor channel area per 
circuit. It is important to distinguish this measurement from other typical circuit area estimates, 
such as total gate area. For example, a gate layout often retains the same height regardless of the 
channel widths of the transistors inside the gate; therefore, the total transistor channel area can 
differ drastically while the gate area remains the same. 
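
To make the distinction concrete, the sketch below computes total transistor channel area as the sum of width times length over every transistor. The two inverters and their dimensions are hypothetical and are not taken from any library used in this work; they only illustrate that channel area can double between two gates that might occupy the same cell height.

```python
def total_channel_area(transistors):
    """Sum W * L over (width, length) pairs, in the square of the input unit."""
    return sum(width * length for width, length in transistors)

# Hypothetical inverters, (W, L) in micrometres: NMOS first, then PMOS.
small_inverter = [(0.30, 0.13), (0.60, 0.13)]
large_inverter = [(0.60, 0.13), (1.20, 0.13)]

small_area = total_channel_area(small_inverter)
large_area = total_channel_area(large_inverter)
print(f"{small_area:.3f} um^2 vs {large_area:.3f} um^2")
```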
 
Table 4.4: Static power for registration circuits. 




NCL (µW) Sync. (µW) Ratio 
 
130 nm 
Reg-32 56.64 260.13 0.15 0.19 0.79 
Reg-64 112.05 517.33 0.29 0.38 0.76 




Reg-32 47.86 47.65 0.07 0.10 0.7 
Reg-64 93.67 96.41 0.12 0.33 0.36 




Reg-32 18.68 15.77 11.1 6.29 1.76 
Reg-64 37.73 31.54 22.9 14.3 1.60 
Reg-128 75.51 62.65 49.2 25.0 1.97 
 
 
Generally, the NCL registration circuits have a static power advantage, except in the 45 nm 
process. In addition to the dramatic increase of static power for all circuits in the 45 nm process 
[10], the NCL static power increases at a higher rate as the scale of the registration circuit increases. 
 
For the 90 nm process, the simulation results showed a strong advantage for the NCL registration 
circuits. Here the transistor type in the 90 nm process for synchronous circuits is important because 
low threshold voltage transistors inherently leak more than standard or high threshold voltage 
transistors. This advantageous trend for NCL is also seen in the 130 nm process. NCL registration 
circuits do not begin to evidence a disadvantage until implemented in smaller technology nodes, 
as seen in the 45 nm results. 
 
The static power difference is not as prominent for small-scale NCL and synchronous registration 
circuits, namely the Reg-32, between each of the processes. Overall, the static power consumption 
trends in favor of the synchronous registration circuits as the scale of the technology node 
decreases. 
 
4.2 Multiplier Design 
 
 
For the multiplier design, three multiplier circuits were simulated in each of the 130 nm, 90 nm, 
and 45 nm processes. The specifications of each design are located below in Table 4.5. 
 
Table 4.5: Multiplier design specifications. 






4.2.1 Dynamic Energy – Nominal Vdd 
 
 
Table 4.6 displays the dynamic energy per operation data for each NCL and synchronous multiplier 
circuit across the 130 nm, 90 nm, and 45 nm processes. Additionally, the last column, Ratio, is a 
ratio comparing the NCL circuit’s energy to the synchronous circuit’s energy for each multiplier 






Table 4.6: NCL and synchronous multiplier dynamic energy per operation at nominal Vdd. 




Mult-8 286 1.58 0.39 4.05 
Mult-16 154 6.57 3.03 2.17 




Mult-8 357 1.48 0.21 7.05 
Mult-16 179 6.19 1.71 3.62 




Mult-8 1,428 0.63 0.10 6.3 
Mult-16 714 2.83 0.44 6.43 
Mult-32 417 12.30 1.39 8.84 
 
 
For the multiplier simulations, it was clear that the dynamic energy consumption of the 
synchronous circuits was heavily dependent on the operation. In one case, for the Mult-8 circuit, 
the difference from one result to the next was a 200% increase in energy consumption. This result 
necessitated the averaging of synchronous circuit simulation results across data operations, as 
discussed in Section 3.5 of the methodology chapter.  
 
 






















Figure 4.4: NCL vs. synchronous multiplier dynamic energy trends by process. 
 
The 130 nm NCL multiplier data presents the largest improvement in the Mult-32 circuit when 
compared to the 90 nm data, which is shown in Fig. 4.4 above. This trend, however, is not seen 
between the 90 nm NCL data and the 45 nm NCL data; for this dataset, it is evident that the Mult-8 
design shows the most improvement. As the scale of combinational logic increases, in combination 
with a smaller semiconductor node implementation, the NCL implementation of the multiplier 
circuit will tend to perform worse in comparison to the synchronous implementation. 
 
 
Figure 4.5: NCL multiplier circuits dynamic energy trends by process. 
 
 
The 130 nm and 90 nm processes show the smallest relative increase in dynamic energy for NCL 
multiplier circuits as the size of the circuit increases. Figure 4.5 illustrates a comparison of the 
dynamic energy for the NCL multiplier circuits in each process, where the Mult-8 circuit’s energy 
serves as the reference point for the ratio in order to normalize the data. The trends show that the 
45 nm circuits incur the largest relative energy cost for scaling combinational logic. Additionally, it is illustrated that for 
















in smaller semiconductor technology nodes, particularly as the scale of the combinational logic 
increases.  
 
Overall, when the NCL multiplier circuit data is compared between each process, as well as to the 
corresponding synchronous data, improved energy consumption trends toward large-scale 
combinational circuits at larger technology nodes. For small-scale combinational circuits, the 
dynamic energy differences are much smaller between each process, and the 45 nm process 
performs comparatively well for NCL circuits. 
 
4.2.2 Dynamic Energy – Scaled Vdd 
 
 
The dynamic energy data for each voltage-scaled NCL and synchronous multiplier is displayed in 
Table 4.7 below. This dataset includes the frequency of each circuit, the Vdd at which each circuit 
was simulated, the resulting energy for each of the circuits, and the ratio of energy between each 
corresponding NCL and synchronous multiplier circuit (e.g., the energy ratio of the 130 nm NCL and 
synchronous Mult-8 circuits is 4.8).   
 
Table 4.7: NCL and synchronous multiplier Vdd-scaled dynamic energy per operation. 















Mult-8 0.6 0.6 63 286 0.36 0.075 4.8 
Mult-16 0.6 0.6 31 154 1.53 0.66 2.32 




Mult-8 0.7 0.6 105 357 0.46 0.050 9.2 
Mult-16 0.7 0.6 52 179 1.91 0.44 4.34 




Mult-8 0.4 0.6 71 1,428 0.09 0.021 4.29 
Mult-16 0.4 0.6 54 714 0.44 0.093 4.73 





The multiplier circuit dynamic energy trends for each of the processes are displayed in Figure 4.6 
below. The energy ratios are the ratios displayed in the Ratio column of Table 4.7. It is notable 
that the trend for each set of multiplier circuits changes for each of the processes. For the 130 nm 
process, it is clear that increasing the scale of the multiplier circuit continues to be an advantage 
for NCL. It should also be noted that this trend exists at a Vdd of 0.6 V for both NCL and 
synchronous circuits, which is atypical in itself due to the general lack of ability for synchronous 
circuits to scale Vdd as low as NCL circuits without clock frequency scaling.  
 
 
Figure 4.6: NCL vs. synchronous multiplier circuit voltage-scaled dynamic energy trends. 
 
 
When transitioning to the 90 nm process, the trend for NCL circuits is clearly advantageous as the 
scale of the multiplier is increased between the Mult-8 and Mult-16 circuits. Contrary to the 130 
nm process, however, a disadvantageous trend for NCL is shown between the Mult-16 and Mult-
32 circuits. Additionally, in the 90 nm process, the Vdd of the synchronous multipliers was not 





















 
The 45 nm process indicates a dominant trend in the opposite direction of both the 90 nm and 130 
nm process; in this case, the dynamic energy data in the voltage-scaled simulations indicate that 
as the scale of the multiplier is increased, the synchronous multiplier circuits perform increasingly 
well compared to their NCL counterparts.  
 
These findings indicate that the trends for the voltage-scaled multiplier circuits follow the same 
trends as the nominal-Vdd circuits; and that, additionally, the voltage-scaled multiplier results show 
a stronger trend for each of the processes. The 130 nm process shows a comparatively lower 
dynamic energy consumption offset from the 90 nm process. While the advantage to the 
synchronous multiplier circuits in energy consumption for the 45 nm process exists for both the 
voltage-scaled and nominal Vdd data, the absolute difference in the trend noticeably improves for 
the voltage-scaled Vdd results. 
 
4.2.3 Static Power 
 
 
Table 4.8 below displays the static power for each multiplier circuit across each process. The last 
column compares the NCL multiplier’s static power consumption to that of the synchronous 















Table 4.8: Static power data for multiplier circuits. 

Process  Circuit  Area NCL  Area Sync.  NCL (µW)  Sync. (µW)  Ratio
130 nm   Mult-8   246.13    87.31       0.16      0.06        2.7
130 nm   Mult-16  990.80    364.64      0.63      0.25        2.52
90 nm    Mult-8   197.75    62.31       0.27      0.08        3.4
90 nm    Mult-16  775.90    222.53      1.11      0.60        1.83
45 nm    Mult-8   67.50     19.39       5.03      4.76        1.06
45 nm    Mult-16  261.52    71.23       20.7      17.5        1.18
45 nm    Mult-32  1098.34   262.50      79.8      63.6        1.25
 
 
For the 90 nm process, the effects of the low threshold voltage transistors are shown in the 
comparison between the NCL and synchronous circuits.  The synchronous Mult-8 circuit performs 
worse than the NCL Mult-8 circuit, relative to the Mult-8 circuit comparison in the 130 nm process, 
despite the implementation of the synchronous Mult-8 low threshold voltage transistor in the 90 
nm process. Additionally, it appears that the 130 nm process would retain the static power trend 
advantage for NCL for larger-scale multiplier circuits. 
 
The results in Table 4.8 trend such that NCL combinational logic static power consumption 
improves relative to the scale of the circuit for the 130 nm process. The 90 nm results illustrate the 
static power advantage for NCL as the multiplier scales, which is largely due to the low threshold 
voltage transistor implementation for the synchronous 90 nm multiplier circuits. While the trend 
for the 45 nm process is favorable to larger-scale synchronous designs, the difference is marginal. 







Figure 4.7: NCL vs. synchronous multiplier static power trend comparison. 
 
 
When considering the nominal Vdd dynamic energy trends for the multiplier and registration 
circuits, it is indicated that NCL’s most prominent dynamic energy trend exists for larger 
semiconductor nodes where the design contains a high ratio of registers to combinational logic.  
 
When analyzing the effects of voltage scaling on the registration circuit dynamic energy 
consumption, the smaller semiconductor nodes show improved consumption. While NCL circuits 
in the 45 nm process do not perform as well as the 130 nm process, NCL registration circuits 
consume less energy than their synchronous counterparts for each process. The voltage-scaled 
multiplier circuits display a similar trend for small-scale circuits. As the scale of the combinational 
logic increases, however, it is clear that trends favor the larger technology nodes, including the 
130 nm and 90 nm processes. 
 
For static power consumption of the multiplier circuit, the 45 nm process results in the most 



















 
registration circuits, and the fact that the 45 nm registration circuit results are fairly even for the 
NCL and synchronous implementations, it is reasonable to consider 45 nm, and smaller technology 
node processes, as the favorable process for implementing NCL designs specifically for low duty 
cycle applications. 
 
When considering the dynamic energy consumption of low duty cycle circuits, which would be 
heavily influenced by scale of the combinational logic (discussed more per circuit type later in this 
chapter), the 130 nm process is most favorable for small-scale combinational logic, but the 90 nm 
process shows a more favorable trend as the scale of the combinational logic increases. Deciding 
which of the above factors is most important would depend on the circuit designer’s application.  
 
4.3 Finite State Machine Design 
 
 
Six circuits were implemented for the finite state machine (FSM) design in order to characterize 
the effects of scaling both the state memory and the input combinational logic. The specifications 
for each circuit are listed below in Table 4.9. 
 
Table 4.9: FSM design specifications. 
Circuit Name Number of States Number of Input Bits 
FSM 8-1 8 1 
FSM 16-1 16 1 
FSM 32-1 32 1 
FSM 8-3 8 3 
FSM 16-3 16 3 
FSM 32-3 32 3 
 
 
4.3.1 Dynamic Energy – Nominal Vdd 
 
 
The dynamic energy results for each FSM circuit are displayed in Table 4.10, including the circuit 
frequency, combinational logic energy per operation, registration energy per operation, total 
energy per operation, and NCL-to-synchronous energy ratio. The Frequency listed is based on the 
DATA-to-DATA period of the NCL circuit, and the Ratio specifies the ratio of total NCL energy 
to total synchronous energy. Each synchronous circuit was synthesized according to the 
corresponding NCL circuit’s DATA-to-DATA period.  
 
Table 4.10: NCL and synchronous FSM dynamic energy data per operation at nominal Vdd. 

Process  Circuit   Freq.  Comb. NCL  Comb. Sync.  Reg. NCL  Reg. Sync.  Total NCL  Total Sync.  Ratio
130 nm   FSM 8-1   250    0.557      0.100        0.104     0.127       0.662      0.227        2.92
130 nm   FSM 16-1  200    1.342      0.125        0.140     0.201       1.482      0.326        4.54
130 nm   FSM 32-1  167    3.021      0.230        0.182     0.138       3.203      0.368        8.70
130 nm   FSM 8-3   200    1.429      0.249        0.130     0.109       1.559      0.357        4.37
130 nm   FSM 16-3  200    2.677      0.399        0.176     0.107       2.853      0.506        5.64
90 nm    FSM 8-1   400    0.65       0.02         0.12      0.04        0.77       0.06         12.83
90 nm    FSM 16-1  333    1.61       0.04         0.16      0.04        1.77       0.08         22.13
90 nm    FSM 32-1  267    3.55       0.07         0.20      0.05        3.75       0.12         31.25
90 nm    FSM 8-3   400    1.65       0.11         0.17      0.04        1.82       0.15         12.13
90 nm    FSM 16-3  313    3.17       0.13         0.22      0.04        3.39       0.17         19.94
45 nm    FSM 8-1   1,667  0.31       0.016        0.06      0.017       0.37       0.033        11.20
45 nm    FSM 16-1  1,250  0.76       0.027        0.08      0.028       0.84       0.055        15.27
45 nm    FSM 32-1  1,000  1.76       0.032        0.12      0.032       1.88       0.064        29.38
45 nm    FSM 8-3   1,333  0.74       0.039        0.07      0.015       0.82       0.054        15.19
45 nm    FSM 16-3  1,250  1.50       0.082        0.11      0.022       1.61       0.104        15.48
45 nm    FSM 32-3  1,000  3.24       0.201        0.12      0.025       3.36       0.226        14.87
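
The relationship between the columns of Table 4.10 can be sketched in a few lines of Python: the total energy is the sum of the combinational and registration energy, and the Ratio is the NCL total divided by the synchronous total. The snippet below checks this for four rows copied from the table; the `FsmRow` type, the `check` helper, and the 0.01 rounding tolerance are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class FsmRow:
    name: str
    ncl_comb: float
    sync_comb: float
    ncl_reg: float
    sync_reg: float
    ncl_total: float
    sync_total: float
    ratio: float


# A sample of rows copied from Table 4.10.
ROWS = [
    FsmRow("130 nm FSM 8-1", 0.557, 0.100, 0.104, 0.127, 0.662, 0.227, 2.92),
    FsmRow("130 nm FSM 32-1", 3.021, 0.230, 0.182, 0.138, 3.203, 0.368, 8.70),
    FsmRow("90 nm FSM 16-1", 1.61, 0.04, 0.16, 0.04, 1.77, 0.08, 22.13),
    FsmRow("45 nm FSM 32-3", 3.24, 0.201, 0.12, 0.025, 3.36, 0.226, 14.87),
]


def check(row, tol=0.01):
    """True if totals equal comb. + reg. and the ratio equals NCL total / sync. total."""
    totals_ok = (
        abs(row.ncl_comb + row.ncl_reg - row.ncl_total) <= tol
        and abs(row.sync_comb + row.sync_reg - row.sync_total) <= tol
    )
    ratio_ok = abs(row.ncl_total / row.sync_total - row.ratio) <= tol
    return totals_ok and ratio_ok


for row in ROWS:
    print(f"{row.name}: consistent = {check(row)}")
```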
 
 
Figure 4.8 below illustrates the effect of scaling the input logic for each NCL FSM circuit (FSM 
8, FSM 16, and FSM 32). The value of each of the circuits is the ratio between the larger input bit-
width variation and the smaller input bit-width variation (e.g., FSM 8-3 and FSM 8-1). The scaling 
of the input logic is a result of increasing the number of input bits for comparison. As 
demonstrated, the scaling of input logic for these sequential circuits has nearly the same slight 
NCL advantage for each of the semiconductor processes; the most significant improvement occurs 
when transitioning from the FSM 8 circuits to the FSM 16 circuits across all processes. An NCL 
circuit designer may expect no particular disadvantage for selecting any of the processes, outside 
of the slight advantage in the 130 nm process.  
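
The input logic scaling ratio described above can be sketched as follows, assuming (this is not stated explicitly in the text) that the plotted value is the total NCL energy per operation of the 3-bit variant divided by that of the 1-bit variant. The energies are the NCL totals from Table 4.10; `NCL_TOTAL` and `input_scaling_ratios` are hypothetical names.

```python
# Total NCL energy per operation from Table 4.10, keyed by process and then by
# (number of states, number of input bits). Only rows listed in the table are used.
NCL_TOTAL = {
    "130 nm": {(8, 1): 0.662, (16, 1): 1.482, (32, 1): 3.203,
               (8, 3): 1.559, (16, 3): 2.853},
    "90 nm": {(8, 1): 0.77, (16, 1): 1.77, (32, 1): 3.75,
              (8, 3): 1.82, (16, 3): 3.39},
    "45 nm": {(8, 1): 0.37, (16, 1): 0.84, (32, 1): 1.88,
              (8, 3): 0.82, (16, 3): 1.61, (32, 3): 3.36},
}


def input_scaling_ratios(process):
    """Ratio of 3-bit-input to 1-bit-input NCL energy for each state count."""
    energies = NCL_TOTAL[process]
    return {
        states: energies[(states, 3)] / energies[(states, 1)]
        for (states, bits) in sorted(energies)
        if bits == 3
    }


for process in NCL_TOTAL:
    formatted = {s: round(r, 2) for s, r in input_scaling_ratios(process).items()}
    print(process, formatted)
```

With these inputs, the ratio drops from FSM 8 to FSM 16 in every process, which is consistent with the improvement noted above.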
 
 
Figure 4.8: NCL FSM input logic scaling dynamic energy trends at nominal Vdd. 
 
 
In the 130 nm process, the synchronous FSM 16-1 registration energy exceeds that of the 
synchronous FSM 32-1 registration circuitry. This is due to a synthesis issue for the 130 nm 
synchronous FSM 16-1 circuit. While the circuit functions correctly, the synthesis tool 
over-buffered it, resulting in increased dynamic energy consumption, as shown in Table 4.10. It was 
verified that the additional buffering was not required, but neither NCL nor synchronous circuits 
were modified post-synthesis for buffering or any other type of optimization. This factor should be 
considered when evaluating NCL and synchronous trends in the 130 nm process and when 
evaluating trends in Sections 4.3.2 and 4.3.3. 
 
Figure 4.9 displays the FSM dynamic energy trends by charting the NCL-to-synchronous energy 
















 
each process is represented by two separate lines: one line represents the progression in dynamic 
energy of the FSM circuits as the state memory is scaled (e.g., “130 nm – 1” represents the data 
for the 130 nm FSM 8-1, FSM 16-1, and 32-1 circuits), and one line represents the dynamic energy 
data for the FSM circuits as the input logic is scaled (e.g., “130 nm – 3” represents the data for the 
130 nm FSM 8-3, FSM 16-3, and FSM 32-3 circuits). 
 
 
Figure 4.9: NCL vs. synchronous FSM dynamic energy trends at nominal Vdd. 
 
 
The most prominent trend is that the NCL-to-synchronous ratio for each process improves as the 
input logic is scaled. This trend is best illustrated by the 45 nm process, which shows a ratio 
reduction of 49% when comparing the FSM 32-3 and FSM 32-1 circuits. Though less apparent in 
the chart, the 130 nm FSM 32-3 and FSM 32-1 circuits experience a similar reduction. The 90 nm 




















 
Relative to the synchronous FSM circuits, the NCL FSM energy consumption is best for circuits 
implemented in the 130 nm process. Despite the increase in energy from the FSM 32-1 circuit to 
FSM 32-3 circuit, the energy ratio does not approach the minimum energy ratios for other 
processes at smaller technology nodes. It is clear that NCL FSMs consume the least energy in 
larger technology nodes and will trend better as the scale of the FSM circuit is increased, whether 
in scaling state memory or input logic. Overall, however, the required feedback structure of an 
NCL FSM circuit creates a large dynamic energy disadvantage over a synchronous FSM, even 
when implementing the NCL FSM in larger semiconductor processes. 
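
The 49% ratio reduction quoted earlier in this section for the 45 nm FSM 32 circuits can be reproduced from the Ratio column of Table 4.10 (29.38 for FSM 32-1 and 14.87 for FSM 32-3). The following is a minimal sketch; the function name `ratio_reduction` is an assumption used only for illustration.

```python
def ratio_reduction(ratio_small_input, ratio_large_input):
    """Fractional reduction of the NCL-to-synchronous ratio when input logic is scaled."""
    return (ratio_small_input - ratio_large_input) / ratio_small_input


# 45 nm FSM 32-1 and FSM 32-3 ratios from Table 4.10.
reduction_45nm = ratio_reduction(29.38, 14.87)
print(f"45 nm FSM 32 ratio reduction: {reduction_45nm:.0%}")
```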
 
4.3.2 Dynamic Energy – Scaled Vdd 
 
 
The dynamic energy per operation results for NCL and synchronous FSM circuits are shown below 
in Table 4.11. Each data point is the result of simulating the circuit at the minimum Vdd where the 
circuit retains correct functionality. These minimum voltages are displayed for both NCL and 
synchronous circuits, as are the circuit frequency, combinational logic energy consumption, and 











Table 4.11: NCL and synchronous FSM voltage-scaled dynamic energy per operation (1/2). 

Process  Circuit   Vdd NCL (V)  Vdd Sync. (V)  Freq. NCL  Freq. Sync.  Comb. NCL (pJ)  Comb. Sync. (pJ)  Reg. NCL (pJ)  Reg. Sync. (pJ)
130 nm   FSM 8-1   0.6          0.7            41         250          0.15            0.040             0.027          0.050
130 nm   FSM 16-1  0.7          0.6            57         200          0.42            0.033             0.050          0.053
130 nm   FSM 32-1  0.7          0.6            50         167          0.94            0.059             0.058          0.050
130 nm   FSM 8-3   0.6          0.7            36         200          0.36            0.099             0.036          0.045
130 nm   FSM 16-3  0.6          0.7            32         200          0.70            0.150             0.047          0.049
90 nm    FSM 8-1   0.7          0.6            91         400          0.21            0.0064            0.038          0.0078
90 nm    FSM 16-1  0.7          0.6            80         333          0.50            0.0170            0.048          0.0110
90 nm    FSM 32-1  0.7          0.6            63         267          1.10            0.0160            0.069          0.0170
90 nm    FSM 8-3   0.7          0.7            83         400          0.50            0.0330            0.048          0.0150
90 nm    FSM 16-3  0.7          0.8            71         313          1.00            0.0820            0.067          0.0260
45 nm    FSM 8-1   0.4          0.6            69         1,667        0.044           0.0058            0.009          0.0085
45 nm    FSM 16-1  0.4          0.6            59         1,250        0.11            0.0083            0.012          0.0091
45 nm    FSM 32-1  0.4          0.5            50         1,000        0.24            0.0067            0.017          0.0070
45 nm    FSM 8-3   0.4          0.6            65         1,333        0.11            0.015             0.011          0.0060
45 nm    FSM 16-3  0.4          0.6            56         1,250        0.21            0.036             0.015          0.0100
45 nm    FSM 32-3  0.4          0.7            47         1,000        0.44            0.089             0.019          0.0120
 
 
Table 4.12 is a continuation of Table 4.11, adding the total dynamic energy consumption and the 












Table 4.12: NCL and synchronous FSM voltage-scaled dynamic energy per operation (2/2). 

Process  Circuit   Vdd NCL (V)  Vdd Sync. (V)  Freq. NCL  Freq. Sync.  Total NCL (pJ)  Total Sync. (pJ)  Ratio
130 nm   FSM 8-1   0.6          0.7            41         250          0.18            0.090             2
130 nm   FSM 16-1  0.7          0.6            57         200          0.47            0.085             5.53
130 nm   FSM 32-1  0.7          0.6            50         167          0.99            0.110             9
130 nm   FSM 8-3   0.6          0.7            36         200          0.40            0.140             2.86
130 nm   FSM 16-3  0.6          0.7            32         200          0.75            0.200             3.75
90 nm    FSM 8-1   0.7          0.6            91         400          0.24            0.014             17.14
90 nm    FSM 16-1  0.7          0.6            80         333          0.55            0.028             19.64
90 nm    FSM 32-1  0.7          0.6            63         267          1.17            0.033             35.45
90 nm    FSM 8-3   0.7          0.7            83         400          0.55            0.048             11.46
90 nm    FSM 16-3  0.7          0.8            71         313          1.07            0.110             9.73
45 nm    FSM 8-1   0.4          0.6            69         1,667        0.009           0.014             0.64
45 nm    FSM 16-1  0.4          0.6            59         1,250        0.012           0.017             0.71
45 nm    FSM 32-1  0.4          0.5            50         1,000        0.017           0.014             1.21
45 nm    FSM 8-3   0.4          0.6            65         1,333        0.011           0.021             0.52
45 nm    FSM 16-3  0.4          0.6            56         1,250        0.015           0.046             0.33
45 nm    FSM 32-3  0.4          0.7            47         1,000        0.019           0.100             0.19
 
 
One complication in comparing NCL and synchronous circuits in this section is the inconsistency 
in the minimum Vdd for synchronous circuits. Although each synchronous circuit meets its timing 
requirements, the margin by which it exceeds them is unique to each circuit. As a result, the 
minimum scaled Vdd at which each synchronous circuit remains functional varies from circuit to 
circuit. 
 
Among the six sets of synchronous FSMs (two sets of three per process), the first set of FSMs 
from the 90 nm process is the only set to have a consistent minimum Vdd, which is 0.6 V. This 
adds noise to the comparisons and resulting trends. One instance of this is displayed in the total 
energy consumption comparison between the synchronous FSM 16-3 and FSM 32-3 circuits in the 
90 nm process. Each has a total energy consumption of 0.11 pJ, but the minimum Vdd of the FSM 
32-3 circuit is 0.2 V lower than the FSM 16-3 circuit. The variation in minimum Vdd was confirmed 
to illustrate that the margin in timing requirements of synchronous circuits is unique to each circuit. 
The result is that each synchronous circuit inherently has a unique minimum Vdd before requiring 
other inputs, such as clock frequency, to be modified. 
 
Despite the complexity of inconsistent minimum Vdd, it is shown in Figure 4.10 that the dynamic 
energy trends from the nominal Vdd data are amplified here as Vdd is scaled. For the FSM 32 circuits 
in the 130 nm process, the energy ratio between the NCL and synchronous FSM 32-3 circuits 
decreases by 73% when compared to the FSM 32-1. For the same trend in the nominal-Vdd 130 
nm FSM circuits, the transition results in a 47% decrease. This voltage-scaled dynamic energy 
trend illustrates a stronger NCL advantage when increasing the scale of the input logic relative to 
the scale of the state memory. 
 
 





















[Figure 4.10: chart titled "NCL vs. Synchronous FSM Voltage-Scaled Dynamic Energy Trends", with series 130 nm - 1, 130 nm - 3, 90 nm - 1, 90 nm - 3, 45 nm - 1, and 45 nm - 3.]
 
4.3.3 Static Power 
 
 
The static power results for each NCL and synchronous FSM circuit in the 130 nm, 90 nm, and 45 
nm processes are located in Table 4.13. The Ratio column specifies the static power ratio between 
each NCL and synchronous FSM circuit, and the Area columns specify the total transistor channel 
area within each circuit. 
 
Table 4.13: NCL and synchronous FSM static power data. 

Process  Circuit   Area NCL  Area Sync.  Static Power NCL  Static Power Sync.  Ratio
130 nm   FSM 8-1   53.60     25.39       0.05              0.020               2.5
130 nm   FSM 16-1  122.06    44.68       0.10              0.035               2.86
130 nm   FSM 32-1  271.90    66.66       0.19              0.049               3.88
130 nm   FSM 8-3   12.84     53.72       0.10              0.037               2.70
130 nm   FSM 16-3  236.19    95.83       0.19              0.073               2.60
90 nm    FSM 8-1   49.32     7.25        0.07              0.029               2.41
90 nm    FSM 16-1  113.12    12.09       0.17              0.030               5.67
90 nm    FSM 32-1  247.57    20.15       0.34              0.052               6.54
90 nm    FSM 8-3   111.65    15.92       0.15              0.043               3.49
90 nm    FSM 16-3  221.48    32.75       0.31              0.073               4.25
45 nm    FSM 8-1   15.66     2.26        3.96              0.72                5.5
45 nm    FSM 16-1  35.62     3.91        8.27              1.17                7.07
45 nm    FSM 32-1  78.10     6.21        17.00             1.78                9.55
45 nm    FSM 8-3   34.73     5.29        6.19              1.35                4.59
45 nm    FSM 16-3  67.75     11.37       12.60             2.79                4.52
45 nm    FSM 32-3  137.77    20.41       25.70             5.03                5.11
 
 
It is clear that as the scale of the state memory and input logic is increased, the static power 
consumption trend improves for NCL, relative to each synchronous counterpart. In the case of the 
45 nm process, the energy ratio is not only lower for each state memory size, but the trend is far 




These trends are illustrated in Figure 4.11 by plotting the ratio of each circuit’s NCL and 
synchronous static power. In the chart, one line per semiconductor process represents the static 
power consumption of the FSM circuits as the state memory is scaled (e.g., 130 nm – 1 represents 
FSM 8-1, FSM 16-1, and FSM 32-1), and one line per semiconductor process represents the static 
power consumption of the FSM circuits as the input logic is scaled. The lines are denoted for each 
process in the legend with a suffix of ‘1’ if they represent the state memory scaled FSMs (FSM 8-
1 through FSM 32-1), and a suffix of ‘3’ if they represent the input logic scaled FSM circuits (FSM 
8-3 through FSM 32-3). 
 
 
Figure 4.11: NCL vs. synchronous FSM circuit static power trends.  
 
 
The FSM circuits present an increased amount of data due to the fact that both the scale of the 
input logic and the scale of the state memory are increased separately. Figure 4.12 below exhibits 
the effects on scaling the size of the input logic for each of the three state memory sizes of the 





















 
the ratios in the Ratio column of Table 4.13 between each FSM circuit and its input logic scaled 
counterpart in each of the three processes. For example, the ratio in Figure 4.12 for FSM 8 in the 130 
nm process is the ratio between the 130 nm FSM 8-1 and FSM 8-3 NCL-to-synchronous ratios (a 
ratio of 1.08 in this case) in the last column of Table 4.13.  
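
The ratio-of-ratios calculation described above can be sketched in Python using the Ratio column of Table 4.13. The dictionary `STATIC_RATIO` and the function `input_scaling_effect` are illustrative names rather than part of the original analysis; a result below 1 indicates that scaling the input logic lowers the NCL-to-synchronous static power ratio.

```python
# NCL-to-synchronous static power ratios from the Ratio column of Table 4.13.
STATIC_RATIO = {
    "130 nm": {"8-1": 2.5, "16-1": 2.86, "32-1": 3.88, "8-3": 2.70, "16-3": 2.60},
    "90 nm": {"8-1": 2.41, "16-1": 5.67, "32-1": 6.54, "8-3": 3.49, "16-3": 4.25},
    "45 nm": {"8-1": 5.5, "16-1": 7.07, "32-1": 9.55,
              "8-3": 4.59, "16-3": 4.52, "32-3": 5.11},
}


def input_scaling_effect(process, states):
    """Ratio of the FSM n-3 static power ratio to the FSM n-1 static power ratio."""
    ratios = STATIC_RATIO[process]
    return ratios[f"{states}-3"] / ratios[f"{states}-1"]


print(round(input_scaling_effect("130 nm", 8), 2))
print(round(input_scaling_effect("45 nm", 32), 2))
```

The first call reproduces the 1.08 value quoted for the 130 nm FSM 8 circuits.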
 
 
Figure 4.12: NCL vs. synchronous FSM circuit static power trends when scaling input logic. 
 
 
By charting these static power ratios in Figure 4.12 above, the effect of scaling the input logic for 
each process is clearly shown. The trend shows that the static power consumption of NCL 
continues to improve as the scale of the input logic increases and as the scale of the technology 
node is decreased. This effect is illustrated by the 45 nm process where the FSM 32-3 NCL-to-
synchronous static power ratio is 0.54× the size of the FSM 32-1 NCL-to-synchronous static power 
ratio. As the scale of the state memory is increased, the NCL FSMs continue to perform worse 
























 
Unlike the trends for the multiplier design, increasing the scale of the combinational logic 
improves the relative performance of NCL to synchronous FSM circuits. The 45 nm process 
yielded the worst relative dynamic energy consumption for NCL multipliers as the scale of the 
multipliers was increased, but increasing the scale of the combinational logic for the 45 nm NCL 
FSM 32-1 displayed the most improvement for any NCL FSM circuit. Again, this is due to the 
structure of NCL FSMs requiring two additional sets of registers to allow correct propagation of 
DATA and NULL waves.  
 
4.4 Pipeline Combinational Design 
 
 
In Section 4.4, the results of the NCL and synchronous pipeline circuits in the 130 nm, 90 nm, and 
45 nm processes are analyzed and discussed. Table 4.14 below lists each pipelined combinational 
circuit simulated and the corresponding number of stages for each circuit. 
 
Table 4.14: Pipelined combinational design specifications. 






4.4.1 Dynamic Energy – Nominal Vdd 
 
 
In Table 4.15, the dynamic energy simulation results of each pipeline circuit are listed in terms of 
circuit frequency, dynamic energy consumption per operation, and energy ratio. The dynamic 
energy consumption is listed by total consumption, combinational logic consumption, and 
registration consumption. The energy ratio listed in the last column is a comparison of total NCL 




Table 4.15: NCL and sync. pipelined comb. circuit energy per operation data at nominal Vdd. 

Process  Circuit  Freq.  Comb. NCL  Comb. Sync.  Reg. NCL  Reg. Sync.  Total NCL  Total Sync.  Ratio
130 nm   Pipe-8   100    9.94       3.45         3.74      7.62        13.7       11.07        1.24
130 nm   Pipe-16  100    13.10      6.13         7.64      17.6        20.70      23.73        0.87
90 nm    Pipe-8   333    10.00      0.98         2.95      1.73        12.95      2.71         4.78
90 nm    Pipe-16  333    14.2       1.87         5.87      3.80        20.7       5.67         3.65
45 nm    Pipe-8   500    4.25       0.45         2.26      1.01        6.51       1.46         4.46
45 nm    Pipe-16  500    5.53       0.62         4.26      1.96        9.78       2.58         3.79
45 nm    Pipe-32  500    9.60       0.85         8.41      3.73        18.01      4.58         3.93
 
 
Figure 4.13 below illustrates the dynamic energy trends between each NCL and synchronous 
circuit by plotting the energy ratios shown in Table 4.15. These energy ratios clarify the effects on 
overall energy consumption as the number of stages increases. The most significant aspect of these 
trends is the absolute difference between the 130 nm process trend and the 90 nm / 45 nm process 
trend. For each pipeline circuit, the energy consumption ratio of the 130 nm circuits is around 3 
times smaller. Additionally, there is a relative decrease in the energy consumption trend between 
the 8-stage and 16-stage pipeline circuits for each process. Between the 16-stage and 32-stage 
pipeline circuits in each process, the effect of scaling the number of stages on the registration 





Figure 4.13: NCL vs. sync. pipelined comb. circuit dynamic energy trends at nominal Vdd. 
 
 
From the data in Table 4.15, it is observed that NCL has an absolute advantage over synchronous 
circuits for registration dynamic energy consumption in the 130 nm process. This trend persists 
for each of the three pipeline circuits in the 130 nm process, as illustrated in Figure 4.14 below. 
The trend also indicates that this advantage continues to exist as the number of stages is increased. 
However, as the technology node scale is decreased, NCL does not have an absolute advantage 
over synchronous circuits, but the trend is in favor of NCL as the scale of the pipeline circuit 
increases. This data distinguishes the effect of increasing the number of stages in a pipeline design 
on the registration dynamic energy consumption compared to the registration design where the 






















Figure 4.14: NCL vs. sync. pipelined comb. circuit registration dynamic energy trends. 
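
As a rough illustration of the registration trend discussed above, the snippet below computes the NCL-to-synchronous registration energy ratio for each pipeline circuit from the registration columns of Table 4.15. It assumes that the comparison of interest is this simple ratio; `REG_ENERGY` and `registration_ratios` are hypothetical names.

```python
# (NCL registration energy, synchronous registration energy) from Table 4.15.
REG_ENERGY = {
    "130 nm": {"Pipe-8": (3.74, 7.62), "Pipe-16": (7.64, 17.6)},
    "90 nm": {"Pipe-8": (2.95, 1.73), "Pipe-16": (5.87, 3.80)},
    "45 nm": {"Pipe-8": (2.26, 1.01), "Pipe-16": (4.26, 1.96), "Pipe-32": (8.41, 3.73)},
}


def registration_ratios(process):
    """NCL-to-synchronous registration energy ratio per circuit; below 1 favors NCL."""
    return {circuit: ncl / sync for circuit, (ncl, sync) in REG_ENERGY[process].items()}


for process in REG_ENERGY:
    formatted = {c: round(r, 2) for c, r in registration_ratios(process).items()}
    print(process, formatted)
```

For the rows used here, every 130 nm ratio is below 1, while the 90 nm and 45 nm ratios are above 1.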
 
 
4.4.2 Dynamic Energy – Scaled Vdd 
 
 
Table 4.16 details the Vdd, operating frequency, combinational logic dynamic energy per operation, 
and registration dynamic energy per operation for each voltage-scaled pipeline circuit across each 
process.  
 
Table 4.16: NCL and sync. pipelined comb. circuit energy per operation at scaled Vdd (1/2). 

Process  Circuit  Vdd NCL (V)  Vdd Sync. (V)  Freq. NCL  Freq. Sync.  Comb. NCL  Comb. Sync.  Reg. NCL  Reg. Sync.
130 nm   Pipe-8   0.6          0.6            20         100          1.73       0.75         0.74      1.67
130 nm   Pipe-16  0.6          0.6            14         100          2.81       1.37         1.68      3.84
90 nm    Pipe-8   0.7          0.6            32         333          2.99       0.23         0.88      0.40
90 nm    Pipe-16  0.7          0.7            32         333          4.34       0.59         1.87      1.23
45 nm    Pipe-8   0.4          0.5            24         500          0.60       0.10         0.32      0.22
45 nm    Pipe-16  0.4          0.5            24         500          0.79       0.13         0.66      0.43


















 
Table 4.17 below is a continuation of Table 4.16 displaying the total energy consumption for each 
pipeline circuit in each of the three processes. Additionally, the Ratio column is added to clarify 
the NCL dynamic energy consumption comparison to the synchronous counterpart for each 
pipeline circuit. 
 
Table 4.17: NCL and sync. pipelined comb. circuit energy per operation at scaled Vdd (2/2). 

Process  Circuit  Vdd NCL (V)  Vdd Sync. (V)  Freq. NCL  Freq. Sync.  Total NCL  Total Sync.  Ratio
130 nm   Pipe-8   0.6          0.6            20         100          2.47       2.42         1.02
130 nm   Pipe-16  0.6          0.6            14         100          4.49       5.20         0.86
90 nm    Pipe-8   0.7          0.6            32         333          3.87       0.63         6.14
90 nm    Pipe-16  0.7          0.7            32         333          6.21       1.82         3.41
45 nm    Pipe-8   0.4          0.5            24         500          0.92       0.31         2.97
45 nm    Pipe-16  0.4          0.5            24         500          1.45       0.56         2.59
45 nm    Pipe-32  0.4          0.5            24         500          2.70       0.99         2.73
 
 
The results show that the NCL circuits are more consistent with regard to voltage-scaling 
limits. In the 90 nm process, there was a 0.4 V difference in the synchronous capabilities between 
the Pipe-8 and Pipe-32 variations of the pipeline design. This characteristic is not unique to the 
pipeline circuit, but is a result of the synthesized timing requirements unique to each synchronous 
circuit. The critical path of each stage in a synchronous circuit will vary in timing margins and 





Figure 4.15: NCL vs. synchronous voltage-scaled pipelined comb. circuit energy trends. 
 
 
In Figure 4.15, the dynamic energy consumption trend for the 90 nm process shifts greatly from 
the 90 nm dynamic energy trend at nominal Vdd. This difference is due to the ability of the NCL 
circuits within the 90 nm process to scale the Vdd more consistently than the corresponding 
synchronous circuits. The 45 nm process trends slightly worse for NCL as the Vdd is scaled, and 
the 130 nm process trends similarly for nominal and scaled Vdd dynamic energy consumption. 
 
4.4.3 Static Power 
 
 
The results listed in Table 4.18 display the static power data for each pipeline circuit in each of the 
three semiconductor processes. The Area columns specify the total transistor channel area per 
circuit, and the last column, Ratio, is the ratio of NCL static power to the corresponding 





















 
Table 4.18: Static power data for pipelined combinational circuits. 

Process  Circuit  Area NCL  Area Sync.  NCL (µW)  Sync. (µW)  Ratio
130 nm   Pipe-8   900.91    1073.97     0.66      0.76        0.87
130 nm   Pipe-16  1389.25   2175.08     1.13      1.57        0.72
90 nm    Pipe-8   742.64    249.54      0.88      0.61        1.44
90 nm    Pipe-16  1166.22   498.03      1.21      1.25        0.97
45 nm    Pipe-8   257.34    75.97       47.6      25.6        1.86
45 nm    Pipe-16  400.41    153.04      87.8      52.7        1.67
45 nm    Pipe-32  742.08    319.20      148.0     109.0       1.36
 
 
Overall, the pipeline design trends favorably for NCL for each process as the scale of the circuit 
is increased. Additionally, for the 130 nm process, the NCL consumes less static power for each 
variant of the pipeline circuit. However, for both NCL and synchronous designs, it is clear that the 
total static power consumption increases heavily as the scale of the technology node decreases. 
For example, the Pipe-32 circuit’s static power consumption in the 45 nm process is 67 times 
larger than in the 130 nm process for NCL and 33 times larger for the synchronous 
counterpart. 
 
Figure 4.16 displays the energy ratio for each process between each NCL Pipe-8 circuit and the 
corresponding NCL Pipe-16 and Pipe-32 circuits. While the 90 nm process shows an advantage, 






Figure 4.16: NCL pipelined combinational circuit static power trends across each process. 
 
 
Figure 4.17 illustrates the NCL-to-synchronous static power ratios between each pipeline circuit 
in Table 4.18. When scaling between the Pipe-16 and Pipe-32 circuits, the 45 nm process shows 
the largest trend advantage for NCL, despite both the 130 nm and 90 nm processes showing 
absolute static power consumption advantages for each circuit.  
 
 


































[Figure 4.17: chart titled "NCL vs. Synchronous Pipelined Combinational Circuits Static Power Trends", with one series each for the 130 nm, 90 nm, and 45 nm processes.]
 
After analyzing the results of the pipeline design and characterizing the sources of advantages and 
disadvantages for NCL, the data can be compared to the results of the registration and multiplier 
results in Sections 4.1 and 4.2. By comparing these results, differences between scaling the size of 
a non-pipelined circuit and scaling the number of stages of a pipeline circuit are more clearly 
observed. While this is not a perfect comparison, it may serve as a guide for what to expect when 
considering each type of design. 
 
One difference is shown in the dynamic energy consumption trends at nominal Vdd. It is observed 
that the pipeline design favors NCL more heavily, especially for the 130 nm process. The 130 nm 
NCL registration design dynamic energy trends at 0.3 times the synchronous counterpart while the 
NCL multiplier design dynamic energy trends from 4 times to 2 times the dynamic energy for the 
synchronous counterpart. The 130 nm NCL pipeline design, however, trends from 1.24 times to 
0.85 times the dynamic energy of the synchronous counterpart as the scale increases, indicating an 
improvement for NCL when comparing a non-pipelined design to a pipelined design. This trend 
is also shown when comparing the registration dynamic energy from the pipelined combinational 
design to the dynamic energy of the registration design. 
 
For voltage-scaled dynamic energy consumption, an NCL advantage for the pipeline design is 
observable in the 45 nm process, which trends linearly near 3 times the energy consumption of the 
synchronous 45 nm pipeline circuits. In contrast, the 45 nm multiplier circuits consume 4 times to 
8 times the synchronous counterparts as the scale of the multiplier circuit increases.   
 
4.5 Universal Asynchronous Receiver Transmitter Design 
 
 
This section discusses the NCL and synchronous Universal Asynchronous Receiver Transmitter  
(UART) nominal Vdd dynamic energy consumption, scaled Vdd dynamic energy consumption, and  
static power across the 130 nm, 90 nm, and 45 nm processes. In the results for this section, each 
data point for a UART circuit represents the combined data for both the transmitter and receiver. 
 
It is important to note that there are structural differences between the NCL and synchronous 
UART designs. For example, the synchronization mechanisms used in the synchronous UART 
receiver and transmitter are not required in an asynchronous paradigm. One result is that the timing 
and frequency of data for the NCL and synchronous UART circuits are dissimilar, unlike each of 
the other designs in this dissertation. These factors are important to consider when looking at the 
results and place more emphasis on trends when comparing the NCL and synchronous UART 
circuits. 
 
4.5.1 Dynamic Energy – Nominal Vdd 
 
 
The dynamic combinational, registration, and total energy data for each UART is shown in Table 
4.19. The frequency is the NCL UART DATA-to-DATA frequency, which corresponds to the 
synchronous UART clock frequency. Because of the additional synchronization functionality 
required only by the synchronous UART, there is a difference in transmission periods between the 
NCL and synchronous UART circuits. It is necessary to measure a transmission period in order to 
capture the entire behavior of the NCL and synchronous UART circuits. Therefore, each data point 





























Process  Freq.  Comb. NCL  Comb. Sync.  Reg. NCL  Reg. Sync.  Total NCL  Total Sync.  Ratio
130 nm   154    32.70      33.40        11.60     208.0       44.30      241.40       0.18
90 nm    125    31.68      9.55         11.52     66.0        43.20      75.50        0.57
45 nm    909    14.60      4.70         5.88      37.50       20.44      42.20        0.48
 
 
The NCL and synchronous dynamic energy trend is charted below in Figure 4.18. The 90 nm 
results are skewed by the implementation of low threshold voltage transistors in the synchronous 
UART. However, it is clear that the NCL circuits increase in energy consumption, relative to the 
synchronous circuits, as the scale of the technology node decreases. This trend aligns with data 
that was seen in the NCL registration and multiplier circuits in that the dynamic energy 
consumption increases superlinearly as the technology node decreases in scale. Additionally, while 
the NCL UART design does not require the synchronous UART’s synchronization mechanisms, 
it does incur the register and energy costs of implementing NCL FSMs for the transmitter and 
receiver, in line with the dynamic energy trends seen when comparing NCL and synchronous FSMs. 
 
 

















The data shows that the combinational logic energy consumption and registration energy 
consumption for the NCL UART circuits scale evenly as the scale of the technology node 
decreases. However, the synchronous UART combinational logic displays a larger reduction in 
energy consumption, which is consistent with data from the synchronous FSM circuits across each 
process.  
 
The UART combinational and registration dynamic energy trends are shown in Figure 4.19. The 
data presented is the ratio of NCL to synchronous combinational and registration energy data for 
each process. The variation in transistor threshold voltages between the 90 nm NCL and 
synchronous UART circuits creates an outlier in the data, but the trend is clear for both registration 
and combinational energy data. NCL has a disadvantageous trend as the technology node becomes 
smaller, but NCL also has a distinct advantage in registration energy due to the additional 
synchronization circuitry required for the synchronous UART to function correctly.  
 
 



















4.5.2 Dynamic Energy – Scaled Vdd 
 
 
The NCL and synchronous Vdd-scaled UART dynamic energy data for one 8-bit transmission 
period is shown in Tables 4.20 and 4.21 below. While the NCL UART frequencies are reduced due to 
the transistor performance loss resulting from Vdd scaling, the synchronous UART circuits are able 
to function correctly down to the indicated voltages at the same clock frequency due to slack in 
timing margins at nominal Vdd. For the dynamic energy data, both combinational energy and 
registration energy are shown. In Table 4.21, the total dynamic energy is displayed for each NCL 
and synchronous UART circuit as well as the total dynamic energy ratio between each NCL and 
corresponding synchronous UART circuit.  
 























Process   Freq.          Vdd (V)        Comb. energy     Reg. energy
          NCL    Sync.   NCL    Sync.   NCL     Sync.    NCL    Sync.
130 nm    43     154     0.7    0.6     10.30   7.61     3.87   47.50
90 nm     31     125     0.7    0.6     10.20   2.16     3.93   1.53
45 nm     44     909     0.4    0.5     2.17    0.94     0.98   8.42
 
 


















Process   Freq.          Vdd (V)        Total energy     NCL/Sync.
          NCL    Sync.   NCL    Sync.   NCL     Sync.    ratio
130 nm    43     154     0.7    0.6     14.17   55.11    0.26
90 nm     31     125     0.7    0.6     14.13   3.69     3.83
45 nm     44     909     0.4    0.5     3.15    9.36     0.34
 
 
The trend-skewing effect of the difference in 90 nm device threshold voltages in the NCL and 
synchronous UART implementations is also seen in this data. However, it is clear that NCL’s 
advantage over the synchronous UART implementation increases, relative to the nominal-Vdd 
UART dynamic energy trends, as the scale of the technology node reaches 45 nm. The NCL-to-
synchronous ratio of total dynamic energy is charted in Figure 4.20.   
 
 
Figure 4.20: Vdd-scaled UART dynamic energy trend. 
 
 
In the 45 nm process, NCL’s Vdd-scaling advantage over the nominal-Vdd results is due to the 
lower minimum scaled Vdd. The 45 nm NCL UART consumes 34% of the dynamic energy required 
by its synchronous counterpart. While the 45 nm synchronous UART would decrease its energy 
consumption with additional voltage scaling enabled by frequency scaling, the additional circuitry 
needed to provide this functionality would also increase energy consumption. Overall, the advantage 
NCL shows in the nominal-Vdd dynamic energy data is enhanced in the scaled-Vdd dynamic 
energy data. 
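
The role of the minimum scaled Vdd can be illustrated with a first-order estimate. The sketch below assumes that dynamic energy scales with the square of the supply voltage (the common C·Vdd² approximation), that both circuits start from the same nominal Vdd, and that switched capacitance and activity do not change with voltage. It also assumes that the two voltage columns in the Vdd-scaled tables are ordered NCL first, then synchronous. The function name and data layout are illustrative only and are not the method used in this work.

```python
def first_order_scaled_ratio(nominal_ratio: float,
                             ncl_vdd: float,
                             sync_vdd: float) -> float:
    """Estimate the Vdd-scaled NCL-to-synchronous dynamic energy ratio.

    First-order model (an assumption, not the dissertation's method):
    energy ~ C * Vdd**2, identical nominal Vdd for both circuits, and
    voltage-independent switched capacitance and activity. The shared
    nominal Vdd cancels, leaving only the ratio of the scaled voltages.
    """
    return nominal_ratio * (ncl_vdd / sync_vdd) ** 2


# (nominal ratio, NCL scaled Vdd, sync scaled Vdd, reported scaled ratio),
# read from the nominal-Vdd and Vdd-scaled total energy tables.
UART_POINTS = {
    "130 nm": (0.18, 0.7, 0.6, 0.26),
    "90 nm": (0.57, 0.7, 0.6, 3.83),
    "45 nm": (0.48, 0.4, 0.5, 0.34),
}

if __name__ == "__main__":
    for process, (nominal, v_ncl, v_sync, reported) in UART_POINTS.items():
        estimate = first_order_scaled_ratio(nominal, v_ncl, v_sync)
        print(f"{process}: estimated {estimate:.3f}, reported {reported:.2f}")
```

Under these assumptions, the 130 nm and 45 nm estimates (about 0.245 and 0.307) land close to the reported ratios, while the 90 nm estimate does not, which is consistent with the threshold-voltage difference noted for the 90 nm implementations.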
 
4.5.3 Static Power 
 
The static power data for each NCL and synchronous UART circuit in each process is shown below 
in Table 4.22, along with the total transistor channel area for each circuit. 
 
Table 4.22: Static power data for UART circuits. 




Process   Channel area        Static power
          NCL      Sync.      NCL (µW)   Sync. (µW)   Ratio
130 nm    285.38   344.42     0.35       0.14         2.50
90 nm     259.83   96.93      0.27       0.26         1.04
45 nm     84.76    29.25      23.80      9.80         2.43
 
 
For the static power data, the NCL-to-synchronous ratio of static power consumption is similar 
between the largest process, 130 nm, and the smallest process, 45 nm. The data is 
charted below in Figure 4.21. While the NCL-to-synchronous static power ratio at 45 nm is nearly 
identical to that of the 130 nm process, the increase in static power due to the 45 nm process’ 
increased device leakage is clearly seen. For low duty cycle applications, the static power 
characteristics of the NCL UART are important to consider: NCL consumes roughly 2.5× the static 
power of the synchronous design, in addition to the increased static power consumption of smaller 
technology node processes. This data is consistent in the NCL and synchronous static power 






Figure 4.21: NCL vs. synchronous static power results for each process. 
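
For low duty cycle applications, the trade-off between NCL’s dynamic energy advantage and its static power disadvantage can be framed as a break-even idle time per transmission. The sketch below is a simplified model and not a result from this work: it ignores the static energy consumed during the transmission itself and the difference in transmission periods. The example values assume that the dynamic energy tables are in picojoules, which is an assumption because the unit is not stated in this section. The function name is hypothetical.

```python
from typing import Optional


def break_even_idle_time_s(e_ncl_j: float, e_sync_j: float,
                           p_ncl_w: float, p_sync_w: float) -> Optional[float]:
    """Idle time per transmission at which NCL's dynamic energy savings
    are cancelled by its additional static power.

    Returns None when NCL has no dynamic energy advantage, and infinity
    when NCL's static power is not higher than the synchronous design's.
    """
    energy_saved = e_sync_j - e_ncl_j    # joules saved per transmission
    extra_static = p_ncl_w - p_sync_w    # additional watts while idle
    if energy_saved <= 0:
        return None
    if extra_static <= 0:
        return float("inf")
    return energy_saved / extra_static


if __name__ == "__main__":
    # 130 nm totals; picojoules are ASSUMED for energy (unit not confirmed),
    # and static power is in microwatts as given in Table 4.22.
    t = break_even_idle_time_s(44.30e-12, 241.40e-12, 0.35e-6, 0.14e-6)
    print(f"break-even idle time per transmission: {t * 1e6:.1f} us")
```

If the idle time between transmissions stays below the break-even value, the NCL design retains a net energy advantage under this model; beyond it, the static power difference dominates.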
 
 
Overall in energy consumption, the NCL UART performs better than the synchronous UART. At 
nominal Vdd, NCL has the dynamic energy advantage and displays an increase in this advantage 
during Vdd scaling. The dynamic energy consumption disadvantages of the NCL FSM registration 
structure are seen in the results, but are not as prominent as the effects of the required 
synchronization mechanisms for the synchronous UART. These synchronous implementation 
requirements provide both a transmission rate and dynamic energy advantage for NCL. 
Additionally, NCL performs best at larger technology nodes. This is specifically seen in the static 





































Through transistor-level simulation of static power and dynamic energy at nominal Vdd and scaled 
Vdd, the analysis of NCL and synchronous circuits began with evaluating the characteristics of 
combinational and registration designs. Following this, the results were compared across 
additional circuit types, including finite state machines, pipelined combinational circuits, and the 
UART, as each type contains different proportions of combinational logic and registration 
circuitry and utilizes them differently.  
 
The dynamic energy consumption results of the registration design illustrated a considerable 
advantage for NCL in the 130 nm process at a consistent ratio of 0.3 to the synchronous circuits, 
regardless of scale. While the synchronous circuits performed better in the 45 nm process, the 
analysis shows an NCL-to-synchronous dynamic energy ratio of 1.3 was retained. Similar trends 
were shown in the multiplier circuits, which illustrated that NCL’s strongest performance was in 
the 130 nm process. When these characteristics are combined, as in the pipelined combinational 
design, it is reasonable to expect that the lowest relative NCL energy consumption is also found in 
the 130 nm process. In contrast, the 45 nm process NCL-to-synchronous trend exceeds that of the 130 
nm process when scaling the pipelined combinational design. 
 
For the NCL finite state machines, the 130 nm process yields the best NCL-to-synchronous 
dynamic energy ratios. Additionally, the energy ratio trend is fairly neutral as the scale of the FSM 
is increased among the FSMs with larger input logic circuits. This trend continues as the size of 
the technology node is decreased to 45 nm. The UART design behaves similarly in the largest and 
smallest technology nodes; the advantages of the NCL UART enable an 80% reduction and a 
50% reduction in dynamic energy consumption at the 130 nm and 45 nm processes, respectively. 
While the NCL UART contains FSMs, which are generally at a disadvantage relative to synchronous FSMs 
in energy and power performance, the requirement of dedicated synchronization mechanisms for 
the synchronous UART resulted in a significant NCL UART dynamic energy consumption 
advantage across each technology node. 
 
When analyzing voltage scaling, NCL holds a clear advantage due to its quasi-delay-insensitive 
characteristic. Without additional circuitry, NCL is able to adapt to the circuit performance 
degradation that results from a decreased supply voltage. The results showed NCL’s 
consistent capability to scale to a lower voltage than each synchronous counterpart. 
However, dynamic frequency scaling was not implemented when voltage-scaling each 
synchronous circuit; therefore, its effects on additional energy consumption and maximum 
voltage-scaling ability were not analyzed.  
 
Consistently, it was clear that NCL retained the largest voltage-scaling dynamic energy advantage 
in the 130 nm process, except for the NCL registration design which performed best in the smallest 
technology node simulated, the 45 nm process. Additionally, NCL displayed a trending advantage 
in energy performance as the scale of each circuit was increased. In many cases, such as the 
pipelined combinational circuits, the synchronous circuits consumed less energy, but the energy 
ratio between each NCL circuit and corresponding synchronous circuit would continue to decrease 
as the scale of the circuit was increased. This trend is especially the case for the FSM design. As 
the scale of the logic external to the state memory is increased, the disadvantage of the energy cost 
of the state memory is increasingly minimized. 
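
The amortization effect described above can be illustrated with a simple additive model. The sketch below treats FSM energy as a fixed state memory cost plus a logic cost that grows with a scale factor. All numeric values are hypothetical and chosen only to show the shape of the trend; they are not taken from the measured data, and the function name is illustrative.

```python
def fsm_energy_ratio(mem_ncl: float, mem_sync: float,
                     logic_ncl: float, logic_sync: float,
                     logic_scale: float) -> float:
    """NCL-to-synchronous energy ratio for state memory plus scaled logic."""
    ncl_total = mem_ncl + logic_scale * logic_ncl
    sync_total = mem_sync + logic_scale * logic_sync
    return ncl_total / sync_total


if __name__ == "__main__":
    # Hypothetical energies in arbitrary units: the NCL state memory costs
    # 4x the synchronous state memory, while the external logic costs match.
    for scale in (1, 4, 16, 64):
        ratio = fsm_energy_ratio(4.0, 1.0, 1.0, 1.0, scale)
        print(f"logic scale {scale:>2}: NCL/sync ratio {ratio:.2f}")
```

As the logic scale grows, the ratio approaches the ratio of the logic costs alone, which is the sense in which the state memory penalty is increasingly minimized.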
 
In analyzing the static power consumption results for the basic designs, the registration and 
multiplier designs, NCL’s static power advantage was shown in the 130 nm process. As the scale 
of the multiplier design increased, the static power trend between the NCL and synchronous 
circuits favored NCL. When characteristics of both of these designs are combined, as seen in the 
pipelined combinational design, the circuit scaling trend favors NCL. For larger 45 nm pipelined 
circuits, the designer can expect similar static power characteristics while retaining the adaptability 
of NCL. 
 
The structural disadvantage of the NCL finite state machine is shown again in the static power 
results. In the best comparative case, NCL consumes 2.5 times the static power of its 
synchronous counterpart in the 130 nm process, and the NCL-to-synchronous power ratio scales 
well as the scale of the state memory and input logic is increased. While the UART contains FSMs, 
which are generally disadvantageous in energy and power performance, the lack of logic dedicated 
to synchronization mechanisms results in improved static power consumption across each 
technology node. However, the UART results align with other circuit types indicating that the 
static power results are best in the 130 nm process. 
 
Static power consumption heavily impacts the low-power capabilities of duty-cycle-dependent 
applications. While the NCL-to-synchronous static power ratio may be acceptable in many cases, 
the absolute static power consumption at smaller technology nodes is large compared to the 130 nm 
process. It is also important to note that the synchronous static power results do not include the 
control circuitry required to enable low or variable duty cycle functionality, which would increase 
the static power consumption. 
 
This dissertation work serves to clarify how NCL performs across various circuit types and circuit 
characteristics. With this information, a circuit designer can better understand how 
advantageous or disadvantageous a specific application may be for NCL while also considering 
the adaptability and robustness provided by NCL. Applying the relevant design characteristics of 
a circuit designer’s application will result in the most successful guidance on what to expect from 
an asynchronous design.  
 
Future work in this area would expand on the characterization of NCL at smaller technology nodes, 
additional circuit characteristics, and areas of less traditional design, such as artificial intelligence. 
Additionally, this work could expand into multi-chip systems. As silicon process scaling 
approaches the boundaries of feasible mass IC fabrication, multi-chip heterogeneous architectures 
will continue to become more prevalent to meet the demand for cost and performance. 
Characterizing NCL’s capabilities among various aspects of these architectures, such as inter-chip 




















[1] K. M. Fant, Logically Determined Design. New York, NY, USA. Wiley, 2005. 
[2] J. Di and S. Smith, Designing Asynchronous Circuits using NULL Convention Logic (NCL). 
Morgan & Claypool Publishers, (2009). 
 
[3] J. Brady, A. M. Francis, J. Holmes, J. Di, and H. A. Mantooth, “An Asynchronous Cell Library 
for Operation in Wide-Temperature & Ionizing-Radiation Environments,” in 2015 IEEE 
Aerospace Conference. 
 
[4] P. Shepherd, S. C. Smith, J. Holmes, A. M. Francis, N. Chiolino, and H. A. Mantooth, “A 
Robust, Wide-Temperature Data Transmission System for Space Environments,” in Proceedings 
of the IEEE Aerospace Conference, Big Sky, MT, 2013. 
 
[5] B. Hollosi, M. Barlow, G. Fu, C. Lee, J. Di, S. C. Smith, H. A. Mantooth, M. Schupbach, 
“Delay-Insensitive Asynchronous ALU for Cryogenic Temperature Environments,” in 2008 IEEE 
51st Midwest Symposium on Circuits and Systems. 
 
[6] A. Vakil, J. K.P, S. Hegde, and D. Koppad, “Comparative Analysis of NULL Convention Logic 
and Synchronous CMOS Ripple Carry Adders,” in 2017 IEEE Second International Conference 
on Electrical, Computer, and Communication Technologies (ICECCT). 
 
[7] A. Arthurs, J. Roark, and J. Di, “Ultra-Low Voltage Digital Circuit Design: A Comparative 
Study,” in 2012 IEEE Faible Tension Faible Consommation.  
 
[8] K. M. Fant and S. A. Brantd. “NULL Convention Logic: a complete and consistent logic for 
asynchronous digital circuit synthesis,” in ASAP 96, Chicago, IL, Aug. 1996. 
 
[9] R. B. Reese. “Uncle (Unified NCL Environment),” my.ece.msstate.edu/faculty/reese/uncle/ 
UNCLE.pdf. December 2011. 
 
[10] A. Rastogi, K. Ganeshpure, and S. Kundu. “A study on impact of leakage current on dynamic 
power,” in 2007 IEEE International Symposium on Circuits and Systems.  
 
 
 
 
 
 
