University of Arkansas, Fayetteville

ScholarWorks@UARK
Electrical Engineering Undergraduate Honors
Theses

Electrical Engineering

5-2011

Reducing energy usage of NULL Convention Logic circuits using
NULL Cycle Reduction combined with supply voltage scaling
Brett Sparkman
University of Arkansas, Fayetteville

Follow this and additional works at: https://scholarworks.uark.edu/eleguht
Part of the Controls and Control Theory Commons

Citation
Sparkman, B. (2011). Reducing energy usage of NULL Convention Logic circuits using NULL Cycle
Reduction combined with supply voltage scaling. Electrical Engineering Undergraduate Honors Theses.
Retrieved from https://scholarworks.uark.edu/eleguht/19

REDUCING ENERGY USAGE OF NULL CONVENTION
LOGIC CIRCUITS USING NULL CYCLE REDUCTION
COMBINED WITH SUPPLY VOLTAGE SCALING

A thesis submitted to the Honors College in partial
fulfillment of the requirements for the degree of
Honors Bachelors of Science
in Electrical Engineering

By

Brett Sparkman

May 2011
University of Arkansas

ABSTRACT

The NULL Cycle Reduction (NCR) technique can be used to improve the performance of
a NULL Convention Logic (NCL) circuit at the expense of power and area. However, by
decreasing the supply voltage of certain components, the power of the NCR circuit can be
reduced. Since NCR has increased performance, it could be possible to decrease the power while
maintaining the original performance of the circuit.
To verify this, the NCR circuit will be implemented using a 4-bit by 4-bit dual-rail
multiplier as the test circuit. This circuit will be simulated in ModelSim to ensure functionality,
synthesized into a Verilog netlist using Leonardo, and imported into Cadence to perform
transistor-level simulations for power calculations. The supply voltage of the duplicated circuits
will be decreased until the performance matches the design of the original multiplier, resulting in
overall lower energy usage.

This thesis is approved for recommendation to the Honors College.

Thesis Director:

Dr. Scott C. Smith

THESIS DUPLICATION RELEASE

I hereby authorize the University of Arkansas Libraries to duplicate this thesis when needed for
research and/or scholarship.

Agreed~~

Refused

----------------------------------

ACKNOWLEDGMENTS

I thank Dr. Scott Smith, my thesis advisor, for the project topic and opportunity to
conduct this undergraduate research. His continued assistance with verification and optimistic
outlook truly made this an enjoyable experience.
In addition, I thank Liang Zhou, a graduate student, for his continued help with all of the
software used during the project. This included synthesizing VHDL files in Leonardo, importing
the files into Cadence, and running simulations in Cadence using UltraSim.
I would also like to thank my fiancee, Alexandra Gammill, for the continued support
throughout this project and my mother, Michele Walker, for the proofreading assistance.

TABLE OF CONTENTS

1. Introduction
1.1 Problem
1.2 Thesis Statement
1.3 Approach
1.4 Potential Impact

2. Background
2.1 Power Reduction
2.2 NULL Convention Logic (NCL) Overview [1]
2.2.1 Delay-insensitivity
2.2.2 Logic gates
2.2.3 Completeness
2.2.4 Observability
2.3 NULL Cycle Reduction (NCR) Overview [1]

3. Approach and Implementation
3.1 VHDL with ModelSim
3.1.1 Modifications to VHDL
3.1.2 Simulating the Original Design
3.1.3 Simulating the NCR Design
3.2 Verilog Synthesis in Leonardo
3.2.1 Importing VHDL
3.2.2 Running Scripts
3.3 Cadence Spectre
3.3.1 Importing Verilog
3.3.2 Generating Controller
3.3.3 Power Simulations
3.3.4 Two-Multiplier Circuit Modification
3.3.5 Four-Multiplier Circuit Modification

4. Conclusions
References

A. Single Multiplier VHDL Files
A.1 demux.vhd
A.2 demux_gen.vhd
A.3 mult4x4_1stage.vhd
A.4 mult4x4_mc.vhd
A.5 mux_gen.vhd
A.6 select.vhd
A.7 tb_mult4x4_full.vhd

B. Verilog File

C. Additional-Multiplier VHDL Files
C.1 Two-Multiplier Design: mult4x4_1stage2.vhd
C.2 Four-Multiplier Design: mult4x4_1stage4.vhd
LIST OF FIGURES

Figure 1. THmn threshold gate. [1]
Figure 2. NCR architecture
Figure 3. One 4-bit by 4-bit multiplier
Figure 4. Two-multiplier design
Figure 5. Four-multiplier design

LIST OF TABLES

Table 1. One-multiplier design results
Table 2. Two-multiplier design results
Table 3. Four-multiplier design results

1. INTRODUCTION

1.1 Problem
As circuits are produced with ever larger numbers of transistors and higher switching
frequencies, their performance can rise drastically, but so does their power consumption.
This increase in power consumption has several downsides: the circuits heat up more
due to higher power dissipation, they run for a shorter time on a single battery
charge, and they cost more to operate for the same amount of time.

1.2 Thesis Statement
The goal of this research is to investigate applying the NULL Cycle Reduction (NCR)
technique to a circuit and reducing the supply voltage of the duplicated portion in an effort to
reduce the overall energy usage of the circuit while maintaining equivalent performance.

1.3 Approach
In order to determine if reducing the supply voltage of a circuit can reduce its power, a
series of simulations was performed. First, a simulation of the VHDL design was performed in
ModelSim to ensure that the circuit performed as desired. Next, the files were synthesized using
Leonardo in order to generate a Verilog netlist, which was then imported into Cadence. The final
steps involved running numerous transistor-level simulations in Cadence to determine the effects
of reducing the supply voltage in terms of power and performance.

1.4 Potential Impact
This research has the potential to impact power reduction methods used by Dr. Smith and
his graduate students. If a standard circuit performs as desired but consumes too much power,
this technique could be applied to lower the power of the circuit while maintaining the
performance.

2. BACKGROUND

2.1 Power Reduction
The power of a circuit is given by the equation:
$$P = \alpha C_L V_{DD}^2 f + \alpha t_{sc} V_{DD} I_{peak} + V_{DD} I_{leakage}$$

where $\alpha$ is the activity factor, $C_L$ is the capacitance of the circuit, $V_{DD}$ is the supply voltage, $f$ is
the clock frequency, $t_{sc}$ is the short-circuit time, $I_{peak}$ is the short-circuit current spike
amplitude, and $I_{leakage}$ is the leakage current of the transistors. Since $V_{DD}$ is present in all three
terms, a large reduction in power consumption can be achieved by reducing the supply voltage.
However, reducing the voltage can have a negative impact on the circuit, decreasing the
performance and potentially causing the circuit to perform incorrectly. This resulting dilemma is
one of the large tradeoffs in digital design: performance vs. power.
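
As a rough numeric illustration of the supply-voltage dependence, the sketch below evaluates only the switching term of the power equation, the product of the activity factor, the capacitance, the square of the supply voltage, and the frequency. The function name and the example activity factor, capacitance, and frequency are assumptions chosen for illustration; they are not values from this work.

```python
def switching_power(alpha, c_load, vdd, freq):
    """Switching term of the power equation: alpha * C_L * V_DD^2 * f."""
    return alpha * c_load * vdd ** 2 * freq


# Hypothetical operating point (illustrative values only).
ALPHA = 0.5      # activity factor
C_LOAD = 1e-12   # switched capacitance in farads
FREQ = 250e6     # switching frequency in hertz

p_nominal = switching_power(ALPHA, C_LOAD, 1.2, FREQ)
p_scaled = switching_power(ALPHA, C_LOAD, 1.0, FREQ)

# The ratio depends only on the two supply voltages: (1.0 / 1.2) ** 2.
ratio = p_scaled / p_nominal
print(f"switching power at 1.0 V relative to 1.2 V: {ratio:.3f}")
```

With the frequency held fixed, lowering the supply from 1.2 V to 1.0 V cuts the switching term by roughly 30%. In practice the achievable speed also drops, which is the performance vs. power tradeoff described above.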

2.2 NULL Convention Logic (NCL) Overview [1]
The following two sections are the work of Dr. Scott C. Smith. NCL offers a self-timed
logic paradigm where control is inherent with each datum. NCL follows the so-called "weak
conditions" of Seitz's delay-insensitive signaling scheme [2]. As with other self-timed logic
methods discussed herein, the NCL paradigm assumes that forks in wires are isochronic [3]. The
origins of various aspects of the paradigm, including the NULL (or spacer) logic state from
which NCL derives its name, can be traced back to Muller's work on speed-independent circuits
in the 1950s and 1960s [4].

2.2.1 Delay-insensitivity

NCL uses symbolic completeness of expression [5] to achieve delay-insensitive behavior.
A symbolically complete expression is defined as an expression that only depends on the
relationships of the symbols present in the expression without a reference to their time of
evaluation. In particular, dual-rail signals or other Mutually Exclusive Assertion Groups
(MEAGs) can be used to incorporate data and control information into one mixed signal path to
eliminate time reference [6]. A dual-rail signal, D, consists of two wires, D^0 and D^1, which may
assume any value from the set {DATA0, DATA1, NULL}. The DATA0 state (D^0 = 1, D^1 = 0)
corresponds to a Boolean logic 0, the DATA1 state (D^0 = 0, D^1 = 1) corresponds to a Boolean
logic 1, and the NULL state (D^0 = 0, D^1 = 0) corresponds to the empty set meaning that the value
of D is not yet available. The two rails are mutually exclusive so that both rails can never be
asserted simultaneously; this state is defined as an illegal state. Dual-rail signals are space
optimal 1-out-of-N delay-insensitive codes requiring two wires per bit. Other higher order
MEAGs are not wire count optimal; however, they can be more power efficient due to the
decreased number of transitions per cycle.
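
The dual-rail encoding described above can be sketched in a few lines of Python. This is only an illustrative model: the names `encode`, `decode`, and `encode_word`, and the choice to represent a signal as a (rail 0, rail 1) tuple, are assumptions made for this example.

```python
NULL = (0, 0)    # neither rail asserted: value not yet available
DATA0 = (1, 0)   # (rail 0, rail 1) = (1, 0) encodes Boolean 0
DATA1 = (0, 1)   # (rail 0, rail 1) = (0, 1) encodes Boolean 1


def encode(bit):
    """Map a Boolean value (0 or 1) to its dual-rail pair."""
    return DATA1 if bit else DATA0


def decode(rails):
    """Return 0 or 1 for DATA, None for NULL; reject the illegal state."""
    if rails == (1, 1):
        raise ValueError("illegal state: both rails asserted")
    if rails == NULL:
        return None
    return 1 if rails == DATA1 else 0


def encode_word(value, width):
    """Dual-rail encode an unsigned integer, least significant bit first."""
    return [encode((value >> k) & 1) for k in range(width)]
```

For example, `encode_word(5, 4)` yields four rail pairs (eight wires), which reflects the two-wires-per-bit cost noted above.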
Most multi-rail delay-insensitive systems [2,5,7], including NCL, have at least two
register stages, one at both the input and at the output. Two adjacent register stages interact
through their request and acknowledge lines, Ki and Ko, respectively, to prevent the current
DATA wavefront from overwriting the previous DATA wavefront, by ensuring that the two
DATA wavefronts are always separated by a NULL wavefront.

2.2.2 Logic gates
NCL, like [3], differs from the other delay-insensitive paradigms [2,7] in that these other
paradigms only utilize one type of state-holding gate, the C-element [4]. A C-element behaves as
follows: when all inputs assume the same value, then the output assumes this value; otherwise
the output does not change. On the other hand, all NCL gates are state-holding. Thus, NCL
optimization methods can be considered as a subclass of the techniques for developing delay-insensitive circuits using a pre-defined set of more complex components, with built-in hysteresis
behavior.
NCL uses threshold gates for its basic logic elements [8]. The primary type of threshold
gate is the THmn gate, where 1 ≤ m ≤ n, as depicted in Figure 1. THmn gates have n inputs. At
least m of the n inputs must be asserted before the output will become asserted. Because NCL
threshold gates are designed with hysteresis, all asserted inputs must be de-asserted before the
output will be de-asserted. Hysteresis ensures a complete transition of inputs back to NULL
before asserting the output associated with the next wavefront of input data. Therefore, a THnn
gate is equivalent to an n-input C-element and a TH1n gate is equivalent to an n-input OR gate.
In a THmn gate, each of the n inputs is connected to the rounded portion of the gate; the output
emanates from the pointed end of the gate; and the gate's threshold value, m, is written inside of
the gate. NCL threshold gates may also include a reset input to initialize the output. Resettable
gates are denoted by either a D or an N appearing inside the gate, along with the gate's threshold,
referring to the gate being reset to logic 1 or logic 0, respectively.
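
The threshold-with-hysteresis behavior described above can be modeled behaviorally as follows. This is a minimal sketch for unweighted THmn gates only; the class name `ThresholdGate` and its interface are assumptions for illustration and do not correspond to the VHDL gate models used in this work.

```python
class ThresholdGate:
    """Behavioral model of an unweighted THmn gate with hysteresis."""

    def __init__(self, m, n, reset_value=0):
        if not 1 <= m <= n:
            raise ValueError("threshold must satisfy 1 <= m <= n")
        self.m = m
        self.n = n
        self.output = reset_value

    def evaluate(self, inputs):
        """Update and return the output for one set of n input values."""
        if len(inputs) != self.n:
            raise ValueError("expected %d inputs" % self.n)
        asserted = sum(1 for x in inputs if x)
        if asserted >= self.m:
            self.output = 1      # threshold met: assert the output
        elif asserted == 0:
            self.output = 0      # all inputs de-asserted: de-assert the output
        # Otherwise the gate holds its previous output (hysteresis).
        return self.output
```

A `ThresholdGate(2, 2)` behaves as a 2-input C-element: it asserts only when both inputs are asserted and holds that value until both are de-asserted. A `ThresholdGate(1, 2)` behaves as a 2-input OR gate.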

Figure 1. THmn threshold gate. [1]
By employing threshold gates for each logic rail, NCL is able to determine the output
status without referencing time. Inputs are partitioned into two separate wavefronts, the NULL
wavefront and the DATA wavefront. The NULL wavefront consists of all inputs to a circuit
being NULL, while the DATA wavefront refers to all inputs being DATA, some combination of
DATA0 and DATA1 for dual-rail inputs. Initially, all circuit elements are reset to the NULL
state. First, a DATA wavefront is presented to the circuit. Once all of the outputs of the circuit
transition to DATA, the NULL wavefront is presented to the circuit. After all of the outputs of
the circuit transition to NULL, the next DATA wavefront is presented to the circuit. This
DATA/NULL cycle continues repeatedly. As soon as all outputs of the circuit are DATA, the
circuit's result is valid. The NULL wavefront then transitions all of these DATA outputs back to
NULL. When the outputs transition back to DATA again, the next output is available. This
period is referred to as the DATA-to-DATA cycle time, denoted as T_DD, and has an analogous
role to the clock period in a synchronous system.

2.2.3 Completeness
The completeness of input criterion [5], which NCL combinational circuits and circuits
developed from other delay-insensitive paradigms [2,7] must maintain in order to be delay-insensitive, requires the following criteria: 1. all the outputs of a combinational circuit may not
transition from NULL to DATA until all inputs have transitioned from NULL to DATA, and 2.
all the outputs of a combinational circuit may not transition from DATA to NULL until all inputs
have transitioned from DATA to NULL. In circuits with multiple outputs, it is acceptable,
according to Seitz's weak conditions [2], for some of the outputs to transition without having a
complete input set present, as long as all outputs cannot transition before all inputs arrive.
Furthermore, circuits must also adhere to the completion-completeness criterion [9],
which requires that completion signals only be generated such that no two adjacent DATA
wavefronts can interact within any combinational component. This condition is only necessary
when the bit-wise completion strategy is used with selective input-incomplete components, since
it is inherent when using the full-word completion strategy and when using the bit-wise
completion strategy with no input-incomplete components [9].

2.2.4 Observability

One more condition must be met to ensure delay-insensitivity for NCL and other delay-insensitive circuits [2,7]. No orphans may propagate through a gate [10]. An orphan is defined as
a wire that transitions during the current DATA wavefront, but is not used in the determination
of the output. Orphans are caused by wire forks and can be neglected through the isochronic fork
assumption [3] as long as they are not allowed to cross a gate boundary. This observability
condition, also referred to as indicatability or stability, ensures that every gate transition is
observable at the output, which means that every gate that transitions is necessary to transition at
least one of the outputs.

2.3 NULL Cycle Reduction (NCR) Overview [1]

The technique for reducing the NULL cycle, thus increasing throughput for any delay-insensitive circuit developed according to the paradigms [2,5,7], is shown in Figure 2. The NCR
architecture in Figure 2 is specifically designed for dual-rail circuits utilizing full-word
completion, where all bits at the output of a registration stage are conjoined to form one
completion signal. Bit-wise completion only sends the completion signal from bit b in register_i
back to the bits in register_(i-1) that took part in the calculation of bit b. This method may therefore
require fewer logic levels in the completion circuitry than that of full-word completion, thus
increasing throughput.

Figure 2. NCR architecture.

Circuit #1 and Circuit #2 are both dual-rail delay-insensitive combinational circuits
utilizing full-word completion, developed from one of the following delay-insensitive paradigms
[2,5,7], with at least an input and output registration stage. (Additional registration stages may be
present, thus further partitioning the combinational circuitry.) Both circuits have identical
functionality and are both initialized to output NULL and request DATA upon reset. In the case
of the NCL paradigm, the combinational functionality can be designed using the Threshold
Combinational Reduction method described in [11]; and the resulting circuit can also be
pipelined, as described in [12], to further increase throughput. The Demultiplexer partitions the
input, D, into two outputs, A and B, such that A receives the first DATA/NULL cycle and B
receives the second DATA/NULL cycle. The input continuously alternates between A and B.
The Completion Detection circuitry detects when either a complete DATA or NULL wavefront
has propagated through the Demultiplexer and requests the next NULL or DATA wavefront,
respectively. Sequencer #1 is controlled by the output of the Completion Detection circuitry and
is used to select either output A or B of the Demultiplexer. Output A of the Demultiplexer is input
to Circuit #1, when requested by Ki1; and output B of the Demultiplexer is input to Circuit #2,
when requested by Ki2. The outputs of Circuit #1 and Circuit #2 are allowed to pass through their
respective output registers, as determined by Sequencer #2, which is controlled by the external
request, Ki. The Multiplexer rejoins the partitioned data path by passing a DATA input on either
A or B to the output, or asserting NULL on the output when both A and B are NULL. Figure 2
shows the state of the system when a DATA wavefront is being input before its acknowledge
flows through the Completion Detection circuitry, and when a DATA wavefront is being output
before it is acknowledged by the receiver.
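
The data routing performed by the Demultiplexer and Multiplexer can be sketched functionally as follows. This is a simplified illustration under stated assumptions: it models only which circuit handles which wavefront, not the handshaking, sequencers, or timing, and the names `demultiplex`, `multiplex`, `ncr`, and `multiply` are hypothetical.

```python
NULL = None  # stand-in for a NULL wavefront


def demultiplex(wavefronts):
    """Send successive DATA wavefronts alternately to A and B, starting with A."""
    a = wavefronts[0::2]
    b = wavefronts[1::2]
    return a, b


def multiplex(a, b):
    """Pass DATA from whichever input holds it; output NULL if both are NULL."""
    if a is not NULL and b is not NULL:
        raise ValueError("A and B must not present DATA at the same time")
    return a if a is not NULL else b


def ncr(operands, circuit):
    """Route operands through two identical circuits and rejoin the results."""
    a_in, b_in = demultiplex(operands)
    a_out = [circuit(x) for x in a_in]  # Circuit #1
    b_out = [circuit(x) for x in b_in]  # Circuit #2
    results = []
    for k in range(len(operands)):
        # While one circuit presents DATA, the other is in its NULL cycle.
        a_now = a_out[k // 2] if k % 2 == 0 else NULL
        b_now = b_out[k // 2] if k % 2 == 1 else NULL
        results.append(multiplex(a_now, b_now))
    return results


def multiply(pair):
    """Stand-in for the 4-bit by 4-bit multiplier."""
    return pair[0] * pair[1]


products = ncr([(3, 5), (2, 7), (15, 15)], multiply)
```

The results come out in the same order as the inputs, which is the property the alternating selection is meant to preserve.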

3. APPROACH AND IMPLEMENTATION

3.1 VHDL with ModelSim

Previously used VHDL files were supplied by Dr. Smith to allow the project to start.
These files included a 4-bit by 4-bit multiplier to use as the test circuit and a testbench to test the
circuit. Also included were the files necessary for implementing the NCR architecture: a dual-rail
signal declaration, mappings of common NCL gates, a demultiplexer, a multiplexer, a sequence
generator, and the completion detection circuitry.

3.1.1 Modifications to VHDL

Unfortunately, when the previous VHDL code was written, it was not intended for use in
Cadence. Several of the Ki, Ko, and reset signals were declared as dual-rail inputs or outputs. Only
one of the rails was used in the design, and the unused rail could cause potential problems when
running simulations in Cadence. All of the design files using these dual-rail signals were
modified to be standard logic input signals.
Further modifications were needed for the code to work
properly. Since the design was old, it was checked against the Cadence libraries for NCL
component discrepancies. The mappings for TH12bx0, TH24compx0, and
THand0x0 were found to be different. To correct this, the NCL map file was altered to account for the
different input and output mappings.

3.1.2 Simulating the Original Design

Initially, a single multiplier, shown in Figure 3, was compiled with its testbench and
simulated in ModelSim to ensure functionality. Several output vectors were compared to
their desired values based on the input. These outputs were correct, so the design was
functioning properly. The testbench also included an "incorrect" signal that would transition to
logic high if an output was incorrect. This always stayed logic low, so the design was functioning
as desired. This signal was used in future simulations to ease the functionality checking.

Figure 3. One 4-bit by 4-bit multiplier.

3.1.3 Simulating the NCR Design
Using the NCR architecture shown in Figure 2, Circuit #1 and Circuit #2 each consisted
of a 4-bit by 4-bit multiplier. The design files found in Appendix A, together with the following files from
[13]: NCL_signals.vhd, NCL_gates.vhd, NCL_components.vhd, and NCL_functions.vhd, were
compiled, and the modified design was simulated to verify that it was also functional. The
"incorrect" signal always remained low, so the design was functioning properly.

3.2 Verilog Synthesis in Leonardo
To perform power simulations using Cadence, it was necessary to have a Verilog netlist
of the circuit. Leonardo, a VHDL to Verilog synthesis tool, was used to generate this file. A
series of steps had to be followed to ensure that a proper Verilog file was generated.

3.2.1 Importing VHDL

To import the VHDL, a sample library was used in Leonardo. The ASIC/Sample/SCL05u
library was loaded. The design files were then read into Leonardo in a top-down order to ensure
that all entities were mapped properly. The design was then optimized through flattening. Once
this was done, the correct netlist was generated by selecting the output type to be Verilog.

3.2.2 Running Scripts

Unfortunately, the generated file was not ready to be imported into Cadence: it lacked the
fanout, buffering, and supply voltage and ground signals required to perform power simulation.
To fix this, a series of scripts were run on the generated Verilog file. Before running any scripts,
however, the current Verilog file needed to be modified. The comments created by Leonardo
were removed at the beginning, and any additional module definitions other than the multiplier
design were deleted. The file could now be read properly by the scripts. The first script, fan.py,
inserted the fanout. The second script, buffer.py, added in the necessary buffering. The final
script, AddPowGndFlatten.py, added in the supply voltage and ground signals.
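
The manual cleanup performed before running the scripts (dropping the leading comments and every module definition other than the design itself) could be automated along the following lines. This is a hypothetical helper written for illustration; it is not one of the scripts named above, and the module name `mult4x4` in the sample netlist is an assumption.

```python
import re


def keep_single_module(netlist_text, module_name):
    """Return only the named module ... endmodule block from a Verilog netlist."""
    pattern = re.compile(
        r"^[ \t]*module\s+" + re.escape(module_name) + r"\b.*?^[ \t]*endmodule\b",
        re.DOTALL | re.MULTILINE,
    )
    match = pattern.search(netlist_text)
    if match is None:
        raise ValueError("module %r not found" % module_name)
    return match.group(0) + "\n"


# Tiny made-up netlist: a header comment, an extra module, and the design.
sample = (
    "// generated header\n"
    "module helper (a);\n"
    "  input a;\n"
    "endmodule\n"
    "\n"
    "module mult4x4 (x, z);\n"
    "  input x;\n"
    "  output z;\n"
    "endmodule\n"
)
cleaned = keep_single_module(sample, "mult4x4")
```

The sketch assumes the netlist contains no nested or commented-out `endmodule` keywords inside the design module, which holds for a flattened gate-level netlist.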

3.3 Cadence Spectre
Utilizing Cadence Spectre was the final step in simulating the circuits. Cadence was used
to perform the numerous simulations of the circuits with altered supply voltages. The power was
measured from these simulations, and the results were compiled.

3.3.1 Importing Verilog

The Verilog netlist, found in Appendix B, was easily imported using the "Import
Verilog" feature in Cadence. The Target Library Name was NCL_lvt_Li_zhen_brett, a low
threshold voltage library copied so modifications could be done without affecting any other
designs or cluttering up a commonly used library. Under Schematic Generation Options, Full
Place and Route was disabled. Doing this reduced the number of wires on the schematic; all of
the component input and output pins were tied together using net names. Both the single
multiplier and the NCR design were imported. Once imported, the specific portions of the NCR
design were given their own voltage sources whose values could easily be modified by changing
parameters. The parameter names assigned to the multiple supply voltages were as follows:
Vglobal for the demultiplexer, sequencer #1, and completion detection circuitry; Vlocal for
circuit #1 and circuit #2; Vmux for the multiplexer; and Vsel for sequencer #2.

3.3.2 Generating Controller
In order to simulate the design, a control circuit that would generate the input patterns
was necessary. This design was contributed by Liang Zhou, who had a controller written in
VerilogA for another circuit that he had worked on previously. This controller was imported into
Cadence using the method mentioned in Section 3.3.1. The generated symbol was put into a
schematic along with the multiplier design. Initially, a single input vector of all 1's was included
to verify the circuit functionality after being imported into Cadence. Once all of the designs
performed as expected, the controller's output vector was modified to generate a random set of
inputs using the VerilogA random() function. A parameter was included within the parentheses to
generate the same inputs every time the controller was simulated. The random value was
checked to see if it was even or odd, and then the controller assigned the dual-rail signals to be
either a 1 or 0, respectively.
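
The input-generation scheme described above (a seeded random value, with even mapped to 1 and odd mapped to 0) can be sketched in Python. This is only an analogy to the VerilogA controller: Python's `random.Random` stands in for the VerilogA generator, so the actual sequences will differ, and the function name and (rail 0, rail 1) tuple representation are assumptions.

```python
import random


def random_dual_rail_inputs(width, seed):
    """Generate `width` dual-rail DATA values reproducibly from a fixed seed."""
    rng = random.Random(seed)  # fixed seed: same inputs on every run
    rails = []
    for _ in range(width):
        value = rng.randrange(2 ** 31)
        bit = 1 if value % 2 == 0 else 0  # even -> 1, odd -> 0
        # Dual-rail pair as (rail 0, rail 1): DATA1 = (0, 1), DATA0 = (1, 0).
        rails.append((0, 1) if bit else (1, 0))
    return rails


# Two 4-bit operands require eight dual-rail input signals.
first_run = random_dual_rail_inputs(8, seed=1)
second_run = random_dual_rail_inputs(8, seed=1)
```

Because the seed is fixed, `first_run` and `second_run` are identical, which mirrors the purpose of the seed parameter: every supply-voltage setting is simulated with the same input patterns.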

3.3.3 Power Simulations
To simulate the designs, the Analog Design Environment within Cadence was used.
UltraSim was chosen as the simulator to produce fairly accurate results quickly. A transient
analysis was performed from 0 ns to 150 ns. To easily modify the supply voltages, the parameters
mentioned in Section 3.3.1 were incorporated and modified as the simulation required. The
supply voltage currents, reset, Ki, Ko, and input and output signals were set to be plotted and
saved. Once these settings were correct, initial simulations were run for the single multiplier and
NCR designs.
After the first simulation of each design was finished, the output plots were checked to
confirm that the design was properly imported. The controller was then modified as described
in Section 3.3.2 to generate a random input. The designs were then re-simulated using a range of
supply voltages with the new input patterns, and the outputs were plotted.
For each simulation using the NCR design, Vlocal, Vmux, and Vsel were reduced in
certain sets. The first voltage reduced was Vlocal because circuit #1 and circuit #2 were larger
than the multiplexer or sequence generator #2. Reducing only Vlocal would reduce the overall
power the most effectively. The next voltage reduced was Vmux because the multiplexer was
larger than sequence generator #2. The last voltage reduced was Vsel because the output select
was the smallest out of the three components that the reduced voltage could be applied to. These
were reduced until the period and power of the NCR design were lower than that of the single
multiplier design, if possible, with a smallest resolution of 10 mV.
On the simulation plots, it was noted that the outputs took a short amount of time before
they began appearing. This delay occurred because the pipeline took a small amount of time to
fill up before the correct output could be observed. In order to calculate the period of the circuit,
the period of the main Ko was averaged between the 10th rising edge and the 20th rising edge.
Taking the average in this manner ensured that the circuit had reached a steady state. Similarly,
the currents of all the voltage supplies were integrated from 50ns to 150ns to determine the
energy used by the circuit. From this data, the energy per operation was calculated using the
following equation:
$$\frac{\text{Energy}}{\text{Operation}} = \frac{\sum_{x=1}^{n} \left[ V_x \int_{50\,\mathrm{ns}}^{150\,\mathrm{ns}} I_x \, dt \right]}{100\,\mathrm{ns} / T}$$
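
The period and energy-per-operation calculations described above can be sketched as follows. This is an illustrative reimplementation, not the procedure used in Cadence: the function names are hypothetical, the rising-edge times are made up, and constant currents stand in for the simulated waveforms. The voltages, currents, and period are the values reported for the NCR design with Vlocal at 1.00 V; times are in ns and currents are left in the units reported, so the result is in the corresponding product units.

```python
def integrate_trapezoid(times, values):
    """Trapezoidal integral of sampled values over the sampled times."""
    total = 0.0
    for k in range(1, len(times)):
        total += 0.5 * (values[k] + values[k - 1]) * (times[k] - times[k - 1])
    return total


def average_period(rising_edges, first=10, last=20):
    """Mean period between the first-th and last-th rising edges (1-based)."""
    return (rising_edges[last - 1] - rising_edges[first - 1]) / (last - first)


def energy_per_operation(supplies, period, t_start=50.0, t_stop=150.0):
    """Sum V_x * integral(I_x dt) over all supplies, divided by (window / period).

    supplies: list of (voltage, times, currents); each waveform is assumed to
    be sampled exactly over the window t_start..t_stop.
    """
    energy = sum(v * integrate_trapezoid(t, i) for v, t, i in supplies)
    operations = (t_stop - t_start) / period
    return energy / operations


# Constant currents standing in for the simulated supply-current waveforms.
window = [50.0, 150.0]  # integration window in ns
supplies = [
    (1.20, window, [39.49, 39.49]),  # Vglobal
    (1.00, window, [95.0, 95.0]),    # Vlocal
    (1.20, window, [4.30, 4.30]),    # Vmux
    (1.20, window, [5.46, 5.46]),    # Vsel
]
edges = [4.36 * k for k in range(1, 31)]  # hypothetical Ko rising edges in ns
period = average_period(edges)
energy = energy_per_operation(supplies, period)
```

When the currents are constant, the expression reduces to the sum of each supply voltage times its current, multiplied by the period, which makes it easy to check individual results by hand.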

The results for the single multiplier design and the NCR design are shown below in Table 1.

Energy Calculation and Delay

Design             Vglobal  Iglobal  Vlocal  Ilocal  Vmux  Imux  Vsel  Isel  Period  Energy/Op
                   (V)               (V)             (V)         (V)         (ns)
Single Multiplier  1.20     121.5    -       -       -     -     -     -     4.18    6.10
NCR Design         1.20     52.38    1.20    153.1   1.20  5.14  1.20  7.19  3.33    8.71
                   1.20     45.60    1.10    122.4   1.20  4.62  1.20  6.33  3.76    7.61
                   1.20     45.39    1.10    121.6   1.10  4.11  1.20  6.21  3.78    7.56
                   1.20     45.48    1.10    120.4   1.10  4.08  1.10  5.48  3.80    7.51
                   1.20     39.49    1.00    95.0    1.20  4.30  1.20  5.46  4.36    6.72
                   1.20     39.52    1.00    94.7    1.00  3.18  1.20  5.39  4.39    6.67
                   1.20     39.26    1.00    92.8    1.00  3.13  1.00  4.23  4.46    6.56

Table 1. One-multiplier design results.
Unfortunately, there was no possibility of reducing the supply voltages of the NCR
design so that both the period and power would be less than those of the single multiplier circuit. The
reduction of power with a comparable delay was closest when Vlocal was reduced to 1.00V and
everything else remained at 1.2V. The period and power could not both be made
better than those of the single multiplier because circuit #1 and circuit #2, the multiplier copies, were
fairly small; hence, the overhead of the added DEMUX, MUX, and Sequencers outweighed the
power savings. If the duplicated circuit were larger, a larger power reduction would be
seen relative to the period increase, potentially allowing the circuit to be both lower power
and faster.

3.3.4 Two-Multiplier Circuit Modification

Producing a circuit with both a lower power and a lower period would require enlarging the
duplicated circuit so that reducing the supply voltage would lessen power by a larger factor. Two
additional circuits were designed and simulated to test this hypothesis. Although these circuits
used the same multiplier, they contained more copies of it, which formed the larger
circuit.
The first additional circuit that was designed and simulated consisted of two of the multipliers in
series, as shown in Figure 4. The new circuit no longer performed the same function as the
original circuit, but it served as a simple example of a larger circuit. The two-multiplier circuit
took the place of circuit #1 and circuit #2 in the NCR architecture. The 8-bit output of the first
multiplier was split apart and sent to the two 4-bit inputs of the second multiplier. To simulate the
design properly, the entire process had to be repeated, from simulation in ModelSim through Verilog
synthesis to Cadence simulation. The multiplier VHDL file was copied and altered to contain
two multipliers, tying the output of one to the input of the other, as shown in Appendix C.1. The
logic driving the incorrect signal in the testbench was also modified to expect the correct output of the
multipliers strung together. The two-multiplier circuit and the new NCR design were simulated,
and the incorrect signal remained low, indicating a functional circuit.

[Figure: two 4x4 dual-rail multiplier blocks connected in series, with Ki and Ko handshake signals]
Figure 4. Two-multiplier design.


The VHDL files were synthesized into Verilog netlists and imported into Cadence as
mentioned in Section 3.3.1. The schematics had to be modified to include the multiple supply
voltages, and the controller had to be included in the schematics as well. An all-1's input
vector was simulated in UltraSim to ensure that the outputs of the imported two-multiplier circuit
and two-multiplier NCR design were correct, and the results matched the expected values. The
controller's outputs were modified again to apply the same random input values to the multiplier
circuits as before. A series of simulations was performed, as in Section 3.3.3, and the results are
shown below in Table 2.
                                                                                 Energy Calculation and Delay
Design             Vglobal  Iglobal  Vlocal  Ilocal  Vmux   Imux   Vsel   Isel   Period  Energy/Op
                   (V)      (µA)     (V)     (µA)    (V)    (µA)   (V)    (µA)   (ns)    (µJ)
Single Multiplier  1.20     241.0    -       -       -      -      -      -      4.22    12.21
NCR Design         1.20     51.20    1.20    303.0   1.20   5.11   1.20   6.69   3.39    14.87
                   1.20     45.44    1.10    242.4   1.20   4.58   1.20   6.03   3.83    12.77
                   1.20     45.59    1.10    241.7   1.10   4.06   1.20   6.03   3.84    12.75
                   1.20     45.38    1.10    238.9   1.10   4.01   1.10   5.23   3.86    12.65
                   1.20     41.47    1.05    212.4   1.05   3.50   1.05   4.54   4.16    11.70
                   1.20     41.22    1.04    210.0   1.20   4.31   1.20   5.42   4.16    11.64
                   1.20     41.20    1.04    208.2   1.04   3.44   1.20   5.44   4.19    11.56
                   1.20     41.04    1.04    205.3   1.04   3.43   1.04   4.41   4.23    11.46
                   1.20     41.26    1.03    203.1   1.20   4.27   1.20   5.36   4.23    11.43
                   1.20     41.03    1.03    202.6   1.03   3.33   1.20   5.29   4.25    11.39
                   1.20     39.80    1.03    200.9   1.03   3.33   1.03   4.37   4.30    11.30
                   1.20     38.85    1.00    187.1   1.20   4.23   1.20   5.16   4.43    10.86
                   1.20     39.10    1.00    185.2   1.00   3.11   1.20   5.15   4.46    10.77
                   1.20     38.91    1.00    181.7   1.00   3.07   1.00   3.89   4.52    10.65

Table 2. Two-multiplier design results.
Using the two-multiplier NCR design, it was possible to achieve lower power and a
smaller period. The supply voltage parameter settings that accomplished this have been made
bold in Table 2. Using a 1.04V Vlocal and Vmux while maintaining the 1.2V supply on all other
circuit elements decreased the period by 0.03ns and the energy per operation by 0.65µJ. The
decreases correspond to a performance increase of approximately 0.7% and an energy decrease
of approximately 5.3%. Although these results were positive, the improvement was not as large as
desired, so another circuit was designed.
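The selection of this setting and the two percentages can be reproduced with a short Python sketch. The period and energy values are copied from Table 2; the function and variable names are hypothetical, and choosing the lowest-energy row among those that beat the baseline on both measures is an assumption about how the preferred setting was picked.

```python
# (v_local, v_mux, v_sel, period_ns, energy_per_op) for the two-multiplier NCR design
NCR_ROWS = [
    (1.20, 1.20, 1.20, 3.39, 14.87),
    (1.10, 1.20, 1.20, 3.83, 12.77),
    (1.10, 1.10, 1.20, 3.84, 12.75),
    (1.10, 1.10, 1.10, 3.86, 12.65),
    (1.05, 1.05, 1.05, 4.16, 11.70),
    (1.04, 1.20, 1.20, 4.16, 11.64),
    (1.04, 1.04, 1.20, 4.19, 11.56),
    (1.04, 1.04, 1.04, 4.23, 11.46),
    (1.03, 1.20, 1.20, 4.23, 11.43),
    (1.03, 1.03, 1.20, 4.25, 11.39),
    (1.03, 1.03, 1.03, 4.30, 11.30),
    (1.00, 1.20, 1.20, 4.43, 10.86),
    (1.00, 1.00, 1.20, 4.46, 10.77),
    (1.00, 1.00, 1.00, 4.52, 10.65),
]
BASELINE = (4.22, 12.21)  # period and energy per operation without NCR


def better_than_baseline(rows, baseline):
    """Keep only rows whose period AND energy are below the baseline."""
    base_period, base_energy = baseline
    return [r for r in rows if r[3] < base_period and r[4] < base_energy]


def percent_decrease(old, new):
    """Relative decrease from old to new, in percent."""
    return 100.0 * (old - new) / old


candidates = better_than_baseline(NCR_ROWS, BASELINE)
best = min(candidates, key=lambda r: r[4])  # lowest energy among the candidates
period_gain = percent_decrease(BASELINE[0], best[3])
energy_gain = percent_decrease(BASELINE[1], best[4])
```

Only three rows beat the baseline on both measures, and the lowest-energy one is the 1.04V Vlocal and Vmux setting discussed above.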

3.3.5 Four-Multiplier Circuit Modification
To further show that supply voltage can have a large impact on power consumption when
using the NCR design, an even larger third circuit was designed. This circuit simply strung
together four of the multipliers, as shown in Figure 5. As with the first modification, the VHDL
files had to be edited to account for the additional multiplier circuits, as shown in Appendix C.2.
The new four-multiplier circuit and the four-multiplier NCR design were simulated to ensure
functionality. The incorrect signal stayed low during the simulation, so the circuit performed as
expected.

[Figure: four 4x4 dual-rail multiplier blocks connected in series, with Ki and Ko handshake signals]
Figure 5. Four-multiplier design.
Once again, the VHDL files were synthesized and imported into Cadence. An all-1's
input vector was simulated using the controller, and the results of the four-multiplier circuit and
NCR design matched the expected values. Simulations using the same random inputs were
performed, and the results are shown below in Table 3.
Simulating the four-multiplier NCR design further showed the benefits of reducing
supply voltages in terms of power. It was possible for the NCR design to consume far less power
and maintain performance. The supply voltage parameter settings that accomplished this have
been made bold in Table 3. Using a 1.00V Vlocal and Vmux while maintaining the 1.2V supply on
all other circuit elements decreased the period by 0.01ns and the energy per operation by
6.02µJ.
The decreases correspond to a performance increase of approximately 0.2% and an energy
decrease of approximately 24.8%. Savings such as this could greatly benefit situations where
circuits require lower power to operate. It was observed that the lower MUX supply voltage
produced a correspondingly lower voltage at the output compared to the original design.

                                                                                 Power Calculation and Delay
Design             Vglobal  Iglobal  Vlocal  Ilocal  Vmux   Imux   Vsel   Isel   Period  Energy/Op
                   (V)      (µA)     (V)     (µA)    (V)    (µA)   (V)    (µA)   (ns)    (µJ)
Single Multiplier  1.20     475.1    -       -       -      -      -      -      4.25    24.25
NCR Design         1.20     51.65    1.20    606.4   1.20   5.17   1.20   6.69   3.01    24.23
                   1.20     46.12    1.10    485.5   1.20   4.63   1.20   6.04   3.60    21.69
                   1.20     45.76    1.10    483.8   1.10   4.12   1.20   6.03   3.61    21.65
                   1.20     45.52    1.10    479.8   1.10   4.10   1.10   5.27   3.64    21.60
                   1.20     39.80    1.01    378.1   1.01   3.18   1.01   4.10   4.24    18.51
                   1.20     40.19    1.00    374.8   1.20   4.26   1.20   5.16   4.21    18.29
                   1.20     39.48    1.00    372.8   1.00   3.13   1.20   5.17   4.24    18.23
                   1.20     38.94    1.00    366.0   1.00   3.11   1.00   3.93   4.31    18.11
                   1.20     39.32    0.99    363.7   1.20   4.32   1.20   5.05   4.29    17.94
                   1.20     38.99    0.99    360.9   0.99   3.05   1.20   4.95   4.32    17.85

Table 3. Four-multiplier design results.


4. CONCLUSIONS
Although it was impossible to reduce the power and maintain the performance of the
initial one-multiplier NCR design, it was possible to greatly reduce the power while maintaining
performance of additional circuits by scaling the supply voltage. This power reduction was
demonstrated by enlarging the duplicated circuit. By stringing together two-multiplier and
four-multiplier NCR designs and performing transistor-level simulations in Cadence to calculate
power, it was clearly seen that the power reduction possible greatly increases as the duplicated
circuit size increases. The decrease in power consumption occurred because the lesser supply
voltage was distributed over a larger portion of the entire NCR design. The preferred design has
a reduced supply voltage connected to only the duplicated circuit; this connection will ensure
that the outputs are at the nominal supply voltage level and are therefore equivalent to the
original design.
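How much of the NCR design sits on the reducible local supply can be estimated from the first NCR row of Tables 1, 2, and 3, where every supply is at 1.2V. The Python sketch below is illustrative only; the names are hypothetical, and using the share of supply current as a proxy for the share of power relies on all four supplies being at the same voltage in those rows.

```python
# Supply currents (µA) with every supply at 1.2V: (global, local, mux, sel)
CURRENTS = {
    "one-multiplier": (52.38, 153.1, 5.14, 7.19),
    "two-multiplier": (51.20, 303.0, 5.11, 6.69),
    "four-multiplier": (51.65, 606.4, 5.17, 6.69),
}


def local_share(currents):
    """Fraction of the total supply current drawn through the local supply."""
    _, i_local, _, _ = currents
    return i_local / sum(currents)


shares = {name: local_share(c) for name, c in CURRENTS.items()}
```

The local share grows from roughly 70% to roughly 91% as the duplicated circuit grows from one multiplier to four, which is consistent with the larger savings seen for the larger designs.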
The technique of applying the NCR architecture to a circuit and then reducing the supply
voltage to the duplicated circuits could be extremely useful in reducing the power of large
circuits. As circuit size increases, the benefits of this technique increase rapidly. The supply
voltage levels and components thereby supplied can be fine-tuned to produce a circuit with the
exact same performance as the individual circuit with far less power usage.
This method of lowering power could substantially benefit circuits
designed at the University of Arkansas, especially where power consumption is the primary concern.
Since the technique is easy to implement in asynchronous designs, it could also be applied to any
previously designed circuits to reduce power provided that the circuit is large enough to benefit.
For future work, this technique could be applied to different-sized circuits more
extensively to determine the exact benefits of size. Other methods of reducing power could also
be investigated in parallel. These include altering the threshold voltages of the transistors,
applying the global supply voltage to the critical path of the duplicated circuit while further
lowering the local supply voltage, or transistor reordering.



A. SINGLE MULTIPLIER VHDL FILES

A.1 demux.vhd
library IEEE;
use ieee.std_logic_1164.all;
use work.ncl_signals.all;

entity demux is
  port(a: IN dual_rail_logic;
       rst, ki1, ki2, s1, s2: IN std_logic;
       z1, z2: OUT dual_rail_logic;
       ko: OUT std_logic);
end demux;

architecture arch of demux is
  signal t1, t2: dual_rail_logic;
  component th33nx0
    port(a: in std_logic;
         b: in std_logic;
         c: in std_logic;
         rst: in std_logic;
         z: out std_logic);
  end component;
  component th14bx0
    port(a: in std_logic;
         b: in std_logic;
         c: in std_logic;
         d: in std_logic;
         zb: out std_logic);
  end component;
begin
  -- Output 1: each rail passes only when s1 and ki1 are both asserted
  i11: th33nx0
    port map(a.rail1, s1, ki1, rst, t1.rail1);
  i10: th33nx0
    port map(a.rail0, s1, ki1, rst, t1.rail0);
  z1 <= t1;
  -- Output 2: each rail passes only when s2 and ki2 are both asserted
  i21: th33nx0
    port map(a.rail1, s2, ki2, rst, t2.rail1);
  i20: th33nx0
    port map(a.rail0, s2, ki2, rst, t2.rail0);
  z2 <= t2;
  -- Completion: ko is driven from the four output rails
  k0: th14bx0
    port map(t1.rail1, t1.rail0, t2.rail1, t2.rail0, ko);
end arch;
A.2 demux_gen.vhd
library IEEE;
use ieee.std_logic_1164.all;
use work.ncl_signals.all;

entity dmux is
  generic(width: in integer := 1);
  port(a: IN dual_rail_logic_vector(width-1 downto 0);
       rst, ki1, ki2, s1, s2: IN std_logic;
       z1, z2: OUT dual_rail_logic_vector(width-1 downto 0);
       ko: OUT std_logic_vector(width-1 downto 0));
end dmux;

architecture arch of dmux is
  component demux
    port(a: IN dual_rail_logic;
         rst, ki1, ki2, s1, s2: IN std_logic;
         z1, z2: OUT dual_rail_logic;
         ko: OUT std_logic);
  end component;
begin
  -- One single-bit demux per bit of the input vector
  struct: for i in a'range generate
    comp: demux
      port map(a(i), rst, ki1, ki2, s1, s2, z1(i), z2(i), ko(i));
  end generate struct;
end arch;
A.3 mult4x4_1stage.vhd
library ieee;
use ieee.std_logic_1164.all;
use work.ncl_signals.all;
use work.dual_rail.all;

entity mult4x4_1n is
  port(x, y: in dual_rail_logic_VECTOR(3 downto 0);
       ki, reset: in std_logic;
       s: out dual_rail_logic_VECTOR(7 downto 0);
       ko: out std_logic);
end;

architecture BEHAVIOR of mult4x4_1n is
  signal pp1, pp2, pp3, pp4, pp5, pp6, pp7, pp8, pp9: dual_rail_logic;
  signal pp10, pp11, pp12, pp13, pp14, pp15: dual_rail_logic;
  signal c1_1, c1_2, c1_3, c1_4, c1_5: dual_rail_logic;
  signal s1_0, s1_1, s1_2, s1_3, s1_4, s1_5: dual_rail_logic;
  signal c2_3, c2_4, c2_5, c2_6, c2_7: dual_rail_logic;
  signal s2_2, s2_3, s2_4, s2_5, s2_6, s3_3: dual_rail_logic;
  signal c3_4, s3_4, c3_5: dual_rail_logic;
  signal x_o, y_o: dual_rail_logic_VECTOR(3 downto 0);
  signal temp0, temp0_in, temp0_out: dual_rail_logic_VECTOR(7 downto 0);
  signal ki_0, ko_0: std_logic_VECTOR(7 downto 0);
  signal temp1, temp1_in, temp1_out: dual_rail_logic_VECTOR(15 downto 0);
  signal ki_1, ko_1: std_logic_VECTOR(15 downto 0);
  signal temp2, temp2_in, temp2_out: dual_rail_logic_VECTOR(12 downto 0);
  signal ki_2, ko_2: std_logic_VECTOR(12 downto 0);
  signal temp3, temp3_in, temp3_out: dual_rail_logic_VECTOR(11 downto 0);
  signal ki_3, ko_3: std_logic_VECTOR(11 downto 0);
  signal temp4, temp4_in, temp4_out: dual_rail_logic_VECTOR(11 downto 0);
  signal ki_4, ko_4: std_logic_VECTOR(11 downto 0);
  signal temp5, temp5_in, temp5_out: dual_rail_logic_VECTOR(10 downto 0);
  signal ki_5, ko_5: std_logic_VECTOR(10 downto 0);
  signal temp6, temp6_in, temp6_out: dual_rail_logic_VECTOR(9 downto 0);
  signal ki_6, ko_6: std_logic_VECTOR(9 downto 0);
  signal temp7, temp7_in: dual_rail_logic_VECTOR(7 downto 0);
  signal ki_7, ko_7: std_logic_VECTOR(7 downto 0);
  signal ki0, ki1, ki2, ki3, ki4, ki5, ki6: std_logic;
  signal pp15_o, pp14_o, pp13_o, pp12_o, pp11_o, pp10_o: dual_rail_logic;
  signal pp9_o, pp8_o, pp7_o, pp6_o, pp5_o, pp4_o, pp3_o, pp2_o, pp1_o, s1_0_o:
    dual_rail_logic;
  signal pp15o, c1_5o, s1_5o, c1_4o, s1_4o, c1_3o, pp12o, s1_3o, c1_2o, s1_2o, c1_1o,
    s1_1o, s1_0o: dual_rail_logic;
  signal c2_7o, s2_6o, c2_6o, s2_5o, c2_5o, s2_4o, c2_4o, c2_3o, s2_3o, s2_2o, s2_1o,
    s2_0o: dual_rail_logic;
  signal c2_7_o, s2_6_o, c2_6_o, s2_5_o, c2_5_o, s2_4_o, c2_4_o, c3_4_o, s3_3_o,
    s2_2_o, s2_1_o, s2_0_o: dual_rail_logic;
  signal c2_7_o1, s2_6_o1, c2_6_o1, s2_5_o1, c2_5_o1, c3_5o, s4_4, s4_3, s4_2, s4_1,
    s4_0: dual_rail_logic;
  signal c2_7_o2, s2_6_o2, c2_6_o2, c4_6o, s5_5, s5_4, s5_3, s5_2, s5_1, s5_0:
    dual_rail_logic;
  signal s4_7, c4_7, s4_6, c4_6, s4_5: dual_rail_logic;
  component full_add
    port(c_in, x, y: in dual_rail_logic;
         c_out, s: out dual_rail_logic);
  end component;
  component half_add
    port(x, y: in dual_rail_logic;
         c_out, s: out dual_rail_logic);
  end component;
  component ncl_register
    generic(width: in integer;
            initial_value: in integer);
    port(data_in: in dual_rail_logic_VECTOR(width-1 downto 0);
         ki: in std_logic_VECTOR(width-1 downto 0);
         rst: in std_logic;
         data_out: out dual_rail_logic_VECTOR(width-1 downto 0);
         ko: out std_logic_VECTOR(width-1 downto 0));
  end component;
  component comp8a
    port(a: in std_logic_VECTOR(7 downto 0);
         z: out std_logic);
  end component;
  component and2
    port(a, b: in dual_rail_logic;
         z: out dual_rail_logic);
  end component;
  component and2i
    port(a, b: in dual_rail_logic;
         z: out dual_rail_logic);
  end component;
  component gens7
    port(c, x, y, z: in dual_rail_logic;
         s: out dual_rail_logic);
  end component;

begin
  -- Input register stage
  temp0_in <= x & y;
  COMP0: comp8a
    port map(ko_0, ko);
  REG0: ncl_register
    generic map(8, 2)
    port map(temp0_in, ki_0, reset, temp0_out, ko_0);
  ki_0(7) <= ki0;
  ki_0(6) <= ki0;
  ki_0(5) <= ki0;
  ki_0(4) <= ki0;
  ki_0(3) <= ki0;
  ki_0(2) <= ki0;
  ki_0(1) <= ki0;
  ki_0(0) <= ki0;
  x_o <= temp0_out(7 downto 4);
  y_o <= temp0_out(3 downto 0);

  -- Partial product generation
  GEN_S0: and2
    port map(y_o(0), x_o(0), s1_0);
  GEN_PP1: and2i
    port map(y_o(0), x_o(1), pp1);
  GEN_PP2: and2i
    port map(y_o(0), x_o(2), pp2);
  GEN_PP3: and2i
    port map(y_o(0), x_o(3), pp3);
  GEN_PP4: and2i
    port map(y_o(1), x_o(0), pp4);
  GEN_PP5: and2
    port map(y_o(1), x_o(1), pp5);
  GEN_PP6: and2i
    port map(y_o(1), x_o(2), pp6);
  GEN_PP7: and2i
    port map(y_o(1), x_o(3), pp7);
  GEN_PP8: and2i
    port map(y_o(2), x_o(0), pp8);
  GEN_PP9: and2i
    port map(y_o(2), x_o(1), pp9);
  GEN_PP10: and2
    port map(y_o(2), x_o(2), pp10);
  GEN_PP11: and2i
    port map(y_o(2), x_o(3), pp11);
  GEN_PP12: and2i
    port map(y_o(3), x_o(0), pp12);
  GEN_PP13: and2i
    port map(y_o(3), x_o(1), pp13);
  GEN_PP14: and2i
    port map(y_o(3), x_o(2), pp14);
  GEN_PP15: and2
    port map(y_o(3), x_o(3), pp15);

  temp1_out <= pp15 & pp14 & pp13 & pp12 & pp11 & pp10 & pp9 & pp8 & pp7 &
               pp6 & pp5 & pp4 & pp3 & pp2 & pp1 & s1_0;

  pp15_o <= temp1_out(15);
  pp14_o <= temp1_out(14);
  pp13_o <= temp1_out(13);
  pp12_o <= temp1_out(12);
  pp11_o <= temp1_out(11);
  pp10_o <= temp1_out(10);
  pp9_o <= temp1_out(9);
  pp8_o <= temp1_out(8);
  pp7_o <= temp1_out(7);
  pp6_o <= temp1_out(6);
  pp5_o <= temp1_out(5);
  pp4_o <= temp1_out(4);
  pp3_o <= temp1_out(3);
  pp2_o <= temp1_out(2);
  pp1_o <= temp1_out(1);
  s1_0_o <= temp1_out(0);

  -- First row of adders
  HA1_1: half_add
    port map(pp1_o, pp4_o, c1_1, s1_1);
  FA1_2: full_add
    port map(pp2_o, pp5_o, pp8_o, c1_2, s1_2);
  FA1_3: full_add
    port map(pp3_o, pp6_o, pp9_o, c1_3, s1_3);
  FA1_4: full_add
    port map(pp7_o, pp10_o, pp13_o, c1_4, s1_4);
  HA1_5: half_add
    port map(pp11_o, pp14_o, c1_5, s1_5);
  temp2_out <= pp15_o & c1_5 & s1_5 & c1_4 & s1_4 & c1_3 & pp12_o & s1_3 &
               c1_2 & s1_2 & c1_1 & s1_1 & s1_0_o;

  pp15o <= temp2_out(12);
  c1_5o <= temp2_out(11);
  s1_5o <= temp2_out(10);
  c1_4o <= temp2_out(9);
  s1_4o <= temp2_out(8);
  c1_3o <= temp2_out(7);
  pp12o <= temp2_out(6);
  s1_3o <= temp2_out(5);
  c1_2o <= temp2_out(4);
  s1_2o <= temp2_out(3);
  c1_1o <= temp2_out(2);
  s1_1o <= temp2_out(1);
  s1_0o <= temp2_out(0);

  -- Second row of adders
  HA2_2: half_add
    port map(c1_1o, s1_2o, c2_3, s2_2);
  FA2_3: full_add
    port map(pp12o, c1_2o, s1_3o, c2_4, s2_3);
  HA2_4: half_add
    port map(c1_3o, s1_4o, c2_5, s2_4);
  HA2_5: half_add
    port map(c1_4o, s1_5o, c2_6, s2_5);
  HA2_6: half_add
    port map(pp15o, c1_5o, c2_7, s2_6);

  temp3_out <= c2_7 & s2_6 & c2_6 & s2_5 & c2_5 & s2_4 & c2_4 & c2_3 & s2_3 &
               s2_2 & s1_1o & s1_0o;
  c2_7o <= temp3_out(11);
  s2_6o <= temp3_out(10);
  c2_6o <= temp3_out(9);
  s2_5o <= temp3_out(8);
  c2_5o <= temp3_out(7);
  s2_4o <= temp3_out(6);
  c2_4o <= temp3_out(5);
  c2_3o <= temp3_out(4);
  s2_3o <= temp3_out(3);
  s2_2o <= temp3_out(2);
  s2_1o <= temp3_out(1);
  s2_0o <= temp3_out(0);

  -- Remaining carry-propagation adders
  HA3_3: half_add
    port map(c2_3o, s2_3o, c3_4, s3_3);
  temp4_out <= c2_7o & s2_6o & c2_6o & s2_5o & c2_5o & s2_4o & c2_4o & c3_4 &
               s3_3 & s2_2o & s2_1o & s2_0o;

  c2_7_o <= temp4_out(11);
  s2_6_o <= temp4_out(10);
  c2_6_o <= temp4_out(9);
  s2_5_o <= temp4_out(8);
  c2_5_o <= temp4_out(7);
  s2_4_o <= temp4_out(6);
  c2_4_o <= temp4_out(5);
  c3_4_o <= temp4_out(4);
  s3_3_o <= temp4_out(3);
  s2_2_o <= temp4_out(2);
  s2_1_o <= temp4_out(1);
  s2_0_o <= temp4_out(0);

  FA4_4: full_add
    port map(s2_4_o, c2_4_o, c3_4_o, c3_5, s3_4);

  temp5_out <= c2_7_o & s2_6_o & c2_6_o & s2_5_o & c2_5_o & c3_5 & s3_4 &
               s3_3_o & s2_2_o & s2_1_o & s2_0_o;

  c2_7_o1 <= temp5_out(10);
  s2_6_o1 <= temp5_out(9);
  c2_6_o1 <= temp5_out(8);
  s2_5_o1 <= temp5_out(7);
  c2_5_o1 <= temp5_out(6);
  c3_5o <= temp5_out(5);
  s4_4 <= temp5_out(4);
  s4_3 <= temp5_out(3);
  s4_2 <= temp5_out(2);
  s4_1 <= temp5_out(1);
  s4_0 <= temp5_out(0);

  FA4_5: full_add
    port map(c3_5o, c2_5_o1, s2_5_o1, c4_6, s4_5);

  temp6_out <= c2_7_o1 & s2_6_o1 & c2_6_o1 & c4_6 & s4_5 & s4_4 & s4_3 & s4_2
               & s4_1 & s4_0;

  c2_7_o2 <= temp6_out(9);
  s2_6_o2 <= temp6_out(8);
  c2_6_o2 <= temp6_out(7);
  c4_6o <= temp6_out(6);
  s5_5 <= temp6_out(5);
  s5_4 <= temp6_out(4);
  s5_3 <= temp6_out(3);
  s5_2 <= temp6_out(2);
  s5_1 <= temp6_out(1);
  s5_0 <= temp6_out(0);

  FA4_6: full_add
    port map(c4_6o, c2_6_o2, s2_6_o2, open, s4_6);

  g4_1_S7: gens7
    port map(c2_7_o2, s2_6_o2, c2_6_o2, c4_6o, s4_7);
  temp7_in <= s4_7 & s4_6 & s5_5 & s5_4 & s5_3 & s5_2 & s5_1 & s5_0;

  -- Output register stage
  COMP7: comp8a
    port map(ko_7, ki0);
  REG7: ncl_register
    generic map(8, 2)
    port map(temp7_in, ki_7, reset, s, ko_7);
  ki_7(7) <= ki;
  ki_7(6) <= ki;
  ki_7(5) <= ki;
  ki_7(4) <= ki;
  ki_7(3) <= ki;
  ki_7(2) <= ki;
  ki_7(1) <= ki;
  ki_7(0) <= ki;
end BEHAVIOR;
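A.4 mult4x4_rnc.vhd
library ieee;
use ieee.std_logic_1164.all;
use work.ncl_signals.all;
-- The start of this listing is missing; the library clauses and the first
-- port lines below are assumed from the mult4x4 component declaration in A.7.
entity mult4x4 is
  port(x, y: in dual_rail_logic_vector (3 downto 0);
       ki, reset: in std_logic;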
       s: out dual_rail_logic_vector (7 downto 0);
       ko: out std_logic);
end mult4x4;

architecture BEHAVIOR of mult4x4 is
  signal data_in, di1, di2, do1, do2: dual_rail_logic_vector(7 downto 0);
  signal kod: std_logic_vector(7 downto 0);
  signal ko1, ko2, ki1, ki2, kot, s1, s2: std_logic;
  component mult4x4_1n
    port(x, y: IN dual_rail_logic_vector (3 downto 0);
         ki, reset: IN std_logic;
         s: OUT dual_rail_logic_vector (7 downto 0);
         ko: OUT std_logic);
  end component;
  component dmux
    generic(width: in integer := 1);
    port(a: IN dual_rail_logic_vector(width-1 downto 0);
         rst, ki1, ki2: IN std_logic;
         s1, s2: IN std_logic;
         z1, z2: OUT dual_rail_logic_vector(width-1 downto 0);
         ko: OUT std_logic_vector(width-1 downto 0));
  end component;
  component mux
    generic(width: in integer := 1);
    port(a1, a2: IN dual_rail_logic_vector(width-1 downto 0);
         z: OUT dual_rail_logic_vector(width-1 downto 0));
  end component;
  component comp8a
    port(a: IN std_logic_vector(7 downto 0);
         z: OUT std_logic);
  end component;
  component selct
    port(ki, rst: IN std_logic;
         s1, s2: OUT std_logic);
  end component;
begin
  data_in <= x & y;
  -- Steer the inputs alternately to the two multiplier copies
  DEMUX_INPUT: dmux
    generic map(8)
    port map(data_in, reset, ko1, ko2, s1, s2, di1, di2, kod);
  COMP: comp8a
    port map(kod, kot);
  ko <= kot;
  SELECT_INPUT: selct
    port map(kot, reset, s1, s2);
  -- Duplicated circuits (circuit #1 and circuit #2)
  COMB1: mult4x4_1n
    port map(di1(7 downto 4), di1(3 downto 0), ki1, reset, do1, ko1);
  COMB2: mult4x4_1n
    port map(di2(7 downto 4), di2(3 downto 0), ki2, reset, do2, ko2);
  -- Merge the two result streams and sequence the output requests
  MUX_OUTPUT: mux
    generic map(8)
    port map(do1, do2, s);
  SELECT_OUTPUT: selct
    port map(ki, reset, ki1, ki2);
end BEHAVIOR;

A.5 mux_gen.vhd

library ieee;
use ieee.std_logic_1164.all;
use work.ncl_signals.all;

entity mux is
  generic(width: in integer := 1);
  port(a1, a2: IN dual_rail_logic_vector(width-1 downto 0);
       z: OUT dual_rail_logic_vector(width-1 downto 0));
end mux;

architecture arch of mux is
  component th12x0
    port(a: IN std_logic;
         b: IN std_logic;
         z: OUT std_logic);
  end component;
begin
  -- Per bit, combine the matching rails of the two inputs with a TH12 gate
  struct: for i in a1'range generate
    comp0: th12x0
      port map(a1(i).rail0, a2(i).rail0, z(i).rail0);
    comp1: th12x0
      port map(a1(i).rail1, a2(i).rail1, z(i).rail1);
  end generate struct;
end arch;
A.6 select.vhd
library ieee;
use ieee.std_logic_1164.all;

entity selct is
  port(ki, rst: IN std_logic;
       s1, s2: OUT std_logic);
end selct;

architecture arch of selct is
  signal d0, d1, d2, d3, r0, r1, r2, r3: std_logic;
  component th33nx0
    port(a: in std_logic;
         b: in std_logic;
         c: in std_logic;
         rst: in std_logic;
         z: out std_logic);
  end component;
  component th33dx0
    port(a: in std_logic;
         b: in std_logic;
         c: in std_logic;
         rst: in std_logic;
         z: out std_logic);
  end component;
  component invx0
    port(i: in std_logic;
         zb: out std_logic);
  end component;
begin
  -- Four-gate ring; g1 is the only th33dx0 instance
  g0: th33nx0
    port map(ki, d3, r1, rst, d0);
  g1: th33dx0
    port map(ki, d0, r2, rst, d1);
  g2: th33nx0
    port map(ki, d1, r3, rst, d2);
  g3: th33nx0
    port map(ki, d2, r0, rst, d3);
  i0: invx0
    port map(d0, r0);
  i1: invx0
    port map(d1, r1);
  i2: invx0
    port map(d2, r2);
  i3: invx0
    port map(d3, r3);
  s1 <= d2;
  s2 <= d0;
end arch;
A.7 tb_mult4x4_full.vhd
Library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_unsigned.all;
use work.ncl_signals.all;
use work.dual_rail.all;
use work.functions.all;
use std.textio.all;

entity TB_MULT4x4 is
end;

architecture TESTBENCH of TB_MULT4x4 is
  signal x, y, x_temp, x_next, y_temp, y_next: DUAL_RAIL_LOGIC_VECTOR(3 downto 0);
  signal s: DUAL_RAIL_LOGIC_VECTOR(7 downto 0);
  signal xy_calc: std_logic_vector(7 downto 0) := "00000000";
  signal ki, ko, reset: STD_LOGIC;
  signal cx, cy: DUAL_RAIL_LOGIC_VECTOR(3 downto 2);
  signal incorrect: std_logic := '0';
  type output_array is array(0 to 256) of std_logic_vector(7 downto 0);
  signal s_calc_array: output_array;
  component mult4x4 -- _1n
    port(x, y: in DUAL_RAIL_LOGIC_VECTOR(3 downto 0);
         ki, reset: in STD_LOGIC;
         s: out DUAL_RAIL_LOGIC_VECTOR(7 downto 0);
         ko: out STD_LOGIC);
  end component;
begin
  UUT: MULT4x4 -- _1n
    port map(x, y, ki, reset, s, ko);

  -- Precompute the expected product for every input combination
  CALC_ANSWER: process
  begin
    for i in 0 to 256 loop
      s_calc_array(i) <= xy_calc(7 downto 4) * xy_calc(3 downto 0);
      xy_calc <= xy_calc + '1';
      wait for 0 ns;
    end loop;
    wait;
  end process;

  -- Drive x and y through every input combination, with NULL between DATA
  INPUTS: process
  begin
    -- reset <= '0';
    reset <= '1';
    wait until ko'event and ko = '1';
    reset <= '0';
    -- First DATA wavefront: x = 0, y = 0
    x(0).rail0 <= '1'; x(0).rail1 <= '0';
    x(1).rail0 <= '1'; x(1).rail1 <= '0';
    x(2).rail0 <= '1'; x(2).rail1 <= '0';
    x(3).rail0 <= '1'; x(3).rail1 <= '0';
    y(0).rail0 <= '1'; y(0).rail1 <= '0';
    y(1).rail0 <= '1'; y(1).rail1 <= '0';
    y(2).rail0 <= '1'; y(2).rail1 <= '0';
    y(3).rail0 <= '1'; y(3).rail1 <= '0';
    wait for 0 ns;
    while (x(3).rail1 = '0' or x(2).rail1 = '0' or x(1).rail1 = '0' or x(0).rail1 = '0')
    loop
      wait until ko'event and ko = '0';
      cy(2) <= y(0) and y(1);
      cy(3) <= y(0) and y(1) and y(2);
      cx(2) <= x(0) and x(1);
      cx(3) <= x(0) and x(1) and x(2);
      wait for 0 ns;
      -- Compute the next y (incremented) and the next x (used when y wraps)
      y_temp <= y;
      y_next(0) <= not(y(0));
      y_next(1) <= y(1) xor y(0);
      y_next(2) <= cy(2) xor y(2);
      y_next(3) <= cy(3) xor y(3);
      x_temp <= x;
      x_next(0) <= not(x(0));
      x_next(1) <= x(1) xor x(0);
      x_next(2) <= cx(2) xor x(2);
      x_next(3) <= cx(3) xor x(3);
      -- NULL wavefront
      x(0).rail0 <= '0'; x(0).rail1 <= '0';
      x(1).rail0 <= '0'; x(1).rail1 <= '0';
      x(2).rail0 <= '0'; x(2).rail1 <= '0';
      x(3).rail0 <= '0'; x(3).rail1 <= '0';
      y(0).rail0 <= '0'; y(0).rail1 <= '0';
      y(1).rail0 <= '0'; y(1).rail1 <= '0';
      y(2).rail0 <= '0'; y(2).rail1 <= '0';
      y(3).rail0 <= '0'; y(3).rail1 <= '0';
      wait until ko'event and ko = '1';
      if (y_temp(3).rail1 = '1' and y_temp(2).rail1 = '1' and
          y_temp(1).rail1 = '1' and y_temp(0).rail1 = '1') then
        x <= x_next;
      else
        x <= x_temp;
      end if;
      y <= y_next;
      wait for 0 ns;
    end loop;
    while (x(3).rail1 = '1' or x(2).rail1 = '1' or x(1).rail1 = '1' or x(0).rail1 = '1')
    loop
      wait until ko'event and ko = '0';
      cy(2) <= y(0) and y(1);
      cy(3) <= y(0) and y(1) and y(2);
      cx(2) <= x(0) and x(1);
      cx(3) <= x(0) and x(1) and x(2);
      wait for 0 ns;
      y_temp <= y;
      y_next(0) <= not(y(0));
      y_next(1) <= y(1) xor y(0);
      y_next(2) <= cy(2) xor y(2);
      y_next(3) <= cy(3) xor y(3);
      x_temp <= x;
      x_next(0) <= not(x(0));
      x_next(1) <= x(1) xor x(0);
      x_next(2) <= cx(2) xor x(2);
      x_next(3) <= cx(3) xor x(3);
      -- NULL wavefront
      x(0).rail0 <= '0'; x(0).rail1 <= '0';
      x(1).rail0 <= '0'; x(1).rail1 <= '0';
      x(2).rail0 <= '0'; x(2).rail1 <= '0';
      x(3).rail0 <= '0'; x(3).rail1 <= '0';
      y(0).rail0 <= '0'; y(0).rail1 <= '0';
      y(1).rail0 <= '0'; y(1).rail1 <= '0';
      y(2).rail0 <= '0'; y(2).rail1 <= '0';
      y(3).rail0 <= '0'; y(3).rail1 <= '0';
      wait until ko'event and ko = '1';
      if (y_temp(3).rail1 = '1' and y_temp(2).rail1 = '1' and y_temp(1).rail1 = '1'
          and y_temp(0).rail1 = '1') then
        x <= x_next;
      else
        x <= x_temp;
      end if;
      y <= y_next;
      wait for 0 ns;
    end loop;
    wait;
  end process;

  -- Check each result and log the time between successive DATA outputs
  OUTPUTS: process
    variable ss: side;
    variable bw: width := 6;
    variable l: line;
    variable tm: time;
    variable once: bit := '0';
    file t: text is out "/home/bsparkma/Senior_Thesis/ModelSim/t.txt";
    variable j: integer := 0;
  begin
    while (true) loop
      ki <= '1';
end TESTBENCH;

configuration CFG_TB_MULT4x4 ofTB_MULT4x4 is
for TESTBENCH
for UUT: MULT4x4 -- In
end for;{Library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_unsigned.all;
use work.ncl_signals.all;
use work.dual_rail.all;
use work.functions.all;
use std.textio.all;
entity TB_MULT4x4 is
end;

architecture TESTBENCH ofTB MULT4x4 is
signal x, y, x_temp, x_next, y_temp, y_next: DUAL_RAIL_LOGIC_VECTOR(3
downto 0);
signals: DUAL_RAIL_LOGIC_VECTOR(? downto 0);
signal xy_calc: std_logic_vector(7 downto 0) := "00000000";
signal ki, ko, reset: STD_LOGIC;
signal ex, cy: DUAL_RAIL_LOGIC_VECTOR(3 downto 2);
signal incorrect: std_logic := '0';
type output_array is array(O to 256) of std_logic_vector(? downto 0);
signal s_calc_array: output_array;
component mult4x4 --_In
port(x, y: in DUAL_RAIL_LOGIC_VECTOR(3 downto 0);
ki, reset: in STD_LOGIC;
s: out DUAL_RAIL_LOGIC_VECTOR(? downto 0);
ko: out STD_LOGIC);
end component;
41

begin
UUT: MULT4x4 -- 1n
port map(x, y, ki, reset, s, ko );
CALC_ANSWER: process
begin
fori in 0 to 256loop
s_calc_array(i) <= xy_calc(7 downto 4)
xy_calc <= xy_calc + '1';
wait for 0 ns;
end loop;
wait;
end process;

* xy_calc(3 downto 0);

INPUTS: process
begin
--reset <= '0';
reset<= '1 ';
wait until ko'event and ko = '1 ';
reset<= '0';
x(O).railO <= '1 ';
x(O).rail1 <= '0';
x(l ).railO <= '1 ';
x(l ).rail1 <= '0';
x(2).rail0 <= '1 ';
x(2).raill <= '0';
x(3).rail0 <= '1 ';
x(3).raill <= '0';
y(O).railO <= '1 ';
y(O).rail1 <= '0';
y(1).rai10 <= '1';
y(1 ).raill <= '0';
y(2).rai10 <= '1 ';
y(2).raill <= '0';
y(3).rail0 <= '1 ';
y(3).raill <= '0';
wait for 0 ns;
while (x(3).raill = '0' or x(2).raill = '0' or x(l ).raill = '0' or x(O).raill = '0')
loop
wait until ko'event and ko = '0';

42

cy(2) <= y(O) and y(l);
cy(3) <= y(O) and y(l) and y(2);
cx(2) <= x(O) and x(l);
cx(3) <= x(O) and x(l) and x(2);
wait for 0 ns;
y_temp <=y;
y_next(O) <= not(y(O));
y_next(l) <= y(l) xor y(O);
y_ next(2) <= cy(2) xor y(2);
y_next(3) <= cy(3) xor y(3);
x_temp <= x;
x_next(O) <= not(x(O));
x_next(l) <= x(l) xor x(O);
x_next(2) <= cx(2) xor x(2);
x_next(3) <= cx(3) xor x(3);
x(O).railO <= '0';
x(O).raill <= '0';
x(l).railO <= '0';
x(1 ).raill <= '0';
x(2).rail0 <= '0';
x(2).rail1 <= '0';
x(3).rail0 <= '0';
x(3).rail1 <= '0';
y(O).railO <= '0';
y(O).raill <= '0';
y(1).rail0 <= '0';
y(1 ).raill <= '0';
y(2).rail0 <= '0';
y(2).raill <= '0';
y(3).rail0 <= '0';
y(3).raill <= '0';
wait until ko'event and ko = '1 ';
if (y_temp(3).raill = '1' and
y_temp(1).raill = '1' andy_temp(O).raill = '1 ')then
x <= x_next;
else
x <= x_temp;
end if;
y <= y_next;
wait for 0 ns;
end loop;

43

y_temp(2).raill

=

'1'

and

while (x(3).rail1 = '1' or x(2).rail1 = '1' or x(1).rail1 = '1' or x(0).rail1 = '1')
loop
wait until ko'event and ko = '0';
cy(2) <= y(0) and y(1);
cy(3) <= y(0) and y(1) and y(2);
cx(2) <= x(0) and x(1);
cx(3) <= x(0) and x(1) and x(2);
wait for 0 ns;
y_temp <= y;
y_next(0) <= not(y(0));
y_next(1) <= y(1) xor y(0);
y_next(2) <= cy(2) xor y(2);
y_next(3) <= cy(3) xor y(3);
x_temp <= x;
x_next(0) <= not(x(0));
x_next(1) <= x(1) xor x(0);
x_next(2) <= cx(2) xor x(2);
x_next(3) <= cx(3) xor x(3);
x(0).rail0 <= '0';
x(0).rail1 <= '0';
x(1).rail0 <= '0';
x(1).rail1 <= '0';
x(2).rail0 <= '0';
x(2).rail1 <= '0';
x(3).rail0 <= '0';
x(3).rail1 <= '0';
y(0).rail0 <= '0';
y(0).rail1 <= '0';
y(1).rail0 <= '0';
y(1).rail1 <= '0';
y(2).rail0 <= '0';
y(2).rail1 <= '0';
y(3).rail0 <= '0';
y(3).rail1 <= '0';
wait until ko'event and ko = '1';
if (y_temp(3).rail1 = '1' and y_temp(2).rail1 = '1' and y_temp(1).rail1 = '1'
and y_temp(0).rail1 = '1') then
x <= x_next;
else
x <= x_temp;
end if;
y <= y_next;
wait for 0 ns;
end loop;
wait;
end process;
OUTPUTS: process
variable ss: side;
variable bw: width := 6;
variable l: line;
variable tm: time;
variable once: bit := '0';
file t: text is out "/home/bsparkma/Senior_Thesis/ModelSim/t.txt";
variable j: integer := 0;
begin
ki.rail0 <= '0';
while (true) loop
ki <= '1';
wait until s'event and is_data(s);
for i in 0 to 7 loop
if (s(i).rail1 /= s_calc_array(j)(i)) then
incorrect <= '1';
end if;
end loop;
if (once = '0') then
once := '1';
else
write(l, now - tm, ss, bw);
writeline(t, l);
end if;
tm := now;
ki <= '0';
wait until s'event and is_null(s);
j := j + 1;
end loop;
end process;
end TESTBENCH;

configuration CFG_TB_MULT4x4 of TB_MULT4x4 is
for TESTBENCH
for UUT: MULT4x4 --_1n
end for;
end for;
end;

library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_unsigned.all;
use work.ncl_signals.all;
use work.dual_rail.all;
use work.functions.all;
use std.textio.all;
entity TB_MULT4x4 is
end;

architecture TESTBENCH of TB_MULT4x4 is
signal x, y, x_temp, x_next, y_temp, y_next: DUAL_RAIL_LOGIC_VECTOR(3 downto 0);
signal s: DUAL_RAIL_LOGIC_VECTOR(7 downto 0);
signal xy_calc: std_logic_vector(7 downto 0) := "00000000";
signal ki, ko, reset: STD_LOGIC;
signal cx, cy: DUAL_RAIL_LOGIC_VECTOR(3 downto 2);
signal incorrect: std_logic := '0';
type output_array is array(0 to 256) of std_logic_vector(7 downto 0);
signal s_calc_array: output_array;
component mult4x4 --_1n
port(x, y: in DUAL_RAIL_LOGIC_VECTOR(3 downto 0);
ki, reset: in STD_LOGIC;
s: out DUAL_RAIL_LOGIC_VECTOR(7 downto 0);
ko: out STD_LOGIC);
end component;
begin
UUT: MULT4x4 --_1n
port map(x, y, ki, reset, s, ko);
CALC_ANSWER: process
begin
for i in 0 to 256 loop
s_calc_array(i) <= xy_calc(7 downto 4) * xy_calc(3 downto 0);
xy_calc <= xy_calc + '1';
wait for 0 ns;
end loop;
wait;
end process;

INPUTS: process
begin
--reset <= '0';
reset <= '1';
wait until ko'event and ko = '1';
reset <= '0';
x(0).rail0 <= '1';
x(0).rail1 <= '0';
x(1).rail0 <= '1';
x(1).rail1 <= '0';
x(2).rail0 <= '1';
x(2).rail1 <= '0';
x(3).rail0 <= '1';
x(3).rail1 <= '0';
y(0).rail0 <= '1';
y(0).rail1 <= '0';
y(1).rail0 <= '1';
y(1).rail1 <= '0';
y(2).rail0 <= '1';
y(2).rail1 <= '0';
y(3).rail0 <= '1';
y(3).rail1 <= '0';
wait for 0 ns;
while (x(3).rail1 = '0' or x(2).rail1 = '0' or x(1).rail1 = '0' or x(0).rail1 = '0')
loop
wait until ko'event and ko = '0';
cy(2) <= y(0) and y(1);
cy(3) <= y(0) and y(1) and y(2);
cx(2) <= x(0) and x(1);
cx(3) <= x(0) and x(1) and x(2);
wait for 0 ns;
y_temp <= y;
y_next(0) <= not(y(0));
y_next(1) <= y(1) xor y(0);
y_next(2) <= cy(2) xor y(2);
y_next(3) <= cy(3) xor y(3);
x_temp <= x;
x_next(0) <= not(x(0));
x_next(1) <= x(1) xor x(0);
x_next(2) <= cx(2) xor x(2);
x_next(3) <= cx(3) xor x(3);
x(0).rail0 <= '0';
x(0).rail1 <= '0';
x(1).rail0 <= '0';
x(1).rail1 <= '0';
x(2).rail0 <= '0';
x(2).rail1 <= '0';
x(3).rail0 <= '0';
x(3).rail1 <= '0';
y(0).rail0 <= '0';
y(0).rail1 <= '0';
y(1).rail0 <= '0';
y(1).rail1 <= '0';
y(2).rail0 <= '0';
y(2).rail1 <= '0';
y(3).rail0 <= '0';
y(3).rail1 <= '0';
wait until ko'event and ko = '1';
if (y_temp(3).rail1 = '1' and y_temp(2).rail1 = '1' and
y_temp(1).rail1 = '1' and y_temp(0).rail1 = '1') then
x <= x_next;
else
x <= x_temp;
end if;
y <= y_next;
wait for 0 ns;
end loop;
while (x(3).rail1 = '1' or x(2).rail1 = '1' or x(1).rail1 = '1' or x(0).rail1 = '1')
loop
wait until ko'event and ko = '0';
cy(2) <= y(0) and y(1);
cy(3) <= y(0) and y(1) and y(2);
cx(2) <= x(0) and x(1);
cx(3) <= x(0) and x(1) and x(2);
wait for 0 ns;
y_temp <= y;
y_next(0) <= not(y(0));
y_next(1) <= y(1) xor y(0);
y_next(2) <= cy(2) xor y(2);
y_next(3) <= cy(3) xor y(3);
x_temp <= x;
x_next(0) <= not(x(0));
x_next(1) <= x(1) xor x(0);
x_next(2) <= cx(2) xor x(2);
x_next(3) <= cx(3) xor x(3);
x(0).rail0 <= '0';
x(0).rail1 <= '0';
x(1).rail0 <= '0';
x(1).rail1 <= '0';
x(2).rail0 <= '0';
x(2).rail1 <= '0';
x(3).rail0 <= '0';
x(3).rail1 <= '0';
y(0).rail0 <= '0';
y(0).rail1 <= '0';
y(1).rail0 <= '0';
y(1).rail1 <= '0';
y(2).rail0 <= '0';
y(2).rail1 <= '0';
y(3).rail0 <= '0';
y(3).rail1 <= '0';
wait until ko'event and ko = '1';
if (y_temp(3).rail1 = '1' and y_temp(2).rail1 = '1' and y_temp(1).rail1 = '1'
and y_temp(0).rail1 = '1') then
x <= x_next;
else
x <= x_temp;
end if;
y <= y_next;
wait for 0 ns;
end loop;
wait;
end process;
OUTPUTS: process
variable ss: side;
variable bw: width := 6;
variable l: line;
variable tm: time;
variable once: bit := '0';
file t: text is out "/home/bsparkma/Senior_Thesis/ModelSim/t.txt";
variable j: integer := 0;
begin
ki.rail0 <= '0';
while (true) loop
ki <= '1';
wait until s'event and is_data(s);
for i in 0 to 7 loop
if (s(i).rail1 /= s_calc_array(j)(i)) then
incorrect <= '1';
end if;
end loop;
if (once = '0') then
once := '1';
else
write(l, now - tm, ss, bw);
writeline(t, l);
end if;
tm := now;
ki <= '0';
wait until s'event and is_null(s);
j := j + 1;
end loop;
end process;
end TESTBENCH;
configuration CFG_TB_MULT4x4 of TB_MULT4x4 is
for TESTBENCH
for UUT: MULT4x4 --_1n
end for;
end for;
end;

B. VERILOG FILE
module mult4x4 ( vdd, gnd,
    x_3_RAIL1, x_3_RAIL0, x_2_RAIL1, x_2_RAIL0,
    x_1_RAIL1, x_1_RAIL0, x_0_RAIL1, x_0_RAIL0,
    y_3_RAIL1, y_3_RAIL0, y_2_RAIL1, y_2_RAIL0,
    y_1_RAIL1, y_1_RAIL0, y_0_RAIL1, y_0_RAIL0,
    ki, reset,
    s_7_RAIL1, s_7_RAIL0, s_6_RAIL1, s_6_RAIL0,
    s_5_RAIL1, s_5_RAIL0, s_4_RAIL1, s_4_RAIL0,
    s_3_RAIL1, s_3_RAIL0, s_2_RAIL1, s_2_RAIL0,
    s_1_RAIL1, s_1_RAIL0, s_0_RAIL1, s_0_RAIL0,
    ko);
inout vdd, gnd;
input x_3_RAIL1;
input x_3_RAIL0;
input x_2_RAIL1;
input x_2_RAIL0;
input x_1_RAIL1;
input x_1_RAIL0;
input x_0_RAIL1;
input x_0_RAIL0;
input y_3_RAIL1;
input y_3_RAIL0;
input y_2_RAIL1;
input y_2_RAIL0;
input y_1_RAIL1;
input y_1_RAIL0;
input y_0_RAIL1;
input y_0_RAIL0;
input ki;
input reset;
output s_7_RAIL1;
output s_7_RAIL0;
output s_6_RAIL1;
output s_6_RAIL0;
output s_5_RAIL1;
output s_5_RAIL0;
output s_4_RAIL1;
output s_4_RAIL0;
output s_3_RAIL1;
output s_3_RAIL0;
output s_2_RAIL1;
output s_2_RAIL0;
output s_1_RAIL1;
output s_1_RAIL0;
output s_0_RAIL1;
output s_0_RAIL0;
output ko;
wire
    di1_7_RAIL1, di1_7_RAIL0, di1_6_RAIL1, di1_6_RAIL0,
    di1_5_RAIL1, di1_5_RAIL0, di1_4_RAIL1, di1_4_RAIL0,
    di1_3_RAIL1, di1_3_RAIL0, di1_2_RAIL1, di1_2_RAIL0,
    di1_1_RAIL1, di1_1_RAIL0, di1_0_RAIL1, di1_0_RAIL0,
    di2_7_RAIL1, di2_7_RAIL0, di2_6_RAIL1, di2_6_RAIL0,
    di2_5_RAIL1, di2_5_RAIL0, di2_4_RAIL1, di2_4_RAIL0,
    di2_3_RAIL1, di2_3_RAIL0, di2_2_RAIL1, di2_2_RAIL0,
    di2_1_RAIL1, di2_1_RAIL0, di2_0_RAIL1, di2_0_RAIL0,
    do1_7_RAIL1, do1_7_RAIL0, do1_6_RAIL1, do1_6_RAIL0,
    do1_5_RAIL1, do1_5_RAIL0, do1_4_RAIL1, do1_4_RAIL0,
    do1_3_RAIL1, do1_3_RAIL0, do1_2_RAIL1, do1_2_RAIL0,
    do1_1_RAIL1, do1_1_RAIL0, do1_0_RAIL1, do1_0_RAIL0,
    do2_7_RAIL1, do2_7_RAIL0;
wire
    do2_6_RAIL1, do2_6_RAIL0, do2_5_RAIL1, do2_5_RAIL0,
    do2_4_RAIL1, do2_4_RAIL0, do2_3_RAIL1, do2_3_RAIL0,
    do2_2_RAIL1, do2_2_RAIL0, do2_1_RAIL1, do2_1_RAIL0,
    do2_0_RAIL1, do2_0_RAIL0,
    kod_7, kod_6, kod_5, kod_4, kod_3, kod_2, kod_1, kod_0,
    ko1, ko2, ki1, ki2, s1, s2,
    COMP_11_1, COMP_11_2,
    SELECT_INPUT_d1, SELECT_INPUT_d3,
    SELECT_INPUT_r0, SELECT_INPUT_r1, SELECT_INPUT_r2, SELECT_INPUT_r3,
    SELECT_OUTPUT_d1, SELECT_OUTPUT_d3,
    SELECT_OUTPUT_r0, SELECT_OUTPUT_r1, SELECT_OUTPUT_r2, SELECT_OUTPUT_r3;
//endofwire
wire buffwire0_0ko, buffwire1_1kos00seq;
wire buffwire0_0reset, buffwire1_0reset, buffwire2_1reset, buffwire3_2reset, buffwire2_1resets00seq;
wire buffwire0_0s2, buffwire1_1s2s00seq;
wire buffwire0_0s1, buffwire1_1s1s00seq;
wire buffwire0_0ki, buffwire1_1kis00seq;
wire buffwire0_0ko1, buffwire1_1ko1s00seq;
wire buffwire0_0ko2, buffwire1_1ko2s00seq;
mult4x4_1n COMB1 ( .vdd(vdd), .gnd(gnd),
    .x_3_RAIL1 (di1_7_RAIL1), .x_3_RAIL0 (di1_7_RAIL0),
    .x_2_RAIL1 (di1_6_RAIL1), .x_2_RAIL0 (di1_6_RAIL0),
    .x_1_RAIL1 (di1_5_RAIL1), .x_1_RAIL0 (di1_5_RAIL0),
    .x_0_RAIL1 (di1_4_RAIL1), .x_0_RAIL0 (di1_4_RAIL0),
    .y_3_RAIL1 (di1_3_RAIL1), .y_3_RAIL0 (di1_3_RAIL0),
    .y_2_RAIL1 (di1_2_RAIL1), .y_2_RAIL0 (di1_2_RAIL0),
    .y_1_RAIL1 (di1_1_RAIL1), .y_1_RAIL0 (di1_1_RAIL0),
    .y_0_RAIL1 (di1_0_RAIL1), .y_0_RAIL0 (di1_0_RAIL0),
    .ki (ki1), .reset (buffwire0_0reset),
    .s_7_RAIL1 (do1_7_RAIL1), .s_7_RAIL0 (do1_7_RAIL0),
    .s_6_RAIL1 (do1_6_RAIL1), .s_6_RAIL0 (do1_6_RAIL0),
    .s_5_RAIL1 (do1_5_RAIL1), .s_5_RAIL0 (do1_5_RAIL0),
    .s_4_RAIL1 (do1_4_RAIL1), .s_4_RAIL0 (do1_4_RAIL0),
    .s_3_RAIL1 (do1_3_RAIL1), .s_3_RAIL0 (do1_3_RAIL0),
    .s_2_RAIL1 (do1_2_RAIL1), .s_2_RAIL0 (do1_2_RAIL0),
    .s_1_RAIL1 (do1_1_RAIL1), .s_1_RAIL0 (do1_1_RAIL0),
    .s_0_RAIL1 (do1_0_RAIL1), .s_0_RAIL0 (do1_0_RAIL0),
    .ko (ko1));
mult4x4_1n COMB2 ( .vdd(vdd), .gnd(gnd),
    .x_3_RAIL1 (di2_7_RAIL1), .x_3_RAIL0 (di2_7_RAIL0),
    .x_2_RAIL1 (di2_6_RAIL1), .x_2_RAIL0 (di2_6_RAIL0),
    .x_1_RAIL1 (di2_5_RAIL1), .x_1_RAIL0 (di2_5_RAIL0),
    .x_0_RAIL1 (di2_4_RAIL1), .x_0_RAIL0 (di2_4_RAIL0),
    .y_3_RAIL1 (di2_3_RAIL1), .y_3_RAIL0 (di2_3_RAIL0),
    .y_2_RAIL1 (di2_2_RAIL1), .y_2_RAIL0 (di2_2_RAIL0),
    .y_1_RAIL1 (di2_1_RAIL1), .y_1_RAIL0 (di2_1_RAIL0),
    .y_0_RAIL1 (di2_0_RAIL1), .y_0_RAIL0 (di2_0_RAIL0),
    .ki (ki2), .reset (buffwire0_0reset),
    .s_7_RAIL1 (do2_7_RAIL1), .s_7_RAIL0 (do2_7_RAIL0),
    .s_6_RAIL1 (do2_6_RAIL1), .s_6_RAIL0 (do2_6_RAIL0),
    .s_5_RAIL1 (do2_5_RAIL1), .s_5_RAIL0 (do2_5_RAIL0),
    .s_4_RAIL1 (do2_4_RAIL1), .s_4_RAIL0 (do2_4_RAIL0),
    .s_3_RAIL1 (do2_3_RAIL1), .s_3_RAIL0 (do2_3_RAIL0),
    .s_2_RAIL1 (do2_2_RAIL1), .s_2_RAIL0 (do2_2_RAIL0),
    .s_1_RAIL1 (do2_1_RAIL1), .s_1_RAIL0 (do2_1_RAIL0),
    .s_0_RAIL1 (do2_0_RAIL1), .s_0_RAIL0 (do2_0_RAIL0),
    .ko (ko2));
th44x0 COMP_g0 ( .vdd(vdd), .gnd(gnd), .a (kod_0), .b (kod_1), .c (kod_2), .d (kod_3),
    .z (COMP_11_1));
th44x0 COMP_g1 ( .vdd(vdd), .gnd(gnd), .a (kod_4), .b (kod_5), .c (kod_6), .d (kod_7),
    .z (COMP_11_2));
th22x0 COMP_g2 ( .vdd(vdd), .gnd(gnd), .a (COMP_11_1), .b (COMP_11_2), .z (ko));
th33nx0 SELECT_INPUT_g0 ( .vdd(vdd), .gnd(gnd), .a (buffwire0_0ko), .b (SELECT_INPUT_d3),
    .c (SELECT_INPUT_r1), .rst (buffwire0_0reset), .z (s2));
th33dx0 SELECT_INPUT_g1 ( .vdd(vdd), .gnd(gnd), .a (buffwire0_0ko), .b (buffwire0_0s2),
    .c (SELECT_INPUT_r2), .rst (buffwire0_0reset), .z (SELECT_INPUT_d1));
th33nx0 SELECT_INPUT_g2 ( .vdd(vdd), .gnd(gnd), .a (buffwire0_0ko), .b (SELECT_INPUT_d1),
    .c (SELECT_INPUT_r3), .rst (buffwire0_0reset), .z (s1));
th33nx0 SELECT_INPUT_g3 ( .vdd(vdd), .gnd(gnd), .a (buffwire0_0ko), .b (buffwire0_0s1),
    .c (SELECT_INPUT_r0), .rst (buffwire0_0reset), .z (SELECT_INPUT_d3));
invx0 SELECT_INPUT_i0 ( .vdd(vdd), .gnd(gnd), .i (buffwire0_0s2), .zb (SELECT_INPUT_r0));
invx0 SELECT_INPUT_i1 ( .vdd(vdd), .gnd(gnd), .i (SELECT_INPUT_d1), .zb (SELECT_INPUT_r1));
invx0 SELECT_INPUT_i2 ( .vdd(vdd), .gnd(gnd), .i (buffwire0_0s1), .zb (SELECT_INPUT_r2));
invx0 SELECT_INPUT_i3 ( .vdd(vdd), .gnd(gnd), .i (SELECT_INPUT_d3), .zb (SELECT_INPUT_r3));
th12x0 MUX_OUTPUT_struct_7_comp0 ( .vdd(vdd), .gnd(gnd), .a (do1_7_RAIL0), .b (do2_7_RAIL0),
    .z (s_7_RAIL0));
th12x0 MUX_OUTPUT_struct_7_comp1 ( .vdd(vdd), .gnd(gnd), .a (do1_7_RAIL1), .b (do2_7_RAIL1),
    .z (s_7_RAIL1));
th12x0 MUX_OUTPUT_struct_6_comp0 ( .vdd(vdd), .gnd(gnd), .a (do1_6_RAIL0), .b (do2_6_RAIL0),
    .z (s_6_RAIL0));
th12x0 MUX_OUTPUT_struct_6_comp1 ( .vdd(vdd), .gnd(gnd), .a (do1_6_RAIL1), .b (do2_6_RAIL1),
    .z (s_6_RAIL1));
th12x0 MUX_OUTPUT_struct_5_comp0 ( .vdd(vdd), .gnd(gnd), .a (do1_5_RAIL0), .b (do2_5_RAIL0),
    .z (s_5_RAIL0));
th12x0 MUX_OUTPUT_struct_5_comp1 ( .vdd(vdd), .gnd(gnd), .a (do1_5_RAIL1), .b (do2_5_RAIL1),
    .z (s_5_RAIL1));
th12x0 MUX_OUTPUT_struct_4_comp0 ( .vdd(vdd), .gnd(gnd), .a (do1_4_RAIL0), .b (do2_4_RAIL0),
    .z (s_4_RAIL0));
th12x0 MUX_OUTPUT_struct_4_comp1 ( .vdd(vdd), .gnd(gnd), .a (do1_4_RAIL1), .b (do2_4_RAIL1),
    .z (s_4_RAIL1));
th12x0 MUX_OUTPUT_struct_3_comp0 ( .vdd(vdd), .gnd(gnd), .a (do1_3_RAIL0), .b (do2_3_RAIL0),
    .z (s_3_RAIL0));
th12x0 MUX_OUTPUT_struct_3_comp1 ( .vdd(vdd), .gnd(gnd), .a (do1_3_RAIL1), .b (do2_3_RAIL1),
    .z (s_3_RAIL1));
th12x0 MUX_OUTPUT_struct_2_comp0 ( .vdd(vdd), .gnd(gnd), .a (do1_2_RAIL0), .b (do2_2_RAIL0),
    .z (s_2_RAIL0));
th12x0 MUX_OUTPUT_struct_2_comp1 ( .vdd(vdd), .gnd(gnd), .a (do1_2_RAIL1), .b (do2_2_RAIL1),
    .z (s_2_RAIL1));
th12x0 MUX_OUTPUT_struct_1_comp0 ( .vdd(vdd), .gnd(gnd), .a (do1_1_RAIL0), .b (do2_1_RAIL0),
    .z (s_1_RAIL0));
th12x0 MUX_OUTPUT_struct_1_comp1 ( .vdd(vdd), .gnd(gnd), .a (do1_1_RAIL1), .b (do2_1_RAIL1),
    .z (s_1_RAIL1));
th12x0 MUX_OUTPUT_struct_0_comp0 ( .vdd(vdd), .gnd(gnd), .a (do1_0_RAIL0), .b (do2_0_RAIL0),
    .z (s_0_RAIL0));
th12x0 MUX_OUTPUT_struct_0_comp1 ( .vdd(vdd), .gnd(gnd), .a (do1_0_RAIL1), .b (do2_0_RAIL1),
    .z (s_0_RAIL1));
th33nx0 SELECT_OUTPUT_g0 ( .vdd(vdd), .gnd(gnd), .a (buffwire0_0ki), .b (SELECT_OUTPUT_d3),
    .c (SELECT_OUTPUT_r1), .rst (buffwire0_0reset), .z (ki2));
th33dx0 SELECT_OUTPUT_g1 ( .vdd(vdd), .gnd(gnd), .a (buffwire0_0ki), .b (ki2),
    .c (SELECT_OUTPUT_r2), .rst (buffwire0_0reset), .z (SELECT_OUTPUT_d1));
th33nx0 SELECT_OUTPUT_g2 ( .vdd(vdd), .gnd(gnd), .a (buffwire0_0ki), .b (SELECT_OUTPUT_d1),
    .c (SELECT_OUTPUT_r3), .rst (buffwire0_0reset), .z (ki1));
th33nx0 SELECT_OUTPUT_g3 ( .vdd(vdd), .gnd(gnd), .a (buffwire0_0ki), .b (ki1),
    .c (SELECT_OUTPUT_r0), .rst (buffwire0_0reset), .z (SELECT_OUTPUT_d3));
invx0 SELECT_OUTPUT_i0 ( .vdd(vdd), .gnd(gnd), .i (ki2), .zb (SELECT_OUTPUT_r0));
invx0 SELECT_OUTPUT_i1 ( .vdd(vdd), .gnd(gnd), .i (SELECT_OUTPUT_d1), .zb (SELECT_OUTPUT_r1));
invx0 SELECT_OUTPUT_i2 ( .vdd(vdd), .gnd(gnd), .i (ki1), .zb (SELECT_OUTPUT_r2));
invx0 SELECT_OUTPUT_i3 ( .vdd(vdd), .gnd(gnd), .i (SELECT_OUTPUT_d3), .zb (SELECT_OUTPUT_r3));
th33nx0 DEMUX_INPUT_struct_7_comp_i11 ( .vdd(vdd), .gnd(gnd), .a (x_3_RAIL1), .b (buffwire0_0s1),
    .c (buffwire0_0ko1), .rst (buffwire0_0reset), .z (di1_7_RAIL1));
th33nx0 DEMUX_INPUT_struct_7_comp_i10 ( .vdd(vdd), .gnd(gnd), .a (x_3_RAIL0), .b (buffwire0_0s1),
    .c (buffwire0_0ko1), .rst (buffwire0_0reset), .z (di1_7_RAIL0));
th33nx0 DEMUX_INPUT_struct_7_comp_i21 ( .vdd(vdd), .gnd(gnd), .a (x_3_RAIL1), .b (buffwire0_0s2),
    .c (buffwire0_0ko2), .rst (buffwire0_0reset), .z (di2_7_RAIL1));
th33nx0 DEMUX_INPUT_struct_7_comp_i20 ( .vdd(vdd), .gnd(gnd), .a (x_3_RAIL0), .b (buffwire0_0s2),
    .c (buffwire0_0ko2), .rst (buffwire0_0reset), .z (di2_7_RAIL0));
th14bx0 DEMUX_INPUT_struct_7_comp_k0 ( .vdd(vdd), .gnd(gnd), .a (di1_7_RAIL1), .b (di1_7_RAIL0),
    .c (di2_7_RAIL1), .d (di2_7_RAIL0), .zb (kod_7));
th33nx0 DEMUX_INPUT_struct_6_comp_i11 ( .vdd(vdd), .gnd(gnd), .a (x_2_RAIL1), .b (buffwire0_0s1),
    .c (buffwire0_0ko1), .rst (buffwire0_0reset), .z (di1_6_RAIL1));
th33nx0 DEMUX_INPUT_struct_6_comp_i10 ( .vdd(vdd), .gnd(gnd), .a (x_2_RAIL0), .b (buffwire0_0s1),
    .c (buffwire0_0ko1), .rst (buffwire0_0reset), .z (di1_6_RAIL0));
th33nx0 DEMUX_INPUT_struct_6_comp_i21 ( .vdd(vdd), .gnd(gnd), .a (x_2_RAIL1), .b (buffwire0_0s2),
    .c (buffwire0_0ko2), .rst (buffwire0_0reset), .z (di2_6_RAIL1));
th33nx0 DEMUX_INPUT_struct_6_comp_i20 ( .vdd(vdd), .gnd(gnd), .a (x_2_RAIL0), .b (buffwire0_0s2),
    .c (buffwire0_0ko2), .rst (buffwire0_0reset), .z (di2_6_RAIL0));
th14bx0 DEMUX_INPUT_struct_6_comp_k0 ( .vdd(vdd), .gnd(gnd), .a (di1_6_RAIL1), .b (di1_6_RAIL0),
    .c (di2_6_RAIL1), .d (di2_6_RAIL0), .zb (kod_6));
th33nx0 DEMUX_INPUT_struct_5_comp_i11 ( .vdd(vdd), .gnd(gnd), .a (x_1_RAIL1), .b (buffwire0_0s1),
    .c (buffwire0_0ko1), .rst (buffwire0_0reset), .z (di1_5_RAIL1));
th33nx0 DEMUX_INPUT_struct_5_comp_i10 ( .vdd(vdd), .gnd(gnd), .a (x_1_RAIL0), .b (buffwire0_0s1),
    .c (buffwire0_0ko1), .rst (buffwire0_0reset), .z (di1_5_RAIL0));
th33nx0 DEMUX_INPUT_struct_5_comp_i21 ( .vdd(vdd), .gnd(gnd), .a (x_1_RAIL1), .b (buffwire0_0s2),
    .c (buffwire0_0ko2), .rst (buffwire0_0reset), .z (di2_5_RAIL1));
th33nx0 DEMUX_INPUT_struct_5_comp_i20 ( .vdd(vdd), .gnd(gnd), .a (x_1_RAIL0), .b (buffwire0_0s2),
    .c (buffwire0_0ko2), .rst (buffwire0_0reset), .z (di2_5_RAIL0));
th14bx0 DEMUX_INPUT_struct_5_comp_k0 ( .vdd(vdd), .gnd(gnd), .a (di1_5_RAIL1), .b (di1_5_RAIL0),
    .c (di2_5_RAIL1), .d (di2_5_RAIL0), .zb (kod_5));
th33nx0 DEMUX_INPUT_struct_4_comp_i11 ( .vdd(vdd), .gnd(gnd), .a (x_0_RAIL1), .b (buffwire0_0s1),
    .c (buffwire0_0ko1), .rst (buffwire0_0reset), .z (di1_4_RAIL1));
th33nx0 DEMUX_INPUT_struct_4_comp_i10 ( .vdd(vdd), .gnd(gnd), .a (x_0_RAIL0), .b (buffwire0_0s1),
    .c (buffwire0_0ko1), .rst (buffwire0_0reset), .z (di1_4_RAIL0));
th33nx0 DEMUX_INPUT_struct_4_comp_i21 ( .vdd(vdd), .gnd(gnd), .a (x_0_RAIL1), .b (buffwire0_0s2),
    .c (buffwire0_0ko2), .rst (buffwire0_0reset), .z (di2_4_RAIL1));
th33nx0 DEMUX_INPUT_struct_4_comp_i20 ( .vdd(vdd), .gnd(gnd), .a (x_0_RAIL0), .b (buffwire0_0s2),
    .c (buffwire0_0ko2), .rst (buffwire0_0reset), .z (di2_4_RAIL0));
th14bx0 DEMUX_INPUT_struct_4_comp_k0 ( .vdd(vdd), .gnd(gnd), .a (di1_4_RAIL1), .b (di1_4_RAIL0),
    .c (di2_4_RAIL1), .d (di2_4_RAIL0), .zb (kod_4));
th33nx0 DEMUX_INPUT_struct_3_comp_i11 ( .vdd(vdd), .gnd(gnd), .a (y_3_RAIL1), .b (buffwire0_0s1),
    .c (buffwire0_0ko1), .rst (buffwire0_0reset), .z (di1_3_RAIL1));
th33nx0 DEMUX_INPUT_struct_3_comp_i10 ( .vdd(vdd), .gnd(gnd), .a (y_3_RAIL0), .b (buffwire0_0s1),
    .c (buffwire0_0ko1), .rst (buffwire0_0reset), .z (di1_3_RAIL0));
th33nx0 DEMUX_INPUT_struct_3_comp_i21 ( .vdd(vdd), .gnd(gnd), .a (y_3_RAIL1), .b (buffwire0_0s2),
    .c (buffwire0_0ko2), .rst (buffwire0_0reset), .z (di2_3_RAIL1));
th33nx0 DEMUX_INPUT_struct_3_comp_i20 ( .vdd(vdd), .gnd(gnd), .a (y_3_RAIL0), .b (buffwire0_0s2),
    .c (buffwire0_0ko2), .rst (buffwire0_0reset), .z (di2_3_RAIL0));
th14bx0 DEMUX_INPUT_struct_3_comp_k0 ( .vdd(vdd), .gnd(gnd), .a (di1_3_RAIL1), .b (di1_3_RAIL0),
    .c (di2_3_RAIL1), .d (di2_3_RAIL0), .zb (kod_3));
th33nx0 DEMUX_INPUT_struct_2_comp_i11 ( .vdd(vdd), .gnd(gnd), .a (y_2_RAIL1), .b (buffwire0_0s1),
    .c (buffwire0_0ko1), .rst (buffwire1_0reset), .z (di1_2_RAIL1));
th33nx0 DEMUX_INPUT_struct_2_comp_i10 ( .vdd(vdd), .gnd(gnd), .a (y_2_RAIL0), .b (buffwire0_0s1),
    .c (buffwire0_0ko1), .rst (buffwire1_0reset), .z (di1_2_RAIL0));
th33nx0 DEMUX_INPUT_struct_2_comp_i21 ( .vdd(vdd), .gnd(gnd), .a (y_2_RAIL1), .b (buffwire0_0s2),
    .c (buffwire0_0ko2), .rst (buffwire1_0reset), .z (di2_2_RAIL1));
th33nx0 DEMUX_INPUT_struct_2_comp_i20 ( .vdd(vdd), .gnd(gnd), .a (y_2_RAIL0), .b (buffwire0_0s2),
    .c (buffwire0_0ko2), .rst (buffwire1_0reset), .z (di2_2_RAIL0));
th14bx0 DEMUX_INPUT_struct_2_comp_k0 ( .vdd(vdd), .gnd(gnd), .a (di1_2_RAIL1), .b (di1_2_RAIL0),
    .c (di2_2_RAIL1), .d (di2_2_RAIL0), .zb (kod_2));
th33nx0 DEMUX_INPUT_struct_1_comp_i11 ( .vdd(vdd), .gnd(gnd), .a (y_1_RAIL1), .b (buffwire0_0s1),
    .c (buffwire0_0ko1), .rst (buffwire1_0reset), .z (di1_1_RAIL1));
th33nx0 DEMUX_INPUT_struct_1_comp_i10 ( .vdd(vdd), .gnd(gnd), .a (y_1_RAIL0), .b (buffwire0_0s1),
    .c (buffwire0_0ko1), .rst (buffwire1_0reset), .z (di1_1_RAIL0));
th33nx0 DEMUX_INPUT_struct_1_comp_i21 ( .vdd(vdd), .gnd(gnd), .a (y_1_RAIL1), .b (buffwire0_0s2),
    .c (buffwire0_0ko2), .rst (buffwire1_0reset), .z (di2_1_RAIL1));
th33nx0 DEMUX_INPUT_struct_1_comp_i20 ( .vdd(vdd), .gnd(gnd), .a (y_1_RAIL0), .b (buffwire0_0s2),
    .c (buffwire0_0ko2), .rst (buffwire1_0reset), .z (di2_1_RAIL0));
th14bx0 DEMUX_INPUT_struct_1_comp_k0 ( .vdd(vdd), .gnd(gnd), .a (di1_1_RAIL1), .b (di1_1_RAIL0),
    .c (di2_1_RAIL1), .d (di2_1_RAIL0), .zb (kod_1));
th33nx0 DEMUX_INPUT_struct_0_comp_i11 ( .vdd(vdd), .gnd(gnd), .a (y_0_RAIL1), .b (buffwire0_0s1),
    .c (buffwire0_0ko1), .rst (buffwire1_0reset), .z (di1_0_RAIL1));
th33nx0 DEMUX_INPUT_struct_0_comp_i10 ( .vdd(vdd), .gnd(gnd), .a (y_0_RAIL0), .b (buffwire0_0s1),
    .c (buffwire0_0ko1), .rst (buffwire1_0reset), .z (di1_0_RAIL0));
th33nx0 DEMUX_INPUT_struct_0_comp_i21 ( .vdd(vdd), .gnd(gnd), .a (y_0_RAIL1), .b (buffwire0_0s2),
    .c (buffwire0_0ko2), .rst (buffwire1_0reset), .z (di2_0_RAIL1));
th33nx0 DEMUX_INPUT_struct_0_comp_i20 ( .vdd(vdd), .gnd(gnd), .a (y_0_RAIL0), .b (buffwire0_0s2),
    .c (buffwire0_0ko2), .rst (buffwire1_0reset), .z (di2_0_RAIL0));
th14bx0 DEMUX_INPUT_struct_0_comp_k0 ( .vdd(vdd), .gnd(gnd), .a (di1_0_RAIL1), .b (di1_0_RAIL0),
    .c (di2_0_RAIL1), .d (di2_0_RAIL0), .zb (kod_0));
invert_a Gbuff_ko_0 ( .vdd(vdd), .gnd(gnd), .a(buffwire1_1kos00seq), .z(buffwire0_0ko));
invert_a Gbuff_ko_1 ( .vdd(vdd), .gnd(gnd), .a(ko), .z(buffwire1_1kos00seq));
buffer_c Gbuff_reset_0 ( .vdd(vdd), .gnd(gnd), .a(buffwire2_1reset), .z(buffwire0_0reset));
invert_a Gbuff_reset_1 ( .vdd(vdd), .gnd(gnd), .a(buffwire2_1resets00seq), .z(buffwire1_0reset));
invert_a Gbuff_reset_2 ( .vdd(vdd), .gnd(gnd), .a(buffwire2_1reset), .z(buffwire2_1resets00seq));
invert_a Gbuff_reset_4 ( .vdd(vdd), .gnd(gnd), .a(buffwire3_2reset), .z(buffwire2_1reset));
invert_a Gbuff_reset_5 ( .vdd(vdd), .gnd(gnd), .a(reset), .z(buffwire3_2reset));
invert_c Gbuff_s2_0 ( .vdd(vdd), .gnd(gnd), .a(buffwire1_1s2s00seq), .z(buffwire0_0s2));
invert_a Gbuff_s2_1 ( .vdd(vdd), .gnd(gnd), .a(s2), .z(buffwire1_1s2s00seq));
invert_c Gbuff_s1_0 ( .vdd(vdd), .gnd(gnd), .a(buffwire1_1s1s00seq), .z(buffwire0_0s1));
invert_a Gbuff_s1_1 ( .vdd(vdd), .gnd(gnd), .a(s1), .z(buffwire1_1s1s00seq));
invert_a Gbuff_ki_0 ( .vdd(vdd), .gnd(gnd), .a(buffwire1_1kis00seq), .z(buffwire0_0ki));
invert_a Gbuff_ki_1 ( .vdd(vdd), .gnd(gnd), .a(ki), .z(buffwire1_1kis00seq));
invert_c Gbuff_ko1_0 ( .vdd(vdd), .gnd(gnd), .a(buffwire1_1ko1s00seq), .z(buffwire0_0ko1));
invert_a Gbuff_ko1_1 ( .vdd(vdd), .gnd(gnd), .a(ko1), .z(buffwire1_1ko1s00seq));
invert_c Gbuff_ko2_0 ( .vdd(vdd), .gnd(gnd), .a(buffwire1_1ko2s00seq), .z(buffwire0_0ko2));
invert_a Gbuff_ko2_1 ( .vdd(vdd), .gnd(gnd), .a(ko2), .z(buffwire1_1ko2s00seq));
endmodule

C. ADDITIONAL-MULTIPLIER VHDL FILES

C.1 Two-Multiplier Design: mult4x4_1stage2.vhd
library ieee;
use ieee.std_logic_1164.all;
use work.ncl_signals.all;
entity mult4x4_1n2 is
port(x, y: in dual_rail_logic_vector(3 downto 0);
ki, reset: in std_logic;
s: out dual_rail_logic_vector(7 downto 0);
ko: out std_logic);
end;
architecture BEHAVIOR of mult4x4_1n2 is
signal di2: dual_rail_logic_vector(7 downto 0);
signal ki2: std_logic;
component mult4x4_1n
port(x, y: IN dual_rail_logic_vector (3 downto 0);
ki, reset: IN std_logic;
s: OUT dual_rail_logic_vector (7 downto 0);
ko: OUT std_logic);
end component;
begin
U0: mult4x4_1n
port map(x, y, ki2, reset, di2, ko);
U1: mult4x4_1n
port map(di2(7 downto 4), di2(3 downto 0), ki, reset, s, ki2);
end BEHAVIOR;

C.2 Four-Multiplier Design: mult4x4_1stage4.vhd
library ieee;
use ieee.std_logic_1164.all;
use work.ncl_signals.all;
entity mult4x4_1n4 is
port(x, y: in dual_rail_logic_vector(3 downto 0);
ki, reset: in std_logic;
s: out dual_rail_logic_vector(7 downto 0);
ko: out std_logic);
end;
architecture BEHAVIOR of mult4x4_1n4 is
signal di2: dual_rail_logic_vector(7 downto 0);
signal ki2: std_logic;
component mult4x4_1n2
port(x, y: IN dual_rail_logic_vector (3 downto 0);
ki, reset: IN std_logic;
s: OUT dual_rail_logic_vector (7 downto 0);
ko: OUT std_logic);
end component;
begin
U0: mult4x4_1n2
port map(x, y, ki2, reset, di2, ko);
U1: mult4x4_1n2
port map(di2(7 downto 4), di2(3 downto 0), ki, reset, s, ki2);
end BEHAVIOR;

