University of Massachusetts Amherst

ScholarWorks@UMass Amherst
Masters Theses

Dissertations and Theses

March 2015

Architecting NP-Dynamic Skybridge
Jiajun Shi
University of Massachusetts Amherst

Part of the Electronic Devices and Semiconductor Manufacturing Commons, Nanotechnology
Fabrication Commons, and the VLSI and Circuits, Embedded and Hardware Systems Commons

Recommended Citation
Shi, Jiajun, "Architecting NP-Dynamic Skybridge" (2015). Masters Theses. 171.
https://doi.org/10.7275/6453453 https://scholarworks.umass.edu/masters_theses_2/171

ARCHITECTING NP-DYNAMIC SKYBRIDGE

A Thesis Presented

by
JIAJUN SHI

Submitted to the Graduate School of the
University of Massachusetts Amherst in partial fulfillment
of the requirements for the degree of
MASTER OF SCIENCE IN ELECTRICAL AND COMPUTER ENGINEERING
February 2015
Department of Electrical and Computer Engineering

© Copyright by Jiajun Shi 2015
All Rights Reserved

ARCHITECTING NP-DYNAMIC SKYBRIDGE

A Thesis Presented

by
JIAJUN SHI

Approved as to style and content by:

_______________________________________
Csaba Andras Moritz, Chair

_______________________________________
Israel Koren, Member

_______________________________________
C. Mani Krishna, Member

________________________________
Christopher V. Hollot, Department Head
Electrical and Computer Engineering

ACKNOWLEDGEMENTS

I would like to gratefully and sincerely thank Dr. Csaba Andras Moritz for his
guidance, understanding and patience throughout my graduate study at UMass
Amherst. His mentorship was paramount in teaching me the correct attitude toward
research and the proper way to present ideas. He encouraged me to grow not only as a
programmer but also as an independent thinker. For everything you’ve done for me,
Dr. Moritz, I thank you. I would also like to thank the members of the Skybridge
research group, especially Mostafizur Rahman, Santosh Khasanvis and my good
partner Mingyu Li, for their consistent help during my Master’s study. I would like to
thank the Electrical and Computer Engineering Department of UMass Amherst for its
support, which turned my dream into reality. In addition, I would like to thank all
members of my Master’s thesis committee, Drs. Csaba Andras Moritz, Israel Koren,
and C. Mani Krishna, for their significant help from the proposal to the final defense.
I want to thank my parents, Jidong Shi and Aisu Xu, for their selfless and endless
love. I also thank my uncle Jiqiang Shi for his selfless support of my study
in America.
Thank you, to all of you!

ABSTRACT
ARCHITECTING NP-DYNAMIC SKYBRIDGE
FEBRUARY 2015
B.Eng., UNIVERSITY OF ELECTRONIC SCIENCE AND TECHNOLOGY OF
CHINA, CHENG DU, CHINA
M.S.E.C.E., UNIVERSITY OF MASSACHUSETTS, AMHERST
Directed by: Professor Csaba Andras Moritz

With the scaling of technology nodes, modern CMOS integrated circuits face severe
fundamental challenges that stem from device scaling limitations, interconnection
bottlenecks and increasing manufacturing complexities. These challenges drive
researchers to look for revolutionary technologies beyond the end of CMOS roadmap.
Towards this end, a new nanoscale 3-D computing fabric for future integrated circuits,
Skybridge, has been proposed [1]. In this new fabric, core aspects from device to
circuit style, connectivity, thermal management and manufacturing pathway are
co-architected in a 3-D fabric-centric manner.
However, the Skybridge fabric uses only n-type transistors in a dynamic circuit
style for logic and memory implementations. Therefore, it requires complicated
clocking schemes to overcome signal monotonicity associated with cascading
dynamic logic gates. For Skybridge’s large-scale circuits, the dynamic circuit style
requires cascaded stages to be micro-pipelined, which results in a large number of
buffers for storing minterms and causes significant overhead in terms of area and
power. Moreover, implementation of logic is limited to NAND or AND-of-NAND
based logic expressions, which does not always result in compact circuits. In this
work, we propose an extension of the original Skybridge fabric, called
NP-Dynamic-Skybridge, to solve these challenges by using both n- and p-type
transistors in an innovative circuit style. Here, every stage in a given circuit is
implemented by either n-type or p-type dynamic logic.
Cascading n- and p-type dynamic logic effectively avoids the signal monotonicity
problem and allows combinational-like circuit implementation. This simplifies
the clocking scheme for cascaded logic, requiring only one set of global precharge
and evaluate clock signals. It also widens the range of logic expressions that can be
implemented, enabling NOR and OR-of-NORs in addition to those previously
mentioned. Furthermore, the number of pipeline stages is significantly reduced for a
given logic function, and fewer buffers are required than in the Skybridge 3-D
fabric, thus improving area and power metrics.

TABLE OF CONTENTS
ACKNOWLEDGEMENTS
ABSTRACT
LIST OF TABLES
LIST OF FIGURES
CHAPTER

1. INTRODUCTION AND MOTIVATION

2. OVERVIEW OF SKYBRIDGE FABRIC
2.1 Core Fabric Components and Elementary Circuits
2.1.1 Vertical Silicon Nanowires
2.1.2 Vertical Gate-All-Around Transistor and NAND gate
2.1.3 Elementary circuits built on vertical nanowires
2.2 Challenges of Skybridge Fabric
2.2.1 Limited Type of Gate Logic
2.2.2 Complicated Control Clock
2.2.3 Large Overhead Due To Buffers
2.3 Chapter Summary

3. NP-DYNAMIC-SKYBRIDGE FABRIC OVERVIEW AND CORE COMPONENTS
3.1 NP-Dynamic-Skybridge Fabric Overview
3.2 Core Components of NP-Dynamic-Skybridge Fabric
3.2.1 Vertical Nanowires
3.2.2 Vertical Gate-All-Around Transistor
3.2.3 Coaxial Routing Structure
3.2.4 Ohmic contact and bridge
3.3 Chapter Summary

4. ELEMENTARY CIRCUITS
4.1 Logic Nanowire
4.2 Compound Gate
4.3 Cascaded Gates
4.3 Chapter Summary

5. BENCHMARKING AND RESULTS
5.1 Benchmarking Methodology
5.2 Benchmarking Results and Scalability
5.2.1 Benchmarking of 4-bit Carry Look-Ahead Adder
5.2.2 Results of Benchmarking
5.3 Chapter Summary

6. FAN-IN ANALYSIS AND SCALABILITY
6.1 Fan-in Analysis
6.1.1 Evaluation Methodology
6.1.2 Fan-in Sensitivity Analysis
6.2 Scalability and Larger Scale Benchmarking
6.2.1 Scalability Study
6.2.2 Benchmarking for Larger Scale Design
6.3 Chapter Summary

7. WIRE STREAM PROCESSOR BENCHMARKING
7.1 Optimized Pipelining scheme
7.1.1 Pipelining Scheme in Skybridge Fabric
7.1.2 Proposed Pipelining Scheme of NP-Dynamic Skybridge
7.1.3 Timing and Clock Optimization
7.2 WISP-4 Benchmarking
7.2.1 WISP-4 Circuit and Architecture
7.2.2 Benchmarking Results
7.3 Chapter Summary

BIBLIOGRAPHY

LIST OF TABLES
Table
3.1 On/Off current of 16nm 3-D GAA junctionless device
3.2 On/Off current of 16nm Fin-Fet device
5.1 Evaluation results of 4-bit CLA

LIST OF FIGURES
Figure
1.1 Ioff versus Leff at VDD=1 V for bulk-Si and Double-Gate devices implemented inverters
1.2 Trend of: A) supply voltage and B) threshold voltage for various versions of ITRS
1.3 Relative performance at constant power density
1.4 Lithographic challenges with scaling
2.1 Abstract view of envisioned skybridge fabric
2.2 Silicon nanowires on bulk substrate
2.3 GAA transistor and I-V characteristic
2.4 Schematic and layout of 3-input NAND gate
2.5 Skybridge 1-bit Full Adder
2.6 Logic implementation comparison
2.7 Separate stages design of carry look-ahead adder
3.1 Vertically stacked dual-doped Substrate
3.2 Dual-doped nanowire array
3.3 Vertical GAA junctionless transistors
3.4 Transistors IV Characteristics
3.5 Coaxial routing structure and bypass routing layer
3.6 Ohmic contact for n-type silicon
4.1 NAND gate and NOR gate in NP-Dynamic-Skybridge fabric
4.2 XOR gate layout
4.3 Cascaded gates and waveform validation
5.1 Block diagram of 4-bit CLA
6.1 Example gate for fan-in sensitivity analysis
6.2 Fan-in sensitivity curves
6.3 Determination for maximum fan-in for NP-Dynamic
6.4 Evaluation results of 4-bit and 8-bit CLAs
7.1 Pipelining Scheme of Skybridge
7.2 Pipelining Scheme of NP-Dynamic Skybridge
7.3 Timing of NP-Dynamic Skybridge
7.4 Timing of NP-Dynamic Skybridge
7.5 Clock Optimization
7.6 Architecture Block Diagram
7.7 WISP-4 Benchmarking Results

CHAPTER 1
INTRODUCTION AND MOTIVATION

Tremendous progress in the miniaturization of integrated circuits (ICs) has been crucial
for socio-economic development in the last century. So far, this miniaturization
has mainly been enabled by the ability to continuously scale CMOS technology. As
CMOS technology nodes shrink, however, several challenges make it difficult to
maintain the traditional way of scaling. First, in terms of
devices, technology scaling aggravates short channel effects, resulting in larger
off-state leakage current [2]. Moreover, as the device scales down, the threshold voltage
and Vdd do not scale down linearly [2], which degrades
performance and power when building high-density circuits.

Figure. 1.1 Ioff versus Leff at VDD=1 V for bulk-Si and Double-Gate devices
implemented inverters [3]

Figure. 1.2 Trend of: A) supply voltage and B) threshold voltage for various
versions of ITRS [2]
As more transistors are integrated into the same die area, it becomes difficult to design
compact circuits and routing. Large resistance and capacitance from interconnections
cause significant degradation in circuit performance and power. Microprocessor
performance has reached a bottleneck [4], and the power
density of a microprocessor will soon climb beyond the capabilities of any
cooling technique foreseeable in the future.

Figure. 1.3 Relative performance at constant power density [4]

Increased defects and parameter variations during manufacturing also challenge
current CMOS technology. Lithography cannot keep up with the shrinking
feature size of CMOS layouts because it is limited by difficulties in controlling
the mask-wafer gap and in achieving uniform exposure of the photoresist on the wafer. Large
mismatch in manufactured CMOS layouts has a significant impact on microprocessor
reliability.

Figure. 1.4 Lithographic challenges with scaling [4]
In response to the challenges mentioned above, an emerging concept for
integrated circuits, Skybridge, has been proposed, with 3-D integrated circuits based
on vertical silicon nanowires [1]. Compared with conventional 2-D CMOS technology,
it achieves compact circuits and connectivity by building 3-D routing wires and
transistors. The junctionless Gate-All-Around transistors built on vertical
nanowires can effectively suppress leakage. In addition, the fabrication of
interconnections and transistors depends on material deposition rather than on
optical lithography precision. However, the Skybridge fabric uses only n-type
transistors in a dynamic circuit style to implement arbitrary logic. This leads to
a complex clocking scheme, a limited set of implementable logic expressions and
large overhead due to buffers in large-scale synchronous micro-pipelines. We
detail these challenges in Chapter 2.
In this thesis, we propose a new approach that optimizes the
Skybridge fabric by incorporating both n- and p-type transistors. This approach aims
to avoid the typical monotonicity problem by cascading n- and p-type dynamic logic. It
uses the dynamic logic style with precharge and evaluate clocks, and it implements
logic in a combinational-like style that enables all stages of a
given circuit to be evaluated together in one clock period. This approach not only reduces the
need for a complex clocking scheme but also provides more choices for
logic implementation. Additionally, the static-like implementation of circuits helps to
reduce buffers in large-scale synchronous pipeline designs and to build compact
single-rail logic circuits with optimum performance and low power.
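A minimal behavioral sketch of this idea is given below. It models a two-stage cascade in which an n-type stage (precharged high, discharging only when all of its inputs are high) feeds a p-type stage that is assumed here to be predischarged low and to charge only when all of its inputs are low. The function names, the unit-delay update loop and the choice of a NAND stage followed by a NOR stage are illustrative assumptions, not the circuits of this thesis.

```python
from itertools import product


def np_cascade(a, b, c, steps=4):
    """Evaluate an n-type NAND(a, b) feeding a p-type NOR(n_out, c) in one phase."""
    n_out = 1  # n-type stage output is precharged high
    p_out = 0  # p-type stage output is predischarged low (assumption)
    for _ in range(steps):
        # A dynamic node moves at most once per evaluate phase:
        # the n-type output can only fall, the p-type output can only rise.
        next_n = 0 if (n_out == 0 or (a and b)) else 1
        next_p = 1 if (p_out == 1 or (n_out == 0 and c == 0)) else 0
        n_out, p_out = next_n, next_p
    return n_out, p_out


def reference(a, b, c):
    """Static (combinational) value of the same two-stage function."""
    n_out = 0 if (a and b) else 1
    p_out = 1 if (n_out == 0 and c == 0) else 0
    return n_out, p_out


if __name__ == "__main__":
    for a, b, c in product((0, 1), repeat=3):
        print(a, b, c, np_cascade(a, b, c), reference(a, b, c))
```

In this model the precharged-high output of the n-type stage keeps the p-type stage off until the n-type stage has actually evaluated, which is why both stages can share a single evaluate phase.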
The rest of the thesis is organized as follows. Chapter 2 presents an overview of the
Skybridge fabric and its challenges. Chapter 3 discusses
the proposed new fabric and its core components. Based on the fabric presented
in Chapter 3, Chapter 4 provides the elementary circuits and shows how the new fabric can
achieve diverse logic. Chapter 5 focuses on the initial evaluation of the
proposed fabric by benchmarking a 4-bit carry look-ahead adder (CLA). Chapter 6
presents how to obtain high fan-in gates in the proposed fabric and its scalability for
large-scale benchmarks. Finally, Chapter 7 presents a large-scale benchmark
of a 4-bit microprocessor with an optimized pipelining scheme.

CHAPTER 2
OVERVIEW OF SKYBRIDGE FABRIC

In search of a new roadmap for integrated circuits,
researchers have developed 3-D integrated circuit concepts based on vertical
nanowires, as in the previously proposed Skybridge fabric [1]. It aims to eliminate
the challenges of current CMOS integration and explores 3-D integrated
circuits built from the bottom up, covering manufacturability, fabrication,
device physics, circuit style and microprocessor design. On its fundamental
building block, the nanowire array, gate-all-around transistors are stacked vertically on
the nanowires to build elementary NAND gates. Large-scale integrated circuits can be
built by cascading these NAND gates with compact 3-D interconnections and
routing.

Figure.2.1 Abstract view of envisioned skybridge fabric [1]

2.1 Core Fabric Components and Elementary Circuits
2.1.1 Vertical Silicon Nanowires
In the Skybridge fabric, 3-D integrated circuits are built following a bottom-up
architecture style with a compact routing structure to meet key requirements of
current integrated circuit design. Regular arrays of single-crystal vertical silicon
nanowires are the fundamental building blocks of the Skybridge fabric. These nanowires are
classified as (i) logic nanowires, which implement basic
NAND gate logic and consist of a stack of vertical transistors, and (ii) signal
nanowires, which act as conducting wires to carry input, output and global signals between cascaded
gates. Heavy doping is a key requirement for these nanowires because it is
necessary for building low-resistance vertical Gate-All-Around (GAA) transistors.
A secondary requirement is that the signal nanowires
be silicided to reduce electrical resistance, as they are used as conducting wires
for carrying signals between cascaded gates. Figure. 2.2 shows arrays of regular
vertical silicon nanowires patterned on a highly doped silicon substrate.

Figure.2.2 Silicon nanowires on bulk substrate [1]

2.1.2 Vertical Gate-All-Around Transistor and NAND gate
As mentioned in Chapter 1, as the feature size of current CMOS devices shrinks,
short channel effects have a significant negative impact on device leakage. On
the single-crystal nanowires, the Skybridge fabric builds stacked 16nm gate-all-around
(GAA) vertical transistors [5]. This device is well suited to the Skybridge fabric because
it effectively suppresses short channel effects and so reduces leakage for sub-20nm
transistors. Moreover, these transistors eliminate the need for precise doping during
manufacturing and depend entirely on deposition technology rather than on optical
lithography precision. The structure of the GAA vertical transistor with 16nm feature
size and its I-V characteristic are shown in Figure. 2.3.

Figure.2.3 GAA transistor and I-V characteristic [1]. Labeled parts: nanowire
(silicon), Ohmic contact (Ti), gate oxide (HfO2), gate electrode (TiN) and spacer
(Si3N4); dimension labels in the figure: 16nm, 5nm, 2nm, 16nm and 10nm.
2.1.3 Elementary circuits built on vertical nanowires
On the single-crystal vertical nanowire array, the Skybridge fabric stacks GAA
transistors vertically on every nanowire to implement NAND gates. The
implementation of a single NAND gate is shown in Figure. 2.4. All the nanowires
are uniform, with the VDD terminal on the top and the GND terminal on the bottom.

Figure.2.4 Schematic and layout of 3-input NAND gate
By using signal nanowires to connect cascaded NAND gates and conduct input and
output signals between them, Skybridge can build NAND-NAND cascaded logic [1]. If
the output nodes of NAND gates are connected together, a compound gate [6] with
AND-of-NANDs logic can be built. Figure. 2.5 shows the logic implementation of a
1-bit full adder with compound gates in AND-of-NANDs logic. Uniform n-type
transistors are used to construct the circuits, and one set of precharge/evaluate
clock signals is used to control the execution of logic.
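As an illustration of AND-of-NANDs compound gates, the sketch below checks one possible decomposition of a 1-bit full adder by exhaustive truth table. It assumes that both true and complemented inputs are available and that the compound gates produce the complemented sum and carry. These choices, and all function names, are assumptions made for illustration and are not taken from the Skybridge full adder figure.

```python
from itertools import product


def nand(*inputs):
    """One logic nanowire: the output is discharged only if every input is 1."""
    return 0 if all(inputs) else 1


def and_of_nands(*nand_outputs):
    """Compound gate: tied outputs stay high only if no NAND stack discharges."""
    return 1 if all(nand_outputs) else 0


def full_adder_complements(a, b, cin):
    """Return (sum_bar, carry_bar) built only from AND-of-NANDs gates."""
    na, nb, nc = 1 - a, 1 - b, 1 - cin  # complemented inputs (assumed available)
    # carry = AB + A*Cin + B*Cin, so its complement is the AND of three NANDs.
    carry_bar = and_of_nands(nand(a, b), nand(a, cin), nand(b, cin))
    # sum is the OR of four minterms, so its complement is the AND of four NANDs.
    sum_bar = and_of_nands(
        nand(na, nb, cin), nand(na, b, nc), nand(a, nb, nc), nand(a, b, cin)
    )
    return sum_bar, carry_bar


if __name__ == "__main__":
    for a, b, cin in product((0, 1), repeat=3):
        sum_bar, carry_bar = full_adder_complements(a, b, cin)
        print(a, b, cin, "sum =", 1 - sum_bar, "carry =", 1 - carry_bar)
```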

Figure.2.5 Skybridge 1-bit Full Adder

2.2 Challenges of Skybridge Fabric
The Skybridge fabric uses only n-type transistors to implement arbitrary logic in a
dynamic circuit style. This leads to a complex clocking scheme, a limited set of
implementable logic expressions and large overhead due to buffers in large-scale
synchronous micro-pipeline designs.
2.2.1 Limited Type of Gate Logic
NAND logic is built by stacking n-type transistors vertically on a logic nanowire,
and a compound gate with AND-of-NANDs logic can be implemented by connecting
the output nodes of logic nanowires together. However, these logic styles are not enough to
provide compact circuit designs. In Figure. 2.6, we show that the true logic expression
‘ABC+EF’ is implemented simply with OR-of-NORs logic, while it uses many more
transistors when implemented with AND-of-NANDs logic. On the other hand, when
implementing its complementary logic expression, $\overline{ABC + EF}$, AND-of-NANDs logic
provides a more compact implementation.

Figure.2.6 Logic implementation comparison (panels: AND-of-NANDs logic and
OR-of-NORs logic)

In the proposed NP-Dynamic-Skybridge, both AND-of-NANDs logic
and OR-of-NORs logic are available. This gives more flexibility in circuit design and helps
to obtain the most compact design for a given logic function.
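The two identities behind this comparison can be checked exhaustively. The sketch below assumes that complemented inputs are available for the OR-of-NORs form. The function names are illustrative, and the code checks logical equivalence only, not transistor count.

```python
from itertools import product


def nand(*xs):
    """NAND of any number of inputs."""
    return 0 if all(xs) else 1


def nor(*xs):
    """NOR of any number of inputs."""
    return 0 if any(xs) else 1


def abc_plus_ef_as_or_of_nors(a, b, c, e, f):
    """ABC + EF as an OR of NOR terms fed with complemented inputs."""
    return 1 if (nor(1 - a, 1 - b, 1 - c) or nor(1 - e, 1 - f)) else 0


def abc_plus_ef_bar_as_and_of_nands(a, b, c, e, f):
    """Complement of (ABC + EF) as an AND of NAND terms on true inputs."""
    return 1 if (nand(a, b, c) and nand(e, f)) else 0


if __name__ == "__main__":
    for a, b, c, e, f in product((0, 1), repeat=5):
        target = 1 if ((a and b and c) or (e and f)) else 0
        print(a, b, c, e, f, target,
              abc_plus_ef_as_or_of_nors(a, b, c, e, f),
              abc_plus_ef_bar_as_and_of_nands(a, b, c, e, f))
```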
2.2.2 Complicated Control Clock
Since the cascaded gates are all implemented with n-type logic, the typical
monotonicity problem [6] arises between cascaded n-type dynamic logic gates and results
in functional errors if all the cascaded gates are evaluated
in the same clock period. To avoid this problem, the stages of a given circuit
are evaluated separately, which requires several sets of precharge and evaluate clocks
to control and synchronize the pipeline. This complicates the
design of the clock system. Additionally, once clock skew and jitter are taken into account,
the circuits’ function and reliability can be compromised.
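The failure mode can be reproduced with a small behavioral model. The sketch below is an illustration under simplifying assumptions (unit gate delay, ideal precharge, and a hypothetical two-gate cascade NAND(NAND(a, b), c)); it is not a circuit-level simulation, and the function names are made up for this example.

```python
def simulate_simultaneous(a, b, c, steps=3):
    """Both n-type dynamic NAND gates evaluate in the same clock phase."""
    out1, out2 = 1, 1  # both outputs are precharged high
    for _ in range(steps):
        # A discharged dynamic node cannot recover until the next precharge.
        next1 = 0 if (out1 == 0 or (a and b)) else 1
        # Gate 2 sees the previous value of out1, which starts at 1.
        next2 = 0 if (out2 == 0 or (out1 and c)) else 1
        out1, out2 = next1, next2
    return out1, out2


def simulate_staged(a, b, c):
    """Gate 1 finishes evaluating before gate 2 starts (separate clock phases)."""
    out1 = 0 if (a and b) else 1
    out2 = 0 if (out1 and c) else 1
    return out1, out2


if __name__ == "__main__":
    # With a = b = c = 1 the correct result is out1 = 0, out2 = 1.
    print("simultaneous:", simulate_simultaneous(1, 1, 1))
    print("staged:      ", simulate_staged(1, 1, 1))
```

In the simultaneous case the second gate discharges while it still sees the precharged value of the first gate's output, and it cannot recover afterwards. This is the reason the stages are evaluated in separate clock phases.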
2.2.3 Large Overhead Due To Buffers
In the last section, we explained why the circuit has to be partitioned into several
stages that are evaluated in separate clock periods. In addition, because of
the dynamic logic style, each stage’s output signals must be stored for
synchronous micro-pipelining. As shown in Figure. 2.7, the propagate signals,
which are generated in the first stage, are stored in buffers during the second stage
and carried to the third stage for the evaluation of the final results. Therefore, for any
given circuit with separate stages, buffers are used for storing and carrying the
output signals between stages. This causes a large area overhead and also
hurts the power and performance of large-scale pipeline designs.
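The buffer overhead can be estimated with a simple count: a signal that is produced in one stage and last used several stages later needs one buffer for every stage it skips. The sketch below applies this rule to a hypothetical 4-bit carry look-ahead adder with three stages. The signal names, the stage assignment and the function name are assumptions for illustration, not data from this thesis.

```python
def buffers_needed(signals):
    """Count pipeline buffers: one per signal per stage it must skip.

    `signals` maps a signal name to (producing_stage, last_consuming_stage).
    """
    total = 0
    for name, (produced, consumed) in signals.items():
        if consumed <= produced:
            raise ValueError(f"{name}: consumer must follow producer")
        total += consumed - produced - 1
    return total


# Hypothetical 4-bit example: propagate signals are last used in the sum
# stage and so skip the carry stage; generate signals feed the carry stage
# directly; carries feed the sum stage directly.
cla_signals = {f"P{i}": (1, 3) for i in range(4)}
cla_signals.update({f"G{i}": (1, 2) for i in range(4)})
cla_signals.update({f"C{i}": (2, 3) for i in range(4)})

if __name__ == "__main__":
    print("buffers needed:", buffers_needed(cla_signals))
```

Under these assumptions only the four propagate signals need buffering, but the count grows with both the bit width and the number of stages a signal has to cross.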

Figure.2.7 Separate stages design of carry look-ahead adder. Figure labels: BUF
(buffers); PG stage with clocks PRE1/EVA1; Carry stage with clocks PRE2/EVA2; Sum
stage with clocks PRE3/EVA3.
2.3 Chapter Summary
In this chapter, an overview of the Skybridge integration fabric was presented. We
illustrated its core components and the basic concepts it uses to
address current challenges in conventional 2-D CMOS design. In addition, we
discussed the challenges of the Skybridge fabric, including limited logic implementations,
the complexity of the clock system and large overhead from buffers.
In the next chapter, we introduce the NP-Dynamic-Skybridge fabric and
explain how it addresses these challenges and improves on the Skybridge
fabric.

CHAPTER 3
NP-DYNAMIC-SKYBRIDGE FABRIC OVERVIEW AND CORE
COMPONENTS

3.1 NP-Dynamic-Skybridge Fabric Overview
The Skybridge fabric follows a fabric-centric mindset, assembling structures on a 3-D
uniform template of single-crystal vertical nanowires and keeping 3-D requirements,
compatibility and overall efficiency as its central goals. The proposed
NP-Dynamic-Skybridge follows a similar roadmap and extends the Skybridge fabric to
achieve more benefits.
Following the idea of Skybridge, we extend its basic concepts by
incorporating both n-type and p-type transistors in the new approach. The fabric is built from
the bottom up by stacking n- and p-type devices on dual-doped nanowires through
material deposition. Proper materials are chosen to construct low-resistance contacts
for the n-type and p-type doped regions of each nanowire. New coaxial routing structures,
connected by bridges, are proposed to provide connectivity. In
this chapter, we detail the core components of this new fabric and how they are used in
unison to achieve the desired functionality.
3.2 Core Components of NP-Dynamic-Skybridge Fabric
3.2.1 Vertical Nanowires
In the proposed fabric, the 3-D circuits are built by incorporating both n- and p-type
devices on each vertical dual-doped nanowire. Vertical nanowires are the core
building blocks, patterned from a heavily doped silicon substrate. This requires
preparing a substrate that has both n- and p-type doped regions. However,
doping the substrate surface separately, region by region, through ion
implantation is not feasible because this technology offers poor lateral control of the
doped regions. Instead, we propose a structure of stacked doped layers for
preparing a vertically dual-doped substrate, as shown in Figure. 3.1.

Figure.3.1 Vertically stacked dual-doped Substrate
Such a dual-doped substrate with vertically stacked silicon layers is formed by
bonding heavily n-type and heavily p-type doped substrates through molecular bonding
technology [7][8]. Between the n-type and p-type doped silicon layers, there is a
silicon dioxide layer for isolation. After preparing such a substrate, we pattern
the dual-doped nanowire array, as shown in Figure. 3.2.

Figure.3.2 Dual-doped nanowire array

3.2.2 Vertical Gate-All-Around Transistor
The Skybridge fabric uses n-type Gate-All-Around junctionless transistors
with uniform doping in the drain, source and channel, eliminating the need for abrupt
doping variations within the devices. Similarly, Gate-All-Around (GAA) transistors
are chosen as the active devices for the NP-Dynamic Skybridge fabric. We build both n-type
and p-type GAA transistors vertically on every nanowire, and each type of transistor is
built on its correspondingly doped region. Both are junctionless transistors whose
channel conduction is modulated by the workfunction difference between the heavily
doped channel and the gate [9]. Titanium nitride and tungsten nitride are chosen as
gate materials for the n-type and p-type transistors, respectively, to provide the proper
workfunction for gate control [10][11]. Figure. 3.3 shows the structures of both devices.
Based on process emulation of the devices in the Synopsys Sentaurus tool, the I-V
characteristics of both devices were extracted and are shown in Figure. 3.4.

Figure.3.3 Vertical GAA junctionless transistors: A) n-type transistor with gate
material TiN and B) p-type transistor with gate material WN0.6

Figure.3.4 Transistors IV Characteristics: A) n-type transistor IV curve and B)
p-type transistor IV curve. Both panels plot drain current Ids (A) against Vds (V)
for |Vgs| from 0 V to 1.2 V. In panel A, Vs = 0 V, the Vds axis is marked from 0.2 V
to 1.2 V and the Ids axis is marked up to 3e-05 A. In panel B, Vs = 0.8 V, the Vds
axis is marked from -0.2 V to -1.2 V and the Ids axis is marked down to -2.4e-05 A.
However, the junctionless transistor has an intrinsic weakness. When it is on, it operates
in accumulation mode [18]. Therefore, its on current is
much lower than that of a junction transistor [19], which operates in inversion
mode. Table. 3.1 and Table. 3.2 compare the on current, off current and
their ratio for the 3-D junctionless GAA device and a conventional CMOS Fin-Fet device
based on the ASU PTM model [20].
Table.3.1 On/Off current of 16nm 3-D GAA junctionless device
Device           Ion         Ioff       Ion/Ioff
VNJT-n-type      16.3uA      0.095nA    1.72E05
VNJT-p-type      16uA        0.76nA     2.11E04

Table.3.2 On/Off current of 16nm Fin-Fet device
Device           Ion         Ioff       Ion/Ioff
Fin-Fet n-type   103.75uA    6.02nA     1.72E04
Fin-Fet p-type   74.7uA      5.23nA     1.4E04
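
The Ion/Ioff column in both tables follows directly from the listed currents. The short sketch below recomputes the ratios from the values in Table 3.1 and Table 3.2. The dictionary layout and function name are illustrative choices.

```python
# Currents copied from Table 3.1 and Table 3.2, converted to amperes.
devices = {
    "VNJT n-type": (16.3e-6, 0.095e-9),
    "VNJT p-type": (16e-6, 0.76e-9),
    "Fin-Fet n-type": (103.75e-6, 6.02e-9),
    "Fin-Fet p-type": (74.7e-6, 5.23e-9),
}


def on_off_ratio(i_on, i_off):
    """Ratio of on current to off current for one device."""
    return i_on / i_off


ratios = {name: on_off_ratio(*currents) for name, currents in devices.items()}

if __name__ == "__main__":
    for name, ratio in ratios.items():
        print(f"{name}: Ion/Ioff = {ratio:.2e}")
```

The comparison shows the trade-off described above: the junctionless devices deliver a lower on current than the Fin-Fet devices, while their off current is also lower.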

3.2.3 Coaxial Routing Structure
By stacking transistors on each nanowire, a simple NAND or NOR gate can be built
as a functional logic gate, similar to the elementary NAND gate in the Skybridge fabric
(see Chapter 4). These nanowires are called logic nanowires. To link these
logic nanowires and carry input and output signals between cascaded gates,
signal nanowires must be built for routing. Each signal nanowire follows a
coaxial structure with two metal layers and one dielectric layer for routing, and two
adjacent routing layers are isolated by a dioxide layer. Figure. 3.5 shows
the coaxial routing structure.

Figure.3.5 Coaxial routing structure and bypass routing layer
The inner dielectric layer, formed by the dual-doped nanowire, is not conductive
because a silicon dioxide layer was initially inserted to isolate the p-type silicon region from
the n-type silicon region. To maintain conductivity, an outer routing layer is
designed to form a low-resistance Ohmic contact that conducts the signal around the
isolation layer.
3.2.4 Ohmic contact and bridge
Logic nanowires and signal nanowires are the fundamental blocks of the 3-D fabric. To
achieve its functionality, the connections between logic nanowires and signal
nanowires must provide a high degree of connectivity. For this purpose, low-resistance
contacts are used to conduct signals between heavily doped silicon and metal. Due to
constraints in wafer masking in current CMOS technology, contacts are built with a
uniform material, tungsten [12], which forms a low-resistance contact with p-type
doped silicon but a high-resistance contact with n-type silicon. In our approach, by
contrast, distinct Ohmic contacts are constructed separately, layer by layer,
following the bottom-up architecture. In this way, low-resistance Ohmic contacts can
be formed for n-type and p-type doped silicon respectively. The structure and
materials of these two Ohmic contacts are shown in Figure 3.6. We choose nickel for
the p-type silicon nanowire Ohmic contact and titanium for the n-type nanowire. Each
of these two metals has a suitable work function for forming a low Schottky barrier
and thus a low contact resistance [13][14]. A thin titanium nitride layer in the
p-type nanowire Ohmic contact prevents the reaction between nickel and tungsten [12].
The bridge is another important link for connecting nanowires. The function of an
Ohmic contact is to conduct the signal between doped silicon and metal with low
resistance, while the function of a bridge is to carry signals between Ohmic contacts.
As shown in Figure 3.6, tungsten is used to form the bridges because it adheres well
to titanium. The bridge can be built through material deposition technology as
presented in [14].

Figure.3.6 Ohmic contacts for doped silicon: A) Ohmic contact on n-type
nanowire and B) Ohmic contact on p-type nanowire
3.3 Chapter Summary
In this chapter, we presented the core components of NP-Dynamic Skybridge fabric and
how we extended the basic 3-D concepts of Skybridge to build them. Based on these
fundamental components, the next chapter demonstrates the elementary circuits and
explains how they eliminate some of the challenges in Skybridge fabric.

CHAPTER 4
ELEMENTARY CIRCUITS

As mentioned in Chapter 2, Skybridge fabric requires a complicated clock control
system because of the monotonicity problem in cascaded n-type dynamic logic. Moreover,
logic expressions are limited to NAND or AND-of-NANDs based implementations.
Furthermore, for large-scale Skybridge circuits, the dynamic circuit style requires
cascaded stages to be micro-pipelined, which introduces a large number of buffers and
significant overhead. In this chapter, we show how to achieve diverse implementations
of a given logic expression in NP-Dynamic Skybridge fabric and how to build compact
elementary circuits by incorporating both n-type and p-type transistors, based on the
core components discussed in the last chapter.
4.1 Logic Nanowire
The logic nanowire is the core building block used to implement elementary logic. In
Skybridge fabric, stacking n-type transistors vertically on a nanowire is the
fundamental way to build an elementary 3-D NAND gate. By connecting the output nodes
of these NAND gates together, AND-of-NANDs compound logic can be implemented. In our
approach, by stacking n-type and p-type transistors on a dual-doped nanowire (see
Chapter 3), both NAND and NOR gates can be built on one nanowire. Figure 4.1 shows the
NAND and NOR 3-D layouts. Compared with the same types of gates in Skybridge fabric,
these elementary gates have more compact implementations. For example, in Skybridge
fabric a 5-input NOR gate requires five 1-input NAND gates whose outputs are connected
to form AND-of-NANDs logic, whereas in NP-Dynamic Skybridge it is a single stack of
five p-type transistors. This characteristic leads to compact interconnection for
circuit routing and results in improved power and density.
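The 5-input NOR example can be checked at the logic level. The sketch below is a
purely Boolean model (no timing or electrical behavior); the helper names are
illustrative. It confirms that the AND of five 1-input NAND gates equals a 5-input
NOR for every input combination.

```python
from itertools import product


def nand(*inputs):
    """NAND of any number of inputs (a 1-input NAND is an inverter)."""
    return 0 if all(inputs) else 1


def nor(*inputs):
    """NOR of any number of inputs."""
    return 0 if any(inputs) else 1


def and_of_nands(nand_outputs):
    """Model of connected NAND outputs: high only if every NAND output is high."""
    return 1 if all(nand_outputs) else 0


def nor5_from_one_input_nands(a, b, c, d, e):
    """5-input NOR built from five 1-input NAND gates, as in the Skybridge example."""
    return and_of_nands([nand(x) for x in (a, b, c, d, e)])


mismatches = [bits for bits in product((0, 1), repeat=5)
              if nor5_from_one_input_nands(*bits) != nor(*bits)]
print("input combinations that differ:", len(mismatches))
```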

Figure.4.1 NAND gate and NOR gate in NP-Dynamic Skybridge fabric: A)
NAND gate with five n-type transistors stacked on n-type nanowire and B) NOR
gate with five p-type transistors stacked on p-type nanowire
4.2 Compound Gate
In Skybridge fabric, AND-of-NANDs compound logic is built by connecting the outputs of
NAND gates together. In our approach, besides the NAND gate, a NOR gate can also be
built by stacking p-type transistors on one nanowire. By connecting the outputs of NOR
gates, we can implement another kind of compound logic, OR-of-NORs. Figure 4.2 shows
the implementation of an XOR gate with both AND-of-NANDs and OR-of-NORs logic.
NP-Dynamic Skybridge therefore has better logic flexibility, which offers diverse ways
to implement a given logic function.
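As a logic-level illustration of the two compound styles, the sketch below builds XOR
once as AND-of-NANDs and once as OR-of-NORs. The particular decompositions are one
textbook choice and are an assumption here; they are not necessarily the exact gate
arrangement drawn in Figure 4.2. Complemented inputs are assumed to be available.

```python
from itertools import product


def inv(x):
    return 1 - x


def nand(*inputs):
    return 0 if all(inputs) else 1


def nor(*inputs):
    return 0 if any(inputs) else 1


def xor_and_of_nands(a, b):
    """XOR = NAND(a, b) AND NAND(a', b')."""
    return nand(a, b) & nand(inv(a), inv(b))


def xor_or_of_nors(a, b):
    """XOR = NOR(a, b') OR NOR(a', b)."""
    return nor(a, inv(b)) | nor(inv(a), b)


results = {(a, b): (xor_and_of_nands(a, b), xor_or_of_nors(a, b))
           for a, b in product((0, 1), repeat=2)}
for (a, b), (v1, v2) in results.items():
    print(f"a={a} b={b}  AND-of-NANDs={v1}  OR-of-NORs={v2}")
```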

Figure.4.2 XOR gate layout: A) AND-of-NANDs implementation and B)
OR-of-NORs implementation
4.3 Cascaded Gates
Cascading logic gates is an important way to implement complicated functions in
integrated circuit design. Since a single large gate with complicated logic usually
has a high delay, a given function is usually implemented with several small gates,
where each gate's inputs are driven by the outputs of the previous gate. In this way,
complicated logic expressions are implemented with a tradeoff between transistor count
and performance. Skybridge fabric requires a complex clock control system because the
separately evaluated stages need multiple corresponding sets of precharge and evaluate
clocks, as discussed in Chapter 2. In NP-Dynamic Skybridge fabric, however, two
cascaded stages are built with two different types of dynamic logic (n-type and
p-type). Such a scheme avoids the typical monotonicity problem of cascaded dynamic
logic gates, and all cascaded stages of a given circuit are evaluated in the same
clock period under a uniform control clock. Only one set of precharge and evaluate
clocks is required, which simplifies the clock system design. Figure 4.3 shows a
simple example of cascaded gates and the validation of its function.
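The monotonicity argument can be illustrated with a small behavioral model. This is a
simplified sketch under stated assumptions, not the HSPICE setup of Figure 4.3: each
gate is a unit-delay Boolean element, an n-type dynamic gate precharges to 1 and can
only fall during evaluation, and a p-type dynamic gate pre-discharges to 0 and can
only rise. All function names are illustrative.

```python
from itertools import product


def n_gate(inputs, prev_out, evaluating):
    """n-type dynamic NAND: precharged to 1, may only fall during evaluation."""
    if not evaluating:
        return 1
    return 0 if (prev_out == 0 or all(inputs)) else 1


def p_gate(inputs, prev_out, evaluating):
    """p-type dynamic NOR: pre-discharged to 0, may only rise during evaluation."""
    if not evaluating:
        return 0
    return 1 if (prev_out == 1 or not any(inputs)) else 0


def simulate(second_gate, a, b, c, steps=4):
    """Stage 1 is an n-type NAND(a, b); stage 2 sees (stage-1 output, c).
    Both stages evaluate on the same clock; stage 2 reads stage 1 one step late."""
    s1 = n_gate((a, b), None, evaluating=False)
    s2 = second_gate((s1, c), None, evaluating=False)
    for _ in range(steps):
        # Right-hand side uses the old s1, which models the unit gate delay.
        s1, s2 = n_gate((a, b), s1, True), second_gate((s1, c), s2, True)
    return s2


def ideal_nand(*x):
    return 0 if all(x) else 1


def ideal_nor(*x):
    return 0 if any(x) else 1


nn_errors = [(a, b, c) for a, b, c in product((0, 1), repeat=3)
             if simulate(n_gate, a, b, c) != ideal_nand(ideal_nand(a, b), c)]
np_errors = [(a, b, c) for a, b, c in product((0, 1), repeat=3)
             if simulate(p_gate, a, b, c) != ideal_nor(ideal_nand(a, b), c)]
print("n-type -> n-type, wrong results for inputs:", nn_errors)
print("n-type -> p-type, wrong results for inputs:", np_errors)
```

In this model the n-to-n cascade discharges its second stage before the first stage
has settled, while the n-to-p cascade is safe: the precharged high output of the
n-type stage keeps the p-type transistors off until the first stage has evaluated.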

Figure.4.3 Cascaded gates and waveform validation: A) Schematic of cascading
NAND and NOR gates, B) Layout of cascaded gates and C) HSPICE simulation
waveforms of cascaded gates

4.4 Chapter Summary
In this chapter, we showed how to build elementary circuits based on the core
components of NP-Dynamic Skybridge fabric, and explained the diversity and flexibility
of its logic implementations, which is a critical contribution to compact circuit
design. Additionally, cascading two different types of dynamic logic effectively
reduces the difficulty of clock system design. In the next chapter, we provide
benchmarking results for the proposed fabric and discuss its potential for power and
performance improvement.

CHAPTER 5
BENCHMARKING AND RESULTS

Benchmarking is a quantitative way to show whether the proposed approach can improve
the key metrics of circuits. In this chapter, we detail the benchmarking methodology
and analyze the benchmarking results of an arithmetic circuit, a 4-bit carry
look-ahead adder (CLA) [15]. Based on the core components and elementary circuits
discussed in Chapters 3 and 4, the arithmetic circuit was built by cascading
elementary circuits in a micro-pipelined circuit style. The layout was built following
the vertical bottom-up architecture style, which includes the fundamental nanowire
array, contacts, bridges and devices. After preparing the schematic and layout [1], we
made an initial evaluation of the benchmark to quantify the improvement of the
proposed fabric over Skybridge fabric.
5.1 Benchmarking Methodology
Comprehensive methodologies, from the material layer to the system level, were
developed to evaluate the potential of Skybridge vs. CMOS. All circuit simulations
followed a bottom-up simulation methodology that includes device physics, 3-D
interconnect parasitics, 3-D circuit style, and power/performance benchmarking.
(i) TCAD device simulation: Process simulation was done to create the device
structure emulating the actual process flow. Process parameters (e.g.,
implantation dosage, anneal temperature, etc.) used in this simulation were
taken from previous experimental work on junctionless transistors [1]. The
process-simulated structure was then used in device simulations to characterize
device behavior. Detailed considerations were taken to account for confined
device geometry, nanoscale channel length, and surface and secondary scattering
effects.
(ii) Circuit simulation: The TCAD-simulated device characteristics were used to
generate an HSPICE-compatible behavioral device model. Circuit mapping and
interconnection followed fabric design rules and guidelines similar to those
proposed for Skybridge fabric [1]. Based on the circuit's 3-D layout,
capacitance calculations for coaxial routing structures followed the
methodology in [16], and resistance calculations followed the ASU PTM
interconnect model [16]. The PTM model [16] was also used for metal routing RC
and coupling capacitance calculations.
5.2 Benchmarking Results and Scalability
5.2.1 Benchmarking of 4-bit Carry Look-Ahead Adder
The CLA is a well-known parallel adder for fast computation. It consists of
propagate-and-generate, carry, buffer and summation blocks. The
propagate-and-generate block produces the intermediate signals $P_i$ and $G_i$ (where
$i$ = 0 to 3), which are used for calculating Sum and Carry respectively; the logic
expressions are $P_i = A_i \oplus B_i$ and $G_i = A_i \cdot B_i$. The carry block
computes the intermediate carry signals and the final carry output. The logic
expression for carry generation is $C_i = G_{i-1} + P_{i-1} \cdot C_{i-1}$, where $i$
is from 1 to 4. The buffer block buffers a signal and maintains signal integrity. The
sum block generates the final sum output from the intermediate $P_i$ and $C_i$
signals; the logic expression is $S_i = A_i \oplus B_i \oplus C_i = P_i \oplus C_i$.
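The propagate, generate, carry and sum expressions can be exercised with a short
bit-level model. This is a functional sketch only: it evaluates the carry recurrence
iteratively, whereas a hardware CLA expands the same expressions so that all carries
are computed in parallel. The function name and the carry-in argument `c0` are
illustrative.

```python
def cla4(a, b, c0=0):
    """Add two 4-bit values using the P/G/C/S expressions.
    Returns (4-bit sum, carry out C4)."""
    A = [(a >> i) & 1 for i in range(4)]
    B = [(b >> i) & 1 for i in range(4)]
    P = [A[i] ^ B[i] for i in range(4)]          # P_i = A_i xor B_i
    G = [A[i] & B[i] for i in range(4)]          # G_i = A_i and B_i
    C = [c0]
    for i in range(1, 5):                        # C_i = G_{i-1} + P_{i-1} C_{i-1}
        C.append(G[i - 1] | (P[i - 1] & C[i - 1]))
    S = [P[i] ^ C[i] for i in range(4)]          # S_i = P_i xor C_i
    total = sum(S[i] << i for i in range(4))
    return total, C[4]


print(cla4(0b1011, 0b0110))   # 11 + 6 = 17 -> sum bits 0001, carry out 1
```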

Figure.5.1 Block diagram of 4-bit CLA (one stage of alternating p-type and n-type
logic: 4-bit P/G, 4-bit Carry and 4-bit Sum blocks with inverters, producing S0-3)
A block diagram of the 4-bit CLA is shown in Figure 5.1. The CLA is implemented with
combinational logic, and every two cascaded stages are implemented with two different
types (n-type and p-type) of logic. To implement the CLA in a single-rail logic style,
each stage is followed by inverters that generate the complementary signals. Such a
scheme reduces the number of pipeline stages to one, compared with the three-stage
implementation in Skybridge fabric. In addition, because the circuit's operation is
static-like and all stages operate in one clock period, no buffers are required for
storing minterms between stages. This significantly reduces the number of buffers,
which cause large overhead in Skybridge fabric.
5.2.2 Results of Benchmarking
Table 5.1 shows the evaluation results of the 4-bit carry look-ahead adder.
NP-Dynamic Skybridge shows clear benefits in key metrics over Skybridge fabric. It
achieves about 4x lower latency than the original Skybridge single-rail implementation
(50 ps vs. 192 ps) and at least 17% better throughput per unit power. In addition, it
achieves over 2x density improvement relative to the dual-rail Skybridge
implementation (0.36 um2 vs. 0.76 um2) because of its single-rail static-like logic
implementation with fewer buffers. However, throughput degrades because there are
fewer pipeline stages and the clock period for evaluation is longer.
Table.5.1 Evaluation results of 4-bit CLA
4-bit CLA                    Latency(ps)  Power(uW)  Area(um2)  Throughput  Throughput/Power(Ops/J)
Original SB (Dual-rail)      96           22.1       0.76       10.4E+9     4.7E+15
Original SB (Single-rail)    192          13.4       0.41       7.8E+9      5.8E+15
NP-Dynamic-SB                50           9.5        0.36       6.6E+9      6.9E+15
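The relative benefits quoted in the text follow from ratios of the table entries. The
sketch below only recomputes those ratios column by column from Table 5.1; the
dictionary keys and the `benefit` helper are illustrative names.

```python
# Values copied from Table 5.1.
TABLE_5_1 = {
    "SB dual-rail": {"latency_ps": 96.0, "power_uW": 22.1, "area_um2": 0.76,
                     "throughput": 10.4e9, "throughput_per_power": 4.7e15},
    "SB single-rail": {"latency_ps": 192.0, "power_uW": 13.4, "area_um2": 0.41,
                       "throughput": 7.8e9, "throughput_per_power": 5.8e15},
    "NP-Dynamic-SB": {"latency_ps": 50.0, "power_uW": 9.5, "area_um2": 0.36,
                      "throughput": 6.6e9, "throughput_per_power": 6.9e15},
}
LOWER_IS_BETTER = {"latency_ps", "power_uW", "area_um2"}


def benefit(metric, baseline, proposed="NP-Dynamic-SB"):
    """Ratio > 1 means the proposed fabric is better than the baseline."""
    base = TABLE_5_1[baseline][metric]
    new = TABLE_5_1[proposed][metric]
    return base / new if metric in LOWER_IS_BETTER else new / base


for baseline in ("SB dual-rail", "SB single-rail"):
    for metric in TABLE_5_1[baseline]:
        print(f"{metric:22s} vs {baseline:15s}: {benefit(metric, baseline):.2f}x")
```

A ratio below 1 in the throughput rows reflects the throughput degradation noted in
the text.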

5.3 Chapter Summary
This chapter presented the initial evaluation results of a benchmark circuit for
NP-Dynamic Skybridge fabric. It quantified the benefits in key metrics that can be
achieved compared with Skybridge fabric, and provided initial evidence that the
proposed approach is a good extension of the Skybridge 3-D fabric.
In the following chapters, we present further work toward making the new fabric more
mature.

CHAPTER 6
FAN-IN ANALYSIS AND SCALABILITY

High fan-in logic is a well-known driver for compact circuit designs. Since high
fan-in gates need fewer transistors and interconnects, they are advantageous for
improving both density and power consumption. However, high fan-in circuits are not
widely used because of their detrimental impact on performance compared to low fan-in
cascaded designs. The performance degradation is particularly severe in CMOS, where
the circuit style requires complementary devices that have to be sized differently,
which adds load capacitance and thus lowers performance. In Skybridge fabric, by
contrast, logic is implemented in a dynamic circuit style and only uniform transistors
of a single type are used for compact 3-D layout design, which reduces the output load
capacitance of each gate and allows high fan-in gate design.
In this chapter, we show that high fan-in gates can also be used in NP-Dynamic
Skybridge because its dynamic circuit style is similar to that of Skybridge. In
addition, scalability, which indicates how the key metrics trend as circuit scale
increases, is studied by benchmarking an 8-bit CLA.
6.1 Fan-in Analysis
6.1.1 Evaluation Methodology
To evaluate the feasibility of high fan-in gate logic in NP-Dynamic Skybridge, we
carried out a fan-in sensitivity analysis using a NAND gate (a series of n-type
transistors) and a NOR gate (a series of p-type transistors) as circuit samples.
Figure.6.1 Example gate for fan-in sensitivity analysis: A) Schematic of NAND
gate in NP-Dynamic Skybridge (same as Skybridge), B) Schematic of NOR gate in
NP-Dynamic Skybridge and C) Schematic of NAND gate in CMOS design. Labels in the
schematics: Vdd, GND, PRE, EVA, In[0] to In[m], out.
Figure 6.1A and Figure 6.1B show the schematics of the elementary NAND and NOR gates
respectively. The TCAD-generated V-GAA junctionless device models (see Chapter 3) were
used to build the circuit netlists for HSPICE simulation. Similarly, for the CMOS
baseline, an equivalent NAND gate circuit (Figure 6.1C) was built using 16nm tri-gate
high-performance PTM device models. The output nodes of both the CMOS and NP-Dynamic
Skybridge gates were connected to load capacitances equivalent to a fan-out of 4
inverters in the respective designs. The worst-case delay was measured on the falling
edge (90% VDD to 10% VDD) of the output node.
6.1.2 Fan-in Sensitivity Analysis
Using the simulation methodology discussed in the last section, the fan-in sensitivity
of 16nm CMOS, Skybridge and NP-Dynamic Skybridge was obtained and is shown in
Figure 6.2. As the fan-in increases, the normalized gate delay of NP-Dynamic Skybridge
follows a similar increasing trend to that of Skybridge. This is mainly determined by
the dynamic circuit style used in NP-Dynamic Skybridge. By contrast, the CMOS NAND
gate is built in a static logic style, as shown in Figure 6.1C, which means that as
the fan-in increases, the load capacitance of the output node increases linearly
because of the additional drain capacitances of the parallel p-type transistors in the
pull-up network. Therefore, for CMOS static circuits, the gate delay suffers from both
the increased output load capacitance and the increased total resistance of the
pull-down network, which results in a quadratic increase of CMOS gate delay with
fan-in. Both Skybridge and NP-Dynamic Skybridge use the dynamic circuit style to avoid
the increased load capacitance from the pull-up network and thus achieve linear fan-in
sensitivity.

Figure.6.2 Fan-in sensitivity curves
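The quadratic versus linear trend can be reproduced with a first-order RC estimate.
The sketch below is an assumption-laden illustration, not the HSPICE model used for
Figure 6.2: it takes delay as (series resistance) x (output capacitance), ignores
internal-node capacitances, and uses arbitrary normalized values for the per-transistor
resistance `r`, the fixed load `c_load` and the per-transistor drain capacitance
`c_drain`.

```python
def cmos_nand_delay(n, r=1.0, c_load=4.0, c_drain=1.0):
    """Static CMOS NAND: n series pull-down devices (resistance n*r) drive the
    load plus n parallel pull-up drain capacitances."""
    return (n * r) * (c_load + n * c_drain)


def dynamic_gate_delay(n, r=1.0, c_load=4.0):
    """Dynamic gate: n series devices drive a load that does not grow with n."""
    return (n * r) * c_load


def second_differences(values):
    """Zero for a linear sequence, constant and non-zero for a quadratic one."""
    return [values[i + 2] - 2 * values[i + 1] + values[i]
            for i in range(len(values) - 2)]


fan_ins = list(range(1, 9))
cmos = [cmos_nand_delay(n) for n in fan_ins]
dynamic = [dynamic_gate_delay(n) for n in fan_ins]

for n, d_cmos, d_dyn in zip(fan_ins, cmos, dynamic):
    print(f"fan-in {n}: CMOS {d_cmos / cmos[0]:.2f}x, dynamic {d_dyn / dynamic[0]:.2f}x")
```

Under these assumptions the CMOS delay contains an n-squared term, while the dynamic
gate delay grows in proportion to n.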
For standard CMOS design, the maximum fan-in is generally limited to 4 because gate
delay is highly sensitive to fan-in. As shown in Figure 6.3, we determine the maximum
fan-in of the NAND and NOR gates in NP-Dynamic Skybridge by allowing a normalized
delay similar to that of the CMOS NAND gate at its fan-in limit. This gives a maximum
fan-in of 8 for the n-type gate and 7 for the p-type gate.
Figure.6.3 Determination for maximum fan-in for NP-Dynamic Skybridge (marked points:
CMOS fan-in = 4, NP-D-SB N-gate fan-in = 8, NP-D-SB P-gate fan-in = 7)
6.2 Scalability and Larger Scale Benchmarking
6.2.1 Scalability Study
High fan-in gates drive compact layout design, which benefits power and performance.
As circuit scale increases, Skybridge fabric achieves larger benefits in power and
throughput over conventional 2-D CMOS fabric by using high fan-in gates. As discussed
in Section 6.1, logic in NP-Dynamic Skybridge can also be implemented with high fan-in
gates, which helps to maintain the benefits in key metrics over conventional CMOS for
large-scale benchmarks.
6.2.2 Benchmarking for Larger Scale Design
We show the scalability of NP-Dynamic Skybridge fabric by benchmarking a larger-scale
circuit. We built an 8-bit CLA and compared its benchmarking results with those of the
previous 4-bit CLA. As shown in Figure 6.4, when scaling up from 4-bit to 8-bit, the
improvements in latency and power of NP-Dynamic Skybridge increase because more
buffers are eliminated. There is also less degradation in throughput because, in
NP-Dynamic Skybridge fabric, the throughput is determined by the total evaluation time
of all stages rather than by the delay of the critical stage, which increases linearly
as the circuit scales up.

Figure.6.4 Evaluation results of 4-bit and 8-bit CLAs
6.3 Chapter Summary
This chapter presented logic implementations that use high fan-in gates in NP-Dynamic
Skybridge fabric, similar to Skybridge fabric. In addition, by benchmarking a
larger-scale circuit, an 8-bit CLA, we showed that the proposed fabric, like
Skybridge, can improve key metrics when using high fan-in gates. However, in Skybridge
fabric any given circuit is divided into several micro-pipelined stages, so it has
relatively higher throughput than the same circuit in NP-Dynamic Skybridge fabric. In
the next chapter, a 4-bit microprocessor benchmark is presented with an optimized
pipelining scheme for improving throughput.

CHAPTER 7
WIRE STREAM PROCESSOR BENCHMARKING

In this chapter, a 4-bit wire stream processor (WISP-4) [22] is presented. This
microprocessor is built at the transistor level and functionally verified at the
circuit level. The WISP-4 processor uses a load-store architecture, which is common in
modern RISC processor designs. It is a five-stage design composed of a program counter
(PC), read-only memory (ROM), register file (REG), arithmetic logic unit (ALU) and
write back (WB). All logic and memory circuits of the processor follow the NP-Dynamic
Skybridge circuit styles (see Chapter 4). Circuit placement and layout follow the
NP-Dynamic Skybridge fabric design rules and guidelines (see Chapter 3). Additionally,
a new pipelining scheme is proposed. Compared with the pipelining scheme in Skybridge
fabric, the operating frequency of each stage increases and thus the computation
throughput is improved. Using the bottom-up evaluation method described in Chapter 5,
simulations were carried out to validate the WISP-4 design and show its potential
against equivalent Skybridge and CMOS implementations. The benchmarking results show
that NP-Dynamic Skybridge has benefits in key metrics over Skybridge and CMOS for
large-scale circuits.
7.1 Optimized Pipelining scheme
7.1.1 Pipelining Scheme in Skybridge Fabric
In Skybridge fabric, the dynamic circuit style is used to implement logic, and all
logic gates are built with n-type transistors. To avoid the typical monotonicity
problem in cascaded n-type logic, the whole pipeline is built from several micro
stages, and each stage's logic evaluation is controlled by one single clock. This
results in successive, separately evaluated stages, and each stage also needs one
phase in which it holds its output value for the next stage's evaluation. Therefore,
each stage has three clocking phases in total: 'precharge', 'evaluate' and 'hold'. For
the single-rail design, inverters are inserted between every two stages to generate
complementary signals. The stage that provides the true output signals thus needs one
more hold phase to wait for the complementary signals to be generated in the following
inverters. This gives a four-phase pipeline design with the phases 'precharge',
'evaluate', 'hold1' and 'hold2'. The timing of the dual-rail (three-phase) and
single-rail (four-phase) pipelines is shown in Figure 7.1.

Figure.7.1 Pipelining Scheme of Skybridge: A) Timing of pipeline of single-rail
design and B) Timing of pipeline of dual-rail design

7.1.2 Proposed Pipelining Scheme of NP-Dynamic Skybridge
To improve throughput, we propose a new pipelining scheme. Here, each stage is
executed with two phases, 'precharge' and 'evaluate', so that each stage operates more
frequently. Latches designed for pipelines of dynamic circuits [23] are used between
every two stages to store output results. During the evaluation phase, the latches are
enabled by the signals {Eva, Evab} and the output results pass through the latches.
After the evaluation phase, the latches are turned off and hold the output results for
the evaluation of the next stage. Therefore, after evaluation, each stage can be
precharged again immediately without waiting and holding results for its next stage.
The latch circuit and the timing are shown in detail in Figure 7.2.
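The role of the latch can be sketched with a behavioral model. This is an illustrative
abstraction, not the transistor-level latch of Figure 7.2: the latch is transparent
while its enable (Eva) is high and holds its last value otherwise, and the stage is
modeled as an n-type dynamic gate whose output returns to 1 during precharge. The
class and function names are assumptions.

```python
class DynamicLatch:
    """Behavioral latch: transparent while eva is high, holds its value otherwise."""

    def __init__(self):
        self.q = None  # no value captured yet

    def step(self, d, eva):
        if eva:
            self.q = d
        return self.q


def n_stage_output(logic_value, phase):
    """n-type dynamic stage: 1 during precharge, the logic result during evaluate."""
    return 1 if phase == "precharge" else logic_value


latch = DynamicLatch()
trace = []
# Two full cycles; the logic result is 0 in the first cycle and 1 in the second.
for phase, value in [("precharge", 0), ("evaluate", 0),
                     ("precharge", 1), ("evaluate", 1)]:
    stage_out = n_stage_output(value, phase)
    held = latch.step(stage_out, eva=(phase == "evaluate"))
    trace.append((phase, stage_out, held))

for row in trace:
    print(row)
```

In the third step the stage is already precharging again (output 1) while the latch
still presents the previous result (0) to the next stage, which is what removes the
need for a separate hold phase.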

Figure.7.2 Pipelining Scheme of NP-Dynamic Skybridge: A) Schematic design
of latch and B) Timing of pipeline of NP-Dynamic Skybridge
7.1.3 Timing and Clock Optimization
Based on the optimized pipelining scheme, the initial timing design for the WISP-4
pipeline is shown in Figure 7.3. Each stage is controlled by either the clock set
'pre1 & eva1' or 'pre2 & eva2', and two cascaded stages are controlled by different
clock sets.
However, using two separate clock sets is contrary to the purpose of the proposed
NP-Dynamic Skybridge fabric. As discussed in Chapter 2, in Skybridge fabric the
functional monotonicity problem in cascaded n-type dynamic logic requires multiple
sets of 'precharge' and 'evaluate' clocks to synchronize and control the separate
stages of the pipeline, which results in a complex clock system design. In the
proposed NP-Dynamic Skybridge fabric, the functional monotonicity problem is avoided
by cascading two types (n and p) of logic to implement the circuit function, so only
one set of clocks ('precharge' and 'evaluate') is needed to control all stages of any
given circuit. To simplify the clock system, we further optimize the timing shown in
Figure 7.3. Notably, the clock pre1 overlaps eva2, and similarly eva1 overlaps pre2.
By merging the overlapping clocks, we simplified the clock system to one clock set,
'CLK1 and CLK2', as shown in Figure 7.4 and Figure 7.5.
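The clock merging can be illustrated with a discrete-time schedule. The sketch below
makes several assumptions that are not stated in the text: the two phases have equal
length, clock set 2 is shifted by exactly one phase relative to set 1, the five
WISP-4 stages alternate between the two sets starting with set 1, and CLK1 is taken
to be the merged pre1/eva2 signal while CLK2 is the merged eva1/pre2 signal.

```python
def phase(clock_set, t):
    """Return 'pre' or 'eva' for clock set 1 or 2 at time step t.
    Assumption: equal-length phases, set 2 shifted by one phase."""
    offset = 0 if clock_set == 1 else 1
    return "pre" if (t + offset) % 2 == 0 else "eva"


def clk1(t):
    """Merged clock: high when pre1 (and therefore eva2) is active."""
    return phase(1, t) == "pre"


def clk2(t):
    """Merged clock: high when eva1 (and therefore pre2) is active."""
    return phase(1, t) == "eva"


# Assumed assignment of clock sets to the five pipeline stages.
STAGE_CLOCK_SET = {"IF": 1, "ID": 2, "REG": 1, "ALU": 2, "WB": 1}

schedule = {stage: [phase(cs, t) for t in range(6)]
            for stage, cs in STAGE_CLOCK_SET.items()}
for stage, phases in schedule.items():
    print(f"{stage:3s} " + " ".join(phases))
```

Because pre1 always coincides with eva2 and eva1 with pre2 in this schedule, the four
original signals carry only two distinct waveforms, which is why two clocks suffice.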

Figure.7.3 Timing of NP-Dynamic Skybridge (pipeline stages IF, ID, REG, ALU and WB
separated by Latch 1 to Latch 4, driven by the clock signals Pre1/Eva1 and Pre2/Eva2
and their complements Preb/Evab)

Figure.7.4 Timing of NP-Dynamic Skybridge: A) Pipeline timing with two clock
sets and B) Optimized pipeline timing with one clock set

Figure.7.5 Clock Optimization (pipeline stages IF, ID, REG, ALU and WB with Latch 1 to
Latch 4, before and after optimization; the signals Pre1/Eva1 and Pre2/Eva2 and their
complements are replaced by CLK1, CLK2, CLK1b and CLK2b)
7.2 WISP-4 Benchmarking
7.2.1 WISP-4 Circuit and Architecture
The architecture of WISP-4 is shown in Figure 7.6. It has five pipeline stages:
Instruction Fetch, Instruction Decode, Register Access, Execute and Write Back. During
Instruction Fetch, an instruction is fetched from the ROM and fed to the instruction
decoder. In Instruction Decode, the fetched instruction is decoded to generate control
signals and to buffer the register addresses and data. In the next stage, the buffered
data is stored in the register file and prepared for execution in the Execute stage.
After ALU operations in the Execute stage, results are stored in the register file
during Write Back. The synchronization of pipeline stages is maintained through
micro-pipelining of the logic blocks at each stage; this is possible because all logic
blocks are implemented in the Skybridge logic style, which uses clock signals as
control inputs.

The instruction fetch unit consists of a program counter (PC) and a ROM. The PC is a
4-bit binary up counter that increments the instruction address every clock cycle. It
is implemented with a 4-bit CLA; one of its inputs is the constant '1', and the other
is the result of the previous calculation. The PC output is fed to a 4:16 decoder to
select one of the 16 rows of the instruction ROM. The ROM stores the set of
instructions to be executed and has a total capacity of 16 x 9 bits in this prototype.
The output of the ROM is a 9-bit instruction that contains a 3-bit operation code
(opcode), two 2-bit source/destination register addresses or 4-bit data. Circuit-level
implementation of these processor units follows the Skybridge circuit style. Both
compound and cascaded dynamic logic styles are combined for efficient implementations.
The 4-bit CLA and its HSPICE validation were shown in Chapter 5; in this section we
show the core supporting circuits.
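The instruction fetch path described above (4-bit up counter, 4:16 decoder, 16 x 9-bit
ROM) can be sketched functionally as follows. This is a behavioral illustration only:
the ROM contents are hypothetical placeholder words, the function names are
assumptions, and no instruction field layout is modeled because the text does not fix
one.

```python
def pc_next(pc):
    """4-bit binary up counter: add the constant 1 and wrap around at 16."""
    return (pc + 1) & 0xF


def decode_4_to_16(addr):
    """4:16 decoder: one-hot row select for a 4-bit address."""
    return [1 if row == addr else 0 for row in range(16)]


def rom_read(rom, select):
    """Return the 9-bit word stored on the single selected row."""
    (row,) = [i for i, s in enumerate(select) if s]
    return rom[row] & 0x1FF


# Hypothetical ROM contents, used only to exercise the fetch path.
ROM = [(row * 37) & 0x1FF for row in range(16)]

pc = 0
fetched = []
for _ in range(18):
    fetched.append((pc, rom_read(ROM, decode_4_to_16(pc))))
    pc = pc_next(pc)

print(fetched[:3], "...", fetched[15:])
```

After sixteen fetches the counter wraps to address 0, so the instruction sequence in
the ROM repeats.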

Figure.7.6 Architecture Block Diagram
7.2.2 Benchmarking Results
The WISP-4 benchmark was built at the transistor level based on the architecture
presented in the last section. Following the methodology discussed in Chapter 5, we
extracted RC from the layout and wrote the circuit netlist with the extracted RC
information for HSPICE simulation. Figure 7.7 shows the benchmarking results of
NP-Dynamic Skybridge, Skybridge and 16nm CMOS fabric. Since the pipelining scheme is
optimized to two-phase timing for each stage, NP-Dynamic Skybridge achieves a 1.1x
benefit in throughput over Skybridge fabric. For power and throughput/power, which are
related to the circuits' total interconnect capacitance, NP-Dynamic Skybridge fabric
shows up to 1.5x improvement due to its single-rail circuit implementation. In
addition, NP-Dynamic Skybridge has 2x better density because of its single-rail
circuit design and reduced buffer overhead, as discussed in Chapter 2.

Figure.7.7 WISP-4 Benchmarking Results
7.3 Chapter Summary
In this chapter, we showed how the WISP-4 benchmark is implemented with an optimized
pipelining scheme, and presented the benchmarking results. The results show that
NP-Dynamic Skybridge has benefits in key metrics over Skybridge fabric. Additionally,
the NP-Dynamic Skybridge processor was implemented with a simplified clock system in
comparison to Skybridge. Therefore, it can be concluded that the proposed fabric is a
good extension of the Skybridge 3-D fabric and contributes to 3-D integrated circuits.

BIBLIOGRAPHY
[1] M. Rahman, S. Khasanvis, J. Shi, M. Li, C. A. Moritz. "Skybridge: 3-D Integrated Circuit Technology Alternative to CMOS." Nature Nanotechnology, Under Review, 2014.
[2] H. Iwai. "Roadmap for 22 nm and beyond." INFOS 86 (2009): 1520-1528.
[3] K. Kim, K. K. Das, R. V. Joshi, C. Chuang. "Nanoscale CMOS Circuit Leakage Power Reduction by Double-Gate Device." ISLPED, 2004, 102-107.
[4] J. Warnock. "Circuit Design Challenges at the 14nm Technology Node." DAC, 2011, 464-467.
[5] K. E. Moselund, P. Dobrosz, S. Olsen, V. Pott, L. De Michielis, D. Tsamados, D. Bouvet, A. O'Neill and A. M. Ionescu. "Bended Gate-All-Around Nanowire MOSFET: a device with enhanced carrier mobility due to oxidation-induced tensile stress." IEDM, 2007, 191-194.
[6] N. Weste, D. Harris. CMOS VLSI Design: A Circuits and Systems Perspective. Boston: Addison Wesley, 2011.
[7] P. Batude, M. Vinet, A. Pouydebasque, C. Le Royer, B. Previtali, C. Tabone, J.-M. Hartmann, L. Sanchez, L. Baud, V. Carron, A. Toffoli, F. Allain, V. Mazzocchi, D. Lafond, O. Thomas, O. Cueto, N. Bouzaida, D. Fleury, A. Amara, S. Deleonibus and O. Faynot. "Advances in 3D CMOS Sequential Integration." IEDM, 2009.
[8] P. Batude, M. Vinet, C. Xu, B. Previtali, C. Tabone, C. Le Royer, L. Sanchez, L. Baud, L. Brunet, A. Toffoli, F. Allain, D. Lafond, F. Aussenac, O. Thomas, T. Poiroux and O. Faynot. "Demonstration of low temperature 3D sequential FDSOI integration down to 50 nm gate length." VLSIT, 2011, 158-159.
[9] A. Kranti, R. Yan, C.-W. Lee, I. Ferain, R. Yu, N. Dehdashti Akhavan, P. Razavi, JP Colinge. "Junctionless nanowire transistor (JNT): Properties and design guidelines." ESSDERC, 2010, 357-360.
[10] K. Choi, et al. "The Effect of Metal Thickness, Overlayer and High-k Surface Treatment on the Effective Work Function of Metal Electrode." ESSDERC, 2005, 101-104.
[11] P. Jiang, et al. "Dependence of crystal structure and work function of WNx films on the nitrogen content." Applied Physics Letters, 2006, 122107-122107-3.
[12] C. Hu. Modern Semiconductor Devices for Integrated Circuits. N.J: Upper Saddle River, 2010.
[13] W. B. Nowak, R. Keukelaar, W. Wang, and A. R. Nyaiesh. "Diffusion of nickel through titanium nitride films." JVST, 1985, 2242-2245.
[14] R. S. Rosler, Mendonca, et al. "Tungsten chemical vapor deposition characteristics using SiH4 in a single wafer system." JVST, 1988, 1721-1727.
[15] I. Koren. Computer Arithmetic Algorithms. MA: A. K. Peters, 2002.
[16] A. Milovanovic, B. Koprivica. "Analysis of square coaxial lines by using Equivalent Electrodes Method." Nonlinear Dynamics and Synchronization (INDS) & 16th Int'l Symposium on Theoretical Electrical Engineering, 2011.
[17] Arizona State University. PTM-MG device models for 16nm node, <www.ptm.asu.edu>.
[18] Y. Taur, Tak H. Ning. Fundamentals of Modern VLSI Devices. N.Y: Cambridge University Press, 2009.
[19] T. Solankia, N. Parmar. "A Review paper: A Comprehensive study of Junctionless transistor." NCIPET, 2012.
[20] Arizona State University. PTM R-C Interconnect models. <http://ptm.asu.edu/> (2012)
[21] E. J. Nowak, I. Aller, et al. "Turning Silicon on Its Edge." IEEE Circuits and Devices Magazine, 2004, 20-31.
[22] C. A. Moritz, T. Wang. "Towards Defect-Tolerant Nanoscale Architectures." IEEE-NANO, 2006, 331-334.
[23] N. F. Goncalves, H. J. De Man. "NORA: A Racefree Dynamic CMOS Technique for Pipelined Logic Structures." Solid State Circuits, 1983.