The development and performance evaluation of PIF logic functional blocks by Sonar-Pardeshi, Sheetal Suresh
Rochester Institute of Technology 
RIT Scholar Works 
Theses 
1-2005 
The development and performance evaluation of PIF logic 
functional blocks 
Sheetal Suresh Sonar-Pardeshi 
Recommended Citation 
Sonar-Pardeshi, Sheetal Suresh, "The development and performance evaluation of PIF logic functional 
blocks" (2005). Thesis. Rochester Institute of Technology. Accessed from 
This Thesis is brought to you for free and open access by RIT Scholar Works. It has been accepted for inclusion in 
Theses by an authorized administrator of RIT Scholar Works. For more information, please contact 
ritscholarworks@rit.edu. 
The Development and Performance Evaluation of PIF 
Logic Functional Blocks 
By 
Sheetal Suresh Sonar-Pardeshi 
In Partial fulfillment 
of the 
Requirements for the degree of 




Professor D. Patru -------------------------------------------------
(Dr. DORIN PATRU, Thesis Advisor) 
Professor P. R. Mukund -------------------------------------------------
(Dr. P R MUKUND, Thesis Committee Member) 
Professor James E. Moon -------------------------------------------------
(Dr. JAMES MOON, Thesis Committee Member) 
Professor Robert Bowman -------------------------------------------------
(Dr. ROBERT BOWMAN, Department Head) 
DEPARTMENT OF ELECTRICAL ENGINEERING 
KATE GLEASON COLLEGE OF ENGINEERING
ROCHESTER INSTITUTE OF TECHNOLOGY 
ROCHESTER, NEW YORK, USA 
MAY 2005 
Thesis/Dissertation Author Permission Statement
Title of Thesis: The Development and Performance Evaluation of PIF Logic
Functional Blocks
Name of Author: Sheetal Suresh Sonar-Pardeshi
Degree: Master of Science
Major: Electrical Engineering
College: Kate Gleason College of Engineering
As per current Rochester Institute of Technology (RIT) guidelines for completion
of my degree, I understand that I need to submit a copy of my Master's thesis to the RIT
Archives. I hereby permit RIT and its agents to archive and make use of my thesis or
dissertation in whatever forms necessary. I retain the ownership rights to the copyright of





I would like to thank the people who contributed towards the success of this
thesis. I will start with the Supreme Being, GOD, for his blessings. I am very
grateful, and will be so forever, to my parents, Dr. Suresh Sonar and Mrs. Nanda Sonar,
and my family (Smita, Ashwini and Sahil) for their inspiration and moral support; I owe
my every success to them.
I have learned a lot from Dr. Dorin Patru, my thesis advisor, not just about digital
design but also about practical aspects of life. I thank Dr. Patru for many insightful
conversations during the development of the ideas in this thesis, for helpful
comments on the text, and for the freedom he has allowed me to make my own
decisions during the design and development work. His intelligence and wisdom are what
have made this thesis possible.
I would like to thank my committee members, Dr. P. R. Mukund and Dr. James
Moon, for taking time out of their busy schedules to review my work. Finally, I would




In the deep submicron range of integrated circuit design, interconnects, and not the
logic gates, cause the performance bottleneck. The number of available transistors
increases by a factor of 2 at every technology node, but interconnects do not scale with
devices; devices scale down faster, and thus present designs need to be scalable and
reusable.
The pipelined interconnect free (PIF) logic methodology has the potential to solve these
current design problems. In the PIF logic design methodology the global interconnects are
replaced by chains of logic gates. PIF logic uses only one type of gate, which can be
connected only to the adjacent eight gates, making the gate and the interconnect modeling
easier. In order to migrate from one technology node to another, just one PIF cell needs to
be redesigned. The PIF cell in the new technology node can replace the present cells, thus
making PIF logic based circuits fully reusable.
This thesis implements the PIF design methodology to develop two libraries
consisting of combinational and sequential functional blocks such as adders, shift
registers, multiplexers, decoders and encoders. The performance of these functional
blocks is compared with the standard cell implementation with respect to the quality







Chapter 1: Introduction
1.1 Scope of Research
1.2 Organization of Thesis
Chapter 2: Background work
2.1 Global Interconnect
2.2 Power Management
2.3 Scaling and reusability
2.4 Device and Process modeling
2.5 Non-CMOS device
2.6 Yield Enhancement
Chapter 3: PIF logic design methodology and few examples
3.1 PIF logic definition
3.2 Introduction of PIF cells
3.3 PIF logic design rules
3.4 PIF logic implementation of Basic gates
3.5 PIF logic performance analysis
3.6 Conclusion
Chapter 4: Multiplexer based PIF logic design
4.1 PIF NAND design limitations
4.2 Need for a new PIF cell
4.3 Why transmission gate topology
4.4 PIF multiplexer topology
4.5 Layout of 4-to-1 multiplexer
4.6 Examples of PIF logic implementation
4.6.1 Multiplexer
4.7 Performance assessment
4.7.1 Power dissipation calculation
4.7.2 Propagation delay measurement
4.8 Simulation results of 2-to-1 multiplexer
4.9 Comparison of Quality metrics for 2-to-1 multiplexer
4.10 Multiplexer 4-to-1
4.11 Functional Verification and Simulation Results of 4-to-1 Multiplexer
4.12 Multiplexer 8-to-1
4.13 Functional Verification and Simulation Results of 8-to-1 Multiplexer
4.14 One bit full adder design
4.15 Functional Verification and Simulation Results of 1 Bit full adder
4.16 Conclusion
Chapter 5: Encoders
5.1 Standard cell 4-to-2 encoder
5.2 PIF logic 4-to-2 encoder
5.3 Functional Verification and Simulation Results of 4-to-2 encoder
5.4 Encoder 8-to-3
5.5 Functional Verification and Simulation Results of 8-to-3 encoder
5.6 Conclusion
Chapter 6: Decoders
6.1 Standard cells 2-to-4 decoder
6.2 PIF logic 2-to-4 decoder
6.3 Functional Verification and Simulation Results of 2-to-4 decoder
6.4 Decoder 3-to-8
6.5 Functional Verification and Simulation Results of 2-to-4 decoder
6.6 Conclusion
Chapter 7: Shift registers
7.1 Master-slave D flip flop
7.1.1 D flip flop with preset and clear
7.2 1 bit shift register
7.3 Functional Verification and Simulation Results of 1 Bit Shift register
7.4 2 bit shift register
7.5 Functional Verification and Simulation Results of 2 Bit shift register
7.6 4 bit shift register
7.7 Functional Verification and Simulation Results of 8 Bit shift register
7.8 Conclusion
Chapter 8: Feasibility of PIF logic circuits in current and future technologies
8.1 Throughput in PIF logic based circuits
8.2 Power dissipation in Current and future technologies
Power dissipation in current technologies due to interconnects
Power dissipation in future technologies due to interconnects
Power dissipation in future technologies due to leakage currents
8.3 Propagation delay prediction in future technologies
8.4 Conclusion
Chapter 9: Conclusion and future work
9.1 Summary
9.2 Future work
References
Appendix A: Short forms used in the write up
Appendix B: Simulation Results
Appendix C: PIF design layouts
Table of Figures
Figure (3.2.1): PIF multiplexer as two input AND gate
Figure (3.2.2): PIF multiplexer as two input OR gate
Figure (3.2.3): PIF multiplexer as two input NOR gate
Figure (3.2.4): PIF multiplexer as two input XOR gate
Figure (3.2.5): PIF multiplexer as two input XNOR gate
Figure (3.4.1): NOT gate, PIF NAND implementation
Figure (3.4.2): NOT gate, PIF multiplexer implementation
Figure (3.4.3): NOT gate, PIF multiplexer implementation
Figure (3.4.4): AND gate, PIF NAND implementation
Figure (3.4.5): OR gate, PIF NAND implementation
Figure (3.4.6): NOR gate, PIF NAND implementation
Figure (3.4.7): XOR gate, PIF NAND implementation
Figure (3.4.8): XNOR gate, PIF NAND implementation
Figure (4.3.1): Transmission Gate
Figure (4.4.1): Transmission gate based 2-to-1 multiplexer
Figure (4.4.2): Transmission gate based 4:1 Multiplexer
Figure (4.5.1): Stick diagram for PIF 4-to-1 multiplexer
Figure (4.5.2): PIF multiplexer layout in 0.25 u technology
Figure (4.6.1): Standard cell based 2-to-1 Multiplexer
Figure (4.6.2): PIF NAND based 2-to-1 Multiplexer
























Figure (4.10.1): Standard cell based 4-to-1 Multiplexer
Figure (4.10.2): PIF NAND based 4-to-1 Multiplexer
Figure (4.10.3): PIF multiplexer based 4-to-1 multiplexer
Figure (4.12.1): Standard cell based 8-to-1 Multiplexer
Figure (4.12.2): PIF NAND based 8-to-1 multiplexer
Figure (4.12.3): PIF multiplexer based 8-to-1 multiplexer
Figure (4.14.1): One bit full adder, standard cell implementation
Figure (4.14.2.1): One bit full adder, PIF NAND implementation
Figure (4.14.2.2): One bit full adder, PIF multiplexer implementation
Figure (5.1.1): Encoder 4-to-2, standard cell implementation
Figure (5.2.1): Encoder 4-to-2, PIF NAND implementation
Figure (5.2.2): Encoder 4-to-2, PIF Multiplexer implementation
Figure (5.4.1): Encoder 8-to-3 standard cells implementation
Figure (5.4.2): PIF NAND based Encoder 8-to-3
Figure (5.4.3): PIF multiplexer based Encoder 8-to-3
Figure (6.1.1): Standard Cells Decoder 2-to-4
Figure (6.2.1): Decoder 2-to-4; PIF NAND
Figure (6.2.2): Decoder 2-to-4; PIF Multiplexer
Figure (6.4.1): Decoder 3-to-8, standard cell implementation
Figure (6.4.2): Decoder 3-to-8, PIF NAND implementation
Figure (6.4.3): Decoder 3-to-8, PIF Multiplexer implementation
Figure (7.1.1): D Latch
Figure (7.1.2): Master-Slave D flip flop
Figure (7.1.3): Master-Slave D flip flop with Preset and Clear signals
Figure (7.2.1): 1 Bit shift register Standard cell
Figure (7.2.2): PIF multiplexer 1 Bit shift register; logical depth 12
Figure (7.2.3): PIF NAND 1 Bit shift register; maximum logic depth 17
Figure (7.4.1): 2 Bit shift register Standard cell
Figure (7.4.2): 2 Bit shift register, PIF NAND implementation
Figure (7.3.3): 2 Bit Shift register, PIF multiplexer implementation
Figure (7.6.1): Shift register 4 bit, Standard cell implementation
Figure (7.6.2): 4 Bit Shift Register PIF NAND
Figure (7.6.3): 4 Bit Shift Register PIF Multiplexer
Chapter 1: Introduction
The integration density and performance of integrated circuits have gone through
a revolution. Device sizes are shrinking and, as a result, the number of transistors
integrated on a single die is increasing. This makes the modeling of devices and
interconnects a difficult task. The resulting designs suffer from poor performance due to
inaccurate assumptions and invalid design decisions made during design simulation. The
main sources of these problems are related to global interconnects and
power management. Furthermore, due to the reduction in device size, designs need to
be scalable and reusable so that they can be used in a new technology node.
A lot of research has been done, and is still going on, to find solutions to these
problems, such as multi-threshold devices, organic devices, three-dimensional interconnects,
interconnect optimization, delay uncertainty analysis and selective powering schemes.
However, these solutions tend to work for a specific problem under specific circumstances
(as their names suggest).
The pipelined interconnect free (PIF) logic design methodology attempts to
provide an approach which addresses the global interconnect, power dissipation, scalability
and reusability problems. In PIF logic, the circuit interconnects are replaced with a chain
of gates. The resultant circuit does not have any global interconnects except the power supply
lines (VDD and GND). The only connections present in the circuit are the local
interconnects between neighboring gates.
A circuit designed using PIF logic is more structured, in terms of both schematic
design and physical layout, and is therefore easier to remodel and simulate. This method
produces a design which is scalable and thus reusable. PIF is based on gate level
design, making it device independent, so in the future it could be used for non-CMOS devices.
1.1: Scope of Research
The goal of this research is to compare the performance of the PIF logic design
methodology with the standard cell implementation. Various functional blocks are
implemented in both design methodologies and are compared with respect to
performance quality metrics such as power dissipation, propagation delay and layout
area.
Three different libraries are built: two based on PIF logic and one on standard cells. The
functional blocks designed are multiplexers, a one bit full adder, encoders, decoders and
shift registers. The PIF library consists of 11 schematics, 11 power schematics (the
circuits modified to obtain power consumption results) and 11 layouts of the above
mentioned blocks. The standard cell based library consists of 11 schematics, and the
layout area is calculated from the available library. Altogether 88 functional blocks are
designed (schematic and layout) for this research. The designs are implemented in
0.25 um technology using Cadence IC design tools.
1.2: Organization of Thesis
Chapter 2 describes the current design challenges and the solutions proposed by
others as the background work. Chapter 3 presents the PIF logic design rules and explains
how PIF logic can provide a solution to some of the current design problems discussed in
Chapter 2. Chapter 4 introduces a new PIF cell and the reasons for its choice. It
also explains the design of two library components, the multiplexer and the one bit full adder, in
PIF and standard cell logic implementations.
Chapters 5 and 6 explain the design and comparison of encoders and decoders,
respectively. Chapter 7 deals with the sequential logic implementation of PIF design, the
shift registers, and their comparison. Chapter 8 concludes with a summary and
recommendations for future work.
Chapter 2: Background Work
Design challenges such as global interconnects, power management, scaling
and reusability, and device and process modeling are discussed in detail in this chapter, along
with some of the solutions proposed by other researchers. The next chapter will explain
the potential of the PIF logic methodology to provide a solution to these current design
problems.
2.1: Global Interconnect
For efficient and predictable implementation of a logic circuit, interconnect
planning is one of the major challenges pointed out by the International Technology
Roadmap for Semiconductors (ITRS), 2004 edition. With the increase in the number of
transistors per die, the interconnect length is also increasing. Interconnect delays have
become greater than gate delays, causing synchronization problems [12]. With the increase in
wire length and decrease in wire width, the interconnect resistance is increasing,
making the wire behave like a transmission line and causing delay and data errors in the circuit
[6]. As a consequence, interconnect analysis becomes more complex and correct
interconnect modeling is becoming more and more difficult.
Significant work has been done, and continues to be performed, to overcome the
interconnect problem and its modeling. Three dimensional interconnects are used to
decrease the length of the global interconnect, thus reducing its parasitic resistance
and capacitance [7]. An interconnect centric design approach is used for gigascale
system-on-a-chip designs to obtain an integrated architecture for global interconnects [8]. Interconnects
are shielded in order to reduce cross talk noise and signal delay uncertainty [14].
Repeaters are inserted into the global signal interconnects (e.g. the clock signal) to optimize
circuit design performance [9].
2.2: Power Management
Power management and leakage current are the second major challenge stated
by the ITRS. The power consumption of a circuit is divided into two categories: dynamic power,
consumed when the devices in the circuit are switching, and static power, consumed when the
devices are not switching. During a transition, or switching, the devices perform the useful
work of changing or maintaining the logical values in the circuit. Static power
consumption is caused by the leakage current which flows when the transistor is not
switching or is OFF, during which no useful work is done.
In sequential circuits, only some devices are switching, or some part of a functional
block is performing useful work and the other part is not; that is, it is wasting power
in the form of static power dissipation. Wise use of power would mean reducing static
consumption by minimizing the idle state time and obtaining maximum throughput
during the dynamic or switching operation of the circuit.
Power management is applied at the system level using static and dynamic
power reduction techniques. Static techniques include synthesis and compilation for
low power design. In dynamic techniques, run time behavior is used to manage the supply
voltage to different parts of the circuit; this is called dynamic power management (DPM).
One of the ways to implement DPM is dynamic voltage scaling, which shuts
down the unused I/O devices [11]. Many computing devices offer multiple supply
voltages in a circuit. DPM can be implemented by selectively switching the devices
which are not performing useful operations to a low power or low supply voltage state. In this way
the power consumption of the system is reduced, and the power supply to the devices
can be switched back to the high power state when they are performing useful
computation [10].
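The distinction between dynamic and static power can be illustrated with the standard first-order CMOS expressions, P_dyn = alpha * C * VDD^2 * f and P_static = I_leak * VDD. The sketch below is only an illustration under assumed values: the function names, the activity factor, the capacitance, the frequency, the leakage current and the two supply voltages are hypothetical and are not measurements from this work.

```python
def dynamic_power(alpha, c_load, vdd, freq):
    """First-order switching power in watts: alpha * C * VDD^2 * f."""
    return alpha * c_load * vdd ** 2 * freq


def static_power(i_leak, vdd):
    """First-order leakage power in watts: I_leak * VDD."""
    return i_leak * vdd


# Illustrative values only (assumed, not taken from the thesis).
ALPHA = 0.1      # switching activity factor
C_LOAD = 1e-12   # switched capacitance in farads
FREQ = 100e6     # clock frequency in hertz
I_LEAK = 1e-9    # leakage current in amperes

p_high = dynamic_power(ALPHA, C_LOAD, 2.5, FREQ)   # high supply state
p_low = dynamic_power(ALPHA, C_LOAD, 1.25, FREQ)   # low supply state
ratio = p_low / p_high  # halving VDD quarters the dynamic power

print(f"dynamic power at 2.5 V : {p_high:.3e} W")
print(f"dynamic power at 1.25 V: {p_low:.3e} W")
print(f"static power at 2.5 V  : {static_power(I_LEAK, 2.5):.3e} W")
```

Because dynamic power depends on the square of the supply voltage, switching an idle block to a lower supply state reduces its switching power more than proportionally, which is the motivation for the supply switching form of DPM described above.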
2.3: Scaling and reusability
At every technology node, the number of available transistors increases by a
factor of 2. With the introduction of new materials for the dielectric and metal layers,
previously designed circuits are no longer useful and need to be redesigned. Interconnects
do not scale with devices; device size changes more rapidly. Thus, there is a need for
designs to be reusable and scalable with the technology in order to cope with
changing device sizes. For efficient and low cost designs, scaling and reusability are
necessary and are thus included as one of the major challenges in the ITRS roadmap.
Therefore, designers need to come up with a methodology that will allow designs
to be scalable and fully reusable for present and future technology nodes. This will
reduce the time required to redevelop previous designs and also the cost involved.
2.4: Device and process Modeling
The introduction of new materials and the reduction in device size trigger a rapid need
for new device models. The simulation tools also need to be updated with the latest device
models to obtain accurate and reliable designs. The fabrication processes involved also
need to be updated accordingly, and should try to minimize the variations caused during
fabrication. A design methodology which would reduce the requirements for device and
process modeling and make remodeling easier is obviously desirable.
2.5: Non-CMOS devices
Every physical quantity has its own limitations, and conventional CMOS
devices are no exception. Thus, people have started to look at other devices like metal crossbar
latches (introduced by the Hewlett Packard research group [15]), organic transistors
[17] and quantum logic [18]. The crossbar latch is a bistable switch with a 2 nanometer
device size. Organic transistors offer a wide range of device structures [16], while
quantum logic makes use of a single electron as a transistor. These devices are at a
primitive stage and need to be modeled and characterized in order to be used in designs.
The introduction of these devices also highlights the need for reusable designs.
2.6: Yield enhancement
The cost of an integrated circuit is calculated as [4]:
Cost per IC = Variable cost per IC + (Fixed cost / Volume)
where the fixed cost is the effort in time and manpower needed to produce the design,
the volume is the number of ICs produced and sold, and
Variable cost = (cost of die + cost of die test + cost of packaging) / final test yield
where the cost of die is given as
Cost of die = (cost of wafer) / (dies per wafer * die yield)
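The cost relations above can be worked through numerically. The following is a minimal sketch: the function names and all numbers (wafer cost, dies per wafer, yields, fixed cost and volume) are hypothetical values chosen only to show how die yield enters the cost per IC.

```python
def die_cost(wafer_cost, dies_per_wafer, die_yield):
    """Cost of one good die: wafer cost spread over the yielding dies."""
    return wafer_cost / (dies_per_wafer * die_yield)


def variable_cost(die, die_test, packaging, final_test_yield):
    """Variable cost per IC, scaled up by the final test yield."""
    return (die + die_test + packaging) / final_test_yield


def cost_per_ic(variable, fixed_cost, volume):
    """Total cost per IC: variable cost plus the amortized fixed cost."""
    return variable + fixed_cost / volume


# Hypothetical numbers, for illustration only.
for die_yield in (0.5, 0.8):
    die = die_cost(wafer_cost=1000.0, dies_per_wafer=200, die_yield=die_yield)
    var = variable_cost(die, die_test=1.0, packaging=1.0, final_test_yield=0.96)
    total = cost_per_ic(var, fixed_cost=100000.0, volume=100000)
    print(f"die yield {die_yield:.0%}: die {die:.2f}, variable {var:.2f}, total {total:.2f}")
```

With everything else held constant, raising the die yield lowers the cost of die and therefore the variable cost, while the fixed cost contribution only falls as the volume grows.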
The cost of an IC depends mainly on the variable cost, because the fixed cost is constant, as
the name suggests; the lower the variable cost, the lower the cost per IC. An IC can be made
less expensive by increasing the volume of production, which requires high die yield.
Thus, there has always been keen interest in increasing the yield. But the
complexity and the number of steps involved in fabricating a design on a silicon wafer using optical
masks are increasing.
A new design methodology which uses fewer metal layers, and thus
reduces the number of steps involved in fabrication, is needed. Each layer in the design
requires several processing steps, which cost money. If the number of steps required in
the fabrication process is reduced, the process variations and the sources of error
will also be reduced, thus providing high die yield. For time and cost effective products,
a design methodology with fewer layers will prove beneficial.
Chapter 3: Formulation of the PIF Logic Design Methodology
This chapter deals with the definition of PIF logic and introduces the PIF logic
design rules. It also explains the potential of the PIF logic design methodology in
providing a solution to some of the current design problems. A few simple gates like
NOT, AND, OR and the multiplexer are realized in the PIF logic methodology.
PIF was initially formulated by Richard Retanubun [1] in 2003. In [20], Vaibhav
extended the work with a PIF CAD tool implementation. Interested readers can
refer to their work to further explore the PIF logic design methodology.
3.1 PIF logic definition
In PIF logic, only one type of gate is used to implement the entire design. We
will call this the gate or the PIF cell. Any gate which can implement an arbitrary function
can be a PIF cell. Examples include the universal gates NAND and NOR, which are
functionally complete in Boolean algebra, i.e., they can implement any other function, and can
therefore be used as a PIF cell. For this research work we have used two different gates:
1. 4-to-1 multiplexer, designed as part of this research, referred to as the PIF
multiplexer
2. Two input NAND gate (NAND2), developed by Retanubun [1] as the first
PIF gate implementation, referred to as the PIF NAND
The NAND gate is the gate of choice instead of the NOR gate because of its smaller
internal parasitic (diffusion) capacitances. The reason for the selection of the 4-to-1 multiplexer
is discussed in detail in Chapter 4.
3.2 Introduction of PIF cell; 4-to-1 Multiplexer
This section introduces the PIF cell designed and developed for this thesis and
shows how it can be used to implement any two variable function. Gates with a minimum of two
inputs are required for practical purposes, hence the selection of the 4-to-1 multiplexer as a PIF cell.
Multiplexer gate
The output of a multiplexer depends on the logic values of its select line inputs.
The PIF logic design methodology uses the select lines as the signal inputs,
and the data inputs are hardwired to either VDD or GND depending on the logic function
implemented. For example, the AND gate implementation using a multiplexer is shown in
Figure (3.2.1).
Figure (3.2.1): PIF multiplexer as two input AND gate















The data inputs 'D0', 'D1' and 'D2' are hardwired to GND and 'D3' to VDD.
Similarly, two input OR, NOR, XOR, XNOR gates can be implemented using 4-to-1





are connected to either VDD or GND exactly in the same






Figure (3.2.2): PIF multiplexer as two input OR gate

















Figure (3.2.5): PIF multiplexer as two input XNOR gate
Thus, the multiplexer can implement the functionality of any two input gate. These
gates can be further used to implement arbitrary functions, as can be seen in the
following chapters.
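The use of a 4-to-1 multiplexer as a two input gate can be checked with a small behavioral model. This is only a sketch: the function names are hypothetical, GND and VDD are modeled as 0 and 1, the data input wirings are derived from the truth tables of the gates, and the mapping of the two signals onto S1 and S0 is an assumption (it does not matter for these symmetric functions).

```python
def mux4(d0, d1, d2, d3, s1, s0):
    """Behavioral 4-to-1 multiplexer: (s1, s0) selects one data input."""
    return (d0, d1, d2, d3)[2 * s1 + s0]


# Data inputs (D0, D1, D2, D3) hardwired to GND (0) or VDD (1).
GATE_WIRING = {
    "AND":  (0, 0, 0, 1),
    "OR":   (0, 1, 1, 1),
    "NOR":  (1, 0, 0, 0),
    "XOR":  (0, 1, 1, 0),
    "XNOR": (1, 0, 0, 1),
}


def pif_gate(name, a, b):
    """Evaluate a two input gate built from one multiplexer cell."""
    return mux4(*GATE_WIRING[name], s1=a, s0=b)


def truth_table(name):
    """Outputs for inputs (a, b) = 00, 01, 10, 11."""
    return [pif_gate(name, a, b) for a in (0, 1) for b in (0, 1)]


for gate in GATE_WIRING:
    print(gate, truth_table(gate))
```

The wiring tuple of each gate is simply its truth table, which is why a single multiplexer cell can realize any two variable function, including XOR and XNOR.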
3.3 PIF Logic design rules
1. Make use of only one type of gate. This gate can be any logic function that can be
used to implement any arbitrary logic function.
2. On a two dimensional plane, a gate can only be connected to its eight adjacent
gates.
3. The output of each gate drives one or two inputs of the adjacent gates. For better
delay balancing in this pipelined logic, driving two inputs is preferred. For
example, if the output is driving a gate which is in the inverter configuration, then it
should drive both inputs in the case of the PIF NAND gate implementation and both select
lines in the case of the PIF multiplexer, for load balancing.
4. If the output of a gate is driving two different gates, then only one input of each gate
should be connected to the output pin of the driving gate, and the unused pins on the
loading gates should be connected to VDD in the case of the PIF NAND gate and to GND
in the case of the PIF multiplexer.
5. Only one gate delay is allowed between a signal and its inverted value. In the case of the
PIF NAND implementation this difference has to be accepted, but in the case of the
PIF multiplexer one additional gate delay can be added in the signal path without
inverting the logic, to obtain the same gate delay for a signal and its
complement.
A circuit designed using the PIF logic rules is more structured. The circuit structure of the PIF logic
methodology gives a floor plan of the layout as early as the gate level
design. This feature of the PIF logic design methodology proves to be an advantage for
the physical implementation.
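Because the rules are purely structural, rules 2 to 4 can be checked mechanically on a placed netlist. The sketch below assumes a hypothetical representation in which each gate is identified by its (row, column) grid position and each connection is a pair (driver, load) standing for one driven input; it is not the PIF CAD tool referred to in [20].

```python
from collections import Counter


def are_adjacent(cell_a, cell_b):
    """True if two distinct (row, col) cells are among each other's 8 neighbours."""
    d_row = abs(cell_a[0] - cell_b[0])
    d_col = abs(cell_a[1] - cell_b[1])
    return (d_row, d_col) != (0, 0) and max(d_row, d_col) == 1


def adjacency_violations(connections):
    """Connections that break rule 2 (driver and load are not neighbours)."""
    return [(src, dst) for src, dst in connections if not are_adjacent(src, dst)]


def fanout_violations(connections, max_inputs=2):
    """Drivers that feed more than two inputs (rules 3 and 4)."""
    counts = Counter(src for src, _ in connections)
    return sorted(src for src, n in counts.items() if n > max_inputs)


# Hypothetical placement: one (driver, load) pair per driven input.
netlist = [
    ((0, 0), (0, 1)),
    ((0, 0), (1, 1)),
    ((0, 1), (1, 2)),
    ((1, 2), (1, 4)),  # two columns apart: not an adjacent gate
]
print(adjacency_violations(netlist))
print(fanout_violations(netlist))
```

A check of this kind can run on the gate level design itself, since the grid position of every gate is already known at that stage.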
3.4 PIF logic implementation of Basic gates
Figure (3.4.1): NOT gate, PIF NAND implementation
For the multiplexer, the inverter can be implemented in two ways:
i. The input signal is given to both select lines; data inputs D0 and D1 are
connected to VDD and D2 and D3 are connected to GND to get logic
inversion, see Figure (3.4.2).
ii. The input signal is given only to the LSB select signal (S0), Figure (3.4.3);
D0 is connected to VDD and D1 to GND, and the rest of the input pins are
connected to GND. In this configuration, one 2-to-1 multiplexer and one
transmission gate within the PIF gate are the two circuits which are
switching. If only the MSB select signal (S1) is used as input, then only one
2-to-1 multiplexer needs to be switched, saving some peak power
consumption.
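Both inverter configurations can be verified with a behavioral multiplexer model. In this sketch the function names are hypothetical, GND and VDD are modeled as 0 and 1, and in configuration ii the unused select line S1 is assumed to be among the pins tied to GND.

```python
GND, VDD = 0, 1


def mux4(d0, d1, d2, d3, s1, s0):
    """Behavioral 4-to-1 multiplexer: (s1, s0) selects one data input."""
    return (d0, d1, d2, d3)[2 * s1 + s0]


def inverter_both_selects(x):
    """Configuration i: x drives S1 and S0; D0 = D1 = VDD, D2 = D3 = GND."""
    return mux4(VDD, VDD, GND, GND, s1=x, s0=x)


def inverter_lsb_select(x):
    """Configuration ii: x drives S0 only; D0 = VDD, all other pins GND."""
    return mux4(VDD, GND, GND, GND, s1=GND, s0=x)


for x in (0, 1):
    print(x, inverter_both_selects(x), inverter_lsb_select(x))
```

The two configurations are logically equivalent; they differ only in which internal parts of the cell switch, which is what affects the peak power consumption.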
Figure (3.4.2): NOT gate, PIF multiplexer implementation
Figure (3.4.3): NOT gate, PIF multiplexer implementation
3. AND: The input signals are given to one NAND gate, which is followed by an
inverter as configured in Figure (3.4.4). In the multiplexer implementation, the input
signals are connected to the select lines and the data inputs are connected as seen








Figure (3.4.4): AND gate, PIF NAND implementation
4. OR and NOR: If inverted signals are given to a NAND gate, it acts as an OR gate,
in accordance with De Morgan's theorem:
$\overline{\overline{A} \cdot \overline{B}} = A + B$
The schematic for the NAND implementation is shown in Figure (3.4.5) and for the multiplexer in
Figure (3.2.2). For the NOR implementation, an inverter is connected at the output as





Figure (3.4.5): OR gate, PIF NAND implementation
5. XOR and XNOR: These gates are implemented using the NOT and OR gates
as discussed above. The resultant circuits are shown in Figures (3.4.7) for
XOR and (3.4.8) for XNOR.
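The NAND based constructions above can be modeled as follows. This is a sketch under assumptions: the function names are hypothetical, and the XOR and XNOR decompositions shown are one possible five gate arrangement (two inverters and three NAND gates), not necessarily the exact circuits of Figures (3.4.7) and (3.4.8).

```python
def nand(a, b):
    """Two input NAND, the only primitive gate used below."""
    return 1 - (a & b)


def not_gate(a):
    return nand(a, a)                        # 1 gate, both inputs tied together


def and_gate(a, b):
    return not_gate(nand(a, b))              # 2 gates: NAND followed by inverter


def or_gate(a, b):
    return nand(not_gate(a), not_gate(b))    # 3 gates, by De Morgan's theorem


def nor_gate(a, b):
    return not_gate(or_gate(a, b))           # 4 gates: OR followed by inverter


def xor_gate(a, b):
    # 5 gates: two inverters plus three NAND gates.
    return nand(nand(a, not_gate(b)), nand(not_gate(a), b))


def xnor_gate(a, b):
    # 5 gates: two inverters plus three NAND gates.
    return nand(nand(a, b), nand(not_gate(a), not_gate(b)))


INPUTS = [(a, b) for a in (0, 1) for b in (0, 1)]
for name, fn in (("AND", and_gate), ("OR", or_gate), ("NOR", nor_gate),
                 ("XOR", xor_gate), ("XNOR", xnor_gate)):
    print(name, [fn(a, b) for a, b in INPUTS])
```

The gate counts in the comments show how quickly the number of cells and the logic depth grow when only NAND gates are available.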





Figure (3.4.7): XOR gate, PIF NAND implementation
Figure (3.4.8): XNOR gate, PIF NAND implementation
3.5 PIF logic performance analysis
3.5.1 Global Interconnect
The PIF logic design rules allow connections only to the neighboring gates, which
means that global interconnects are no longer required (except VDD and GND) and
only local interconnects are present. Most of the design issues discussed in Chapter 2
were due to long interconnects and their effects on circuit performance, such as increased
propagation delay. Between two adjacent cells connected in the PIF logic design
methodology, no large currents flow, so a minimum width wire can be
used. For the power supply lines VDD and GND, the placement of cells is such that the VDD
and GND of each cell overlap with the VDD and GND of the other cell, respectively. The power
supply lines are placed using the top metal layer, thus not disturbing the metals used for
local interconnects. The other important global signal, the clock, is routed using the PIF cells,
eliminating the long interconnect and balancing the delay through the pipelining of PIF
cells.
3.5.2 Power management
By using the same type of cell, the power density is uniformly distributed over the
PIF block. Due to the organization of the PIF logic and the intrinsic pipelining, the gates
are utilized more intensely, reducing the time in which there is no or very little switching
activity. More switching of a gate means more dynamic power consumed, but that power
is being utilized for computation.
3.5.3 Scaling and reusability
Due to the technological changes, a circuit needs to be redesigned making the
previous designs less useful. This makes scaling and reusability desired properties of a
new design. PIF logic circuits have these properties inherently because the design uses a
single PIF cell. For a design to migrate from one technology to another only one cell has
to be redesigned and the cells in the previous circuit can be replaced by the new one.
This saves development effort and implicitly time to market.
3.5.4 Non-CMOS devices
The PIF logic design methodology is based on a functional gate and is device
independent; it can be easily implemented in non-CMOS devices. The same design rules need
to be followed, and the layout rules will change according to the device and technology
rules. But again only one cell needs to be designed and used to replace the CMOS based
functional gates.
3.5.5 Yield enhancement
PIF logic helps to enhance the die yield by reducing the number of possible
fault sources. Only two metal layers are needed for local interconnections. Reducing the
number of metal layers also reduces the number of steps involved in the manufacturing process.
Decreasing the number of steps also reduces the manufacturing cost.
3.6 Conclusion
This chapter introduced the PIF design rules and gave some examples of how simple gates
can be implemented in the PIF logic design methodology. The current design challenges
identified by the ITRS, mentioned in Chapter 2, were discussed in terms of the PIF logic design
methodology.
The following chapters provide the various circuits designed for the PIF libraries
and their comparison with respect to the circuits implemented using standard cell library.
Chapter 4: Multiplexer Based PIF Logic Design
In the previous chapter, two PIF cells were introduced. Both of them have their
advantages and disadvantages. In this chapter we will see why the 4-to-1 multiplexer was
chosen as the other gate for PIF logic implementation, along with its detailed description. Some
examples of PIF designs, like multiplexers and an adder, are discussed. Other functional blocks like
encoders, decoders and shift registers are the subject of the following chapters.
4.1: PIF NAND design limitations
As part of this research, a PIF NAND based schematic and layout library was
designed. The following problems have been identified during the development of the
PIF NAND library:
To obtain the complement of a signal, there will be a gate delay between the
signal and its complement. This is due to the structure of the NAND gate, which
provides an inverted output. The NAND is designed using a static CMOS logic
implementation. As a result, the circuit experiences more delay in one branch (the one
with the complemented signal) than in the other (with the signal itself), increasing the
response time or delay of the circuit. In the PIF logic design methodology, for
delay balancing, more gates need to be added in the other branches as well.
In the case of a D flip flop, if both the input D and its complement are available at
the same time, then the settling time of the flip flop reduces, which in turn
increases the operating frequency of the flip flop.
In order to implement a function like XOR, PIF NAND requires five gates, Figure
(3.4.7). A cell which would provide the XOR operation in fewer gates
(effectively transistors) is important for fast and cost effective designs.
If a circuit gives a logically incorrect result, debugging the circuit to find the
fault and fix it is tedious. The reasons for this difficulty, with XOR as a sample
circuit, are:
i. The logic depth of a function like XOR is greater in NAND.
ii. Even if you know that the XOR is the gate causing the problem, five gates still
need to be examined to find the exact spot of trouble.
iii. The NAND schematic is implemented in static CMOS logic consisting of four
transistors. Due to its structure, static CMOS suffers from the static power
dissipation drawback. A cell which would reduce static power dissipation
is required.
4.2: Need for a new PIF cell
Although any logic gate capable of implementing an arbitrary function can be
used as a PIF cell, a 4-to-1 multiplexer was selected for the following reasons:
With the use of a 4-to-1 multiplexer, a function and its complement can
be generated after the same amount of gate delay.
To implement functions like XOR only one cell is required, Figure (3.2.4), thus
reducing the number of cells.
With fewer cells in a circuit, it is easier to debug.
Beyond resolving some of the disadvantages of the PIF NAND, the multiplexer based
PIF cell has a logical effort of 2, and this effort is independent of the number of inputs
[4], so any multiplexer (2-to-1, 4-to-1 or 8-to-1) can be selected. This implies that the
delay of the logic gate does not increase with the number of inputs [5].
From the available multiplexers we need to decide which one is most efficient for
the PIF logic implementation. PIF logic requires a gate with a minimum of two inputs
to implement circuits (as in the case of the 2 input NAND gate). For the PIF
multiplexer, the select lines are used as inputs and the data inputs of the multiplexer are
hardwired to VDD or GND. A 2-to-1 multiplexer is therefore out of the question, since it
has only one select line. Both the 4-to-1 and 8-to-1 multiplexers can implement a 2 input
function, but the 4-to-1 multiplexer is selected because it has smaller parasitic
capacitance [5] and requires less decoding circuitry for its select lines than the 8-to-1
multiplexer. The 4-to-1 multiplexer is designed using transmission gates. The reasons for
this selection are explained in the next section.
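The idea of using the select lines as logic inputs and hardwiring the data inputs can be
illustrated with a short behavioural sketch in Python. This is only a truth-table model
under the assumption of an ideal multiplexer; the function names `mux4` and `pif_cell`
are hypothetical and do not come from the PIF libraries.

```python
def mux4(d0, d1, d2, d3, s1, s0):
    """Ideal 4-to-1 multiplexer: return the data input selected by (s1, s0)."""
    return (d0, d1, d2, d3)[(s1 << 1) | s0]


def pif_cell(hardwired):
    """Build a 2-input function by tying the four data inputs to 0 (GND) or 1 (VDD).

    The two logic inputs a and b drive the select lines s1 and s0 (assumed mapping).
    """
    d0, d1, d2, d3 = hardwired
    return lambda a, b: mux4(d0, d1, d2, d3, a, b)


# One cell per function; the complement only changes the hardwiring, not the depth.
XOR = pif_cell((0, 1, 1, 0))
XNOR = pif_cell((1, 0, 0, 1))
NAND = pif_cell((1, 1, 1, 0))

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "XOR:", XOR(a, b), "XNOR:", XNOR(a, b), "NAND:", NAND(a, b))
```

In this model a function and its complement each take exactly one cell, which mirrors
the equal-delay argument made for the 4-to-1 multiplexer above.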
4.3: Why Transmission Gate topology?
A transmission gate, shown in Figure (4.3.1), is a switch close to the ideal switch
in performance, because the NMOS passes a strong logic '0' and the PMOS passes a
strong logic '1'.
For the transmission gate in Figure (4.3.1), when input A is at logic '1', both the
NMOS and PMOS transistors are conducting and the transmission gate is said to be ON.
When A is at logic '0', neither transistor is ON.
Figure (4.3.1): Transmission gate (Output = A · Input)
4.3.1: Advantages of Transmission gate
The equivalent resistance of the transmission gate is the PMOS resistance (Rp) in
parallel with the NMOS resistance (Rn), which can be assumed to be constant in analysis.
The equivalent parallel resistance for pull up and pull down operation is approximately
the same even for same-size transistors [4].
The number of transistors needed for the 4-to-1 multiplexer is smaller than in the static
CMOS implementation.
4.3.2: Disadvantages of Transmission Gates
For a chain of n transmission gates, where the output of one transmission gate is
connected to the data input of the next, the propagation delay is [4]
$$t_p = 0.69 \, C \, R_{eq} \, \frac{n(n+1)}{2} \qquad (4.3.2.1)$$
$C$ = output capacitance of each gate
$R_{eq}$ = equivalent resistance of a transmission gate
$n$ = number of transmission gates in the chain
The delay of a transmission gate chain grows approximately with the square of the
number of transmission gates n; for four gates in a chain, equation (4.3.2.1) gives a
delay ten times that of a single transmission gate. In PIF logic, however, the output of
one transmission gate is connected to the select line inputs, not to the data inputs, of
the next gate. The only node where the output of one transmission gate goes as a data
input to another is within the 4:1 multiplexer PIF cell, so the number of transmission
gates in a chain, n, is always equal to 2 in a PIF logic design that uses the multiplexer
as the PIF gate. The data input to each 4:1 multiplexer is either VDD or GND, so every
output comes directly from the power supply (through the transmission gates) and drives
the select signals of the next stage; the delay is therefore not affected by a long chain.
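The dependence of the chain delay on n in equation (4.3.2.1) can be checked numerically
with a minimal Python sketch. The capacitance and resistance values below are
placeholders chosen only for illustration (they are not taken from the PIF library), and
the function name is hypothetical.

```python
def tg_chain_delay(n, c_farads, req_ohms):
    """Propagation delay of a chain of n transmission gates, equation (4.3.2.1)."""
    return 0.69 * c_farads * req_ohms * n * (n + 1) / 2


C_ASSUMED = 10e-15    # assumed node capacitance in farads, illustration only
REQ_ASSUMED = 5e3     # assumed equivalent resistance in ohms, illustration only

single = tg_chain_delay(1, C_ASSUMED, REQ_ASSUMED)
for n in (1, 2, 4, 8):
    delay = tg_chain_delay(n, C_ASSUMED, REQ_ASSUMED)
    print(f"n = {n}: delay = {delay * 1e12:.1f} ps, relative to one gate = {delay / single:.0f}")
```

Because the ratio to a single gate depends only on n, the PIF case of n = 2 gives the same
relative delay whatever values of C and Req are assumed.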
4.4: PIF multiplexer topology
The 4:1 multiplexer is implemented using three 2:1 multiplexers and two
inverters, one for each select signal. The transmission gate based 2-to-1 multiplexer is
shown in Figure (4.4.1). Two transmission gates are connected in parallel; the select
signal is connected to the PMOS of the first transmission gate and the NMOS of the
second, whereas the /select signal is connected to the NMOS of the first transmission
gate and the PMOS of the second. When the select line is logic '0', the first transmission
gate is turned ON and the output takes the logical value of data input 'D0'. Similarly,
for a select signal of logic '1', the output becomes equal to data input 'D1'. Three such
2-to-1 multiplexers are combined to form the 4-to-1 multiplexer.




Figure (4.4.1): Transmission gate based 2-to-1 multiplexer
4.4.1: PIF cell 4-to-1 multiplexer schematic and layout
The transmission gate based 4:1 multiplexer is shown in Figure (4.4.2). Three
2-to-1 multiplexers and two inverters are used. The inverters are required to obtain the
inverted select signals needed to implement the logic of the 4:1 multiplexer. The NMOS
transistors are of minimum size, but the PMOS transistors are made twice the size of
the NMOS to keep the gate switching threshold at VDD/2 in order to obtain a symmetric
output [4]. A PMOS transistor with a gate width greater than twice that of the NMOS
would slightly improve the current drive on rising outputs, but would add significant
diffusion capacitance, which slows both (rising and falling) transitions and increases the
loading on the select input [4].
The reason an inverter or buffer is not present at the output of the 4-to-1 multiplexer
in Figure (4.4.2) is that the data inputs of the multiplexer are connected to either VDD
or GND and the select lines are connected to the outputs of the previous multiplexers.
Due to this arrangement, the output is always connected to VDD or GND, providing a low
output resistance and thus eliminating the need for an inverter at the output.
Figure (4.4.2): Transmission gate based 4:1 Multiplexer
Connecting the output of one multiplexer to the select lines of the next stage
multiplexer ensures that the number of transmission gates in a chain is limited to 2,
thus avoiding the delay caused by a long chain of transmission gates.
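The composition of the 4:1 cell from three 2:1 multiplexers can be sketched
behaviourally in Python. The model treats each transmission gate pair as an ideal switch;
the function names and the assignment of the select lines to the two stages are
assumptions made for illustration.

```python
def mux2(d0, d1, sel):
    """Ideal 2-to-1 multiplexer: sel = 0 passes d0, sel = 1 passes d1."""
    return d1 if sel else d0


def mux4_from_mux2(d0, d1, d2, d3, s1, s0):
    """4-to-1 multiplexer built from three 2-to-1 multiplexers.

    The first stage is steered by s0 and the output stage by s1 (assumed wiring).
    Any selected data input passes through exactly two 2-to-1 stages.
    """
    low = mux2(d0, d1, s0)     # first-stage multiplexer for D0/D1
    high = mux2(d2, d3, s0)    # first-stage multiplexer for D2/D3
    return mux2(low, high, s1) # output-stage multiplexer


# Exhaustive check against the defining behaviour of a 4-to-1 multiplexer.
for index in range(4):
    data = [0, 0, 0, 0]
    data[index] = 1
    s1, s0 = index >> 1, index & 1
    print(f"select = {s1}{s0}, output = {mux4_from_mux2(*data, s1, s0)}")
```

Every path from a data input to the output crosses two switches in this model, which
corresponds to the chain length of 2 stated above.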
4.5: Layout of 4-to-1 Multiplexer
The standard-cell layout technique is used for the physical implementation of the
4-to-l multiplexer. Its stick diagram is shown in Figure (4.5.1). The signals are routed in
polysilicon which is placed perpendicular to the power supply lines and serves as the
input to the transistors. For density reasons, it is desirable to realize the active region of
all NMOS and PMOS transistors unbroken [4], as shown in Figure (4.5.1).











































































Figure (4.5.1): Stick diagram for PIF 4-to-1 multiplexer
The layout of the 4-to-1 multiplexer designed for the 0.25 µm technology is shown in
Figure (4.5.2). Polysilicon, Metal1 and Metal2 are the layers used for local interconnects.
Metal3 is used for GND and Metal4 for VDD connections, thus minimizing the total
number of layers used. The layout area of this cell is 76.962 µm².


















































Figure (4.5.2): PIF multiplexer layout in 0.25 µm technology, in the Cadence Virtuoso
platform
4.6: Examples of PIF logic Implementation
A 1-bit full adder and an 8-to-1 multiplexer are the two functional blocks discussed
in this section. These blocks are designed using PIF multiplexer and PIF NAND cells.
The circuits are analyzed and compared with the standard cell library components or
with the resultant circuits which use these components. Quality metrics such as
power consumption, propagation delay and layout area are the parameters of interest for
comparison.
4.6.1: Multiplexer
The 8-to-1 multiplexer functional block is considered first. In the standard cell design
methodology the 8-to-1 multiplexer can be built using one 2-to-1 and two 4-to-1
multiplexers, or using seven 2-to-1 multiplexers. Since a 2-to-1 multiplexer is needed
in both cases, its design is discussed first using the standard cell implementation and
then using PIF logic gates [4-to-1 multiplexer and NAND gate]. A reference standard cell
library designed in 0.25 µm CMOS technology is used.
The reference library has a 2-to-1 multiplexer functional block as shown in Figure
(4.6.1).
'SD'




pins of the symbol are the 2 data inputs.
This circuit makes use of 12 transistors with the maximum NMOS length of 680 nm and
width of 1.8097 µm and PMOS length of 249 nm and width of 2.474 µm.
The reference library has four to five different multiplexers, but the one used for
this research is "MUX21", which provides a positive logic output. The "MUX21" cell comes
in several performance levels, categorized by the maximum load capacitance it can drive
(that is, the fanout capability). In the PIF logic methodology, a fanout of 2 (or a load
capacitance of 20 fF for the 0.25 µm process) is allowed. This narrows down the selection
of any standard cell used in the comparison circuit to a cell / gate with a fanout of 2.
If a cell is not available with a fanout of 2, then the cell with a fanout of 3 is used.
Figure (4.6.1): Standard cell based 2-to-1 Multiplexer
The NAND based PIF logic implementation of the 2-to-1 multiplexer is shown in
Figure (4.6.2); it makes use of only one type of gate, the 2 input NAND gate. Each
NAND gate has 4 transistors, giving 16 transistors in the circuit, and the circuit has a
logic depth of 3.









thus the output of the gate at location A1 depends on the logic of input 'D0'. When the
signal 'Select0' is logic '1', input logic 'D1'









whereas the OR operation is provided by the gate at location 'B2'.
An interesting thing to note about this schematic is that one of the inputs of the
NAND gate at location A2 is connected directly to VDD, because the signal 'Select0' is
connected to two different gates, at locations A2 and A3, and so has a fan out of two.
This is one of the rules followed in PIF logic design: the fan out of any gate cannot be
more than two. If the output of one gate is to be connected to two different gates, then
the unused







Figure (4.6.2): PIF NAND based 2-to-1 Multiplexer
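A 2-to-1 multiplexer built from four 2 input NAND gates can be modelled in a few lines of
Python. The wiring below is the textbook four-NAND multiplexer and is an assumption: the
exact mapping of these gates to the locations in Figure (4.6.2) is not reproduced here, and
the function names are hypothetical.

```python
def nand(a, b):
    """2 input NAND gate on 0/1 values."""
    return 1 - (a & b)


def nand_mux2(d0, d1, select0):
    """2-to-1 multiplexer from four NAND gates (logic depth 3)."""
    not_select = nand(select0, 1)   # NAND used as inverter, spare input tied to VDD
    term0 = nand(d0, not_select)    # low when D0 = 1 and Select0 = 0
    term1 = nand(d1, select0)       # low when D1 = 1 and Select0 = 1
    return nand(term0, term1)       # NAND of the inverted terms acts as an OR


for select0 in (0, 1):
    for d0 in (0, 1):
        for d1 in (0, 1):
            print(f"Select0={select0} D0={d0} D1={d1} -> {nand_mux2(d0, d1, select0)}")
```

In this sketch 'Select0' drives exactly two gates, so the fan out limit of two is respected.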
The 4-to-1 multiplexer based implementation requires just one gate with a logic depth of
one, as seen in Figure (4.6.3). Notice that select signal Select1 and two data inputs (D2
and D3) are unused and thus connected to ground. This schematic shows the implementation
when the output of the gate driving this cell is also connected to another cell in the
circuit.
If the incoming signal 'select0' is not driving any other gate but just one gate in
2-




Figure (4.6.3): PIF multiplexer based 2-to-1 Multiplexer
Figure (4.6.4). This is a better approach for delay balancing and power distribution.
The PIF multiplexer based circuit uses 16 transistors to implement the 2-to-1 multiplexer,
including the inverters needed by the select lines, but it is interesting to note that just
half of the circuit is actually used in the schematic in Figure (4.6.3). Thus, 4-to-1













Figure (4.6.4): PIF multiplexer based 2-to-1 Multiplexer
4.7: Performance Assessment
The circuits are simulated using the Cadence Spectre simulator, which is part of
the Cadence platform. The inputs of the target circuits are driven by inverters instead of
ideal input sources to emulate the physical conditions more closely. For the same reason,
the outputs of the target circuits drive an inverter.
The performance of each target circuit is evaluated by measuring the power
dissipation and propagation delay for a particular input combination. The layout area of
the PIF circuits is measured from the cells in the layout library designed for this
research. For the standard cell based circuits the layout area is measured either directly
from the layout (2-to-1 multiplexer) or by adding the layout areas of the functional blocks
(8-to-1 multiplexer using one 2-to-1 and two 4-to-1 multiplexers).
Standard cell
Figure (4.7.1) shows the schematic arrangement for the standard cell
implementation of a 2-to-1 multiplexer. The symbol has been modified to accommodate
the VDD and GND pins which are used in the power dissipation measurement. The value of
VDD is 2.5 V for the IBM 0.25 µm technology, thus a DC source of 2.5 V is used, as seen
in Figure (4.7.1). A resistor of 1 Ω is connected between the power supply in the circuit
and the VDD pin of the 2-to-1 multiplexer. The current through this resistor is plotted
during simulation and the peak and average current values are measured for the power
dissipation calculations.
The output of the inverter which loads the 2-to-1 multiplexer is terminated using a






















Figure (4.7.1): Standard cell based schematic for power dissipation measurement in
2-to-1 multiplexer
4.7.1: Power dissipation calculation
Peak power consumption
$$P_{peak} = i_{peak} \, V_{DD} \qquad (4.7.1)$$
Average power consumption
$$P_{av} = \frac{1}{T}\int_0^T p(t)\,dt = \frac{V_{DD}}{T}\int_0^T i_{DD}(t)\,dt \qquad (4.7.2)$$
where $p(t)$ is the instantaneous power, $i_{DD}$ is the current drawn from the supply
voltage $V_{DD}$ over the time interval [0, T], and $i_{peak}$ is the maximum value of
$i_{DD}$ over the interval [0, T].
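Equations (4.7.1) and (4.7.2) can be evaluated on a sampled supply current waveform. The
sketch below is a minimal Python illustration using trapezoidal integration; it is not
the calculator tool of the simulator, the function name is hypothetical, and the sample
waveform is invented purely to exercise the formulas.

```python
def peak_and_average_power(times, currents, vdd=2.5):
    """Return (P_peak, P_av) in watts from sampled supply current i_DD(t).

    times    : sample instants in seconds, strictly increasing
    currents : supply current samples in amperes, same length as times
    vdd      : supply voltage in volts (2.5 V for the 0.25 um process in the text)
    """
    if len(times) != len(currents) or len(times) < 2:
        raise ValueError("need at least two matching samples")
    p_peak = vdd * max(currents)                       # equation (4.7.1)
    charge = sum(                                      # trapezoidal integral of i_DD
        0.5 * (currents[k] + currents[k + 1]) * (times[k + 1] - times[k])
        for k in range(len(times) - 1)
    )
    p_av = vdd * charge / (times[-1] - times[0])       # equation (4.7.2)
    return p_peak, p_av


# Invented triangular current pulse inside a 1 ns window, illustration only.
t = [0.0, 0.1e-9, 0.2e-9, 1.0e-9]
i = [0.0, 400e-6, 0.0, 0.0]
p_peak, p_av = peak_and_average_power(t, i)
print(f"peak power = {p_peak * 1e6:.1f} uW, average power = {p_av * 1e6:.1f} uW")
```

The short current spike in this example gives a peak power well above the average power,
which is the same qualitative pattern seen in the simulation tables.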
4.7.2: Propagation delay
The propagation delay is the delay in getting the response from the circuit for any
change in the input signal. It is the delay experienced by the signal when passing through
the gate. For any gate, the switching threshold (Vm) is ideally located at Vdd/2 or middle
of the logic swing. With this assumption, we are measuring the propagation delay at the
50% transition points of the input and output waveforms.
The average propagation delay is given by [4]
$$t_p = \frac{t_{pLH} + t_{pHL}}{2}$$
where
$t_{pLH}$ = low to high transition delay
$t_{pHL}$ = high to low transition delay
Propagation delay is a gate quality metric used to compare different semiconductor
technologies, circuit or logic design styles.
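The 50% measurement described above can be sketched in Python for sampled waveforms. The
helper names are hypothetical, linear interpolation between samples is an assumption, and
the waveforms are invented piecewise-linear examples rather than simulation data.

```python
def crossing_time(times, volts, level, rising):
    """Time of the first crossing of `level` in the given direction (linear interpolation)."""
    for k in range(len(times) - 1):
        v0, v1 = volts[k], volts[k + 1]
        crossed = (v0 < level <= v1) if rising else (v0 > level >= v1)
        if crossed:
            return times[k] + (level - v0) * (times[k + 1] - times[k]) / (v1 - v0)
    raise ValueError("waveform never crosses the requested level")


def average_delay(t_plh, t_phl):
    """Average propagation delay from the two transition delays."""
    return (t_plh + t_phl) / 2


VDD = 2.5
t = [0.0, 100e-12, 200e-12, 300e-12]
v_in = [0.0, VDD, VDD, VDD]      # input rises between 0 and 100 ps
v_out = [VDD, VDD, 0.0, 0.0]     # inverting output falls between 100 and 200 ps

t_in = crossing_time(t, v_in, VDD / 2, rising=True)
t_out = crossing_time(t, v_out, VDD / 2, rising=False)
t_phl = t_out - t_in
print(f"t_pHL = {t_phl * 1e12:.1f} ps")
print(f"t_p with an assumed t_pLH of 80 ps = {average_delay(80e-12, t_phl) * 1e12:.1f} ps")
```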
PIF logic
The PIF logic implementation for power dissipation measurement is shown in
Figures (4.7.2) and (4.7.3) for the NAND and the 4-to-1 multiplexer respectively. Notice
that additional pins for VDD and GND are present in the gate symbol and the inputs of the




























Figure (4.7.3): PIF multiplexer based schematic for power dissipation measurement in
2-to-1 multiplexer
4.8: Simulation results of 2-to-1 multiplexer
The simulated output for the 2-to-1 multiplexer is shown in Figure (4.8.1), where
the data lines are given a DC input and the select line is a pulse with a 1 ns period,
400 ps pulse width and 100 ps rise and fall times. The waveforms are provided here as
an example of how the performance metric values are extracted. For later circuits, only















Figure (4.8.1): Simulation result of 2-to-1 Multiplexer implemented in PIF NAND logic
The signal 'Output_2to1Mux' in the waveform is the output of the NAND based PIF
implementation of the 2-to-1 multiplexer and 'Total_Current' is the current flowing into
the circuit, measured at one end of resistor R0 of Figure (4.7.2). This total current is
integrated with respect to time using the calculator tool and the average power
dissipation is then calculated using equation (4.7.2). Peak power is the product of the
supply voltage and the peak current obtained from the 'Total_Current' waveform. The
simulation results of the 2-to-1 multiplexer for propagation delay, power dissipation and
layout area are given in Table (4.8.1). To save space, in the table and the graphs the
2-to-1 multiplexer circuit is denoted as Mux 2-to-1, the PIF multiplexer as PIF Mux, the
standard cell circuit as Std Cells and the average power as Avg. Power.
Table (4.8.1): 2-to-1 multiplexer simulation results
Mux 2-to-1   Peak Power   Avg. Power   Delay     Layout Area   No. of   No. of
             (µW)         (µW)         (ps)      (µm²)         Cells    Transistors
PIF Mux      449.25       31.725       90.645    76.962        1        16
PIF NAND     794.5        164.53       111.82    71.232        4        16
Std Cells    1111.8       181.83       90.28     69.12         1        12
The PIF NAND circuit has a logic depth of 3 gates and the PIF multiplexer circuit a depth
of 1 gate. The gate delay of each of these gates in inverter configuration is 46.05 ps and
74.17 ps respectively. For the given logic depth of the PIF circuits and the obtained
propagation delay of the 2-to-1 multiplexer, a throughput of 1 data set can be obtained.
4.9: Comparison of quality metrics for 2-to-1 multiplexer
The results obtained during simulation and listed in Table (4.8.1) are compared
and discussed here. The quality metrics are plotted in the following figures.
The standard cell implementation of the 2-to-1 multiplexer consumes the most peak and
average power. The transistors used in the standard cell multiplexer schematic are wider,
which allows a high peak current to flow during transitions. A higher peak current
also increases the average power consumed, as seen in Figures (4.9.1) and (4.9.2).
The PIF NAND circuit has a logic depth of 3, and its gates switch frequently due to
pipelining, which raises its peak power consumption. The larger number of gates and the
higher peak current increase the average power consumption. Even though PIF NAND uses
more gates and transistors than the standard cell, it consumes less power because the
transistors in the circuit are minimum sized.
The PIF multiplexer has a logic depth of 1 and uses small transistors, and thus draws
less peak current. Peak and average power consumption are therefore lowest for the PIF
multiplexer.





Figure (4.9.1): 2-to-1 multiplexer, peak power dissipation comparison
The transistors used in the standard cell library are wider, thus the circuit is fast,
as can be seen in the propagation delay comparison in Figure (4.9.3). With more
transistors and the same logic depth of 1, the PIF multiplexer is as fast as the standard
cell implementation. With more cells and a larger logic depth, PIF NAND
exhibits more propagation delay.
Figure (4.9.2): 2-to-1 multiplexer, average power dissipation comparison








Figure (4.9.3): 2-to-1 multiplexer, propagation delay comparison
The layout area occupied depends directly on the number of transistors in the
circuit and the size of each transistor. The standard cell circuit has 12 transistors and
requires less layout area than the PIF implementations. The difference in the layout areas
is not large (the maximum is about 8 µm²), as can be seen in Figure (4.9.4). The layout
area of one PIF NAND cell is 17.808 µm², thus the total area for the 2-to-1 multiplexer
implementation is four times that, 71.232 µm². For the same number of transistors, the PIF
multiplexer requires more layout area than PIF NAND because 12 transistors are present
within one cell, increasing its layout area to 76.962 µm². The additional area is required
to maintain the minimum spacing between the various layers, such as nwell and active
region.
Figure (4.9.4): 2-to-1 multiplexer, layout area comparison
The number of cells and transistors plays an important role in a circuit's
functionality and performance. For any circuit performance comparison one needs to
know the number of gates and transistors used. For better understanding and clarification
of the comparison comments, the number of cells and transistors are plotted in Figures
(4.9.5) and (4.9.6) respectively.
After the PIF based design of a 2-to-1 multiplexer and its comparison to the standard
cell implementation, we continue with the second functional block needed for the 8-to-1
multiplexer implementation, the 4-to-1 multiplexer. Our intent is to compare PIF logic
functional blocks with the best available components in the reference library.






Figure (4.9.5): 2-to-1 multiplexer, number of cells comparison
Figure (4.9.6): 2-to-1 multiplexer, number of transistors comparison
4.10: 4-to-1 Multiplexer
The standard cell library has a 4-to-1 multiplexer symbol as shown in Figure
(4.10.1). In the case of PIF, the previously designed circuits, the PIF NAND based and the
PIF multiplexer based 2-to-1 multiplexers, are used to build a 4-to-1 multiplexer circuit.
Standard cell
The standard cell 4-to-1 multiplexer in the reference library is designed using
transmission gates and uses 26 transistors. The maximum transistor sizes are an
NMOS width of 2.2 µm and a PMOS width of 3 µm, with a length of 240 nm in both cases.
The multiplexer symbol is shown in Figure (4.10.1); D0 through D3 are the 4 data inputs
and SD1 and SD2 are the least significant and most significant select lines, respectively.
Figure (4.10.1): Standard cell based 4-to-1 Multiplexer
PIF NAND based 4-to-1 multiplexer
The NAND gate based implementation consists of two 2-to-1 multiplexers and
additional gates to route the signals 'Select1' and 'Select0', and also to route the data
input signals in order to balance the delay, providing a logic depth of 12 gates. This
results in an increase in the number of NAND gates used, as shown in Figure (4.10.2). Only
an even number of NAND gates may be used to route a signal; otherwise the logic of the
signal is inverted and the circuit functions improperly.
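The even-count rule for routing can be illustrated with a small Python sketch. It assumes
each routing NAND has its spare input tied to VDD so that it behaves as an inverter; the
function names are hypothetical.

```python
def nand(a, b):
    """2 input NAND gate on 0/1 values."""
    return 1 - (a & b)


def route_through_nands(signal, gate_count):
    """Pass a signal through a chain of NAND gates used as inverters.

    Assumption: the unused input of every routing gate is tied to VDD (logic 1).
    """
    for _ in range(gate_count):
        signal = nand(signal, 1)
    return signal


for gate_count in (1, 2, 3, 4):
    outputs = [route_through_nands(bit, gate_count) for bit in (0, 1)]
    polarity = "preserved" if outputs == [0, 1] else "inverted"
    print(f"{gate_count} routing gates: polarity {polarity}")
```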
Note that the vertical lines in the schematic are for the VDD or power supply







































Figure (4.10.2): PIF NAND based 4-to-1 Multiplexer
The NAND gates at locations 'D2-C4-D3' form the first 2-to-1 multiplexer and those at
'E7-D8-E8' form the other. The third 2-to-1 multiplexer at the output consists of
'E2-E9-F5'





outputs of the two 2-to-1 multiplexers. The empty boxes are dummy cells used to
provide decoupling capacitance between VDD and GND and also to give the functional
blocks a square or rectangular layout area.
PIF Multiplexer based 4-to-1 multiplexer
The multiplexer based PIF logic implementation is straightforward since the cell
itself is a 4-to-1 multiplexer implemented using transmission gates. The schematic is
shown in Figure (4.10.3). Signal 'select0'





It makes use of 16 transistors and the maximum transistor















Figure (4.10.3): PIF multiplexer based 4-to-1 multiplexer
4.11: Functional Verification and Simulation Results of 4-to-1 Multiplexer
The target schematics are tested in the arrangement described earlier, driving the
gate inputs with two inverters and loading the output with an inverter. The data signals
are given DC values: D0 and D2 are connected to GND and D1 and D3 are connected to VDD.
Both select signals are square wave pulses with periods of 2 ns (Select0) and 4 ns
(Select1) and 100 ps rise and fall times. In this arrangement the output switches
between VDD and GND with the same frequency as the LSB select line 'Select0', as can
be seen in the waveforms presented in Appendix B. Table (4.11.1) shows the results
obtained.
For an input pulse width of 900 ps and a circuit gate delay less than the pulse
width (PIF multiplexer 74.16 ps, PIF NAND 552.6 ps), only one data set can be sampled
within the given circuit delay, and thus a throughput of more than 1 cannot be obtained
for the 4-to-1 multiplexer circuit.
Table (4.11.1): 4-to-1 multiplexer simulation results
Mux 4-to-1   Peak Power   Avg. Power   Delay     Layout Area   No. of   No. of
             (µW)         (µW)         (ps)      (µm²)         Cells    Transistors
PIF Mux      495.5        28.25        167.49    76.962        1        16
PIF NAND     3475         827.25       730.335   961.63        54       208
Std Cells    1265.8       171.75       152.675   168.96        1        26
The PIF NAND circuit, with a logic depth of 13 and 54 gates, has the maximum
peak and average power consumption, as shown in Figures (4.11.1) and (4.11.2)
respectively. The standard cell implementation has wider transistors which switch
frequently (providing an output swing between VDD and GND), thus consuming more peak
power. Due to a higher peak current and more transistors than the PIF multiplexer, the
standard cell circuit consumes more average power. The PIF multiplexer, with just one
cell consisting of 16 small transistors, requires the least peak and average
power.






Figure (4.11.1): 4-to-1 multiplexer peak power dissipation comparison
Figure (4.11.2): 4-to-1 multiplexer average power dissipation comparison
The propagation delay of the standard cell implementation of the 4-to-1 multiplexer is
less than that of the multiplexer and NAND based PIF designs. The standard cell circuit
requires more transistors than the PIF multiplexer but they are larger in size, thus its
delay is lower by almost 15 ps. The NAND based PIF 4-to-1 multiplexer, with many more
transistors and a logic depth of 13, is the slowest circuit, as seen in Figure (4.11.3).






Figure (4.11.3): 4-to-1 multiplexer propagation delay comparison
The layout area of the target circuits shows a graph similar to that of the
propagation delay, see Figure (4.11.4). The PIF multiplexer based design requires the
least area and the NAND based design the most. The PIF multiplexer requires one cell with
minimum sized transistors, and thus less layout area than the standard cell. PIF NAND,
with 54 cells and 208 transistors, has the largest layout area, and the standard cell
circuit with 26 transistors requires more layout area than the PIF multiplexer but less
than PIF NAND.






Figure (4.11.4): 4-to-1 multiplexer layout area comparison
The number of cells and transistors used in each schematic are shown in Figures
(4.11.7) and (4.11.8). The dummy cells at the corners are not included in the layout area,
number of cells and number of transistors calculations since that space can be used by
other circuits.




Figure (4.11.7): 4-to-1 multiplexer number of cells comparison






Figure (4.11.8): 4-to-1 multiplexer number of transistors comparison
4.12: 8-to-1 Multiplexer
After discussing the 2-to-1 and the 4-to-1 multiplexers, the 8-to-1 multiplexer is
straightforward. The standard cell library does not have an 8-to-1 multiplexer block, thus
it is designed using 4-to-1 and 2-to-1 multiplexers as shown in Figure (4.12.1). These are
the same blocks used for the 2-to-1 and the 4-to-1 multiplexers in the previous sections.
The select line SD1 is the least significant bit in the 4-to-1 multiplexer symbol. This
implementation uses 64 (26 * 2 + 12) transistors, with the largest PMOS width of 3 µm
and NMOS width of 2.2 µm (within the 4-to-1 multiplexer block). The layout area is
calculated by adding the areas of each of the blocks used.
The NAND based PIF implementation reuses the 4-to-1 multiplexer block to
design the 8-to-1 multiplexer, as shown in Figure (4.12.2). As the PIF logic blocks are
reusable, the previously designed 2-to-1 and 4-to-1 multiplexers are used in the 8-to-1
multiplexer and a few more gates are used for signal routing and delay balancing. The
select signal (select2) needs to be routed from the input (left) side of the circuit all
the way to the output (right), providing a logic depth of 26.
Figure (4.12.1): Standard cell based 8-to-1 Multiplexer
The NAND gate configuration between locations 'D1-D10; I1-I10' provides 4-to-1
multiplexer functionality. The other 4-to-1 multiplexer is at locations 'E13-E22;
J13-J22', which is the same circuit as in Figure (4.10.2). The NAND gates in
columns A, B and C are used for signal routing. The select signal S2 is ANDed with




and logically ORed at location 'K9' to provide the 8-to-1 multiplexer output.
The multiplexer based PIF implementation of the 8-to-1 multiplexer is
simple compared to the NAND based PIF, as seen in Figure (4.12.3). The major advantage
is that the cell itself accommodates the 4-to-1 multiplexer functionality. Just 6 cells
are needed to implement the 8-to-1 multiplexer logic as opposed to 217 in NAND based PIF,
saving power and layout area and providing a faster circuit.












































Figure (4.12.3): PIF multiplexer based 8-to-1 multiplexer






provide the two 4-to-1 multiplexers and the gate at 'C3' provides the 2-to-1 multiplexer
at the output, thus forming an 8-to-1 multiplexer.
4.13: Functional Verification and Simulation Results of 8-to-1 Multiplexer
The target circuits are again driven by two inverters and loaded with an inverter, as
was done for the previous circuits. The data signals are kept at constant DC values and
the select lines are square wave pulses switching between VDD and GND. The periods of
the Select0, Select1 and Select2 signals are 2.5 ns, 5 ns and 10 ns
respectively.
If the PIF NAND circuit is given a Select0 signal of 1.16 ns, we still obtain
the 8-to-1 multiplexer functionality, which means that we can get a throughput of 2. The
same is not true for the PIF multiplexer, since the number of transistors in a multiplexer
cell is larger and hence the setup time of the cell is longer than that of PIF NAND. For
this reason, there is a minimum pulse width limit on the input signal Select0 and the
circuit can provide a throughput of 1.
The simulated results are shown in Table (4.13.1), where the 8-to-1 multiplexer is
abbreviated as Mux 8-to-1. The dummy cells appearing as empty boxes at the corners [in
the PIF logic implementation, Figure (4.12.2)] are not included in the layout area, since
this area is not actually occupied and can be used for overlapping with other blocks or to
provide decoupling capacitance. The simulation waveforms are presented in Appendix B.
Table (4.13.1): 8-to-1 multiplexer simulation results
Mux 8-to-1   Peak Power   Avg. Power   Delay     Layout Area   No. of   No. of
             (µW)         (µW)         (ps)      (µm²)         Cells    Transistors
PIF Mux      986.25       251.25       526.165   461.77        6        96
PIF NAND     7302.5       2303.3       1545      3864.3        217      868
Std Cells    2049.8       436.5        301.095   407.04        3        64
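The layout area and transistor entries for the 8-to-1 multiplexer can be cross-checked
from the per-cell figures quoted earlier in the chapter. The Python sketch below assumes
that each PIF total is simply the cell count multiplied by the single-cell area; this is
stated explicitly only for the NAND based 2-to-1 multiplexer, so it is an assumption for
the other circuits. All constant names are hypothetical.

```python
# Per-cell figures quoted in the chapter (areas in square micrometres).
PIF_NAND_AREA = 17.808
PIF_MUX_AREA = 76.962
STD_MUX2_AREA = 69.12
STD_MUX4_AREA = 168.96

# 8-to-1 multiplexer totals: cell counts come from the results table.
areas = {
    "PIF Mux": 6 * PIF_MUX_AREA,
    "PIF NAND": 217 * PIF_NAND_AREA,
    "Std Cells": 2 * STD_MUX4_AREA + STD_MUX2_AREA,  # two 4-to-1 blocks plus one 2-to-1
}
transistors = {
    "PIF Mux": 6 * 16,         # 16 transistors per PIF multiplexer cell
    "PIF NAND": 217 * 4,       # 4 transistors per NAND gate
    "Std Cells": 2 * 26 + 12,  # two 4-to-1 blocks plus one 2-to-1 block
}

for name in areas:
    print(f"{name:10s} area = {areas[name]:9.3f} um^2, transistors = {transistors[name]}")
```

Under these assumptions the computed totals agree with the tabulated layout areas and
transistor counts to the precision shown in the table.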
The standard cell implementation consumes about twice as much peak power and about 1.7
times as much average power as the multiplexer based PIF, as seen in Figures (4.13.1) and
(4.13.2), respectively.
Figure (4.13.1): 8-to-1 multiplexer peak power dissipation comparison











Figure (4.13.2): 8-to-1 multiplexer average power dissipation comparison
The standard cell circuit uses fewer cells than the PIF multiplexer circuit, but its
transistors are larger, resulting in increased peak and average power consumption.
Because of the large number of cells (and thus more switching activity) in the NAND based
PIF circuit, its peak and average power are greater than those of the other two
implementations.
The propagation delay of the 8-to-1 multiplexer circuit in the standard cell
implementation is less than that of the two PIF implementations, as seen in Figure
(4.13.3). In the standard cell circuit the logic depth is 2 gates, the number of
transistors required is smaller than in the PIF circuits and the transistors are large,
thus providing a fast circuit.







Figure (4.13.3): 8-to-1 multiplexer propagation delay comparison
The PIF multiplexer circuit has 96 transistors, as opposed to 868 in PIF NAND and 64
in standard cells, and has a logic depth of 3 gates. Because it has more minimum width
transistors in the signal path, the PIF multiplexer takes second place in the delay
comparison. With more transistors and a logic depth of 26, the PIF NAND 8-to-1
multiplexer is the slowest of the three circuits.
The layout area is dominated by the number of transistors and the size of each
transistor, see Figure (4.13.4). The standard cell implementation uses the least area: even
though its transistors are wide, it requires fewer transistors than the two PIF circuits.
However, the difference in layout area between the standard cell implementation and the
PIF multiplexer based 8-to-1 multiplexer is small compared to the difference between
either of these circuits and PIF NAND. PIF NAND uses the most area and stands last in
this comparison.













Figure (4.13.4): 8-to-1 multiplexer layout area comparison
The number of cells and transistors are shown in Figures (4.13.5) and (4.13.6),
respectively. The greater number of cells in the PIF NAND circuit helps to achieve a
throughput of 2. Thus the larger number of cells, which increases the power dissipation,
delay and layout area, provides the advantage of increased throughput.




Figure (4.13.5): 8-to-1 multiplexer number of cells comparison
Figure (4.13.6): 8-to-1 multiplexer number of transistors comparison
4.14: One bit full adder design
An adder is a basic component of many computational blocks, such as multipliers and
the Arithmetic and Logic Unit (ALU). A PIF NAND based multiplier has already been
designed [2]. Practically implemented adders are, of course, more than one bit wide, but
the one bit full adder is the basic functional block and is thus discussed here.
4.14.1: Standard cell based 1 Bit adder
The truth table for a 1 bit full adder is shown in Table (4.14.1) and the schematic






are the two bits to be added and 'Cin' is the incoming carry signal.
Table (4.14.1): One bit full adder truth table
Cin   A   B   Sum   Carry out
0     0   0   0     0
0     0   1   1     0
0     1   0   1     0
0     1   1   0     1
1     0   0   1     0
1     0   1   0     1
1     1   0   0     1
1     1   1   1     1
The logical equations for output signals sum and carry out (Cout) can be derived
from the truth Table. The obtained equations are
Sum = A ⊕ B ⊕ Cin





where '.' is the logical AND operation, '+' is the logical OR operation and '⊕' is the
logical exclusive OR operation.
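The truth table and the Sum equation can be cross-checked with a short script. The following is a minimal sketch in Python and is not part of the original design flow. The function name `full_adder` is hypothetical, and the majority form used for the carry (A.B + B.Cin + A.Cin) is an assumption, chosen as one of several equivalent forms that reproduce the Carry out column of Table (4.14.1).

```python
from itertools import product

# Rows of Table (4.14.1): (Cin, A, B) -> (Sum, Carry out)
TRUTH_TABLE = {
    (0, 0, 0): (0, 0),
    (0, 0, 1): (1, 0),
    (0, 1, 0): (1, 0),
    (0, 1, 1): (0, 1),
    (1, 0, 0): (1, 0),
    (1, 0, 1): (0, 1),
    (1, 1, 0): (0, 1),
    (1, 1, 1): (1, 1),
}


def full_adder(cin, a, b):
    """Return (sum, cout) for one-bit inputs."""
    s = a ^ b ^ cin                          # Sum = A xor B xor Cin
    cout = (a & b) | (b & cin) | (a & cin)   # assumed majority form of Cout
    return s, cout


def matches_truth_table():
    """Check the equations against every row of the truth table."""
    return all(
        full_adder(*row) == TRUTH_TABLE[row]
        for row in product((0, 1), repeat=3)
    )
```

Calling `matches_truth_table()` exercises all eight input combinations, so any equivalent form of the carry equation can be substituted and checked the same way.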
For the standard cell implementation, a library component named 'ADDFB', which
stands for one bit full adder, is used; its symbol is shown in Figure (4.14.1). The B
version of the component, which provides a fanout of 2, is used. The circuit has 26
transistors with the maximum transistor width of 2.6 um and 3.9 um for















Figure (4.14.1): One bit full adder, standard cell implementation
4.14.2: PIF logic implementation of one bit full adder
The PIF multiplexer based adder was designed in this research, and the previously
designed PIF NAND based adder [2] is used for comparison. The schematics for the PIF
NAND and PIF multiplexer based adders are shown in Figures (4.14.2.1) and (4.14.2.2),
respectively.
The PIF NAND schematic has a logical depth of 12 and uses 51 cells to obtain the
one bit full adder implementation. For the output 'Sum', one of the XOR functions is
implemented as an XNOR followed by an inverter. By doing so, the logic depth and hence
the number of gates required is reduced. The three input XOR function is split into the
XOR of any two variables, and the result is XORed with the third input. The XOR of input
signals A and B is implemented as XNOR-NOT because an intermediate term of this
operation is also used in computing Cout.
Figure (4.14.2.1): One bit full adder, PIF NAND implementation
The gates at location 'H2-G4-H3-I3' form the XNOR operation and the gates at location
'I2-J2-J1' form the XOR operation to provide the output 'Sum'. The logical AND















to provide the carry out signal. The additional gates in output
'Cout'




The PIF multiplexer based adder, Figure (4.14.2.2), has a logical depth of 6 and
uses 26 cells. To implement a two input XOR gate, a PIF multiplexer needs only one cell
as opposed to 5 in PIF NAND. Thus it requires fewer cells and consequently smaller





and the logical AND gates at 'C5-D5-B7-D6-D7' provide the output Cout.


































































Figure (4.14.2.2): One bit full adder, PIF multiplexer implementation
4.15: Functional Verification and Simulation Results of One Bit Full adder
In an adder circuit, the maximum delay is caused by the Cout signal and thus it









signals switch back and forth between VDD and









is a pulse with rise and fall time of 100 ps and pulse width of 900 ps.




to switch between VDD and
GND as shown in the waveforms in Appendix B. One bit full adder is addressed as 1 Bit
FA and the simulation results obtained are shown in Table (4.15.1).
The PIF logic based adders have a gate delay of 552.6 ps (46.05 ps * logical depth of
12) for PIF NAND and 445.02 ps (74.17 ps * logical depth of 6) for PIF multiplexer. If
signal
3'
with a pulse width of 400 ps for PIF NAND is provided then a throughput of 2
can be obtained.
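The gate delay figures quoted in this paragraph follow from multiplying a per-gate delay by the logic depth. The short Python sketch below reproduces that arithmetic; the function and constant names are hypothetical and are introduced only for illustration, while the numerical values are the ones quoted in the text.

```python
def estimated_gate_delay_ps(per_gate_delay_ps, logic_depth):
    """Estimated delay = per-gate delay multiplied by the logic depth."""
    return per_gate_delay_ps * logic_depth


PIF_NAND_GATE_PS = 46.05   # per-gate delay quoted in the text
PIF_MUX_GATE_PS = 74.17    # per-gate delay quoted in the text

# One bit full adder: logic depth 12 (PIF NAND) and 6 (PIF multiplexer)
nand_adder_ps = estimated_gate_delay_ps(PIF_NAND_GATE_PS, 12)
mux_adder_ps = estimated_gate_delay_ps(PIF_MUX_GATE_PS, 6)

print(f"PIF NAND adder: {nand_adder_ps:.2f} ps")
print(f"PIF mux adder:  {mux_adder_ps:.2f} ps")
```

This is only the calculated gate delay; the simulated propagation delays in the results table also include the effects of drivers, loads and wiring, which this estimate does not model.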
Table (4.15.1): One bit full adder simulation results
1 Bit FA     Peak power   Avg. power   Delay    Layout area   No. of   No. of
             (mWatts)     (uWatts)     (ps)     (square um)   Cells    Transistors
PIF Mux      2.244        1002.3       1170     2001          26       416
PIF NAND     1.634        655.25       723.38   908.21        51       220
Std Cells    2.085        317.5        368.25   184.32        1        26
Because the PIF multiplexer circuit has more transistors than PIF NAND, its peak and
average power consumption are greater even with half the number of cells, see Figure
(4.15.1). Due to pipelining of the gates, the switching activity and the power consumption
in PIF logic are uniformly distributed over the circuit. This is the reason for higher
average power consumption than the standard cell. The standard cell based adder has fewer
transistors, but its peak power consumption is nearly as high as that of the PIF
multiplexer because of frequent switching of large transistors (to provide the back and
forth switching outputs). Its average power consumption, however, is less than that of the
PIF circuits because it has fewer transistors.







Figure (4.15.1): 1 Bit FA peak power consumption comparison
The PIF multiplexer has 16 times as many transistors as the standard cell and
almost twice as many as PIF NAND, and thus exhibits a higher propagation delay than the
other two circuits, as depicted in Figure (4.15.3). PIF NAND, with a greater logic depth
but fewer transistors than the PIF multiplexer, has a propagation delay less than the PIF
multiplexer but greater than the standard cell. The standard cell, with fewer but wider
transistors, provides the output with the least delay.
Figure (4.15.2): 1 Bit FA average power consumption comparison
Figure (4.15.3): 1 Bit FA propagation delay comparison
The layout area comparison for the target circuits is shown in Figure (4.15.4). The
standard cell, with the largest transistors but the fewest of them, requires the least
layout area. The PIF NAND area is less than that of the PIF multiplexer due to the
difference in the number of transistors.




Figure (4.15.4): 1 Bit FA layout area comparison
The number of cells and transistors used in the 1 Bit full adder are compared in
Figures (4.15.5) and (4.15.6), respectively. This comparison helps to understand the
performance of the circuits and the difference in their behavior. The greater number of
cells in the PIF logic implementation provides a throughput of 2 in the case of adders.






Figure (4.15.5): 1 Bit FA number of cells comparison
Figure (4.15.6): 1 Bit FA number of transistors comparison
4.16: Conclusion
The PIF logic design rules were introduced and a few problems faced so far in PIF
NAND implementation were discussed. The need for a new PIF cell was demonstrated
and a new cell was introduced.
The chapter presented several examples of PIF logic implementation with detailed
explanations. For the multiplexer functional blocks seen so far, the PIF multiplexer gives
performance very close to the standard cells implementation.
In the case of the adder, PIF NAND gives performance close to the standard cell in
terms of power consumption, but in terms of propagation delay the standard cell based
circuit performs better. Nevertheless, one should also consider the time and complexity
involved in designing standard cell functional blocks. PIF logic provides a time saving
methodology and, being a pipelined approach, provides larger throughput.
Chapter 5: Encoders
An encoder converts given information into a more compact form [19]. The
simplest form is a binary encoder, which encodes information from 2^n inputs into an
n-bit output code [7], as indicated in Truth Table (5.1.1). Two binary encoders, 4-to-2
and 8-to-3, are discussed here.
5.1 Standard cell based 4-to-2 Encoder
For a binary encoder, the inputs are one hot encoded which means that exactly




for positive logic or is asserted and the outputs
represent the binary number that identifies which input is equal to logic '1', see Table




are not shown in the
truth Table, and they are treated as don't-care conditions [19].
Table (5.1.1): Encoder 4-to-2 Truth Table
D3   D2   D1   D0   Y1   Y0
0    0    0    1    0    0
0    0    1    0    0    1
0    1    0    0    1    0
1    0    0    0    1    1









will both be logic
'0'
to
indicate the data bit
'DO'









Observe that the output
'YO"




is 1 and output
'Yl'




is 1. Shortly put as,
Y0 = D1 + D3
Y1 = D2 + D3
These outputs can be generated using two input OR gates as shown in Figure
(5.1.1).







Figure (5.1.1): Encoder 4-to-2, standard cell implementation
Another type of encoder is the priority encoder, in which the priority of the input
bits can be set. If input 'D3' has the highest priority and input 'D0' has the lowest







both of them have logic '1' at the same time, 'D3' will be






'11', thus taking care of the ambiguous
condition where two inputs are at logic '1' at the same time.
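The priority behavior described here can be illustrated with a small behavioral model. This is a minimal Python sketch under stated assumptions, not a description of any circuit in this work: the function name `priority_encode_4to2` is hypothetical, D3 is taken as the highest priority input and D0 as the lowest, and the case where no input is asserted is not distinguished from D0 being asserted (a practical priority encoder often adds a separate 'valid' output, which is omitted here).

```python
def priority_encode_4to2(d3, d2, d1, d0):
    """Return (Y1, Y0) for the highest-priority asserted input, D3 first."""
    if d3:
        return (1, 1)
    if d2:
        return (1, 0)
    if d1:
        return (0, 1)
    # D0 asserted, or no input asserted (not distinguished in this sketch)
    return (0, 0)
```

For example, with 'D3' and 'D0' both at logic '1', `priority_encode_4to2(1, 0, 0, 1)` reports the code for 'D3', which is how the ambiguity between simultaneously asserted inputs is resolved.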
5.2 PIF logic implementation of Encoder 4-to-2
The NAND based and multiplexer based PIF implementations of Encoder 4-to-2 are
shown in Figure (5.2.1) and Figure (5.2.2), respectively. The OR gate is implemented as
shown in Chapter 3, Figure (3.4.5) for NAND and Figure (3.2.2) for multiplexer. The
NAND based implementation requires three cells to implement one OR gate, whereas the
multiplexer based implementation requires just one cell.
Figure (5.2.1): Encoder 4-to-2, PIF NAND implementation




form one OR gate to give output 'Y0' and gates at location 'A2-A3-B2' form the second
OR gate providing output 'Y1'. PIF NAND needs more gates than the standard cells
implementation because a NAND gate provides an inverted output and must be combined
with other NAND gates to provide an OR function. For example, two NAND gates are
required for an AND gate output.
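The cell counts mentioned here (three NAND cells for an OR gate, two for an AND gate) follow from De Morgan's law. The Python sketch below is a behavioral illustration only; the function names are hypothetical, and building the inverter as a NAND with its second input tied to logic '1' (VDD) is an assumption rather than a statement of how Figures (3.4.4) and (3.4.5) are wired.

```python
def nand(a, b):
    """Two-input NAND on one-bit values."""
    return 1 - (a & b)


def inv(a):
    # Assumed inverter: a NAND with the second input tied to logic 1 (VDD)
    return nand(a, 1)


def or_from_nand(a, b):
    # De Morgan: A + B = NAND(/A, /B); three NAND cells in total
    return nand(inv(a), inv(b))


def and_from_nand(a, b):
    # AND = NAND followed by an inverter; two NAND cells in total
    return inv(nand(a, b))
```

Each call to `nand` corresponds to one cell, so counting the calls in `or_from_nand` and `and_from_nand` gives the three and two cells quoted above.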
For multiplexer based PIF, only two PIF cells are required, each of which is
configured to operate as an OR gate, as seen in Figure (5.2.2). The first cell provides
output 'Y0'; this output is logic '1'


































Figure (5.2.2): Encoder 4-to-2, PIF Multiplexer implementation
Here, only the D1 and D0 inputs of the multiplexer are connected to VDD, as opposed to
the D1, D2 and D3 inputs in Figure (3.2.2). This organization takes care of the ambiguous
condition of both inputs being logic '1' at the same time. The same reasoning applies
to the second cell configured as an OR gate.
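The idea of configuring a multiplexer cell as a logic gate by tying its data inputs to VDD or GND can be illustrated with a small behavioral model. The following Python sketch shows only the general OR configuration attributed above to Figure (3.2.2), with D1, D2 and D3 tied to VDD. It rests on assumptions that are not confirmed by this section: the two gate inputs are taken to drive the select lines, D0 is taken to be tied to GND, and the names `mux4`, `OR_CONFIG` and `or_gate` are hypothetical. The modified wiring used in the encoder is not modeled.

```python
VDD, GND = 1, 0


def mux4(data, s1, s0):
    """4-to-1 multiplexer: data is (D0, D1, D2, D3); (s1, s0) selects one entry."""
    return data[2 * s1 + s0]


# Assumed OR configuration: D0 tied to GND, D1, D2 and D3 tied to VDD
OR_CONFIG = (GND, VDD, VDD, VDD)


def or_gate(a, b):
    # Assumption: the two gate inputs drive the select lines of the cell
    return mux4(OR_CONFIG, a, b)
```

Changing the constants in the configuration tuple changes the function of the cell without changing the cell itself, which is why a single multiplexer cell can replace several NAND cells.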
5.3 Functional Verification and Simulation Results of 4-to-2 Encoder














is a pulse of






to switch between VDD and GND every 1.6 ns. For this input condition, and
with a logic depth of 1 gate for PIF multiplexer and 2 gates for PIF NAND, a throughput
of 1 is achieved in both PIF circuits. The simulation results are shown in Table (5.3.1)
and the timing diagrams are provided in Appendix B.
Table (5.3.1): Encoder 4-to-2 simulation results
Encoder 4 to 2   Peak Power   Avg. Power   Delay    Layout Area   No. of   No. of
                 (uWatts)     (uWatts)     (ps)     (square um)   cells    Transistors
PIF Mux          742.25       123.13       148.54   153.92        2        32
PIF NAND         857.5        114.05       121.97   89.04         5        20
Std Cells        804.75       109.35       117.89   92.16         2        12
As seen from the circuit diagrams, standard cells and PIF multiplexer need the
least number of cells to implement the encoder 4-to-2 functionality, whereas five NAND
gates form the PIF NAND based schematic.
For the same number of cells, PIF multiplexer logic consumes less peak power than
standard cells because standard cells use wider transistors (NMOS 560 nm; PMOS
800 nm). PIF NAND peak power is higher than that of the other two, see Figure (5.3.1),
because it has more gates that switch frequently.
The PIF multiplexer requires only two cells, but the number of transistors in the
circuit is greater than in the standard cells implementation, so it consumes more average
power, Figure (5.3.2). Because standard cells have larger transistors and higher peak
power, their average power dissipation is nevertheless close to that of the PIF circuits.
The PIF NAND implementation has 20 transistors as opposed to 12 in the standard cells
circuit, but consumes only 4.7 uWatts more average power than standard cells (114.05 vs.
109.35 uWatts). The standard cell circuit, with the fewest transistors, consumes the least
average power.
The propagation delay of the PIF multiplexer circuit is the largest because it has
the most transistors. Since these transistors are minimum sized, their transition time is
longer, i.e., they switch slowly, and the signal has to pass through more transistors,
causing more delay. In the case of the standard cell, the number of transistors is less
than in PIF NAND and PIF multiplexer and these transistors are designed to switch fast,
resulting in a smaller propagation delay. PIF NAND, with a greater logic depth but fewer
transistors than the PIF multiplexer and more than the standard cell, is the second
fastest 4-to-2 encoder.
Figure (5.3.1): Encoder 4-to-2, peak power dissipation comparison




Figure (5.3.2): Encoder 4-to-2, average power dissipation comparison
The layout area of a circuit is large if the number of transistors is large or if the
area of the individual transistors is large, as depicted in Figure (5.3.4). PIF NAND uses
20 minimum sized transistors and standard cells use 12 large transistors, which is the
reason for the small area difference between them. The PIF multiplexer, with almost three
times as many transistors as the standard cell, requires the largest layout area.
Figure (5.3.3): Encoder 4-to-2 Propagation delay comparison
Figure (5.3.4): Encoder 4-to-2 Layout area comparison
As discussed earlier, the number of cells used in PIF NAND is greater than the
other two and is shown in Figure (5.3.5). Figure (5.3.6) reflects the number of transistors
used by each circuit.
Figure (5.3.5): Encoder 4-to-2 Number of Cells comparison
Figure (5.3.6): Encoder 4-to-2 Number of transistors comparison
5.4: 8-to-3 Encoder
A larger version of the binary encoder is the 8-to-3 encoder, with 8 inputs and 3 outputs.





is the output MSB signal.
Table (5.4.1): 8-to-3 Encoder truth table
Data Inputs                              Outputs
D7   D6   D5   D4   D3   D2   D1   D0    Y2   Y1   Y0
0    0    0    0    0    0    0    1     0    0    0
0    0    0    0    0    0    1    0     0    0    1
0    0    0    0    0    1    0    0     0    1    0
0    0    0    0    1    0    0    0     0    1    1
0    0    0    1    0    0    0    0     1    0    0
0    0    1    0    0    0    0    0     1    0    1
0    1    0    0    0    0    0    0     1    1    0
1    0    0    0    0    0    0    0     1    1    1
5.4.1 Standard cells based 8-to-3 Encoder
From the Truth Table (5.4.1), the output equations for the 8-to-3 encoder can be derived
as follows:
Y0 = D1 + D3 + D5 + D7
Y1 = D2 + D3 + D6 + D7
Y2 = D4 + D5 + D6 + D7
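These three equations can be checked against the truth table with a short behavioral model. The following Python sketch is illustrative only; the function names are hypothetical. Each four input OR is written as a tree of three two input ORs, which is equivalent by the associative property of the OR operation, and the inputs are assumed to be one hot encoded as in Table (5.4.1).

```python
def or2(a, b):
    """Two-input OR on one-bit values."""
    return a | b


def or4(a, b, c, d):
    # Four-input OR as a tree of three two-input ORs (valid by associativity)
    return or2(or2(a, b), or2(c, d))


def encode_8to3(d):
    """d is a sequence (D0, ..., D7) with exactly one entry equal to 1.

    Returns (Y2, Y1, Y0) according to the output equations above.
    """
    y0 = or4(d[1], d[3], d[5], d[7])
    y1 = or4(d[2], d[3], d[6], d[7])
    y2 = or4(d[4], d[5], d[6], d[7])
    return y2, y1, y0


def one_hot(index, width=8):
    """Return a one-hot tuple (D0, ..., D7) with only D[index] asserted."""
    return tuple(1 if i == index else 0 for i in range(width))
```

For every one hot input, the binary value formed by (Y2, Y1, Y0) equals the index of the asserted input, which is the property the truth table expresses row by row.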



















Figure (5.4.1): Encoder 8-to-3 standard cells implementation
5.4.2 PIF logic implementation of 8-to-3 Encoder
The four input OR gate is implemented in PIF logic by making use of the
associative property of digital gates: a four input OR gate is implemented as three two
input OR gates. The PIF NAND implementation is shown in Figure (5.4.2.1). The three OR




combine to form output
'YO'











provide output 'Y2'. Note that the longer vertical lines are the wires connecting to
VDD, the power supply line.
Figure (5.4.2): PIF NAND based Encoder 8-to-3
In the case of the PIF multiplexer schematic, a four input OR gate can be implemented
in the same way as with NAND gates, using the associative property of digital logic. The
PIF multiplexer based implementation is depicted in Figure (5.4.2.2), where 33 cells / PIF
gates are required to get 8-to-3 Encoder functionality as opposed to 45 in the PIF NAND
implementation. This difference in the number of gates is due to the fact that fewer gates


















































































Figure (5.4.3): PIF multiplexer based Encoder 8-to-3



















only in case of the PIF multiplexer circuit and is at logic '0'. 'D7'
is a square pulse with







switching all the outputs at
the same time. The timing diagram or simulation waveforms are provided in Appendix B
and the values are listed in Table (5.5.1).
The calculated gate delay of the PIF circuits is 368.40 ps for PIF NAND, with a logic
depth of 8, and 445.02 ps for PIF multiplexer, with a logic depth of 6. Encoder 8-to-3
has a throughput of 1 for an input signal with a pulse width of 1.4 ns.
Table (5.5.1): Simulated results for Encoder 8-to-3
Encoder 8 to 3   Peak Power   Avg. Power   Delay     Layout Area   No. of   No. of
                 (mWatts)     (uWatts)     (ps)      (square um)   cells    Transistors
PIF Mux          1.59         684.25       1164.41   2462.8        32       512
PIF NAND         2.13         364.75       502.57    801.36        45       180
Std Cells        1.82         196.35       184.14    184.32        3        30
It is clear from the Table that the PIF multiplexer uses fewer cells than PIF
NAND but requires more transistors. This difference in the number of transistors leads to
more layout area and a larger propagation delay. The standard cell implementation
requires the fewest cells and transistors, but the time involved in the design is greater
than in multiplexer and NAND based PIF logic design.
The peak power consumption of PIF NAND is greater than that of the other two, see
Figure (5.5.1). The PIF multiplexer implements one OR gate using just one PIF cell as
opposed to 3 in PIF NAND. Therefore, one multiplexer gate switching event provides the
same output as 3 NAND gate switching events; as a result the PIF NAND circuit consumes
more peak power. The standard cell has large transistors (NMOS 700 nm; PMOS 2 um)
which can pass a large peak current, thus causing more peak power dissipation.






Figure (5.5.1): Encoder 8-to-3 peak power dissipation comparison
The average power consumed by the PIF NAND circuit is almost twice that of the
standard cell circuit, and the PIF multiplexer circuit consumes more than three times as
much as the standard cell circuit, Figure (5.5.2). The power consumption in PIF logic is
distributed over the entire circuit. The gates are pipelined and switch with every
transition of the input signal. This effect results in more average power consumption
than in the standard cell implementation.




Figure (5.5.2): Encoder 8-to-3 average power dissipation comparison
The standard cells based circuit, with large transistors and a logic depth of one, is
the fastest circuit, Figure (5.5.3). The NAND based PIF design, with a logic depth of 8 as
opposed to 6 in PIF multiplexer, is nevertheless faster than the PIF multiplexer. This is
because the number of transistors in the signal path is smaller for PIF NAND than for PIF
multiplexer.
The PIF multiplexer has the largest number of transistors and requires the most
layout area, Figure (5.5.4). PIF NAND, with a large number of cells, Figure (5.5.5), but
fewer transistors than PIF multiplexer, Figure (5.5.6), stands second in terms of layout
area. The standard cell implementation, with large but very few transistors, has the
smallest layout area.










Figure (5.5.3): Encoder 8-to-3 propagation delay comparison
Figure (5.5.4): Encoder 8-to-3 layout area comparison
Figure (5.5.5): Encoder 8-to-3 number of cells comparison
Figure (5.5.6): Encoder 8-to-3 number of transistors comparison
5.6 Conclusion
The power consumption of the PIF logic based multiplexer implementation is as good
as that of the standard cell, while giving the full advantage of time efficiency in
scalability and elimination of global interconnects. If the transistors in the PIF cells
are made wider (considering the logical and electrical effort of the logic gate to
calculate the transistor sizes [5]), then faster circuits can be obtained, which will also
increase the throughput.
Because of the considerable difference in the number of transistors, the PIF logic
implementation requires more layout area and, with more transistors in the signal path,
exhibits more propagation delay than the standard cell logic implementation. Among the
PIF cells, the area and the propagation delay of each NAND gate are smaller than those of
the PIF multiplexer, thus providing better results for propagation delay.
Chapter 6: Decoders
A decoder is a multiple input, multiple output logic circuit that converts coded
inputs into coded outputs [19]. The input code has fewer bits than the output code and
there is one-to-one mapping from input code words into output code words. 1-out-of-m





active high or positive logic implementation) or asserted at any given time and other
inputs are logic '0'. The output of decoder can be active high or active low, depending on
the system requirements or design trade offs.
Here, two decoders are considered for comparison. The first one is of simplest
form and size, a 2-to-4 decoder and the other is a little more complex a 3-to-8 decoder.
6.1: Standard cell based 2-to-4 Decoder
Table (6.1.1) is the truth table for the 2-to-4 decoder and Figure (6.1.1) shows the
schematic of the 2-to-4 decoder using standard cells. 'Enable' is an input signal which
represents chip select or chip enable.
This signal is implemented as active high; if enable






irrespective of the input data values. 'D1' is the MSB input signal and 'Y3' is the MSB
output signal.
Table (6.1.1): Decoder 2-to-4 Truth Table for active high output
Enable   D1   D0   Y3   Y2   Y1   Y0
0        X    X    0    0    0    0
1        0    0    0    0    0    1
1        0    1    0    0    1    0
1        1    0    0    1    0    0
1        1    1    1    0    0    0
X: Don't care condition
From the truth Table we can easily derive the equations for each output. Output
YO is logic
'one'







inputs D0 and D1 are inverted using NOT gates and logically ANDed with the
Enable signal to get the Y0 output. The following equations are obtained for each output:
Y0 = /D0 . /D1 . Enable
Y1 = D0 . /D1 . Enable
Y2 = /D0 . D1 . Enable
Y3 = D0 . D1 . Enable
where /D0 represents the inverted signal (D0-NOT) and '.' represents the logical AND
operation.
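The four output equations can be exercised with a small behavioral model. The following Python sketch is illustrative only and is not part of the original design flow; the function name `decode_2to4` is hypothetical, and the outputs are returned in the order (Y3, Y2, Y1, Y0) purely as a convention of this sketch.

```python
def decode_2to4(enable, d1, d0):
    """Return (Y3, Y2, Y1, Y0), active high, from the equations above."""
    n1, n0 = 1 - d1, 1 - d0      # /D1 and /D0
    y0 = n0 & n1 & enable        # Y0 = /D0 . /D1 . Enable
    y1 = d0 & n1 & enable        # Y1 =  D0 . /D1 . Enable
    y2 = n0 & d1 & enable        # Y2 = /D0 .  D1 . Enable
    y3 = d0 & d1 & enable        # Y3 =  D0 .  D1 . Enable
    return y3, y2, y1, y0
```

With Enable at logic '0' all four outputs are '0' regardless of the data inputs, and with Enable at logic '1' exactly one output is asserted for each input combination.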





Figure (6.1.1): Standard Cells Decoder 2-to-4
6.2: PIF logic implementation of 2-to-4 Decoder
The PIF based implementation of the 2-to-4 decoder needs additional cells to route
the signals and thus looks like a more complex circuit than the standard cell
implementation, as shown in Figure (6.2.1). A three input AND gate is implemented as a
combination of two AND gates using the associative property of digital logic,
(A.B.C) = (A.B).C. Thus, we need four NAND gates for each output signal, with two NAND
gates performing one AND operation, as shown in Chapter 3, Figure (3.4.4). The resultant
circuit has 40 cells with a
















Figure (6.2.1): Decoder 2-to-4; PIF NAND
























Y3'. Notice some of the NAND gates have one of their inputs connected
to VDD to follow the PIF logic rule, fan out of 2 and the vertical lines are the VDD lines.
The schematic for PIF multiplexer based 2-to-4 decoder is shown in Figure





The AND gate at location 'D2' is configured







when both data inputs are logic '0'. By connecting the first input of the multiplexer at
location 'D2' to VDD and the other three to GND we obtain the functionality








































































































































































































Figure (6.2.2): Decoder 2-to-4; PIF Multiplexer









respectively. One of the
important differences between PIF NAND and PIF multiplexer is that the number of
gates used for routing a signal can be odd or even in the case of the multiplexer, but
has to be even in the case of NAND, as is evident in this schematic. It is no surprise
that the PIF multiplexer implementation requires fewer cells than PIF NAND. The vertical
lines in the circuit are the VDD and GND connections.
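The requirement for an even number of NAND routing gates follows from the fact that each NAND cell inverts the signal passing through it. The Python sketch below illustrates this with a behavioral model. It assumes that a routing cell is a NAND gate with its other input tied to logic '1' (VDD); the function names are hypothetical.

```python
def nand(a, b):
    """Two-input NAND on one-bit values."""
    return 1 - (a & b)


def route_through_nand(signal, n_gates):
    """Pass a signal through n NAND cells, each with its other input tied to VDD."""
    for _ in range(n_gates):
        signal = nand(signal, 1)  # each routing cell inverts the signal
    return signal
```

With an even `n_gates` the signal arrives with its original polarity, while an odd count delivers the inverted signal, which is why NAND based routing paths come in pairs of cells.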
6.3: Functional Verification and Simulation Results of 2-to-4 Decoder
The input signals have pulse widths and periods such that they cover all the possible
combinations (00, 01, 10, 11), asserting the outputs (Y0, Y1, Y2, Y3), respectively. The
simulation results are shown in Appendix B and the numerical values are given in Table
(6.3.1). The data input 'D0' is a pulse of width 2 ns and 'D1' has a width of 4.1 ns. The
maximum propagation delay is for output signal 'Y0' with respect to input 'D0', because
both data inputs are inverted before being applied to the AND gates. Again, all the target
circuits are driven by a chain of two inverters and loaded by an inverter at the output.
Table (6.3.1): Decoder 2-to-4 simulated results
Decoder 2 to 4   Peak Power   Avg. Power   Delay    Layout Area   No. of   No. of
                 (mWatts)     (uWatts)     (ps)     (square um)   Cells    Transistors
PIF Mux          2.56         766.07       1419.4   2539.7        33       528
PIF NAND         3.098        372.14       559.96   712.32        40       160
Std Cells        1.23         66.607       175.89   261.12        6        36
The number of cells required to build the decoder schematic varies heavily between
the logic styles. The standard cell implementation uses four 3 input AND gates; in the
case of PIF logic the function performed is also AND, but more gates are used in order to
route the signals.
The peak power dissipation, Figure (6.3.1), differs by about 1.3 mW between the
standard cell and PIF multiplexer implementations (1.23 vs. 2.56 mWatts). PIF NAND
contains more cells than the other two functional blocks and consumes the most peak
power. The PIF multiplexer requires the second largest number of cells and consumes more
power than the standard cell implementation. More cells lead to more gate switching
events in this pipelined combinational schematic, and hence both PIF circuits consume
more peak and average power, as shown in Figures (6.3.1) and (6.3.2).
Figure (6.3.1): Decoder 2-to-4 peak power dissipation comparison
Figure (6.3.2): Decoder 2-to-4 average power dissipation comparison
The PIF multiplexer has a logical depth of 7 gates and PIF NAND has a depth of
9, as seen in Figures (6.2.2) and (6.2.1), respectively. Although the PIF multiplexer has
fewer cells than PIF NAND, a signal in the PIF multiplexer passes through more
transistors than in PIF NAND (each multiplexer cell has 16 transistors and each NAND has
4 transistors). Hence, the PIF multiplexer requires more time to provide the output
signal, as shown by the propagation delays in Figure (6.3.3). The standard cells
implementation, with a logic depth of 1 and 36 transistors, is the fastest circuit.
Due to the difference in the number of transistors, the area of a multiplexer PIF cell
is larger than that of one NAND gate. Thus, the total area occupied by the PIF multiplexer
is greater than that of the other two implementations, as shown in Figure (6.3.4). The
standard cell implementation, with just 6 cells, requires less area than the other two,
with 33 (PIF multiplexer) and 40 (PIF NAND) cells.










Figure (6.3.3): Decoder 2-to-4 propagation delay comparison
Figure (6.3.4): Decoder 2-to-4 layout area comparison
The numbers of cells and transistors, which affect the circuit performance in many
aspects, are plotted in Figures (6.3.5) and (6.3.6). As can be seen, the PIF multiplexer
has 528 transistors for 33 cells, whereas PIF NAND has 160 transistors for 40 PIF cells.




Figure (6.3.5): Decoder 2-to-4 number of cells comparison
Figure (6.3.6): Decoder 2-to-4 number of transistors comparison
6.4: Decoder 3-to-8
If we need more than 4 decoded outputs, a 3-to-8 decoder can be used, which has 8
data outputs and 3 inputs. The truth table for the positive logic implementation is shown
in Table (6.4.1).
6.4.1: Standard cells based 3-to-8 Decoder
From the truth Table (6.4.1), the output equations for the 3-to-8 decoder can be
obtained as follows:
Y0 = /D2 . /D1 . /D0 . Enable        Y1 = /D2 . /D1 . D0 . Enable
Y2 = /D2 . D1 . /D0 . Enable         Y3 = /D2 . D1 . D0 . Enable
Y4 = D2 . /D1 . /D0 . Enable         Y5 = D2 . /D1 . D0 . Enable
Y6 = D2 . D1 . /D0 . Enable          Y7 = D2 . D1 . D0 . Enable
To implement the 3-to-8 decoder, four input AND gates are used for the standard cells
implementation. 2 or 3 input AND gates can also be used instead, but the logic depth of
the circuit will increase. The logical effort of the circuit with 4 input AND gates and a
logic depth of one is less than that of the circuit with 2 or 3 input AND gates and a
logic depth of 2, and it thus has less propagation delay. The standard cell implementation
is shown in Figure (6.4.1), consisting of 4 input AND gates and inverters.
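The trade-off between a single level of four input AND gates and a deeper structure of smaller gates can be illustrated behaviorally. The Python sketch below compares a direct model (one four input AND per output) with a composition of two 2-to-4 decoders whose enables are gated by D2. The composition is shown as one common construction and is an assumption of this sketch, not a statement of the exact schematic used in this work; all function names are hypothetical.

```python
from itertools import product


def decode_2to4(enable, d1, d0):
    """2-to-4 decoder, active high; returns (Y3, Y2, Y1, Y0)."""
    n1, n0 = 1 - d1, 1 - d0
    return (d0 & d1 & enable, n0 & d1 & enable,
            d0 & n1 & enable, n0 & n1 & enable)


def decode_3to8_direct(enable, d2, d1, d0):
    """One four-input AND per output; returns (Y7, ..., Y0)."""
    outputs = []
    for i in range(8):
        b2, b1, b0 = (i >> 2) & 1, (i >> 1) & 1, i & 1
        # Use the input itself where the index bit is 1, its complement otherwise
        t2 = d2 if b2 else 1 - d2
        t1 = d1 if b1 else 1 - d1
        t0 = d0 if b0 else 1 - d0
        outputs.append(t2 & t1 & t0 & enable)
    return tuple(reversed(outputs))


def decode_3to8_composed(enable, d2, d1, d0):
    """Two 2-to-4 decoders; D2 selects which one is enabled."""
    low = decode_2to4(enable & (1 - d2), d1, d0)   # Y3..Y0
    high = decode_2to4(enable & d2, d1, d0)        # Y7..Y4
    return high + low


def implementations_agree():
    """Compare both models over all 16 input combinations."""
    return all(
        decode_3to8_direct(*bits) == decode_3to8_composed(*bits)
        for bits in product((0, 1), repeat=4)
    )
```

Both models produce the same outputs; the difference lies in the logic depth, since the composed version places an extra AND stage in front of each 2-to-4 decoder.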
Table (6.4.1): 3-to-8 Decoder Truth Table
Enable   D2   D1   D0   Y7   Y6   Y5   Y4   Y3   Y2   Y1   Y0
0        X    X    X    0    0    0    0    0    0    0    0
1        0    0    0    0    0    0    0    0    0    0    1
1        0    0    1    0    0    0    0    0    0    1    0
1        0    1    0    0    0    0    0    0    1    0    0
1        0    1    1    0    0    0    0    1    0    0    0
1        1    0    0    0    0    0    1    0    0    0    0
1        1    0    1    0    0    1    0    0    0    0    0
1        1    1    0    0    1    0    0    0    0    0    0
1        1    1    1    1    0    0    0    0    0    0    0
The schematic looks very simple in the case of the standard cell implementation, but
the same is not true for PIF based schematics. Even though more cells are required by PIF
logic, the time involved in developing the schematic and layout is less than for the
standard cell schematic. All the designer has to do is design a PIF cell and fit it into
the ready structure of the schematic. For example, for a PIF NAND, if the technology
changes from 0.25 um to 0.10 um (100 nm), the previous NAND needs to be replaced by the
new NAND designed in the 100 nanometer technology. In the case of standard cells, the
transistor level schematic of the 4 input AND gate needs to be redesigned. The layout
also needs to be done






Figure (6.4.1): Decoder 3-to-8, standard cell implementation
6.4.2: PIF logic implementation of 3-to-8 Decoder
The NAND based and multiplexer based PIF implementations are shown in
Figures (6.4.2) and (6.4.3) respectively. For the PIF based schematic, the 2-to-4 decoder




are ANDed and connected as the 'Enable' input of 2-to-4 decoder. Altogether two 2-to-4












Figure (6.4.2) shows the PIF NAND circuit of the 3-to-8 decoder. The schematic
has been rotated to fit on the page and the numbering order is reversed; rows are alphabets




provide the enable signal for the first and second 2-to-4 decoders















Notice the gates at location 'D4-E4' are responsible to provide all





are obtained from the gates at the locations 'E11-E10-H7-I7-J7-K7',
'E11-E10-I9-J9-K8-K9', 'E11-E10-H12-I12-J11-K10' and 'E11-E10-I11-J12-K11-K12',
respectively, with 'E10-E11' being the Enable signal.
The PIF multiplexer schematic, shown in Figure (6.4.3), is similar to the PIF NAND but
requires fewer cells. Even though the PIF multiplexer requires fewer cells than PIF NAND
to implement a function, more cells are needed to route the signals in case of 3-to-8





Figure (6.4.2): Decoder 3-to-8, PIF NAND implementation







The AND gate at location 'C4'









is logic '0', and the rest of the input conditions are at logic '0'. This gate
configuration is a good example of versatility of the




are obtained from the gates at locations 'E9-I8-K7', 'E9-I9-K8', 'E9-I10-K10' and
'E9-I11-K11', respectively, with the gate at location 'E9' providing the enable input.
6.5: Functional Verification and Simulation Results of 3-to-8 Decoders

















between VDD and GND, with same frequency as the input signal 'D0'. With this input
setting a throughput of 1 is achieved since circuit gate delays as per calculation are
765.9 ps (45.05 ps * 15) for PIF NAND and 964.21 ps (74.17 ps * 13) for PIF multiplexer
and the input signal pulse width is 2 ns.
The power dissipation and propagation delay are measured from the timing
diagrams obtained during simulation and the layout area is calculated based on the
number (PIF implementation) and type of cells (Standard cells implementation) involved
in the circuit. The results obtained are depicted in Table (6.5.1) and timing diagrams are




because of the inverters.
The standard cell based 3-to-8 decoder circuit has less peak power consumption




Table (6.5.1): Decoder 3-to-8 simulated results
Decoder 3 to 8   Peak Power   Avg. Power   Delay    Layout Area    No. of   No. of
                 (mW)         (mW)         (ns)     (square um)    cells    transistors
PIF Mux          4.55         1.59         2.735    8234.9         107      1712
PIF NAND         3.84         1.038        0.912    2226           125      500
Std Cells        0.869        0.1078       0.228    560.64         11       86
The standard cells implementation requires 8 AND gates and 3 NOT gates, as
shown in Figure (6.4.1), making use of 11 cells and 86 transistors. Even though the
transistors are larger in size (NMOS 780 nm; PMOS 840 nm), it consumes less peak and
average power, as shown in Figures (6.5.1) and (6.5.2), respectively. The difference is due
to the smaller number of transistors in the standard cell implementation as compared to
the PIF implementations.
The PIF NAND consumes less peak and average power than the PIF multiplexer, unlike
in the 2-to-4 decoder. The PIF multiplexer based decoders (2-to-4 and 3-to-8) have more
transistors than the PIF NAND implementations, but the power consumption is higher only
in the 3-to-8 decoder, indicating more switching events.
[Chart data, Decoder 3-to-8 peak power (mW): PIF Mux 4.55; PIF NAND 3.84; Std_Cells 0.869]
Figure (6.5.1): Decoder 3-to-8 peak power dissipation comparison
[Chart data, Decoder 3-to-8 average power (mW): PIF Mux 1.59; PIF NAND 1.038; Std_Cells 0.1078]
Figure (6.5.2): Decoder 3-to-8 average power dissipation comparison
The PIF multiplexer is the slowest circuit, with the maximum propagation delay, as
shown in Figure (6.5.3). The logical depth of the PIF multiplexer (15) and PIF NAND (13)
is almost the same, but the propagation delay of the PIF multiplexer is about three times
that of PIF NAND. This is due to the large number of transistors present in one PIF cell,
and thus also in the signal path, in the case of the PIF multiplexer. The standard cell
based circuit is the fastest, with just 86 transistors which are designed to respond quickly.
With the maximum number of transistors, the PIF multiplexer has the largest layout area,
Figure (6.5.4). PIF NAND exhibits a propagation delay four times that of the standard cell
circuit and requires about four times its layout area. These values are still less than half
of those required by the PIF multiplexer. The layout area directly reflects the number of
transistors involved in the circuit.
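The ratios quoted in this comparison follow directly from Table (6.5.1). The sketch below recomputes them; the values are copied from the table, while the dictionary layout and the helper name ratio are illustrative choices.

```python
# Values copied from Table (6.5.1); the data layout is an illustrative choice.
TABLE_6_5_1 = {
    "PIF Mux":   {"delay_ns": 2.735, "area_um2": 8234.9, "cells": 107, "transistors": 1712},
    "PIF NAND":  {"delay_ns": 0.912, "area_um2": 2226.0, "cells": 125, "transistors": 500},
    "Std Cells": {"delay_ns": 0.228, "area_um2": 560.64, "cells": 11,  "transistors": 86},
}


def ratio(metric: str, numerator: str, denominator: str) -> float:
    """Ratio of one metric between two implementations."""
    return TABLE_6_5_1[numerator][metric] / TABLE_6_5_1[denominator][metric]


for metric in ("delay_ns", "area_um2", "transistors"):
    print(
        metric,
        "NAND/Std =", round(ratio(metric, "PIF NAND", "Std Cells"), 2),
        "Mux/NAND =", round(ratio(metric, "PIF Mux", "PIF NAND"), 2),
    )

# Transistors per cell for the two PIF styles.
for style in ("PIF NAND", "PIF Mux"):
    row = TABLE_6_5_1[style]
    print(style, "transistors per cell =", row["transistors"] / row["cells"])
```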





[Chart data, Decoder 3-to-8 delay (ns): PIF Mux 2.735; PIF NAND 0.912; Std Cells 0.228]
Figure (6.5.3): Decoder 3-to-8 propagation delay comparison
[Chart data, Decoder 3-to-8 layout area (square um): PIF Mux 8234.9; PIF NAND 2226; Std_Cells 560.64]
Figure (6.5.4): Decoder 3-to-8 layout area comparison
The number of cells and transistors addressed in the comparison are shown in
Figures (6.5.5) and (6.5.6), respectively. PIF NAND contains more cells but fewer
transistors than PIF multiplexer. This follows from the number of transistors per cell:
4 for the NAND and 16 for the multiplexer.
[Chart data, Decoder 3-to-8 number of cells: PIF Mux 107; PIF NAND 125; Std_Cells 11]
Figure (6.5.5): Decoder 3-to-8 number of cells comparison
[Chart data, Decoder 3-to-8 number of transistors: PIF Mux 1712; PIF NAND 500; Std_Cells 86]
Figure (6.5.6): Decoder 3-to-8 number of transistors comparison
6.6: Conclusion
The PIF NAND yields better results than the PIF multiplexer, especially for larger
circuits. Even though the number of cells involved in the PIF NAND implementation of the
decoders is larger, the number of transistors in the circuit which actually provide the
functionality is smaller. This is well explained by logical effort [2]. In order to
implement an 8-input AND gate, the most efficient circuit in terms of propagation delay is
the one which makes use of two 4-input AND gates and one 2-input AND gate, as opposed to
using only 2-input AND gates or building a single 8-input AND gate. Interested readers can
refer to the "Logical Effort" book [5] provided as reference in this research.
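The 8-input AND comparison can be reproduced with the basic logical effort delay model. The sketch below is not taken from this work. It assumes the usual textbook parameters for static CMOS gates (logical effort (n+2)/3 and parasitic delay n for an n-input NAND, (2n+1)/3 and n for an n-input NOR, 1 and 1 for an inverter), assumes the AND function is built from NAND, NOR and inverter stages, and assumes an electrical effort of H = 1. Delays are in units of the process time constant. For a much larger load the deeper tree of 2-input gates can become the faster option, so the result holds only for the stated assumption.

```python
def nand(n):
    """(logical effort, parasitic delay) of an n-input NAND (textbook values)."""
    return ((n + 2) / 3, float(n))


def nor(n):
    """(logical effort, parasitic delay) of an n-input NOR (textbook values)."""
    return ((2 * n + 1) / 3, float(n))


INV = (1.0, 1.0)


def min_path_delay(stages, electrical_effort):
    """Minimum path delay D = N * F**(1/N) + P, with F = G * H and no branching."""
    g_total = 1.0
    p_total = 0.0
    for g, p in stages:
        g_total *= g
        p_total += p
    n = len(stages)
    path_effort = g_total * electrical_effort
    return n * path_effort ** (1.0 / n) + p_total


# Three ways to build an 8-input AND function (hypothetical labels).
CANDIDATES = {
    "single 8-input NAND + inverter": [nand(8), INV],
    "two 4-input NANDs + 2-input NOR": [nand(4), nor(2)],
    "tree of 2-input gates": [nand(2), nor(2), nand(2), INV],
}

H = 1.0  # assumed electrical effort (load capacitance / input capacitance)
for name, stages in CANDIDATES.items():
    print(f"{name}: {min_path_delay(stages, H):.2f}")
```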
Chapter 7: Shift Registers
So far, we have considered combinational circuits in which the output depends
only on the present values of the input signals. There is another category of circuits in
which the output not only depends on present input values but also on past values. These
types of circuits that show storage or memory behavior are called sequential circuits.
7.1.1: Master slave D flip flop
A sequential circuit is run by a global input signal called 'clock'. A very simple
example of this type of circuit is a latch. A D-latch using NAND gates is shown in Figure
(7.1.1). The two NAND gates at the output are said to be cross coupled since the output of
one gate is connected to the input of the other and vice versa. This type of connection is
what gives us the outputs 'Q' and '/Q', which are the complements of each other.
Figure (7.1.1): D Latch





as long as the 'clock'






changes to logic '0', the previous output at 'Q' is held constant until the 'clock' is
asserted again. The latch operation mode when the output 'Q' changes




mode and when the output is held, the latch is said to be in 'Hold' mode.
A flip flop is a logic function which makes use of two such latches. Shown in
Figure (7.1.2) is a "Master-Slave"




















Figure (7.1.2): Master-Slave D flip flop




cycle (logic '1') and then holds the data for negative 'clock' cycle (logic '0'). During
the positive 'clock' cycle the slave latch is in Hold mode since its 'clock' input is not





becomes transparent and passes the value at its 'D' input, which is same as the output of
the master latch. Since the output of master is constant for 'clock' signal equal to logic
'0', the slave latch output changes only once during the negative 'clock' cycle and stays
at that value. This behavior gives an effect called edge triggering, which is responding to
the input signal during the transition of the 'clock' signal. The change in the slave output
occurs when the 'clock' is changing from logic '1' to logic '0', providing the output at
the falling edge of 'clock'. This makes the flip flop a negative edge triggered device.
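The negative edge triggered behavior can be illustrated with a small behavioral model. This is a sketch only: it models each latch as a level-sensitive storage element and ignores gate delays, and the class names DLatch and MasterSlaveDFF are assumptions introduced here.

```python
class DLatch:
    """Level-sensitive D latch: transparent when enabled, otherwise holds."""

    def __init__(self):
        self.q = 0

    def update(self, d, enable):
        if enable:          # transparent mode: output follows the input
            self.q = d
        return self.q       # hold mode: output keeps its previous value


class MasterSlaveDFF:
    """Master latch is transparent for clock = 1, slave latch for clock = 0."""

    def __init__(self):
        self.master = DLatch()
        self.slave = DLatch()

    def step(self, d, clock):
        m = self.master.update(d, clock)
        return self.slave.update(m, 1 - clock)


ff = MasterSlaveDFF()
# (d, clock) pairs: Q changes only after the clock has gone from 1 to 0.
for d, clock in [(1, 1), (0, 0), (1, 0), (0, 1), (0, 0)]:
    print("d =", d, "clock =", clock, "Q =", ff.step(d, clock))
```

In the run above, the value captured by the master while the clock is high appears at Q only once the clock has fallen, and changes on D while the clock is low do not reach Q.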
7.1.2: D Flip flop with preset and clear
Flip flops are used in sequential circuits that have many possible states. One such
example is a counter which counts the number of clock pulses and the output of each flip
flop forms one bit of the count. Such a counter needs to be forced into a known state of
count.










irrespective of the input data or clock value, having an asynchronous behavior. The
'clear' signal, as the name suggests, resets the output of the flip flop to logic '0'. Both
these signals are active low, indicated by a "/" in the label as shown in Figure (7.1.3).
Three input NAND gates are added in the place of the cross coupled 2 input




















Same is true when the clear signal is equal to logic '0'. The




which in turn causes the flip flop output '/Q'







signal becomes high or logic '1'.



























Figure (7.1.3): Master-Slave D flip flop with Preset and Clear signals
Shift registers are designed using these flip flops as shown in the following
section. Shift registers are used to perform various operations on the data set in which
they have to load or shift the data values.
7.2: 1 Bit Shift Register
A serial and parallel shift register is constructed using the D flip flop and other
gates as shown in Figure (7.2.1). 'SR_In' is the serial input signal and 'Load_/Shift' is a
control signal which decides between the serial or parallel mode of operation of the shift
register. Enable is the chip select signal. Even if this signal is deactivated, the output of
the shift register will be available as long as the power is ON. This is achieved by using a
2-to-1 multiplexer, which selects between the incoming data signal and the previously
stored value of the shift register. If the Enable signal is logic '0', the output 'Q' of the
D flip flop is





the shift register functions normally and new data can be loaded or transmitted










Figure (7.2.1): 1 Bit shift register Standard cell
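The data selection in front of the D flip flop can be summarized behaviorally. The sketch below assumes that 'Load_/Shift' at logic '1' selects the parallel data and at logic '0' selects 'SR_In', and that the flip flop captures the selected value at its active clock edge. The name parallel_in for the parallel data input is hypothetical, and preset and clear are left out.

```python
def next_q(enable, load_shift, parallel_in, sr_in, q):
    """Value stored by the 1 bit shift register at the next active clock edge."""
    # Inverter, two AND gates and one OR gate: choose parallel or serial data.
    selected = (load_shift & parallel_in) | ((1 - load_shift) & sr_in)
    # 2-to-1 multiplexer: with Enable low the stored value Q is fed back.
    return selected if enable else q


print(next_q(enable=1, load_shift=1, parallel_in=1, sr_in=0, q=0))  # parallel load
print(next_q(enable=1, load_shift=0, parallel_in=1, sr_in=0, q=1))  # serial shift
print(next_q(enable=0, load_shift=1, parallel_in=1, sr_in=1, q=0))  # hold
```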
The PIF based implementations of the shift register are shown in Figures
(7.2.2), PIF Multiplexer, and (7.2.3), PIF NAND. Here we consider the PIF Multiplexer
implementation first because it uses fewer cells than PIF NAND and makes the
understanding of this complex circuit a bit easier.
In Figure (7.2.2), the PIF Multiplexer based 1 bit shift register, gates at location
'A3-A4-B4' form the 2 AND gates and 1 OR gate of Figure (7.2.1). The 2-to-1
multiplexer is implemented using 3 gates at location 'C3-C4-D3', although only one gate
is needed as shown in Figure (4.6.4). There are 3 signals: Enable, the OR gate output and
the 'Q' output of the flip flop. To combine them in a 2-to-1 multiplexer functionality 3
gates are required because one gate can have only 2 input signals (as select lines). For the
same reason the 3 input NAND gate (shown in the D flip flop, Figure (7.1.3)) is implemented




coupled gates of Figure (7.1.3) are located at 'G3-G4' for the master latch and at 'J3-J4'









respectively, they are presented after 2 gate delays. The first reason for this
arrangement is that the gate at the output should be ready to drive 2 input signals (which
would not be the case if the output was taken from the gate at location 'J3') and the
second is that the output 'Q' needs to be connected back in the circuit as one of the
inputs to the 2-to-1 multiplexer (otherwise, we would need a fanout of 3 for output








































































Figure (7.2.2): PIF multiplexer 1 Bit shift register; logical depth 12
The maximum logic depth of the PIF multiplexer 1 bit shift register is 14 gates,
including the cross coupled gates. All the other logic paths in the circuit have a uniform
logic depth of 12 gates; this is another advantage of PIF multiplexer over PIF NAND.
The same logic depth can be obtained on every path without disturbing the logic, thus
achieving delay balancing.
Figure (7.2.3) shows the PIF NAND schematic of the 1 Bit shift register. The
number of gates required is more than in the PIF multiplexer implementation because of the
multiplexer's advantage of accommodating more logic within one gate (for example, an XOR
gate). The gates at locations 'A3-A4-A5-B4' form the inverter, OR and the two AND







Figure (7.2.3): PIF NAND 1 Bit shift register; maximum logic depth 17
The outputs Q and /Q are made available after 2 gate delays for the same reason
explained in PIF multiplexer based 1 Bit shift register. The maximum logical depth is 17
and a total number of 71 gates are used. Again the dummy cells are included in the total
count and their area can be used for decoupling capacitors or for any other logic layout.
7.3 Functional Verification and Simulation Results of 1 Bit Shift Register
The PIF logic and standard cells based circuits are simulated with the same set of
input signal conditions. The 'Clock' signal is a pulse of width 900 ps and period of 2 ns.
The 'Load_/Shift' signal is made twice as wide as 'Clock' in order to cover all the
possible combinations (00, 01, 10, and 11). This set up also manifests the 'Hold' mode of
the slave latch in Figure (7.1.3). The simulation results obtained for these input
conditions are shown in Table (7.3.1), where 1 Bit shift register is abbreviated as SR 1 Bit
and PIF multiplexer as PIF Mux for simplicity.
Table (7.3.1): 1 Bit Shift Register simulation results
SR 1 Bit     Peak Power   Avg. Power   Delay    Layout area    No. of   No. of
             (mW)         (mW)         (ns)     (square um)    cells    transistors
PIF Mux      3.328        1.969        1.49     4155.8         54       864
PIF NAND     2.64         0.896        1.088    1264.4         71       284
Std Cells    1.928        0.445        0.427    1920           6        67
Note that the PIF NAND gate delay for the inverter configuration is 46.05 ps and that of
the PIF multiplexer is 74.17 ps. For a logic depth of 17, the ideal PIF NAND circuit delay
would be 782.85 ps (17 * 46.05 ps), and for the PIF multiplexer based circuit, with a logic
depth of 14, it would be 1038.38 ps (14 * 74.17 ps). Thus, for the applied set of input
signals, the throughput of the PIF based circuits is 1.
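The ideal circuit delay used in these estimates is simply the logic depth multiplied by the delay of one PIF gate. A minimal sketch of that arithmetic follows; the gate delays and logic depths are the ones quoted for the 1 bit shift register, the function name ideal_delay_ps is an assumption, and the comparison against the 2 ns clock period is only an illustrative check, not the throughput criterion used in the simulations.

```python
# Single gate delays quoted for the inverter configuration, in ps.
GATE_DELAY_PS = {"PIF NAND": 46.05, "PIF Mux": 74.17}

# Maximum logic depth of the 1 bit shift register for each PIF style.
LOGIC_DEPTH = {"PIF NAND": 17, "PIF Mux": 14}

CLOCK_PERIOD_PS = 2000.0  # 2 ns clock period used in the simulation


def ideal_delay_ps(style, logic_depth):
    """Ideal circuit delay: logic depth times the delay of one gate."""
    return logic_depth * GATE_DELAY_PS[style]


for style, depth in LOGIC_DEPTH.items():
    delay = ideal_delay_ps(style, depth)
    print(f"{style}: {delay:.2f} ps, within one clock period: {delay < CLOCK_PERIOD_PS}")
```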
The peak power comparison is shown in Figure (7.3.1). The standard cell
implementation consumes less peak power than the PIF based designs. Even though the
standard cell circuit needs fewer cells and transistors, the latter are wider (NMOS 2.2 µm;
PMOS 3 µm in width) and consume more power as compared to the minimum sized
transistors. Despite having more cells, PIF NAND uses less power than PIF multiplexer
because it has fewer transistors.
[Chart data, SR 1 Bit peak power (mW): PIF Mux 3.328; PIF NAND 2.64; Std Cells 1.928]
Figure (7.3.1): 1 Bit Shift Register Peak Power Dissipation comparison
The average power dissipation, Figure (7.3.2), shows the same comparison results
as peak power. PIF circuits have more cells and effectively transistors than the standard
cell and due to pipelining these cells are switching frequently which causes more peak
and average power dissipation.
With fewer cells and wider transistors, the standard cell based circuit is the fastest
one as shown in the propagation delay comparison, Figure (7.3.3). Again, PIF
multiplexer has maximum transistors and is slower than the other two. PIF NAND has
more logic depth than PIF multiplexer but the number of transistors in the signal path is
less than half and thus it is faster. With four times more transistors than the standard
cell circuit, PIF NAND is slower.
[Chart data, SR 1 Bit average power (mW): PIF Mux 1.969; PIF NAND 0.896; Std Cells 0.445]
Figure (7.3.2): 1 Bit Shift Register Average Power Dissipation comparison
[Chart data, SR 1 Bit delay (ns): PIF Mux 1.49; PIF NAND 1.088; Std Cells 0.427]
Figure (7.3.3): 1 Bit Shift Register Propagation Delay comparison
The layout area of each circuit is shown in Table (7.3.1) and its comparison in
Figure (7.3.4). With the most transistors, the PIF multiplexer based circuit occupies the
largest area. According to Table (7.3.1), the PIF NAND based circuit occupies the smallest
area (1264.4 square um, against 1920 square um for the standard cell based circuit). The
number of cells and transistors comparison is shown in Figures (7.3.5) and (7.3.6)
respectively.
[Chart data, SR 1 Bit layout area (square um): PIF Mux 4155.8; PIF NAND 1264.4; Std Cells 1920]
Figure (7.3.4): 1 Bit Shift Register Layout Area comparison
[Chart data, SR 1 Bit number of cells: PIF Mux 54; PIF NAND 71; Std Cells 6]
Figure (7.3.5): 1 Bit Shift Register number of Cells comparison
[Chart data, SR 1 Bit number of transistors: PIF Mux 864; PIF NAND 284; Std Cells 67]
Figure (7.3.6): 1 Bit Shift Register number of Transistors comparison
7.4: 2 Bit Shift Register
The 1 bit shift register shown in Figure (7.2.1) is used to design the 2 bit serial
parallel shift register of Figure (7.4.1). The output 'Q0' of the first flip flop is
connected as the 'SR_In' input of the second flip flop as shown in Figure (7.4.1). This
arrangement provides the serial shifting of the data. For serial shift operation, the final
shifted output will be the 'Q' value of the last flip flop in the circuit. The serial shift
operation is the slowest operation in a shift register and thus this delay result is also
provided in this thesis work.
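The serial connection of the stages can be sketched behaviorally as follows. The model treats each clock edge as one shift of the whole register and ignores the parallel load path, preset, clear and all gate delays; the function names shift_once and shift_serial are assumptions.

```python
def shift_once(state, sr_in):
    """One serial shift: stage 0 takes SR_In, stage i takes the old Q of stage i-1."""
    return [sr_in] + state[:-1]


def shift_serial(state, bits):
    """Shift a sequence of serial input bits; collect the Q of the last stage."""
    outputs = []
    for bit in bits:
        state = shift_once(state, bit)
        outputs.append(state[-1])   # serial output of the register
    return state, outputs


# 2 bit register: a '1' on SR_In reaches the serial output after 2 shifts.
print(shift_serial([0, 0], [1, 0, 1]))
# The same model extends to 4 bits, where 4 shifts are needed.
print(shift_serial([0, 0, 0, 0], [1, 0, 0, 0]))
```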
The 2 bit shift register circuit has two 1 bit shift registers with just the change of
the 'SR_In' signal connection. In case of PIF logic circuits, there will be additional gates
which are required for the signal routing in order to fulfill the PIF logic design rule of
connecting only to the adjacent gates. The resultant circuits are shown in Figures (7.4.2)
and (7.4.3) for PIF NAND and PIF Multiplexer based circuits, respectively.
The PIF NAND implementation of the 2 Bit shift register, as seen in Figure (7.4.2), has a
lot of NAND cells as compared to Figure (7.3.1). The reason is that the standard cell
library has symbols for the flip flop and multiplexer. Both PIF NAND and PIF
multiplexer do not use any such block diagram representation to avoid any confusion,
since PIF cells are the only blocks / symbols used in the circuit.




is the 1 bit shift register. The output 'Q' of the first shift register is connected
to the serial input of the second by a chain of an even number of gates, as seen at location
'B2'. The gates present at locations 'A1-A2-A3' are used to route the input signals
'/Preset, Enable, Load_/Shift, Clock, /Clear'



receive 8 number of additional gates for delay balancing. The circuit thus




Figure (7.4.1): 2Bit shift register Standard cell
In case of PIF multiplexer based 2 Bit shift register, Figure (7.4.3), the same




locations include 1 bit shift register each and the output of one register is routed as
serial input of the other through the 'B2' location cells. For routing of signals an odd
number of cells can also be used since the multiplexer does not invert the logic, as
opposed to the NAND gate. The PIF multiplexer












Figure (7.4.2): 2 Bit shift register, PIF NAND implementation
Figure (7.4.3): 2 Bit Shift register, PIF multiplexer implementation
7.5 Functional verification and simulation results of 2Bit shift register





compared to the ones used during the simulation of 1 bit shift register in order to satisfy
the timing requirements of the added gates in the PIF logic implementation. But the same
input conditions are used for the functional verification of the 3 circuits.
The 'Clock' signal is a square wave with 4.2 ns of period and 2 ns pulse width.
The 'Load_/Shift' signal is made wider so that at least 4 serial shift operations are




only for a duration of






which is at logic '0'.
The total gate delay of the circuit can be calculated as 1.15125 ns (25 * 46.05 ps)
for PIF NAND and 1.4834 ns (20 * 74.17 ps) for PIF multiplexer. For the given input
conditions a throughput of 1 is obtainable. If a throughput of more than 1 is desired, then
the input pulse width can be reduced. Care must be taken with the timing calculation in
the case of the PIF multiplexer, since its gate delay is larger than that of PIF NAND.
The simulated results are given in Table (7.5.1). There is one additional column




signal is low and the output of the last flip flop in the shift register. This time difference
is called serial output delay. Again, 2 bit shift register is abbreviated as SR 2 Bit.
Table (7.5.1): 2 Bit shift register simulation results
SR 2 Bit     Peak Power   Avg. Power   Delay     Layout area    No. of   No. of        Serial out
             (mW)         (mW)         (ns)      (square um)    cells    transistors   delay (ns)
PIF Mux      3.95         2.216        2.6283    14392          187      2992          7.5403
PIF NAND     4.8125       1.1389       1.534     4291.7         241      964           5.7129
Std Cells    3.815        0.25618      0.304     3840           12       134           4.3969
With an increase in functional complexity, the number of cells, and thus of transistors,
required increases rapidly, as is clearly evident from Table (7.5.1). For the PIF
implementations, the number of cells is more than twice that of the 1 bit shift register,
but for the standard cells based circuit it is exactly twice.
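The growth from the 1 bit to the 2 bit shift register can be quantified from Tables (7.3.1) and (7.5.1). The sketch below only divides the tabulated counts; the data layout and the function name growth are illustrative.

```python
# (1 bit count, 2 bit count), copied from Tables (7.3.1) and (7.5.1).
CELLS = {"PIF Mux": (54, 187), "PIF NAND": (71, 241), "Std Cells": (6, 12)}
TRANSISTORS = {"PIF Mux": (864, 2992), "PIF NAND": (284, 964), "Std Cells": (67, 134)}


def growth(counts):
    """Factor by which a count grows from the 1 bit to the 2 bit register."""
    one_bit, two_bit = counts
    return two_bit / one_bit


for style in CELLS:
    print(
        style,
        "cells x", round(growth(CELLS[style]), 2),
        "transistors x", round(growth(TRANSISTORS[style]), 2),
    )
```

The standard cell counts double exactly, while both PIF styles grow by a factor of roughly 3.4 to 3.5 because of the additional routing and delay balancing cells.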
The peak power consumption of Standard cells and PIF Multiplexer is almost the
same, which signifies that the Standard cell 2 Bit shift register uses wider transistors
(NMOS 2.2 µm; PMOS 3 µm in width), which provide high current during transitions and
thus consume more peak power. PIF NAND uses the largest number of cells, which are
pipelined and switch every 2 ns (in this case), and thus consumes the maximum peak
power.














Figure (7.5.1): 2Bit Shift Register Peak power dissipation consumption comparison
Average power consumption, which includes the static and dynamic power
consumption, is commensurate with the number of transistors in the circuit as seen in
Figure (7.5.2). PIF Multiplexer has the maximum average power consumption and the
standard cells based circuit has the minimum. PIF NAND, with the maximum number of cells
but fewer transistors than PIF Multiplexer, consumes less average power than the latter
but more than the standard cells circuit.




[Chart data, SR 2 Bit average power (mW): PIF Mux 2.216; PIF NAND 1.1389; Std Cells 0.25618]
Figure (7.5.2): 2 Bit Shift Register Average Power Consumption comparison
The propagation delay of the 2 Bit shift registers is shown in Figure (7.5.3). The
standard cells circuit, with wider and fewer transistors, is the fastest circuit. PIF
Multiplexer has fewer cells and less logical depth than PIF NAND but uses almost three
times as many transistors, which add to the signal path. This results in more propagation
delay in PIF multiplexer. PIF NAND requires more cells and transistors, which are minimum
size, and hence stands second in the speed comparison.
PIF Multiplexer circuit with maximum number of transistors also requires
maximum layout area as seen in Figure (7.5.4). PIF NAND requires lesser area than PIF
Multiplexer but more than standard cells.








[Chart data, SR 2 Bit delay (ns): PIF Mux 2.6283; PIF NAND 1.534; Std Cells 0.304]
Figure (7.5.3): 2Bit Shift Register Propagation delay comparison





[Chart data, SR 2 Bit layout area (square um): PIF Mux 14392; PIF NAND 4291.7; Std Cells 3840]
Figure (7.5.4): 2Bit Shift Register Layout Area comparison
The number of cells and transistors working for the execution of the 2 Bit shift
registers are shown in Figures (7.5.5) and (7.5.6) respectively.









[Chart data, SR 2 Bit number of cells: PIF Mux 187; PIF NAND 241; Std Cells 12]
Figure (7.5.5): 2 Bit Shift Register Number of cells comparison





[Chart data, SR 2 Bit number of transistors: PIF Mux 2992]








Figure (7.5.6): 2 Bit Shift Register Number of transistors comparison
The serial data output delay of the three circuits is shown in Figure (7.5.7). Again
due to more transistors in the signal path, PIF Multiplexer takes the longest time to
provide the output whereas Standard Cells take the least.






[Chart data, SR 2 Bit serial output delay (ns): PIF Mux 7.5403; PIF NAND 5.7129; Std Cells 4.3969]
Figure (7.5.7): 2 Bit Shift Register; Serial Output Delay comparison
7.6: 4Bit shift register
The 2 Bit shift register designed in Section 7.4 is used to build the 4 Bit shift register
as shown in Figure (7.6.1), displaying the standard cell implementation. The output 'Q' of




























Figure (7.6.1): Shift register 4 bit, Standard cell implementation
PIF NAND and PIF Multiplexer based 4 bit shift registers are shown in Figures
(7.6.2) and (7.6.3) respectively. Again the number of cells used is more than twice that
used in the 2 bit register, for the simple reason that the signals need to be routed from
one end of the schematic to the other, sometimes to diagonally opposite places.




represent 2 bit shift register, 'B2' shows the serial data connection between them and
'A1-A2-A3' gates are the ones used for signal routing.
PIF NAND uses 612 gates and the logic depth is 33 (25+8). In case of PIF
Multiplexer, the total cells are 514 with a depth of 29 (20+9). Note that the dummy cells


















Figure (7.6.2): 4 Bit Shift Register PIF NAND
Figure (7.6.3): 4 Bit Shift Register PIF Multiplexer
7.7 Functional verification and Simulation Results of 4Bit shift registers
The target circuits are again driven and loaded by inverters. The 'Clock' input is a
square wave with period 6.2 ns and 'Load/Shift' has a period of 30.2 ns and a pulse
width of 5.2 ns. This input setting provides one load signal and four shift signals and
takes care of the shift operation through every flip flop in the circuit. The potential of
each logic implementation is measured through the quality metrics shown in Table (7.7.1),
where 4 bit shift register is abbreviated as SR 4 Bit for simplicity. For a circuit delay of
1.51965 ns and 2.15093 ns for PIF NAND and PIF multiplexer, respectively, a throughput
of 1 is practically achievable.
Table (7.7.1): 4 Bit shift register simulation results
SR 4 Bit     Peak Power   Avg. Power   Delay       Layout area    No. of   No. of        Serial out
             (mW)         (mW)         (ns)        (square um)    cells    transistors   delay (ns)
PIF Mux      7.6075       3.4172       3.561       39557          514      8224          20.425
PIF NAND     9.6425       1.7128       1.965       10898          612      2448          20.5565
Std Cells    7.2025       0.38856      0.356665    7680           24       268           18.7871
As in the case of the previous shift registers, PIF NAND, with the largest number of
cells, consumes the maximum peak power as plotted in Figure (7.7.1). Considering the
difference in the number of cells and transistors between the PIF Multiplexer and Standard
cells circuits, the peak power consumption does not vary much. Again the reason is the
presence of wider transistors used by the Standard cells implementation.
Due to the larger number of transistors, the PIF implementations of the circuit
require more average power than the standard cells based circuit. Figure (7.7.2) shows the
comparison, with PIF Multiplexer requiring the maximum average power and standard
cells the minimum.




[Chart data, SR 4 Bit peak power (mW): PIF Mux 7.6075; PIF NAND 9.6425; Std Cells 7.2025]
Figure (7.7.1): 4 Bit shift register Peak power dissipation comparison




[Chart data, SR 4 Bit average power (mW): PIF Mux 3.4172; PIF NAND 1.7128; Std Cells 0.38856]
Figure (7.7.2): 4 Bit shift register Average power dissipation comparison
The propagation delay [Figure (7.7.3)] of PIF Multiplexer is the largest, due to the
transistor delays faced by the signal during transition. The standard cells based circuit,
with the widest and fewest transistors, is the fastest. The propagation delay for the 4 bit
shift register does not increase much compared to the 2 bit shift register for the standard
cells circuit. From Tables (7.5.1) and (7.7.1), the increase for the PIF implementations is
about 430 ps (PIF NAND) and about 930 ps (PIF multiplexer), but for standard cells it is
just about 50 ps.
[Chart data, SR 4 Bit delay (ns): PIF Mux 3.561; PIF NAND 1.965; Std Cells 0.356665]
Figure (7.7.3): 4 Bit shift register Propagation delay comparison
The standard cells circuit has the smallest layout area, owing to the smallest number of
transistors. PIF NAND uses about 9 times as many transistors as standard cells (2448
against 268), but its layout area is only about 1.4 times larger (10898 against 7680
square um). The PIF Multiplexer has more than 3 times as many transistors as PIF NAND,
and its layout area is larger by a similar factor.
The number of cells and transistors are plotted in Figures (7.7.5) and (7.7.6),
which prove helpful in analyzing and reasoning about the comparison.







[Chart data, SR 4 Bit layout area (square um): PIF Mux 39557; PIF NAND 10898; Std Cells 7680]
Figure (7.7.4): 4Bit shift register Layout area comparison












Figure (7.7.5): 4Bit shift register Number of cells comparison
The serial output delay is plotted in Figure (7.7.7). The delay in the case of the two PIF
logic implementations is almost the same, whereas standard cells is faster by 1.78 ns.
Considering this huge circuit and the number of transistors involved, this difference lies
within acceptable limits.
[Chart data, SR 4 Bit number of transistors: PIF Mux 8224; PIF NAND 2448; Std Cells 268]
Figure (7.7.6): 4Bit shift register Number of transistors comparison






[Chart data, SR 4 Bit serial output delay (ns): PIF Mux 20.425; PIF NAND 20.5565; Std Cells 18.7871]
Figure (7.7.7): 4 Bit shift register Serial Output Delay comparison
7.8: Conclusion
The PIF logic and the standard cells implementations of the sequential circuits
show that the peak power consumption of both is almost the same but the average
power varies considerably. The difference in the number of transistors also affects the
circuit performance, as seen in the comparison of the various quality metrics. But PIF
logic provides the great advantage of being scalable and reusable, as is evident from the
implementation of the 2 bit shift register using 1 bit shift registers and then the 4 bit
shift register using the 2 bit shift registers.
Chapter 8: Feasibility of PIF logic circuits in current and future technologies
The individual performance of the various functional blocks has been presented in
the previous chapters. Overall, both power dissipation and propagation delay are larger
for the PIF functional logic blocks compared to the standard cell based implementation.
However, these results have been obtained for the 0.25 um technology node. In this
chapter we will analyze the conditions in which the PIF based blocks can become a
viable alternative to the standard cell based implementation.
8.1: Throughput in PIF logic based circuits
Any PIF block has an intrinsic gate level pipeline characteristic, as shown for
example for the PIF NAND based 1-bit full adder in figure 8.1. Due to this property, it
can process two data sets at the same time. Thus, the power dissipated per data set will be
less than the total power consumed by the entire block, and the rate at which a result is
generated will increase.
The first example is the PIF NAND based 1-bit full adder, Figure (8.1). Two data
sets are shown, each marked with a different gray level. To account for data dependent
gate level propagation delays, the difference between two data sets is 9 gate delays. The
logic depth of the adder is 12 and the block propagation delay in the circuit is 552.6 ps
(12 * 46.05 ps). As verified through simulations, a valid logic level has to be stable at
the inputs for at least 400 ps, i.e. a new result can be generated every 400 ps. Note that
this is true for any n-bit adder based on the same PIF NAND 1-bit full adder.
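The timing figures quoted above follow from simple arithmetic, sketched here in Python.
The constants are the values given in the text; the variable names, and the derived
quantities "results per nanosecond" and "data sets in flight", are illustrative
assumptions introduced for this sketch and do not appear in the thesis.

```python
GATE_DELAY_PS = 46.05          # single PIF NAND gate delay, from the text
LOGIC_DEPTH = 12               # logic depth of the PIF NAND 1-bit full adder
DATA_SET_SPACING_GATES = 9     # spacing between two data sets, in gate delays
MIN_INPUT_STABLE_PS = 400.0    # minimum input hold time found by simulation

# Block propagation delay: 12 gate delays in series.
block_delay_ps = LOGIC_DEPTH * GATE_DELAY_PS

# Spacing between consecutive data sets expressed in time.
data_set_spacing_ps = DATA_SET_SPACING_GATES * GATE_DELAY_PS

# Illustrative derived quantities (assumed names, not from the thesis).
results_per_ns = 1000.0 / MIN_INPUT_STABLE_PS
data_sets_in_flight = block_delay_ps / MIN_INPUT_STABLE_PS

print(f"block delay        : {block_delay_ps:.1f} ps")
print(f"data set spacing   : {data_set_spacing_ps:.2f} ps")
print(f"results per ns     : {results_per_ns:.2f}")
print(f"data sets in flight: {data_sets_in_flight:.2f}")
```

The 9 gate delay spacing corresponds to about 414 ps, which is consistent with the
simulated 400 ps minimum input hold time, and a block delay of 552.6 ps means that more
than one data set is inside the block at any time.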
In contrast, the standard cell based 1-bit full adder has a block delay of 368.25 ps
without pipelining capability. This means that the delay after which a result is
generated increases linearly with n for an n-bit ripple adder. One can argue that other
adder architectures, like the carry-look-ahead, have significantly smaller propagation
delays when implemented with standard cells. However, for n > 4, the logic depth
becomes comparable to the distance between two data sets in the PIF NAND based
block. Therefore, from a throughput point of view, the two implementations are
comparable.
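The linear growth of the ripple adder delay can be illustrated with a small Python
sketch. It assumes, purely for illustration, that each additional bit adds one full
block delay of 368.25 ps; the real carry path of a ripple adder is normally shorter than
the full block delay, so this is a pessimistic simplification and not a result from the
thesis. The function names are hypothetical.

```python
STD_FA_DELAY_PS = 368.25         # standard cell 1-bit full adder block delay (from the text)
PIF_RESULT_INTERVAL_PS = 400.0   # PIF NAND adder accepts a new data set every 400 ps


def std_ripple_latency_ps(n_bits: int) -> float:
    """ASSUMPTION: every bit position contributes one full block delay."""
    return n_bits * STD_FA_DELAY_PS


def results_per_ns(interval_ps: float) -> float:
    """Number of results produced per nanosecond for a given result interval."""
    return 1000.0 / interval_ps


for n in (1, 4, 8, 16):
    latency = std_ripple_latency_ps(n)
    print(f"n={n:2d}: std ripple latency {latency:7.2f} ps, "
          f"{results_per_ns(latency):.3f} results/ns (non-pipelined) vs "
          f"{results_per_ns(PIF_RESULT_INTERVAL_PS):.3f} results/ns (PIF NAND)")
```

Under this assumption the non-pipelined result rate falls as n grows, while the PIF
NAND result interval stays at 400 ps regardless of n.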
Figure (8.1): PIF NAND based 1 bit full adder
The peak power consumed by the NAND based PIF adder is less than that of the standard
cell adder, but the average power of the PIF NAND adder is higher. The average power
consumption per data set is 0.47178 mW (0.72 * 0.65525 mW). The factor 0.72 is the ratio
of the input pulse width to the PIF NAND circuit delay (400 ps / 552.6 ps): the input
signal has a pulse width of 400 ps and the circuit delay is 552.6 ps. The average power
consumption per data set of the standard cell adder is 0.3175 mW, which is of the same
order as the PIF NAND value. So, for a comparable average power dissipation, PIF NAND
has a higher throughput and consumes less power per data set than the block as a whole.
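The per data set power figure can be reproduced with the following Python sketch. All
numeric constants are taken from the text; the variable names are illustrative, and the
rounding of the overlap factor to two decimals is an assumption made here to match the
quoted value of 0.72.

```python
PIF_AVG_POWER_MW = 0.65525     # average power of the PIF NAND 1-bit full adder
STD_AVG_POWER_MW = 0.3175      # average power per data set of the standard cell adder
INPUT_PULSE_WIDTH_PS = 400.0   # input pulse width
PIF_BLOCK_DELAY_PS = 552.6     # PIF NAND block propagation delay

# Fraction of the block delay attributed to a single data set.
overlap_exact = INPUT_PULSE_WIDTH_PS / PIF_BLOCK_DELAY_PS
overlap_rounded = round(overlap_exact, 2)   # ASSUMPTION: rounded to match the quoted 0.72

pif_power_per_data_set_mw = overlap_rounded * PIF_AVG_POWER_MW
ratio_pif_to_std = pif_power_per_data_set_mw / STD_AVG_POWER_MW

print(f"overlap factor (exact)  : {overlap_exact:.4f}")
print(f"overlap factor (rounded): {overlap_rounded:.2f}")
print(f"PIF power per data set  : {pif_power_per_data_set_mw:.5f} mW")
print(f"PIF / standard cell     : {ratio_pif_to_std:.2f}")
```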
8.2: Power dissipation in current and future technologies
The power dissipated in future technologies will be due either to interconnects
or to leakage currents. If the leakage current dominates the circuit performance, then
the PIF logic implementation will suffer to the same extent as the standard cell
implementation, or possibly even more, due to the increased number of transistors. But
given the fact that the PIF logic implementation has intrinsic pipelining, the power
consumed per data set could be comparable.
If interconnects are the main cause of power dissipation, then the PIF logic
implementation will be a viable alternative. The power dissipation and propagation delay
predictions for future technologies indicate that the global interconnects will severely
degrade performance (90% power consumption in global interconnects and 6 to 7
times more propagation delay than local interconnects).
Power dissipation in current technologies due to interconnects:
The discussion in this section focuses on the dynamic and static power dissipation
in a system. The SLIP (System Level Interconnect Prediction) conference focuses on
interconnect performance and its prediction in future technologies. A presentation at
SLIP [22] provides a graph, Figure (8.2), giving the current power dissipation in a
system. Only 34% of the total dynamic power is used to drive the transistor
gate input capacitance, whereas 15% is spent charging and discharging the
parasitic (diffusion) capacitance of the transistor drains and 51% is consumed by
interconnects to transfer the signal from one logic gate to another. The interconnect
power is further classified as global (clock and signals) and local (clock and signals)
interconnect power consumption. The global interconnects consume 53% of the
interconnect power, whereas local interconnects need 47%.
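The percentages quoted from [22] are given at two levels: shares of the total dynamic
power, and shares of the interconnect power. The Python sketch below combines them to
express the global and local interconnect contributions as fractions of the total
dynamic power. The input percentages are those in the text; the variable names and the
combined figures are illustrative derivations, not values reported in the thesis.

```python
# Shares of total dynamic power, as quoted from [22].
GATE_SHARE = 0.34          # driving transistor gate input capacitance
DIFFUSION_SHARE = 0.15     # charging/discharging drain diffusion capacitance
INTERCONNECT_SHARE = 0.51  # transferring signals between logic gates

# Shares of the interconnect power, as quoted from [22].
GLOBAL_OF_INTERCONNECT = 0.53
LOCAL_OF_INTERCONNECT = 0.47

# Derived: global and local interconnect power as fractions of total dynamic power.
global_of_total = INTERCONNECT_SHARE * GLOBAL_OF_INTERCONNECT
local_of_total = INTERCONNECT_SHARE * LOCAL_OF_INTERCONNECT

total_share = GATE_SHARE + DIFFUSION_SHARE + INTERCONNECT_SHARE

print(f"sum of top-level shares : {total_share:.2f}")
print(f"global interconnect     : {global_of_total:.4f} of total dynamic power")
print(f"local interconnect      : {local_of_total:.4f} of total dynamic power")
```

On these numbers, global interconnects alone account for roughly 27% of the total
dynamic power, which is the portion that a design without global interconnects would
aim to avoid.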
The graph in Figure (8.3) presents the dependency of the dynamic power on the fanout.
The power consumed by interconnects for a fanout of 20 is almost as much as the total
power in a system. For a fanout of 2, the power consumed by interconnects is less than
half of the total power, clearly saving about 50% of the power consumed. PIF logic
implementation also provides a fanout of 2, thus the power consumption advantage seen
































Figure (8.3): Total dynamic power (total power vs. net length) for a fanout of 2 and 20 - after [22]
Power dissipation in future technologies due to interconnects:
A graph predicting the power dissipation in the future technologies (0.15 um to
0.022 um) is shown in Figure (8.4). The power consumed by interconnects will grow to
65%-80% within 5 years; reducing this power dissipation is thus one of the major
challenges faced by the integrated circuit design industry.
The predicted dynamic power dissipated in interconnects is further categorized as
the power dissipated by the local, intermediate and global interconnects, as seen in
Figure (8.5). The global interconnects will consume 90% of the total interconnect power
as opposed to 25% in local interconnects. In PIF logic circuits only local interconnects
are present and global interconnects are replaced by chains of gates, so PIF logic will
provide power efficient designs in the future technologies.















Power dissipation in future technologies due to leakage currents:
Static power dissipation is caused by the transistor leakage currents and also
by the short circuit current during transitions. The current technology trends indicate
that the static power consumption will increase rapidly, reaching one half of the total
power consumption within 3 process generations. Figure (8.6) shows the increase in
static power consumption for Intel's past few technologies [23]. Projections of static
power dissipation in the future technologies (beyond 180 nm) indicate that the static
and dynamic power dissipated will be close.
The ITRS (International Technology Roadmap for Semiconductors) provides a
static power dissipation prediction for an SRAM cell as a reference, given in Figure (8.7).
With shrinking device sizes, the number of transistors on a single die will increase and
thus static power dissipation will also increase, possibly to a point where it is
equivalent to the dynamic power dissipation in a system. Since PIF uses a larger number
of transistors to implement a functional block (as seen in the previous chapters), the
power dissipation due to leakage will affect its performance adversely. It can be argued
that for the same number of transistors, a standard cell implementation would also
dissipate the same or a comparable amount of power as the PIF circuits. Given the
prediction that more transistors will consume more static power, in PIF circuits the
power saved due to







Figure (8.6): Trends in dynamic and static power dissipation























8.3: Propagation delay prediction in future technologies
Figure (8.8) gives the propagation delay prediction from the ITRS 2003
edition. The delay per 1 mm of wire will increase exponentially for global interconnects
after the 45 nm technology node, whereas for local interconnects the delay will fall
below its current value. Even though PIF logic circuits are slow in 0.25 um technology,
in the future





















Horizontal axis: Tech. Node in nm (10 to 100)
Figure (8.8): Future technology propagation delay prediction
8.4: Conclusion
The PIF logic based designs in the current technology node consume more power than
standard cells and have a larger functional block propagation delay. However, the
intrinsic pipelining of PIF logic circuits will provide a higher throughput.
If leakage currents in a device dominate the power consumption in a circuit or a
system, then the PIF based circuits will be more affected than their equivalent standard
cell implementations. However, the power loss due to leakage currents will be compensated
by the power savings due to the absence of global interconnects in PIF logic circuits.
Finally, if the power dissipation per processed data set is equivalent, then with a
higher throughput and 100% scalability, PIF logic circuits will have the potential to
become an alternative solution to the classic standard cell implementation.
Chapter 9: Conclusion and future work
9.1 Summary
Today's design industry faces many challenges, and their number keeps increasing
with technology improvement. Issues like global interconnects, power management,
scalability and reusability of design need to be solved for continued progress of circuit
design. CAD tools, which are based upon device models and are used by engineers to design
a circuit, also need to be upgraded. With the decrease in device size and the introduction
of new materials used in fabrication, correct device and interconnect model derivation
becomes very complex and difficult.
PIF logic methodology provides a simple and elegant solution to these current
problems. By replacing the global interconnects with chains of gates, problems caused by
large wire lengths are eliminated, which makes interconnect modeling a little easier. The
structure of PIF logic methodology is such that the designs are reusable and scalable.
Only one cell is used to realize any circuit needed, which distributes the power uniformly
over the circuit. For a new technology, only one cell needs to be designed and modeled,
thus simplifying scalability and modeling. PIF logic also provides a blending of
schematic and layout; the cells placed in the schematic provide the floor plan for physical
design, which reduces the design time required. In essence, PIF logic design methodology
provides a simple solution to today's complex problems.
9.2 Future work
9.2.1 PIF based system level design, such as a microprocessor, using the library
components designed in this thesis work
The library components from this thesis can be used to develop various other
functional blocks and systems in the same way as the standard cell libraries are used. The
library presented in this thesis also includes large functional blocks, like the 8-to-1
multiplexer and shift registers, which can easily be used to make a complex block like a
microprocessor or a memory controller. Computational functional blocks can be designed
using the adder provided in the library, and larger adders or multipliers can also be
constructed.
For a different technology, the same library components can be used with a
modified PIF cell developed for that technology. The layout library can also be used by
just replacing the present PIF cells with the new PIF cell.
9.2.2 Computer aided design tool which makes use of the library designed to provide
PIF schematic and layout editors
PIF logic design can be automated using the library components presented in this
research work. The PIF CAD tool will allow users to use the various library components
to build, simulate and lay out complex functional blocks.
The CAD tool developed in [20] can be modified to obtain a schematic entry which
makes use of the library components along with the PIF cells for which it is presently
built. This combination would provide a very strong tool in which the user just has to
specify which PIF cell to use (the 2 cells presented here, NAND and Multiplexer, plus
any new cells) and how the cells should be connected; the user then obtains the PIF
logic based schematic and layout of the design.
9.2.3 Develop PIF cell which can implement a 4 or 5 input gate
The two PIF cells developed so far are meant to be used for a 2 input gate. To
implement a 4 or 5 input gate with the current PIF cells, more cells and a logic depth
greater than 1 or 2 are needed. A PIF cell which can implement a 3 or 4 input gate can be
developed for use in more complex functions. It can also be used to build a library
which can be further used in the PIF CAD tool library database.
References
(1) Richard Retanubun, Dorin Patru, P. R. Mukund, "Pipelined Interconnect Free
Logic", Proc. IEEE International Systems-On-Chip (SoC) Conf. 2003, pp. 17-20.
(2) "Pipelined interconnect free methodology", a graduate thesis by Richard
Retanubun at Rochester Institute of Technology, July 2003.
(3) International Technology Roadmap for Semiconductors 2004 edition, available at:
http://public.itrs.net/Files/2004ITRS/Hqme2_QMhM
(4) Jan M. Rabaey, Anantha Chandrakasan and Borivoje Nikolic, "Digital Integrated
Circuits: A Design Perspective," Prentice-Hall Electronics and VLSI Series,
2nd edition, 2003.
(5) Ivan Sutherland, Bob Sproull and David Harris, "Logical Effort: Designing Fast
CMOS Circuits," Morgan Kaufmann Publishers, Inc, 1999.
(6) Jean-Pierre Schoellkopf, "Impact of Interconnect Performances on Circuit Design",
Proc. IEEE International Interconnect Technology Conf. 1998, pp. 53-55.
(7) J. Baliga, "Chips go vertical [3D IC interconnection]", IEEE Spectrum, March
2004, Vol. 41, Issue 3, pp. 43-47.
(8) Zarkesh-Ha P, Meindl J D, "An integrated architecture for global interconnects in a
gigascale system-on-a-chip (GSoC)", Microelectronics 2000. ICM 2000
Proceedings of the 12th International Conference, pp. 149-152.
(9) Naeemi A, Venkatesan R, Meindl J D, "System-on-a-chip Global Interconnect
optimization", in ASIC / SoC Conference 2002, pp. 399-403.
(10) Zhiyuan Ren, Bruce Krogh, Radu Marculescu, "Hierarchical Adaptive Dynamic
Power Management", IEEE Transactions on Computers 2005, Vol. 54, Issue 4, pp.
409-420.
(11) Yung-Hsiang Lu, Giovanni De Micheli, "Comparing System-Level Power
Management Policies", IEEE Transactions on Design and Test of Computers 2001,
Vol. 18, Issue 2, pp. 10-19.
(12) Tajana Simunic, Stephen Boyd, "Managing Power consumption in Networks on
Chips", Proc. Design, Automation and Test in Europe Conf. 2002, pp. 110-116.
(13) Kresimir Mihic, Tajana Simunic, Giovanni De Micheli, "Reliability and Power
Management of Integrated Systems", Euromicro Symposium on Digital System
Design 2004, pp. 5-11.
(14) Junmou Zhang, Eby Friedman, "Crosstalk Noise Model for Shielded
Interconnects in VLSI-based Circuits", Proc. IEEE International Systems-On-Chip
(SoC) Conf. 2003, pp. 243-244.
(15) http://www.hpl.hp.com/news/2005/jan-mar/crossbar.html
(16) Ananth Dodabalapur, "The Future of Organic Semiconductor Devices", Proc.
Device Research Conf. 2000, pp. 11-14.
(17) J. Zaumseil, T. W. Lee, J. W. P. Hsu, Y. L. Loo, R. Cirelli, J. A. Rogers,
"Nanoscale Transportation in Organic Transistors and LEDs", Proc. Device
Research Conf. 2003, pp. 199-200.
(18) M. Tanaka, F. Matsuzaki, T. Kondo, N. Nakajima, Y. Yamanashi, A. Fujimaki,
H. Hayakawa, N. Yoshikawa, H. Terai, S. Yorozu, "A Single-Flux-Quantum Logic
Prototype Microprocessor", Proc. IEEE International Solid-State Circuits Conf.
2004, pp. 298-307 and 529.




(20) "Computer aided design of the pipelined interconnect free logic", a graduate
thesis by Vaibhav Avachat at Rochester Institute of Technology, July 2002.






(22) N. Magen, A. Kolodny, U. Weiser, N. Shamir, "Interconnect-Power Dissipation
in a Microprocessor", http://www.sliponline.org/SLIP04/Presentations/Session1/1-2_Magin.pdf
(23) J. Adam Butts, Gurindar Sohi, "A Static Power Model for Architects", Proc. Annual
IEEE / ACM International Symposium on Microarchitecture 2000, pp. 191-201.
Appendix A: Short forms used in the write up













4-to-1 multiplexer based PIF logic circuit
2 input NAND gate based PIF logic circuit

















[Waveform plots; legible panel titles: Total Current; Input D1, D3; Input D0, D2; horizontal axis: time (s)]
Multiplexer 4-to-1; PIF NAND based circuit simulation results




[Waveform plots; legible panel titles: Total Current; Input Select; Input D1, D3, D5, D7; Input D0, D2, D4, D6; horizontal axis: time (s)]
1 Bit Full Adder; Standard Cells based circuit simulation results





[Waveform plots; legible panel titles: Output Cout; Input B; Input D1, D2; Input D1, D2, D3, D4, D5, D6; horizontal axis: time (s)]
Encoder 8-to-3; PIF NAND based circuit simulation results
[Waveform plots; legible panel titles: Output Y1; Total Current; horizontal axis: time (s)]
Decoder 2-to-4; PIF Multiplexer based circuit simulation results
[Waveform plots; legible panel titles: Output Q; Output Q1; Total Current; Input Load_/Shift; Input Clock; Input SR_In; Input D1, D2; Input D0, D1; Input D0, D1, D2, D3; horizontal axis: time (s)]
4 Bit Shift Register Output Waveforms; Standard Cells based circuit simulation results
[Waveform plots; legible panel titles: Output Q0, /Q0, Q1, /Q1, Q2, /Q2, Q3, /Q3; Total Current; Input Load_/Shift; Input Clock; Input D0, D1, D2, D3; Input SR_In; horizontal axis: time (s)]
4 Bit Shift Register Output Waveforms; PIF Multiplexer based circuit simulation results
Appendix B: PIF design layouts




PIF Multiplexer based Multiplexer 2-to-1
PIF Multiplexer based Encoder 4-to-2
PIF NAND based Encoder 4-to-2
PIF Multiplexer based Encoder 8-to-3
