Analysis of logic block architectures and functional improvement of fine grained cells by Ramnath, Rohith
UNLV Retrospective Theses & Dissertations 
1-1-2005 
Analysis of logic block architectures and functional improvement 
of fine grained cells 
Rohith Ramnath 
University of Nevada, Las Vegas 
Follow this and additional works at: https://digitalscholarship.unlv.edu/rtds 
Repository Citation 
Ramnath, Rohith, "Analysis of logic block architectures and functional improvement of fine grained cells" 
(2005). UNLV Retrospective Theses & Dissertations. 1849. 
https://digitalscholarship.unlv.edu/rtds/1849 
This Thesis is protected by copyright and/or related rights. It has been brought to you by Digital Scholarship@UNLV 
with permission from the rights-holder(s). You are free to use this Thesis in any way that is permitted by the 
copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from 
the rights-holder(s) directly, unless additional rights are indicated by a Creative Commons license in the record and/
or on the work itself. 
 
This Thesis has been accepted for inclusion in UNLV Retrospective Theses & Dissertations by an authorized 
administrator of Digital Scholarship@UNLV. For more information, please contact digitalscholarship@unlv.edu. 
ANALYSIS OF LOGIC BLOCK ARCHITECTURES AND FUNCTIONAL 
IMPROVEMENT OF FINE GRAINED CELLS
by
Rohith Ramnath
Bachelor of Engineering 
University of Mysore, India 
2001
A thesis submitted in partial fulfillment
of the requirement for the
Master of Science Degree in Electrical Engineering 
Department of Electrical and Computer Engineering 
Howard R. Hughes College of Engineering
Graduate College
University of Nevada, Las Vegas 
August 2005
Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.
UMI Number: 1429723
UMI Microform 1429723 
Copyright 2006 by ProQuest Information and Learning Company. 
All rights reserved. This microform edition is protected against 
unauthorized copying under Title 17, United States Code.
ProQuest Information and Learning Company 
300 North Zeeb Road 
P.O. Box 1346 
Ann Arbor, MI 48106-1346
Thesis Approval
The Graduate College
University of Nevada, Las Vegas
July 13, 2005
The Thesis prepared by
Rohith Ramnath
Entitled
"Analysis of Logic Block Architectures and Functional
Improvement of Fine Grained Cells"
is approved in partial fulfillment of the requirements for the degree of
Master of Science in Electrical Engineering
Examination Committee Member
Committee Member
Graduate College Faculty Representative
Examination Committee Chair
Dean of the Graduate College
ABSTRACT
Analysis of Logic Block Architectures and Functional 
Improvement of Fine Grained Cells
by
Rohith Ramnath
Dr. Yingtao Jiang, Examination Committee Chair
Assistant Professor
Department of Electrical & Computer Engineering
University of Nevada, Las Vegas
The first objective of this research project was to evaluate the performance of various logic block architectures in FPGAs. Since logic blocks vary widely in size, functionality and complexity, we were motivated to explore them in detail. For our study, logic blocks from Actel, Altera, Quicklogic and Xilinx were chosen, along with some designs discussed in academia. These cells were either multiplexer based or look-up-table (LUT) based. Structural VHDL models of all these blocks were constructed and benchmark circuits were mapped onto them. Results at this stage suggested that, although the coarse grained cells occupied more area and showed poor utilization, they were considerably faster than the fine grained cells.
The second objective was to improve the performance of the Actel Proasicplus (fine grained) logic block by enhancing its functional capabilities. During this process we came up with three modified architectures. These new cells were laid out in MAGIC using a TSMC 0.18 µm technology file with λ = 0.09 µm, and the extracted files were simulated in PSpice. This transistor level data helped us to estimate the area and propagation delay of the new architectures. The modified architectures were also tested for performance by implementing the previous benchmarks, and a significant improvement in speed, occupied area and utilization was observed.
LIST OF ACRONYMS
AL: Area of a Logic Block
ASIC: Application Specific Integrated Circuit
B1: Benchmark-1
B2: Benchmark-2
B3: Benchmark-3
B4: Benchmark-4
CAD: Computer Aided Design
CLB: Configurable Logic Block
CLK: Clock
CLR: Clear
CMOS: Complementary Metal Oxide Semiconductor
DL: Delay of a Logic Block
DSP: Digital Signal Processing
DSPLM: Digital Signal Processing Logic Module
DSPFPGA: Digital Signal Processing Field Programmable Gate Array
DSRFPGA: Digit Serial Reconfigurable Field Programmable Gate Array
FPGA: Field Programmable Gate Array
HDL: Hardware Description Language
I/O: Input/Output
IOB: Input Output Block
LAB: Logic Array Block
LB: Logic Block
LC: Logic Cell
LE: Logic Element
LM: Logic Module
LPPGA: Low Power Programmable Gate Array
LUT: Look-Up Table
MAC: Multiply Accumulate
MPGA: Mask Programmable Gate Array
MUX: Multiplexer
NL: Number of Logic Blocks
NLC: Number of Logic Blocks in the Critical Path
NMOS: N-channel Metal Oxide Semiconductor
PLD: Programmable Logic Device
PMOS: P-channel Metal Oxide Semiconductor
RA: Routing Area
RAM: Random Access Memory
RD: Routing Delay
RTL: Register Transfer Level
SRAM: Static Random Access Memory
VHDL: VHSIC Hardware Description Language
VHSIC: Very High Speed Integrated Circuit
TSMC: Taiwan Semiconductor Manufacturing Company
LIST OF FIGURES
Figure 2.1 A Conceptual FPGA
Figure 2.2 Island Style Architecture
Figure 2.3 Row based Architecture
Figure 2.4 Basic Logic Element
Figure 2.5 Cluster based logic blocks
Figure 3.1 CMOS Inverter
Figure 3.2 Actel's Proasicplus Logic Block
Figure 3.3 Flash Switch versus Multiplexer
Figure 3.4 Altera's Stratix Logic Element
Figure 3.5 Altera's Stratix in Normal Mode
Figure 3.6 Altera's Stratix in Dynamic Arithmetic Mode
Figure 3.7 Quicklogic's Eclipse II Logic Cell
Figure 3.8 Slice of Spartan II CLB
Figure 3.9 The DSP Logic Module
Figure 3.10 Architecture of the DSR FPGA Logic Block
Figure 3.11 Digit Serial Logic Block
Figure 3.12 The Low Power Programmable Gate Array
Figure 3.13 Boolean Function Implementation in Actel's Proasicplus
Figure 4.1 Comparison of Number of Logic Blocks for benchmark-1
Figure 4.2 Comparison of Occupied Area for benchmark-1
Figure 4.3 Comparison of Delay when RD = 0 for benchmark-1
Figure 4.4 Comparison of Delay when RD = 10DL for benchmark-1
Figure 4.5 Comparison of Utilization for benchmark-1
Figure 4.6 Comparison of Number of Logic Blocks for benchmark-2
Figure 4.7 Comparison of Occupied Area for benchmark-2
Figure 4.8 Comparison of Delay when RD = 0 for benchmark-2
Figure 4.9 Comparison of Delay when RD = 10DL for benchmark-2
Figure 4.10 Comparison of Utilization for benchmark-2
Figure 4.11 Comparison of Number of Logic Blocks for benchmark-3
Figure 4.12 Comparison of Occupied Area for benchmark-3
Figure 4.13 Comparison of Delay when RD = 0 for benchmark-3
Figure 4.14 Comparison of Delay when RD = 10DL for benchmark-3
Figure 4.15 Comparison of Utilization for benchmark-3
Figure 4.16 Comparison of Number of Logic Blocks for benchmark-4
Figure 4.17 Comparison of Occupied Area for benchmark-4
Figure 4.18 Comparison of Delay when RD = 0 for benchmark-4
Figure 4.19 Comparison of Delay when RD = 10DL for benchmark-4
Figure 4.20 Comparison of Utilization for benchmark-4
Figure 4.21 The Proasicplus logic tile
Figure 4.22 Architecture of Mod1
Figure 4.23 Architecture of Mod2
Figure 4.24 Architecture of Mod3
Figure 4.25 Layout of Actel Proasicplus Logic Block
Figure 4.26 Layout of Mod1
Figure 4.27 Layout of Mod2
Figure 4.28 Layout of Mod3
Figure A1 2-AND Gate using Actel's Proasicplus Logic Block
Figure A2 2-OR Gate using Actel's Proasicplus Logic Block
Figure A3 2-XOR Gate using Actel's Proasicplus Logic Block
Figure A4 3-AND Gate using Actel's Proasicplus Logic Block
Figure A5 3-NOR Gate using Actel's Proasicplus Logic Block
Figure A6 DFF using Actel's Proasicplus Logic Block
Figure B1 2-AND Gate using Mod1 architecture
Figure B2 2-OR Gate using Mod1 architecture
Figure B3 2-XOR Gate using Mod1 architecture
Figure B4 3-AND Gate using Mod1 architecture
Figure B5 3-NOR Gate using Mod1 architecture
Figure B6 3-NAND Gate using Mod1 architecture
Figure B7 3-OR Gate using Mod1 architecture
Figure C1 2-AND Gate using Mod2 architecture
Figure C2 2-OR Gate using Mod2 architecture
Figure C3 2-XOR Gate using Mod2 architecture
Figure C4 3-AND Gate using Mod2 architecture
Figure C5 3-NOR Gate using Mod2 architecture
Figure C6 3-NAND Gate using Mod2 architecture
Figure C7 3-OR Gate using Mod2 architecture
Figure C8 3-XOR Gate using Mod2 architecture
Figure C9 4-AND Gate using Mod2 architecture
Figure C10 4-NOR Gate using Mod2 architecture
Figure C11 4-NAND Gate using Mod2 architecture
Figure C12 4-OR Gate using Mod2 architecture
Figure C13 5-AND Gate using Mod2 architecture
Figure C14 5-NOR Gate using Mod2 architecture
Figure C15 5-NAND Gate using Mod2 architecture
Figure C16 5-OR Gate using Mod2 architecture
Figure C17 DFF using Mod2 architecture
Figure D1 2-AND Gate using Mod3 architecture
Figure D2 2-OR Gate using Mod3 architecture
Figure D3 2-XOR Gate using Mod3 architecture
Figure D4 3-AND Gate using Mod3 architecture
Figure D5 3-NOR Gate using Mod3 architecture
Figure D6 3-NAND Gate using Mod3 architecture
Figure D7 3-OR Gate using Mod3 architecture
Figure D8 3-XOR Gate using Mod3 architecture
Figure D9 4-AND Gate using Mod3 architecture
Figure D10 4-NOR Gate using Mod3 architecture
Figure D11 4-NAND Gate using Mod3 architecture
Figure D12 4-OR Gate using Mod3 architecture
Figure D13 5-AND Gate using Mod3 architecture
Figure D14 5-NOR Gate using Mod3 architecture
Figure D15 5-NAND Gate using Mod3 architecture
Figure D16 5-OR Gate using Mod3 architecture
Figure D17 DFF using Mod3 architecture
LIST OF TABLES
Table 3.1 Area and Delay estimates of logic block components
Table 3.2 Area and Delay of different Altera families
Table 3.3 Area and Delay of different Quicklogic families
Table 3.4 Total Delay including routing and combinational delay
Table 3.5 Results for benchmark-1
Table 3.6 Results for benchmark-2
Table 3.7 Results for benchmark-3
Table 3.8 Results for benchmark-4
Table 4.1 Logical capabilities of Actel Proasicplus
Table 4.2 Logical abilities of Mod1
Table 4.3 Logical abilities of Mod2 and Mod3
Table 4.4 Performance comparison in terms of area
Table 4.5 Propagation delay of the four architectures
Table 4.6 Comparative results for benchmark-1
Table 4.7 Comparative results for benchmark-2
Table 4.8 Comparative results for benchmark-3
Table 4.9 Comparative results for benchmark-4
ACKNOWLEDGEMENTS 
I take this opportunity to thank my advisor, Dr. Yingtao Jiang, for his guidance and support. I would also like to thank Dr. Rama Venkat for helping me throughout my Master's program.
I would like to thank Dr. Venkatesan Muthukumar for his valuable suggestions and comments.
I would like to thank Mr. Stan Hanel for providing access to Active HDL whenever I needed it; without it, I would have struggled a lot.
I would also like to thank all my friends, Anoop, Bhaarath, Girish, Kapil and Ram, who have been with me all through.
Last but not least, I am thankful to my family and my fiancée, Brinda, for their continuous support and encouragement.
TABLE OF CONTENTS
ABSTRACT
LIST OF ACRONYMS
LIST OF FIGURES
LIST OF TABLES
ACKNOWLEDGEMENTS
CHAPTER 1 INTRODUCTION
1.1 Motivation
1.2 Problem Definition
1.3 Thesis Outline
CHAPTER 2 BACKGROUND AND PREVIOUS WORK
2.1 Field Programmable Gate Arrays
2.1.1 Island style architecture
2.1.2 Row based architecture
2.2 Logic block architecture
2.2.1 Look-up table based logic blocks
2.2.2 Multiplexer based logic blocks
CHAPTER 3 EXPERIMENTAL PROCEDURE
3.1 Structural models of logic blocks using VHDL
3.2 Area and delay models for FPGA logic blocks
3.2.1 Background
3.2.2 Area and delay modeling of the elements of a logic block
3.3 Actel
3.3.1 Logic cell architecture
3.3.2 Routing architecture
3.3.3 VHDL modeling
3.3.4 Area and delay of Actel's Proasicplus logic cell
3.4 Altera's Stratix
3.4.1 Logic cell architecture
3.4.2 Modes of operation
3.4.3 Routing structure
3.4.4 VHDL modeling
3.4.5 Area and delay model of Stratix LE
3.5 Quicklogic's Eclipse
3.5.1 Logic tile architecture
3.5.2 Routing structure
3.5.3 VHDL modeling
3.5.4 Area and delay of the Eclipse logic cell
3.6 Xilinx's Spartan II
3.6.1 Logic cell architecture
3.6.2 Routing structure
3.6.3 VHDL modeling
3.6.4 Area and delay models of the Spartan II
3.7 DSP FPGA
3.7.1 Area and delay of the DSP FPGA
3.8 Digit Serial Reconfigurable FPGA
3.8.1 Area and delay of the DSR FPGA Logic Module
3.9 Low Power Programmable Gate Array
3.9.1 Area and delay of the LPPGA
3.10 Performance Evaluation of Logic Blocks
3.11 Benchmark Implementations
3.11.1 Boolean function
3.11.2 32-bit adder
3.11.3 16-bit multiplier
3.11.4 16-bit MAC unit
CHAPTER 4 RESULTS AND DISCUSSIONS
4.1 Boolean function
4.1.1 Number of logic blocks
4.1.2 Occupied Area
4.1.3 Delay
4.1.4 Utilization
4.2 32-bit Adder
4.2.1 Number of logic blocks
4.2.2 Occupied Area
4.2.3 Delay
4.2.4 Utilization
4.3 16-bit Multiplier
4.3.1 Number of logic blocks
4.3.2 Occupied Area
4.3.3 Delay
4.3.4 Utilization
4.4 16-bit MAC unit
4.4.1 Number of logic blocks
4.4.2 Occupied Area
4.4.3 Delay
4.4.4 Utilization
4.5 Observations
4.5.1 Granularity
4.5.2 Utilization factor
4.5.3 Speed of the design
4.5.4 Utility of a D flip-flop
4.6 Functional Improvement of Actel's Proasicplus
4.6.1 Actel's Proasicplus
4.6.1.1 Logical capabilities
4.6.2 Motivation for improvement
4.7 Modified Architectures
4.7.1 Mod1
4.7.2 Mod2
4.7.3 Mod3
4.8 Transistor level modeling
4.9 Performance Comparison
CHAPTER 5 CONCLUSION AND FUTURE WORK
5.1 Summary and Contributions
5.2 Future Work
APPENDICES
A Configuration schematics of Actel's Proasicplus Logic Block
B Configuration schematics of Mod1 architecture
C Configuration schematics of Mod2 architecture
D Configuration schematics of Mod3 architecture
BIBLIOGRAPHY
VITA
CHAPTER 1
INTRODUCTION
Field Programmable Gate Arrays (FPGAs) are now firmly established as design and prototyping tools and are showing up in a variety of applications [1] [2] [3] [4] [5]. The design of an FPGA is very similar to that of an MPGA (Mask Programmable Gate Array), with the exception that all its connections are user programmable. Many different architectures have emerged since FPGAs were introduced in the late 1980s, but a typical FPGA consists of an array of logic blocks surrounded by a programmable interconnection network, with Input/Output blocks on the periphery. As the name suggests, logic blocks are used to implement the logic, while the programmable interconnections provide the resources to connect the logic blocks to form the required circuit [6] [7] [8] [9] [10].
The main advantages of an FPGA are low non-recurring manufacturing cost and a fast turnaround time [5] [10] [11] [12]. This instant programmability gives systems built with these devices a significant time-to-market advantage [6] [7] [8] [9]. However, this programmability comes at a price: FPGAs are slower and demand more silicon area than MPGAs or ASICs [3] [6] [9] [13] [14]. This is because ASICs use simple wires for the interconnections between logic gates, whereas in FPGAs the gates are connected through programmable switches. These switches have much larger resistance and capacitance and are therefore slower than the simple wires used in MPGAs [6] [13] [15].
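To make the resistance and capacitance argument concrete, the following is a minimal Elmore-delay sketch in Python. It assumes, purely for illustration, that a routed net is a chain of identical segments, each modeled as one series resistance with a lumped capacitance at its output node; the function names and the normalized values are hypothetical and are not taken from any FPGA data sheet.

```python
def elmore_delay(resistances, capacitances):
    """Elmore delay of an RC ladder driven from its first node.

    resistances[i] is the series resistance into node i and capacitances[i]
    is the lumped capacitance at node i. Each resistor must charge every
    capacitance downstream of it: delay = sum_i R_i * sum_{j >= i} C_j.
    """
    if len(resistances) != len(capacitances):
        raise ValueError("need one capacitance per resistance")
    delay = 0.0
    downstream = sum(capacitances)  # capacitance seen by the first resistor
    for r, c in zip(resistances, capacitances):
        delay += r * downstream
        downstream -= c  # node i's capacitance is behind the next resistor
    return delay


def chain_delay(n_segments, r_seg, c_seg):
    """Delay of n identical segments (wire pieces or switches) in series."""
    return elmore_delay([r_seg] * n_segments, [c_seg] * n_segments)


# Hypothetical normalized values: a programmable switch is assumed to have
# 10x the resistance and 2x the capacitance of a plain wire segment.
wire = chain_delay(4, 1.0, 1.0)
switched = chain_delay(4, 10.0, 2.0)
print(wire, switched)  # 10.0 200.0
```

With these made-up values, four plain wire segments give a delay of 10 units while four switch segments give 200 units. Because the Elmore delay of an unbuffered chain of n identical segments is R·C·n(n+1)/2, the penalty grows quickly with every extra switch a route passes through.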
1.1 Motivation
Over the last couple of decades, ASICs have been widely used for a myriad of applications. However, they suffer from a very high initial investment and a long turnaround time. Although FPGAs overcome these limitations, they are a compromise in terms of speed and density [3] [6] [7] [8] [14] [15]. Hence, to market FPGAs as attractive choices, it is essential to improve their performance and make it comparable to that of ASICs and MPGAs.
In general, the three main factors affecting overall FPGA performance are:
• architecture of the FPGA, which includes both the logic blocks and the interconnection network,
• quality of CAD tools,
• the electrical transistor level design of the FPGA [13] [15]. 
Selecting the right architecture is one of the most im portant steps in
building an FPGA. The architecture has three components: logic blocks, 
interconnects and I/O blocks [6] [7] [8] [9] [lO]. The goal of this thesis is to 
explore the different FPGA logic block architectures, their relative strengths
and weaknesses and also to improve their performance by enhancing 
functionality without adding too many transistors.
1.2 Problem Definition
An FPGA logic block can be as simple as a transistor pair or as complex as
a microprocessor [7]. A logic block is typically capable of implementing many
different combinational and sequential functions. Current logic block
architectures are either look-up-table (LUT) based or multiplexer based.
Depending on their functionality, logic blocks are classified as:
• Small: logic blocks that can implement at most any 3-variable
function.
• Medium: logic blocks that can implement at most any Boolean
function of 4 to 6 variables.
• Large: logic blocks that can implement functions of more than 6
variables.
During the course of our research we found that, for a given Boolean
function, implementing it with small grained cells requires more cells than
implementing it with their competitors. This higher number of logic cells in
turn leads to increased communication between logic blocks and hence proves
costly in terms of interconnect resources. Although the delay through each
logic block is comparatively small, routing delay, which is a combination of both wire delay and switch
matrix delay, dominates and degrades performance [12] [16] [17]
[18].
Hence, our purpose in this research work was to explore the merits and
demerits of different logic block architectures and also to improve the
performance of small grained logic cells.
1.3 Thesis Outline
This thesis has been divided into 5 chapters. Chapter 2 gives the
necessary background information, including the relevant aspects of
commercially available FPGAs and related prior work.
Chapter 3 describes the first phase of our research work, explains the
experimental procedure used to evaluate the performance of different logic
blocks and presents the model used for measuring area, delay and utilization
for a given logic block.
Chapter 4 presents the results obtained during the early part of this
research work and describes the second phase of this project, which deals with
the design of new functionally enhanced logic block architectures. This chapter
also gives an insight into their transistor level implementation and
discusses the results obtained when these architectures were tested with
benchmark circuits.
Finally, Chapter 5 draws conclusions and explains what can be done in
future.
CHAPTER 2
BACKGROUND AND PREVIOUS WORK
The first half of this chapter provides an introduction to FPGAs and 
describes some of the present day commercial architectures, while the second 
half provides information about previous research work related to our study.
2.1 Field Programmable Gate Arrays
The Field Programmable Gate Array, or FPGA as it is more widely called,
is a type of programmable device that can be configured to implement a wide
variety of applications. An FPGA has three major configurable elements:
• Logic blocks
• Input/Output blocks
• Programmable Interconnects.
Logic blocks are cells that can be configured to implement Boolean
functions. The input/output blocks provide the interface between the package
pins and internal signal lines. The programmable interconnect resources
provide routing paths to connect the inputs and outputs of the logic blocks as
well as the input/output blocks. A user's design is implemented by specifying the
simple logic function for each cell and selectively closing the switches in the
interconnect matrix. The array of logic cells and interconnects forms a fabric of
basic building blocks for logic circuits. Complex designs are created by
combining these basic blocks into the desired circuit [7] [8] [9] [10] [13] [18]. The
schematic of a conceptual FPGA is shown below:
Figure 2.1: Conceptual FPGA (an array of logic blocks, LB, embedded in an interconnection network and surrounded by input/output blocks, IOB)
There are four classes of FPGAs: island style, row based, hierarchical
PLD and sea-of-gates [6] [8] [9] [10] [13]. The most popular ones are the
island style and row based architectures.
2.1.1 The Island Style Architecture
The island style architecture consists of an array of programmable logic
blocks with vertical and horizontal programmable routing channels. The
basic architecture is illustrated in figure 2.2. Xilinx FPGAs are classic
examples of this kind of architecture [6] [19] [20] [38].
Figure 2.2: Island Style Architecture (logic blocks, routing channels, switch boxes and connection boxes)
2.1.2 Row-Based Architecture
As the name implies, this architecture has logic blocks arranged in rows
with a horizontal routing channel between successive rows, as shown in
figure 2.3. The routing tracks within the channel are divided into one or more
segments. The length of the segments can vary from the width of a module
pair to the full length of the channel. The segments can be connected at the
ends using programmable switches to increase their length. Other tracks run
vertically through the logic blocks. They provide connections between the
horizontal routing channel and the vertical routing segments. The family of
FPGAs from Actel has this kind of architecture [7] [8] [9] [21] [35].
Figure 2.3: Row Based Architecture (rows of logic blocks separated by horizontal routing channels with segmented tracks; vertical tracks pass through the logic blocks)
Since their introduction in the mid-1980s, FPGAs have become a billion
dollar industry. Many companies have come up with different FPGA
architectures, and the most popular ones today are either
multiplexer based or LUT based. Companies like Altera and Xilinx have logic
blocks that use a look-up-table as their basic element, while others like
Actel and Quicklogic use multiplexers to build their logic cells. Both these
approaches share the common goal of achieving high logic density and
speed performance.
The following are some general terms used frequently in subsequent
chapters:
• Logic Blocks: that portion of the basic tile of an FPGA which
implements both the combinational and sequential logic of a circuit. An
FPGA may have only one type of logic block or several different
types of logic blocks. In either case, the logic blocks are arranged in a
two dimensional array.
• Logic Block Functionality: this term refers to the granularity of a
logic block. In a broader sense, the functionality of a
logic block refers to the logical capabilities of that block. In general, a
logic block with less functionality is referred to as a fine grained cell
and one with greater functionality is referred to as a coarse grained
cell [7] [18] [22] [23].
• Routing Architecture: this refers to the routing resources that are
available to interconnect logic blocks. These resources have wire tracks
that can be interconnected by using programmable switches.
• Input/Output Blocks: commonly referred to as IOBs, these blocks
appear on the periphery of an FPGA and are used to connect logic
blocks to the physical pins of the chip. Typically an IOB allows a pin to
be programmed as an input, an output or a bidirectional port.
Additional processing of a signal, such as inversion or latching, may also be
provided in an IOB.
• Critical Path: the path in the circuit that has the largest
delay from one end to the other. In an FPGA, the critical path is the
longest path that is terminated at both ends by an IOB and may
contain one or more logic blocks between its ends.
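The critical-path definition above amounts to a longest-path search over a directed acyclic graph of delays. The Python sketch below assumes a hypothetical netlist in which each node carries a delay in D units. The node names and delays are illustrative only and do not describe any circuit in this thesis.

```python
from functools import lru_cache

# Hypothetical netlist: node -> (delay in D units, successor nodes).
# IOBs terminate the path at both ends; all values are illustrative.
NETLIST = {
    "iob_in":  (0, ("lb1", "lb2")),
    "lb1":     (3, ("lb3",)),
    "lb2":     (5, ("lb3",)),
    "lb3":     (4, ("iob_out",)),
    "iob_out": (0, ()),
}

def critical_path(netlist, source):
    """Return (delay, nodes) of the longest-delay path starting at `source`.

    Assumes the netlist is acyclic (purely combinational between IOBs).
    """
    @lru_cache(maxsize=None)
    def longest(node):
        delay, successors = netlist[node]
        if not successors:
            return delay, (node,)
        best_delay, best_nodes = max(longest(s) for s in successors)
        return delay + best_delay, (node,) + best_nodes

    total, nodes = longest(source)
    return total, list(nodes)

print(critical_path(NETLIST, "iob_in"))
# (9, ['iob_in', 'lb2', 'lb3', 'iob_out'])
```

Memoising `longest` means each node is evaluated once, so the search is linear in the size of the netlist.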
2.2 Logic Block Architecture
A logic block in an FPGA can be implemented in ways that differ in the number
of inputs and outputs, the amount of area consumed, the complexity of the logic
functions that it can implement and the total number of transistors it consumes [6] [7]
[12] [13] [16] [17] [22] [23] [24]. The most common versions, however, are
either multiplexer based or LUT based.
2.2.1 Look-Up-Table Based Logic Blocks
The basis for a LUT based logic block is an SRAM acting as a
function generator [12] [24] [25] [26] [38]. The truth table for a K-input logic
function is stored in a $2^K \times 1$ SRAM. The address lines of the SRAM act as
inputs, and the output of the SRAM provides the value of the logic function.
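The SRAM function-generator idea can be made concrete with a short model. The Python sketch below treats a K-input LUT as a list of $2^K$ stored bits indexed by the input vector. The class name is hypothetical, and so is the convention that input i drives address bit i. Both are assumptions made for illustration, not details of any particular device.

```python
class LUT:
    """K-input look-up table: a 2**K x 1 memory addressed by the inputs."""

    def __init__(self, k, func):
        self.k = k
        # "Program" the SRAM: the entry at address a holds func(bits of a).
        self.sram = [func(*self._bits(a)) & 1 for a in range(2 ** k)]

    def _bits(self, address):
        # Assumed convention: input i is bit i of the address (LSB = input 0).
        return [(address >> i) & 1 for i in range(self.k)]

    def __call__(self, *inputs):
        assert len(inputs) == self.k
        # The inputs act as address lines; the stored bit is the output.
        address = sum((bit & 1) << i for i, bit in enumerate(inputs))
        return self.sram[address]

# Example: a 3-input majority function stored in a 3-LUT.
maj = LUT(3, lambda a, b, c: (a & b) | (a & c) | (b & c))
print(maj.sram)      # [0, 0, 0, 1, 0, 1, 1, 1]
print(maj(1, 0, 1))  # 1
```

Reprogramming the LUT for a different function only changes the stored bits, not the lookup logic.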
The advantage of look-up-tables is that they exhibit very high functionality
[24]. A K-input LUT can implement any function of K inputs, and there are
$2^{2^K}$ such functions. The disadvantage is that they are demanding in terms of
area, because the area of a logic block increases exponentially with its inputs
[17] [22]. A basic model of a LUT based logic block is shown in figure 2.4
[18] [22] [28].
Figure 2.4: The basic logic element (a LUT and a D flip-flop with SET, CLK and CLR inputs, driving the Output)
The area of a logic block of the above form is a function of the number of
its inputs and the amount of fixed hardware it contains [6] [17] [18] [22]. The
total active area for a given implementation is the product of the number of logic
blocks and the area of each block. The number of logic blocks is a decreasing
function of K (the number of inputs to a LUT), because a more functional block
has higher logic capability and hence can implement more of the original
circuit. On the other hand, the area of a logic block increases exponentially
with its inputs, as a K-input LUT requires $2^K$ programming bits [12] [16] [17]
[22] [23] [25] [28].
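The two opposing trends described above can be tabulated directly. The sketch below counts the $2^K$ programming bits and the $2^{2^K}$ realizable functions of a K-input LUT, and pairs them with the LUT areas estimated in Table 3.1. The `total_area` helper simply restates the product rule from the text. Any block count passed to it is a hypothetical input, not a measured result.

```python
# LUT areas from Table 3.1, in units of A (the area of one CMOS inverter).
LUT_AREA = {2: 18.64, 3: 36.94, 4: 71.56}

def programming_bits(k):
    """A K-input LUT stores one configuration bit per truth-table row."""
    return 2 ** k

def num_functions(k):
    """Distinct Boolean functions a K-input LUT can realize: 2^(2^K)."""
    return 2 ** (2 ** k)

def total_area(num_blocks, block_area):
    """Total active area = number of logic blocks x area of each block."""
    return num_blocks * block_area

for k in sorted(LUT_AREA):
    # Area growth relative to a LUT with one fewer input, where available.
    growth = LUT_AREA[k] / LUT_AREA[k - 1] if k - 1 in LUT_AREA else None
    print(k, programming_bits(k), num_functions(k), LUT_AREA[k], growth)
```

In Table 3.1, each extra input roughly doubles the LUT area (ratios of about 1.98 and 1.94), while the number of blocks a circuit needs falls. The cited studies place the minimum of this product between four and six inputs.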
Another important aspect of LUT based logic blocks is the presence of a
flip flop within each logic cell. An FPGA implemented using logic blocks
without an embedded flip flop requires more logic blocks than one with an
embedded flip flop, because each flip-flop must be implemented using several
logic blocks [17] [22] [28]. This feature facilitates the implementation of most
current algorithms and applications, which are pipelined in order to
achieve higher speeds.
Hence, the two most important conclusions to be drawn are:
• the optimum number of inputs required to achieve the lowest total
area consistently lies between four and six, and this minimum shows very
little dependence on the programming technology [17] [22] [28].
• a flip flop is required within a logic block in order to reduce overall chip
area [17] [22].
A cluster based approach has also been explored for these LUT based logic
blocks. The structure of the cluster based logic block is illustrated in figure
2.5. Each cluster has N identical logic blocks, and there is full connectivity
among all the logic blocks within a cluster [14] [32] [33]. Most of the recent
logic blocks by Altera and Xilinx have cluster based logic cells. A cluster size of
4 has been found to be the most area efficient. These cluster based logic blocks
have very little placement time associated with them [15] [29] [30].
Figure 2.5: Cluster based logic blocks (N basic logic elements, BLE #1 to BLE #N, each a LUT with a flip-flop, sharing the cluster inputs and outputs)
Another variation in look-up-table based blocks is the use of
heterogeneous logic cells [31]. This structure uses LUTs of different sizes
within a logic block in order to improve performance. The heterogeneous logic
blocks of sizes (6,4), (5,2), (4,2) and (4,3) were experimentally found to 
outperform the most energy-efficient 4-input homogeneous logic block
architectures [13] [15] [31]. The Xilinx Spartan XL is a classic example of a
heterogeneous logic block [38].
2.2.2 Multiplexer based Logic Blocks
Multiplexer based logic blocks implement different logic functions by
connecting each of their inputs either to a constant or to a signal. For example,
consider a 2-1 multiplexer with select input ‘s’, inputs ‘a’ and ‘b’ and output
$f = s\,a + \bar{s}\,b$. By setting signal ‘b’ to logic ‘0’, the multiplexer implements the
AND function $f = s\,a$, and by connecting input ‘a’ to logic ‘1’, it
implements the OR function $f = s + \bar{s}\,b = s + b$. Hence, by connecting together a number of
multiplexers and basic logic gates, a logic block can be constructed that can
implement a large number of functions [6] [7] [35].
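The AND/OR construction above can be checked exhaustively. This Python sketch models the 2-1 multiplexer as $f = s\,a + \bar{s}\,b$ and ties one data input to a constant, as described in the text. The function names are illustrative.

```python
from itertools import product

def mux2(s, a, b):
    """2-1 multiplexer: f = s*a + s'*b, so input a is selected when s = 1."""
    return (s & a) | ((1 - s) & b)

def and_gate(s, a):
    return mux2(s, a, 0)   # b tied to logic 0  ->  f = s*a

def or_gate(s, b):
    return mux2(s, 1, b)   # a tied to logic 1  ->  f = s + s'*b = s + b

# Check both gates against their truth tables for every input combination.
for s, x in product((0, 1), repeat=2):
    assert and_gate(s, x) == (s & x)
    assert or_gate(s, x) == (s | x)
print("AND and OR both realised with a single 2-1 mux")
```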
Multiplexer based logic blocks have the advantage of providing a large
degree of functionality for a relatively small number of transistors. This is,
however, achieved at the expense of a large number of inputs, which place a
high demand on routing resources [17] [18] [22] [23]. Compared to
LUT based architectures, multiplexer based logic blocks are fine grained but
show higher utilization factors, since it is easier to use small logic gates
efficiently. The disadvantage of these logic blocks is that they require a
relatively large number of wire segments and programmable switches. Such
routing resources are costly both in terms of area and delay [17] [18] [22] [23]
[28]. Hence, researchers are constantly trying to improve the functionality of
the logic block so as to reduce the overhead on interconnect resources.
CHAPTER 3
EXPERIMENTAL PROCEDURE 
This chapter presents our experimental setup. Our research work is split
into two phases. In the first phase, we analyzed the performance of
various commercial as well as academic logic block architectures by modeling
them in VHDL and exercising them with benchmark circuits in order to verify
their functionality. We also developed a modeling scheme to estimate the
area and propagation delay of these designs. This data gave us a
clear indication of the strengths and weaknesses of various logic cells.
In the second phase, we concentrated on improving the performance of
Actel’s Proasicplus logic cell, which falls under the category of small grained
logic blocks. This boost in performance was achieved by enhancing the
functionality of the logic cell without adding much silicon overhead. Three
promising architectures are suggested and their capabilities explored. They
were also built at transistor level in a 0.18 µm CMOS process in order to
estimate area and propagation delay.
3.1 Structural Models of Logic Blocks using VHDL
Aldec’s Active-HDL was used to build the logic blocks at RTL level. To
facilitate design mapping and data hierarchy, we used a structural method to
create these logic blocks. The VHDL models of all the logic blocks were
compiled, simulated and tested with benchmarks.
3.2 Area and Delay Models for FPGA Logic Blocks
This section describes a method, based on transistor count, for modeling
parameters like area and delay of various FPGA logic block architectures.
3.2.1 Background
Since most FPGAs came out in the CMOS era, it is safe to assume
that the prevailing logic style used to design their various components was
CMOS. The main reason for this is its robustness: the CMOS logic style is
very reliable and is the least prone to external influences like noise, garbage
data etc. It is true, however, that the CMOS logic style occupies redundant area
and hence need not have been employed throughout the chip. Since
transistor level information is not available in any of the datasheets, we are
forced to assume that the CMOS logic style prevails all over the chip.
Most FPGA logic blocks use components like inverters, AND gates, OR
gates, EXOR gates and D flip flops, to name a few. So, to estimate the area or
delay of a logic block, we first have to approximate the area and delay of the
elements that make up the logic block.
3.2.2 Area and Delay Modeling of the Elements of a Logic Block
Estimates of the area and delay of any gate can be obtained by analyzing
it at transistor level. Consider the simple case of an inverter, whose
transistor level schematic is shown in figure 3.1. For reasons of symmetry, the
inverter is assumed to have equal rise and fall times, which makes the PMOS
approximately twice as wide as the NMOS [39] [40]. The area occupied by such
an inverter is assumed to be ‘A’ sq. units and the propagation delay is assumed
to be ‘D’ time units. From this, the area of a single NMOS is $A/3$, while the
area of the PMOS is $2A/3$.
Figure 3.1: CMOS Inverter
Extending the same methodology to a 2-input NAND gate (two NMOS and two PMOS transistors), we have:
Area of a 2-NAND gate is ‘2A’ sq. units
Delay of a 2-NAND gate is ‘D’ time units.
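The transistor-count rule behind these numbers can be written out explicitly. The sketch below assumes, as the text does, that an NMOS occupies $A/3$ and a PMOS $2A/3$. It also assumes that stacked transistors are not upsized. Under those assumptions it reproduces the inverter (A) and 2-input NAND (2A) areas. Exact fractions are used to avoid rounding.

```python
from fractions import Fraction

A = Fraction(1)        # area of a minimum CMOS inverter, the unit of area
NMOS = A / 3           # NMOS + PMOS = A, with the PMOS twice as wide
PMOS = 2 * A / 3

def gate_area(n_nmos, n_pmos):
    """Area by transistor count; series stacks are not upsized (assumption)."""
    return n_nmos * NMOS + n_pmos * PMOS

inverter = gate_area(1, 1)   # one NMOS, one PMOS
nand2 = gate_area(2, 2)      # two series NMOS, two parallel PMOS
print(inverter, nand2)       # 1 2
```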
Using the same concept, the area and delay of various other logic gates were
estimated, and the data is presented below in table 3.1.
Table 3.1: Area and Delay estimates of various components

Component        Area (A sq. units)    Delay (D time units)
Inverter               1                      1
NAND gate              2                      1
NOR gate               3                      1
AND gate               3                      2
OR gate                4                      2
EXOR gate              2.30                   2
2 to 1 Mux             3.60                   2
SRAM                   2.66                   2
FLASH Switch           0.66                   1
D Flip Flop           12.33                   5
2-LUT                 18.64                   5
3-LUT                 36.94                   6
4-LUT                 71.56                   7
For our analysis we have chosen the following architectures:
• Actel’s Proasicplus
• Altera’s Flex6k, Flex10k, Apex II, Apex 20k, Mercury and Stratix
• Quicklogic’s PASIC 1, PASIC 3 and Eclipse II
• Xilinx’s Spartan XL and Spartan II
• DSP logic Module
• Digit Serial Reconfigurable Field Programmable Gate Array
• Low Power Programmable Gate Array
3.3 Actel’s Proasicplus
Actel’s Proasicplus is their latest FPGA family. It is built on a 3.3 V, 0.22 µm,
4-layer CMOS process. The architecture of the Proasicplus logic cell is shown
in figure 3.2 [35].
Figure 3.2: Actel's Proasicplus logic block. Courtesy: Actel’s Datasheets [35].
3.3.1 Logic Cell Architecture
The Proasicplus FPGA series consists of many multiplexer based logic
tiles that can be configured by programming the appropriate flash switch
interconnections. Flash switches are distributed throughout the device to
provide non-volatile, reconfigurable interconnect programming. The logic cell
has three inputs and one output and is capable of implementing most
3-variable functions. The inputs are available in both true and complemented
forms, and either of them can be chosen. The cell can also be configured as a
latch or a flip-flop [35].
3.3.2 Routing Structure
The routing structure of the Proasicplus FPGA has four levels of hierarchy
in the form of local resources, long line resources, high-speed long lines and
high performance global lines. The local lines allow the output of each logic
tile to be directly connected to the inputs of any of its eight neighboring tiles. The long
lines span 1, 2 or 4 logic tiles. All the logic tiles are capable of
driving signals onto these long lines, which can in turn access every input of
every logic tile. These lines provide routing resources for higher fan-out
connections. The high-speed long lines span the entire length and breadth of
the device and are used for very high fan-out nets. The global lines are used
to distribute clocks, resets and other high fan-out nets that require
minimum skew [35].
3.3.3 VHDL Modeling
A VHDL model of the above logic cell was constructed using Aldec’s
Active-HDL tool. To facilitate design mapping, a structural approach was
followed. Since VHDL models the cell at register-transfer level rather than at
transistor level, the FLASH switches were represented as multiplexers with select
lines provided by the user. The functionality of the logic cell was verified by
implementing a few basic gates.
Figure 3.3: Flash Switch vs. Multiplexer (the multiplexer's Sel input stands in for the programmed switch)
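In the same spirit as the RTL model described above, a configuration-controlled choice can be represented as a multiplexer whose select line is a user-supplied configuration bit. The Python sketch below applies this to the true/complemented input selection described in section 3.3.1. `config_mux` and `input_stage` are hypothetical names, and the sketch is not a transcription of the thesis's VHDL.

```python
def config_mux(sel, in0, in1):
    """2-1 multiplexer whose select line is a configuration bit.

    In the RTL model the bit is supplied by the user; on silicon the same
    choice is held by programmed flash switches.
    """
    return in1 if sel else in0

def input_stage(x, invert):
    """Pass either the true or the complemented form of input bit x."""
    return config_mux(invert, x, 1 - x)

for x in (0, 1):
    print(x, input_stage(x, invert=0), input_stage(x, invert=1))
# 0 0 1
# 1 1 0
```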
3.3.4 Area and Delay of Actel’s Proasicplus Logic Cell
From the schematic of the Proasicplus logic cell, we can see that this
logic block has the following components:
• Inverters (7)
• 2-1 Mux (2)
• 2-NAND Gate (2)
• Flash Switches (14)
Based on the data from table 3.1, we can estimate the total area required
for one logic tile as:
Total Area = 7*1A + 2*3.6A + 2*2A + 14*0.66A
Total Area = 27.44 A sq. units.
The combinational delay of the logic block can easily be approximated by
identifying the critical path in the logic block. The critical path in Actel’s
Proasicplus logic tile consists of the following components:
• Inverter
• Flash Switch
• 2-1 Mux
• Inverter
• Flash Switch
• 2-1 Mux
• 2-NAND Gate
• Inverter
Based on the data from table 3.1, the worst-case delay through this logic
cell can be estimated as:
Total Delay = D + D + 2D + D + D + 2D + D + D
Total Delay = 10D Time Units.
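The component tally and the critical-path sum above are mechanical, so they are easy to script. The sketch below looks up the Table 3.1 values for the components used in the Proasicplus cell and recomputes both estimates. The dictionary keys are shorthand names chosen for this example.

```python
# (area in A, delay in D) for the components used here, from Table 3.1.
TABLE_3_1 = {
    "inverter": (1.0, 1),
    "nand2": (2.0, 1),
    "mux2": (3.6, 2),
    "flash": (0.66, 1),
}

def cell_area(counts):
    """Sum of count x area over all components in the cell."""
    return sum(n * TABLE_3_1[name][0] for name, n in counts.items())

def path_delay(path):
    """Sum of component delays along a path."""
    return sum(TABLE_3_1[name][1] for name in path)

proasic_counts = {"inverter": 7, "mux2": 2, "nand2": 2, "flash": 14}
proasic_path = ["inverter", "flash", "mux2", "inverter",
                "flash", "mux2", "nand2", "inverter"]

print(round(cell_area(proasic_counts), 2))  # 27.44
print(path_delay(proasic_path))             # 10
```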
3.4 Altera’s Stratix
The most recent FPGA family from Altera is the Stratix series, which is
based on a 1.5 V, 0.13 µm, all-layer copper SRAM process. These devices have
logic blocks spread out in rows and columns. The basic block of the Stratix
device is called the “Logic Element” (LE), whose architecture is shown in
figure 3.4.
Figure 3.4: The Stratix Logic Element (LUT, carry chain and register; row, column, LUT chain and register chain outputs). Courtesy Stratix Datasheet
3.4.1 Logic Cell Architecture
A four input LUT forms the core of the Logic Element, and hence each LE
can implement any 4-variable function. Along with the LUT, each LE also
has a register and dedicated paths for carry-select and cascade chain
functions. The carry-select chain is used for high speed arithmetic functions
like counters and adders, and the cascade chain is used for wide-input
functions such as comparators. The register is programmable and can be
configured as a D, T, SR or JK flip-flop. Because each LE contains a flip-flop,
pipelined designs can easily be implemented. For combinational functions,
the flip-flop can be bypassed and the output of the LUT drives the output of
the LE directly. The LEs are grouped into Logic Array Blocks (LABs). Each LAB
comprises 10 logic elements, control signals, local interconnect, and LUT
chain and register chain connection lines. The LAB provides a coarse grained
structure and facilitates efficient routing, device utilization and high performance.
3.4.2 Modes of Operation
The Stratix LE operates in either normal mode or dynamic arithmetic mode.
In normal mode, the LE can be used to implement combinational functions
and general logic applications. The four data inputs and the carry-in signal
form the inputs to the logic element. The LE output can be either registered
or unregistered. Each LE can use the LUT chain connections to drive its
combinational output directly to the next LE within the same LAB. The
structure of the Stratix LE in normal mode is shown in figure 3.5.
As the name suggests, the dynamic arithmetic mode is used to implement
arithmetic functions like adders, counters, accumulators etc. An LE in
dynamic arithmetic mode uses four 2-input LUTs, which are configurable as a
dynamic adder/subtractor. The first two LUTs compute two summations
based on a possible carry-in of 1 or 0, while the other two LUTs generate
carry outputs for the two chains of the carry select circuitry. As before, an LE
can drive registered and unregistered versions of the LUT output. The
structure of the LE in this mode is shown in figure 3.6 [36].
Figure 3.5: Altera's Stratix in Normal Mode. Courtesy Stratix Datasheet
Figure 3.6: Altera's Stratix in Arithmetic Mode (four LUTs with carry inputs cin0/cin1 and carry outputs cout0/cout1). Courtesy Stratix Datasheet.
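The carry-select scheme used in dynamic arithmetic mode can be illustrated behaviourally. In the Python sketch below, each bit position plays the role of one LE. Two "LUT" expressions form the sum under each carry chain, and two more form the chain carries. The real carry-in then selects one of the two precomputed results. This is a simplified model written for illustration, not Altera's circuit. The names are hypothetical, and the add/subtract control and the registers are omitted.

```python
def le_arith_bit(a, b, cin0, cin1):
    """One bit position: sums and carries for both carry chains."""
    s0 = a ^ b ^ cin0                    # sum assuming chain 0
    s1 = a ^ b ^ cin1                    # sum assuming chain 1
    c0 = (a & b) | (cin0 & (a ^ b))      # carry out of chain 0
    c1 = (a & b) | (cin1 & (a ^ b))      # carry out of chain 1
    return s0, s1, c0, c1

def carry_select_add(x, y, width, cin=0):
    """Add two width-bit numbers; returns (sum, carry out)."""
    cin0, cin1 = 0, 1    # chain 0 assumes carry-in 0, chain 1 assumes 1
    sums0, sums1 = [], []
    for i in range(width):
        a, b = (x >> i) & 1, (y >> i) & 1
        s0, s1, cin0, cin1 = le_arith_bit(a, b, cin0, cin1)
        sums0.append(s0)
        sums1.append(s1)
    # The actual carry-in picks one of the two precomputed results.
    bits, cout = (sums1, cin1) if cin else (sums0, cin0)
    return sum(bit << i for i, bit in enumerate(bits)), cout

print(carry_select_add(11, 6, 4))         # (1, 1): 11 + 6 = 17
print(carry_select_add(11, 6, 4, cin=1))  # (2, 1): 11 + 6 + 1 = 18
```

Because both results are ready before the real carry-in arrives, only a final selection remains on the carry-dependent path. This is the motivation for carry-select circuitry.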
3.4.3 Routing Structure
A series of column and row interconnects of varying lengths and speeds
provides signal interconnects within the device. Within the LAB, the routing
structure consists of local interconnect, LUT chain and register chain. The
local interconnect provides connections between logic elements of the same
LAB. The LUT chain connections transfer the output of one LE’s LUT to the
adjacent LE for fast sequential LUT connections within the same LAB. The
register chain connections transfer the output of one LE register to the
adjacent LE within an LAB.
The MultiTrack interconnects provide routing resources throughout the
device. They consist of continuous, performance optimized routing lines of
different lengths and speeds, and provide connections within a block or
between two blocks. The MultiTrack interconnects have row and column
interconnects that span fixed distances. The row interconnects provide
resources for LABs within the same row. These include:
• Direct link interconnects between LABs and adjacent blocks
• R4 interconnects traversing four blocks to the right or left
• R8 interconnects traversing eight blocks to the right or left
• R24 interconnects traversing the entire length of the device
The column interconnects are very similar to the row interconnects but are
used for vertical communication. These include:
• LUT chain interconnects within an LAB
• Register chain interconnects within an LAB
• C4 interconnects traversing four blocks in the up or down direction
• C8 interconnects traversing eight blocks in the up or down direction
• C16 interconnects traversing the entire device [36]
3.4.4 VHDL Modeling
The core of the logic cell, the LUT, was created on the basis of a
multiplexer tree. The normal and arithmetic modes were constructed
separately, and benchmarks were implemented.
3.4.5 Area and Delay Model of Stratix LE
The area and delay estimates vary with the mode of operation. For
normal mode, we can observe that the LE consists of the following
components:
• 2-1 Mux (7)
• 4-LUT (1)
• 2-XOR Gate (1)
• 2-AND Gate (2)
• 2-NAND Gate (2)
• 3-NOR Gate (1)
• 4-1 Mux (1)
• D-flip flop (1)
Hence the total area occupied by the logic element is
Total Area = 7*3.6A + 1*71.56A + 1*12.33A + 1*2.3A + 2*3A + 2*2A + 1*7A + 1*10.8A
Total Area = 139.19 A Sq.Units.
The worst-case path has the following blocks:
• 4-1 Mux
• 4-LUT
• 2-1 Mux
• 2-1 Mux
• 2-AND Gate
• D-flip flop
• 2-1 Mux
Hence the delay of this logic tile is estimated as:
Total Delay = 4D + 7D + 2D + 2D + 2D + 5D + 2D
Total Delay = 24D Time Units
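As with the Proasicplus cell, the normal-mode tally can be recomputed in a few lines. The areas and delays below come from Table 3.1, with two exceptions: the 4-1 mux (10.8A, 4D) and the 3-input NOR (7A). These are the values used in the equations above, not table entries. The sketch also reports the share of the LE area taken by the 4-LUT.

```python
# (count, area in A) for the normal-mode Stratix LE.
NORMAL_LE = {
    "2-1 mux": (7, 3.6), "4-LUT": (1, 71.56), "D flip-flop": (1, 12.33),
    "2-XOR": (1, 2.3), "2-AND": (2, 3.0), "2-NAND": (2, 2.0),
    "3-NOR": (1, 7.0), "4-1 mux": (1, 10.8),
}
# Delay in D of each component on the worst-case path, in path order.
WORST_PATH = [("4-1 mux", 4), ("4-LUT", 7), ("2-1 mux", 2), ("2-1 mux", 2),
              ("2-AND", 2), ("D flip-flop", 5), ("2-1 mux", 2)]

area = sum(n * a for n, a in NORMAL_LE.values())
delay = sum(d for _, d in WORST_PATH)
lut_share = NORMAL_LE["4-LUT"][1] / area

print(f"area = {area:.2f} A, delay = {delay} D, 4-LUT share = {lut_share:.0%}")
# area = 139.19 A, delay = 24 D, 4-LUT share = 51%
```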
In its dynamic arithmetic mode of operation, the LE comprises:
• 2-1 Mux (11)
• 2-LUT (4)
• 2-XOR Gate (1)
• 2-AND Gate (3)
• 2-NAND Gate (1)
• 3-NOR Gate (1)
• 4-1 Mux (1)
• D-flip flop (1)
Hence the total area occupied is approximately:
Total Area = 11*3.6A + 4*18.64A + 1*12.33A + 1*2.3A + 1*10.8A + 1*7A + 1*2A + 3*3A
Total Area = 157.59 A Sq.Units.
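The arithmetic-mode area sum can likewise be checked with a short script. The 4-1 mux (10.8A) and 3-input NOR (7A) values are the ones used in the equation above, not Table 3.1 entries. Listing the individual contributions also shows that the four 2-LUTs account for close to half of the estimated LE area.

```python
# (count, area in A) for the Stratix LE in dynamic arithmetic mode.
ARITH_LE = {
    "2-1 mux": (11, 3.6), "2-LUT": (4, 18.64), "D flip-flop": (1, 12.33),
    "2-XOR": (1, 2.3), "4-1 mux": (1, 10.8), "3-NOR": (1, 7.0),
    "2-NAND": (1, 2.0), "2-AND": (3, 3.0),
}

contrib = {name: n * a for name, (n, a) in ARITH_LE.items()}
total = sum(contrib.values())

# Print components from largest to smallest contribution.
for name, value in sorted(contrib.items(), key=lambda kv: -kv[1]):
    print(f"{name:12s} {value:7.2f} A  {value / total:5.1%}")
print(f"{'total':12s} {total:7.2f} A")
```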
From the schematic in figure 3.6, we can see that the worst-case path has
the following components:
• 2-XOR Gate
• 2-LUT
• 2-1 Mux
• 2-1 Mux
• 2-1 Mux
• 2-AND Gate
• D-flip flop
• 2-1 Mux
Based on the data from table 3.1, we can estimate the total delay for one
logic tile as:
Total Delay = 2D + 5D + 2D + 2D + 2D + 2D + 5D + 2D
Total Delay = 22D Time Units
In addition to the Stratix family, other families like Flex, Apex and
Mercury were also analyzed. Structural models of logic elements in different
modes from these families were constructed and tested for logic
implementation. Furthermore, the area and delay of the LEs were computed
and are presented in the following table:
Table 3.2: Area and Delay of different Altera families

Device family   Operating mode   Area (A sq. units)   Delay (D time units)
Flex6k          Normal           119.69               18
Flex6k          Arithmetic       118.41               15
Flex10k         Normal           131.89               20
Flex10k         Arithmetic       123.41               15
Apex II         Normal           126.89               20
Apex II         Arithmetic       122.01               15
Apex 20k        Normal           126.89               20
Apex 20k        Arithmetic       122.01               15
Mercury         Normal           134.09               18
Mercury         Arithmetic       144.29               18
3.5 Quicklogic
The newest addition to the fleet of FPGAs from Quicklogic is the Eclipse
series. This FPGA family is based on a 3.3 V, 0.25 µm, 5-layer CMOS process,
and its logic cell is shown in figure 3.7 [37].
Figure 3.7: The Quicklogic Eclipse II logic cell. Courtesy Eclipse Datasheet
3.5.1 Logic Tile Architecture
The Eclipse family uses a dual register, multiplexer based logic cell that is
designed for wide fan-in and multiple, simultaneous output functions. The
logic cell consists of two 6-input AND gates, four 2-input AND gates, seven
2-1 multiplexers and two D flip flops. The cell has a fan-in of 30 and is claimed
to be able to fit functions with as many as 17 simultaneous inputs. The
logic cell has 6 simultaneous outputs, two of which can be registered. This
high logic capacity and wide fan-in accommodate many user functions with a
single level of logic delay [37].
3.5.2 Routing Structure
Six types of routing resources are provided: short wires, dual wires, quad
wires, express wires, distributed networks and default wires. Short wires
span the length of one logic cell in the vertical direction. Dual wires run
horizontally and cover two logic cells. Short and dual wires are primarily
used for local connections. Quad wires span four logic cells and are used for
the implementation of medium fan-out nets. Express lines and distributed
networks run the length of the FPGA and carry signals that require low
skew. These lines have higher capacitance than a quad line or a short line
but less capacitance than a chain of shorter wires connected to run the entire
length of the device. Also, the resistance is reduced as no intermediate
switches are required. These lines are used for long routes or high fan-out
nets [37].
3.5.3 VHDL Modeling
A structural VHDL model comprising all the components was built and tested for correct operation.
3.5.4 Area and Delay of the Eclipse Logic Cell
The Eclipse logic cell has the following components:
• 2-1 Mux (7)
• 2-AND Gate (4)
• 6-AND Gate (2)
• D-flip flop (2)
The total area is approximately
Total Area = 7*3.6A + 2*12.33A + 4*3A + 2*14.33A
Total Area = 90.52 A Sq.Units
The delay of the logic cell is estimated by following the critical path, which consists of the following components:
• Inverter
• 6-AND Gate
• 2-1 Mux
• 2-1 Mux
• 2-1 Mux
• D-flip flop
Hence the delay through the block is
Total Delay = D + 2D + 2D + 2D + 2D + 5D
Total Delay = 14D Time Units
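The component-sum estimate above can be expressed as a short Python sketch. The unit areas and delays are not copied from Table 3.1 (which lies outside this section); they are the per-component values implied by the sums worked out in this chapter, so treat them as assumptions. In particular, 12.33A is taken as the D flip-flop and 14.33A as the 6-input AND gate, the assignment consistent with the Spartan II and LPPGA area sums later in the chapter. The dictionary keys and function names are hypothetical.

```python
# Unit costs implied by the sums worked out in this chapter (assumed values;
# Table 3.1 itself is not reproduced here). Area is in multiples of A,
# delay in multiples of D.
UNIT_AREA = {
    "mux2": 3.6,    # 2-1 multiplexer
    "and2": 3.0,    # 2-input AND gate
    "and6": 14.33,  # 6-input AND gate (inferred)
    "dff": 12.33,   # D flip-flop (inferred)
}
UNIT_DELAY = {
    "inv": 1,   # inverter
    "and6": 2,  # 6-input AND gate
    "mux2": 2,  # 2-1 multiplexer
    "dff": 5,   # D flip-flop
}


def block_area(counts):
    """Total area: sum of count * unit area over all components of the block."""
    return sum(n * UNIT_AREA[part] for part, n in counts.items())


def path_delay(path):
    """Delay of one path: sum of the unit delays of the components on it."""
    return sum(UNIT_DELAY[part] for part in path)


# Component counts and critical path of the Eclipse logic cell, as listed above.
eclipse_counts = {"mux2": 7, "and2": 4, "and6": 2, "dff": 2}
eclipse_path = ["inv", "and6", "mux2", "mux2", "mux2", "dff"]

print(f"Area  = {block_area(eclipse_counts):.2f} A")  # Area  = 90.52 A
print(f"Delay = {path_delay(eclipse_path)} D")        # Delay = 14 D
```

Because area and delay are simple weighted sums, the same two functions apply to any other cell once its component list and critical path are known.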
Along with the Eclipse family, the PASIC 1 and PASIC 3 families were also analyzed. Structural models of these logic cells were constructed and tested for logic implementation. As in the previous cases, their area and delay were estimated; the values are shown in the following table.
Table 3.3: Area and Delay of Quicklogic Families
Device Family   Area (A Sq. Units)   Delay (D Time Units)
PASIC 1         71.79                12
PASIC 3         82.59                14
3.6 Xilinx
Xilinx introduced the concept of FPGAs with its XC2000 series. Its latest FPGA family is the Virtex series, but the architecture of the logic block is very similar to that of its predecessors, i.e. the Spartan II family. The Spartan II series is based on a 2.5 V, 0.18 µm CMOS process. It has a regular, flexible, programmable architecture comprising Configurable Logic Blocks (CLBs), which are in turn surrounded by a perimeter of programmable Input/Output Blocks (IOBs). These functional elements are interconnected by a powerful hierarchy of versatile routing channels.
3.6.1 Logic Cell Architecture
The basic building block of the Spartan II FPGA is the logic cell (LC). Each LC consists of a four-input function generator, fast carry logic and a storage element. Each CLB has four LCs, arranged in two similar slices. Figure 3.8 shows one slice of this CLB.
Each logic cell can implement any 4-input function and can also be configured as a 16 x 1-bit synchronous RAM. In addition, the four LCs can be combined to implement combinational logic of five or six input variables, and the family is claimed to implement some functions of nine inputs and selected functions of nineteen inputs.
Figure 3.8: Slice of the Spartan II CLB (two LUTs with carry logic and two flip-flops). Courtesy Spartan Datasheet.
3.6.2 Routing Structure
The Xilinx routing resources are very robust. They include local routing, general purpose routing, dedicated routing, I/O routing and global routing. The local routing resources provide interconnections among LUTs and flip-flops, internal feedback paths that provide high-speed connections within a CLB, and direct paths between horizontally adjacent CLBs. The general purpose routing is located in horizontal and vertical routing channels associated with the rows and columns of CLBs, and provides connections to adjacent CLBs in both the horizontal and vertical directions. The general routing resources include 24 single-length lines, 96 buffered hex-length lines (one-third of which are bidirectional) and 12 buffered bidirectional long lines. The I/O routing provides additional routing resources around the periphery of the device; this additional routing is called the VersaRing. In Spartan II, dedicated routing is provided for 3-state busses in the horizontal direction and for the carry signal in the vertical direction. Lastly, global routing resources distribute clocks and other signals with very high fan-outs throughout the device [19] [20] [38].
3.6.3 VHDL Modeling
For easier design mapping, structural VHDL models were constructed. The LUT was based on the concept of a multiplexer tree. A single logic cell was built, and its functional capabilities were explored by implementing some basic gates.
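The multiplexer-tree idea behind these LUT models can be illustrated with a behavioral Python sketch (not the thesis's VHDL model). A k-input LUT stores 2^k configuration bits, and k levels of 2-1 multiplexers, with the LUT inputs as select lines, steer one stored bit to the output; a 4-LUT built this way needs 8 + 4 + 2 + 1 = 15 multiplexers. The function names and the bit ordering (first input as least significant select bit) are assumptions made for illustration.

```python
def mux2(sel, in0, in1):
    """2-1 multiplexer: returns in1 when sel is 1, otherwise in0."""
    return in1 if sel else in0


def lut_mux_tree(config_bits, inputs):
    """Evaluate a k-input LUT built as a tree of 2-1 multiplexers.

    config_bits: 2**k stored bits; config_bits[i] is the output for the input
                 combination whose binary value is i (inputs[0] is the LSB).
    inputs:      k select bits; inputs[0] drives the first (leaf) level.
    """
    level = list(config_bits)
    for sel in inputs:  # one multiplexer level per LUT input
        level = [mux2(sel, level[j], level[j + 1])
                 for j in range(0, len(level), 2)]
    return level[0]


def truth_table(func, k):
    """Configuration bits that make the LUT implement the k-input function func."""
    return [func(*[(i >> b) & 1 for b in range(k)]) for i in range(2 ** k)]


# Program a 4-LUT as a 4-input AND gate and check every input combination.
and4 = truth_table(lambda a, b, c, d: a & b & c & d, 4)
for i in range(16):
    bits = [(i >> b) & 1 for b in range(4)]
    assert lut_mux_tree(and4, bits) == (1 if i == 15 else 0)
```

Reprogramming the same tree as any other gate only changes `config_bits`, which is what makes the LUT a universal k-input function generator.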
3.6.4 Area and Delay Models of the Spartan II Family
From the schematic in Figure 3.8, this logic block has the following components:
• 2-1 Mux (20)
• 4-1 Mux (8)
• Carry Logic (4)
• 2-XOR Gate (4)
• 2-AND Gate (4)
• 4-LUT (4)
• D-flip flop (4)
Based on the data from Table 3.1, we can estimate the total area required for one logic block as:
Total Area = 20*3.6A + 8*10.8A + 4*24.6A + 4*2.3A + 4*3A + 4*71.56A + 4*12.33A
Total Area = 613.56 A Sq.Units.
From the schematic, the worst case path has the following components:
• 4-LUT
• 2-1 Mux
• 2-1 Mux
• 2-XOR Gate
• 4-1 Mux
• 2-1 Mux
• D-flip flop
This delay can be estimated as:
Total Delay = 7D + 2D + 2D + 2D + 4D + 2D + 5D
Total Delay = 24D Time Units.
Along the same lines, we also modeled the Spartan XL architecture; its area and delay turned out to be 286.32A Sq. Units and 24D Time Units, respectively.
3.7 DSP FPGA
The DSP logic module appeared in the IEEE Transactions on VLSI Systems in 1995. The article describes an EXOR-based logic block designed primarily for DSP applications. The architecture of this DSP FPGA is shown in Figure 3.9. As can be observed, this fine grained logic cell has five inputs and two outputs and offers high functionality. Basic gates were implemented and the logic cell was tested for its capabilities. It performs poorly for sequential functions because the logic cell does not have a register. This is a serious flaw, since most current designs are pipelined [32].
Figure 3.9: The DSP Logic Module
3.7.1 Area and Delay of the DSP FPGA
From the schematic in Figure 3.9, this logic block has the following components:
• 2-1 Mux (1)
• Inverter (1)
• 2-AND Gate (2)
• 2-XOR Gate (2)
• 2-NOR Gate (1)
With this, the area of the logic block is estimated as
Total Area = 1*3.6A + 1*A + 2*3A + 2*2.3A + 1*3A
Total Area = 18.2 A Sq.Units.
From the schematic, the worst case path has the following components:
• Inverter
• 2-1 Mux
• 2-AND Gate
• 2-NOR Gate
Based on the data from Table 3.1, we can estimate the total combinational delay for one logic tile as:
Total Delay = D + 2D + 2D + D
Total Delay = 6D Time Units.
3.8 Digit Serial Reconfigurable FPGA
The digit serial reconfigurable (DSR) FPGA is very similar to the DSP FPGA but also uses the concept of cluster-based logic blocks. The basic cell of this logic block array is called a Logic Module (LM) and is shown in the following figure.
Figure 3.10: Architecture of the Logic Module for DSR FPGA
A logic block array is made up of four such logic modules, so a wide variety of Boolean functions can be implemented. In addition to the logic modules, the cell also includes fast carry logic circuitry and a register array. Each logic array block has a total of 26 inputs and 9 outputs. This logic block satisfies the requirements of rapid prototyping and efficient implementation of digit serial DSP applications. The structure of the digit serial logic block is shown below [38]:
Figure 3.11: Digit Serial Logic Block (four Logic Modules with carry inputs Cin0 to Cin3, fast carry logic, carry select logic and a register array)
3.8.1 Area and Delay of the DSR FPGA Logic Module
The LM has the following components:
• 2-1 Mux (1)
• Inverter (1)
• 2-AND Gate (3)
• 2-XOR Gate (2)
• 2-OR Gate (1)
Based on the data from Table 3.1, we can estimate the total area required for one logic module as:
Total Area = 1*3.6A + 1*1A + 3*3A + 2*2.3A + 1*4A
Total Area = 22.2 A Sq.Units
The delay through the logic module is estimated from the components in the critical path:
• 2-XOR Gate
• 2-AND Gate
• 2-1 Mux
• 2-AND Gate
• 2-OR Gate
Hence the total delay is
Total Delay = 2D + 2D + 2D + 2D + 2D
Total Delay = 10D Time Units
3.9 Low Power Programmable Gate Array
The low power programmable gate array (LPPGA) is an energy efficient FPGA architecture. The functionality of this logic cell is based on a cluster of 3-input LUTs, as shown in Figure 3.12. Each 3-LUT is implemented as a multiplexer tree, whose control signals act as the inputs of the LUT. The data inputs of the multiplexer tree are stored in memory cells, so the function of the LUT is set by programming these memory cells according to the truth table of the required function. This clustering scheme makes it possible to combine the results of the four 3-input LUTs in various ways. All the outputs of the logic block can be registered if required [34].
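One standard way to combine 3-input LUTs is Shannon expansion on an extra input: two 3-LUTs hold f with the fourth input fixed at 0 and at 1, and a 2-1 multiplexer driven by that input selects between them. The Python sketch below models this general technique; it is not a claim about the exact interconnect of the LPPGA cluster, which this section does not detail, and the function names and bit ordering are assumptions.

```python
from itertools import product


def lut3(config, a, b, c):
    """3-input LUT: 8 stored bits addressed by (c, b, a), with a as the LSB."""
    return config[a | (b << 1) | (c << 2)]


def program_lut3(func):
    """Configuration bits such that lut3(config, a, b, c) == func(a, b, c)."""
    return [func(i & 1, (i >> 1) & 1, (i >> 2) & 1) for i in range(8)]


def four_input_from_two_lut3(func4):
    """Shannon expansion on d: f = d' * f(a, b, c, 0) + d * f(a, b, c, 1).

    Returns a function modelling two programmed 3-LUTs whose outputs
    feed a 2-1 multiplexer selected by d.
    """
    cfg0 = program_lut3(lambda a, b, c: func4(a, b, c, 0))
    cfg1 = program_lut3(lambda a, b, c: func4(a, b, c, 1))

    def evaluate(a, b, c, d):
        out0, out1 = lut3(cfg0, a, b, c), lut3(cfg1, a, b, c)
        return out1 if d else out0  # the combining 2-1 multiplexer

    return evaluate


# Example: 4-input parity, which no single 3-LUT can implement.
parity4 = lambda a, b, c, d: a ^ b ^ c ^ d
f = four_input_from_two_lut3(parity4)
assert all(f(*bits) == parity4(*bits) for bits in product((0, 1), repeat=4))
```

Applying the same step again, with four 3-LUTs and a 4-1 multiplexer (or two levels of 2-1 multiplexers), would cover any 5-input function.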
Figure 3.12: The Low Power Programmable Gate Array (inputs A1, A2, A3, B1, B2)
3.9.1 Area and Delay of LPPGA
The logic cell has the following components:
• 2-1 Mux (9)
• 4-1 Mux (1)
• 3-LUT (4)
• Inverter (11)
• D-flip flop (3)
From Table 3.1, the estimated area is
Total Area = 9*3.6A + 1*10.8A + 4*36.94A + 11*A + 3*12.33A
Total Area = 238.95 A Sq.Units.
The critical path consists of:
• 2-1 Mux
• Inverter
• 3-LUT
• 4-1 Mux
• 2-1 Mux
• D-flip flop
• 2-1 Mux
Hence the delay through this logic block is
Total Delay = 2D + D + 6D + 4D + 2D + 5D + 2D
Total Delay = 22D Time Units.
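The area and delay sums for the Spartan II, DSP LM, DSR FPGA logic module and LPPGA cells can be checked in one pass with the sketch below. The unit costs are inferred from the sums written out in this chapter rather than copied from Table 3.1, and the short component keys (such as `mux2` and `lut4`) are hypothetical labels. The 2-input gate on the Spartan II critical path is modeled as the 2-XOR gate (2D) from that cell's component list.

```python
# Unit area (in A) and unit delay (in D) per component, inferred from the
# sums worked in this chapter; Table 3.1 itself is not reproduced here.
UNIT_AREA = {"inv": 1.0, "mux2": 3.6, "mux4": 10.8, "and2": 3.0, "or2": 4.0,
             "nor2": 3.0, "xor2": 2.3, "carry": 24.6, "lut3": 36.94,
             "lut4": 71.56, "dff": 12.33}
UNIT_DELAY = {"inv": 1, "mux2": 2, "mux4": 4, "and2": 2, "or2": 2,
              "nor2": 1, "xor2": 2, "lut3": 6, "lut4": 7, "dff": 5}

CELLS = {
    # name: (component counts, components on the critical path)
    "Spartan II": ({"mux2": 20, "mux4": 8, "carry": 4, "xor2": 4,
                    "and2": 4, "lut4": 4, "dff": 4},
                   ["lut4", "mux2", "mux2", "xor2", "mux4", "mux2", "dff"]),
    "DSP LM": ({"mux2": 1, "inv": 1, "and2": 2, "xor2": 2, "nor2": 1},
               ["inv", "mux2", "and2", "nor2"]),
    "DSR FPGA LM": ({"mux2": 1, "inv": 1, "and2": 3, "xor2": 2, "or2": 1},
                    ["xor2", "and2", "mux2", "and2", "or2"]),
    "LPPGA": ({"mux2": 9, "mux4": 1, "lut3": 4, "inv": 11, "dff": 3},
              ["mux2", "inv", "lut3", "mux4", "mux2", "dff", "mux2"]),
}


def area(counts):
    """Sum of count * unit area over all components."""
    return sum(n * UNIT_AREA[c] for c, n in counts.items())


def delay(path):
    """Sum of unit delays along the critical path."""
    return sum(UNIT_DELAY[c] for c in path)


for name, (counts, path) in CELLS.items():
    print(f"{name:12s} area = {area(counts):7.2f} A   delay = {delay(path):2d} D")
```

The printed values match the totals derived above: 613.56A and 24D, 18.2A and 6D, 22.2A and 10D, and 238.95A and 22D.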
3.10 Performance Evaluation of Logic Blocks
The performance of the logic blocks is evaluated by implementing benchmarks on each of them. The following metrics were used to assess the relative merits and demerits of the logic blocks:
• Number of Logic Blocks: Represented as Nl, this is the actual number of cells required to successfully map the design onto the logic blocks. This metric is inversely related to the functionality of the logic cell; that is, a logic cell with high functionality (coarse grained) will use fewer logic blocks to implement a design than a logic cell with less functionality (fine grained).
• Area: This parameter is directly related to the number of logic blocks. The area occupied by a particular design is given by
  Area Occupied = Nl * Al + Ra
  where Nl is the number of logic blocks required for the design, Al is the area of each block and Ra is the routing area.
• Delay: The delay of a given design is a function of the delay through the logic blocks and the routing delay. Hence the total delay is represented as
  Total Delay = Nlc * Dl + (Nlc - 1) * Rd
  where Nlc is the number of logic blocks in the critical path, Dl is the delay through the logic block and Rd is the routing delay between consecutive blocks.
  Of the two delays, the delay through the logic block can be estimated, since each block has a specific architecture made up of a number of gates. The routing delay is much harder to calculate, as it depends on the length of the wires, the width of the channel, the type of switches, the switch matrix, etc. For this reason, the routing delay is taken to be zero, two, four or ten times the delay through the logic block.
• Utilization: The utilization of a logic block is represented as a percentage. This metric indicates what fraction of the gates available in the logic blocks is actually used to implement a given function.
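The area and delay metrics translate directly into two small Python functions. Ra and Rd are left as parameters because routing is not modeled in detail here. For the sweep, the routing delay is expressed as a multiple of a reference delay of 10D; this value is an assumption, chosen because with 4 blocks of 6D on the critical path it reproduces the Proasicplus delays of Table 3.4. Function names are hypothetical.

```python
def area_occupied(n_blocks, block_area, routing_area=0.0):
    """Area Occupied = Nl * Al + Ra (Ra defaults to 0, i.e. routing ignored)."""
    return n_blocks * block_area + routing_area


def total_delay(n_critical, block_delay, routing_delay):
    """Total Delay = Nlc * Dl + (Nlc - 1) * Rd.

    Each of the Nlc blocks on the critical path contributes Dl, and each of
    the Nlc - 1 connections between consecutive blocks contributes Rd.
    """
    return n_critical * block_delay + (n_critical - 1) * routing_delay


# Sweep the routing delay over 0, 2, 4 and 10 times a reference delay
# (d_ref = 10D is an assumption) for 4 blocks of 6D on the critical path.
d_ref = 10
for k in (0, 2, 4, 10):
    print(f"Rd = {k:2d} * d_ref -> total delay = {total_delay(4, 6, k * d_ref)}D")
```

Every block boundary on the critical path adds one Rd, so once Rd is several times Dl the routing term dominates, which is why reducing the number of blocks on the critical path matters so much.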
3.11 Benchmark Implementations
To understand the abilities of the logic blocks, we implemented a few benchmark circuits on each of the architectures discussed previously. The area and delay of each implementation were also calculated for performance comparison. The following benchmarks were chosen:
• 16-variable Boolean Function
• 32-bit Adder
• 16-bit Multiplier
• 16-bit MAC Unit
3.11.1 Boolean Function
We implemented the following 16-variable Boolean function on all the logic block architectures:
Q = ABCD + EFGH + (IJKL)' + (M + N + O + P)
As can be observed, the equation is in sum-of-products form, and the '+' sign represents an OR operation. To explain the performance evaluation scheme adopted, let us consider Actel's Proasicplus logic cell as an example.
When implemented on Actel's logic blocks, the Boolean function was mapped as shown in Figure 3.13.
Figure 3.13: Boolean Function Implementation in Actel's Proasicplus
Based on the mapping in Figure 3.13, the following parameters are estimated:
• Area Required:
  = (Nl * Al) + Ra
  = 13 * 27.44 A Sq. Units + Ra
  = 356.72 A Sq. Units + Ra
• Total Delay:
  = (Nlc * Dl) + (Nlc - 1) * Rd
  = (4 * 6D) + (3 * Rd)
  = 24D + (3 * Rd)
Now, if we take the routing delay to be zero, two, four or ten times the average combinational delay, we have:
Table 3.4: Total Delay as a function of both routing and combinational delay
Routing delay   Rd = 0 * Dlb   Rd = 2 * Dlb   Rd = 4 * Dlb   Rd = 10 * Dlb
Total Delay     24D            84D            144D           324D
• Utilization Factor: Since we manually mapped the above Boolean function onto the logic blocks, we can also estimate how much of the logic blocks' resources were used for the implementation. Actel's Proasicplus logic block contains the following components:
  1. Inverters (7)
  2. 2-1 Mux (2)
  3. 2-NAND Gate (2)
  4. Flash Switches (14)
  With 13 logic blocks, the total number of available components is 13 * 25 = 325, but implementing the Boolean function uses only around 112 of them. Hence the average utilization factor is 112/325 = 34.46%.
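The utilization figure is simply the ratio of components used to components available. The sketch below reproduces it for the Proasicplus mapping from the component list above (7 + 2 + 2 + 14 = 25 components per block); the dictionary keys and the function name are hypothetical labels.

```python
# Components per Proasicplus logic block, as listed above.
PROASICPLUS_BLOCK = {"inverter": 7, "mux2": 2, "nand2": 2, "flash_switch": 14}


def utilization(used_components, n_blocks, block_components):
    """Percentage of the available components that the mapped design uses."""
    available = n_blocks * sum(block_components.values())
    return 100.0 * used_components / available


u = utilization(used_components=112, n_blocks=13,
                block_components=PROASICPLUS_BLOCK)
print(f"{u:.2f}%")  # 34.46%
```

Because the denominator grows with every block used, a coarse grained cell whose extra resources sit idle for a given function shows a low utilization even when it needs few blocks.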
A similar analysis was done for all the logic block architectures, and the data collected are presented in the following table:
Table 3.5: Results for Benchmark-1
(Area in A Sq. Units; the Rd columns give the total delay of the design in D Time Units.)
Architecture  Nl   Dl   Area      Rd=0  Rd=2D  Rd=4D  Rd=10D  Utilization (%)
Proasicplus   13    6   356.72     24     84    144     324   34.46
Flex6k         5   13   598.45     26     62     98     206   40.00
Flex10k        5   13   659.45     26     66    106     226   30.76
Apex II        5   13   634.45     26     66    106     226   3?.33
Apex 20k       5   12   634.45     26     66    106     226   3?.33
Mercury        5    9   670.45     18     54     90     198   15.38
Stratix        5   11   695.95     22     70    118     262   18.75
Pasic 1        3    7   215.37     14     28     42      84   60.00
Pasic 3        3    9   247.77     18     36     54     108   66.66
Eclipse        3    9   271.56     18     36     54     108   57.77
Spartan XL     3    9   858.96     18     66    114     258   41.66
Spartan II     2   11  1227.12     22     70    118     262   10.41
DSPLM         12    6   218.40     30     78    126     270   60.71
DSRFPGA       13   10   288.60     50    130    210     450   34.50
LP_PGA         5   18  1194.75     36     80    124     256   50.51
3.11.2 32-bit Adder
A 32-bit ripple carry adder was implemented on all the logic blocks and the various performance parameters were computed. The data are presented in the following table.
Table 3.6: Results for Benchmark-2
(Area in A Sq. Units; the Rd columns give the total delay of the design in D Time Units.)
Architecture  Nl    Dl   Area      Rd=0  Rd=2D  Rd=4D  Rd=10D  Utilization (%)
Proasicplus   384    6   12072.9    576   2476   4376   10076   34.66
Flex6k         32   10    3789.1    320   1250   2180    4970   40.00
Flex10k        32   10    3949.1    320   1250   2180    4970   36.36
Apex II        32   10    3904.3    320   1250   2180    4970   36.36
Apex 20k       32   10    3904.3    320   1250   2180    4970   36.36
Mercury        32   13    4617.2    416   1532   2648    5996   57.89
Stratix        32    9    5298.8    288   1714   3140    7418   43.47
Pasic 1       128    7    9189.1    448   1960   3472    8008   90.00
Pasic 3       128    9   10571.5    576   2340   4104    9396   7?.92
Eclipse       128    9   12610.5    576   2340   4104    9396   66.66
Spartan XL     16   11    3149.5    176    896   1616    3776   41.66
Spartan II      8   11    4940.4     88    424    760    1768   25.00
DSPLM          32    6     870.4    192    564    936    2052   97.47
DSRFPGA        32   10     934.4    320    940   1560    3420  100.00
LP_PGA         32   12    7646.4    384   1748   3112    7204   4?.85
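For reference, the circuit mapped in this benchmark can be modeled bit by bit in Python. This is only a behavioral sketch of a ripple-carry adder built from full adders; it does not capture how each architecture packs full adders or carry logic into its blocks, and the function names are hypothetical.

```python
def full_adder(a, b, cin):
    """One-bit full adder: sum = a XOR b XOR cin, carry = majority(a, b, cin)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_carry_add(x, y, width=32, cin=0):
    """Add two unsigned width-bit integers one bit at a time, LSB first.

    The carry out of stage i is the carry in of stage i + 1, so the
    worst-case delay grows linearly with the word width.
    """
    carry, result = cin, 0
    for i in range(width):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry  # width-bit sum and final carry out


total, cout = ripple_carry_add(0xFFFFFFFF, 1)
print(hex(total), cout)  # 0x0 1
```

The serial carry chain is why cells with a dedicated fast-carry path, or an XOR-rich structure such as the DSP LM and DSR FPGA modules, map this benchmark efficiently.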
3.11.3 16-bit Multiplier
The following table compares the performance of the various logic blocks when a 16-bit multiplier was implemented. The basic block of this design is a four-bit multiplier cell.
Table 3.7: Results for Benchmark-3
(Area in A Sq. Units; the Rd columns give the total delay of the design in D Time Units.)
Architecture  Nl     Dl   Area        Rd=0  Rd=2D  Rd=4D  Rd=10D  Utilization (%)
Proasicplus   4992    6   156948.48   1962   8482  15002   34582   40.00
Flex6k         928   10   110212.16   1133   4493   7853   17933   40.00
Flex10k        928   10   116695.36   1133   4493   7853   17933   36.36
Apex II        928   10   114474.56   1133   4493   7853   17933   36.36
Apex 20k       928   10   114474.56   1133   4493   7853   17933   36.36
Mercury        928   13   131289.92   1465   5497   9529   21625   41.82
Stratix        928    9   147677.12   1019   6171  11323   26779   3?.08
Pasic 1       3840    7   267993.60   1631   7199  12767   29471   87.66
Pasic 3       3840    9   309465.60   2097   8593  15089   34577   74.?7
Eclipse       3776    9   372011.52   2097   8593  15089   34577   6?.89
Spartan XL     464   11   132852.48    627   3315   6003   14067   39.36
Spartan II     232   11   143273.92    319   1663   3007    7039   27.87
DSPLM         1438    6    39116.60    786   2346   3906    8586   84.80
DSRFPGA       1184   10    34572.80   1170   3490   6370   14170   81.68
LP_PGA        1056   12   252331.20   1404   6508  11612   26924   4?.85
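The text states only that the basic block of this design is a four-bit multiplier cell. One way such cells can form a 16 x 16 multiplier is to split each operand into four 4-bit digits, multiply every digit pair in a cell, and add the 16 partial products with the appropriate shifts. The Python sketch below assumes this decomposition and a plain summation of partial products; the adder structure actually used in the benchmark is not specified here, and `mul4` is a hypothetical stand-in for the cell.

```python
def mul4(a, b):
    """Hypothetical 4-bit x 4-bit multiplier cell producing an 8-bit product."""
    assert 0 <= a < 16 and 0 <= b < 16
    return a * b


def mul16_from_mul4(x, y):
    """16 x 16 multiply assembled from 4 x 4 cells.

    x and y are split into four 4-bit digits each; every digit pair goes
    through one 4-bit cell, and each partial product is shifted left by
    4 * (i + j) bits before being summed.
    """
    total = 0
    for i in range(4):
        xi = (x >> (4 * i)) & 0xF
        for j in range(4):
            yj = (y >> (4 * j)) & 0xF
            total += mul4(xi, yj) << (4 * (i + j))
    return total  # up to a 32-bit product


assert mul16_from_mul4(0xFFFF, 0xFFFF) == 0xFFFF * 0xFFFF
print(mul16_from_mul4(1000, 1000))  # 1000000
```

With this decomposition a 16-bit multiplier needs 16 four-bit cells plus the adders that combine their outputs, which helps explain the large block counts in Table 3.7.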
3.11.4 16-bit MAC Unit
To evaluate the logic blocks for pipelined applications, a 16-bit multiply and accumulate (MAC) unit was implemented. The data are tabulated below:
Table 3.8: Results for Benchmark-4
(Area in A Sq. Units; the Rd columns give the total delay of the design in D Time Units.)
Architecture  Nl     Dl   Area        Rd=0   Rd=2D  Rd=4D  Rd=10D  Utilization (%)
Proasicplus   5408    6   170027.5    2544   11004  19464   44844   4?.55
Flex6k         960   15   114001.2    1613    5933  10253   23213   40.33
Flex10k        960   15   120644.4    1613    5933  10253   23213   36.66
Apex II        960   15   118378.8    1613    5933  10253   23213   36.66
Apex 20k       960   15   118378.8    1613    5933  10253   23213   36.66
Mercury        960   18   135907.2    2041    7225  12409   27961   4?.43
Stratix        960    9   152976.0    1755    8379  15003   34875   3?.08
Pasic 1       3904   12   384622.0    2239    9343  16447   37759   87.82
Pasic 3       3904   14   322431.3    2833   11121  19241   44273   75.00
Eclipse       3904   14   376814.0    2833   11121  19241   44273   6?.93
Spartan XL     480   20   137433.6     947    4403   7859   18227   40.00
Spartan II     240   16   148214.4     447    2175   3903    9087   27.77
DSPLM         2526    6    68707.20   7314   21930  36546   80394   90.26
DSRFPGA       2208   10    64473.60  11410   34210  57010  125410   95.65
LP_PGA        1088   17   259977.6    1948    8460  14972   34508   4?.96
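A MAC unit differs from the earlier benchmarks in that its running sum must be held in a register between clock cycles, which is why logic cells with built-in flip-flops have an advantage here. The behavioral Python sketch below illustrates that multiply-accumulate loop; the unsigned 16-bit operands, the 40-bit accumulator width and the class name are assumptions, not figures from the thesis.

```python
class MacUnit:
    """Behavioral multiply-accumulate: acc <- acc + a * b on each clock.

    self.acc models the register (flip-flops) that holds the running sum
    between clock edges; operands are treated as unsigned 16-bit values and
    the accumulator width (40 bits by default) is an assumption.
    """

    def __init__(self, in_bits=16, acc_bits=40):
        self.in_mask = (1 << in_bits) - 1
        self.acc_mask = (1 << acc_bits) - 1
        self.acc = 0

    def clock(self, a, b):
        product = (a & self.in_mask) * (b & self.in_mask)  # 16 x 16 multiply
        self.acc = (self.acc + product) & self.acc_mask    # registered sum
        return self.acc


mac = MacUnit()
for a, b in [(3, 4), (5, 6), (7, 8)]:
    mac.clock(a, b)
print(mac.acc)  # 12 + 30 + 56 = 98
```

In a cell without flip-flops, such as the DSP LM or the DSR FPGA module, the accumulator register itself has to be built from logic, which inflates the block count.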
CHAPTER 4
RESULTS AND DISCUSSIONS 
The first half of this chapter interprets the collected data. With the help of graphs, the next few pages show the relative performance of the logic blocks for each benchmark circuit. This analysis emphasizes that coarse grained logic blocks are faster than the fine grained versions. The next phase of our research was therefore to improve the functionality of the most common fine grained architecture, Actel's Proasicplus logic tile. The second half of this chapter describes the methodology and presents three novel architectures which show significant improvement in terms of area, delay or both.
4.1 Boolean Function
4.1.1 Number of Blocks
The following graph shows the variation in the number of logic blocks used by the different logic block architectures. This plot clearly indicates that, to implement the Boolean function, the fine grained cells use more logic blocks than the coarse grained cells. Among these logic block architectures, the Proasicplus and the DSR FPGA logic cells use the highest number of logic blocks (13), while the Spartan II uses the fewest (2).
Figure 4.1: Comparison of the number of logic blocks for B1
4.1.2 Occupied Area
The area occupied by the entire design in the different architectures is compared in the following graph. We can observe that the coarse grained logic cells occupy more area than the fine grained versions. However, this gap narrows as the designs become more complex. For this benchmark, the Spartan II occupies the most area and the DSPLM the least.
Figure 4.2: Comparison of the occupied area for B1
4.1.3 Delay
The total delay, as a function of both combinational and routing delay, is plotted for two cases. The first graph depicts the total delay purely as a function of the combinational delay through the logic blocks. This analysis indicates that the delay is least for the Spartan XL, Mercury, Pasic 3 and Eclipse and highest for the DSR FPGA.
The second graph plots the total delay when routing is also considered. As can be seen, the delay is least for the Pasic 3 and Eclipse architectures but highest for the DSR FPGA.
Figure 4.3: Comparison of total delay when Rd = 0 for B1
Figure 4.4: Comparison of total delay when Rd = 10Dl for B1
4.1.4 Utilization
The utilization factor for all the architectures is compared in the following graph. The utilization factor varies considerably between logic cells; in general, it is higher for fine grained cells than for coarse grained cells. From the graph, utilization is highest for Pasic 3 and lowest for Spartan II.
Figure 4.5: Comparison of Utilization Factor for B1
4.2 32-bit Adder
4.2.1 Number of Logic Blocks
A comparison between the different architectures with respect to the number of logic blocks used to implement a 32-bit ripple carry adder is shown in the following figure.
Figure 4.6: Comparison of number of logic cells for B2
As before, the number of logic blocks required for the design varies with functionality. Actel's Proasicplus logic cell requires the highest number of blocks, while the Spartan II requires the fewest. Although the DSP LM and the DSR FPGA are fine grained logic cells, their design is well suited to implementing an adder, and hence the number of logic cells is not very high.
4.2.2 Occupied Area
A plot comparing the occupied area for the design is presented below. As seen in the graph, the fine grained cells occupy more area than the coarse grained ones. The occupied area is highest for Eclipse and least for DSP LM.
Figure 4.7: Comparison of the area occupied for B2
4.2.3 Delay
Figure 4.8 compares the total delay when only the combinational delay of the logic blocks is considered. It shows that even without any routing delay, the fine grained cells have larger delays than the coarse grained cells, and the same trend is observed when routing is considered. The Spartan II proves to be the fastest logic cell, while the Actel Proasicplus is the slowest.
Figure 4.8: Comparison of Total delay when Rd = 0 for B2
Figure 4.9: Comparison of Total Delay with Rd = 10Dl for B2
4.2.4 Utilization Factor
The variation in the utilization factor among the different logic block architectures is plotted below. We can observe that fine grained logic cells such as the DSP LM, DSR FPGA and Pasic 1 achieve very high utilization factors, while coarse grained cells such as the Spartan II show poor utilization.
Figure 4.10: Comparison of Utilization factor for B2
4.3 16-bit Multiplier
4.3.1 Number of Logic Blocks
The number of logic cells required to implement a 16-bit multiplier in the different architectures varies as shown in the graph below. The results obtained are very similar to those of the previous benchmarks: Actel's Proasicplus takes up the highest number of logic blocks, while Xilinx's Spartan II requires the fewest.
Figure 4.11: Comparison of number of logic blocks for B3
4.3.2 Occupied Area
The area occupied by the design is compared among the various logic cells. Since the design requires such a large number of logic blocks, the occupied area is directly related to the number of logic blocks. As can be observed, the Eclipse logic cell marks the upper bound while the DSP LM marks the lower bound.
Figure 4.12: Comparison of occupied area for B3
4.3.3 Delay
The total delay of the design as a function of the combinational delay alone is plotted below. From the graph, we can observe that the delay is largest for the Proasicplus architecture and smallest for the Spartan II logic cell. A similar trend is observed when routing delay is also considered. This can be attributed to the number of logic blocks in the critical path, which is considerably smaller for a coarse grained cell than for a fine grained logic block.
Figure 4.13: Comparison of total delay when Rd = 0 for B3
Figure 4.14: Comparison of total delay when Rd = 10Dl for B3
4.3.4 Utilization Factor
The following graph compares the utilization factor among the different logic block architectures. As on previous occasions, the fine grained cells show better utilization factors than the coarse grained architectures.
Figure 4.15: Comparison of Utilization Factor for B3
4.4 16-bit MAC Unit
4.4.1 Number of Logic Blocks
The variation in the number of logic blocks is shown in the following graph. Architectures with a memory element in their design require fewer cells to implement the MAC unit. The key point to notice is the increase in the number of logic blocks used by the DSP LM and the DSR FPGA. This is because these cells have no flip-flop in their design and hence require a very high number of logic blocks. Actel's Proasicplus requires the highest number of logic blocks, and the Spartan II architecture uses the fewest cells.
Figure 4.16: Comparison of the number of logic blocks for B4
4.4.2 Occupied Area
In previous cases we observed that the area of the implemented design was higher for a coarse grained cell than for a fine grained cell. In this case, however, the difference is small, since the fine grained cells lack a memory element and therefore need many more blocks. Quicklogic's Pasic 1 sets the upper limit on the occupied area, while the DSR FPGA sets the lower limit. The following graph shows the area occupied in the different architectures.
Figure 4.17: Comparison of Occupied Area for B4
4.4.3 Delay
The graph below compares the total delay when only the combinational delay is considered. As can be seen, due to the nature of the benchmark circuit, the architectures with a memory element achieve better speeds than those without one. The delay is largest for the DSR FPGA and smallest for the Spartan II.
Figure 4.18: Comparison of total delay when Rd = 0 for B4
Figure 4.19: Comparison of Total Delay when Rd = 10Dl for B4
4.4.4 Utilization Factor
As with the previous benchmarks, the utilization factor was better for the fine grained cells than for the coarse grained ones. The Spartan II again marks the lower bound, while the DSR FPGA marks the upper bound.
Figure 4.20: Comparison of Utilization Factor for B4
4.5 Observations
The results presented in the earlier sections are analyzed further here, and meaningful conclusions are drawn. The following pages provide insight into the architectural design of a logic block.
4.5.1 Granularity
From the results obtained, we see that the granularity of a logic cell plays
a very important role in its performance. For complex designs, fine grained
cells (logic cells with little functionality) are much slower than coarse
grained cells (logic cells with high functionality), because many more logic
blocks are needed to implement a particular design. Hence it is imperative that
a logic block be reasonably functional. On the other hand, the logic block
cannot be made too complex either: that would increase the combinational delay
through the block to such an extent that nothing would be gained in the end.
The other aspect associated with granularity is the area of the logic cell. A
coarse grained logic block takes up much more area than a fine grained block,
but this factor is overshadowed by the gain in speed, as discussed later.
4.5.2 Utilization Factor
One of the unique aspects of our study of FPGA logic block architectures is
the estimation of the utilization factor. Our research shows that fine grained
logic cells achieve higher utilization factors than their coarse grained
competitors. The price of this improvement is, again, speed. A fine grained
cell contains very few logic gates, and nearly all of them are essential for
the cell to perform its operations. A coarse grained cell, by contrast,
contains redundant logic gates. This redundancy is provided solely to increase
flexibility, which is important for FPGAs because they are used for a wide
variety of applications, and these highly flexible architectures show very
promising performance with respect to speed.
4.5.3 Speed of the design
The two important aspects separating an FPGA from an ASIC are the density
and the speed of a particular design. FPGAs are primarily used for prototyping
and low-cost implementations, and to offer performance comparable to an ASIC
they need to be faster. From the data obtained earlier, it is clear that FPGAs
whose logic blocks have higher functionality are much faster than those with
lower functionality.
This holds even though coarse grained cells have very large combinational
delays through the logic block. Since the delay of a design implemented on an
FPGA is dominated by routing, a high functionality logic block improves speed
significantly by reducing the number of stages in the critical path of the
design.
For example, consider the implementation of a 16-bit multiplier. The Xilinx
Spartan II, a coarse grained logic block, has approximately 29 logic blocks in
its critical path, while the Actel Proasicplus, a fine grained cell, has around
326. This difference is directly reflected in the implementation speed of the
two architectures: the Xilinx Spartan II is around 5 times faster than the
Actel Proasicplus.
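The multiplier figures can be turned into a rough estimate of how much slower each coarse grained stage is. The sketch below assumes, purely for illustration, that critical-path delay equals the number of logic-block stages times one average per-stage delay (logic plus routing); that uniform-stage model is our simplification, not a result from the benchmarks.

```python
# Uniform-stage model (an assumption for illustration only):
#   critical-path delay = number of stages * average per-stage delay
SPARTAN_STAGES = 29      # Xilinx Spartan II, 16-bit multiplier (from the text)
PROASIC_STAGES = 326     # Actel Proasicplus, 16-bit multiplier (from the text)
OBSERVED_SPEEDUP = 5.0   # Spartan II is "around 5 times faster"


def implied_stage_delay_ratio(slow_stages: int, fast_stages: int, speedup: float) -> float:
    """Per-stage delay of the faster design divided by that of the slower one.

    speedup = (slow_stages * d_slow) / (fast_stages * d_fast)
    => d_fast / d_slow = (slow_stages / fast_stages) / speedup
    """
    return (slow_stages / fast_stages) / speedup


stage_ratio = PROASIC_STAGES / SPARTAN_STAGES
delay_ratio = implied_stage_delay_ratio(PROASIC_STAGES, SPARTAN_STAGES, OBSERVED_SPEEDUP)
print(f"Proasicplus needs {stage_ratio:.2f}x as many stages")
print(f"each Spartan II stage is ~{delay_ratio:.2f}x slower")
```

Under this crude model each Spartan II stage would be roughly 2.25 times slower than a Proasicplus stage, yet the roughly 11-fold reduction in stage count still wins, which is the trade-off described above.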
Generalizing from this result, this is the main reason why look-up-table based
FPGAs have dominated the programmable logic market for so long and continue to
do so. Since LUTs offer much greater functionality and speed than other kinds
of programmable architecture, they are well suited to a wide range of end user
applications.
4.5.4 Utility of a Flip-Flop
Most present-day algorithms and designs need a certain degree of pipelining
in order to achieve better speeds. This makes it necessary for a logic block to
include a register as part of its architecture. For example, consider the
implementation of the 16-bit MAC unit. The data make it quite clear that logic
blocks with memory elements are much faster than those without. Even among the
fine grained cells, Actel's Proasicplus, which can be configured as a D
flip-flop, is at least 3 times faster than the DSR FPGA, which cannot. Hence,
in all of today's commercial architectures, the logic cell can be configured
as a memory element.
4.6 Functional Improvement of Actel’s Proasicplus
4.6.1 Actel’s Proasicplus:
The architecture of the original Proasicplus is shown below.
Figure 4.21: The Proasicplus logic tile
4.6.1.1 Logic Capabilities
The following table lists the Boolean functions that can be implemented by a
single logic block. The first column specifies the Boolean function being
implemented, while the next four columns give the configuration data required
to implement that function.
From the table, it is clear that the Proasicplus logic cell can implement all
2-variable functions and some 3-variable functions. The block can also be
configured as a D flip-flop. The only drawback is that when the cell is
configured as a memory element, it cannot be used to implement any logic. In
other words, this logic cell cannot register its own output.
Table 4.1: Logical Capabilities of the Actel Proasicplus
Boolean Function  A    B    C    D
2-AND Gate        0    X    1    Y
2-OR Gate         X    Y    1    1
2-XOR Gate        X    Y    1    X'
3-AND Gate        X    Y'   Z'   Z
3-NOR Gate        X'   Y    Z    Z'
D Flip-Flop       D, CLK, CLR
3-NAND Gate       Not possible
3-OR Gate         Not possible
3-XOR Gate        Not possible
4.6.2 Motivation for Improvement
The architecture of the Actel Proasicplus shows that the logic cell can
either implement Boolean functions or be configured as a flip-flop, but not
both simultaneously. Our goal was therefore to modify the original architecture
so that the logic cell could register its own output. In this process, we
arrived at three promising architectures whose capabilities are either the same
as, or at least twice those of, the original Proasicplus logic tile.
4.7 Modified Architectures
This section gives a brief introduction to the modified architectures and
describes their functional capabilities.
4.7.1 Mod1: Logic Cell with D Flip-Flop
The first modified architecture is presented below. An important feature of
this architecture is the inclusion of a D flip-flop. As shown in the following
table, the functional ability of this logic cell is very similar to that of the
original Proasicplus (it additionally implements the 3-input NAND and OR
functions), except that the modified logic block can be used for sequential and
combinational purposes simultaneously. The added second output line also
provides better flexibility and improves performance.
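Behaviourally, the key change in Mod1 is that the flip-flop can capture the cell's own combinational result while that result is also available as an output. The Python model below is a hypothetical, cycle-level abstraction of that idea: the class and method names are ours, the clear is modelled synchronously for simplicity, and no attempt is made to reproduce the actual Mod1 gates or pin assignment.

```python
from typing import Callable, Tuple


class RegisteredLogicCell:
    """Hypothetical behavioural model of a logic cell whose D flip-flop
    registers the cell's own combinational output (the ability Mod1 adds).
    Not the Mod1 netlist; names are illustrative."""

    def __init__(self, logic: Callable[[int, int, int], int]) -> None:
        self.logic = logic
        self.q = 0  # flip-flop state after reset

    def step(self, x: int, y: int, z: int, clr: int = 0) -> Tuple[int, int]:
        """One clock cycle: return (combinational output, registered output)
        as seen just before the rising edge, then update the flip-flop."""
        comb = self.logic(x, y, z)
        registered = self.q
        # Rising clock edge: capture the combinational result, or clear.
        self.q = 0 if clr else comb
        return comb, registered


cell = RegisteredLogicCell(lambda x, y, z: x & y & z)  # 3-input AND
print(cell.step(1, 1, 1))  # (1, 0): AND is 1, register still holds its reset value
print(cell.step(1, 0, 1))  # (0, 1): register now holds last cycle's AND result
```

In the original Proasicplus tile only one of the two roles (logic or flip-flop) is available per cell, so this behaviour would need two tiles.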
Figure 4.22: Architecture of Mod1
Table 4.2: Logical Abilities of Mod1
Boolean Function  A    B    C    D    Output (O1/O2)
2-AND Gate        0    X    Y    1    ✓
2-OR Gate         X    Y    1    1    ✓
2-XOR Gate        X    Y    X'   1    ✓
3-AND Gate        X    Y'   Z'   Z    ✓
3-NOR Gate        X'   Y    Z    Z'   ✓
3-NAND Gate       X    Y    Z'   Z    ✓
3-OR Gate         X'   Y    Z    Z'   ✓
D Flip-Flop       D, CLK, CLR         ✓
3-XOR Gate        Not possible
4.7.2 Mod2
The second modified architecture is presented in the following figure. This
logic cell has 8 inputs and 4 outputs and can be analyzed by splitting it into
two sections. Each section has 4 inputs and 2 outputs and can implement all the
Boolean functions that the original Proasicplus cell can. Combining two such
halves makes the resulting logic block twice as capable. The functional
abilities of this block are listed in the table below. The block can implement
all 2- and 3-variable functions and most 4- and 5-variable functions. Apart
from all this, the logic cell can also be configured as a D flip-flop.
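Table 4.3 shows how the two halves cooperate for the wider functions: the first half's output O2 is routed into an input of the second half (column F). The sketch below models each half only by the function it is configured for, an abstraction of ours rather than the Mod2 gate netlist, and checks the cascades exhaustively. It also suggests why 4- and 5-input XOR are listed as not possible: a single half cannot form a 3-input XOR, so a two-stage cascade reaches at most a 3-input parity function.

```python
from itertools import product
from typing import Callable


# Functional abstraction (ours): each half is modelled only by the function it
# is configured for in Table 4.3; O2 of the first half feeds the second half.
def mod2_xor3(x: int, y: int, z: int) -> int:
    o2 = x ^ y              # first half configured as a 2-XOR
    return o2 ^ z           # second half: 2-XOR of O2 and Z, read on O4


def mod2_and4(w: int, x: int, y: int, z: int) -> int:
    o2 = w & x & y          # first half configured as a 3-AND
    return o2 & z           # second half: 2-AND of O2 and Z, read on O4


def mod2_and5(p: int, q: int, r: int, s: int, t: int) -> int:
    o2 = p & q & r          # first half configured as a 3-AND
    return o2 & s & t       # second half: AND of O2, S and T, read on O4


def matches(impl: Callable[..., int], ref: Callable[..., int], n: int) -> bool:
    """Exhaustively compare two n-input Boolean functions over all 2**n inputs."""
    return all(impl(*bits) == ref(*bits) for bits in product((0, 1), repeat=n))


print(matches(mod2_xor3, lambda x, y, z: (x + y + z) % 2, 3))
print(matches(mod2_and4, lambda *v: int(all(v)), 4))
print(matches(mod2_and5, lambda *v: int(all(v)), 5))
```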
Figure 4.23: Architecture of Mod2
Table 4.3: Logical Capabilities of Mod2 and Mod3
Boolean Function  A    B    C    D    E    F    G    H    Outputs
2-AND Gate        0    X    Y    1    0    P    Q    1    O2, O4
2-OR Gate         X    Y    1    1    P    Q    1    1    O2, O4
2-XOR Gate        X    Y    X'   1    P    Q    P'   1    O2, O4
3-AND Gate        X    Y'   Z'   Z    P    Q'   S'   S    O2, O4
3-NOR Gate        X'   Y    Z    Z'   P'   Q    S    S'   O2, O4
3-NAND Gate       X    Y'   Z'   Z    P    Q'   S'   S    O1, O3
3-OR Gate         X'   Y    Z    Z'   P'   Q    S    S'   O1, O3
3-XOR Gate        X    Y    X'   1    Z    O2   Z'   1    O4
4-AND Gate        W    X'   Y'   Y    0    O2   Z    1    O4
4-NOR Gate        W    X    Y    Y'   0    O2   Z'   1    O4
4-NAND Gate       W    X'   Y'   Y    0    O2   Z    1    O3
4-OR Gate         W    X    Y    Y'   0    O2   Z'   1    O3
4-XOR Gate        Not possible
5-AND Gate        P    Q'   R'   R    S'   O2   T    S    O4
5-NOR Gate        P'   Q    R    R'   S    O2   T'   S'   O4
5-NAND Gate       P    Q'   R'   R    S'   O2   T    S    O3
5-OR Gate         P'   Q    R    R'   S    O2   T'   S'   O3
5-XOR Gate        Not possible
D Flip-Flop       0    Clk  D    Clr  O1   Clk  O4   Clr  O4
4.7.3 Mod3
One architectural feature of the previous modifications, as well as of the
original Proasicplus logic cell, is the choice of input polarity: every input
is available in both true and complemented form. However, the design of the
logic cell allows only one of these two forms to be used at any given time.
Hence, in our third modification, we designed the logic block without this
flexibility. As can be seen in the following figure, this approach reduced the
total number of gates by a large margin without any loss of functionality. This
cell can also implement all 2- and 3-variable functions and most 4- and
5-variable functions. Its configuration for the various Boolean functions is
similar to that of the previous version and is given in Table 4.3.
Figure 4.24: Architecture of Mod3
4.8 Transistor Level Modeling
To better understand the area and delay requirements of the above
architectures, we also performed a transistor level analysis. Using a CMOS
logic style, the pre-layout simulations were done in PSpice with a 0.18 µm
model card. The layout was carried out in MAGIC, and the extracted files were
simulated again in PSpice and verified for timing.
4.8.1 Layout Specifications
Since this work targets deep submicron devices, we used a TSMC 0.18 µm
technology file with λ = 0.09 µm for layout. This technology file provides 6
metal layers and 1 poly layer and is intended for 1.8 V applications.
Since no transistor level information is available in any of the data sheets,
all devices were assumed to have the minimum feature size; that is, each
transistor had a width of 0.36 µm and a length of 0.18 µm. This also ensured
that all the architectures were optimized for area. Of the 6 available metal
layers, two were used in this study: Metal1 for all inter-cell routing and
Metal2 for all intra-cell routing.
The following four figures show the layouts of the four architectures.
Figure 4.25: Layout of Actel Proasicplus Logic Block
Figure 4.26: Layout of Mod1
Figure 4.27: Layout of Mod2
Figure 4.28: Layout of Mod3
4.9 Performance Comparison
The performance comparison among these architectures is twofold. The first
metric is area, which can be expressed as the number of transistors or as the
actual layout area. However, layout area is not a reliable metric, since
different designers might lay out the same design differently. Hence,
transistor count is used as the deciding factor among the architectures. The
layout area and the number of transistors of each architecture are presented
in the following table.
Table 4.4: Performance comparison in terms of area
Architecture   Number of transistors   Area (mm²)
Proasicplus    74                      6.89
Mod1           66                      6.73
Mod2           112                     9.53
Mod3           48                      4.00
As seen in the table, the original Proasicplus design requires 74
transistors and an area of 6.89 mm². The first modified architecture (the one
with the flip-flop) has a transistor count of 66, which is about 10.8% less
than the original cell. Although the second modified architecture needs 112
transistors, it is twice as capable as the original cell: the original logic
block would need at least 148 transistors (excluding routing) to perform the
same functions as the Mod2 architecture. Hence, on the whole, Mod2 saves about
24.3% in area. The most area-efficient architecture is Mod3, which offers the
same high functionality with a very low transistor count (48) and therefore
saves about 67.6% in area relative to the 148 transistors of two original
cells.
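The percentages above follow from the transistor counts in Table 4.4, with Mod2 and Mod3 compared against two Proasicplus tiles (2 × 74 = 148 transistors) because each does the work of two original cells; routing between tiles is ignored, as in the text. A short Python check:

```python
# Transistor counts from Table 4.4.
TRANSISTORS = {"Proasicplus": 74, "Mod1": 66, "Mod2": 112, "Mod3": 48}
# Number of original Proasicplus tiles each design replaces (from the text).
EQUIVALENT_TILES = {"Mod1": 1, "Mod2": 2, "Mod3": 2}


def transistor_saving_pct(arch: str) -> float:
    """Percentage of transistors saved versus the Proasicplus tiles doing the same work."""
    baseline = EQUIVALENT_TILES[arch] * TRANSISTORS["Proasicplus"]
    return 100.0 * (baseline - TRANSISTORS[arch]) / baseline


for arch in ("Mod1", "Mod2", "Mod3"):
    # Prints 10.8, 24.3 and 67.6 for Mod1, Mod2 and Mod3 respectively.
    print(f"{arch}: {transistor_saving_pct(arch):.1f}% fewer transistors")
```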
The second metric used for performance comparison is the propagation delay
of a logic block. This delay varies with the function being implemented, so,
as shown below, the propagation delay was measured for different Boolean
functions. The original Proasicplus has an average delay of 228.4 ps. The Mod1
architecture, which offers the same functionality, has an average delay of
206.42 ps and is therefore around 9.6% faster. For the same functionality, the
Mod2 and Mod3 architectures have average delays of 195.5 ps and 170.21 ps
respectively, which means that the delay of Mod2 is lower by at least 14.4% and
that of Mod3 by 25.2%. This improvement in delay is also evident in designs
that exploit the full functionality of these logic cells.
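The conclusions of this thesis rank the cells by area-delay product. One simple way to form that metric, assumed here rather than taken from the thesis, is transistor count times average delay, divided by the number of Proasicplus-equivalent tiles each cell replaces (two for Mod2 and Mod3):

```python
# Average propagation delays (ps) quoted in the text; transistor counts from Table 4.4.
AVG_DELAY_PS = {"Proasicplus": 228.4, "Mod1": 206.42, "Mod2": 195.5, "Mod3": 170.21}
TRANSISTORS = {"Proasicplus": 74, "Mod1": 66, "Mod2": 112, "Mod3": 48}
# Assumption: Mod2 and Mod3 each count as two Proasicplus-equivalent tiles.
TILES = {"Proasicplus": 1, "Mod1": 1, "Mod2": 2, "Mod3": 2}


def area_delay_per_tile(arch: str) -> float:
    """Transistor count x average delay, normalized per equivalent tile."""
    return TRANSISTORS[arch] * AVG_DELAY_PS[arch] / TILES[arch]


ranking = sorted(AVG_DELAY_PS, key=area_delay_per_tile)  # smallest (best) first
for arch in ranking:
    print(f"{arch:12s} {area_delay_per_tile(arch):9.1f} transistor*ps per tile")
```

With this normalization the ranking is Mod3, Mod2, Mod1, Proasicplus, which is consistent with Mod3 having the best area-delay product.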
Table 4.5: Propagation Delay of the four architectures (in ps)
Boolean Function  Proasicplus  Mod1    Mod2    Mod3
2-AND Gate        238.5        22?.5   228.5   187.5
2-OR Gate         238.5        227.5   228.5   187.5
2-XOR Gate        23?.5        228.5   185.25  187.5
3-AND Gate        230.0        238.5   212.5   187.5
3-NOR Gate        208.0        23?.5   190.5   189.0
3-NAND Gate       -            155.0   172.5   128.25
3-OR Gate         -            142.5   149.75  128.0
3-XOR Gate        -            -       386.5   386.0
4-AND Gate        -            -       420.0   386.0
4-NOR Gate        -            -       416.5   388.0
4-NAND Gate       -            -       372.0   342.0
4-OR Gate         -            -       38?.5   342.0
4-XOR Gate        Not possible
5-AND Gate        -            -       422.5   388.0
5-NOR Gate        -            -       416.5   386.0
5-NAND Gate       -            -       376.5   342.0
5-OR Gate         -            -       38?.5   342.0
5-XOR Gate        Not possible
Among the four architectures, Mod3 is the fastest, offers very high
functionality, and occupies the least area. These architectures were also
tested with the previous benchmarks, and the results are shown below.
Table 4.6: Comparative Results for Benchmark-1
                                          Combinational delay of the design
Architecture   Nl     Dl   Area          Rd=0    Rd=2D   Rd=4D   Rd=10D   Utilization (%)
Proasicplus    13     6    356.7         24      84      144     324      34.46
Mod1           10     6    279.3         22      94      166     382      51.10
Mod2           5      13   250.8         28      56      84      168      48.82
Mod3           5      10   112.4         20      40      60      120      64.?4

Table 4.7: Comparative Results for Benchmark-2
                                          Combinational delay of the design
Architecture   Nl     Dl   Area          Rd=0    Rd=2D   Rd=4D   Rd=10D   Utilization (%)
Proasicplus    224    6    7042.56       576     2476    4376    10076    32.?0
Mod1           192    6    5362.5        352     1864    3376    7912     45.37
Mod2           80     10   4815.36       320     1636    2952    6900     50.00
Mod3           80     10   1798.4        240     1180    2120    4940     71.42
Table 4.8: Comparative Results for Benchmark-3
                                          Combinational delay of the design
Architecture   Nl     Dl   Area          Rd=0    Rd=2D   Rd=4D   Rd=10D   Utilization (%)
Proasicplus    4128   6    15694848      1962    8482    15002   34562    40.00
Mod1           3648   10   101888.64     1200    6408    11616   27240    45.34
Mod2           1456   10   73032.96      1093    5629    10165   23773    50.00
Mod3           1456   10   32730.88      820     4060    7300    17020    71.42

Table 4.9: Comparative Results for Benchmark-4
                                          Combinational delay of the design
Architecture   Nl     Dl   Area          Rd=0    Rd=2D   Rd=4D   Rd=10D   Utilization (%)
Proasicplus    4384   6    137832.96     2541    10981   19421   44741    38.66
Mod1           3840   6    107251.20     1552    8296    15040   35272    45.82
Mod2           1568        78650.88      1425    7333    13241   30965    48.52
Mod3           1568        35248.64      1069    5289    9509    22169    6?.85
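The delay columns in these tables appear to grow linearly with the routing delay Rd: each row fits T(Rd) = T0 + h·Rd, where T0 is the Rd = 0 value and the slope h can be read as the number of routing segments on the critical path. That reading is our interpretation, not something the tables state. The sketch below checks it for Table 4.6 (the Table 4.7 rows pass the same check):

```python
# Design delays from Table 4.6 (Benchmark-1) at Rd = 0, 2D, 4D and 10D.
RD_UNITS = (0, 2, 4, 10)
TABLE_4_6 = {
    "Proasicplus": (24, 84, 144, 324),
    "Mod1": (22, 94, 166, 382),
    "Mod2": (28, 56, 84, 168),
    "Mod3": (20, 40, 60, 120),
}


def fit_linear(delays):
    """Return (T0, h) for T(Rd) = T0 + h * Rd using the first two columns."""
    t0 = delays[0]
    h = (delays[1] - delays[0]) / (RD_UNITS[1] - RD_UNITS[0])
    return t0, h


def is_linear(delays):
    """Check that every column lies on the line through the first two points."""
    t0, h = fit_linear(delays)
    return all(abs(t0 + h * rd - t) < 1e-9 for rd, t in zip(RD_UNITS, delays))


for arch, delays in TABLE_4_6.items():
    t0, h = fit_linear(delays)
    print(f"{arch:12s} T0 = {t0:3d}  slope h = {h:4.1f}  linear: {is_linear(delays)}")
```

Under this reading, Mod3's advantage at large Rd comes mostly from its small slope (10 versus 30 for the Proasicplus in Benchmark-1), that is, from fewer routed hops on the critical path.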
CHAPTER 5
CONCLUSIONS AND FUTURE WORK
5.1 Summary and Contributions
Currently, most logic blocks are either LUT based or multiplexer based. LUT
based logic blocks offer very high functionality but also need more area, which
grows exponentially with the number of inputs. Multiplexer based logic blocks
can be chosen as an alternative, but their functionality is not as high as that
of their competitors. However, these logic blocks are much more area efficient
and have much smaller delays through the block itself.
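The exponential area growth of LUTs follows from how a LUT works: a K-input LUT is a 2^K-bit memory addressed by its inputs, so it can implement any of the 2^(2^K) Boolean functions of K variables, which is also the source of its high functionality. A minimal Python sketch (function names are ours):

```python
from typing import Callable, List


def build_lut(func: Callable[..., int], k: int) -> List[int]:
    """Configuration bits of a k-input LUT implementing func.

    Entry i stores func at the input pattern whose binary value is i
    (the first input is the least significant address bit).
    """
    return [func(*((i >> j) & 1 for j in range(k))) & 1 for i in range(2 ** k)]


def lut_eval(bits: List[int], *inputs: int) -> int:
    """Evaluate the LUT: the inputs simply form the address of one stored bit."""
    address = sum((bit & 1) << j for j, bit in enumerate(inputs))
    return bits[address]


for k in (2, 3, 4, 5, 6):
    print(f"{k}-input LUT: {2 ** k} bits, {2 ** (2 ** k)} possible functions")

xor3 = build_lut(lambda a, b, c: a ^ b ^ c, 3)
print(xor3, lut_eval(xor3, 1, 1, 0))  # [0, 1, 1, 0, 1, 0, 0, 1] 0
```

Each added input doubles the storage, which is the area cost noted above; a multiplexer-based cell trades that generality for a much smaller circuit.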
Our study aimed to analyze the logic block architectures of various
commercial and academic FPGAs and to better understand the relative merits and
demerits of each cell. For this purpose, we modeled the architectures at the
RTL level using Aldec's Active-HDL and then implemented benchmarks on them. The
data collected revealed that, as the number of logic blocks required for a
given design increased, the multiplexer based logic cells performed much more
slowly than the look-up-table based logic cells. Among all the tested
architectures, the Xilinx Spartan II was the fastest and had the highest
functionality.
Our analysis indicated that the lack of speed in the fine grained cells was
directly related to their functionality. This was further tested with the Actel
Proasicplus logic cell. Three new architectures were proposed whose logical
abilities were better than those of the Proasicplus. When tested with the same
benchmarks, these new cells proved to be much faster and more area efficient
than the original Proasicplus. Extending our analysis to the transistor level,
these cells were laid out in MAGIC using the TSMC 0.18 µm technology file with
λ = 0.09 µm. It was then observed that, although all the new blocks were
faster, Mod3 achieved the best area-delay product.
5.2 Future Work
In the future, it would be interesting to extend the transistor level
analysis to estimate power. With the increasing need for low power, low energy
devices, an analysis of power consumption and dissipation could be very
fruitful for future programmable solutions.
Another possible research area is improving logic block architectures for
specific applications. For example, the DSP LM was designed for DSP related
applications. However, because its logic block lacks a memory element, this
architecture fails to impress when used to implement pipelined designs. Future
work could therefore concentrate on improving such architectures so that the
gap between FPGAs and ASICs is reduced, at least for some specific cases.
Finally, it would be beneficial to evaluate the logic cells using the latest
CAD tools. One of the major hurdles in this study was the lack of such tools.
In the future, other tools can be integrated with our synthesizable VHDL code
to perform tasks such as technology mapping, placement, and routing. This would
allow the architectures to be emulated more faithfully and yield more
meaningful results. CAD support would also help in implementing bigger and
more complex benchmarks for performance evaluation.
APPENDIX A
CONFIGURATION SCHEMATICS OF THE ACTEL PROASICPLUS LOGIC BLOCK
O = X & Y
Figure A1: 2-AND Gate using Actel's Proasicplus Logic Block
O = X + Y
Figure A2: 2-OR Gate using Actel's Proasicplus Logic Block
O = X xor Y
Figure A3: 2-XOR Gate using Actel's Proasicplus Logic Block
O = XYZ
Figure A4: 3-AND Gate using Actel's Proasicplus Logic Block
O = X'Y'Z'
Figure A5: 3-NOR Gate using Actel's Proasicplus Logic Block
Figure A6: D Flip-Flop using Actel's Proasicplus Logic Block
APPENDIX B
CONFIGURATION SCHEMATICS OF MOD1 ARCHITECTURE
O2 = X & Y
Figure B1: 2-AND Gate using Mod1 architecture
O2 = X + Y
Figure B2: 2-OR Gate using Mod1 architecture
O2 = X xor Y
Figure B3: 2-XOR Gate using Mod1 architecture
O2 = XYZ
Figure B4: 3-AND Gate using Mod1 architecture
O2 = X'Y'Z'
Figure B5: 3-NOR Gate using Mod1 architecture
O1 = (XYZ)'
Figure B6: 3-NAND Gate using Mod1 architecture
O1 = X + Y + Z
Figure B7: 3-OR Gate using Mod1 architecture
APPENDIX C
CONFIGURATION SCHEMATICS OF MOD2 LOGIC BLOCK
O2 = X & Y, O4 = P & Q
Figure C1: 2-AND Gate using Mod2 architecture
O2 = X + Y
Figure C2: 2-OR Gate using Mod2 architecture
O4 = P xor Q
Figure C3: 2-XOR Gate using Mod2 architecture
O2 = XYZ, O4 = PQR
Figure C4: 3-AND Gate using Mod2 architecture
O2 = (X + Y + Z)', O4 = (P + Q + R)'
Figure C5: 3-NOR Gate using Mod2 architecture
O1 = (XYZ)', O3 = (PQR)'
Figure C6: 3-NAND Gate using Mod2 architecture
O1 = X + Y + Z, O3 = P + Q + R
Figure C7: 3-OR Gate using Mod2 architecture
O4 = X xor Y xor Z
Figure C8: 3-XOR Gate using Mod2 architecture
O4 = WXYZ
Figure C9: 4-AND Gate using Mod2 architecture
O4 = (W + X + Y + Z)'
Figure C10: 4-NOR Gate using Mod2 architecture
O3 = (WXYZ)'
Figure C11: 4-NAND Gate using Mod2 architecture
O3 = W + X + Y + Z
Figure C12: 4-OR Gate using Mod2 architecture
O4 = PQRST
Figure C13: 5-AND Gate using Mod2 architecture
O4 = (P + Q + R + S + T)'
Figure C14: 5-NOR Gate using Mod2 architecture
O3 = (PQRST)'
Figure C15: 5-NAND Gate using Mod2 architecture
O3 = P + Q + R + S + T
Figure C16: 5-OR Gate using Mod2 architecture
O4 = Q
Figure C17: D Flip-Flop using Mod2 architecture
APPENDIX D
CONFIGURATION SCHEMATICS OF MOD3 LOGIC BLOCK
O2 = X & Y, O4 = P & Q
Figure D1: 2-AND Gate using Mod3 architecture
O2 = X + Y, O4 = P + Q
Figure D2: 2-OR Gate using Mod3 architecture
O2 = X xor Y, O4 = P xor Q
Figure D3: 2-XOR Gate using Mod3 architecture
O2 = XYZ, O4 = PQR
Figure D4: 3-AND Gate using Mod3 architecture
O2 = (X + Y + Z)', O4 = (P + Q + R)'
Figure D5: 3-NOR Gate using Mod3 architecture
O1 = (XYZ)', O3 = (PQR)'
Figure D6: 3-NAND Gate using Mod3 architecture
O1 = X + Y + Z, O3 = P + Q + R
Figure D7: 3-OR Gate using Mod3 architecture
O4 = X xor Y xor Z
Figure D8: 3-XOR Gate using Mod3 architecture
O4 = WXYZ
Figure D9: 4-AND Gate using Mod3 architecture
O4 = (W + X + Y + Z)'
Figure D10: 4-NOR Gate using Mod3 architecture
O3 = (WXYZ)'
Figure D11: 4-NAND Gate using Mod3 architecture
O3 = W + X + Y + Z
Figure D12: 4-OR Gate using Mod3 architecture
O4 = PQRST
Figure D13: 5-AND Gate using Mod3 architecture
O4 = (P + Q + R + S + T)'
Figure D14: 5-NOR Gate using Mod3 architecture
O3 = (PQRST)'
Figure D15: 5-NAND Gate using Mod3 architecture
O3 = P + Q + R + S + T
Figure D16: 5-OR Gate using Mod3 architecture
O4 = Q
Figure D17: D Flip-Flop using Mod3 architecture
BIBLIOGRAPHY
[1] W. Miller. “Real World Applications for Field Programmable Gate Array
Devices”, in the IEEE Microelectronics Conference, pages 548-551, 1994.
[2] A. Madanayake, L. Bruton and C. Comis. “FPGA Architectures for 2D/3D
FIR/IIR Plane Wave Filters”, in the IEEE Circuits and Systems, pages 613-616,
2004.
[3] V. Hopkin and B. Kirk. “FPGA Migration to ASICs”, in the IEEE
Microelectronics Conference, pages 268-271, 1995.
[4] W.A. Moreno, K. Poladia. “Field Programmable Gate Array Design for an
Application Specific Signal Processing Algorithms”, pages 222-225, 1998.
[5] S. Hauck. “The Role of FPGAs in Reprogrammable Systems”, in the
Proceedings of the IEEE, Volume 86, Issue 4, pages 615-638, April 1998.
[6] S. Brown, S. Francis, J. Rose and V. Vranesic. “Field Programmable Gate
Arrays”, Kluwer Academic Publishers, 1992.
[7] J. Rose, A.E. Gamal, A.S. Vincentelli. “Architecture of Field-Programmable
Gate Arrays”, in the IEEE Journal, July 1993.
[8] S. Brown and J. Rose. “FPGA and CPLD Architectures: A Tutorial”, in the
IEEE Design and Test of Computers, pages 42-57, Summer 1996.
[9] S. Brown. “FPGA Architectural Research: A Survey”, in the IEEE Design
and Test of Computers, Winter 1996.
[10] D. Bhatia. “Field Programmable Gate Arrays. A Cheaper Way of
Customizing Product Prototypes”, in the IEEE Potentials, Volume 13, Issue 1,
pages 16-19, 1994.
[11] V. Hopkin and B. Kirk. “FPGA Migration to ASICs”, in the IEEE
Microelectronics Conference, pages 268-271, 1995.
[12] S. Singh, P. Chow, J. Rose and D. Lewis. “The Effect of Logic Block
Architecture on FPGA Performance”, in the IEEE Journal of Solid State
Circuits, Volume 27, pages 281-287, March 1992.
[13] V. Betz, J. Rose and A. Marquardt. “Architecture and CAD for Deep
Sub-Micron Devices”, Kluwer Academic Publishers, New York, 1999.
[14] A. Marquardt, V. Betz and J. Rose. “Speed and Area Tradeoffs in Cluster
Based FPGAs”, in the IEEE Transactions on VLSI, Volume 8, pages 84-93,
February 2000.
[15] E. Ahmed. “The Effect of Logic Block Granularity on Deep-Submicron
FPGA Performance and Density”, Master's Thesis, University of Toronto,
2001.
[16] S. Trimberger. “Effect of Logic Block Architecture on FPGA Routing”, in
the IEEE Design Automation Conference, 1995.
[17] J. Rose, P. Chow, R. Francis and D. Lewis. “Architectures of Field
Programmable Gate Arrays: The Effect of Logic Block Functionality on Area
Efficiency”, in the IEEE Journal of Solid State Circuits, Volume 25, pages
1217-1225, October 1990.
[18] J.L. Kouloheris and A. El Gamal. “FPGA Performance vs. Cell
Granularity”, in the IEEE Custom Integrated Circuits Conference, 1991.
[19] D. Lewis, E. Ahmed, G. Baeckler, V. Betz, M. Bourgeault, D. Cashman,
D. Galloway, M. Hutton, C. Lane, A. Lee, P. Leventis, S. Marquardt, C.
McClintock, K. Padalia, B. Pedersen, G. Powell, B. Ratchev, S. Reddy, J.
Schleicher, K. Stevens, R. Yuan, R. Cliff and J. Rose. “The Stratix II Logic
and Routing Architecture”, in the ACM Symposium on FPGAs, pages 14-20,
February 2005.
[20] D. Lewis, V. Betz, D. Jefferson, A. Lee, C. Lane, P. Leventis, S.
Marquardt, C. McClintock, B. Pedersen, G. Powell, S. Reddy, C. Wysocki, R.
Cliff, and J. Rose. “The Stratix Routing and Logic Architecture”, in the ACM
Symposium on FPGAs, pages 15-20, February 2003.
[21] M. Pedram, B.S. Nobandegani, B.T. Preas. “Architecture and Routability
Analysis for Row-based FPGAs”, in the IEEE/ACM Conference for Computer-Aided
Design, November 1993.
[22] J.S. Rose, R.J. Francis, P. Chow, and D. Lewis. “The Effect of Logic Block
Complexity on Area of Programmable Gate Arrays”, in the IEEE Custom
Integrated Circuits Conference, San Diego, May 1989, pp. 5.3.1-5.3.5.
[23] V. Betz and J. Rose. “How Much Logic Should Go into an FPGA Logic
Block”, in the IEEE Design and Test Magazine, pages 10-15, 1998.
[24] D. Hill and N.S. Woo. “The Benefits of Flexibility in Look-up Table
FPGAs”.
[25] P. Chow, S. Seo, J. Rose, K. Chung, I. Rahardja and G. Paez. “The Design of SRAM-based Field Programmable Gate Arrays: Part I: Architecture”, in the IEEE Transactions on VLSI, Volume 7, pages 191-197, June 1999.
[26] P. Chow, S. Seo, J. Rose, K. Chung, I. Rahardja and G. Paez. “The Design of SRAM-based Field Programmable Gate Arrays: Part II: Circuit Design and Layout”, in the IEEE Transactions on VLSI, Volume 7, pages 321-330, September 1999.
[27] V. Betz and J. Rose. “Effect of Prefabricated Routing on FPGA Area Efficiency”, in the IEEE Transactions on VLSI, Volume 6, September 1998.
[28] S. Singh, J. Rose, D. Lewis, K. Chung and P. Chow. “Optimization of Field-Programmable Gate Array Logic Block Architecture for Speed”, in the IEEE Custom Integrated Circuits Conference, May 1990, pp. 6.1.1-6.1.6.
[29] A. Marquardt, V. Betz and J. Rose. “Using Cluster-Based Logic Blocks and Timing-Driven Packing to Improve FPGA Speed and Density”, in the ACM Symposium on FPGAs, pp. 37-46.
[30] V. Betz and J. Rose. “Cluster-Based Logic Blocks for FPGAs: Area-Efficiency vs. Input Sharing and Size”, in the IEEE Custom Integrated Circuits Conference, Santa Clara, CA, 1997, pp. 551-554.
[31] J. He and J. Rose. “Advantages of Heterogeneous Logic Block Architectures for FPGAs”, in the IEEE Custom Integrated Circuits Conference, San Diego, May 1993, pp. 7.4.1-7.4.5.
[32] M. Agarwala and P.T. Balsara. “An Architecture for a DSP Field Programmable Gate Array”, in the IEEE Transactions on VLSI, pages 136-141, March 1995.
[33] Hanho Lee and Gerald E. Sobelman. “Digit-Serial Reconfigurable FPGA Logic Block Architecture”.
[34] V. George, H. Zhang and J. Rabaey. “The Design of Low Energy FPGAs”.
[35] Actel Datasheets
[36] Altera Datasheets
[37] QuickLogic Datasheets
[38] Xilinx Datasheets
[39] J. Rabaey, A. Chandrakasan and B. Nikolic. “Digital Integrated Circuits: A Design Perspective”. Prentice Hall Electronics and VLSI Series.
[40] N. Weste and K. Eshraghian. “Principles of CMOS VLSI Design”.
VITA
Graduate College
University of Nevada, Las Vegas
Rohith Ramnath
Home Address:
1600, E. Rochelle Ave., Apt 42 
Las Vegas, NV-89119.
Degrees:
Bachelor of Engineering, Electronics and Communication Engineering, 
2001, University of Mysore, India
Thesis Title:
Analysis of logic block architectures and functional improvement of fine grained cells.
Thesis Examination Committee:
Chairperson, Dr. Yingtao Jiang, Ph. D.
Committee Member, Dr. Emma Regentova, Ph. D.
Committee Member, Dr. Venkatesan Muthukumar, Ph. D.
Graduate College Representative, Dr. Ajit Roy, Ph. D.
