Performance and area optimization for reliable FPGA-based shifter design by Syed, Zahid Ali
Graduate Theses, Dissertations, and Problem Reports 
2008 
Performance and area optimization for reliable FPGA-based 
shifter design 
Zahid Ali Syed 
West Virginia University 
Recommended Citation 
Syed, Zahid Ali, "Performance and area optimization for reliable FPGA-based shifter design" (2008). 
Graduate Theses, Dissertations, and Problem Reports. 1963. 
https://researchrepository.wvu.edu/etd/1963 
Performance and Area Optimization for 
Reliable FPGA-based Shifter Design 
 
 
by 
 
 
Zahid Ali Syed 
 
 
Thesis submitted to the 
College of Engineering and Mineral Resources 
at West Virginia University 
in partial fulfillment of the requirements 
for the degree of 
 
 
Master of Science 
in 
Electrical Engineering 
 
 
Afzel Noore, Ph.D., Chair 
James D. Mooney, Ph.D. 
Hany H. Ammar, Ph.D. 
 
Lane Department of Computer Science and Electrical Engineering 
 
 
Morgantown, West Virginia 
2008 
 
 
Keywords: FPGA, Reliable Shifter Design, Optimization, Assertion Based 
Verification, Synthesis, Open Verification Library, Embedded Logic 
 
 
© Zahid Syed, 2008 
  
 Abstract 
 
Performance and Area Optimization for Reliable FPGA-based Shifter Design 
 
by 
Zahid Ali Syed 
Master of Science in Electrical Engineering 
West Virginia University 
Afzel Noore, Ph.D., Chair 
 
This thesis addresses the problem of implementing reliable FPGA-based shifters. An 
FPGA-based design requires optimization between performance and resource utilization, 
and an effective verification methodology to validate design behavior. The FPGA-based 
implementation of a large shifter design is restricted by an I/O resource bottleneck. The 
verification of the design behavior presents a further challenge due to the ‘black-box’ 
nature of FPGAs. To tackle these design challenges, we propose a novel approach to 
implement FPGA-based shifters. The proposed design alleviates the I/O bottleneck while 
significantly reducing the logic resources required. This is achieved with a minimal 
increase in the design delay. The design is seamlessly scalable to a multi-FPGA chip 
setup to improve performance or to implement larger shifters. It is configured using 
assertion checkers for efficient design verification. The assertion-based design is further 
optimized to alleviate the performance degradation caused by the assertion checkers. 
  
 
 
 
Acknowledgements 
 
My stay at West Virginia University has been a period of constant learning. My teachers, 
friends and family have been a part of my graduate education and I would like to take 
this opportunity to express my gratitude towards them.  
It has been an honor to have had Dr. Noore as my mentor and research advisor. I have 
much to thank him for, from his untiring encouragement to his objective criticism. I am 
obliged to him for having tolerated my constant stream of questions, and unscheduled 
interruptions for advice. His support as a friend and guide has made me a better person 
in many aspects of life. 
I would also like to thank Dr. Ammar and Dr. Mooney, my committee members, for their 
suggestions and guidance. 
My thanks go out to my friends for their support and making my stay at WVU 
memorable: Ajit, Ashwin, Harry, Mangu, Nitin, Prasad, Praveen, Verma, Vijith, and the 
rest. You know who you are. 
My family has been a pillar of support throughout my life. My parents and siblings have 
always offered the moral support necessary to perform well and I am indebted to them 
for standing by me in times of need. My wife has been the chief driving force behind the 
completion of this thesis. 
And finally, I thank my cousin and roommate Ahad. Our stay at WVU has been fruitful in 
great part through our mutual support. There is little by way of words that might express 
my gratitude for this.   
 
 
 
Table of Contents 
 
Acknowledgements
List of Figures
List of Tables
1 Introduction, Problem Statement and Methodology
1.1 FPGAs and FPGA programming
1.2 Assertion-based verification
1.3 Problem statement
1.4 Thesis organization
2 Literature survey
2.1 Shifter design
2.2 Assertion-based verification
3 Field Programmable Gate Arrays
3.1 Introduction
3.2 Uses
3.3 Manufacturers
3.4 The Spartan-3A series
3.5 FPGA programming and design approach
3.6 HDL modeling
4 Shifter Design and FPGA-based Shifter Limitations
4.1 Common shifter designs
4.2 FPGA-based shifter limitations
5 Proposed FPGA-based shifter design
5.1 Proposed design of an embedded auxiliary shifter
5.2 Sequential-logic level processing
5.3 Embedded auxiliary shifter and shift-control partitioning
5.4 Combinational-logic level processing
5.5 Index updating
5.6 Design optimization
5.7 Extension to multi-chip FPGA design
6 Implementation of Shifter Designs in HDL
6.1 Barrel shifter
6.2 Logarithmic shifter
6.3 Shift register
6.4 Proposed auxiliary shifter design
6.4.1 Sequential_logic
6.4.2 Combinational_logic
6.5 Experimental results
7 The VHDL Open Verification Library
7.1 Controllability
7.2 Observability
7.3 Challenges during verification
7.4 Assertion-based verification
7.5 Assertion checker standards
7.6 Current assertion standards
7.6.1 Open Verification Library (OVL)
7.6.2 X/Z Check in OVL checkers
7.7 Performance characteristics of the VHDL OVL checkers
7.8 Assertions in the proposed shifter design
7.9 Estimated resource overhead and performance reduction
8 Alleviating Performance Overhead Due to Assertions
8.1 Partitioning and component removal in Xilinx ISE
8.1.1 Routing
8.1.2 Placement
8.1.3 Synthesis
8.1.4 Inherit
8.2 Using Xilinx Partitions in the design under study
8.3 Experimental results and analysis
8.3.1 Assertion-based design
8.3.2 Re-configured assertion-free design
9 Conclusion and Future Work
9.1 Conclusion
9.2 Future work
References
Appendix
Appendix A
A.1 The VHDL OVL library
A.1.1 The ovl_ctrl_record type
A.1.2 Synthesizing the VHDL OVL library
Appendix B
 
  
  
 
 
 
List of Figures 
3.1 EDA Design Flow
4.1 Design of a 4-bit Barrel Switch
4.2 Design and working of a 4-bit barrel shifter
4.3 Design of an 8-bit logarithmic shifter
4.4 Design of an 8-bit logarithmic shifter/rotator
5.1 Design of a 2048-bit shifter using 128-bit embedded logic
5.2 Control flow of the combinational logic
6.1 HDL design of a barrel shifter
6.2 HDL Design of a logarithmic shifter
6.3 HDL hierarchy of the auxiliary shifter
 
 
 
List of Tables 
3.1 Characteristics of the Spartan-3 Series
3.2 Characteristics of the Spartan-3A family
6.1 Comparing performance metrics of the proposed FPGA approach with existing shifter designs
7.1 Currently available OVL Checkers
7.2 OVL checker characteristics for the 64-bit embedded logic
7.3 OVL checker characteristics for the 128-bit embedded logic
7.4 Estimated performance characteristics of an assertion-based shifter design
8.1 Performance characteristics of various assertion-based designs
8.2 Performance degradation in an assertion-based design compared to an assertion-free design
8.3 Percentage error in estimating the performance metrics of an assertion-based design
8.4 Performance gain through assertion removal
A.1 Description of the VHDL OVL checkers
 
 
 
 
Chapter 1 
Introduction, Problem Statement and 
Methodology 
 
Data shifting is an important operation in arithmetic processors and in array processors 
used for graphics and video processing applications. The operation is performed through 
shifters implemented using combinational logic or sequential logic. Each design 
approach has specific advantages that are useful in different applications. Barrel shifters 
and logarithmic shifters are combinational-logic based shifters that perform large shifts 
in a single clock cycle and are used in applications where speed is of primary importance.  
Sequential shift registers perform a single-bit shift every clock cycle but utilize fewer 
resources than combinational-logic based shifters. The synchronous design of the shift 
register also eliminates the possibility of the occurrence of a race condition in the circuit. 
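The trade-off between the two shifter classes can be illustrated with a small behavioral model. The sketch below is written in Python purely for illustration and is not the HDL used in this thesis; the function names are hypothetical, the width is assumed to be a power of two, and the shift amount is assumed to be smaller than the width.

```python
def logarithmic_shift_left(value, amount, width):
    """Combinational model: one multiplexer stage per bit of the shift amount."""
    mask = (1 << width) - 1
    stages = (width - 1).bit_length()  # log2(width) stages for power-of-two widths
    for k in range(stages):
        if (amount >> k) & 1:          # stage k shifts by 2**k when its select bit is set
            value = (value << (1 << k)) & mask
    return value


def shift_register_left(value, amount, width):
    """Sequential model: one single-bit shift per clock cycle."""
    mask = (1 << width) - 1
    cycles = 0
    for _ in range(amount):
        value = (value << 1) & mask
        cycles += 1
    return value, cycles


if __name__ == "__main__":
    print(bin(logarithmic_shift_left(0b11, 5, 8)))  # all stages evaluated in one pass
    print(shift_register_left(0b11, 5, 8))          # same result, but after 5 cycles
```

Both models produce the same shifted word; the difference is that the logarithmic model evaluates a fixed number of stages regardless of the shift amount, while the shift register model needs as many iterations as the shift amount.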
1.1 FPGAs and FPGA programming 
Field Programmable Gate Arrays (FPGAs) are reconfigurable logic devices that can be modified 
in the field, unlike hard-wired processors. This makes FPGAs suitable for applications 
where the function of the chip may vary. Thus, the vendor manufactures the chip and the 
buyer configures it as needed. FPGAs are used in digital signal processing (DSP), 
software-defined radio, aerospace and defense systems, ASIC prototyping, medical 
imaging, computer vision, speech recognition, cryptography, bioinformatics, computer 
hardware emulation and a growing range of other areas.  
The logic for an FPGA-based design is commonly described using Hardware 
Description Language (HDL) tools. An HDL is used to formally describe the design and 
operation of electronic circuits, and to verify their operation by means of simulation. 
Examples of HDLs include SystemC, Verilog, VHDL, and SystemVerilog. 
HDLs are thus used to create a design description for the required logic. The HDL 
design is then simulated and verified for proper functioning. The verification process is 
performed through various verification methodologies. The design is then synthesized 
for implementation on an FPGA using FPGA-design tools. 
1.2 Assertion-based verification 
Verification of a hardware design has become the biggest bottleneck in the time-to-
market of an FPGA design. Due to the increasing complexity of modern integrated 
circuits, verification requires up to 70% of the design time [14]. Any bug that is detected 
and fixed in a design may lead to further complications that have to be detected and fixed 
yet again. Due to this, there is a need for an efficient verification system that detects 
faults early in the design phase. 
Assertion Based Verification (ABV) is a new verification methodology that offers 
several advantages over traditional verification. Assertions are, at the most basic level, 
logical expression checkers that return a binary value. In ABV, assertion checks are 
inserted at strategic points in the code. This enables the design engineer to gain 
additional information on the state of the system during the simulation or verification 
process. Any internal error that occurs in the system is conveyed to the design engineer 
by these assertion monitors. In ABV methodology, the assertion checkers used to 
monitor the system extend beyond simple logical expression checking monitors. For 
example, a checker may be assigned to monitor whether a signal remains stable for a 
specified period of time. Another checker may be used to verify that a consequent 
expression holds whenever the antecedent expression holds.  
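The two kinds of checker mentioned above can be sketched over recorded signal traces. This is a minimal Python illustration, assuming each signal is sampled once per clock cycle into a list; the function names are hypothetical and do not correspond to the interface of any assertion library.

```python
def check_stable(trace, start, cycles):
    """Return True if the signal keeps one value for `cycles` samples from `start`."""
    window = trace[start:start + cycles]
    return len(window) == cycles and all(sample == window[0] for sample in window)


def check_implication(antecedent, consequent):
    """Return the cycle indices where the antecedent holds but the consequent does not."""
    return [
        cycle
        for cycle, (a, c) in enumerate(zip(antecedent, consequent))
        if a and not c
    ]


if __name__ == "__main__":
    print(check_stable([0, 1, 1, 1, 0], start=1, cycles=3))
    print(check_implication([0, 1, 1, 0], [0, 1, 0, 1]))
```

An empty list from the implication check means no violation was observed in the trace; a non-empty list points the design engineer to the failing cycles.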
 The use of assertion checkers, however, leads to a resource overhead and a 
performance lag based on the number and size of the checkers implemented. This 
performance drawback has to be considered when implementing ABV.  
1.3 Problem statement 
An FPGA has limited I/O and logic resources that need to be strategically utilized when 
implementing a design. As the size of an FPGA-based design increases, it reaches a 
design bottleneck imposed by either the I/O pin count or the available logic resources. An FPGA-based 
shifter is an I/O intensive design. Thus, the I/O limitations of an FPGA restrict the 
design of large FPGA-based shifters. 
This thesis explores the use of a novel algorithm to alleviate this bottleneck. The 
algorithm should be synthesizable into an FPGA-based design. The design should be 
optimized for an FPGA and scalable to a multiple-FPGA chip setup to improve 
performance or to implement a larger shifter.  
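The I/O bottleneck described above can be made concrete with a rough pin budget. The Python sketch below assumes a simple pin model that is not taken from this thesis: a fully parallel n-bit shifter needs n data inputs, n data outputs, and enough control lines to encode any shift amount below n, with clock, direction, and power pins ignored. The I/O pin counts are those listed for the Spartan-3A family in Table 3.2.

```python
# I/O pin counts taken from Table 3.2 (Spartan-3A family).
SPARTAN_3A_IO_PINS = {
    "XC3S50A": 144,
    "XC3S200A": 248,
    "XC3S400A": 311,
    "XC3S700A": 372,
    "XC3S1400A": 502,
}


def shifter_pins(width):
    """Pin demand under the assumed model: data in, data out, shift-amount control."""
    control = (width - 1).bit_length()  # lines needed to encode amounts 0..width-1
    return 2 * width + control


def widest_parallel_shifter(io_pins):
    """Largest power-of-two width whose assumed pin demand fits in `io_pins`."""
    width = 1
    while shifter_pins(2 * width) <= io_pins:
        width *= 2
    return width


if __name__ == "__main__":
    for device, pins in SPARTAN_3A_IO_PINS.items():
        print(device, pins, widest_parallel_shifter(pins))
    print("2048-bit shifter:", shifter_pins(2048), "pins")
```

Under this model a 2048-bit fully parallel shifter would need 4107 pins, while the largest device in Table 3.2 offers 502, so the data cannot all be presented to a single chip at once.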
The use of assertion checkers within the shifter design is also explored. The thesis 
utilizes assertion checkers from the Open Verification Library (OVL). The performance 
metrics of individual OVL checkers are studied and tabulated in a datasheet for reference 
and further research. 
Furthermore, a novel approach that reduces the overhead caused by assertion 
modules in an FPGA-based design is explored. This approach should have the 
advantages of an assertion based design with performance characteristics comparable to 
a conventional assertion-free design. 
1.4 Thesis organization 
Chapter 2 reviews the research previously undertaken in shifter design and assertion 
based verification. Chapter 3 provides a brief overview of FPGAs and FPGA 
programming. The characteristics of the FPGA family used in this thesis are described 
and an overview of the steps for implementing FPGA-based designs is discussed. The 
various types of hardware modeling flows are then enumerated. 
Chapter 4 provides an understanding of the various shifter designs currently used 
and the design bottlenecks encountered when implementing FPGA-based shifters. 
Chapter 5 proposes a novel shifter design that alleviates FPGA-based design bottlenecks. 
The HDL structure and design of the common shifters and the proposed design are 
elaborated in Chapter 6 and the performance metrics of the implemented HDL designs 
are tabulated. Chapter 7 discusses the challenges during design verification and explains 
the use of the assertion-based verification methodology. The OVL assertion checkers that 
are inserted in the proposed shifter design are described.  Each VHDL OVL checker is 
synthesized individually to obtain a performance datasheet for the checkers. The 
generated data is used to estimate the performance overhead in an assertion-based 
shifter design. Chapter 8 proposes and implements a novel method to alleviate the 
resource overhead and delay due to assertions. The performance of the assertion-based 
design is compared against the performance metrics estimated in Chapter 7. 
 
 
 
 
Chapter 2 
Literature survey 
 
2.1 Shifter design 
The barrel shifter first proposed by Lim in 1972 [1] uses a single stage to shift the input 
data. This results in a complicated design that increases power consumption. A 
logarithmic shifter design alleviates the power dissipation problem [3] and also occupies 
less chip area [2]. Since the shift operation forms an integral component of any 
arithmetic logic unit, researchers have explored several variations of the barrel shifter 
and logarithmic shifter design to suit their application needs.  
Tharakan and Kang [4] presented a variation of a 32-bit logarithmic shifter that 
uses ternary logic. The proposed design shifts left or right by 1, 3 and 9 bits, in contrast to 
a conventional binary design whose stages shift by powers of two. This 
combination is used to perform a left or right shift of any magnitude less than 32. The 
resulting CMOS logic design demonstrates a 30% increase in speed but increases the 
required chip area by 17%. 
Yih et al. proposed a variation of the barrel shifter for use in a CORDIC computer 
[5]. The primary objective of the design is to reduce the chip area. A 2-level barrel shifter 
is proposed that processes the data in 2 stages connected in series. The proposed design 
reduces the size of the CORDIC shifter by 68%.  
Beerel et al. present a low power asynchronous barrel shifter design for use in 
multimedia applications [6]. The objective of the design is to reduce its power 
consumption during MPEG video processing. A distribution chart for a sample set of 
video files showed that 85% of the codewords were of length 6 bits or less. The barrel 
shifter was split in two levels to optimize for this codeword length. The shifter utilizes 
only the first level if the codeword length is 6 bits or less. The two levels are used 
simultaneously for a larger codeword. The design results in a 61% reduction in the 
average energy consumed. 
Finally, the design of common shifters is summarized by Pillmeier [7]. The delay 
and chip area characteristics of the designs are then compared. 
2.2 Assertion-based verification 
Assertion Based Verification (ABV) is used in both software and hardware verification 
and has thus been promoted by researchers in both fields. The assertion-based 
methodology for hardware verification was originally developed by Harry Foster [8]. The 
advantages of ABV in reducing the design verification time are discussed in [8] [9]. Meyer 
[10] addresses the use of assertions in Object Oriented Programming (OOP). The paper 
suggests the use of assertion checks at points where the sub-program is called by the 
main program. Failure to comply with the assertion returns control to another sub-program 
that handles such special cases in an assertion checker format. Reference [11] describes 
the use of ABV in Systems on Chip (SoCs). Each SoC component presents unique 
verification problems. For example, resource allocation requires proper semaphore logic 
to lock resources and prevent conflict. To detect possible conflict scenarios, the 
behavioral assumptions are modeled using assertions. Assertions placed at design 
interfaces further help gather information on the data being transferred. This assists in 
designing systems linked through an interface. 
Reference [12] provides an empirical analysis of assertion use. It proposes the use of assertions 
only at the critical points in the design to keep the overhead and speed reduction 
minimal. Riazati et al. have synthesized assertions for a basic CPU design for detection 
of stuck-at faults and presented the overhead resources needed for their implementation  
[13]. The effectiveness of an assertion checker is measured through a performance metric 
that is proportional to the area overhead and fault coverage of the corresponding 
checker. 
 
 
 
Chapter 3 
Field Programmable Gate Arrays 
 
3.1 Introduction 
A field-programmable gate array (FPGA) is a semiconductor device containing 
programmable logic blocks and programmable interconnects. The logic blocks can be 
programmed to perform the function of basic logic gates or more complex combinational 
functions such as decoders or simple mathematical functions. 
FPGAs are usually slower than their Application-Specific Integrated Circuit 
(ASIC) counterparts. They cannot handle as complex a design, and they draw more power. 
However, FPGAs have a shorter time to market and can be re-programmed in the field, 
which lowers non-recurring engineering costs. As FPGA size, capabilities and speed have 
increased, FPGAs have come to implement ever larger functions, and some are now 
marketed as a full System on Chip (SoC) [16]. 
3.2 Applications 
FPGAs are increasingly being used in conventional High Performance Computing 
applications where computational kernels such as FFT or Convolution are performed on 
the FPGA instead of a microprocessor. The use of FPGAs for computing tasks where 
time-intensive tasks are offloaded from software to FPGAs is known as reconfigurable 
computing [26]. The current (2007) generation of FPGAs can implement ~100 
single-precision floating-point units, with each unit computing a result every clock cycle. The 
flexibility of the FPGA allows for even higher performance by trading precision for an 
increased number of parallel arithmetic units [17]. 
The adoption of FPGAs in high performance computing is currently limited by 
the complexity of FPGA design compared to conventional software and the extremely 
long turn-around times of current design tools. Even minor changes to the source code of 
an FPGA design require a wait of 4 to 8 hours. 
3.3 Manufacturers 
The main manufacturers of general-purpose FPGAs are Xilinx [27] and Altera [28]. 
Other manufacturers of special-purpose FPGAs include Lattice Semiconductor [29], 
Actel [30], and Atmel [31]. 
Table 3.1: Characteristics of the Spartan-3 Series 
Spartan-3 devices    Logic Cells       Usage
Spartan-3            1700 to 75000     For high density and high pin count applications
Spartan-3A           1500 to 25000     I/O optimized
Spartan-3AN          1500 to 25000     Non volatile memory packaged with a 3A
Spartan-3E           2000 to 33000     Logic optimized
Spartan-3A DSP       37000 to 54000    DSP optimized
 
Xilinx is the leading FPGA manufacturer with device families for glue logic 
(CoolRunner, CoolRunner II), low-cost (Spartan) and high-end (Virtex) applications. 
The Virtex-II, Virtex-4 and Virtex-5 FPGA families are used to design System-on-Chip 
(SoC) as they can be embedded with two IBM PowerPC cores  [26]. The Spartan series 
FPGAs are slower than the corresponding Virtex FPGAs and do not support embedded 
PowerPC cores. However, as a low-cost FPGA family, the Spartan series offers an economical alternative. 
The current generation of Spartan-3 FPGAs is fabricated using 90nm technology. The 
FPGA families within the series are grouped according to the general use as shown in 
Table 3.1. 
3.4 The Spartan-3A series 
The Spartan-3A series combines low logic density with a comparatively high I/O pin count. This 
configuration suits shifter designs, which are I/O intensive. The 
Spartan-3A family comprises the FPGAs detailed in Table 3.2. The research work in 
this thesis is performed using the Xilinx Spartan-3A family of FPGAs.  
Table 3.2: Characteristics of the Spartan-3A family  
Device       Package   I/O pins   Slices
XC3S50A      FT256     144        704
XC3S200A     FG320     248        1792
XC3S400A     FG400     311        3584
XC3S700A     FG484     372        5888
XC3S1400A    FG676     502        11264
 
3.5 FPGA programming and design approach 
The design of an integrated circuit is typically performed using Electronic Design 
Automation (EDA) tools. The general flow of an EDA process is shown in Figure 3.1. 
Initially, the functional objectives of the chip are decided. The architecture of the circuit 
is implemented at Register Transfer Level (RTL). For ASICs and FPGAs, this is 
performed using Hardware Description Languages (HDL). The hardware description is 
then verified for proper behavior and functioning. Any corner cases or behavioral faults 
are eliminated at this stage through functional verification, assertion based verification, 
simulation etc. 
The RTL design of the chip is then used to create a floorplan of the design. The 
input/output (I/O) pins are assigned and objects such as arrays and cores are placed. 
The logic synthesis process generates a design netlist from the HDL design to describe 
the connectivity of the design. The netlist is then used to perform a Place-And-Route 
(PAR). The placement process assigns exact locations for design components. An inferior 
placement assignment affects the performance of the design and may make it impossible 
to manufacture the design due to exhaustion of routing resources. The routing process 
adds the wiring to connect the gates in the netlist. The FPGA is further optimized to meet 
timing requirements or eliminate any violations such as noise and yield  [18] [19]. 
Figure 3.1: EDA Design Flow 
FPGA-design tools are of two types: the proprietary software from the FPGA 
manufacturer or third-party software. The third-party software tends to be expensive 
while offering better technical support. The FPGA manufacturer’s proprietary software is 
usually available for free to promote the corresponding products. However, the netlists 
generated by a proprietary tool can be used only with that tool. 
Xilinx offers a proprietary EDA tool named Xilinx ISE, while Altera uses its Quartus II 
software. The Quartus II software features include implementation, device fitting, and 
JTAG programming solutions. The Xilinx ISE offers HDL synthesis and simulation in 
addition to the features available in Quartus II. 
Since the thesis research is being performed on the Xilinx Spartan-3A family, the 
proprietary FPGA tools for it are needed. The Xilinx ISE 9.2i Webpack is thus used for 
performing the synthesis and implementation of the HDL designs in the remainder of 
the thesis. 
3.6 HDL modeling 
The three distinct types of modeling used in VHDL are structural modeling, dataflow 
modeling and behavioral modeling [15]. Structural modeling describes the design as a set 
of interconnected components without a description of the component behavior. This 
model thus represents the hierarchy of the design accurately but lacks the description of 
the design behavior. A structural model is implemented through component 
instantiation within the parent entity. The component instances are connected to parent 
components through their ports to form an interconnected design.  
The dataflow modeling system utilizes concurrent signal assignment statements 
to describe the flow of the data. The design structure is not explicitly stated in the 
dataflow model, but can be deduced from the flow of the data through the model. 
The behavioral or functional modeling process represents the system as a set of 
sequential functions. This modeling system does not describe the design hierarchy of the 
system in any way. The behavioral description simply gives the relationship between the 
input and output using functions without any indication of the design structure. 
A typical hardware design utilizes two or more types of modeling descriptions. 
For example, the structural model may be used to describe the hierarchy of the system 
while the behavior of each sub-system is defined using either dataflow or behavioral 
modeling. 
 
 
 
Chapter 4 
Shifter Design and FPGA-based Shifter 
Limitations 
 
4.1 Common shifter designs 
A shift register requires as many clock cycles as the magnitude of the shift to be 
performed. In comparison, a barrel switch [1] operates by shifting the data completely in 
a single clock cycle. This characteristic of the barrel shifter results in high speed 
performance required by high-speed processors and graphic and video processing units. 
The implementation of a barrel shifter in these applications considerably improves the 
speed of the device. 
Figure 4.1: Design of a 4-bit barrel switch 
A basic barrel shifter consists of logic switches arranged as shown in Figure 4.1. 
The input A to the barrel switch is transmitted to the output through the switches. The 
switches are controlled by the shift control X, which determines the path of the data 
through the barrel switch. The state of the switches determines the flow of data through 
the design to generate the output B. 
The barrel shifter was first introduced in modern devices in the Intel 80386 
processor in 1986. It is implemented in modern devices through multiplexers as 
illustrated for a 4-bit barrel shifter in Figure 4.2. An n-bit barrel switch utilizes n 
multiplexers, each with n inputs. 
 
Figure 4.2: Design and working of a 4-bit barrel shifter 
Each multiplexer has the same inputs with the arrangement of the input data 
varying by one position for every multiplexer in the array. The shift control determines 
the input bit that appears at the output of each multiplexer. This is illustrated in Figure 
4.2 where the 4-bit input data is rotated by 3 bits. Each multiplexer has the same input 
data A and shift control X. However, the input data to each multiplexer is incrementally 
rotated by a single bit. The output B of the barrel shifter is thus rotated by the desired 
amount. 
A typical barrel shifter thus has an n-bit data input, an n-bit data output, and a 
$\log_2 n$-bit shift control to determine the magnitude of the shift. 
The design of a barrel shifter results in high speed performance but also presents 
numerous problems. The number of input lines to the barrel shifter design is $n^2$. As the 
design size increases, the complexity and chip area of the design therefore grow 
quadratically. The large number of inputs also causes a loading effect at the input source. 
These effects make the design of large barrel shifters infeasible in real-time applications. 
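The multiplexer arrangement described above can be sketched as a small behavioral model in Python. This is only an illustration: it assumes that index 0 is the least-significant bit and that the rotation is toward lower indices (a right rotate), and the names `barrel_rotate_right` and `input_lines` are hypothetical rather than taken from the thesis.

```python
def barrel_rotate_right(a_bits, x):
    """Model an n-bit barrel rotator built from n multiplexers with n inputs each.

    a_bits[i] is bit i of the input A (index 0 = least-significant bit, an
    assumption of this sketch). Multiplexer k sees the input word rotated by
    k positions, and the shared shift control x selects one of its n inputs.
    """
    n = len(a_bits)
    out = []
    for k in range(n):
        # Inputs of multiplexer k: the data word rotated by k positions.
        mux_inputs = [a_bits[(k + sel) % n] for sel in range(n)]
        out.append(mux_inputs[x])  # the shift control picks one input
    return out


def input_lines(n):
    """Total multiplexer input lines: n multiplexers with n inputs each."""
    return n * n


a = [1, 0, 0, 0]                   # A = 0001, listed least-significant bit first
print(barrel_rotate_right(a, 3))   # [0, 1, 0, 0], i.e. 0010
print(input_lines(4), input_lines(128))
```

The second function makes the growth in input lines concrete: 16 lines for a 4-bit design against 16384 for a 128-bit design.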
The drawbacks of the barrel shifter design are addressed in the logarithmic 
shifter design. Unlike a barrel shifter, the log shifter shifts the data in stages. Each stage 
shifts its input by an integer power of two. Thus, an 8-bit log shifter is composed of 3 
stages as shown in Figure 4.3. The stages shift the data by 1, 2 and 4 bits using 2-input 
multiplexers arrayed in the stages. 
Figure 4.3: Design of an 8-bit logarithmic shifter 
 
 
 
The log shifter design results in reduced power consumption [3] and utilizes less 
chip area [2] compared to a barrel shifter. However, the increased number of stages 
leads to a slight reduction in the speed of the log shifter compared to a barrel shifter [2].  
The log shifter can be modified to include rotate functionality. 
This is performed by connecting $A_0$ as an input to the multiplexer that has $A_7$ as an 
input. The rotate and shift functionality can thus be combined as shown in Figure 4.4. 
 
Figure 4.4: Design of an 8-bit logarithmic shifter/rotator 
A single multiplexer selects between ‘0’ for logical right shifting and $A_{n-1}$ for 
arithmetic right shifting. In the stage controlled by $X_k$, $2^k$ multiplexers select between ‘0’ 
(or $A_{n-1}$) for shifting and the $2^k$ lower bits of the data for rotating [2]. 
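The staged behavior of the logarithmic shifter/rotator can be sketched in Python as follows. This is a behavioral model under stated assumptions, not the circuit of Figure 4.4: the word is held in an integer, n is a power of two, and the function name `log_shift_right` and its `mode` strings are illustrative.

```python
def log_shift_right(a, x, n=8, mode="logical"):
    """Right shift/rotate an n-bit word in log2(n) stages.

    Stage k is controlled by bit k of the shift control x and moves the data
    by 2**k positions. mode is "logical", "arithmetic" or "rotate".
    """
    mask = (1 << n) - 1
    a &= mask
    sign = (a >> (n - 1)) & 1            # A_{n-1}, used for arithmetic shifts
    stages = n.bit_length() - 1          # log2(n) when n is a power of two
    for k in range(stages):
        if not (x >> k) & 1:
            continue                     # this stage passes the data through
        step = 1 << k
        low = a & ((1 << step) - 1)      # the 2**k lower bits leaving the word
        a >>= step
        if mode == "rotate":
            fill = low                   # wrap the lower bits around
        elif mode == "arithmetic" and sign:
            fill = (1 << step) - 1       # replicate A_{n-1}
        else:
            fill = 0                     # logical shift fills with '0'
        a |= fill << (n - step)
    return a


word = 0b10010110
print(format(log_shift_right(word, 3), "08b"))                # 00010010
print(format(log_shift_right(word, 3, 8, "arithmetic"), "08b"))  # 11110010
print(format(log_shift_right(word, 3, 8, "rotate"), "08b"))      # 11010010
```

A shift by 3 activates the 1-bit and 2-bit stages and bypasses the 4-bit stage, which is the behavior the stage description implies.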
 
 
 
4.2 FPGA-based shifter limitations 
An n-bit log shifter/rotator design described in Section 4.1.2 utilizes an n-bit input, a 
$\log_2 n$-bit shift control, a 2-bit opcode and an n-bit output. The opcode determines the 
type of shift/rotate to be performed. The total number of I/O pins r utilized by the shifter 
when implemented on an FPGA is, 
 $r = 2n + \log_2 n + 2$ (4.1) 
Since a shifter design is I/O intensive, the number of FPGA I/O pins available is 
the common bottleneck for an FPGA-based shifter design. The largest FPGA in the 
Spartan-3A family used in this thesis is the XC3S1400A with 502 I/O pins as described 
in Section 3.4. Thus, the largest shifter that the Spartan-3A family can implement has a 
128-bit input. Using a conventional design, a shifter of larger size cannot be 
implemented on the Spartan-3A FPGAs. To overcome this bottleneck, a novel algorithm 
is proposed for implementing FPGA-based shifters. 
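The pin-count argument can be checked with a few lines of Python. The sketch below assumes shifter sizes that are powers of two; the function names are illustrative only.

```python
def conventional_pins(n):
    """I/O pins per Eq. 4.1: n-bit input, n-bit output, log2(n)-bit control, 2-bit opcode."""
    return 2 * n + (n.bit_length() - 1) + 2


def largest_conventional_shifter(io_pins):
    """Largest power-of-two shifter size whose pin count fits in io_pins."""
    n = 1
    while conventional_pins(2 * n) <= io_pins:
        n *= 2
    return n


print(conventional_pins(128))              # 265
print(conventional_pins(256))              # 522, more than the 502 pins available
print(largest_conventional_shifter(502))   # 128
```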
 
 
 
Chapter 5 
Proposed FPGA-based shifter design 
 
5.1 Proposed design of an embedded auxiliary shifter 
In the proposed design, an auxiliary shifter of size s and additional logic are embedded in 
the FPGA. It is used to process and shift the n-bit input. The implementation of the 
embedded auxiliary shifter is a hybrid design using combinational and sequential logic.  
The input data A to be shifted is comprised of n bits, with the ith bit denoted by $A_i$, 
$0 \le i \le n-1$. A is partitioned into $n/s$ data blocks with each block denoted by 
$A_p$, $0 \le p \le n/s - 1$. p denotes the initial index of the block. Each block consists 
of s bits with the jth bit denoted by $A_p^j$, $0 \le j \le s-1$. A is partitioned such that, 
 $A_p^j = A_{p + j \cdot \frac{n}{s}}$ (5.1) 
The proposed approach partitions the input based on the block number p and the 
number of data blocks $n/s$. p determines whether a block is comprised of either 
even-referenced bits or odd-referenced bits of the input. Every block thus created is processed 
independently. This is in contrast to using a simple linear approach where, 
 $A_p^j = A_{ps + j}$ (5.2) 
This would cause the number of input bits needed in a block to be dependent on 
the magnitude of the operation to be performed. 
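The difference between the two partitioning schemes can be illustrated with a short Python sketch. It assumes that Eq. 5.1 assigns bit j of block p to input bit p + j(n/s) and that Eq. 5.2 assigns it to input bit ps + j; the function names are hypothetical.

```python
def interleaved_blocks(a_bits, s):
    """Interleaved partition: bit j of block p is input bit p + j * (n / s)."""
    n = len(a_bits)
    k = n // s  # number of blocks
    return [[a_bits[p + j * k] for j in range(s)] for p in range(k)]


def linear_blocks(a_bits, s):
    """Linear partition: bit j of block p is input bit p * s + j."""
    n = len(a_bits)
    return [a_bits[p * s:(p + 1) * s] for p in range(n // s)]


bits = list(range(8))                 # use the bit indices themselves as data
print(interleaved_blocks(bits, 4))    # [[0, 2, 4, 6], [1, 3, 5, 7]]
print(linear_blocks(bits, 4))         # [[0, 1, 2, 3], [4, 5, 6, 7]]
```

With two blocks, the interleaved scheme separates the even-indexed and odd-indexed bits, whereas the linear scheme keeps neighboring bits together.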
 
 
 
5.2 Sequential-logic level processing 
Figure 5.1 shows the overview of the proposed design for a 2048-bit shifter implemented 
using a 128-bit embedded auxiliary shifter and a 2048-bit shift register that stores the 
data. The blocks are processed in decreasing order of the value of the corresponding p. 
The register first inputs the block $A_{n/s-1}$ (in this case, $A_{15}$) to the FPGA where it is 
processed by the embedded auxiliary shifter. 
This generates an updated index q for that specific block. It is stored along with 
the processed block within the FPGA and the value of p is decremented by 1. The next 
block is then input. This sequence of operations is iteratively carried out until block $A_0$ is 
processed. The synthesis of the desired shifted output now follows. The block with 
$q = n/s - 1$ is output followed by the remaining blocks in decreasing order of the value of 
the corresponding q. If the maximum delay to process and store a block is d, then the 
total processing time t required for the n-bit input is, 
 $t = \frac{n}{s} \cdot d$ (5.3) 
Figure 5.1: Design of a 2048-bit shifter using 128-bit embedded logic 
 
5.3 Embedded auxiliary shifter and shift-control 
partitioning 
 
The inputs to the embedded auxiliary shifter are the s-bit data block, a 3-bit opcode O, an 
m-bit primary shift-control X ($m = \log_2 n$) and $A_{n-1}$, the most-significant bit of the input 
A. While X determines the magnitude of the shift, O specifies the direction (left/right) 
and the type of shift operation (rotate/arithmetic shift/logical shift). In addition to these 
inputs, the shifter uses the initial index p and generates an updated index q as described 
in Section 5.2. A secondary shift control Y maps to X if the direction of shift is right, or 
maps to 2’s complement of X if the direction of shift is left. 
Figure 5.2: Control flow of the combinational logic 
The primary operations performed by the shifter are right shift and right rotate. In a 
right arithmetic shift, the block is appended with $A_{n-1}$ as its most-significant bit and the 
arithmetic shift is performed. In case of a left shift, the X most significant bits are 
masked with ‘0’. The data is then rotated right by Y. A left rotate is performed by rotating 
right by Y. 
The embedded auxiliary shifter thus uses the shift-control X (or Y) to perform a 
mask (or shift/rotate). However, using an m-bit shift-control could lead to a mask (or 
shift/rotate) greater than the size of the s-bit data block. To correct this, the shift-control 
is partitioned into two bit-strings which are then used independently. The 
first bit-string is comprised of the $\log_2 s$ most-significant bits of X (or Y). Its integer value 
is denoted by $x_1$ (or $y_1$). The second bit-string is comprised of the remaining $m - \log_2 s$ 
least-significant bits. Its integer value is denoted by $x_2$ (or $y_2$). 
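The partitioning of the shift control can be sketched in Python as follows. The sketch assumes n and s are powers of two and that the 2's complement of X is taken within m = log2(n) bits; the function name `split_shift_control` is hypothetical.

```python
def split_shift_control(x, n, s, left):
    """Return (x1, x2, y1, y2) for an n-bit shifter built from s-bit blocks.

    x1/y1 are the integer values of the log2(s) most-significant bits of X/Y,
    x2/y2 those of the remaining m - log2(s) least-significant bits.
    """
    m = n.bit_length() - 1                    # m = log2(n) bits in X
    low_bits = m - (s.bit_length() - 1)       # m - log2(s) least-significant bits
    y = (-x) % n if left else x               # 2's complement of X for left operations
    low_mask = (1 << low_bits) - 1
    return x >> low_bits, x & low_mask, y >> low_bits, y & low_mask


# 2048-bit data, 128-bit blocks: X has 11 bits, split into 7 + 4 bits.
print(split_shift_control(37, 2048, 128, left=False))   # (2, 5, 2, 5)
print(split_shift_control(37, 2048, 128, left=True))    # (2, 5, 125, 11)
```

In both cases the first value stays below s = 128, so a mask or shift derived from it never exceeds the block size.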
5.4 Combinational-logic level processing 
Figure 5.2 shows the flow of control through the combinational-logic level of the 
embedded auxiliary shifter. The first set of operations carried out on a block is to 
perform bit-masking with zeroes. It is performed only in case of a left shift. As a first 
step, the $x_1$ most-significant bits of the block are masked. The second step involves 
conditional masking of one additional bit. This is performed only if, 
 $p \ge \frac{n}{s} - x_2$ (5.4) 
The second set of operations involves either right shifting or right rotating. It is 
performed on all blocks irrespective of the type of shift. If the operation is right 
arithmetic shift or right logical shift, then the corresponding operation is performed by 
an amount equal to $y_1$. For every other operation, the block is rotated right by $y_1$. The 
right arithmetic shift for a block is always performed after appending $A_{n-1}$ as its 
most-significant bit. 
The second step involves conditional right shift (or right rotate). The data block is 
right arithmetic/logical shifted (or right rotated) by another bit if $p < y_2$. 
These two sets of operations when performed result in an appropriately 
shifted/rotated block. 
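The two sets of operations can be exercised end to end with a bit-level software model. The Python sketch below is an illustration under explicit assumptions rather than the thesis implementation: it assumes that bit j of block p is input bit p + j(n/s), that the additional mask bit applies when p is at least n/s minus x2, and that the updated index is q = (p - y2) mod (n/s). The function name and opcode strings are hypothetical.

```python
def shift_blockwise(a, x, n, s, op):
    """Shift/rotate the n-bit integer a by x using s-bit interleaved blocks.

    op is one of "srl", "sra", "ror", "sll", "rol".
    """
    k = n // s                                   # number of blocks
    left = op in ("sll", "rol")
    y = (-x) % n if left else x                  # secondary shift control Y
    x1, x2 = divmod(x, k)                        # upper / lower bit-strings of X
    y1, y2 = divmod(y, k)                        # upper / lower bit-strings of Y
    sign = (a >> (n - 1)) & 1                    # A_{n-1}
    out_blocks = [None] * k
    for p in reversed(range(k)):                 # blocks in decreasing order of p
        # Gather block p (assumed partition: bit j is input bit p + j*k).
        blk = [(a >> (p + j * k)) & 1 for j in range(s)]
        if op == "sll":                          # first set: mask with zeroes
            masked = x1 + (1 if p >= k - x2 else 0)
            for j in range(s - masked, s):
                blk[j] = 0
        amount = y1 + (1 if p < y2 else 0)       # second set: shift/rotate right
        if op in ("srl", "sra"):
            fill = sign if op == "sra" else 0
            blk = [blk[j + amount] if j + amount < s else fill for j in range(s)]
        else:                                    # ror, and sll/rol as rotate right by Y
            blk = [blk[(j + amount) % s] for j in range(s)]
        out_blocks[(p - y2) % k] = blk           # store under the updated index q
    # Reassemble: bit j of output block q is output bit q + j*k.
    result = 0
    for q in range(k):
        for j in range(s):
            result |= out_blocks[q][j] << (q + j * k)
    return result


word = 0b10110010
print(format(shift_blockwise(word, 3, 8, 4, "srl"), "08b"))   # 00010110
print(format(shift_blockwise(word, 3, 8, 4, "sll"), "08b"))   # 10010000
```

Under these assumptions each block is processed using only its own s bits, its index p and the shift control, and the reassembled result matches a direct shift of the full word.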
5.5 Index updating 
Before processing each block, q is initialized to the same value as the block’s 
corresponding p. When performing a conditional right shift/rotate, its value is 
decremented by $y_2$. However, it is always in the range, 
 $0 \le q \le \frac{n}{s} - 1$ (5.5) 
If it falls below ‘0’ during the operation, the count resumes from the highest value, $n/s - 1$. 
5.6 Design optimization 
To minimize the processing time and the logic resources used, the following  
optimizations are performed: 
1. The first step of both sets of operations described in Section 5.4 masks (or right 
shifts/rotates) the data block by $x_1$ (or $y_1$) in a single stage. This is similar to a 
barrel shifter design. These steps may be performed in a series of smaller stages 
so that the total mask (or right shift/rotate) performed is equal to $x_1$ (or $y_1$). This 
reduces the complexity of the design, and consequently the FPGA slices used. The 
variation used in this research is similar to a logarithmic shifter where a series of 
stages shift data in integer powers of two. To further reduce the delay in a 
logarithmic design, the stages are arranged in decreasing order based on the 
magnitude of shift performed in each stage. Experimentally, this arrangement is 
generally faster, with up to 37.8% lower processing time and 5.3% lower slice 
usage compared to a design with progressively increasing order of shifts in each 
stage. 
2. The updated index q changes value when a block is processed at the 
combinational-logic level. To attain the necessary speed, an embedded ROM is 
used to determine the updated value of q. 
3. The total FPGA pins r used by an embedded auxiliary shifter of size s, counting the 
s-bit input and output blocks, the $\log_2 n$-bit shift control, the 3-bit opcode and 
$A_{n-1}$, is, 
 $r = 2s + \log_2 n + 4$ (5.6) 
When using an FPGA with f I/O pins, s is optimized to maximize the amount of 
data transferred and thus minimize the time taken to process the entire data. The 
ratio $r/f$ denotes the percentage of I/O pin utilization. 
4. The performance in speed of the proposed shifter design is improved when the 
size of the embedded logic is increased.  
5. The number of I/O pins and slices utilized by the proposed shifter design is 
reduced when the size of the embedded logic is decreased.  
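The trade-off in items 3 to 5 can be explored numerically. The Python sketch below counts the ports listed in Section 5.3 (the s-bit input and output blocks, an m-bit shift control with m = log2 n, a 3-bit opcode and the bit A_{n-1}); taking the shift-control width as log2 n is an assumption of this sketch, and the function names are illustrative.

```python
def auxiliary_pins(s, n):
    """Pins for an s-bit auxiliary shifter serving an n-bit shifter (see assumptions above)."""
    return 2 * s + (n.bit_length() - 1) + 4


def utilization_percent(s, n, f):
    """I/O pin utilization r/f, rounded to a whole percent, for an FPGA with f pins."""
    return round(100 * auxiliary_pins(s, n) / f)


def largest_block(f, n):
    """Largest power-of-two block size s whose pin count fits in f pins."""
    s = 1
    while auxiliary_pins(2 * s, n) <= f:
        s *= 2
    return s


print(utilization_percent(64, 1024, 248))    # 57
print(utilization_percent(128, 2048, 311))   # 87
print(largest_block(248, 1024), largest_block(311, 2048))   # 64 128
```

For the two devices with 248 and 311 I/O pins, the sketch selects 64-bit and 128-bit blocks respectively.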
5.7 Extension to multi-chip FPGA design 
The proposed design alleviates the I/O resource bottleneck for an FPGA shifter design. 
However, as the ratio n : s increases, the total processing time t also increases and may 
exceed the design constraints. To reduce the delay, a multi-chip FPGA can be used where 
each chip processes a consecutive set of blocks. This allows multiple blocks to be 
processed simultaneously. When all blocks have been shifted, the desired output is 
synthesized in the same manner as described for a single FPGA design. 
 
 
 
Chapter 6 
Implementation of Shifter Designs in HDL 
The performance of the proposed design is quantitatively compared against the 
conventional shifter designs: the barrel shifter, the logarithmic shifter and the shift 
register. This is performed by implementing the VHDL description of the shifters on an 
FPGA. The I/O and logic resources used and the performance lag of the four designs are 
then used to compare the performance. The following section describes the HDL 
structure and behavior of the shifter designs.  
6.1 Barrel shifter 
The barrel shifter performs the shift using a single stage. This behavior is modeled 
through functional modeling in VHDL as illustrated in Figure 6.1. 
 
Figure 6.1: HDL design of a barrel shifter 
A single function calculates the output based on the input data, the shift 
magnitude and the opcode. VHDL provides dedicated operators for arithmetic shift 
(sra, sla), logical shift (srl, sll) and rotate (ror, rol). These are 
used to create a simple relation between the input and output of the barrel shifter. The 
resulting function calculates the output through a single concurrent signal assignment 
statement. 
 
 
 
6.2 Logarithmic shifter 
The HDL description of a log shifter is designed in stages resembling its design layout 
shown in Figure 6.2. The design uses a mixed modeling system. It utilizes a dataflow 
modeling system with an underlying functional model. The functional layer models each 
stage of the log shifter while the dataflow layer simulates the arrangement of the stages 
to perform the shift/rotate. A concurrent signal assignment statement performs the 
function of the corresponding stage. These statements are connected in series to form 
the shifter. The stages are arranged in increasing order based on the magnitude of shift 
performed in each stage. 
 
Figure 6.2: HDL Design of a logarithmic shifter 
6.3 Shift register 
A sequential-logic based shift register utilizes a clock to begin a shift cycle at the rising 
edge of the clock. With each clock cycle, a shift/rotate by one bit is performed based on 
the opcode and shift control.  
The HDL design of the shift register utilizes functional modeling. The shift 
process is described within a process statement with the clock signal included in the 
sensitivity list. A case statement performs the shift/rotate operation based on the opcode 
value at each rising edge of the clock signal. 
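The cycle-by-cycle behavior of the shift register can be mimicked in software. The following Python sketch is a behavioral stand-in for the clocked process, with one loop iteration per rising clock edge; the opcode strings and the function name are assumptions made for illustration.

```python
def shift_register(a, x, n, op):
    """Shift/rotate the n-bit word a by x positions, one bit per clock cycle.

    Returns (result, cycles) so that the cycle count is visible.
    """
    mask = (1 << n) - 1
    cycles = 0
    for _ in range(x):                        # one iteration = one rising clock edge
        msb, lsb = (a >> (n - 1)) & 1, a & 1
        if op == "srl":
            a >>= 1
        elif op == "sra":
            a = (a >> 1) | (msb << (n - 1))   # keep the sign bit
        elif op == "sll":
            a = (a << 1) & mask
        elif op == "ror":
            a = (a >> 1) | (lsb << (n - 1))   # wrap the bit that leaves on the right
        elif op == "rol":
            a = ((a << 1) & mask) | msb       # wrap the bit that leaves on the left
        else:
            raise ValueError("unknown opcode: " + op)
        cycles += 1
    return a, cycles


result, cycles = shift_register(0b10010110, 3, 8, "ror")
print(format(result, "08b"), cycles)          # 11010010 3
```

The number of cycles equals the shift magnitude, which is why this design is the slowest of those compared.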
 
 
 
 
6.4 Proposed auxiliary shifter design 
The proposed design utilizes sequential logic and combinational logic as described in 
Chapter 5. The combinational logic is encapsulated within the sequential logic. This 
hierarchy is preserved when implementing the design in VHDL. Figure 6.3 illustrates the 
HDL structural description of the proposed shifter. The sequential logic of the shifter is 
described by an entity sequential_logic. This entity controls the data processing 
performed by the combinational logic components: the built-in ROM and the log shifter. 
These two combinational logic components are bound by a single entity named 
combinational_logic. 
6.4.1 Sequential_logic 
This is the top-level entity defined for the shifter. Its ports are the input block data, shift 
control, opcode, lastbit, a clock signal, a reset pin and the output block. This design 
utilizes a clock signal to input a block and store the processed output of 
combinational_logic. The reset signal is driven high to reset the shift process. When 
it is driven low again, the shift process starts with the first block input to 
combinational_logic. The processed output is stored and the next block is then input at 
the rising edge of the next clock cycle. The remaining process proceeds as described in 
Section 5.2. 
Figure 6.3: HDL hierarchy of the auxiliary shifter 
6.4.2 Combinational_logic 
The combinational_logic entity receives the input from sequential_logic, shifts the data 
by the requisite amount and outputs it to the memory array.  
The log shifter within combinational_logic has an identical HDL structure and the 
same functionality as the log shifter in Section 6.2. However, its stages are arranged in 
the descending order of the magnitude of shift performed in each stage. This 
arrangement is faster compared to an increasing order of stages. The ROM component of 
combinational_logic is comprised of the following entities:  
1. Next_block_index: Returns an integer value to combinational_logic. It 
calculates the current index of the next block that is to be input to the shifter. 
2. Mask_most_sig: When performing a left shift, the most significant bits of the 
input are masked first. The magnitude of this mask is the integer value of the y1 
most significant bits of amount. This value is calculated by the mask_most_sig 
entity and returned to combinational_logic. 
3. Least_sig_value: This entity calculates the integer value of the y2 least significant 
bits and returns it to combinational_logic during a conditional mask operation. 
4. Shift_rotate_value: This entity calculates the integer value of , and returns it 
to combinational_logic during a conditional shift/rotate operation.  
5. Updated_index_u: This entity calculates the updated index u of the block. 
 
 
 
6.5 Experimental results 
The Xilinx Spartan-3A family of FPGAs [8] provides a cost-effective FPGA design 
solution. The research in this section uses two types of FPGAs from the Spartan-3A 
family. The XC3S200A has 248 I/O pins and 1792 logic slices and the XC3S400A has 311 
I/O pins and 3584 logic slices. To evaluate the performance of the proposed shifter 
design, its delay, I/O, and logic resources utilization are compared with a shift register, a 
barrel shifter, and a logarithmic shifter design. 
Table 6.1: Comparing performance metrics of the proposed FPGA approach with existing shifter designs 

Shifter design                        Shifter size  FPGA  I/O utilization  Total slices  Total processing
                                      (bits)        type  r/f (%)          used (%)      delay t (ns)
Shift register                        128           400A  86               2             1083.9
Barrel shifter                        128           400A  86               69            10.63
Log shifter                           128           400A  86               45            9.94
Proposed, 64-bit auxiliary shifter    128           200A  56               44            11.52
Proposed, 64-bit auxiliary shifter    256           200A  56               44            23.36
Proposed, 64-bit auxiliary shifter    512           200A  57               45            53.98
Proposed, 64-bit auxiliary shifter    1024          200A  57               49            98.88
Proposed, 128-bit auxiliary shifter   256           400A  86               50            15.17
Proposed, 128-bit auxiliary shifter   512           400A  86               49            34.53
Proposed, 128-bit auxiliary shifter   1024          400A  87               50            61.52
Proposed, 128-bit auxiliary shifter   2048          400A  87               51            138.69
 
Table 6.1 summarizes the experimental results. Using the 400A FPGA, the largest 
conventional shifter that can be designed is of 128 bits. However,  with the proposed 
design, a shifter of 2048 bits is easily implemented using the same FPGA with an 
increase of only 1% in the I/O pin utilization.  The design complexity can be inferred 
through the FPGA slices utilized.  
The regular shift register uses only 2% of the FPGA slices available and the barrel 
shifter uses the most slices at 69%. In contrast, the proposed shifter design uses a 
maximum of 51% slices when implementing a 2048-bit shifter. Table 6.1 also compares 
the performance of various shifter designs and the effect of varying the size of the 
embedded logic. Among the 128-bit shifters, the shift register is the slowest with a delay 
of 1083.9 ns due to sequential design, while the log shifter is the fastest with a delay of 
9.94 ns. The delay for the proposed design is a modest 11.52 ns when using a 64-bit 
embedded shifter. 
When the size of the embedded logic is increased from 64-bits to 128-bits, the 
performance of the proposed design improves by at least 35% for 256, 512 and 1024-bit 
shifter designs. The FPGA slices used, however, increase by an  average of 115%. The 
results show that with the proposed design approach, larger shifters can be implemented 
without any I/O bottleneck. The design  is optimized for improved performance and 
minimal area overhead. 
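The delay comparison can be reproduced from the values in Table 6.1 with a short Python calculation. The per-block figure assumes that the total delay is the number of blocks multiplied by a per-block delay, which is a simplification made for this sketch; the variable and function names are illustrative.

```python
# Shifter size in bits -> total processing delay in ns, copied from Table 6.1.
delay_64 = {128: 11.52, 256: 23.36, 512: 53.98, 1024: 98.88}
delay_128 = {256: 15.17, 512: 34.53, 1024: 61.52, 2048: 138.69}


def improvement_percent(n):
    """Delay reduction when moving from a 64-bit to a 128-bit auxiliary shifter."""
    return 100 * (delay_64[n] - delay_128[n]) / delay_64[n]


def per_block_delay(delays, s):
    """Average delay per block, assuming total delay = (n / s) * per-block delay."""
    return {n: t / (n // s) for n, t in delays.items()}


for n in (256, 512, 1024):
    print(n, round(improvement_percent(n), 1))

print(per_block_delay(delay_64, 64))
print(per_block_delay(delay_128, 128))
```

The improvement is smallest for the 256-bit shifter and largest for the 1024-bit shifter, and the average delay per block is higher for the 128-bit auxiliary shifter than for the 64-bit one.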
 
 
 
Chapter 7 
The VHDL Open Verification Library 
Any hardware design implemented in HDL needs to be verified for the desired functional 
behavior before the implementation process proceeds. The drawbacks of an improper 
verification process include behavioral faults in the design and in extreme cases, design 
failure. The verification process may comprise simulation, formal verification, assertion 
based verification etc., or a combination of various methodologies. The ease of 
verification of any design is primarily dependent on two factors: controllability and 
observability. 
7.1 Controllability 
Controllability refers to the ability to influence an embedded finite state-machine, 
structure, or specific line of code within the design by stimulating various input ports. In 
theory, a simulation testbench has high controllability of the design model’s input ports 
during verification. However, it can have low controllability of an internal structure 
within the model. 
7.2 Observability 
Observability refers to the ability to observe the effects of a specific internal finite state-
machine, structure, or stimulated line of code. Thus, a testbench generally has limited 
observability if it only observes the external ports of the design model. This is because 
the internal signals and structures are invisible to the testbench. To identify a design 
error using the testbench approach, the following conditions must hold: 
 
 
 
1. The testbench must generate proper input stimulus to activate a  bug. 
2. The testbench must generate proper input stimulus to propagate all effects 
resulting from the bug to an output port [20]. 
7.3 Challenges during verification 
The design complexity of current ASICs and System-on-Chips has increased at a rapid 
rate leading to a significant increase in the time taken to verify the design. While 
synthesis tools have increased the productivity of design engineers, the same 
breakthrough has not occurred in the verification field. Due to this, verification of 
current designs poses a significant challenge to design engineers. In today’s designs, 
verification accounts for up to 70% of the design development time [14]. 
Consider a 32-bit comparator. The number of input data combinations for the 
comparator is $2^{64}$. To test for a worst-case single-bit bug, every input combination needs 
to be used as a test vector. A one-million cycle-per-second simulator would take 
approximately 600,000 years to check for all possible single-bug errors [21]. A complex 
design would take an even more impractical amount of time for complete verification. 
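The estimate can be checked with simple arithmetic. The Python sketch below assumes two 32-bit operands (64 input bits in total), one test vector per simulation cycle and a 365.25-day year; the function name is illustrative.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def exhaustive_years(input_bits, vectors_per_second):
    """Years needed to apply every input combination once."""
    return 2 ** input_bits / vectors_per_second / SECONDS_PER_YEAR


years = exhaustive_years(64, 1_000_000)
print(round(years))   # roughly 585,000 years, i.e. on the order of 600,000
```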
The verification problem is rooted in the traditional approach to verification: A 
set of input test vectors is used in a design simulation to monitor the actual output 
against the desired output. The test bench is designed to take account of all corner cases 
that may possibly occur in the design operation. Thus, the test bench needs to utilize a 
very large number of test vectors  to verify any design. For large designs, the number of 
test vectors needed becomes impractically large. A simple bug may require an enormous 
number of test vectors to stimulate it. This results in low controllability.  
Another problem facing the traditional verification approach arises when internal 
errors do not propagate to the output, making the errors unobservable. Even if the error 
is deduced, it takes a large amount of time to generate and use the test vectors necessary 
to pinpoint the cause and location of the error. 
Finally, when using formal verification processes, there are points within a design 
that cannot be stimulated using test vectors. These verification ‘hotspots’ thus cannot be 
detected with test vectors. Failure to recognize them may render the design incapable of 
performing its intended task [22]. 
7.4 Assertion-based verification  
Assertion-based verification (ABV) is used to reduce the time taken for design 
verification. It utilizes assertion modules that are inserted at strategic points in the 
design to maximize their effectiveness. Assertions can be used either as targets for formal 
proof or as constraints. When used as constraints, assertions define the legal input 
behavior for the design under verification. Formal-verification tools then exhaust all 
possible inputs that satisfy the constraints during the process, thereby verifying 
assertions. 
ABV thus increases the observability of the design by allowing the verification 
engineer to monitor the internal signal propagation of the design. Using ABV, the 
internal components of the design can be verified individually. This leads to increased 
design observability and an enormous reduction in the number of test vectors needed to 
verify the complete design. Thus, ABV has several advantages over other verification 
methodologies including, 
1. Increasing observability of bugs. 
2. Increasing controllability. 
3. Finding bugs earlier in the design cycle. 
4. Uncovering bugs that would have otherwise remained undetected. 
5. Preventing wasted simulation cycles. 
6. Improving verification productivity.  
7. Facilitating the integration of work from multiple designers.  
8. Supporting design reuse and third-party intellectual property. 
Since the assertions are included as the design is created, formal verification can be 
started at an early stage without waiting for the completion of the design. Assertions can 
also be used at the design boundaries to provide checks for interface behavior.  
7.5 Assertion checker standards 
VHDL and Verilog allow the construction of assertion modules through the use of the 
assertion statement. This statement can be used to check boolean functions in basic 
monitoring scenarios. However, when assertions are required to monitor more complex 
tasks such as signal processing, data transfer, common modes of usage or operation and 
reporting, the built-in assertion statement falls short. A standardized set of assertion 
modules addresses this problem. Furthermore, a standard set of rules increases the 
portability of the design. This allows ease of use and transparency in the design. 
Assertion modules can then be structured so that they do not interfere with the 
functioning of the design or the synthesis results. 
A standardized set of assertions also reduces the need for creation of assertion 
modules each time a new design is created. A library of such modules allows the user to 
simply instantiate a module where needed. Additional mechanisms such as controlling 
the error reporting parameters, the number of errors before failure, or 
enabling/disabling a particular type of assertion module can all be implemented using 
this standardized set. Thus, using a standardized set of assertions has a noticeable 
impact on the design creation time. Users can further have confidence that once these 
assertions are inserted into a design, they can be used by many tools and are portable for 
design re-use [22]. 
7.6 Current assertion standards 
The three standards relevant to ABV are managed by Accellera [23], the organization 
responsible for driving these standards. They are, 
1. The Open Verification Library (OVL) 
2. Property Specification Language (PSL) 
3. SystemVerilog Assertions (SVA) 
7.6.1 Open Verification Library (OVL) 
OVL is the only existing assertion-specification standard in Accellera that currently 
works with any IEEE-1364 (Verilog) and IEEE-1076 (VHDL) compliant simulator. It also 
works with a growing number of formal-verification tools. 
The OVL is a freely downloadable open source library available in two versions. 
One version contains Verilog modules, and the other version contains VHDL modules. 
These modules are used to specify properties of an HDL design to be verified, either in 
simulation or using formal or semi-formal methods. The modules are instantiated as 
assertion monitors that will flag violations of the specified property. These modules 
provide a standard interface for multiple design verification tools, thus enabling a 
seamless flow. The OVL defines 31 checkers listed in Table 7.1 [24]. 
 
 
 
Table 7.1: Currently available OVL Checkers 
assert_always assert_implication assert_quiescent_state 
assert_always_on_edge assert_increment assert_range 
assert_cycle_sequence assert_never assert_time 
assert_decrement assert_next assert_transition 
assert_delta assert_underflow assert_unchange 
assert_even_parity assert_odd_parity assert_width 
assert_fifo_index assert_one_cold assert_win_change 
assert_frame assert_one_hot assert_win_unchange 
assert_handshake assert_proposition assert_window 
assert_no_overflow assert_no_underflow assert_zero_one_hot 
assert_no_transition     
 
Every assertion library definition contains a severity_level, options, and a 
message. As an example, the assert_always  assertion continuously monitors the 
test_expr at every positive edge of the triggering event or clock clk. It contends that the 
specified test_expr will always evaluate TRUE. If test_expr evaluates to FALSE, the 
assertion will fire (that is, an error condition has been detected in the code). 
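The sampling behavior described for assert_always can be illustrated with a toy software model. The Python class below is not the OVL code or its interface; the class name, its attributes and the `sample` method are hypothetical and only mimic the idea of checking test_expr at each positive clock edge.

```python
class AssertAlways:
    """Toy model of an assert_always-style checker (illustrative, not the OVL library)."""

    def __init__(self, message, severity_level="error"):
        self.message = message
        self.severity_level = severity_level
        self.failures = []        # positive-edge numbers at which the checker fired
        self._prev_clk = 0
        self._edges = 0

    def sample(self, clk, test_expr):
        """Call once per simulation step with the current clk and test_expr values."""
        if self._prev_clk == 0 and clk == 1:      # positive clock edge
            self._edges += 1
            if not test_expr:                     # test_expr evaluated to FALSE
                self.failures.append(self._edges)
        self._prev_clk = clk


# test_expr is FALSE while clk is low (not sampled) and at the third positive edge.
demo = AssertAlways("test_expr must always hold")
for clk, test_expr in [(0, True), (1, True), (0, False), (1, True), (0, True), (1, False)]:
    demo.sample(clk, test_expr)
print(demo.failures)   # [3]
```

The FALSE value that occurs while the clock is low goes unnoticed because the checker samples test_expr only at positive edges.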
The OVL checkers available in the OVL VHDL library and the procedure for the 
checker configuration are listed in Appendix A. 
7.6.2 X/Z Check in OVL checkers 
Assertion checkers can produce an indeterminate result if a checker port value contains 
an X (unknown) or Z (high-impedance) bit when the checker samples the port. To assure 
determinate results, assertion checkers have special assertions for X/Z checks. 
 
 
 
The ovl_never_unknown and ovl_never_unknown_async assertion checker types are 
specifically designed to verify that their associated expressions have known and driven 
values. Thus, they perform an explicit X/Z check. 
All other assertion checker types have implicit X/Z checks. These are assertions that 
ensure that specific checker ports have known and driven values. 
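An explicit X/Z check can be sketched as follows. The sketch assumes that a sampled port value is given as a string of std_logic-style characters and treats every character other than '0' and '1' as unknown or undriven. The function names are hypothetical, and the code models only the idea of the check, not the OVL checker itself.

```python
KNOWN = frozenset("01")


def has_unknown(bits):
    """True if any bit of the sampled value is not a driven 0 or 1."""
    return any(bit not in KNOWN for bit in bits.upper())


def check_never_unknown(samples):
    """Return the indices of the samples that contain an X, Z, or other non-0/1 bit."""
    return [index for index, bits in enumerate(samples) if has_unknown(bits)]


# Samples 1 and 3 contain Z and X bits, so they are reported.
print(check_never_unknown(["0000", "1Z00", "1111", "XXXX"]))
```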
7.7 Performance characteristics of the VHDL OVL checkers
 
Each OVL checker utilizes logic resources and introduces a delay when synthesized in a 
design. The insertion of checkers thus has a measurable impact on the design 
performance. The performance impact due to the VHDL OVL checkers is studied in 
this section. The datasheet generated serves as a reference for further research: when 
the performance impact due to the insertion of checkers needs to be studied, the 
reference data assists in estimating the resources used by the inserted checkers. 
The study is performed by synthesizing and simulating each VHDL OVL checker 
on an FPGA in isolation. A single type of assertion checker is implemented for checking 
test data of widths ranging from 1 bit to 128 bits. The testbench consists of a number of 
checker instances connected in series to calculate the average performance metrics. The 
test data is passed through this series such that it is checked in each assertion module 
and then passed through to the next assertion module in series. The resulting data 
represents the average performance metrics for that assertion checker. The tests were 
performed for 4, 8 and 16 assertion checkers of each type connected in series. This 
arrangement was used to reduce the skewing of the calculations and minimize the 
influence of other external factors. The generated datasheet for the VHDL OVL checkers 
is tabulated in Appendix B. 
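As an illustration of how a per-checker figure can be derived from such a series arrangement, the sketch below divides the total slice count by the number of checker instances. The slice counts are the Appendix B values for the never-unknown checker with 64-bit test data. How the three runs are combined is not stated in the text, so the unweighted mean used here is an assumption.

```python
# Slices used by the never-unknown checker with 64-bit test data (Appendix B),
# keyed by the number of checker instances connected in series.
slices_used = {4: 242, 8: 510, 16: 1014}

# Per-checker cost for each run: total slices divided by the instance count.
per_checker = {count: total / count for count, total in slices_used.items()}

# Assumption: the three runs are combined with an unweighted mean.
average = sum(per_checker.values()) / len(per_checker)

for count, value in per_checker.items():
    print(f"{count:2d} checkers: {value:.2f} slices per checker")
print(f"mean over the three runs: {average:.2f} slices per checker")
```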
7.8 Assertions in the proposed shifter design 
Assertion modules are inserted into the proposed shifter design to improve its 
robustness and observability. The primary function of the assertion modules is to verify 
the integrity of the data passing through the shifter and to perform X/Z checks, which 
verify that the data does not contain an X (unknown) or Z (high-impedance) value. Since 
the embedded logic is primarily a combinational (memory-less) design, there is 
little scope for insertion of assertion modules that check memory-based signals. 
The shifter design utilizes three types of OVL assertion modules: 
1. ovl_never_unknown: This assertion module is used as a constraint in the design. 
It performs an explicit X/Z check to constrain the inputs and outputs of the 
design. The design uses an ovl_never_unknown checker each for the input A, the 
shift control X, the opcode O and the output B. The size of the checker varies 
according to the size of the corresponding test expression. 
2. ovl_next: The most-significant bit of the input data, $A_{n-1}$, is input to the shifter in 
every clock cycle of the shift process. The data on this pin remains constant for 
the complete shift cycle. This stability of $A_{n-1}$ is checked using the ovl_next 
module. 
3. ovl_range: The value of the updated index q is in the range

$$0 \le q \le \frac{n}{s} - 1. \qquad (7.1)$$

q is calculated and used internally and thus cannot be measured through an 
output port. To check for any error related to the components calculating q, an 
ovl_range module is assigned to monitor q. Any X/Z error or violation of the allowed 
range is reported by this assertion module. The ovl_range module can only be used 
when the test expression has a length of 2 bits or more. This checker is thus used only 
when the number of blocks to be processed is greater than 2. In the design under test, 
this checker cannot be used when implementing a 128-bit shifter using a 64-bit 
embedded logic or a 256-bit shifter using a 128-bit embedded logic. 
The size and location of the assertion modules used in the 64-bit embedded logic 
and the 128-bit embedded logic are given in Table 7.2 and 7.3 respectively. 
Table 7.2: OVL checker characteristics for the 64-bit embedded logic
(test data width, in bits, of the assertion checkers used at specific ports)

Effective      ovl_never_unknown                                 ovl_range         ovl_next
shifter size   Input A   Output B   Shift control X   Opcode O   Updated index q   Last bit A(n-1)
128            64        64         7                 3          0                 1
256            64        64         8                 3          2                 1
512            64        64         9                 3          3                 1
1024           64        64         10                3          4                 1
 
Table 7.3: OVL checker characteristics for the 128-bit embedded logic
(test data width, in bits, of the assertion checkers used at specific ports)

Effective      ovl_never_unknown                                 ovl_range         ovl_next
shifter size   Input A   Output B   Shift control X   Opcode O   Updated index q   Last bit A(n-1)
256            128       128        8                 3          0                 1
512            128       128        9                 3          2                 1
1024           128       128        10                3          3                 1
2048           128       128        11                3          4                 1
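The shift control and updated index widths listed in Tables 7.2 and 7.3 can be reproduced with a short calculation. The sketch assumes that the shift control X needs log2 of the effective shifter size in bits and that q needs log2 of the number of blocks (effective shifter size divided by embedded logic size). This relationship is inferred from the tabulated values rather than stated in the text, and the function name is hypothetical. A width of 0 marks the cases in which ovl_range is not used because q would be a single bit.

```python
def checker_widths(shifter_bits, logic_bits):
    """Return (shift control width, ovl_range test width) in bits.

    Assumes both sizes are powers of two, as in Tables 7.2 and 7.3.
    """
    blocks = shifter_bits // logic_bits
    control_bits = (shifter_bits - 1).bit_length()  # log2 for powers of two
    q_bits = (blocks - 1).bit_length()
    # ovl_range needs a test expression of 2 bits or more; otherwise it is omitted.
    return control_bits, (q_bits if q_bits >= 2 else 0)


for logic in (64, 128):
    for factor in (2, 4, 8, 16):
        size = logic * factor
        print(logic, size, checker_widths(size, logic))
```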
 
  
 
 
 
7.9 Estimated resource overhead and performance reduction
 
Using the data reference sheet in Appendix B, the performance of the assertion-based 
design can be estimated. The resource overhead and the delay due to the assertion 
checkers are added to the corresponding performance data in Section 6.5 for the 
proposed design. The estimated performance of the assertion-based design is given in 
Table 7.4. These estimations are cross-checked against the actual performance of the 
design when it is implemented. 
Table 7.4: Estimated performance characteristics of an assertion-based shifter design

Shifter design              Shifter size   FPGA   I/O utilization   Estd. slices   Estd. processing
                            (bits)         type   r/f%              used           delay (ns)
Proposed design:            128            200A   62%               51.3%          13.35
64-bit auxiliary shifter    256            200A   64%               51.5%          27.77
                            512            200A   64%               52.6%          63.71
                            1024           200A   65%               56.7%          120.18
Proposed design:            256            400A   91%               57.1%          17.42
128-bit auxiliary shifter   512            400A   92%               56.2%          39.79
                            1024           400A   93%               57.2%          72.96
                            2048           400A   93%               58.3%          163.40
 
 
 
 
Chapter 8 
Alleviating Performance Overhead Due to Assertions
 
The insertion of the assertion checkers increases the design transparency and 
controllability during the simulation phase. However, it also leads to a logic resource 
overhead and an increase in delay. To alleviate the performance impact, a novel 
approach to assertion-based FPGA design is proposed. 
The shifter design is implemented with the required assertion modules. However, 
the assertion modules are partitioned into a separate logical block on the FPGA 
floorplan. The test data is routed to the assertion modules’ block where it is checked for 
the corresponding constraints and errors.  
The proposed approach permits the creation of the shifter design with a 
placement similar to that of a non-assertion-based approach. The proposed design still uses 
assertion modules and thus allows the use of ABV to test it. When the verification 
process is complete, the FPGA is reconfigured to remove the assertion modules. 
However, the reconfiguration is performed without changing the placement of the shifter 
design. The FPGA is reconfigured only to remove the logical block containing the 
assertion modules. This approach results in a minimal reconfiguration runtime.  
The proposed design has the following characteristics:
1. When the design is first implemented with the assertion modules’ block, the total 
design delay is higher than that of a conventional assertion-based design. The 
partitioning of the assertion modules lengthens the data path that the test data 
travels, thus increasing the delay. 
2. The placement of the shifter design is relatively inefficient compared to a 
non-assertion-based shifter design. This is due to the addition of the assertion 
modules to the design, and it increases the design delay. 
3. The runtime required for reconfiguration is lower than that of the initial design 
placement and timing process. This is because the design placement of the shifter 
is preserved when the assertion modules are removed. 
8.1 Partitioning and component removal in Xilinx ISE [25]
 
Partitions in Xilinx ISE help optimize the implementation process by preserving 
unchanged implemented portions of the design. If the HDL, timing, physical constraints 
and implementation options of a Partition are unchanged, the implementation tools 
use a “copy-and-paste” process to guarantee that the implementation data for that 
Partition is preserved. 
The preservation level for a Partition can be set to one of the following:  
• Routing 
• Placement 
• Synthesis 
• Inherit 
The preservation level can be changed from the default level of “routing” for each 
partition. The amount of design data preserved will decrease as the preservation level is 
changed from “routing” to “placement” to “synthesis”.  
 
 
 
8.1.1 Routing 
“Routing” preserves the data of the partition through routing. This level of preservation 
gives the implementation tools the least amount of flexibility to meet the timing or 
implementation objectives, but provides the highest degree of preservation. The 
following implementation data is always preserved at the “routing” preservation level: 
1. Synthesized netlist and information 
2. Placement information 
3. Routing information 
 
8.1.2 Placement 
“Placement” preserves the data of the partition through placement. The following 
implementation data is always preserved for the “placement” preservation level: 
1. Synthesized netlist and information 
2. Placement information 
3. Some Routing information, possibly all routing information 
 
8.1.3 Synthesis 
“Synthesis” only preserves the synthesis netlist of the design. The following 
implementation data is always preserved at the “synthesis” preservation level:  
• Synthesized netlist and information 
• Some placement information, possibly all placement information 
• Some routing information, possibly all routing information 
 
 
 
8.1.4 Inherit 
The “inherit” preservation level sets the “preserve” attribute to the same level as the 
parent partition. The default setting for all child partitions is “inherit”. 
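The relationship between the preservation levels can be summarized in a small model. The sketch below is an assumption-laden illustration and not the Xilinx ISE interface: the class and function names are hypothetical, only the data listed above as always preserved is modelled, and the parent level is assumed to be already resolved.

```python
from enum import Enum


class Preserve(Enum):
    ROUTING = "routing"
    PLACEMENT = "placement"
    SYNTHESIS = "synthesis"
    INHERIT = "inherit"


# Data that is always preserved at each level (Sections 8.1.1 to 8.1.3).
GUARANTEED = {
    Preserve.ROUTING: {"netlist", "placement", "routing"},
    Preserve.PLACEMENT: {"netlist", "placement"},
    Preserve.SYNTHESIS: {"netlist"},
}


def effective_level(level, parent_level=Preserve.ROUTING):
    """Resolve "inherit" to the (already resolved) level of the parent partition."""
    return parent_level if level is Preserve.INHERIT else level


def guaranteed_data(level, parent_level=Preserve.ROUTING):
    """Return the implementation data guaranteed to be preserved."""
    return GUARANTEED[effective_level(level, parent_level)]


# A child partition left at "inherit" under a parent set to "placement".
print(sorted(guaranteed_data(Preserve.INHERIT, Preserve.PLACEMENT)))
```

The nesting of the three sets reflects the statement that the amount of preserved data decreases from "routing" to "placement" to "synthesis".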
8.2 Using Xilinx Partitions in the design under study 
The proposed approach partitions the assertion-based shifter design into two logical 
blocks. The shifter design is placed in one block while the inserted assertion modules are 
partitioned into the other block. The partitioning process is performed at the HDL level. 
The shifter design and the assertion modules are defined in separate entities and are 
interfaced through the test data that is to be monitored by the assertion checkers. This 
test data is passed to the assertion modules by the shifter design.  
This HDL design is implemented on the FPGA. Following this, the preservation 
level of the shifter design is set to “placement” and the assertion modules are removed. The 
“placement” preservation level preserves the placement of the shifter design when 
reconfiguring the FPGA. The performance characteristics of the reconfigured shifter 
design are then generated. 
8.3 Experimental results and analysis 
8.3.1 Assertion-based design 
The performance characteristics of the design described in Section 8.2 are tabulated in 
Table 8.1. Table 8.2 gives the percentage increase in the resource overhead and 
design delay of the proposed assertion-based design over the assertion-free design 
implemented in Section 6.5. The results show that the assertion-based design has an 
average increase of 14.8% in the number of slices utilized. Table 8.3 further indicates 
that the actual overhead slice usage of the design is an average of 7.5% higher than the 
estimated increase calculated in Section 7.9. 
The design delay in the assertion-based design also increases by an average of 
40%, and the actual design delay is an average of 18.3% higher than the delay 
estimated in Section 7.9. 
The variation in the performance from the mathematical estimation can be attributed 
to a number of factors, including: 
1. The reference datasheet utilizes identical assertion checkers arranged in series on 
a single FPGA. The checkers implemented are instances of the same HDL design. 
This may cause sharing of resources during implementation to optimize the 
design, which reduces the number of slices actually used. The 
assertion modules implemented in the assertion-based design are of different 
types and sizes, so the resource-sharing ability of this design is reduced. 
This leads to the skewing of the data. 
2. The proposed design partitions the assertion modules into a separate logic block. 
Due to this, the data to be checked travels a longer datapath to the assertion 
checker. Each test data to be monitored utilizes a different datapath. The 
comparatively inefficient routing of the signals leads to an increase in the longest 
path delay of the design, thus increasing the design delay.  
3. As the size of the design increases, the placement efficiency of the assertion 
modules decreases. This further degrades the performance. 
 
 
 
 
 
 
 
Table 8.1: Performance characteristics of various assertion-based designs

Shifter design              Shifter size   FPGA   I/O utilization   Slices   Total processing   Configuration
                            (bits)         type   r/f%              used     delay (ns)         time (s)
Proposed design:            128            200A   62%               61%      16.21              183
64-bit auxiliary shifter    256            200A   64%               66%      33.81              231
                            512            200A   64%               67%      73.22              280
                            1024           400A   65%               67%      152.96             379
Proposed design:            256            400A   91%               59%      21.47              212
128-bit auxiliary shifter   512            400A   92%               59%      41.87              511
                            1024           400A   93%               60%      90.77              627
                            2048           400A   93%               61%      180.16             1235
 
 
Table 8.2: Performance degradation in an assertion-based design compared to an assertion-free design

Shifter design              Shifter size   I/O utilization   Slices   Total processing
                            (bits)         r/f%              used     delay (ns)
Proposed design:            128            6%                17%      41%
64-bit auxiliary shifter    256            8%                22%      45%
                            512            7%                22%      36%
                            1024           8%                18%      55%
Proposed design:            256            5%                9%       42%
128-bit auxiliary shifter   512            6%                10%      21%
                            1024           6%                10%      48%
                            2048           6%                10%      30%
 
 
Table 8.3: Percentage error in estimating the performance metrics of an assertion-based design

Shifter design              Shifter size   Slices   Total processing
                            (bits)         used     delay (ns)
Proposed design:            128            10%      21%
64-bit auxiliary shifter    256            14%      22%
                            512            14%      15%
                            1024           11%      27%
Proposed design:            256            2%       23%
128-bit auxiliary shifter   512            3%       5%
                            1024           3%       24%
                            2048           3%       10%
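The delay column of Table 8.3 can be reproduced from the estimated delays in Table 7.4 and the measured delays in Table 8.1. The sketch assumes that the error is expressed relative to the estimate, that is, (measured - estimated) / estimated. This formula is not stated in the text, but it matches the tabulated values after rounding.

```python
# Estimated delay (Table 7.4) and measured delay (Table 8.1), in ns, listed
# by shifter size for the 64-bit and then the 128-bit embedded logic.
estimated = [13.35, 27.77, 63.71, 120.18, 17.42, 39.79, 72.96, 163.40]
measured = [16.21, 33.81, 73.22, 152.96, 21.47, 41.87, 90.77, 180.16]


def percent_error(est, actual):
    """Amount by which the measured value exceeds the estimate, in percent of the estimate."""
    return 100.0 * (actual - est) / est


errors = [percent_error(e, m) for e, m in zip(estimated, measured)]
print([round(x) for x in errors])  # [21, 22, 15, 27, 23, 5, 24, 10]
```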
 
 
 
 
8.3.2 Re-configured assertion-free design 
The assertion checkers can be removed from the design once the verification process is 
completed. This is performed by reconfiguring the FPGA to remove the assertion 
modules while preserving the placement of the shifter design. The performance gains 
from the reconfiguration are shown in Table 8.4. 
The removal of the assertion modules results in a nearly constant reduction of about 8% in the 
slices used for the shifters implemented using 64-bit and 128-bit embedded logic. The 
reconfiguration also leads to a reduction of up to 32% in the total processing delay, or 
about 23% on average over the designs in Table 8.4. Table 8.4 also 
shows the reconfiguration runtime required to remove the assertion checkers and 
reconfigure the FPGA. The use of Partitions and the “Placement” preservation level for 
the embedded shifter limits the average reconfiguration runtime to a modest 31% of the 
initial configuration time. 
The reconfigured design is slightly less efficient and slower than the assertion-
free design implemented in Section 6.5. However, the use of assertions significantly 
decreases the time required for verification. This ultimately leads to a shorter time to 
market of the design. 
Table 8.4: Performance gain through assertion removal

Shifter design              Shifter size   Slices        Processing delay   Reconfiguration
                            (bits)         reduction %   reduction %        time %
Proposed design:            128            8%            17.4%              30.6%
64-bit auxiliary shifter    256            8%            24.0%              31.2%
                            512            8%            22.2%              32.9%
                            1024           8%            30.5%              31.1%
Proposed design:            256            7%            24.7%              32.1%
128-bit auxiliary shifter   512            8%            13.8%              31.3%
                            1024           8%            32.2%              33.0%
                            2048           9%            17.9%              28.7%
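The reconfiguration times in Table 8.4 are given as a percentage of the initial configuration time. The sketch below converts them to seconds using the configuration times in Table 8.1 and computes the average percentage. The absolute runtimes are derived values, not figures reported in the thesis.

```python
# Initial configuration time in seconds (Table 8.1) and reconfiguration time
# as a percentage of it (Table 8.4), in the same row order as the tables.
config_time_s = [183, 231, 280, 379, 212, 511, 627, 1235]
reconfig_pct = [30.6, 31.2, 32.9, 31.1, 32.1, 31.3, 33.0, 28.7]

# Derived reconfiguration runtime in seconds for each design.
reconfig_time_s = [t * p / 100 for t, p in zip(config_time_s, reconfig_pct)]

# Unweighted mean of the reconfiguration percentages.
mean_pct = sum(reconfig_pct) / len(reconfig_pct)

for seconds in reconfig_time_s:
    print(f"{seconds:.0f} s")
print(f"average reconfiguration time: {mean_pct:.1f}% of the initial configuration time")
```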
 
 
 
The choice of approach for the design implementation is based on the design 
requirements. A design that has a short time-to-market can be implemented using 
assertion-based design, which reduces the time needed for design verification. However, 
when the design performance is the primary constraint, the use of other verification 
methodologies may be preferable. 
 
 
 
Chapter 9 
Conclusion and Future Work 
9.1 Conclusion 
Conventional shifter designs implemented on an FPGA are limited in size by I/O pin 
constraints. A novel approach has been proposed in this thesis that alleviates this major 
bottleneck and allows the implementation of large shifters. It uses a hybrid combinational and 
sequential design with embedded logic of varying complexity. The algorithm has been 
translated into an HDL design which has been optimized for use on FPGAs. The design is 
implemented on the Spartan-3A FPGAs. The proposed design is suitable for real-time 
applications due to improved performance, low chip area utilization, and fewer I/O pins. The 
proposed approach can be seamlessly extended for multi-chip FPGA designs. 
The use of OVL assertion checkers in the design is proposed to allow ease of verification. 
Assertion checkers are placed at strategic points in the design, where they act as constraints 
and improve the transparency of the design. The use of these assertion checkers leads to an 
increase in the logic resources utilized and the total delay of the design. These problems are addressed 
through the use of a novel approach to assertion-based design. The proposed approach reduces 
the resources utilized and the performance delay. The low reconfiguration runtime of the 
approach allows its implementation in practical scenarios. 
9.2 Future work 
The objective of the proposed design was to alleviate the I/O bottleneck of FPGA-based design 
with a minimal degradation in performance. Further research work on this topic would focus on 
the modification of the design to improve the design performance. Improving the HDL design of 
the proposed approach would be the first step to this optimization. The largest logical block in 
the HDL design is the ROM implemented in the combinational logic of the shifter. An optimized 
version of this design would implement this logical block in a more efficient manner.  
The efficiency of the placement and routing of any design depends on the skill of the 
designer. Apart from the use of a faster FPGA, a hardware designer skilled in placement 
methodologies could implement the design with more favorable performance results. The use of 
newer software versions and more efficient placement algorithms can also lead to better 
performance of the design. 
 Assertion-based verification is still in the early stages of development. While the use of 
assertions is not a recent idea, the practical use of assertions in hardware verification to improve 
productivity has gained importance only in recent years. The creation of assertion standards has 
accelerated the use of assertions amongst the verification community.  
The thesis discussed the placement of assertion checkers in the design and the resource 
and performance overhead due to the insertion of assertion checkers. The next logical step to 
further the research is to document the verification process of the design through a combination 
of formal verification and assertion based verification. The formal verification of the design 
requires the development of a mathematical process to formally verify the design.  
The assertions introduced in the design act as constraints or provide a monitor to 
observe the functioning of an internal component of the design. The number of assertions used 
is dependent on the final objective of the exercise. The use of a large number of assertions 
increases the transparency of the design, but leads to a large number of outputs. The design 
would then require an I/O optimization for the use of these assertions. 
 
 
 
References 
[1] R. S. Lim, “A Barrel Switch Design,” Computer Design, pp. 76-78, August 1972. 
[2] M. R. Pillmeier, “Barrel Shifter Design, Optimization, and Analysis,” Master’s thesis, 
Lehigh University, January 2002. 
[3] K. P. Acken, M. J. Irwin, and R. M. Owens, “Power Comparisons for Barrel Shifters,” 
Proc. of the Int. Symp. on Low Power Electronics and Design, pp. 209-212, 1996. 
[4] G. M. Tharakan and S. M. Kang, “A New Design of a Fast Barrel Switch Network,” IEEE 
Journal of Solid-State Circuits, vol. 28, pp. 217-221, February 1992. 
[5] S.-J. Yih, M. Cheng, and W.-S. Feng, “Multilevel Barrel Shifter for CORDIC Design,” 
Electronics Letters, vol. 32, pp. 1178-1179, June 1996. 
[6] P. A. Beerel, S. Kim, P.-C. Yeh, and K. Kim, “Statistically Optimized Asynchronous Barrel 
Shifters for Variable Length Codecs,” Proc. of the Int. Symp. on Low Power Electronics 
and Design, pp. 261-263, 1999. 
[7] M. R. Pillmeier, M. J. Schulte, and E. G. Walters III, “Design Alternatives for Barrel 
Shifters,” Proc. of SPIE: Advanced Signal Processing Algorithms, Architectures, and 
Implementations, vol. 4791, pp. 436-447, Seattle, Washington, July 2002. 
[8] H. D. Foster, A. C. Krolnik, and D. J. Lacey, Assertion-Based Design, 2nd ed. Kluwer 
Academic Publishers, 2003. 
[9] K. C. Chen, “Assertion-based verification for SoC designs,” Proc. of the 5th International 
ASIC Conference, vol. 1, pp. 12-15, October 2003.  
[10] B. Meyer, “Applying design by contract,” IEEE Computer, vol. 25, pp. 40-51, 1992. 
[11] P. Yeung, K. Larsen, “Practical Assertion-based Formal Verification for SoC Designs,” 
Proc. International Symposium on System-on-Chip, pp. 58-61, 15-17 November 2005. 
[12] J. M. Voas, K. W. Miller, “Putting assertions in their place,” Proc. of International 
Symposium on Software Reliability Engineering, pp. 152-157, 1994. 
[13] M. Riazati, S. Mohammadi, A. Afzali-Kusha, and Z. Navabi, “Improved Assertion 
Lifetime via Assertion-Based Testing Methodology,” International Conference on 
Microelectronics, pp. 48-51, 16-19 December 2006. 
[14] Novas Software Inc., “SystemVerilog: An EDA Vendor Perspective,” The 12th EDA 
Interoperability Developers’ Forum, October 2003. 
[15] J. Bhasker, A VHDL Primer, 3rd ed. Pearson Education, 2005. 
[16] W. Wolf, FPGA-Based System Design, 1st ed. Prentice Hall, 2004. 
 
 
 
 
[17] S. Brown, J. Rose, “Architecture of FPGAs and CPLDs: A Tutorial,” IEEE Design and 
Test of Computers, vol. 13, no. 2, pp. 42-57, 1996. 
[18] L. Berger, A. Greiner, and E. P. Lopes, “A Consistent Approach in Logic Synthesis for 
FPGA Architectures,” Proceedings of the International Conference on ASIC, pp. 104-107, 
Pekin, October 1994. 
[19] L. Scheffer, L. Lavagno, Electronic Design Automation For Integrated Circuits 
Handbook, 1st ed. CRC Press, 2006. 
[20] H. Foster, K. Larsen, M. Turpin, “Introduction to the New Accellera Open Verification 
Library.” [Online]. Available: http://www.eda.org/ovl/pages/pdfs/dvcon06_foster.pdf  
[21] R. Stolzman, “Understanding Assertion-Based Verification.” [Online]. Available: 
http://www.edadesignline.com/showArticle.jhtml?articleID=192200468  
[22] J. Horgan, “Assertion Based Verification.” [Online]. Available: 
http://www10.edacafe.com/nbc/articles/view_weekly.php?articleid=209195&page_no=1 
[23] Accellera Organization Inc. [Online]. Available:  http://www.accellera.org/ 
 
[24] Accellera Organization Inc., “Accellera Standard OVL v2.0 – Library Reference Manual,” 
June 2007. 
 
[25] C. Zeh, “Incremental Design Reuse with Partitions” [Online]. Available: 
http://www.xilinx.com/support/documentation/application_notes/xapp918.pdf 
 
[26] A.Cosoroaba, F. Rivoallon, “Achieving Higher System Performance with the Virtex-5 
Family of FPGAs.” [Online]. Available: 
http://www.xilinx.com/bvdocs/whitepapers/wp245.pdf 
[27] Xilinx, Inc., [Online]. Available: http://www.xilinx.com/ 
[28] Altera Corp., [Online]. Available: http://www.altera.com/ 
[29] Lattice Semiconductor Corp., [Online]. Available: http://www.latticesemi.com/ 
[30] Actel Corp., [Online]. Available: http://www.actel.com/ 
[31] Atmel Corp., [Online]. Available: http://www.atmel.com/ 
[32] Z. A. Syed, A. Noore, “Performance Optimization to Alleviate I/O Constraints in 
Designing Large FPGA Shifters,” IEICE Electronics Express, vol. 5, no. 1, pp. 29-34, 
2008. 
 
 
 
 
Appendix 
Appendix A 
A.1 The VHDL OVL library 
The accellera_ovl_hdl [24] library contains VHDL implementations of the OVL 
checkers. Version 2.1 of the VHDL OVL library, released in September 2007, implements 10 of the 
OVL checkers most commonly used in verification. The remaining 21 checkers will be 
addressed in later versions of the VHDL OVL library. The functions of the 10 VHDL OVL 
checkers are described in Table A.1. The VHDL OVL components are compatible with the 
Verilog OVL versions. However, the VHDL components include an additional generic 
named controls that provides global configuration of the library. The VHDL OVL library 
is synthesizable and its components support both std_logic/std_logic_vector and 
std_ulogic/std_ulogic_vector port types. 
A.1.1 The ovl_ctrl_record type 
The global library configuration is controlled by an ovl_ctrl_record constant assigned to 
the controls generic on every checker instance. This constant is defined in every design 
package so that the global variables can be controlled from a single place. 
The ovl_ctrl_record type is divided into three groups: 
1. Elements that are of the ovl_ctrl type and can be assigned OVL_ON or OVL_OFF 
values. These elements mainly control the generate statements used in the 
checkers. 
 
 
 
2. User-configurable values that control the message printing and how long the 
simulation should continue after a fatal assertion occurs. 
3. Default values of the generics that are common to all checkers. 
The ovl_ctrl_record type variable that is used to control the design package for the 
design under study is given in the Appendix. 
Table A.1: Description of the VHDL OVL checkers [24]

Assertion statement       Description
ovl_always                Checks the single-bit expression test_expr at each rising
                          edge of clk to verify whether it evaluates to TRUE.
ovl_cycle_sequence        Checks the expression event_sequence at the rising edges
                          of clk to identify whether or not the bits in event_sequence
                          assert sequentially on successive rising edges of clk.
ovl_never                 Checks the single-bit expression test_expr at each rising
                          edge of clk to verify the expression does not evaluate to
                          TRUE.
ovl_one_hot               Checks the expression test_expr at each rising edge of clk
                          to verify the expression evaluates to a one-hot value. A
                          one-hot value has exactly one bit set to 1.
ovl_range                 Checks the expression test_expr at each rising edge of clk
                          to verify the expression falls in the range from min to max,
                          inclusive. The assertion fails if test_expr < min or max <
                          test_expr.
ovl_implication           If antecedent_expr holds then consequent_expr must hold
                          in the same cycle.
ovl_never_unknown         test_expr must never be an unknown value, just boolean 0
                          or 1.
ovl_never_unknown_async   test_expr must never be an unknown value
                          asynchronously, it must remain boolean 0 or 1.
ovl_next                  test_expr must hold num_cks cycles after start_event
                          holds.
ovl_zero_one_hot          test_expr must be one-hot or zero, i.e. at most one bit set
                          high.

A.1.2 Synthesizing the VHDL OVL library
All the code in the OVL VHDL library is completely synthesizable except the path_name 
attribute in the architectures and the std_ovl_procs.vhd file. This issue is fixed by 
modifying the architecture bodies to set the path string constants to "" and using the 
std_ovl_procs_syn.vhd file in the std_ovl directory. 
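Several of the conditions listed in Table A.1 reduce to simple predicates on the sampled value. The following Python sketch models four of them for non-negative integer test values. The function names are hypothetical, and the code illustrates the checked conditions only, not the VHDL implementation in the OVL library.

```python
def is_one_hot(value):
    """Exactly one bit set to 1 (the condition checked by ovl_one_hot)."""
    return value != 0 and (value & (value - 1)) == 0


def is_zero_one_hot(value):
    """At most one bit set to 1 (the condition checked by ovl_zero_one_hot)."""
    return (value & (value - 1)) == 0


def in_range(value, minimum, maximum):
    """min <= test_expr <= max, inclusive (the condition checked by ovl_range)."""
    return minimum <= value <= maximum


def implication(antecedent, consequent):
    """If the antecedent holds, the consequent must hold in the same cycle."""
    return (not antecedent) or consequent


print(is_one_hot(0b0100), is_one_hot(0b0110), is_zero_one_hot(0))
```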
  
 
 
 
Appendix B 
 
VHDL OVL Checker Performance Data 
This appendix lists the performance characteristics of the VHDL OVL checkers. The data is 
generated using the procedure described in Section 7.7. The tables list the resource utilization 
and delay characteristics of each checker. The size of the test data varies from 1 bit to 128 bits for 
each checker. To calculate an average value, 4, 8, and 16 checkers are implemented for every test 
data width. 
Assert never unknown - Slices Used
Number of    Test Data Width (bits)
checkers     1      4      16     64     128
4            5      14     56     242    486
8            9      29     118    510    946
16           14     62     213    1014   1920

Assert always - Slices Used
Number of    Test Data Width (bits)
checkers     1      4      16     64     128
4            3      11     43     157    335
8            5      17     94     298    641
16           10     37     156    514    1302

Assert never unknown - Total delay (ns)
Number of    Test Data Width (bits)
checkers     1      4      16     64     128
4            0.29   0.66   0.98   0.86   1.21
8            0.57   1.44   1.66   1.19   1.59
16           1.25   2.30   2.21   1.63   2.19

Assert always - Total delay (ns)
Number of    Test Data Width (bits)
checkers     1      4      16     64     128
4            0.95   0.73   0.91   1.20   2.24
8            1.66   1.55   1.52   1.59   2.50
16           1.70   1.53   2.02   2.61   2.54

Assert range - Slices Used
Number of    Test Data Width (bits)
checkers     1      4      16     64     128
4            10     35     134    542    1002
8            19     78     273    1031   2046
16           39     139    522    2079   4102

Assert cycle sequence - Slices Used
Number of    Test Data Width (bits)
checkers     1      4      16     64     128
4            6      29     145    401    870
8            11     63     413    914    1610
16           25     136    846    1649   3132

Assert range - Total delay (ns)
Number of    Test Data Width (bits)
checkers     1      4      16     64     128
4            0.40   0.72   0.93   1.46   1.94
8            1.90   1.62   2.56   1.79   2.78
16           2.10   2.82   2.06   2.27   2.01

Assert cycle sequence - Total delay (ns)
Number of    Test Data Width (bits)
checkers     1      4      16     64     128
4            1.49   0.83   0.80   1.83   1.72
8            1.62   1.36   1.08   1.70   2.34
16           1.83   2.25   2.46   2.31   2.70
 
  
 
 
 
Assert next - Slices Used
Number of    Test Data Width (bits)
checkers     1      4      16     64     128
4            13     52     195    772    1540
8            22     102    372    1488   3055
16           45     199    734    3013   6202

Assert never - Slices Used
Number of    Test Data Width (bits)
checkers     1      4      16     64     128
4            5      13     53     210    492
8            8      31     109    489    889
16           13     55     202    997    1834

Assert next - Total delay (ns)
Number of    Test Data Width (bits)
checkers     1      4      16     64     128
4            1.14   1.05   1.76   2.41   2.58
8            1.11   1.68   2.93   2.26   3.09
16           2.58   2.15   2.29   2.96   3.93

Assert never - Total delay (ns)
Number of    Test Data Width (bits)
checkers     1      4      16     64     128
4            0.75   1.24   1.21   1.44   1.41
8            0.55   0.96   1.73   2.20   2.13
16           1.65   1.32   2.36   2.71   2.98

Assert zero one hot - Slices Used
Number of    Test Data Width (bits)
checkers     1      4      16     64     128
4            8      35     157    438    931
8            12     64     457    922    1591
16           27     166    839    1576   2855

Assert one hot - Slices Used
Number of    Test Data Width (bits)
checkers     1      4      16     64     128
4            7      28     149    422    910
8            10     57     434    902    1553
16           22     143    823    1567   2831

Assert zero one hot - Total delay (ns)
Number of    Test Data Width (bits)
checkers     1      4      16     64     128
4            1.28   1.17   1.16   1.90   1.99
8            1.62   1.85   1.31   2.13   2.37
16           1.67   1.80   2.63   2.81   2.66

Assert one hot - Total delay (ns)
Number of    Test Data Width (bits)
checkers     1      4      16     64     128
4            0.21   0.33   1.18   2.16   1.47
8            1.10   1.60   1.66   2.97   2.50
16           1.59   1.98   2.22   2.60   3.66

Assert implication - Slices Used
Number of    Test Data Width (bits)
checkers     1      4      16     64     128
4            15     37     192    712    1376
8            20     66     344    1213   2668
16           36     187    751    2787   4532

Assert never unknown async - Slices Used
Number of    Test Data Width (bits)
checkers     1      4      16     64     128
4            5      14     56     242    486
8            9      29     118    510    946
16           14     62     213    1014   1920

Assert implication - Total delay (ns)
Number of    Test Data Width (bits)
checkers     1      4      16     64     128
4            0.86   1.38   1.65   2.00   2.63
8            1.13   1.74   2.23   2.76   2.95
16           1.77   1.93   2.64   3.01   3.70

Assert never unknown async - Total delay (ns)
Number of    Test Data Width (bits)
checkers     1      4      16     64     128
4            0.68   1.22   1.41   1.55   1.73
8            0.48   1.41   2.80   2.44   2.03
16           1.07   1.65   2.96   2.33   2.61
 
