Configurable Floating-Point Unit for the SHMAC Platform by Indergaard, Audun Lie
Configurable Floating-Point Unit for the 
SHMAC Platform
Audun Lie Indergaard
Electronics System Design and Innovation
Supervisor: Per Gunnar Kjeldsberg, IET
Department of Electronics and Telecommunications
Submission date: June 2014
Norwegian University of Science and Technology
 
Problem Description
The Single-ISA Heterogeneous MAny-core Computer (SHMAC) is an ongo-
ing research project within the Energy Efficient Computing Systems (EECS)
strategic research area at NTNU. SHMAC is planned to run in an FPGA
and be an evaluation platform for research on heterogeneous multi-core sys-
tems. Due to battery limitations and the so-called Dark silicon effect, future
computing systems in all performance ranges are expected to be power lim-
ited. The goal of the SHMAC project is to propose software and hardware
solutions for future power-limited heterogeneous systems.
The standard SHMAC processing tile only supports fixed-point calculations. For efficient programming, floating-point is preferred, but in the general case this comes with a performance, energy and area overhead. For many applications it is not necessary to have a floating-point unit (FPU) that follows the complete IEEE standard, and the overhead can then be reduced significantly. A possible trade-off would be a configurable FPU, e.g., with respect to bit width and exception handling.
The main parts of this assignment are as follows:
• Study the IEEE floating-point standard and its implementation.
• Study application specific FPU implementations and in particular any
configurable FPU implementations found in the literature.
• Implement a simple FPU for use on the SHMAC platform and test this
for selected software applications.
• Study the requirements of the selected software applications and look
for FPU optimization possibilities.
• Implement one or more application specific and/or configurable FPUs.
• Evaluate performance and energy gains achieved as well as area re-
sults. If time allows, also compare with fixed-point implementations of
the software applications.
Assignment given: January 15th 2014
Supervisor: Per Gunnar Kjeldsberg
Abstract
The use of floating-point hardware in FPGAs has long been considered infeasible or related to use in expensive devices and platforms. However, floating-point operations are crucial for many scientific computations, and for efficient programming floating-point is preferred. The IEEE Standard 754 for floating-point arithmetic provides a method that will yield the same results whether the processing is done in hardware, software or a combination of the two. However, the scope of this standard is much more comprehensive than what is needed for many systems and can cause a lot of overhead. This thesis presents ways to lower the power consumption, area usage and latency by using a configurable floating-point unit (FPU) with variable bit-width.
There is a linear relation between the bit-width of floating-point numbers and the dynamic power consumption, while there is an exponential relation between the bit-width and area consumption. If only a limited range and precision are needed, using a tailored FPU design can reduce the area and dynamic power consumption by up to 96%. Choosing the right FPU can also reduce the number of clock cycles per operation by up to 98%. For the applications analyzed, at most 33% of the bits in the floating-point numbers are unnecessary, and removing these leads to great performance and area gains. By analyzing how frequently the different operations are used in applications, some floating-point operations can be emulated in software and greater area and power savings can be accomplished.
Summary (Sammendrag)
Floating-point units in FPGAs have long been considered impractical and associated with use in expensive devices and platforms. Nevertheless, floating-point calculations are necessary for many scientific computations, and they make it easier to program software for the device. The IEEE Standard 754 for floating-point arithmetic specifies methods that give correct results whether the design is made in hardware, software or a combination of the two. However, the scope of the standard results in a lot of extra combinational logic that is unnecessary for many systems. This report presents ways to reduce the power consumption, area and latency of the system by using a configurable floating-point unit with variable bit-width.
There is a linear relation between the total number of bits used for the floating-point number and the resulting dynamic power consumption, while there is an exponential relation between the bit-width and the area. If only a limited bit-width and precision are needed, a tailored floating-point unit can reduce the area and the dynamic power consumption by up to 96%. With the right choice of FPU, the number of clock cycles for arithmetic operations can be reduced by as much as 98%. For the analyzed applications, at most 33% of the bit-width of the floating-point numbers is unnecessary, and removing these bits can improve performance and area. By analyzing how frequently the different operations occur in the applications, parts of the floating-point unit can be emulated in software, and a smaller area and lower power consumption can be achieved.
Preface
This thesis was written at the Norwegian University of Science and Technology and is part of the SHMAC research project initiated by EECS, which aims to investigate the challenges posed by heterogeneous computing systems. It was written during the spring of 2014, and the topic was chosen because optimization problems are challenging and a lot of research is done on the subject.
To approach this thesis, I studied the IEEE 754 Standard and soon realized that this standard is much more comprehensive than what is needed for many systems. After studying articles on the subject, I discovered that the bit-width of floating-point numbers has a big influence on the area and power consumption. This became my primary focus. By analyzing software, I found that the different arithmetic operations are used at significantly different rates. Therefore, I explored the option of implementing some floating-point arithmetic in software.
Next, different floating-point units were explored. The floating-point unit from Xilinx and the floating-point library both support variable bit-width. However, since the floating-point library is big and complex, I decided to design my own floating-point unit. I quickly discovered that the workload was bigger than expected, so only the adder, subtractor and multiplier were prioritized.
My next challenge was to implement an FPU on the SHMAC platform. However, the Amber core, which is implemented on the SHMAC platform, does not have hardware floating-point support. Another student is working on this task, but as of today it is not yet implemented. To be independent of this problem, I decided to use the OpenRISC core, which does have hardware floating-point support. I then needed a Xilinx FPGA on which to implement the processing core. However, none of the departments at the IME faculty had an available Xilinx FPGA that was big enough to hold the OpenRISC. The only option was the ZedBoard, which uses the Xilinx Zynq. I found an implementation of the OpenRISC for this system, but unfortunately many of the vital functions had been removed. Other open source processing cores were considered, but none of these included the functionality I was looking for. As a result, I decided to abandon the implementation of the FPU on a processing core and focus more on testing and analysis.
I would like to thank Per Gunnar Kjeldsberg and others working with the
SHMAC project for help, feedback and guidance on this project.
Audun Lie Indergaard
June 11th 2014
Contents
1 Introduction
2 Background
2.1 IEEE Standard 754 for Floating-Point Arithmetic
2.1.1 Formats
2.1.2 Conversion to Binary Format
2.1.3 Exceptions
2.2 Arithmetic on Floating-Point Numbers
2.2.1 Addition and Subtraction
2.2.2 Multiplication
2.2.3 Division
2.3 Fixed-Point
2.4 Asynchronous Design
2.5 Floating-Point and Fixed-Point Unit Design
2.5.1 LogiCORE IP Floating-Point Operator
2.5.2 OpenCores Single Precision Floating-Point Unit
2.5.3 OpenCores Double Precision Floating-Point Unit
2.5.4 Floating-Point Library
2.5.5 Fixed-Point Library
2.5.6 SoftFloat
2.6 Processing Cores
2.6.1 Amber 2 Core
2.6.2 OpenRISC
2.7 FPGAs and Tools
2.7.1 Xilinx Virtex-5 XC5VLX330 FPGA
2.7.2 Xilinx Spartan-6 LX16
2.7.3 Xilinx Zynq-7000
2.7.4 Tools
2.8 Testbenches
3 Related Work
3.1 Bit-Width Optimisation for Fixed- and Floating-Point
3.2 Minimizing Floating-Point Power Dissipation via Bit-Width Reduction
4 Design Space Exploration
4.1 Floating-Point Standard
4.2 Floating-Point Implementation
4.3 Processing Core
4.4 FPGA
4.5 Testbenches
5 Implementation and Results
5.1 Test Plan
5.1.1 Functionality
5.1.2 Performance
5.2 Design and Performance
5.2.1 LogiCORE IP Floating-Point Operator
5.2.2 OpenCores Single Precision Floating-Point Unit
5.2.3 OpenCores Double Precision Floating-Point Unit
5.2.4 Floating-Point Library
5.2.5 Fixed-Point Library
5.2.6 Configurable Floating-Point Arithmetic Design
5.3 Precision and Range in Testbenches
6 Discussion
6.1 Size and Latency Analysis
6.2 Power Analysis
6.3 Optimization for Software
7 Conclusion
Bibliography
Appendices
A Matlab Code
A.1 Floating-Point Unit Testbench Generator
A.2 Floating-Point Unit Testbench Check
A.3 Calculate Value of Floating-Point Numbers
B HDL Code
B.1 Floating-Point Design
B.1.1 Top-Level Design for Xilinx IP
B.1.2 Adder and Subtracter
B.1.3 Multiplier
B.1.4 Top-Level Design for Floating-Point Library
B.1.5 Top-Level Design for Fixed-Point Library
B.1.6 Configurable Adder and Subtractor
B.1.7 Configurable Multiplier
B.2 Floating-Point Unit Testbench
C Diagrams
D Calculations
D.1 Calculated Mantissa Bit-Width for Floating-Point Numbers
D.2 Calculated Fraction Bit-Width for Fixed-Point Numbers
E File Hierarchy
List of Figures
1.1 A Heterogeneous Many-Core System [2].
1.2 High-Level Architecture of ARM-Based Single-ISA Heterogeneous MAny-Core Computer (SHMAC) [4].
2.1 Single Precision Floating-Point Number Representation [8].
2.2 Bit Order with Fixed-Point Representation [12].
2.3 Block Diagram of Generic Floating-Point Binary Operator Core from Xilinx [12].
2.4 Hierarchy of Double Precision Floating-Point Core [14].
2.5 Amber Tile on SHMAC Platform [22].
2.6 OpenRISC 1200 Core Architecture [23].
2.7 Spartan-6 LX16 Evaluation Board Block Diagram [27].
2.8 ZedBoard Block Diagram [27].
3.1 Accuracy Compared with Various Exponent and Mantissa Bit-Widths [40].
3.2 Energy and Latency per Operation for Different Operand Bit-Widths [40].
5.1 Diagram of Top-Level Design for Floating-Point Implementation.
5.2 Flow Diagram from Input is Present to Output is Produced.
5.3 Flow Chart of Addition and Subtraction with Floating-Point Numbers.
5.4 Flow Chart of Multiplication with Floating-Point Numbers.
6.1 Number of LUTs for Different Xilinx Floating-Point Unit Designs.
6.2 Dynamic Power Consumption with Different Xilinx Floating-Point Unit Designs.
6.3 Cake Diagram of the Differences Between the Xilinx IP Arithmetic for Different Bit-Width with no DSP Usage.
C.1 Diagram of Floating-Point Implementation.
List of Tables
2.1 Parameters for Binary Floating-Point Formats [7].
2.2 Bit-Width of Result for Different Operations [16].
2.3 Latency and Area Usage for Single Precision Floating-Point in SoftFloat on the SHMAC Platform [17].
2.4 Number of Floating-Point Operations in Different Testbenches [35].
3.1 Calculated Range for Floating-Point Numbers with Different Exponent Bit-Widths (e)
3.2 Calculated Range for Fixed-Point Numbers with Different Integer Bit-Width (k)
5.1 Latency, Size and Power Consumption for Xilinx IP
5.2 Latency, Size and Power Consumption for OpenCores Single Precision Floating-Point Unit
5.3 Latency and Size for OpenCores Double Precision Floating-Point Unit
5.4 Latency, Size and Power Consumption for Floating-Point Library
5.5 Latency, Size and Power Consumption for Fixed-Point Library
5.6 Latency, Size and Power Consumption for Configurable Floating-Point Unit
5.7 Range and Precision Analysis of Benchmarks
6.1 Size and Latency for Different Floating-Point Units
6.2 Power Consumption for Different Floating-Point Units
List of Acronyms
SHMAC Single-ISA Heterogeneous MAny-core Computer
ISA Instruction Set Architecture
APB Advanced Peripheral Bus
FPGA Field Programmable Gate Array
IP Intellectual Property
FPU Floating-Point Unit
FP Floating-Point
ASIC Application-Specific Integrated Circuit
NaN Not a Number
RISC Reduced Instruction Set Computing
DDR Double Data Rate
MMU Memory Management Unit
RTL Register-Transfer Level
DSP Digital Signal Processor
XST Xilinx Synthesis Technology
LUT LookUp Table
SoC System on Chip
SPEC Standard Performance Evaluation Corporation
CMU Communication Management Unit
NIST National Institute of Standards and Technology
HDL Hardware Description Language
VHDL Very High Speed Integrated Circuit HDL
DPC Dynamic Power Consumption
IO Input/Output
Chapter 1
Introduction
In the roughly 65 years since the first general-purpose electronic computer was created, computer technology has made incredible progress. According to Moore's law, the number of transistors on an integrated circuit doubles approximately every two years, and this will, according to Pollack's rule, enable a new microarchitecture that delivers a 40% performance increase [1]. The rapid growth in microprocessor performance has been enabled by three key technology drivers: transistor-speed scaling, core microarchitecture techniques and cache memories [2]. In a Dennard scaling process the dimensions of the transistors are reduced, while the electric fields are held constant to maintain reliability. As transistors scale down, the supply voltage and threshold voltage scale with them to keep the electric field constant. Since a transistor is not a perfect switch, the leakage current when the transistor is off increases exponentially as the threshold voltage is reduced. As a result, transistor leakage becomes a substantial portion of the power consumption, and transistors can no longer be scaled to increase performance. This, among other aspects, causes the Dark Silicon effect: only a portion of the die can be used simultaneously if the power budget is to be sustained [3].
Figure 1.1: A Heterogeneous Many-Core System [2].
Borkar et al. [2] predict that heterogeneous processors, consisting of a number of large cores for single-thread performance and many small cores for throughput performance, will better utilize the power budget. Figure 1.1 shows a heterogeneous many-core system. Many small cores operating at a low frequency and at voltages near threshold will consume less power than large single-threaded cores. It is therefore important to schedule each task to the most suitable processor.
Figure 1.2: High-Level Architecture of ARM-Based Single-ISA Heteroge-
neous MAny-Core Computer (SHMAC) [4].
To investigate the different issues with heterogeneous systems, the Single-ISA Heterogeneous MAny-core Computer (SHMAC) platform is proposed. This system, as shown in Figure 1.2, is a tile-based architecture that supports the ARM Instruction Set Architecture (ISA) [4]. Each tile can contain either a processor, an Advanced Peripheral Bus (APB), a scratchpad, main memory or a dummy. The dummy tile only contains a router and is used to fill the remaining unused tiles.
The use of floating-point (FP) hardware in FPGAs has long been considered infeasible or related to use in expensive devices and platforms [5]. Fixed-point is an alternative to FP and is frequently used in many smaller hardware systems where decimal numbers are needed. However, fixed-point operations demand much more preparation and analysis to make sure that the precision and range of the calculations are sustained. Implementing a floating-point unit (FPU) on an FPGA consumes a large amount of resources and is power hungry. However, FP operations are crucial for many scientific applications such as graphics processing, physical simulation, mathematical computations and multimedia applications, and for efficient programming FP is preferred. In applications where FP is not frequently used, emulated FP operations are common. However, in FP-intensive applications, emulated FP operations can consume over 90% of the application's total clock cycles, which is unacceptable in most situations [6].
Most FPU designs support the IEEE Standard 754 for floating-point arithmetic, which covers, among other things, formats, operations, conversions and exceptions. In embedded systems the required precision and range of the FP numbers, and the operations that are needed, are often known. Using a full-size FPU supporting the IEEE standard may result in greater power consumption and a bigger part of the FPGA being used. However, using the standard makes it easier to adapt the FPU to different systems.
The SHMAC platform contains many different cores, and only the cores handling much arithmetic with decimal numbers need an FPU. However, the need for range and precision may differ, so implementing the same FPU on every core may result in poor utilization and larger power consumption. To overcome this problem a configurable FPU will be implemented. The advantages and disadvantages of this unit will be discussed along with ways to analyse software to find the needed range and precision.
This report will first, in Chapter 2, discuss the background information
needed to understand the decisions, implementations and evaluations done.
Chapter 3 contains some related research in this area. In Chapter 4 the opti-
mizations of the FPUs and how to test the system are determined. Chapter
5 contains the design of the systems and how well they perform. It also in-
cludes an analysis of a selection of benchmarks, and how to find the required
bit-width for these. Chapter 6 discusses the results and Chapter 7 concludes
this project and suggests future work.
Main Contributions:
• A configurable floating-point unit for Xilinx FPGAs.
• A configurable floating-point unit for ASIC and FPGAs from other
vendors.
• Power, area and latency analysis of different floating-point units
with different bit-widths.
• Example of software analysis to optimize floating-point units
Chapter 2
Background
This chapter includes the information needed to understand design choices
and analysis done later in this thesis. It includes theory about the IEEE
754 Standard for floating-point arithmetic, and the definition of fixed-point
numbers. Later, asynchronous design and different floating-point and fixed-
point designs are explored. Finally a selection of FPGAs, processing cores
and testbenches will be presented.
2.1 IEEE Standard 754 for Floating-Point Arithmetic
This standard specifies formats and methods for floating-point arithmetic in
computer systems [7]. The purpose of this standard is to provide a method
that will yield the same results whether the processing is done in hardware,
software, or a combination of the two. The standard includes the following
specifications:
• Formats for binary and decimal floating-point (FP) data, for compu-
tation and data interchange.
• Addition, subtraction, multiplication, division, fused multiply add, square
root, compare and other operations.
• Conversions between integer and FP formats.
• Conversions between different FP formats.
• Conversions between FP formats and external representations as char-
acter sequences.
• FP exceptions and their handling, including data that are not numbers
(NaN).
This section will discuss how a binary FP format is represented, briefly how to convert a decimal number to a binary FP format, and how exceptions are represented. The next section will discuss four FP arithmetic operations: addition, subtraction, multiplication and division. Other operations such as square root and comparisons will not be discussed because of the limited time available for this thesis.
2.1.1 Formats
The standard specifies five basic floating-point (FP) formats. There are three binary formats with encoding lengths of 32, 64 and 128 bits, and two decimal formats with encoding lengths of 64 and 128 bits. In this thesis only the binary formats will be considered, since binary numbers are naturally represented in hardware. The representation of FP data in a format consists of a sign, an exponent, a mantissa, and a radix b. An FP number is represented as in Equation 2.1, and a 32-bit floating-point number with eight exponent bits and 23 mantissa bits is shown in Figure 2.1. Later in this thesis, a 32-bit floating-point number may be referred to as single precision, while a 64-bit floating-point number is called double precision.
Figure 2.1: Single Precision Floating-Point Number Representation [8].
$$X = (-1)^{\text{sign}} \times b^{\text{exponent}} \times \text{mantissa} \;\Leftrightarrow\; X = (-1)^{S} \times b^{E} \times M \tag{2.1}$$
The S value can be either 0 or 1, which decides if the number is positive or negative. The b value is the radix and is 2 for binary formats. E is an integer limited by $E_{min}$ and $E_{max}$ and is represented with e bits. The $E_{max}$ values for three different binary formats are listed in Table 2.1 and can be calculated using the following equation: $E_{max} = 2^{e-1}$. The $E_{min}$ values for the same binary formats are also listed in Table 2.1 and are calculated as follows: $E_{min} = 1 - E_{max}$. These limits span every bit pattern of the exponent field, including the all-zeros and all-ones patterns that are used for denormalized numbers and for the special values in Section 2.1.3.

Table 2.1: Parameters for Binary Floating-Point Formats [7].
             Binary format (b = 2)
Parameter    binary32    binary64    binary128
e                   8          11           15
Emax             +128       +1024       +16384
Emin             -127       -1023       -16383
bias              127        1023        16383
m                  24          53          113

The formula for calculating the exponent value is shown in Equation 2.2 and the bias value can be calculated using Equation 2.3. The bias value is used to represent both positive and negative exponent values. As an example, if the exponent is the following sequence of bits: 01111110, then $E = 2^6 + 2^5 + 2^4 + 2^3 + 2^2 + 2^1 - (2^{8-1} - 1) = -1$.
$$E = \left(\sum_{n=0}^{e-1} b_n \times 2^{n}\right) - \text{bias} \tag{2.2}$$
$$\text{bias} = 2^{e-1} - 1 \tag{2.3}$$
The M value is a string of the form $b_0 2^{0} \cdot b_1 2^{-1}\, b_2 2^{-2} \ldots b_{m-1} 2^{-(m-1)}$, where $b_i$ is a binary digit and m is the number of mantissa bits. The number of mantissa bits for three formats is shown in Table 2.1. The mantissa value can be calculated in two ways, depending on whether the number is normalized or denormalized. An FP number is denormalized if the exponent value E equals the negative bias value, in other words, if all exponent bits are zero. The way to calculate the mantissa value is shown in Equation 2.4. This results in the mantissa being a number between one and two if normalized and between zero and two if denormalized.
$$M = \begin{cases} 1 + \sum_{n=1}^{m} \left(b_{n-1} \times 2^{-n}\right) & \text{if normalized} \\ \sum_{n=1}^{m} \left(b_{n-1} \times 2^{1-n}\right) & \text{if denormalized} \end{cases} \tag{2.4}$$
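Equations 2.1 to 2.4 can be combined into a short decoding routine. The following Python sketch is an illustration only and is not part of the thesis implementation. The function name `decode` and the bit-string input format are assumptions made for this example, and the special values with an all-ones exponent are not handled.

```python
def decode(bits: str, e: int) -> float:
    """Decode a 'sign|exponent|mantissa' bit string using Equations 2.1 to 2.4.

    bits: string of '0'/'1' characters, sign bit first (hypothetical input format).
    e:    number of exponent bits; the remaining bits are stored mantissa bits.
    """
    sign = int(bits[0])
    exp_field = bits[1:1 + e]
    frac_field = bits[1 + e:]
    bias = 2 ** (e - 1) - 1                       # Equation 2.3
    E = int(exp_field, 2) - bias                  # Equation 2.2
    if int(exp_field, 2) == 0:
        # Denormalized: no hidden bit, first stored bit has weight 2^0.
        M = sum(int(b) * 2.0 ** (1 - n) for n, b in enumerate(frac_field, start=1))
    else:
        # Normalized: hidden bit contributes the leading 1.
        M = 1 + sum(int(b) * 2.0 ** (-n) for n, b in enumerate(frac_field, start=1))
    return (-1) ** sign * 2.0 ** E * M            # Equation 2.1


# Single precision pattern for 0.5: exponent field 01111110 gives E = -1.
print(decode("0" + "01111110" + "0" * 23, 8))
```

Because the number of exponent bits is a parameter, the same routine can be used to inspect reduced bit-width formats such as the eight-bit exponent, five-bit mantissa examples used in Section 2.2.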
2.1.2 Conversion to Binary Format
This thesis will not consider the conversion of decimal numbers to a binary
format since this is often taken care of by the compiler [9]. However, a basic
understanding can make it easier to understand how floating-point numbers
behave. The conversion will be explained by the following example.
The decimal number 4.875 is to be converted to a 32-bit binary format with 8 exponent bits and 23 mantissa bits. First, the fraction 0.875 is repeatedly multiplied by two until the remaining fraction is zero. If the result of a multiplication is greater than or equal to one, the bit in that position is one and the integer part is discarded before the next step; otherwise the bit is zero.
$0.875 \times 2 = 1.750 \Rightarrow b_{-1} = 1$,
$0.750 \times 2 = 1.500 \Rightarrow b_{-2} = 1$,
$0.500 \times 2 = 1.000 \Rightarrow b_{-3} = 1$
This results in $(0.875)_{10}$ being represented in binary as $(0.111)_2$. Second, $(4)_{10}$ is converted to binary. It results in the binary representation $(100)_2$, so the entire decimal number is written as $(100.111)_2$. According to the IEEE 754 Standard, real numbers have to be represented in a $(1.x_1x_2 \ldots x_n)_2 \times 2^{y}$ format. That results in the following conversion: $(100.111)_2 = (1.00111)_2 \times 2^{2}$. To convert this number to a 32-bit number the exponent has to be biased. According to Table 2.1 the bias for a 32-bit floating-point number is 127. The biased exponent x can be calculated as $x - 127 = 2 \Rightarrow x = 129 = 2^7 + 2^0$. Since the mantissa is represented as a number between one and two, the binary representation of 4.875 is 0 10000001 0011100...0. The integer part of the mantissa is "hidden" when the number is normalized.
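The conversion steps of this example can be reproduced in a few lines of Python. This is a minimal sketch under stated assumptions: the helper names `fraction_bits` and `to_binary32` are hypothetical, only positive values of at least one are handled, and excess bits are truncated rather than rounded. The result is cross-checked against the encoding produced by Python's standard `struct` module.

```python
import struct


def fraction_bits(frac: float, max_bits: int = 23) -> str:
    """Binary digits of a fraction in [0, 1), found by repeated doubling."""
    bits = ""
    while frac != 0 and len(bits) < max_bits:
        frac *= 2
        if frac >= 1:          # result >= 1: the bit is one, drop the integer part
            bits += "1"
            frac -= 1
        else:                  # result < 1: the bit is zero
            bits += "0"
    return bits


def to_binary32(x: float) -> str:
    """Encode a positive value x >= 1 as 'sign exponent mantissa' (truncating)."""
    integer = int(x)
    int_digits = bin(integer)[2:]                      # e.g. 4 -> "100"
    digits = int_digits + fraction_bits(x - integer)   # e.g. "100" + "111"
    y = len(int_digits) - 1                            # unbiased exponent
    mantissa = digits[1:].ljust(23, "0")[:23]          # drop the hidden leading 1
    exponent = format(y + 127, "08b")                  # add the bias
    return f"0 {exponent} {mantissa}"


encoded = to_binary32(4.875)
print(encoded)

# Cross-check against the machine encoding of the same value.
reference = format(struct.unpack(">I", struct.pack(">f", 4.875))[0], "032b")
print(encoded.replace(" ", "") == reference)
```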
2.1.3 Exceptions
Some of the floating-point bit patterns represent special values. These are listed below.
+Infinity All exponent bits are one and all other bits are zero
-Infinity The sign bit and all exponent bits are one, and all mantissa bits are zero
NaN All exponent bits are one and at least one mantissa bit is one
+Zero All bits are zero
-Zero All bits except the sign bit are zero
In addition, many floating-point units have exceptions for overflow, underflow, invalid operations and division by zero.
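The special bit patterns listed above can be recognized with a few comparisons. The following Python sketch is only an illustration; the function name `classify` and the string return values are assumptions made for this example.

```python
def classify(bits: str, e: int = 8) -> str:
    """Classify a 'sign|exponent|mantissa' bit string (e exponent bits)."""
    sign, exponent, mantissa = bits[0], bits[1:1 + e], bits[1 + e:]
    if set(exponent) == {"1"}:                 # all exponent bits are one
        if "1" in mantissa:
            return "NaN"
        return "-Infinity" if sign == "1" else "+Infinity"
    if "1" not in exponent and "1" not in mantissa:
        return "-Zero" if sign == "1" else "+Zero"
    return "number"                            # normalized or denormalized value


print(classify("0" + "1" * 8 + "0" * 23))     # single precision +Infinity pattern
```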
2.2 Arithmetic on Floating-Point Numbers
This section will discuss how to perform floating-point arithmetic "on paper", and give a basic understanding of why the algorithms implemented later in this thesis are implemented the way they are. The arithmetic operations discussed in this section are addition, subtraction, multiplication and division [10]. The algorithms are explained by examples.
2.2.1 Addition and Subtraction
The numbers to add and subtract, 100.25 and 0.5, are represented with an eight-bit exponent and an eight-bit mantissa. In binary notation these operands are written as:
$100.25 = 1.0025 \times 10^{2}$ = 0 10000101 10010001
$0.5000 = 5.0000 \times 10^{-1}$ = 0 01111110 00000000
The first step is to align the radix point, in other words, to make sure that both operands are represented with the same exponent. This can be done by right-shifting the mantissa of the smaller operand. The number of places it has to be shifted is given by the difference between the exponents, which in this example is seven. The resulting binary representation is:
0 01111110 00000000 (original value)
0 01111111 10000000 (shifted 1 place)
0 10000101 00000010 (shifted 7 places)
Notice the ”hidden” bit is shifted into the mantissa. It means the new
representation of the number is denormalized. Also notice that the exponents
of the operands are equal, so the mantissas can easily be added together. The
”hidden” bit of the number that is still normalized has to be added.
0 10000101 1.10010001 (100.25)
+0 10000101 0.00000010 (0.5)
=0 10000101 1.10010011
→0 10000101 10010011 (100.75)
The next step is to normalize the result. For this example the result is already normalized. In general, a number is normalized when the "hidden" bit is one, so that the mantissa value is at least one and smaller than two. This is achieved by left- or right-shifting the resulting mantissa and decreasing or increasing the exponent by the number of shifted places. The final step is to round the result. This is only necessary when the precision of the result exceeds the number of mantissa bits. According to the IEEE 754 Standard, the number can be rounded to nearest even, towards zero, up or down, depending on what the programmer wants [7].
For subtraction the same procedure is followed, except that the mantissas are subtracted instead of added. Note that the smaller number is always subtracted from the bigger one, otherwise underflow occurs. An example is 0.5 − 100.25. To avoid underflow this has to be rearranged to −(100.25 − 0.5). The calculation is done below.
0 10000101 1.10010001 (100.25)
−1 10000101 0.00000010 (0.5)
=1 10000101 1.10001111
→1 10000101 10001111 (−99.75)
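The align, add or subtract, and normalize steps described above can be sketched in Python. This is a minimal illustration, not the design implemented later in the thesis: the function name `fp_add`, the tuple layout `(sign, biased_exponent, fraction)` and the default of eight mantissa bits are assumptions taken from the worked example. Results are truncated instead of rounded, and exponent overflow, exponent underflow and special values are not handled.

```python
def fp_add(a, b, mbits=8):
    """Add two normalized numbers given as (sign, biased_exponent, fraction)."""
    # Order the operands so that `big` has the larger magnitude.
    big, small = (a, b) if (a[1], a[2]) >= (b[1], b[2]) else (b, a)
    # Restore the hidden bit, then right-shift the smaller operand by the
    # exponent difference so that both mantissas share the same exponent.
    m_big = (1 << mbits) | big[2]
    m_small = ((1 << mbits) | small[2]) >> (big[1] - small[1])
    # Equal signs: add. Different signs: subtract the smaller from the bigger.
    m = m_big + m_small if big[0] == small[0] else m_big - m_small
    exp = big[1]
    if m == 0:
        return (0, 0, 0)
    # Normalize until the hidden bit is the leading one.
    while m >= (2 << mbits):
        m >>= 1              # truncation; a complete unit would round here
        exp += 1
    while m < (1 << mbits):
        m <<= 1
        exp -= 1
    return (big[0], exp, m & ((1 << mbits) - 1))


x = (0, 0b10000101, 0b10010001)     # 100.25
y = (0, 0b01111110, 0b00000000)     # 0.5
s, e, f = fp_add(x, y)
print(s, format(e, "08b"), format(f, "08b"))
```

Subtraction reuses the same function by flipping the sign bit of the second operand, for example `fp_add((0, 0b01111110, 0), (1, 0b10000101, 0b10010001))` for 0.5 − 100.25.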
2.2.2 Multiplication
The bit-width for this example is eight exponent bits and five mantissa bits. The operands are 2.5 and −3.5 and are represented in binary as
2.5 = 0 10000000 01000
−3.5 = 1 10000000 11000
The first step is to multiply the mantissas, including the "hidden" bit. This is done as follows:
1.01000
×1.11000
= 10.00110000
Next the exponents are added together and the bias value is subtracted.
Since each exponent contains an exponent value and a bias value, the bias
value has to be subtracted so it is not added twice. As discussed in the
previous section the bias for an eight bit exponent is 127, which results in
10000000
−01111111
+10000000
=10000001
To find the resulting sign, the signs of both operands are XORed; in this
example the resulting sign bit is 1. The result is then 1 10000001 10.00110000.
The mantissa of this number is not normalized. To normalize the number, the
mantissa has to be right-shifted one place and the exponent incremented
by one. This finally results in 1 10000010 1.000110000 → 1 10000010 00011 (−8.75).
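The multiplication steps above can be sketched in Python. The function name
fp_mul and the (sign, biased exponent, fraction) tuple layout are assumptions
for this illustration; the result is truncated and special values such as
zero and infinity are not handled.

```python
def fp_mul(a, b, exp_bits=8, frac_bits=5):
    """Multiply two operands given as (sign, biased_exponent, fraction)."""
    (sa, ea, fa), (sb, eb, fb) = a, b
    bias = (1 << (exp_bits - 1)) - 1
    sign = sa ^ sb                      # XOR of the operand signs
    exp = ea + eb - bias                # remove the bias that was added twice
    # Multiply the mantissas including the hidden bit; the product has
    # 2 * frac_bits fraction bits and lies in the interval [1, 4).
    prod = ((1 << frac_bits) | fa) * ((1 << frac_bits) | fb)
    if prod >> (2 * frac_bits + 1):     # product >= 2: normalize
        prod >>= 1
        exp += 1
    frac = (prod >> frac_bits) & ((1 << frac_bits) - 1)  # truncate
    return sign, exp, frac


# 2.5 * -3.5 with eight exponent bits and five mantissa bits.
print(fp_mul((0, 0b10000000, 0b01000), (1, 0b10000000, 0b11000)))
```

For the operands of the example the function returns the fields
1 10000010 00011, which correspond to −8.75.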
2.2.3 Division
The algorithm for division is quite similar to the one for multiplication. The
only difference is that the mantissas are divided instead of multiplied and
the exponents are subtracted instead of added. This example uses eight
exponent bits and five mantissa bits. The operands are 10.0 and 2.5 and are
represented in binary as:
10.0 = 0 10000010 01000
2.5 = 0 10000000 01000
The first step is to subtract the exponents. The bias has to be added so
the bias value is not subtracted twice.
10000010
−10000000
+01111111
=10000001
The next step is to divide the mantissas.
1.01000
/1.01000
=1.00000
This mantissa is already normalized, so the final result is
0 10000001 00000 (4.0).
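The division steps can be sketched in the same style. The function name
fp_div and the tuple layout are assumptions for this illustration; an integer
division stands in for the mantissa divider, the quotient is truncated, and
division by zero and exponent underflow are not handled.

```python
def fp_div(a, b, exp_bits=8, frac_bits=5):
    """Divide a by b, both given as (sign, biased_exponent, fraction)."""
    (sa, ea, fa), (sb, eb, fb) = a, b
    bias = (1 << (exp_bits - 1)) - 1
    sign = sa ^ sb
    exp = ea - eb + bias                # add the bias that was subtracted out
    ma = (1 << frac_bits) | fa
    mb = (1 << frac_bits) | fb
    # Scale the dividend so the integer quotient keeps frac_bits fraction bits.
    quot = (ma << frac_bits) // mb
    if not quot >> frac_bits:           # quotient < 1: normalize by one place
        quot = (ma << (frac_bits + 1)) // mb
        exp -= 1
    return sign, exp, quot & ((1 << frac_bits) - 1)


# 10.0 / 2.5 with eight exponent bits and five mantissa bits.
print(fp_div((0, 0b10000010, 0b01000), (0, 0b10000000, 0b01000)))
```

Since both mantissas lie in [1, 2), the quotient lies in (0.5, 2), so at most
one normalization shift is needed.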
2.3 Fixed-Point
Fixed-point data do not have as clear a specification as floating-point
data. However, they are defined as either fractional data values or data
values with an integer part and a fractional part [11]. Fixed-point data
can typically be used when dynamic range and precision are less important
than the size and speed of the system.
The binary interchange fixed-point formats are defined in Figure 2.2.
Fixed-point values are represented using a two’s complement number that
is weighted by a fixed power of two [12]. The bit position is labelled with an
index i. The value of a fixed-point number is given by Equation 2.5.
\[ v = -b_{w-1}\, 2^{w-1-w_f} + \sum_{i=0}^{w-2} 2^{i-w_f}\, b_i \tag{2.5} \]
Figure 2.2: Bit Order with Fixed-Point Representation [12].
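The value of a fixed-point word can be evaluated with a short Python sketch.
It assumes the usual two's complement weighting, in which the most significant
bit b_{w-1} contributes −b_{w-1} · 2^(w-1-w_f) and every other bit b_i
contributes b_i · 2^(i-w_f). The function name fixed_value and the string
input format are hypothetical.

```python
def fixed_value(bits, w_f):
    """Value of a two's complement fixed-point word.

    bits: string of '0'/'1' characters, most significant bit first.
    w_f:  number of fraction bits.
    """
    w = len(bits)
    b = [int(c) for c in reversed(bits)]    # b[i] is the bit with index i
    # The sign bit carries a negative weight.
    value = -b[w - 1] * 2.0 ** (w - 1 - w_f)
    # All remaining bits carry positive weights.
    value += sum(b[i] * 2.0 ** (i - w_f) for i in range(w - 1))
    return value


print(fixed_value("0110", 2))   # 4-bit word with two fraction bits
print(fixed_value("1110", 2))
```

The two calls evaluate to 1.5 and −0.5, which shows that only the sign bit
distinguishes the two words.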
2.4 Asynchronous Design
An asynchronous design operates independently of a clock signal, which can
potentially lead to performance advantages [13]. However, asynchronous
design requires extra logic to detect the completion of a step, and this may
level out the advantage. Unlike synchronous designs, which consume power
whenever the clock is running, asynchronous designs only consume power
when active. As a result, asynchronous designs are in most cases less
power hungry than synchronous ones. The latency of an asynchronous design
depends on the longest path through the design. This path varies with how
the design is built and which platform it is implemented on, which makes it
harder to predict the latency of an asynchronous design and more difficult
to adapt the design to different systems.
2.5 Floating-Point and Fixed-Point Unit Design
This section will explore some of the available floating-point (FP) and fixed-
point implementations. Since the SHMAC platform is using a Xilinx FPGA,
FP operators from Xilinx are explored. The other floating-point implemen-
tations discussed are open source.
2.5.1 LogiCORE IP Floating-Point Operator
The Xilinx floating-point core provides a range of floating-point arithmetic
operations with a high level of user specification [12]. The interface is
shown in Figure 2.3. A and B are the operands and OPERATION specifies the
operation when the core is configured for multiple operations. OPERATION_ND
is set high to indicate that the operands and the operation are valid, and
OPERATION_RFD is set by the core to indicate that it is ready for new
operands. SCLR is a synchronous reset, CE is clock enable and CLK is the
clock. RESULT is the result of the operation, UNDERFLOW is set high when
underflow occurs and OVERFLOW is set high when overflow occurs. INVALID_OP
is set high by the core when the operands cause an invalid operation,
DIVIDE_BY_ZERO is set high if a division by zero was performed and RDY is
set high by the core to indicate that RESULT is valid. Many of the inputs
and outputs can be removed by the designer.
The IP supports several fraction and exponent bit-widths. The minimum
mantissa bit-width is 4 bits and the maximum is 64. The minimum exponent
bit-width is 4 bits and the maximum is 16. The minimum exponent width
is also limited by Equation 2.6. As an example, if the fraction width is
23, the minimum exponent bit-width is six. This limit is also checked when
the IP is configured. It is possible to use an asynchronous version of the
IP, which does not need any clock input.
\[ \text{Minimum Exponent Width} = \lceil \log_2(\text{Fraction Width} + 3) \rceil + 1 \tag{2.6} \]
Figure 2.3: Block Diagram of Generic Floating-Point Binary Operator Core
from Xilinx[12].
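The exponent width limit of Equation 2.6 can be evaluated with a small Python
helper. The function name min_exponent_width is hypothetical, and whether the
fraction width counts the hidden bit is left to the core documentation; the
sketch only transcribes the equation using integer arithmetic.

```python
def min_exponent_width(fraction_width):
    """Equation 2.6: ceil(log2(fraction_width + 3)) + 1."""
    n = fraction_width + 3
    # For integers n >= 1, (n - 1).bit_length() equals ceil(log2(n)),
    # which avoids floating-point rounding near powers of two.
    return (n - 1).bit_length() + 1


for width in (4, 11, 23, 52):
    print(width, min_exponent_width(width))
```

Using the integer bit length instead of a floating-point logarithm keeps the
result exact when the argument is a power of two.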
2.5.2 OpenCores Single Precision Floating-Point Unit
The floating-point unit (FPU) provided by OpenCores is a 32-bit open
source processing unit [8]. It fully complies with the IEEE 754 Standard for
single precision floating-point arithmetic and includes, among others,
addition, subtraction, multiplication and division. Unlike the Xilinx core
presented in Subsection 2.5.1, this is an open source design. As a result, it
is possible to explore the algorithms and functionality of the design.
However, the design is quite complicated and it does not support generics to
configure the functionality of the FPU. Consequently, changes such as a
different bit-width require large parts of the design to be rewritten.
2.5.3 OpenCores Double Precision Floating-Point Unit
The double precision floating-point core published by OpenCores is
designed to meet the IEEE 754 Standard for double precision floating-point
arithmetic [14]. This FPU is the same unit that is implemented on the Amber core
on the SHMAC platform. The Amber core will be discussed later in this
chapter. As shown in Figure 2.4, it includes addition, subtraction, multi-
plication and division, a rounding unit and an exception handler. Like the
single precision FPU, it does not support variable bit-widths.
Figure 2.4: Hierarchy of Double Precision Floating-Point Core [14].
2.5.4 Floating-Point Library
This floating-point library complies with the IEEE 754 Standard [15]. In
addition, the bit-width, rounding style and exception handling can be
configured, and denormalized numbers can be excluded. It is also possible to
change the number of guard bits, which are used in arithmetic operations to
maintain precision. These constants are shown in Listing 2.1. The library
introduces the types float, float32, float64 and float128, which allow the
designer to declare signals and variables with different bit-widths.
Listing 2.1: Constants for float type in the IEEE Floating-Point Library [15]
package float_pkg is
  constant float_exponent_width : NATURAL := 11;
  constant float_fraction_width : NATURAL := 52;
  constant float_denormalize    : BOOLEAN := false;
  constant float_check_error    : BOOLEAN := false;
  constant float_guard_bits     : NATURAL := 0;
  constant no_warning           : BOOLEAN := (false);
2.5.5 Fixed-Point Library
The fixed-point library is defined as a set of types and functions to include
in the design [16]. It introduces the types sfixed and ufixed for signed and
unsigned fixed-point numbers, with a user-specified integer and fraction
length. The library contains, among others, operations like addition,
subtraction, multiplication and division. Unlike for floating-point numbers,
the result of an operation does not have the same bit-width as the operands.
The bit-widths of the results are listed in Table 2.2.
Table 2.2: Bit-Width of Result for Different Operations [16].
Operation    Result Bit-Width
A + B        Max(A'int, B'int) + 1 downto Min(A'frac, B'frac)
A - B        Max(A'int, B'int) + 1 downto Min(A'frac, B'frac)
A * B        A'int + B'int + 1 downto A'frac + B'frac
Signed /     A'int - B'frac + 1 downto A'frac - B'int
Unsigned /   A'int - B'frac downto A'frac - B'int - 1
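The sizing rules of Table 2.2 can be checked with a small Python sketch. It
assumes that A'int and A'frac denote the upper and lower index of an operand
declared as (int downto frac), so that a 32-bit operand with 16 fraction bits
is written (15, -16). The function names result_range and width are
hypothetical.

```python
def result_range(op, a, b, signed=True):
    """Index range (upper, lower) of the result according to Table 2.2.

    a and b are (int_index, frac_index) pairs, e.g. (15, -16).
    """
    a_int, a_frac = a
    b_int, b_frac = b
    if op in ("+", "-"):
        return max(a_int, b_int) + 1, min(a_frac, b_frac)
    if op == "*":
        return a_int + b_int + 1, a_frac + b_frac
    if op == "/":
        if signed:
            return a_int - b_frac + 1, a_frac - b_int
        return a_int - b_frac, a_frac - b_int - 1
    raise ValueError("unsupported operation: " + op)


def width(index_range):
    """Number of bits in an (upper, lower) index range."""
    upper, lower = index_range
    return upper - lower + 1


operand = (15, -16)  # 32-bit operand: 16 integer bits, 16 fraction bits
for op in ("+", "*", "/"):
    rng = result_range(op, operand, operand)
    print(op, rng, width(rng))
```

For two 32-bit operands the sum grows by one bit, while the product and the
quotient each need 64 bits, which illustrates why results usually have to be
resized before they are stored.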
2.5.6 SoftFloat
The software floating-point library is a part of the gcc library and is regularly
used in systems that do not have a hardware floating-point unit [9]. In Table
2.3 the latency and area usage of SoftFloat for single precision floating-point
arithmetic on the SHMAC platform are listed. This table will be used for
analysis later in this thesis.
Table 2.3: Latency and Area Usage for Single Precision Floating-Point in
SoftFloat on the SHMAC Platform [17].
                                                     Add/Subtract  Multiply  Divide
Latency without instruction and data cache (cycles)  1018/1034     3324      2494
Latency with instruction and data cache (cycles)     59            193       145
Area usage (LUTs)                                    0             0         0
2.6 Processing Cores
This section discusses two different processing cores. With a processing
core, compiled high-level applications and assembly code can be executed
on the system. It also provides support for accelerators, e.g. a hardware
floating-point unit. The first processing core is the Amber 2, which is the
same core that is implemented on the SHMAC platform. Unfortunately, this
core does not have hardware floating-point support in its compiler. As a
result, a second processing core, OpenRISC, which does have hardware
floating-point support, is explored.
2.6.1 Amber 2 Core
The Amber processor core is an ARM-compatible 32-bit RISC processor.
It is fully compatible with the ARM v2a instruction set architecture (ISA)
and provides a complete embedded system with a number of peripherals like
UARTs, timers and a double data rate (DDR3) memory controller [18, 19].
As shown in Figure 2.5, the system contains a Wishbone bus. This bus is an
open source hardware computer bus, intended to let the parts of an integrated
circuit communicate with each other [20]. It is a logical bus, which means
that it does not specify the electrical characteristics or the bus topology.
It is synchronous and is defined to have 8, 16, 32 or 64-bit buses. The Amber
core has a 32-bit Wishbone system bus, a 5-stage pipeline and separate
instruction and data caches. It has multiply and multiply-accumulate
operations with 32-bit inputs and 32-bit output in 32 clock cycles, using the
Booth algorithm.
The Amber 2 Core contains a co-processor, which includes a floating-
point unit (FPU). Currently the 64-bit FPU by OpenCores is implemented,
but it is not supported by the compiler. It requires a total of eight clock
cycles for loading data and four for storing the data to the co-processor [21].
Figure 2.5: Amber Tile on SHMAC Platform [22].
2.6.2 OpenRISC
The OpenRISC 1200 core is a 32-bit scalar RISC with Harvard
microarchitecture [23]. As shown in Figure 2.6, the core contains, amongst
other modules, a debug unit, an interrupt controller and direct-mapped
instruction and data caches. Two Wishbone interfaces connect the core to
external peripherals and external memory systems. The CPU has a 5-stage
pipeline and handles the ORBIS32 instruction set architecture (ISA). It
contains everything needed in a CPU, including a floating-point unit.
Figure 2.6: OpenRISC 1200 Core Architecture[23].
A reference design for the OpenRISC is the ORPSoC (OpenRISC Reference
Platform System on Chip) [24]. It is a development platform targeted
at specific hardware. The project is organized in such a way that register
transfer level (RTL) code and software can be added or changed by the user.
It also contains a GNU compiler so software can be compiled and run on the
desired hardware. The design can be simulated using standard event-driven
simulators such as Icarus Verilog and Mentor Graphics' ModelSim, or by
creating a cycle-accurate model in C or SystemC using the Verilator
tool [25].
2.7 FPGAs and Tools
In this section a list of Xilinx FPGAs and available tools is presented. Since
the SHMAC platform is implemented on a Xilinx FPGA, it is desirable to
test the system with the same type of FPGA and tools.
2.7.1 Xilinx Virtex-5 XC5VLX330 FPGA
The Virtex-5 LX platform from Xilinx is an FPGA for high-performance
general logic applications [26]. It contains 51,840 Virtex-5 slices and a
maximum of 3,420 Kb distributed RAM. Each Virtex-5 slice contains four LUTs
and four flip-flops. It also contains 192 DSP48E slices, which allow the
designer to implement multiple slower operations using time-multiplexing
methods. They provide, among other things, better flexibility and utilization
and reduced power consumption. The FPGA contains 288 36 Kb block RAM blocks,
for a total of 10,368 Kb, and has a total of 33 I/O banks with a maximum of
1,200 user I/Os.
2.7.2 Xilinx Spartan-6 LX16
This evaluation kit from Avnet contains, among other components, a Xilinx
Spartan-6 XC6SLX16-2CSG324C FPGA, a Cypress PSoC 3 CY8C3866AXI-
40 Programmable System-On-Chip, 32 Mb × 16 Micron LPDDR Mobile
SDRAM and a 128 Mb Numonyx Multi-I/O SPI Flash [27]. A block diagram
of the board is shown in Figure 2.7. The FPGA contains 2,278 slices and
14,579 logic cells. Each slice contains four LUTs and eight flip-flops. To
access and utilize the various features on the board the software AvProg is
used. The software also has the ability to measure the power consumption
in real-time.
Figure 2.7: Spartan-6 LX16 Evaluation Board Block Diagram [27].
2.7.3 Xilinx Zynq-7000
The ZedBoard from Digilent and Avnet is based on the Xilinx Zynq All
Programmable SoC (AP SoC) and combines a dual Cortex-A9 Processing
System with an Artix-7 FPGA [28]. The FPGA contains 53,200 LUTs, 560
KB extensible block RAM and 220 programmable DSP slices. The ZedBoard
also contains 512 MB DDR3 memory and USB-JTAG for easy programming
and debugging. A block diagram of the complete system is shown in Figure
2.8.
Figure 2.8: ZedBoard Block Diagram [27].
2.7.4 Tools
Xilinx provides a number of tools to analyze the behavior and measure the
size, speed and power of the design. Below is a list containing these tools.
• ISE Design Suite 14.7 [29] is a development environment for all Xilinx
devices. The tool allows for synthesizing, implementing and programming
designs on the FPGA. It also makes it easier to use other analysis
programs on the design.
• XST [30] is the Xilinx Synthesis Technology, which synthesizes hardware
description language code for Xilinx devices. It also optimizes the design
for the FPGA.
• ISim [31] is a hardware description language simulator that performs
behavioral and timing simulations. It supports power analysis and op-
timization using SAIF, which contains the toggle counts on the signals
of the design.
• XPower [32] analyses the power consumption of the design. To
get a reliable report with the power consumption of each component in
the design, a "place & route" file, a physical constraint file and a SAIF
file have to be provided.
2.8 Testbenches
The Standard Performance Evaluation Corporation (SPEC) was formed to
establish, maintain and endorse a standardized set of benchmarks to test
the performance of a system [33]. The most recent benchmark is SPEC
CPU2006, which includes CINT2006 for measuring and comparing integer
performance and CFP2006 for measuring and comparing floating-point
performance.
The retired benchmark CFP2000 contains a total of 14 floating-point
components. Four of these are written in C, while the others are written
in Fortran [34]. Information about the testbenches written in C is listed
below.
• 177.mesa is a free OpenGL work-alike library written by Brian E. Paul.
The input data is a two dimensional scalar field which is mapped to
height creating a three dimensional object with explicit vertex normals.
The contour lines are mapped onto the object as a one dimensional
texture.
• 179.art (adaptive resonance theory) is used to recognize objects in a
thermal image. The input consists of a thermal image of a helicopter
or an airplane and a scan file, which contains other thermal views of
the helicopter and airplane. The output data consists of a report of a
match between the learned image and the windowed field of view.
• 183.equake simulates the propagation of elastic waves in large, highly
heterogeneous valleys. The goal is to recover the time history of the
ground motion everywhere within the valley. The input data contains
the grid topology and the seismic event characteristics and it outputs
a case summary with seismic source data and a characteristic of the
motion at both the hypocenter and epicenter.
• 188.ammp runs molecular dynamics on a protein-inhibitor complex
which is embedded in water. The benchmark is derived from work on
understanding drug resistance in HIV published by Weber and
Harrison in 1999. The input is the initial coordinates and velocities of
the atoms. The output is the energy of the final configuration of atoms.
In these testbenches the dominating floating-point operations are addition,
subtraction, multiplication and division. The number of times these are
used in each testbench is listed in Table 2.4 [35].
Table 2.4: Number of Floating-Point Operations in Different Testbenches
[35].
Name        Description                          +    -    ×    /
177.mesa    Graphics Library                     347  102  586  27
179.art     Image Recognition / Neural Networks  253  14   247  12
183.equake  Seismic Wave Propagation Simulation  127  58   236  18
188.ammp    Computational Chemistry              479  330  930  42
Chapter 3
Related Work
A lot of research has investigated the optimization of floating-point units
for area, delay and power consumption in hardware. Some of the articles
suggest changing the design of the units. This is not directly relevant to
this thesis; however, it gives a more complete picture of the research done
in the field of floating-point units. Only the directly relevant research
articles will be explained in detail, while the others will be described
briefly.
Chong et al. [36] propose a flexible multimode embedded floating-point
unit (FPU) for FPGAs to better utilize the die. They suggest duplicating the
data path for single precision arithmetic, and linking duplicated functional
blocks together to accommodate double precision. This leads to better
area utilization and a delay improvement because of the parallelization.
This approach complies with the IEEE 754 Standard and is easy to test and
validate. However, the approach requires changes to the hardware design,
which may cause a longer time-to-market.
Another work by Chong et al. [6] proposes custom FPUs for embedded
systems to make better use of area and improve performance. A rapid design
space exploration was used to find a balance between hardware-implemented
and software-emulated instructions. Data path merging was also proposed to
make better use of the area. It means that the same components (for instance
adders and multipliers) can be used with different word lengths. The article
shows that adding more floating-point hardware does not necessarily result
in a lower runtime, because the delay associated with the additional hardware
can be greater than the cycle count reduction. The advantage of these
approaches is that they comply with the IEEE 754 Standard, which makes them
easy to validate and test. The design space exploration can be used as a
front-end to explore the best solution for the system. The downside of the
approach is that it requires complicated changes to the floating-point unit
design. A bit-alignment algorithm is necessary to design a well-working data
path merging algorithm, and this may lead to a very complicated design.
Liang et al. [37] have outlined a floating-point unit generation approach,
which allows for the creation of a vast collection of floating point units with
differing throughput, latency and area characteristics. Given the constraints,
the algorithm chooses the proper implementation and architecture to create
the compliant floating point unit.
Galal et al. [38] present a method for creating a trade-off curve that can
be used to estimate the maximum floating-point performance given a set of
area and power constraints.
3.1 Bit-Width Optimisation for Fixed- and Floating-Point
This section presents a method to optimize the bit-width of both fixed-
point and floating-point designs [39]. Let U_i represent a floating-point
number, (−1)^S × M × 2^E, where S is the sign bit, M is the mantissa and E
is the exponent. The precision of a floating-point number depends on the
mantissa bit-width (m) and the range depends on the exponent bit-width (e).
The error for both fixed-point and floating-point is given in Equation 3.1,
where l represents the fraction length for fixed-point numbers. The error
for floating-point numbers is given in Equation 3.2. For further analysis
the truncation rounding model is chosen. Round-to-nearest gives a
better error bound than truncation, but requires additional hardware.
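\[ \Delta U_i = \begin{cases} Err_{flt}(m) & \text{if Type = Float} \\ Err_{fix}(l) & \text{if Type = Fixed} \end{cases} \tag{3.1} \]
\[ Err_{flt}(m) = \begin{cases} 2^{-m} \times 2^{E} & \text{if round-to-nearest} \\ 2^{-(m-1)} \times 2^{E} & \text{if truncation} \end{cases} \tag{3.2} \]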
The equation for calculating the mantissa bit-width m is given in
Equation 3.3. E_{U_i} can be found from E_{U_i} = ⌈log2(|U_i|)⌉.
\[ m \geq E_{U_i} - \lceil \log_2(|\Delta U_i|) \rceil + 1 \tag{3.3} \]
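Equations 3.2 and 3.3 can be transcribed into Python as follows. The function
names err_flt and mantissa_width are hypothetical, and the sketch is a direct
transcription of the two equations rather than the procedure of [39].

```python
import math


def err_flt(m, exponent, rounding="truncation"):
    """Equation 3.2: error bound for an m-bit mantissa at exponent E."""
    if rounding == "round-to-nearest":
        return 2.0 ** -m * 2.0 ** exponent
    return 2.0 ** -(m - 1) * 2.0 ** exponent


def mantissa_width(u, delta):
    """Equation 3.3: smallest m with m >= E_U - ceil(log2|delta|) + 1."""
    e_u = math.ceil(math.log2(abs(u)))          # E_U = ceil(log2|U|)
    return e_u - math.ceil(math.log2(abs(delta))) + 1


print(mantissa_width(200, 0.00005))
print(err_flt(23, 8), err_flt(23, 8, "round-to-nearest"))
```

For a largest value of 200 and an allowed error of 0.00005 the function
returns a mantissa width of 23 bits.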
The dynamic range of the operation is given by |max(U_i)/min(U_i)|, so
the exponent bit-width of U_i can be calculated with Equation 3.4. To make
this equation easier to understand, the range for different exponent
bit-widths is presented in Table 3.1. If the floating-point unit does
not support denormalized floating-point numbers, the minimum exponent
value increases by one.
\[ e \geq \lceil \log_2(|\max(E_{U_i}) / \min(E_{U_i})|) \rceil \tag{3.4} \]
Table 3.1: Calculated Range for Floating-Point Numbers with Different
Exponent Bit-Widths (e)
e   bias  Emax  Emin   2^Emax    2^Emin
2   1     2     -1     4         0.5
3   3     4     -3     16        0.125
4   7     8     -7     256       1/128
5   15    16    -15    65536     1/32768
6   31    32    -31    4.29E9    4.66E-10
7   63    64    -63    1.84E19   1.08E-19
8   127   128   -127   3.40E38   5.88E-39
9   255   256   -255   1.16E77   1.73E-77
10  511   512   -511   1.34E154  1.49E-154
11  1023  1024  -1023  1.80E308  1.11E-308
12  2047  2048  -2047  3.23E616  6.19E-617
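The rows of Table 3.1 follow a simple pattern that can be generated in Python.
The sketch below assumes the convention of the table, bias = 2^(e-1) − 1,
Emax = bias + 1 and Emin = −bias. Selecting the exponent bit-width by looking
for the first row that covers the required exponents is an assumption about
how the table is meant to be used, not a transcription of Equation 3.4. The
function names are hypothetical.

```python
def exponent_range(e):
    """Return (bias, e_max, e_min) following the convention of Table 3.1."""
    bias = 2 ** (e - 1) - 1
    return bias, bias + 1, -bias


def min_exponent_bits(e_low, e_high, limit=16):
    """Smallest e whose range [e_min, e_max] covers [e_low, e_high]."""
    for e in range(2, limit + 1):
        _, e_max, e_min = exponent_range(e)
        if e_min <= e_low and e_high <= e_max:
            return e
    raise ValueError("range not representable within the given limit")


for e in range(2, 13):
    print(e, *exponent_range(e))

# Exponents needed for values between 0.0001 and 200:
# ceil(log2(0.0001)) = -13 and ceil(log2(200)) = 8.
print(min_exponent_bits(-13, 8))
```

With required exponents from −13 to 8 the lookup selects five exponent bits.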
In the case of fixed-point, the range depends on the integer bit-width,
while the precision depends on the fraction bit-width. Consider the case
where U_i represents a fixed-point number, k is the number of integer bits
and l is the number of fraction bits. The integer bit-width is calculated
according to Equation 3.5, and the maximum integer value for the first
twelve values of k is found in Table 3.2.
\[ k \geq \lceil \log_2(|\max(U_i)/\min(U_i)|) \rceil \tag{3.5} \]
Table 3.2: Calculated Range for Fixed-Point Numbers with Different Integer
Bit-Width(k)
k 1 2 3 4 5 6 7 8 9 10 11 12
Max integer value 1 3 7 15 31 63 127 255 511 1023 2047 4095
The precision of a fixed-point number depends on the fraction bit-width,
and the resulting error is calculated with Equation 3.6. For further
calculations the error of the truncation rounding model is used.
\[ Err_{fix}(l) = \begin{cases} 2^{-l} & \text{if round-to-nearest} \\ 2^{-(l-1)} & \text{if truncation} \end{cases} \tag{3.6} \]
From Equation 3.6 and |ΔU_i| expressed in Equation 3.1, the bit-width
of the fraction part is expressed with Equation 3.7.
\[ l \geq -\lceil \log_2(|\Delta U_i|) \rceil + 1 \tag{3.7} \]
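To better understand this optimization process for floating-point and
fixed-point numbers, an example is given below.
\[
\begin{aligned}
\max(U_i) &= 200, \quad \Delta U_i = 0.00005, \quad \min(U_i) = 0.0001 \\
e &\geq \lceil \log_2(|\max(E_{U_i})/\min(E_{U_i})|) \rceil \geq 5 \\
m &\geq E_{U_i} - \lceil \log_2(|\Delta U_i|) \rceil + 1 = 8 - \lceil \log_2(0.00005) \rceil + 1 = 8 + 14 + 1 = 23 \\
\text{Total bits} &= 1 + 5 + 23 = 29 \quad \text{(sign, exponent and mantissa)} \\
k &\geq \lceil \log_2(|\max(U_i)/\min(U_i)|) \rceil \geq 8 \\
l &\geq -\lceil \log_2(|\Delta U_i|) \rceil + 1 = 14 + 1 = 15 \\
\text{Total bits} &= 8 + 15 = 23
\end{aligned}
\]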
In this example the total bit-width is greater for the floating-point format
than for the fixed-point format. However, floating-point numbers have a
bigger range for the same number of bits.
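The fixed-point side of the example can be reproduced with a short Python
sketch. Two choices in it are assumptions about the intended calculation: k is
taken as the smallest integer bit-width whose maximum integer value in Table
3.2 covers the largest value, and l is computed as −⌈log2(|ΔU_i|)⌉ + 1. The
function name fixed_point_widths is hypothetical, and the floating-point
widths are taken from the example rather than recomputed.

```python
import math


def fixed_point_widths(u_max, delta):
    """Return (k, l): integer and fraction bit-widths of a fixed-point format."""
    # Smallest k whose largest integer value 2**k - 1 covers u_max.
    k = 1
    while 2 ** k - 1 < abs(u_max):
        k += 1
    # Fraction bits needed for the allowed error, truncation model.
    l = -math.ceil(math.log2(abs(delta))) + 1
    return k, l


k, l = fixed_point_widths(200, 0.00005)
fixed_total = k + l
float_total = 1 + 5 + 23    # sign + e + m, values taken from the example
print(k, l, fixed_total, float_total)
```

The sketch yields k = 8 and l = 15, giving 23 bits for the fixed-point format
against 29 bits for the floating-point format.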
3.2 Minimizing Floating-Point Power Dissipation via Bit-Width Reduction
Tong et al. [40] propose four different ways to reduce power consumption.
By reducing the mantissa and exponent bit-widths, the precision and range
are lowered, but the switching activity and the necessary normalization
shifting are reduced. By changing the implied radix, e.g. from 2 to 4, a
greater dynamic range is provided, but the density of representable numbers
becomes lower. This may also reduce the number of normalization shifts.
Finally, the article suggests a simplification of the rounding modes. Full
support of all rounding modes is very expensive, and some programs may
achieve acceptable accuracy with a simple rounding algorithm. The article
only explores the reduction in power consumption when varying the exponent
and mantissa bit-widths.
The article uses four workloads to support its results. Sphinx III is the
first workload. It is CMU's (Carnegie Mellon University) speech
recognition program based on fully continuous hidden Markov models. The
accuracy is estimated by dividing the number of words recognized correctly
by the total number of words in the input set. The second is ALVINN. This
workload takes input from a video camera and a laser range finder to guide a
vehicle on the road. The accuracy is measured as the number of correct travel
directions. The third is PCASYS, which is a pattern-level fingerprint
classification program developed at NIST (National Institute of Standards and
Technology). The accuracy is measured as the percentage of images put in the
wrong class. The final workload is Bench22. This is a benchmark which warps
a random image and measures the accuracy by comparing the warped image with
the original.
In Figure 3.1 the accuracy of the different workloads is shown for varying
exponent and mantissa bit-widths. For this set of workloads the accuracy
does not drop before the exponent bit-width is lower than seven and the
mantissa bit-width is lower than 11.
Figure 3.1: Accuracy Compared with Various Exponent and Mantissa Bit-
Widths [40].
According to Figure 3.2, the latency and energy consumption per operation
drop linearly as the operand bit-width decreases.
This paper proposes four important ways to reduce the power consump-
tion in floating-point units and also concludes that the power consumption
in the unit highly depends on the operand bit-width. However, the area is
not considered in this article.
Figure 3.2: Energy and Latency per Operation for Different Operand Bit-
Widths [40].
Chapter 4
Design Space Exploration
This chapter will discuss the optimization options for floating-point arith-
metic and choose which to implement and test. Then the necessary tools
and hardware, to get reliable results, will be analyzed.
4.1 Floating-Point Standard
The optimization options for floating-point arithmetic suggested in the
article about minimizing floating-point power dissipation in Section 3.2 are
reducing the bit-width, changing the radix and simplifying the rounding modes.
All of these options violate the IEEE standard discussed in Section 2.1;
however, great energy reductions are demonstrated by lowering the bit-width.
This thesis will expand on these results by also analyzing the area when
varying the bit-width. The consequence of violating the IEEE standard is a
lack of portability. However, this thesis considers configurable
floating-point units that are tailored for a specific set of tasks.
Therefore, a violation of the IEEE standard will have a minor influence on
the portability.
A set of exceptions, defined in the IEEE 754 Standard, is specified in
Subsection 2.1.3. Implementing these will have an influence on the area and
most likely the power consumption. The exceptions representing zero and
infinity are needed to have a functional FPU; however, the others can be
avoided by analyzing the applications before executing them. Only zero and
infinity will be implemented, if optional.
The IEEE 754 Standard states a set of arithmetic operations an FPU
should support. According to the analysis of the testbenches in Section 2.8,
addition, subtraction, multiplication and division are the most frequently
used arithmetic floating-point operations. It is reasonable to believe that
this applies to many other applications too. As a result, only these
arithmetic floating-point operations will be implemented and tested.
The IEEE standard specifies conversion methods, e.g. the conversion
between integers and floating-point formats. These methods will not be
included in this thesis because of the limited time available.
Included in the standard is also a set of rounding rules. To make the design
less complex, truncation rounding will be used, if optional.
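The difference in complexity between truncation and round-to-nearest-even can
be illustrated with a short Python sketch operating on an integer mantissa
with surplus low-order bits. The function names are hypothetical and the
sketch says nothing about how rounding is realised in any of the FPUs
discussed in this thesis.

```python
def truncate(mantissa, drop):
    """Discard the lowest `drop` bits: a plain shift, no extra logic."""
    return mantissa >> drop


def round_nearest_even(mantissa, drop):
    """Round to nearest, ties to even, when discarding `drop` bits."""
    if drop == 0:
        return mantissa
    kept = mantissa >> drop
    remainder = mantissa & ((1 << drop) - 1)
    half = 1 << (drop - 1)
    # Round up above the halfway point, or at the halfway point when the
    # kept value is odd. The increment may carry into a new top bit, in
    # which case the result has to be normalized again.
    if remainder > half or (remainder == half and kept & 1):
        kept += 1
    return kept


for value in (0b10111, 0b10110, 0b10010):
    print(bin(value), truncate(value, 2), round_nearest_even(value, 2))
```

Truncation is a shift only, while round-to-nearest-even needs a comparison, an
incrementer and possibly a second normalization step.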
4.2 Floating-Point Implementation
In Section 2.5 a selection of FPU implementations is discussed. The
LogiCORE IP Floating-Point Operator discussed in Subsection 2.5.1 is
optimized for Xilinx FPGAs, and since the SHMAC platform currently is
implemented on a Xilinx FPGA, this design will be implemented and tested.
The bit-widths of the mantissa and exponent are user editable, so the design
can produce results to support the thesis. The FPU supports all exceptions
specified in the IEEE 754 Standard; however, only the necessary exceptions
will be implemented while testing. Both the synchronous and asynchronous
designs will be tested.
The single precision FPU by OpenCores discussed in Subsection 2.5.2
and the double precision FPU in Subsection 2.5.3 comply with the IEEE
754 Standard, but do not support variable bit-widths. These units will be
implemented, and will hopefully strengthen the assumption that big gains can
be accomplished by varying the bit-widths.
If the SHMAC platform is going to be adapted for ASIC or an FPGA
from another vendor, it is important to have alternatives to the Xilinx FPU.
Therefore the floating-point library discussed in Subsection 2.5.4 will be im-
plemented and tested for different bit-widths. The library will be set up
according to Listing 2.1, only varying the exponent and mantissa bit-width.
The library supports guard bits and denormalized numbers. Guard bits can be
used to maintain precision in arithmetic; however, when using this library,
guard bits and denormalized numbers will not be implemented, in order to
achieve lower area and power consumption. The consequence of not supporting
denormalized numbers is that the minimum exponent value is one less than
usual.
To compare the FPU designs with fixed-point, the fixed-point library
discussed in Subsection 2.5.5 will be implemented. The unit will be
implemented with 32 bits (16 integer bits and 16 fraction bits), and the
signed fixed-point format will be used. The software floating-point library
in Subsection 2.5.6 will also be used for comparison with the hardware FPU
designs.
A customizable floating-point unit will be designed and tested. It is
designed to be easy to expand and customize for a given system regardless of
platform. The unit will only support addition, subtraction and multiplication
with normalized numbers. The divider will not be implemented because of the
limited time available for the thesis.
4.3 Processing Core
In Section 2.6 both the Amber 2 core and the OpenRISC are discussed. The
Amber 2 is the same core as implemented in the SHMAC platform. However,
this core does not have a compiler that supports hardware floating-point
operations, while the OpenRISC does.
To measure the speed-up, testbenches can be compiled and executed on
the processing cores with different FPUs and bit-widths. Another option is to
calculate the run time for each operation, including the read and write time
for the architecture. The number of operations performed in each testbench
is estimated and the performance gain can be calculated. In this thesis the
second option is chosen to leave more time for designing and optimizing
FPUs, instead of implementing a processing core.
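The second option can be sketched in Python using the figures from Tables 2.3
and 2.4. The sketch treats the operation counts of Table 2.4 as the number of
executed operations and uses the SoftFloat latencies with caches enabled as
the baseline; both are assumptions of this illustration. The function names
and the overhead parameter, which stands for the read and write time of the
architecture, are hypothetical, and no hardware latencies are filled in.

```python
# Cycles per operation for SoftFloat with caches enabled (Table 2.3).
SOFTFLOAT_CYCLES = {"add": 59, "sub": 59, "mul": 193, "div": 145}

# Number of floating-point operations per testbench (Table 2.4).
OP_COUNTS = {
    "177.mesa":   {"add": 347, "sub": 102, "mul": 586, "div": 27},
    "179.art":    {"add": 253, "sub": 14,  "mul": 247, "div": 12},
    "183.equake": {"add": 127, "sub": 58,  "mul": 236, "div": 18},
    "188.ammp":   {"add": 479, "sub": 330, "mul": 930, "div": 42},
}


def total_cycles(counts, latency, overhead=0):
    """Sum of count * (latency + per-operation overhead) over all operations."""
    return sum(n * (latency[op] + overhead) for op, n in counts.items())


def speedup(counts, baseline, candidate, overhead=0):
    """Baseline cycles divided by candidate cycles including its overhead."""
    return total_cycles(counts, baseline) / total_cycles(counts, candidate, overhead)


for name, counts in OP_COUNTS.items():
    print(name, total_cycles(counts, SOFTFLOAT_CYCLES))
```

A hardware FPU can be compared by passing its measured per-operation
latencies as the candidate, together with the load and store cycles of the
target core as overhead.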
4.4 FPGA
In Section 2.7 three different FPGAs are discussed. The Xilinx Virtex-5
FPGA in Subsection 2.7.1 is the same FPGA used on the SHMAC platform.
Implementing directly onto the SHMAC platform would be ideal, but in
addition to the platform not supporting a hardware FPU, many members of the
project are using it and its availability is low.
The two evaluation boards discussed in Subsections 2.7.2 and 2.7.3 were
available. The Spartan-6 LX evaluation board is able to measure the power
consumption in real time; however, the FPGA does not contain enough LUTs
to implement any of the discussed processing cores. The Xilinx Zynq-7000
SoC contains an Artix-7 FPGA. This FPGA has about the same properties as
the Virtex-5 and enough LUTs and memory to handle both cores. However,
since no processing core is implemented, there is little motivation to
perform the analysis using a physical FPGA. As a result, all analysis will
be done in software, using the tools discussed in Subsection 2.7.4, targeting
the Virtex-5 FPGA.
4.5 Testbenches
The benchmark suite discussed in Section 2.8 includes a good selection of
testbenches. The available floating-point data will be analyzed, and the
equations in Section 3.1 will be used to find the optimal bit-width. In
addition, the option of emulating parts of the resulting FPU in software
will be explored.
The data analyzed will be taken from the input and output files for each
testbench. However, arithmetic operations executed in the applications may
use numbers with higher range and accuracy than what is presented on input
and output. To compensate for this, one additional bit will be added to both
the exponent and the mantissa when calculating the bit-width.
Chapter 5
Implementation and Results
This chapter contains three sections. The first section explains how testing
is performed to make sure that all units are functional and are tested with
the same parameters and variables, so that comparable results are generated.
The second section describes how each floating-point unit (FPU) is designed
and how it performs with respect to latency, area and power consumption. The
third section analyzes the range and precision required by the testbenches.
5.1 Test Plan
5.1.1 Functionality
To test whether an FPU has the correct behavior, a Matlab function, listed
in Appendix A.1, is written. It generates a user-specified number of random
operands with user-specified exponent and mantissa bit-widths. The operands
are saved to file and run through a simulator with the Xilinx FPU to
generate the reference results. An example of this testbench is listed in
Appendix B.2. This approach assumes that the Xilinx FPU has the correct
behavior. The same testbench is then run with a different FPU, and another
file containing its results is generated. Finally, the two generated result
files are compared by a Matlab function, listed in Appendix A.2. The
functions described above can also be used for fixed-point. To verify that
the floating-point operations are performed correctly, a third Matlab
function is created, listed in Appendix A.3. This function calculates the
decimal value of a floating-point number, so the user can check the operands
and the result of each calculation.
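The decimal-value check can be illustrated with a short sketch. This is written in Python for illustration and is not the Matlab function in Appendix A.3; the name decode_float and the IEEE-style field layout (sign, then biased exponent, then mantissa with a hidden leading one) are assumptions. It only covers zero and normalized numbers, so denormalized numbers, infinities and NaNs are not handled.

```python
def decode_float(bits, e_width, m_width):
    """Decimal value of a normalized floating-point bit pattern.

    Assumed layout: 1 sign bit | e_width exponent bits | m_width mantissa bits.
    Denormals, infinities and NaNs are NOT handled in this sketch.
    """
    sign = (bits >> (e_width + m_width)) & 1
    exponent = (bits >> m_width) & ((1 << e_width) - 1)
    mantissa = bits & ((1 << m_width) - 1)
    bias = (1 << (e_width - 1)) - 1

    if exponent == 0 and mantissa == 0:
        return -0.0 if sign else 0.0

    significand = 1 + mantissa / (1 << m_width)  # restore the hidden one
    value = significand * 2.0 ** (exponent - bias)
    return -value if sign else value


# The same function covers the three mantissa widths used in the tests.
print(decode_float(0x3F800000, 8, 23))              # single precision
print(decode_float((127 << 11) | (1 << 10), 8, 11))  # 20-bit format
print(decode_float(0xC000000000000000, 11, 52))      # double precision
```

Keeping the exponent and mantissa widths as parameters mirrors the way the operand generator is parameterized, so one checker can serve every tested bit-width.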
5.1.2 Performance
The synthesis tool used is XST by Xilinx, and the optimization goal is set
to area. The input and output pins are placed randomly on the FPGA. To
generate a SAIF file, which describes the switching activity of the design,
every design is simulated for 50 µs with a 100 MHz clock while it performs
100 arithmetic operations. The same testbench is used for all FPUs; only the
bit-widths vary. Each arithmetic unit (addition, subtraction, multiplication
and division) is given the same workload.
According to Section 3.2, the accuracy of floating-point numbers drops
dramatically when the exponent bit-width is less than seven and the mantissa
bit-width is less than eleven. As a result, the FPUs with configurable
bit-width will be tested with exponent bit-widths of eight and eleven and
mantissa bit-widths of eleven, 23 and 52.
To find the best design, the power consumption, area usage and speed have
to be considered. After mapping the design in the Xilinx ISE tool, the power
analysis tool XPower can be used to estimate the expected power consumption.
XPower also reports the static power consumption. Since the designs are
tested for an FPGA, the static power consumption will be the same for all
designs. When designing for an ASIC, however, parts of the circuit may be
turned off to save static power.
The speed of each system is evaluated by analyzing the run time of each
arithmetic operation individually. The latency is measured by counting the
number of clock cycles from when the unit is enabled and the operands are
presented until the result and the ready signal are presented on the output.
If an operation takes a different amount of time for different operands, the
worst case is used. The run time of an asynchronous design depends on the
longest path through the design. The longest path for all asynchronous
designs tested is shorter than the clock period used, which results in a
total latency of one clock cycle. However, the latency may vary depending on
the total system size and the platform it is designed for.
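How the latency of an asynchronous unit follows from its longest path can be sketched as below. This is an illustrative Python sketch under the assumption that the surrounding logic simply waits a whole number of clock cycles for the result to settle; the function name and the two path delays are hypothetical, and only the 100 MHz clock comes from the test plan.

```python
import math


def latency_in_cycles(longest_path_ns, clock_period_ns):
    """Clock cycles the surrounding logic waits for a clockless unit."""
    return max(1, math.ceil(longest_path_ns / clock_period_ns))


CLOCK_PERIOD_NS = 10.0  # 100 MHz clock

# Hypothetical longest-path delays, for illustration only.
print(latency_in_cycles(7.3, CLOCK_PERIOD_NS))   # path shorter than one period
print(latency_in_cycles(23.5, CLOCK_PERIOD_NS))  # path longer than one period
```

A path that no longer fits in one clock period, for example on a slower device or in a larger system, would raise the latency above the single cycle measured here.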
5.2 Design and Performance
The Amber 2 core uses the double precision floating-point unit (FPU)
discussed in Subsection 2.5.3. The port map for this unit in the Amber
co-processor design is shown in Listing 5.1. The port map should be kept the
same so that other FPUs can be adapted to the co-processor more easily. The
bit-width can be adjusted by using only parts of the already implemented
64-bit registers.
Listing 5.1: Port Map of Double Precision FPU in Amber Co-Processor.
    // ======================================
    // FPU - Double
    // ======================================
    a25_fpu_double u_fpu (
        .clk       ( i_clk                ),
        .rst       ( i_rst                ),
        .enable    ( fpu_double_enable    ),
        .rmode     ( fpu_rmode            ),
        .fpu_op    ( fpu_opcode           ),
        .opa       ( fpu_double_data_in_a ),
        .opb       ( fpu_double_data_in_b ),
        .out       ( fpu_double_data_out  ),
        .ready     ( fpu_double_ready     ),
        .underflow ( fpu_double_underflow ),
        .overflow  ( fpu_double_overflow  ),
        .inexact   ( fpu_double_inexact   ),
        .exception ( fpu_double_exception ),
        .invalid   ( fpu_double_invalid   )
    );
5.2.1 LogiCORE IP Floating-Point Operator
The LogiCORE IP Floating-Point Operator discussed in Subsection 2.5.1 was
implemented first. The top-level design using the Xilinx FPU is shown in
Figure 5.1, and a larger image can be found in Appendix C. The VHDL code can
be found in Appendix B.1.1.
Figure 5.1: Diagram of Top-Level Design for Floating-Point Implementation.
The top-level design has the following inputs. The operands a and b can each
be from eight to 80 bits wide. The clock sets the speed of the system and
the reset signal resets it. The clock is not connected to the floating-point
arithmetic units when an asynchronous design is used. The new data signal
is set high when new operands are present, and has to be pulled low before
the next operation is performed. The operation signal tells the unit which
arithmetic operation to perform: zero for addition, one for subtraction, two
for multiplication and three for division. The user can specify whether the
underflow, overflow, invalid operation and divide-by-zero flags are present;
however, these are not implemented during testing.
The signals, except the operands, are routed to a control unit that makes
sure the arithmetic units have the correct input. It also makes sure that
while one operation is performed, the other units are disabled using clock
gating and their outputs are set to zero to lower the power consumption. The
clock enable signal is not connected in asynchronous designs. The control
unit is also controlled by a ready signal, which tells it to turn off the
arithmetic unit when the operation is done. A flow diagram of this system is
presented in Figure 5.2.
The final unit in the FPU design controls the ready signal and makes sure
that the correct output is presented. This unit works as a multiplexer, and
only an arithmetic unit that presents a ready signal is allowed to send
output data.
The resulting size, latency and power consumption for the Xilinx FPUs
are listed in Table 5.1. The FPU has been tested with an asynchronous
design with and without DSP slices, and a synchronous design with DSP
slices. The difference in size between the asynchronous and synchronous
design with DSP is small. However, the difference between the asynchronous
designs is noticeable.
The power consumption for the different implementations is listed in the
same order as for size. The clock has only a minor influence on the total
dynamic power consumption in the asynchronous designs compared with the
synchronous design. As a result, the dynamic power consumption is lower for
the asynchronous designs than for the synchronous one. The difference in
dynamic power consumption between the asynchronous designs with and without
DSPs is quite small. This indicates that DSP slices have a bigger influence
on size than on power consumption.
Figure 5.2: Flow Diagram from Input is Present to Output is Produced.
Table 5.1: Latency, Size and Power Consumption for Xilinx IP
Latency and size:

e   m   Operation  Latency  DSPs  LUTs  Total DSPs  Total LUTs  Comments
8   11  add/sub    1        0     212   2           562         Async, DSP (1)
        mult       1        2     63
        div        1        0     256
8   11  add/sub    1        0     212   0           757         Async, No DSP (2)
        mult       1        0     266
        div        1        0     256
8   11  add/sub    8        0     211   2           584         Sync, DSP (3)
        mult       7        2     45
        div        16       0     238
8   23  add/sub    1        2     245   5           1111        Async, DSP (4)
        mult       1        3     73
        div        1        0     755
8   23  add/sub    1        0     395   0           1877        Async, No DSP (5)
        mult       1        0     834
        div        1        0     733
8   23  add/sub    11       2     257   5           1234        Sync, DSP (6)
        mult       8        3     91
        div        28       0     740
11  52  add/sub    1        3     705   14          4082        Async, DSP (7)
        mult       1        11    157
        div        1        0     3150
11  52  add/sub    1        0     715   0           6633        Async, No DSP (8)
        mult       1        0     3569
        div        1        0     3150
11  52  add/sub    14       3     716   14          4314        Sync, DSP (9)
        mult       15       11    114
        div        57       0     3090

Power consumption (mW):

e   m   Clocks  Logic  Signals  IOs   DSPs  Static   Dynamic  Comments
8   11  4.47    0.21   0.59     2.19  0.03  3294.10  7.76     (1)
8   11  6.29    0.29   0.49     2.17  0     3294.10  9.23     (2)
8   11  46.42   0.42   0.71     3.78  0.04  3294.10  51.36    (3)
8   23  7.27    0.49   1.49     3.19  0.12  3294.10  12.55    (4)
8   23  7.88    0.86   1.41     3.16  0     3294.10  13.31    (5)
8   23  90.48   0.75   2.29     7.28  0.09  3294.10  100.89   (6)
11  52  7.76    2.07   4.74     5.92  0.32  3294.10  20.81    (7)
11  52  12.77   2.71   4.97     7.02  0     3294.10  27.48    (8)
11  52  144.25  3.59   4.40     5.03  0.28  3294.10  157.55   (9)
5.2.2 OpenCores Single Precision Floating-Point Unit
The entity for the single precision FPU by OpenCores, shown in Listing 5.2,
is about the same as for the double precision FPU. As a result, it is easy
to adapt this unit to the co-processor.
Table 5.2 presents the size, latency and power consumption. It is worth
noticing that the multiplication unit is much larger than the other
arithmetic units.
Listing 5.2: Entity for Single Precision Floating-Point Unit by OpenCores
    entity fpu is
        port (
            clk_i       : in  std_logic;
            opa_i       : in  std_logic_vector(FP_WIDTH-1 downto 0);
            opb_i       : in  std_logic_vector(FP_WIDTH-1 downto 0);
            fpu_op_i    : in  std_logic_vector(2 downto 0);
            rmode_i     : in  std_logic_vector(1 downto 0);
            output_o    : out std_logic_vector(FP_WIDTH-1 downto 0);
            start_i     : in  std_logic;
            ready_o     : out std_logic;
            ine_o       : out std_logic;
            overflow_o  : out std_logic;
            underflow_o : out std_logic;
            div_zero_o  : out std_logic;
            inf_o       : out std_logic;
            zero_o      : out std_logic;
            qnan_o      : out std_logic;
            snan_o      : out std_logic
        );
    end fpu;
5.2.3 OpenCores Double Precision Floating-Point Unit
This FPU is already implemented on the Amber 2 core. However, when
simulating and implementing the unit, the Xilinx software reports numeric
operations that it does not support. In addition, when placing and routing
the design, the Xilinx software finds that the design is unroutable. As a
result, there is no power analysis for this FPU. The latency and size are
given in Table 5.3.
Table 5.2: Latency, Size and Power Consumption for OpenCores Single
Precision Floating-Point Unit

e   m   Operation  Latency  DSPs  LUTs  Total DSPs  Total LUTs
8   23  add/sub    8        0     1174  0           5892
        mult       13       0     3138
        div        35       0     1424

Power consumption (mW):

Clocks  Logic  Signals  IOs    DSPs  Static   Total DPC
84.61   23.02  82.43    21.18  0     3294.10  211.25
Table 5.3: Latency and Size for OpenCores Double Precision Floating-Point
Unit

e   m   Operation  Latency  DSPs  LUTs
11  52  add/sub    21/26    10    10457
        mult       29
        div        71
5.2.4 Floating-Point Library
This library is used as described in Subsection 2.5.4. A functional FPU
design using this library is shown in Appendix B.1.4. This design has the
same inputs and outputs as the Xilinx design in Figure 5.1, without the
extra exception signals. Since the library is asynchronous, no advanced
state machine is needed. The only state machine implemented sets the ready
signal one clock cycle after the new data signal (operation_nd) is asserted.
The number of clock cycles an operation takes may vary between platforms, so
this state machine only works for simulation.
The latency, size and power consumption are presented in Table 5.4. When
testing the library, it is difficult to isolate each arithmetic operation.
As a result, only the total size is presented.
Table 5.4: Latency, Size and Power Consumption for Floating-Point Library

e   m   Operation           Latency  Total DSPs  Total LUTs
8   11  add/sub, mult, div  1        1           1319
8   23  add/sub, mult, div  1        2           4159
11  52  add/sub, mult, div  1        15          15898

Power consumption (mW):

e   m   Clocks  Logic  Signals  IOs    DSPs  Static   Total DPC
8   11  6.25    1.73   4.49     17.96  0.02  3294.10  30.45
8   23  12.44   5.81   10.54    30.24  0.02  3294.10  59.04
11  52  14.49   39.27  69.22    48.72  0.28  3294.10  171.97

5.2.5 Fixed-Point Library
This library is designed according to the descriptions in Subsection 2.5.5.
A fixed-point design using the library is presented in Appendix B.1.5. The
design is quite similar to the top-level design of the floating-point
library. However, the Xilinx synthesizer did not allow the division
operator. As a result, division is not implemented in the design. The test
results for latency, size and power consumption are presented in Table 5.5.

Table 5.5: Latency, Size and Power Consumption for Fixed-Point Library

k   l   Operation      Latency  DSPs  LUTs
16  16  add/sub, mult  1        4     37
        div            -

Power consumption (mW):

Clocks  Logic  Signals  IOs    DSPs  Static   Total DPC
14.79   0.21   2.18     20.49  0.05  3294.10  37.72
5.2.6 Configurable Floating-Point Arithmetic Design
This unit is designed to give a better understanding of how floating-point
arithmetic works in hardware and to provide an additional FPU that is more
configurable for the user. It contains an adder, a subtractor and a
multiplier. The algorithms are based on those in Section 2.2 and in
Subsection 2.5.2, which describes the single precision FPU from OpenCores.
From here on, this unit will be referred to as the "configurable design".
The configurable design uses truncation as its rounding mode. This results
in a minor error when comparing its results with the reference results. The
multiplier has a bug that causes the wrong result to be generated when very
big numbers are multiplied with very small numbers. The floating-point unit
does not handle exceptions. Otherwise the FPU is functional. The bugs and
errors are addressed in the future work section later in this thesis.
Adder and Subtracter
When adding or subtracting two operands, it is important to always know
which operand is the greatest; otherwise the mantissa subtraction risks
underflowing. Another aspect is that the arithmetic unit has to handle both
positive and negative values. The unit therefore has to handle the following
situations: a + b, −a + b, a + (−b), −a + (−b), a − b, −a − b, a − (−b) and
−a − (−b). This is described in the first two steps of the flow chart in
Figure 5.3. Next, the magnitudes of the operands need to be compared to know
which sign the result will have. Then the difference between the exponents
has to be calculated, and the mantissa of the smaller operand is right
shifted by this difference. Now the mantissas can be added or subtracted.
The next step is to round the resulting mantissa. Before the result is
presented, any exceptions that occur have to be signaled, and the result has
to be normalized. One way to normalize the result is to count the leading
zeros, left shift the mantissa by that amount so that it represents a value
between one and two, and then subtract the number of places shifted from the
exponent. This is typically an area-consuming task, and the area grows with
the mantissa bit-width. The VHDL code for this algorithm is presented in
Appendix B.1.6.
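The addition and subtraction steps can be sketched as a small behavioral model. This is a Python sketch written for illustration, not the VHDL in Appendix B.1.6; the function name fp_addsub and the packing of each operand into an integer bit pattern are assumptions. It assumes normalized operands, truncation without guard bits, and no exception handling, so the result exponent is assumed to stay within range.

```python
def fp_addsub(a, b, subtract, e_width, m_width):
    """Add (subtract=0) or subtract (subtract=1) two normalized operands.

    a, b and the return value are bit patterns laid out as
    sign | exponent | mantissa. Overflow, underflow, denormals and
    special values are NOT handled in this sketch.
    """
    def unpack(x):
        sign = (x >> (e_width + m_width)) & 1
        exp = (x >> m_width) & ((1 << e_width) - 1)
        frac = x & ((1 << m_width) - 1)
        return sign, exp, frac | (1 << m_width)  # restore the hidden one

    sa, ea, ma = unpack(a)
    sb, eb, mb = unpack(b)
    sb ^= subtract  # a - b is handled as a + (-b)

    # Make (sa, ea, ma) the operand with the larger magnitude.
    if (ea, ma) < (eb, mb):
        sa, ea, ma, sb, eb, mb = sb, eb, mb, sa, ea, ma

    mb >>= ea - eb  # align the smaller mantissa (shifted-out bits are dropped)
    m = ma + mb if sa == sb else ma - mb
    exp = ea

    if m == 0:
        return 0  # exact cancellation gives +0

    while m >= (1 << (m_width + 1)):  # carry out: shift right once
        m >>= 1
        exp += 1
    while m < (1 << m_width):  # leading zeros: shift left
        m <<= 1
        exp -= 1

    frac = m & ((1 << m_width) - 1)  # drop the hidden one again
    return (sa << (e_width + m_width)) | (exp << m_width) | frac


# 1.5 + 2.25 and 1.5 - 2.25 in single precision (e = 8, m = 23)
print(hex(fp_addsub(0x3FC00000, 0x40100000, 0, 8, 23)))
print(hex(fp_addsub(0x3FC00000, 0x40100000, 1, 8, 23)))
```

The left-shift loop is the normalization step the text describes as area consuming: in hardware it becomes a leading-zero counter and a shifter whose width follows the mantissa bit-width.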
Multiplier
The algorithm for multiplying two floating-point numbers is simpler than the
algorithm for addition and subtraction. The flow chart for this algorithm is
presented in Figure 5.4 and the VHDL code is presented in Appendix B.1.7.
The first step of the algorithm is to multiply the mantissas and add the
exponents. Since both exponents are biased, the bias has to be subtracted
once to prevent it from being included twice. The next step is to find the
resulting sign and normalize the result. A big difference between this
algorithm and the adder and subtractor is that normalizing the result is an
easier task. If both operands are normalized, which means that their
mantissa values are between one and two, the resulting mantissa value is
between one and four, so at most a single right shift is needed. If the
operands are denormalized, a more advanced algorithm is needed to normalize
the result.
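The multiplication steps can be sketched in the same way. Again this is an illustrative Python model rather than the VHDL in Appendix B.1.7; the function name fp_mul and the integer bit-pattern interface are assumptions, operands are assumed to be normalized, the product is truncated, and no exceptions are handled.

```python
def fp_mul(a, b, e_width, m_width):
    """Multiply two normalized operands given as bit patterns.

    Layout: sign | exponent | mantissa. Overflow, underflow, denormals
    and special values are NOT handled in this sketch.
    """
    bias = (1 << (e_width - 1)) - 1
    frac_mask = (1 << m_width) - 1

    def unpack(x):
        sign = (x >> (e_width + m_width)) & 1
        exp = (x >> m_width) & ((1 << e_width) - 1)
        return sign, exp, (x & frac_mask) | (1 << m_width)  # hidden one

    sa, ea, ma = unpack(a)
    sb, eb, mb = unpack(b)

    sign = sa ^ sb
    exp = ea + eb - bias  # both exponents carry the bias, so remove it once
    product = ma * mb     # represents a value in [1, 4)

    if product >> (2 * m_width + 1):  # value in [2, 4): one right shift
        product >>= 1
        exp += 1

    frac = (product >> m_width) & frac_mask  # truncate the low bits
    return (sign << (e_width + m_width)) | (exp << m_width) | frac


# 1.5 * 2.25 and 1.5 * 1.5 in single precision (e = 8, m = 23)
print(hex(fp_mul(0x3FC00000, 0x40100000, 8, 23)))
print(hex(fp_mul(0x3FC00000, 0x3FC00000, 8, 23)))
```

Compared with the adder, normalization reduces to a single conditional shift, which is why the multiplier control logic is simpler even though the mantissa product itself is large.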
Figure 5.3: Flow Chart of Addition and Subtraction with Floating-Point
Numbers.
Figure 5.4: Flow Chart of Multiplication with Floating-Point Numbers.
Results
The resulting latency, area and power consumption are shown in Table 5.6.
Notice that no divider is implemented, which affects the total number of
LUTs and DSPs.
Table 5.6: Latency, Size and Power Consumption for Configurable
Floating-Point Unit

e   m   Operation  Latency  DSPs  LUTs  Total DSPs  Total LUTs
8   11  add/sub    8        0     296   1           622
        mult       5        1     101
        div        -        -     -
8   23  add/sub    8        0     1155  2           1353
        mult       5        2     161
        div        -        -     -
11  52  add/sub    8        0     3814  15          4413
        mult       5        15    530
        div        -        -     -

Power consumption (mW):

e   m   Clocks  Logic  Signals  IOs   DSPs  Static   Total DPC
8   11  12.12   0.06   0.35     2.20  0     3294.10  14.72
8   23  29.82   0.08   0.71     3.28  0     3294.10  33.89
11  52  49.12   0.20   2.89     3.89  0     3294.10  56.09
5.3 Precision and Range in Testbenches
To evaluate the testbenches and find the required range and precision, all
input and output files were analyzed with Matlab scripts to find the largest
and smallest numbers along with the highest precision. These are presented
in Table 5.7 along with the calculated bit-widths for the exponent (e) and
mantissa (m). The bit-widths for the fixed-point integer part (k) and
fraction part (l) are also presented. The compensating extra bit has not
been added to these numbers. The exponent and integer bit-widths have been
found using Tables 3.1 and 3.2, while the other calculations can be found in
Appendix D.
Table 5.7: Range and Precision Analysis of Benchmarks
Benchmark   Largest       Smallest    Highest precision  e  m   k   l
177.mesa    9.8658        3.21E-4     6                  5  18  4   20
179.art     99.2831228    28.3296161  7                  4  31  7   24
183.equake  32.6156       9.0400E-35  37                 8  20  6   123
188.ammp    20421.656321  0.2290      6                  5  35  15  20
Chapter 6
Discussion
This chapter discusses the results presented in Chapter 5 and analyzes the
gains of choosing a floating-point unit (FPU) that suits the applications.
The configurable design will be mentioned, but not compared with the others,
since it does not include a floating-point divider. The double precision FPU
by OpenCores will be compared for size and latency, but not for power
consumption, since this data is not available. The chapter is divided into
three sections. Section 6.1 compares the size and latency of the different
FPUs and analyzes the gains of varying the bit-width. Section 6.2 compares
the power consumption of the different FPUs, and Section 6.3 describes how
to adapt an FPU to the software.
6.1 Size and Latency Analysis
In the previous chapter, the resulting sizes for the Xilinx FPUs using
asynchronous and synchronous designs were presented. As discussed in Section
2.4, an asynchronous design is independent of the clock signal and can have
lower power consumption. However, the extra "handshaking" signals, which
replace the clock, normally use extra logic. This does not apply to the
Xilinx FPUs. According to Figure 6.1, the asynchronous design is smaller
than the synchronous design when using the same number of DSPs. When the
bit-width increases, the size grows faster than linearly and the usage of
DSPs also increases. Comparing the asynchronous designs with and without
DSPs clearly shows the importance of exploiting the DSP slices. For a total
bit-width of 64, the asynchronous FPU with DSPs uses approximately 38% fewer
LUTs. For further analysis against the other FPUs, the asynchronous FPU with
DSPs will be used.
Figure 6.1: Number of LUTs for Different Xilinx Floating-Point Unit Designs.
Table 6.1 presents the size and latency for the different FPUs. For all
exponent and mantissa bit-widths, the Xilinx FPU has a much lower area usage
than the others. Comparing the Xilinx FPU with the smallest alternative at a
total bit-width of 20, approximately 57% of the LUTs are saved. For a total
bit-width of 32, the Xilinx FPU uses approximately 73% fewer LUTs, and for
64 bits it uses approximately 61% fewer LUTs. Comparing the smallest FPU
with the largest overall, approximately 96% of the LUTs can be saved.
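The percentages in this paragraph follow directly from the total LUT counts. The following Python sketch reproduces the calculation; the helper name percent_saved and the dictionary layout are assumptions made for illustration, while the LUT totals are the ones reported for the asynchronous Xilinx FPU with DSPs, the floating-point library and the OpenCores FPUs.

```python
def percent_saved(candidate, reference):
    """Percentage of the reference that is saved by using the candidate."""
    return 100.0 * (1.0 - candidate / reference)


# Total LUTs per total bit-width (sign + exponent + mantissa).
TOTAL_LUTS = {
    20: {"Xilinx": 562, "FP library": 1319},
    32: {"Xilinx": 1111, "FP library": 4159, "OpenCores": 5892},
    64: {"Xilinx": 4082, "FP library": 15898, "OpenCores": 10457},
}

for width, luts in TOTAL_LUTS.items():
    smallest_other = min(v for name, v in luts.items() if name != "Xilinx")
    saved = percent_saved(luts["Xilinx"], smallest_other)
    print(f"{width} bit: {saved:.1f}% fewer LUTs than the smallest alternative")

# Smallest FPU overall compared with the largest overall.
print(f"overall: {percent_saved(562, 15898):.1f}%")
```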
The latency of the asynchronous designs is lower than that of the
synchronous designs. By using a Xilinx FPU or the floating-point library
instead of the OpenCores FPU for single precision, 87% less time is used for
addition and subtraction, while 92% and 97% are saved for multiplication and
division. For double precision, compared with the OpenCores FPU, 95% less
time is used for addition and 96% less time for subtraction. For
multiplication the time saved is 96%, and for division it is 98%.
When implementing an FPU on an ASIC or on an FPGA from another vendor, a
combination of the floating-point library and the configurable design can be
a good choice. If there is no need for configurability in the system,
latency is not a big concern, and high precision and range are required, the
double precision FPU from OpenCores is a good alternative.
If an FPU is not suited for the system, a fixed-point unit may be. The
latency is about the same as for the asynchronous FPUs, and the number of
LUTs for the fixed-point unit is only approximately 7% of that of the
smallest FPU.
Table 6.1: Size and Latency for Different Floating-Point Units
FPU         e   m   Operation           Latency  DSPs  LUTs  Sum DSPs  Sum LUTs
Xilinx      8   11  add/sub             1        0     212   2         562
                    mult                1        2     63
                    div                 1        0     256
FP Library  8   11  add/sub, mult, div  1        -     -     1         1319
Xilinx      8   23  add/sub             1        2     245   5         1111
                    mult                1        3     73
                    div                 1        0     755
FP Library  8   23  add/sub, mult, div  1        -     -     2         4159
OpenCores   8   23  add/sub             8        0     1174  0         5892
                    mult                13       0     3138
                    div                 35       0     1424
Xilinx      11  52  add/sub             1        3     705   14        4082
                    mult                1        11    157
                    div                 1        0     3150
FP Library  11  52  add/sub, mult, div  1        -     -     15        15898
OpenCores   11  52  add/sub             21/26    -     -     10        10457
                    mult                29
                    div                 71
6.2 Power Analysis
Figure 6.2 shows the dynamic power consumption of the Xilinx FPUs for the
asynchronous designs with and without DSPs and for the synchronous design
with DSPs, for varying bit-widths. The asynchronous designs use less power
than the synchronous design, because no clock is present in the asynchronous
designs. As discussed in Section 2.7, DSP slices can result in lower power
consumption. As a result, the asynchronous design using DSPs has the lowest
power consumption. The increase in power is also more linear with DSPs than
without. For further analysis against the other FPUs, the asynchronous
Xilinx FPU with DSPs will be used.
Figure 6.2: Dynamic Power Consumption with Different Xilinx Floating-
Point Unit Designs.
Table 6.2: Power Consumption for Different Floating-Point Units
FPU         e   m   Dynamic Power Consumption (mW)
Xilinx      8   11  7.76
FP Library  8   11  30.45
Xilinx      8   23  12.55
FP Library  8   23  59.04
OpenCores   8   23  211.25
Xilinx      11  52  20.81
FP Library  11  52  171.97
Table 6.2 lists the dynamic power consumption for the different FPUs. For
all bit-widths, the dynamic power consumption is lower for the Xilinx FPU
than for the others. At a total bit-width of 20, the Xilinx FPU uses
approximately 75% less dynamic power than the other alternative. For single
precision, 78% can be saved compared with the floating-point library and 94%
compared with the FPU by OpenCores. For double precision, 88% of the dynamic
power consumption can be saved. For single precision, the floating-point
library uses 72% less dynamic power than the FPU by OpenCores. Comparing the
least power-hungry FPU with the most power-hungry one, 96% of the dynamic
power consumption can be saved. The floating-point library is a good
alternative when designing for an FPGA from another vendor or for an ASIC.
Comparing the floating-point library at a total bit-width of 20 with the
single precision FPU by OpenCores, 86% of the dynamic power can be saved.
The configurable design also shows good results for power consumption. Since
this unit does not have a divider, combining it with the floating-point
library may be a good alternative when designing for an ASIC or another
FPGA.
6.3 Optimization for Software
The two previous sections describe the area and power consumption of the
different FPUs. Optimizing the FPU according to the required precision and
range can give a much lower dynamic power consumption and a smaller size. In
Section 5.3 the range and precision needed for the four benchmarks were
evaluated. It is reasonable to believe that the benchmarks 177.mesa and
183.equake use single precision floating-point numbers, while 179.art and
188.ammp use double precision. After compensating with the extra exponent
and mantissa bits, seven bits can be saved for the 177.mesa testbench, 26
bits for the 179.art testbench, one bit for 183.equake and 21 bits for
188.ammp. The fixed-point bit-width is less than the floating-point
bit-width for the 179.art and 188.ammp testbenches.
Figure 6.3: Pie Charts of the Relative Size of the Xilinx IP Arithmetic
Units for Different Bit-Widths with No DSP Usage.
According to Table 2.4 in Section 2.8, there is a big difference in how
frequently the different operations are used. Common to all benchmarks is
that division is the least used operation. Figure 6.3 presents each
asynchronous arithmetic unit's share of the total size without DSP slices.
The figure shows that the share of the area occupied by the multiplier and
divider grows when the bit-width increases. As a result, it is worth
analyzing the gains of emulating division in software. According to Table
2.3 in Subsection 2.5.6, the latency of emulating division in software is
2494 clock cycles without any caches and 145 clock cycles with caches. These
latencies are for single precision floating-point arithmetic. The extra
overhead of using a single precision FPU in the Amber 2 co-processor,
described in Subsection 2.6.1, is eight clock cycles for loading data and
four clock cycles for storing data. The latency and size of the asynchronous
Xilinx FPU will be used. This gives the following calculations:
Emulating division:
\[
\text{Latency/operation} =
\begin{cases}
2494 & \text{if no caches} \\
145 & \text{if caches}
\end{cases}
\qquad
\text{Area} = 0
\]
Hardware division:
\[
\text{Latency/operation} = 1 + 8 + 4 = 13
\qquad
\text{Area} = 733
\]
If the processing core is designed without caches, hardware floating-point
division takes 99% less time than software emulated division. With caches it
takes 91% less time. If the system design is not time critical and a small
hardware design is preferred, emulating floating-point division may be an
option. Another option is to use the fixed-point library. The total size of
the fixed-point library is 95% smaller than the single precision divider
alone.
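Since division is the least used operation, its effect on the average cost of a floating-point operation can be estimated by weighting the latencies. The following Python sketch illustrates this; the function name and the division share of 1% are hypothetical, and every non-division operation is assumed to cost the same 13 clock cycles as a hardware operation including load and store.

```python
HW_OP_CYCLES = 1 + 8 + 4  # FPU latency + load + store
SW_DIV_CYCLES = {"no caches": 2494, "caches": 145}


def average_cycles(div_fraction, div_cycles, other_cycles=HW_OP_CYCLES):
    """Average cycles per floating-point operation for a given division share."""
    return (1.0 - div_fraction) * other_cycles + div_fraction * div_cycles


DIV_FRACTION = 0.01  # hypothetical: 1% of all operations are divisions

print(average_cycles(DIV_FRACTION, HW_OP_CYCLES))  # divider in hardware
for config, cycles in SW_DIV_CYCLES.items():
    print(config, average_cycles(DIV_FRACTION, cycles))  # divider emulated
```

With a low division share and caches present, the average cost stays close to the hardware figure, which is the case where trading the divider area for software emulation is most attractive.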
Chapter 7
Conclusion
This thesis implements and tests configurable floating-point units (FPUs)
for the SHMAC platform. These FPUs are useful for applications with a
limited range and precision and when only parts of the comprehensive IEEE
Standard 754 are needed. Software is analyzed and the bit-width is optimized
to reduce area and power consumption.
An accurate area, latency and power analysis is done for different FPUs with
different bit-widths to show the user the gains that can be achieved. The
analysis shows that for Xilinx FPGAs the total area and dynamic power
consumption can be reduced by up to 96%. For ASICs and other platforms, the
area and dynamic power consumption can be reduced by up to 91% and 82%.
For the applications tested, up to 33% of the bit-width of the
floating-point numbers is unnecessary, and removing these bits leads to
great performance and area gains. By moving parts of the FPU to software,
even greater gains can be achieved.
By choosing the proper FPU, the number of clock cycles can be reduced by 95%
for floating-point addition. For subtraction and multiplication the number
of clock cycles is reduced by 96%, while for division it is reduced by 98%.
Future Work
This section describes work that was not done in this project because of
time constraints or limitations in the SHMAC platform. The most important
future work is the implementation and testing of the FPUs on the SHMAC
platform.
Implement and Test the FPUs on the SHMAC Platform
To make sure that the different FPUs work on the SHMAC platform, it is
necessary to implement and test the designs there. Since the co-processor in
the Amber core was modified during the spring of 2014, the FPUs and the
co-processor have never been tested together.
Design a Compiler for Amber Core that Supports Hardware FPU
As of today, no SHMAC compiler supports a hardware FPU. However, this
subject was studied during the spring of 2014, and only small adjustments
may be needed to complete it.
Continue Design and Verification of the Configurable
FPU
The configurable FPU designed during this project does not include a
divider, and the multiplication unit generates the wrong result when
multiplying very big numbers with very small numbers. To be able to fully
replace the current FPU implemented on the Amber core, this has to be fixed.
In addition, the configurable FPU does not support denormalized numbers or
exceptions like underflow and overflow. This is not critical functionality,
but it may be included to support the configurability of the unit.
Run-Time Reconfiguration FPU
Run-time reconfiguration for FPGA designs is an increasingly important re-
quirement for many markets. By having a run-time reconfigurable FPU, the
unit can adapt, in real-time, to different applications.
Bibliography
[1] S. Borkar, “Thousand Core Chips-A Technology Perspective,” Proceed-
ings of the 44th annual Design Automation Conference, pp. 746–749,
June 2007.
[2] S. Borkar and A. A. Chien, “The Future of Microprocessors,” Commu-
nications of the ACM, vol. 54, pp. 67–77, May 2011.
[3] H. Esmaeilzadeh, E. Blem, R. S. Amant, K. Sankaralingam, and
D. Burger, “Dark Silicon and the End of Multicore Scaling,” SIGARCH
Comput. Archit. News, vol. 39, no. 3, pp. 365–376, 2011.
[4] “Single-ISA Heterogeneous MAny-core Computer (SHMAC) Project
Plan,” Oct. 2013.
[5] T. A. Rodolfo, N. L. V. Calazans, and F. G. Moraes, “Floating Point
Hardware for Embedded Processors in FPGAs: Design Space Explo-
ration for Performance and Area,” International Conference on Recon-
figurable Computing and FPGAs, pp. 24–29, 2009.
[6] Y. J. Chong and S. Parameswaran, “Custom Floating-Point Unit Gen-
eration for Embedded Systems,” Computer-Aided Design of Integrated
Circuits and Systems, IEEE Transactions on, vol. 28, no. 5.
[7] “IEEE Standard for Floating-Point Arithmetic,” Aug. 2008. IEEE Std
754 TM-2008.
[8] J. Al-Eryani, “Floating Point Unit.” http://opencores.org/websvn,
filedetails?repname=fpu100&path=%2Ffpu100%2Ftrunk%2Fdoc%
2FFPU_doc.pdf. [Online; accessed 20-Jan-2014].
[9] ARM EABI, Sourcery G++ Lite, 2010.
[10] “Chapter 7 – floating point arithmetic.” http://pages.cs.wisc.edu/
~smoler/x86text/lect.notes/arith.flpt.html. [Online; accessed
25-May-2014].
[11] “Programming languages, their environments and system software inter-
faces — Extensions for the programming language C to support embed-
ded processors,” Apr. 2003. http://www.open-std.org/jtc1/sc22/
wg14/www/docs/n1005.pdf. [Online; accessed 16-Jan-2014].
[12] Xilinx, LogiCORE IP Floating-Point Operator, 5.0 ed., Mar.
2011. http://www.xilinx.com/support/documentation/ip_
documentation/floating_point_ds335.pdf.
[13] C. V. Berkel, M. B. Josephs, and S. M. Nowick, “Applications of
Asynchronous Circuits,” Proceedings of the IEEE, vol. 87, Feb. 1999.
[14] D. Lundgren, Double Precision Floating Point Core VHDL.
opencores.org, Feb. 2010. http://opencores.org/websvn,
filedetails?repname=openrisc&path=%2Fopenrisc%2Ftrunk%
2Fdocs%2Fopenrisc-arch-1.0-rev0.pdf. [Online; accessed 18-May-2014].
[15] D. Bishop, “Floating point package user’s guide.” http://www.vhdl.
org/fphdl/Float_ug.pdf, Oct. [Online; accessed 27-May-2014].
[16] D. Bishop, “Fixed point package user’s guide.” http://www.eda.org/
fphdl/Fixed_ug.pdf, Oct. 2013. [Online; accessed 7-Feb-2014].
[17] H. O. Wikene, “Benchmarking SHMAC,” Master’s thesis, Norwegian
University of Science and Technology, 2013.
[18] C. Santifort, Amber Open Source Project, Amber 2 Core Specification,
May 2013. http://opencores.org/websvn,filedetails?repname=
amber&path=%2Famber%2Ftrunk%2Fdoc%2Famber-core.pdf. [Online;
accessed 3-Mar-2014].
[19] C. Santifort, Amber Open Source Project, Amber Project User Guide,
May 2013. http://opencores.org/websvn,filedetails?repname=
amber&path=%2Famber%2Ftrunk%2Fdoc%2Famber-user-guide.pdf.
[Online; accessed 3-Mar-2014].
[20] OpenCores Organization, Wishbone B4, WISHBONE System-on-Chip
(SoC) Interconnection Architecture for Portable IP Cores, 2010. http:
//cdn.opencores.org/downloads/wbspec_b4.pdf. [Online; accessed
3-Mar-2014].
[21] J. D. Knutsen, “Implementing a SHMAC FPU Tile,” Master’s thesis,
Norwegian University of Science and Technology, 2014.
[22] M. L. Teilgård, “Integration of Hardware Accelerators on the SHMAC
Platform,” Master’s thesis, Norwegian University of Science and Tech-
nology, 2014.
[23] OpenCores Organization, OpenRISC 1000 Architecture Manual, 2012.
http://opencores.org/websvn,filedetails?repname=openrisc&
path=%2Fopenrisc%2Ftrunk%2Fdocs%2Fopenrisc-arch-1.0-rev0.
pdf. [Online; accessed 17-Mar-2014].
[24] OpenCores Organization, ORPSoC User Guide, 2011.
[25] Embecosm Limited, Or1ksim User Guide, July 2012.
[26] Xilinx, Virtex-5 Family Overview, 5.0 ed., Feb. 2009. http://
www.xilinx.com/support/documentation/data_sheets/ds100.pdf.
[Online; accessed 6-Feb-2014].
[27] Avnet Electronics Marketing, Xilinx® Spartan®-6 LX16 Evaluation
Kit, User Guide, 1.00 ed., Aug. 2010.
[28] ZedBoard.org, ZedBoard (Zynq® Evaluation and Development) Hardware
User’s Guide, 2.2 ed., Jan. 2014.
[29] Xilinx, ISE Design Suite 14: Release Notes, Installation, and Licensing,
Oct. 2013.
[30] Xilinx, XST User Guide, v 11.3 ed., Sept. 2009.
[31] Xilinx, ISim User Guide, 14.3 ed., Oct. 2012.
[32] Xilinx, Xilinx Power Tools Tutorial, 14.5 ed., Mar. 2013.
[33] “Standard Performance Evaluation Corporation.” http://www.spec.
org. [Online; accessed 08-May-2014].
[34] “The Physical Effects of Reliability Simulator.” http://www.persim.
org. [Online; accessed 08-May-2014].
[35] D. Defour, “Collapsing floating-point operations,” Universite de Perpig-
nan, 2012.
[36] Y. J. Chong and S. Parameswaran, “Configurable Multimode Embedded
Floating-Point Units for FPGAs,” Very Large Scale Integration (VLSI)
Systems, IEEE Transactions on, vol. 19, pp. 2033–2044, Sept. 2011.
[37] J. Liang, R. Tessier, and O. Mencer, “Floating Point Unit Generation
and Evaluation for FPGAs,” Field-Programmable Custom Computing
Machines, 2003. FCCM 2003. 11th Annual IEEE Symposium, pp. 185–
194, Apr. 2003.
[38] S. Galal and M. Horowitz, “Energy-Efficient Floating-Point Unit
Design,” Computers, IEEE Transactions on, vol. 60, pp. 913–922, July
2011.
[39] A. A. Gaffar, O. Mencer, W. Luk, and P. Y. Cheung, “Unifying Bit-
width Optimisation for Fixed-point and Floating-point Designs,” Field-
Programmable Custom Computing Machines, 2004. FCCM 2004. 12th
Annual IEEE Symposium, pp. 79–88, Apr. 2004.
[40] Y. F. Tong, R. A. Rutenbar, and D. F. Nagle, “Minimizing Floating-
Point Power Dissipation Via Bit-Width Reduction,” Power-Driven Mi-
croarchitecture Workshop in conjunction with the 25th International
Symposium on Computer Architecture, 1998.
Appendices

Appendix A
Matlab Code
A.1 Floating-Point Unit Testbench Generator
function result = generateTestbench(path, exponent_length, mantissa_length, samples_num)
    file_testbench = fopen(path, 'w');

    for i = 1:samples_num,
        % Need this because Matlab does not handle such big numbers
        if exponent_length+mantissa_length+1 > 52
            a_1 = randi([0 1], 1, exponent_length+mantissa_length+1);
            a_1 = num2str(a_1(1,:));
            a = regexprep(a_1, '[^\w'']', '');   % Remove all spaces

            b_1 = randi([0 1], 1, exponent_length+mantissa_length+1);
            b_1 = num2str(b_1(1,:));
            b = regexprep(b_1, '[^\w'']', '');   % Remove all spaces

            fprintf(file_testbench, '%s %s\n', a, b);
        else
            range = 2^(mantissa_length+exponent_length+1);
            % randi(range, 1) would draw from 1..range, and the value "range"
            % itself needs one bit more than the operand width; draw from
            % 0..range-1 so that every value fits in the operand.
            a = randi([0 range-1], 1);
            b = randi([0 range-1], 1);
            a_bin = dec2bin(a, exponent_length + mantissa_length + 1);
            b_bin = dec2bin(b, exponent_length + mantissa_length + 1);
            fprintf(file_testbench, '%s %s\n', a_bin, b_bin);
        end
    end

    result = 'Generation Complete!';

    fclose(file_testbench);
end
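The two branches in the generator above exist because MATLAB doubles only represent integers exactly up to 2^53, so wide operands have to be built bit by bit. As a point of comparison, the following is a minimal Python sketch of the same idea, not a port of the thesis tooling; the function name `generate_operand_pairs` and the `seed` parameter are assumptions made for illustration. Python integers have arbitrary precision, so one code path covers every operand width.

```python
import random


def generate_operand_pairs(exponent_length, mantissa_length, samples_num, seed=None):
    """Return samples_num pairs of random operands as binary strings.

    Each operand has 1 sign bit, exponent_length exponent bits and
    mantissa_length mantissa bits, written most significant bit first.
    """
    rng = random.Random(seed)
    width = exponent_length + mantissa_length + 1
    pairs = []
    for _ in range(samples_num):
        # getrandbits returns an integer in the range 0 .. 2**width - 1.
        a = rng.getrandbits(width)
        b = rng.getrandbits(width)
        # Zero-pad to the full operand width, like dec2bin(x, width).
        pairs.append((format(a, f"0{width}b"), format(b, f"0{width}b")))
    return pairs
```

Each pair can then be written to the testbench file as one line with the two strings separated by a space, which matches the line format used by the generator above.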
A.2 Floating-Point Unit Testbench Check
function result = compareResults(result1, result2, samples_num)
    file1 = fopen(result1, 'r');
    file2 = fopen(result2, 'r');

    log = fopen('log.txt', 'w');

    for i = 1:samples_num,
        file1_line = fgetl(file1);
        values1 = sscanf(file1_line, '%x %d');
        file2_line = fgetl(file2);
        values2 = sscanf(file2_line, '%x %d');

        if (values1(2) ~= values2(2))
            str = sprintf('Wrong operation in line %d\n', i);
            fprintf(log, '%s', str);
        end
        if (values1(1) ~= values2(1))
            str = sprintf('Various results in line %d\n', i);
            fprintf(log, '%s', str);
        end
    end
    fclose(file1);
    fclose(file2);
    fclose(log);

    result = 'Comparing Results Complete!';
end
A.3 Calculate Value of Floating-Point Numbers
function result = calculateOperation(a, exponent_length, mantissa_length)

    format long

    bias = 2^(exponent_length-1)-1;
    a_bin = dec2bin(hex2dec(a), exponent_length+mantissa_length+1);

    a_exponent = 0;
    a_mantissa = 0;

    denormalized = 0;

    infinity_counter = 0;

    % Calculate exponent value
    for i = 2:exponent_length+1
        if (a_bin(i) == '1')
            infinity_counter = infinity_counter + 1;
            a_exponent = a_exponent + 2^(exponent_length-i+1);
        end
    end

    a_exponent = a_exponent - bias;
    result = 2^a_exponent;

    % An all-zero exponent field gives -bias (-127 for an 8-bit exponent);
    % compare against -bias so the test holds for every exponent length.
    if (a_exponent ~= -bias)
        % Number is normalized
        % Add the hidden mantissa bit
        a_mantissa = 2^0;
    else
        % Number is de-normalized
        disp('De-Normalized');
        a_mantissa = 0;
        denormalized = 1;
    end

    % Calculate mantissa value
    for i = exponent_length+2:exponent_length+1+mantissa_length
        if (a_bin(i) == '1')
            if (denormalized == 1)
                a_mantissa = a_mantissa + (2^(-i+exponent_length+2));
            else
                a_mantissa = a_mantissa + (2^(-1-i+exponent_length+2));
            end
        end
    end

    % Is number positive or negative
    if (a_bin(1) == '1')
        result = -1*result*a_mantissa;
    else
        result = result*a_mantissa;
    end

    if (infinity_counter == exponent_length)
        if (a_mantissa == 1)
            result = 'Infinity';
        else
            result = 'NaN';
        end
    end
    return

end
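The value calculation above can be cross-checked with a short software model. The following is a minimal Python sketch that decodes the same sign, exponent and mantissa layout using integer field extraction; the function name `decode_float` is an assumption made for illustration, and the sketch returns Python floats, so it is only meaningful for formats whose values fit in double precision.

```python
def decode_float(bits, exponent_length, mantissa_length):
    """Decode an integer bit pattern laid out as sign | exponent | mantissa."""
    width = 1 + exponent_length + mantissa_length
    sign = (bits >> (width - 1)) & 1
    exp_field = (bits >> mantissa_length) & ((1 << exponent_length) - 1)
    frac_field = bits & ((1 << mantissa_length) - 1)
    bias = (1 << (exponent_length - 1)) - 1

    # All-ones exponent: infinity if the mantissa is zero, NaN otherwise.
    if exp_field == (1 << exponent_length) - 1:
        if frac_field == 0:
            return float("-inf") if sign else float("inf")
        return float("nan")

    fraction = frac_field / (1 << mantissa_length)
    if exp_field == 0:
        # Denormalized: no hidden bit, exponent fixed at 1 - bias.
        value = fraction * 2.0 ** (1 - bias)
    else:
        # Normalized: add the hidden mantissa bit.
        value = (1 + fraction) * 2.0 ** (exp_field - bias)
    return -value if sign else value
```

With single-precision parameters (8 exponent bits, 23 mantissa bits), the pattern 0x3FC00000 decodes to 1.5. With the 15-bit format used as the default in the configurable adder (5 exponent bits, 9 mantissa bits), the pattern 0x1E00 decodes to 1.0.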
Appendix B
HDL Code
B.1 Floating-Point Design
B.1.1 Top-Level Design for Xilinx IP
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

library work;
use work.all;

entity system is
  generic (
    OPERAND_LENGTH  : integer := 32;
    EXPONENT_LENGTH : integer := 8;
    MANTISSA_LENGTH : integer := 23;

    EXCEPTIONS : boolean := false
  );
  port (
    a, b         : in  std_logic_vector(OPERAND_LENGTH-1 downto 0);
    operation    : in  std_logic_vector(2 downto 0);
    operation_nd : in  std_logic;
    --operation_rfd : out std_logic;
    clk          : in  std_logic;
    --clk_a : in std_logic;
    reset        : in  std_logic;
    result_ip    : out std_logic_vector(OPERAND_LENGTH-1 downto 0);
    --result_conf : out std_logic_vector(OPERAND_LENGTH-1 downto 0);
    rdy          : out std_logic;

    underflow      : out std_logic;
    overflow       : out std_logic;
    invalid_op     : out std_logic;
    divide_by_zero : out std_logic
  );
end system;

architecture Behavioral of system is

  signal ready                : std_logic := '0';
  signal ready_conf_adder_sub : std_logic := '0';
  signal ready_conf_mult      : std_logic := '0';
  signal ready_ip_adder_sub   : std_logic := '0';
  signal ready_ip_mult        : std_logic := '0';
  signal ready_ip_div         : std_logic := '0';


  signal result_conf_adder_sub : std_logic_vector(OPERAND_LENGTH-1 downto 0) := (others => '0');
  signal result_conf_mult      : std_logic_vector(OPERAND_LENGTH-1 downto 0) := (others => '0');
  signal result_ip_adder_sub   : std_logic_vector(OPERAND_LENGTH-1 downto 0) := (others => '0');
  signal result_ip_mult        : std_logic_vector(OPERAND_LENGTH-1 downto 0) := (others => '0');
  signal result_ip_div         : std_logic_vector(OPERAND_LENGTH-1 downto 0) := (others => '0');

  signal fpu_operation  : std_logic_vector(2 downto 0) := "000";
  signal adder_new_data : std_logic := '0';
  signal mult_new_data  : std_logic := '0';
  signal div_new_data   : std_logic := '0';

  signal sclr_adder_sub : std_logic := '1';
  signal sclr_mult      : std_logic := '1';
  signal sclr_div       : std_logic := '1';

  signal ce_adder_sub : std_logic := '0';
  signal ce_mult      : std_logic := '0';
  signal ce_div       : std_logic := '0';

  -- Comment this if using configurable design
  signal result_conf : std_logic_vector(OPERAND_LENGTH-1 downto 0);

  -- Used if EXCEPTIONS is true
  signal underflow_add, underflow_mult, underflow_div    : std_logic := '0';
  signal overflow_add, overflow_mult, overflow_div       : std_logic := '0';
  signal invalid_op_add, invalid_op_mult, invalid_op_div : std_logic := '0';
  signal divide_by_zero_div                              : std_logic := '0';

  component fpu_adder_sub
    port (
      a : in std_logic_vector(OPERAND_LENGTH-1 downto 0);
      b : in std_logic_vector(OPERAND_LENGTH-1 downto 0);
      operation : in std_logic_vector(5 downto 0); -- 0 for addition and 1 for subtraction
      operation_nd : in std_logic; -- New Data. Must be set high to indicate that operand A, B and operation is valid
      --clk : in std_logic;
      --sclr : in std_logic; -- Synchronous Reset. Resets RDY and OPERATION_RFD output. Takes priority over CE
      --ce : in std_logic; -- Clock enable
      result : out std_logic_vector(OPERAND_LENGTH-1 downto 0);
      rdy : out std_logic -- Set high when result is valid
      --underflow : out std_logic;
      --overflow : out std_logic;
      --invalid_op : out std_logic
    );
  end component;

  component fpu_multiplier
    port (
      a : in std_logic_vector(OPERAND_LENGTH-1 downto 0);
      b : in std_logic_vector(OPERAND_LENGTH-1 downto 0);
      operation_nd : in std_logic;
      --clk : in std_logic;
      --sclr : in std_logic;
      --ce : in std_logic;
      result : out std_logic_vector(OPERAND_LENGTH-1 downto 0);
      rdy : out std_logic
      --underflow : out std_logic;
      --overflow : out std_logic;
      --invalid_op : out std_logic
    );
  end component;

  component fpu_divider
    port (
      a : in std_logic_vector(OPERAND_LENGTH-1 downto 0);
      b : in std_logic_vector(OPERAND_LENGTH-1 downto 0);
      operation_nd : in std_logic;
      --clk : in std_logic;
      --sclr : in std_logic;
      --ce : in std_logic;
      result : out std_logic_vector(OPERAND_LENGTH-1 downto 0);
      rdy : out std_logic
      --underflow : out std_logic;
      --overflow : out std_logic;
      --invalid_op : out std_logic;
      --divide_by_zero : out std_logic
    );
  end component;


begin

  ready <= ready_ip_adder_sub or ready_ip_mult or ready_ip_div;
  rdy <= ready;
  -- ready_conf_adder_sub or

  fpu_operation <= operation(2 downto 0);

  underflow <= underflow_add or underflow_mult or underflow_div;
  overflow <= overflow_add or overflow_mult or overflow_div;
  invalid_op <= invalid_op_add or invalid_op_mult or invalid_op_div;
  divide_by_zero <= divide_by_zero_div;

  -- Control ready signal
  process (clk, reset, ready_conf_adder_sub, ready_ip_adder_sub, ready_ip_mult, ready_ip_div)
  begin
    if (reset = '1') then
      --sclr_adder_sub <= '0';
      -- sclr_mult <= '1';
      -- sclr_div <= '1';
    else
      if (rising_edge(clk)) then
        if (ready_conf_adder_sub = '1') then
          result_conf <= result_conf_adder_sub;
        elsif (ready_conf_mult = '1') then
          result_conf <= result_conf_mult;
        elsif (ready_ip_adder_sub = '1') then
          result_ip <= result_ip_adder_sub;
          --sclr_adder_sub <= '1';
        elsif (ready_ip_mult = '1') then
          result_ip <= result_ip_mult;
        elsif (ready_ip_div = '1') then
          result_ip <= result_ip_div;
        else
        end if;

      end if;
    end if;
  end process;

  process (clk, reset, fpu_operation, operation_nd, ready)
  begin
    if (reset = '1') then
      sclr_adder_sub <= '1';
      sclr_mult <= '1';
      sclr_div <= '1';

      ce_adder_sub <= '0';
      ce_mult <= '0';
      ce_div <= '0';
    else
      if (rising_edge(clk)) then
        if (operation_nd = '1') then
          -- Addition and subtraction
          if (fpu_operation = "000" or fpu_operation = "001") then
            adder_new_data <= '1';
            mult_new_data <= '0';
            div_new_data <= '0';

            sclr_adder_sub <= '0';
            sclr_mult <= '1';
            sclr_div <= '1';

            ce_adder_sub <= '1';
            ce_mult <= '0';
            ce_div <= '0';
          -- Multiplication
          elsif (fpu_operation = "010") then
            mult_new_data <= '1';
            adder_new_data <= '0';
            div_new_data <= '0';

            sclr_adder_sub <= '1';
            sclr_mult <= '0';
            sclr_div <= '1';

            ce_adder_sub <= '0';
            ce_mult <= '1';
            ce_div <= '0';
          -- Division
          elsif (fpu_operation = "011") then
            div_new_data <= '1';
            adder_new_data <= '0';
            mult_new_data <= '0';

            sclr_adder_sub <= '1';
            sclr_mult <= '1';
            sclr_div <= '0';

            ce_adder_sub <= '0';
            ce_mult <= '0';
            ce_div <= '1';
          else
            div_new_data <= '0';
            adder_new_data <= '0';
            mult_new_data <= '0';

            sclr_adder_sub <= '1';
            sclr_mult <= '1';
            sclr_div <= '1';

            ce_adder_sub <= '0';
            ce_mult <= '0';
            ce_div <= '0';
          end if;
        elsif (ready = '1') then
          sclr_adder_sub <= '1';
          sclr_mult <= '1';
          sclr_div <= '1';
          ce_adder_sub <= '0';
          ce_mult <= '0';
          ce_div <= '0';

          adder_new_data <= '0';
          mult_new_data <= '0';
          div_new_data <= '0';
        else

        end if;
      end if;
    end if;
  end process;

  local_fpu_adder_sub : fpu_adder_sub
    port map (
      a => a, -- input [31 : 0] a
      b => b, -- input [31 : 0] b
      operation(2 downto 0) => operation, -- input [5 : 0] operation
      operation(5 downto 3) => "000",
      operation_nd => adder_new_data, -- input operation_nd
      --clk => clk, -- input clk
      --sclr => sclr_adder_sub,
      --ce => ce_adder_sub,
      result => result_ip_adder_sub, -- output [31 : 0] result
      rdy => ready_ip_adder_sub -- output rdy
      --underflow => underflow_add,
      --overflow => overflow_add,
      --invalid_op => invalid_op_add
    );


  local_fpu_multiplier : fpu_multiplier
    port map (
      a => a, -- input [31 : 0] a
      b => b, -- input [31 : 0] b
      operation_nd => mult_new_data, -- input operation_nd
      --clk => clk, -- input clk
      --sclr => sclr_mult,
      --ce => ce_mult,
      result => result_ip_mult, -- output [31 : 0] result
      rdy => ready_ip_mult -- output rdy
      --underflow => underflow_mult,
      --overflow => overflow_mult,
      --invalid_op => invalid_op_mult
    );

  local_fpu_divider : fpu_divider
    port map (
      a => a, -- input [31 : 0] a
      b => b, -- input [31 : 0] b
      operation_nd => div_new_data, -- input operation_nd
      --clk => clk, -- input clk
      --sclr => sclr_div,
      --ce => ce_div,
      result => result_ip_div, -- output [31 : 0] result
      rdy => ready_ip_div -- output rdy
      --underflow => underflow_div,
      --overflow => overflow_div,
      --invalid_op => invalid_op_div,
      --divide_by_zero => divide_by_zero_div
    );


end Behavioral;
B.1.2 Adder and Subtracter
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;
use IEEE.std_logic_unsigned.ALL;

entity configurable_adder_unsigned is
  generic (
    OPERAND_LENGTH  : integer := 15;
    EXPONENT_LENGTH : integer := 5;
    MANTISSA_LENGTH : integer := 9
  );
  Port ( a         : in  STD_LOGIC_VECTOR (OPERAND_LENGTH-1 downto 0);
         b         : in  STD_LOGIC_VECTOR (OPERAND_LENGTH-1 downto 0);
         clk       : in  std_logic;
         reset     : in  std_logic;
         new_data  : in  std_logic;
         operation : in  std_logic_vector(2 downto 0);
         result    : out STD_LOGIC_VECTOR (OPERAND_LENGTH-1 downto 0);
         rdy       : out std_logic);
end configurable_adder_unsigned;

architecture Behavioral of configurable_adder_unsigned is

  type state_type is (s0,s1,s2,s3,s4,s5,s6,s7,s8); --type of state machine.
  signal current_s, next_s : state_type; --current and next state declaration.

  -- Signals for adder
  signal result_exponent : std_logic_vector(EXPONENT_LENGTH-1 downto 0) := (others => '0');

  signal res_mantissa   : std_logic_vector(MANTISSA_LENGTH+1 downto 0) := (others => '0');
  signal big_exponent   : std_logic_vector(EXPONENT_LENGTH-1 downto 0) := (others => '0');
  signal small_exponent : std_logic_vector(EXPONENT_LENGTH-1 downto 0) := (others => '0');
  signal big_mantissa   : std_logic_vector(MANTISSA_LENGTH-1 downto 0) := (others => '0');
  signal small_mantissa : std_logic_vector(MANTISSA_LENGTH-1 downto 0) := (others => '0');
  signal normalization  : std_logic := '0';
  signal sign           : std_logic := '0';
  signal sub_neg        : std_logic := '0';

  signal clk_node             : std_logic := '0';
  signal delay_clk            : std_logic := '0';
  signal normalization_factor : std_logic_vector(1 downto 0);
  signal sig_operation        : std_logic := '0';

begin

  process (clk, reset)
  begin
    if (reset = '1') then
      current_s <= s0; --default state on reset.
    elsif (rising_edge(clk)) then
      current_s <= next_s; --state change.
    end if;
  end process;

  process (current_s, a, b, new_data, operation)
  begin
    case current_s is
      when s0 =>
        rdy <= '0';
        res_mantissa <= (others => '0');
        big_exponent <= (others => '0');
        small_exponent <= (others => '0');
        big_mantissa <= (others => '0');
        small_mantissa <= (others => '0');
        result <= (others => '0');
        normalization <= '0';
        sign <= '0';
        sub_neg <= '0';

        if new_data = '1' then
          next_s <= s1;
        else next_s <= s0;
        end if;
      when s1 =>
        result <= (others => '0');
        rdy <= '0';
        res_mantissa <= (others => '0');
        sign <= '0';


        if (operation(0) = '0') then
          -- order does not matter
          next_s <= s2;
        elsif (operation(0) = '1') then
          -- order matters
        end if;

        -- Allocate the biggest number to the biggest exponent and mantissa,
        -- and vice versa, for further operations
        -- B bigger than A
        if (b(OPERAND_LENGTH-2 downto OPERAND_LENGTH-1-EXPONENT_LENGTH) > a(OPERAND_LENGTH-2 downto OPERAND_LENGTH-1-EXPONENT_LENGTH)) then
          big_exponent <= b(OPERAND_LENGTH-2 downto OPERAND_LENGTH-1-EXPONENT_LENGTH);
          small_exponent <= a(OPERAND_LENGTH-2 downto OPERAND_LENGTH-1-EXPONENT_LENGTH);
          big_mantissa <= b(MANTISSA_LENGTH-1 downto 0);
          small_mantissa <= a(MANTISSA_LENGTH-1 downto 0);

          if (operation(0) = '0') then
            -- order does not matter
            next_s <= s2;
          elsif (operation(0) = '1') then
            -- order matters
            -- If minus and minus
            if (b(OPERAND_LENGTH-1) = '1') then
              sign <= '0';
              sub_neg <= '1';
            else
              sign <= '1';
            end if;
          else
          end if;
        -- A bigger than or equal to B
        else


          -- A is negative
          if (a(OPERAND_LENGTH-1) = '1') then
            sign <= '1';
          end if;
          big_exponent <= a(OPERAND_LENGTH-2 downto OPERAND_LENGTH-1-EXPONENT_LENGTH);
          small_exponent <= b(OPERAND_LENGTH-2 downto OPERAND_LENGTH-1-EXPONENT_LENGTH);
          big_mantissa <= a(MANTISSA_LENGTH-1 downto 0);
          small_mantissa <= b(MANTISSA_LENGTH-1 downto 0);
        end if;

        next_s <= s2;
      when s2 =>
        result <= (others => '0');
        rdy <= '0';
        res_mantissa <= (others => '0');

        -- Calculate the difference between the exponents
        result_exponent <= big_exponent - small_exponent;
        -- a_adder(EXPONENT_LENGTH-1 downto 0) <= big_exponent;
        -- b_adder(EXPONENT_LENGTH-1 downto 0) <= small_exponent;
        -- ce_adder <= '1';
        -- add_adder <= '0';

        next_s <= s3;
      when s3 =>
        res_mantissa <= (others => '0');
        result <= (others => '0');
        rdy <= '0';
        normalization <= '0';

        -- Shift the smallest mantissa x places to the right if the difference between the exponents is bigger than 0
        -- If the mantissa is shifted, it is denormalized, going from 1.xxxx to 0.xxxx
        if (result_exponent > 0 and result_exponent <= MANTISSA_LENGTH-1) then
          small_mantissa <= std_logic_vector(unsigned(small_mantissa) srl to_integer(unsigned(result_exponent)));
          small_mantissa(MANTISSA_LENGTH-to_integer(unsigned(result_exponent))) <= '1';
          normalization <= '1';
          next_s <= s4;
        elsif (result_exponent > (MANTISSA_LENGTH-1)) then
          -- a >> b
          result_exponent <= big_exponent;
          res_mantissa(MANTISSA_LENGTH-1 downto 0) <= big_mantissa;
          next_s <= s7;
        else
          next_s <= s4;
        end if;

      when s4 =>
        result <= (others => '0');
        rdy <= '0';

        -- Check for addition or subtraction
        if (operation(0) = '0' or sub_neg = '1') then
          if (normalization = '1') then
            res_mantissa <= ("01" & big_mantissa) + ("00" & small_mantissa);
          else
            res_mantissa <= ("01" & big_mantissa) + ("01" & small_mantissa);
          end if;

        elsif (operation(0) = '1') then
          if (normalization = '1') then
            res_mantissa <= ("01" & big_mantissa(MANTISSA_LENGTH-1 downto 0)) - ("00" & small_mantissa(MANTISSA_LENGTH-1 downto 0));
          else
            res_mantissa(MANTISSA_LENGTH-1 downto 0) <= a(MANTISSA_LENGTH-1 downto 0) - b(MANTISSA_LENGTH-1 downto 0);
          end if;
        else
        end if;

        -- if (normalization = '0') then
        --   res_mantissa(MANTISSA_LENGTH) <= '1';
        -- else
        --   normalization <= '0';
        -- end if;

        next_s <= s5;
      when s5 =>
        rdy <= '0';
        --res_mantissa <= (others => '0');

        -- Test if the resulting mantissa is normalized
        if (res_mantissa(MANTISSA_LENGTH+1 downto MANTISSA_LENGTH) > 1) then
          -- Needs to be normalized, mantissa bigger than 2
          normalization <= '0';

          -- Exponent has to be added
          result_exponent <= big_exponent + res_mantissa(MANTISSA_LENGTH+1 downto MANTISSA_LENGTH) - 1;
          -- Resulting mantissa is shifted 1 to the right, normalized
          res_mantissa(MANTISSA_LENGTH-1 downto 0) <= res_mantissa(MANTISSA_LENGTH downto 1);


          -- Resulting mantissa is shifted 1 to the right, normalized
          -- add_adder <= '1';
          -- ce_adder <= '1';
          -- a_adder(EXPONENT_LENGTH-1 downto 0) <= big_exponent;
          -- if (result_exponent(MANTISSA_LENGTH+1 downto MANTISSA_LENGTH) = "10") then
          --   b_adder(1 downto 0) <= "01";
          -- else
          --   b_adder(1 downto 0) <= "10";
          -- end if;
          next_s <= s7;
        elsif (res_mantissa(MANTISSA_LENGTH+1 downto MANTISSA_LENGTH) = 1) then
          -- Mantissa is normalized
          normalization <= '0';
          result_exponent <= big_exponent;
          --res_mantissa <= res_mantissa(MANTISSA_LENGTH-1 downto 0);
          next_s <= s7;

        else
          normalization <= '0';

          -- Find the position of the leading one
          -- Is this the best way to do it????
          for i in MANTISSA_LENGTH-1 downto 0 loop
            if (res_mantissa(i) = '1') then
              -- Leading one found
              if ((MANTISSA_LENGTH-i) > result_exponent) then
                res_mantissa <= (others => '0');
                result_exponent <= (others => '0');
              else
                res_mantissa <= std_logic_vector(unsigned(res_mantissa) sll (MANTISSA_LENGTH-i));
                result_exponent <= big_exponent - (MANTISSA_LENGTH-i);
              end if;

              -- a_adder(EXPONENT_LENGTH-1 downto 0) <= big_exponent;
              -- b_adder(4 downto 0) <= std_logic_vector(to_unsigned(MANTISSA_LENGTH-i, 5));
              -- add_adder <= '0';
              -- ce_adder <= '1';
              exit;
            else

            end if;
          end loop;
          next_s <= s7;
        end if;

      -- This state is only needed when the exponent has to be changed
      -- when s6 =>
      --   a_adder <= (others => '0');
      --   b_adder <= (others => '0');
      --   result <= (others => '0');
      --   ce_adder <= '0';
      --   add_adder <= '1';
      --   rdy <= '0';
      --
      --   big_exponent <= result_exponent(EXPONENT_LENGTH-1 downto 0);
      --   next_s <= s7;

      when s7 =>
        rdy <= '1';

        result(OPERAND_LENGTH-2 downto OPERAND_LENGTH-1-EXPONENT_LENGTH) <= result_exponent;
        result(MANTISSA_LENGTH-1 downto 0) <= res_mantissa(MANTISSA_LENGTH-1 downto 0);
        result(OPERAND_LENGTH-1) <= sign;

        next_s <= s0;

      when others =>
        result <= (others => '0');
        rdy <= '0';
        res_mantissa <= (others => '0');
        big_exponent <= (others => '0');
        small_exponent <= (others => '0');
        big_mantissa <= (others => '0');
        small_mantissa <= (others => '0');

        next_s <= s0;
    end case;
  end process;


end Behavioral;
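The align, add and normalize steps that the state machine above walks through can be modelled in software for the simplest case. The following is a minimal Python sketch under stated assumptions: both operands are positive and normalized, bits shifted out during alignment are truncated, and exponent overflow is not checked. The function name `add_positive` is hypothetical, and the sketch is a reference for the arithmetic, not a description of how the VHDL behaves cycle by cycle.

```python
def add_positive(a_bits, b_bits, exponent_length, mantissa_length):
    """Add two positive, normalized operands given as integer bit patterns."""
    frac_mask = (1 << mantissa_length) - 1
    exp_mask = (1 << exponent_length) - 1

    exp_a = (a_bits >> mantissa_length) & exp_mask
    exp_b = (b_bits >> mantissa_length) & exp_mask
    # Restore the hidden leading one of each mantissa.
    sig_a = (1 << mantissa_length) | (a_bits & frac_mask)
    sig_b = (1 << mantissa_length) | (b_bits & frac_mask)

    # Make (exp_a, sig_a) the operand with the larger exponent.
    if exp_b > exp_a:
        exp_a, exp_b = exp_b, exp_a
        sig_a, sig_b = sig_b, sig_a

    # Align: shift the smaller operand right by the exponent difference.
    sig_b >>= exp_a - exp_b

    total = sig_a + sig_b
    exp_res = exp_a
    # The sum lies in [1, 4), so at most one right shift renormalizes it.
    if total >> (mantissa_length + 1):
        total >>= 1
        exp_res += 1

    # Drop the hidden bit again and pack exponent and mantissa.
    return (exp_res << mantissa_length) | (total & frac_mask)
```

With single-precision parameters, adding the patterns for 2.0 (0x40000000) and 1.0 (0x3F800000) gives the pattern for 3.0 (0x40400000). Results from such a model could be compared against simulation output for the same operand pairs.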
B.1.3 Multiplier

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.std_logic_unsigned.ALL;
use IEEE.NUMERIC_STD.ALL;

entity configurable_multiplier is
  generic (
    OPERAND_LENGTH  : integer := 32;
    EXPONENT_LENGTH : integer := 8;
    MANTISSA_LENGTH : integer := 23
  );
  Port ( a        : in  STD_LOGIC_VECTOR (OPERAND_LENGTH-1 downto 0);
         b        : in  STD_LOGIC_VECTOR (OPERAND_LENGTH-1 downto 0);
         clk      : in  std_logic;
         --clk_a : in std_logic;
         reset    : in  std_logic;
         new_data : in  std_logic;
         --operation : in std_logic_vector(5 downto 0);
         result   : out STD_LOGIC_VECTOR (OPERAND_LENGTH-1 downto 0);
         rdy      : out std_logic);
end configurable_multiplier;

architecture Behavioral of configurable_multiplier is

  signal a_mantissa   : std_logic_vector(MANTISSA_LENGTH downto 0) := (others => '0');
  signal b_mantissa   : std_logic_vector(MANTISSA_LENGTH downto 0) := (others => '0');
  signal res_mantissa : std_logic_vector(MANTISSA_LENGTH*2+1 downto 0) := (others => '0');
  signal res_exponent : std_logic_vector(EXPONENT_LENGTH-1 downto 0) := (others => '0');
  signal res_sign     : std_logic := '0';

  constant ZERO          : std_logic_vector(OPERAND_LENGTH-1 downto 0) := (others => '0');
  constant INFINITY      : std_logic_vector(EXPONENT_LENGTH-1 downto 0) := (others => '1');
  constant EXPONENT_ZERO : std_logic_vector(EXPONENT_LENGTH-1 downto 0) := (others => '0');

  -- Signals for adder
  --signal a_adder : std_logic_vector(23 downto 0) := (others => '0');
  --signal b_adder : std_logic_vector(23 downto 0) := (others => '0');
  --signal add_adder : std_logic := '0';
  --signal ce_adder : std_logic := '0';
  --signal result_adder : std_logic_vector(24 downto 0) := (others => '0');


  type state_type is (s0, s1, s2, s3, s4);  --type of state machine.
  signal current_s, next_s : state_type;    --current and next state declaration.

  component adder
    port (
      a   : in  std_logic_vector(23 downto 0);
      b   : in  std_logic_vector(23 downto 0);
      clk : in  std_logic;
      add : in  std_logic;
      ce  : in  std_logic;
      s   : out std_logic_vector(24 downto 0)
    );
  end component;

begin


  -- local_fraction_adder_sub : adder
  --   port map(
  --     a   => a_adder,
  --     b   => b_adder,
  --     clk => clk_a,
  --     add => add_adder,
  --     ce  => ce_adder,
  --     s   => result_adder
  --   );

  process (clk, reset)
  begin
    if (reset = '1') then
      current_s <= s0;      --default state on reset.
    elsif (rising_edge(clk)) then
      --clk_node <= clk;
      current_s <= next_s;  --state change.
    --else
    --  clk_node <= clk;
    end if;
  end process;

  process (current_s, a, b, new_data)
  begin
    case current_s is
      when s0 =>
        a_mantissa <= (others => '0');
        b_mantissa <= (others => '0');
        res_mantissa <= (others => '0');
        res_exponent <= (others => '0');
        res_sign <= '0';
        result <= (others => '0');
        res_sign <= '0';
        rdy <= '0';

        if new_data = '1' then
          next_s <= s1;
          a_mantissa(MANTISSA_LENGTH-1 downto 0) <= a(MANTISSA_LENGTH-1 downto 0);
          a_mantissa(MANTISSA_LENGTH) <= '1';
          b_mantissa(MANTISSA_LENGTH-1 downto 0) <= b(MANTISSA_LENGTH-1 downto 0);
          b_mantissa(MANTISSA_LENGTH) <= '1';

          --res_mantissa <= a(MANTISSA_LENGTH-1 downto 0) * b(MANTISSA_LENGTH-1 downto 0);
          next_s <= s1;
        else next_s <= s0;
        end if;
      when s1 =>
        res_mantissa <= (others => '0');
        res_exponent <= (others => '0');
        res_sign <= '0';
        result <= (others => '0');
        res_sign <= '0';
        rdy <= '0';

        res_mantissa <= a_mantissa * b_mantissa;
        res_exponent <= a(OPERAND_LENGTH-2 downto OPERAND_LENGTH-1-EXPONENT_LENGTH)
                        + b(OPERAND_LENGTH-2 downto OPERAND_LENGTH-1-EXPONENT_LENGTH)
                        - (2**(EXPONENT_LENGTH-1)-1);
        next_s <= s2;
        -- Check if overflow => INFINITY
        -- if (a(OPERAND_LENGTH-1) = '1' and b(OPERAND_LENGTH-1) = '1') then
        --   --res_exponent <= a(OPERAND_LENGTH-2 downto OPERAND_LENGTH-1-EXPONENT_LENGTH) + b(OPERAND_LENGTH-2 downto OPERAND_LENGTH-1-EXPONENT_LENGTH) - (2**(EXPONENT_LENGTH-1)-1);
        --   result(OPERAND_LENGTH-2 downto OPERAND_LENGTH-1-EXPONENT_LENGTH) <= INFINITY;
        --   rdy <= '1';
        --   next_s <= s0;
        -- -- Check if result is 0
        -- elsif (a(OPERAND_LENGTH-2 downto OPERAND_LENGTH-1-EXPONENT_LENGTH) = EXPONENT_ZERO or b(OPERAND_LENGTH-2 downto OPERAND_LENGTH-1-EXPONENT_LENGTH) = EXPONENT_ZERO) then
        --   --result(OPERAND_LENGTH-2 downto OPERAND_LENGTH-1-EXPONENT_LENGTH) <= INFINITY;
        --   next_s <= s0;
        --   result <= ZERO;
        --   rdy <= '1';
        -- else
        --   result <= ZERO;
        --   next_s <= s0;
        --   rdy <= '1';
        -- end if;
      when s2 =>
        a_mantissa <= (others => '0');
        b_mantissa <= (others => '0');
        --res_exponent <= (others => '0');
        res_sign <= '0';
        result <= (others => '0');
        rdy <= '0';
        res_sign <= a(OPERAND_LENGTH-1) xor b(OPERAND_LENGTH-1);
        if (res_mantissa(MANTISSA_LENGTH*2+1 downto MANTISSA_LENGTH*2) = "01") then
          next_s <= s3;
        elsif (res_mantissa(MANTISSA_LENGTH*2+1 downto MANTISSA_LENGTH*2) = "10") then
          res_mantissa <= std_logic_vector(unsigned(res_mantissa) srl 1);
          res_exponent <= res_exponent + 1;
          next_s <= s3;
        elsif (res_mantissa(MANTISSA_LENGTH*2+1 downto MANTISSA_LENGTH*2) = "11") then
          res_mantissa <= std_logic_vector(unsigned(res_mantissa) srl 1);
          res_exponent <= res_exponent + 2;
          next_s <= s3;
        else
          next_s <= s0;
        end if;
      when s3 =>
        a_mantissa <= (others => '0');
        b_mantissa <= (others => '0');
        --res_mantissa <= res_mantissa;
        res_exponent <= res_exponent;
        res_sign <= res_sign;
        result <= (others => '0');
        --rdy <= '1';

        -- Check for rounding
        if (res_mantissa(MANTISSA_LENGTH*2-MANTISSA_LENGTH-1) = '1') then
          res_mantissa(MANTISSA_LENGTH*2-1 downto MANTISSA_LENGTH*2-MANTISSA_LENGTH)
            <= res_mantissa(MANTISSA_LENGTH*2-1 downto MANTISSA_LENGTH*2-MANTISSA_LENGTH) + '1';
        else
        end if;

        next_s <= s4;


      when s4 =>
        result(OPERAND_LENGTH-1) <= res_sign;
        result(OPERAND_LENGTH-2 downto OPERAND_LENGTH-1-EXPONENT_LENGTH)
          <= res_exponent(EXPONENT_LENGTH-1 downto 0);
        result(MANTISSA_LENGTH-1 downto 0)
          <= res_mantissa(MANTISSA_LENGTH*2-1 downto MANTISSA_LENGTH*2-MANTISSA_LENGTH);
        rdy <= '1';

        next_s <= s0;
      when others =>

    end case;
  end process;

end Behavioral;
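The data path of the multiplier listing (restore the hidden one, multiply the mantissas, add the exponents and subtract the bias, normalize, round on the first discarded bit) can be mirrored in a small software reference model for cross-checking simulation output. The following is a minimal sketch in Python, not a bit-exact model of the VHDL: the names `fp_mul` and `f2b` are hypothetical, only normal operands are handled, and the sketch makes two assumptions of its own, namely that the exponent is incremented by one whenever the mantissa product is at least 2.0, and that a carry out of the rounding step renormalizes the result.

```python
import struct


def f2b(x):
    """Bit pattern of x as an IEEE 754 single-precision float (helper for examples)."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def fp_mul(a_bits, b_bits, exp_len=8, man_len=23):
    """Multiply two normal floating-point bit patterns.

    Assumption: zero, infinity, NaN and subnormal inputs are not handled,
    and exponent overflow/underflow simply wraps.
    """
    bias = (1 << (exp_len - 1)) - 1
    man_mask = (1 << man_len) - 1
    exp_mask = (1 << exp_len) - 1

    def unpack(x):
        sign = (x >> (exp_len + man_len)) & 1
        exp = (x >> man_len) & exp_mask
        man = (x & man_mask) | (1 << man_len)  # restore the hidden leading one
        return sign, exp, man

    sa, ea, ma = unpack(a_bits)
    sb, eb, mb = unpack(b_bits)

    sign = sa ^ sb
    exp = ea + eb - bias
    prod = ma * mb  # 2*man_len + 2 bits wide, value in [1.0, 4.0)

    # Normalize: if the product is >= 2.0, shift right once and bump the exponent.
    if prod >> (2 * man_len + 1):
        prod >>= 1
        exp += 1

    # Keep man_len fraction bits and round on the first discarded bit.
    frac = (prod >> man_len) & man_mask
    guard = (prod >> (man_len - 1)) & 1
    frac += guard
    if frac > man_mask:  # rounding carried out of the fraction: mantissa is 2.0
        frac = 0
        exp += 1

    return (sign << (exp_len + man_len)) | ((exp & exp_mask) << man_len) | frac


if __name__ == "__main__":
    print(hex(fp_mul(f2b(1.5), f2b(2.0))))  # bit pattern of 3.0 in single precision
```

Because the field widths are parameters, the same function can be called with, for example, `exp_len=5, man_len=10` to model a narrower configuration of the unit.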
B.1.4 Top-Level Design for Floating-Point Library

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
library IEEE_PROPOSED;
use IEEE_PROPOSED.float_pkg.all;
use IEEE_PROPOSED.fixed_float_types.all;

entity system_ieee_float is
  port (
    clk          : in  std_logic;
    reset        : in  std_logic;
    operation_nd : in  std_logic;
    operation    : in  std_logic_vector(5 downto 0);
    a            : in  std_logic_vector(float_exponent_width+float_fraction_width downto 0);
    b            : in  std_logic_vector(float_exponent_width+float_fraction_width downto 0);
    sum          : out std_logic_vector(float_exponent_width+float_fraction_width downto 0);
    ready        : out std_logic
  );
end system_ieee_float;

architecture Behavioral of system_ieee_float is

  signal afp, bfp, sumfp : float(float_exponent_width downto -float_fraction_width);

  type state_type is (s0, s1, s2);        --type of state machine.
  signal current_s, next_s : state_type;  --current and next state declaration.

begin
  afp <= to_float(a, afp'high, -afp'low);
  bfp <= to_float(b, bfp'high, -bfp'low);

  process (clk, reset)
  begin
    if (reset = '1') then
      current_s <= s0;      --default state on reset.
    elsif (rising_edge(clk)) then
      current_s <= next_s;  --state change.
    else
    end if;
  end process;

  process (current_s, operation_nd)
  begin
    case current_s is
      when s0 =>
        ready <= '0';
        if (operation_nd = '1') then
          next_s <= s1;
        else
          next_s <= s0;
        end if;
      when s1 =>
        ready <= '1';
      when s2 =>
    end case;
  end process;

  process (clk, reset, operation_nd, operation, afp, bfp, sumfp)
  begin
    if (reset = '1') then
      sumfp <= (others => '0');
    elsif (rising_edge(clk)) then
      if (operation_nd = '1') then
        if (operation = "000000") then
          sumfp <= afp + bfp;
        elsif (operation = "000001") then
          sumfp <= afp - bfp;
        elsif (operation = "000010") then
          sumfp <= afp * bfp;
        elsif (operation = "000011") then
          sumfp <= afp / bfp;
        else
        end if;
      end if;
    else
    end if;
    sum <= to_slv(sumfp);
  end process;


end Behavioral;
B.1.5 Top-Level Design for Fixed-Point Library
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
library IEEE_PROPOSED;
use IEEE_PROPOSED.fixed_float_types.all;
use IEEE_PROPOSED.fixed_pkg.all;


entity system_ieee_fixed is
  generic (
    BIT_WIDTH      : integer := 32;
    INTEGER_WIDTH  : integer := 10;
    FRACTION_WIDTH : integer := 22
  );
  port (
    clk           : in  std_logic;
    reset         : in  std_logic;
    a             : in  std_logic_vector(BIT_WIDTH-1 downto 0);
    b             : in  std_logic_vector(BIT_WIDTH-1 downto 0);
    operation     : in  std_logic_vector(5 downto 0);
    operation_nd  : in  std_logic;
    sum_add_sub_o : out std_logic_vector(BIT_WIDTH downto 0);
    sum_mult_o    : out std_logic_vector(2*INTEGER_WIDTH-1+2*FRACTION_WIDTH downto 0);
    ready         : out std_logic
  );
end system_ieee_fixed;

architecture Behavioral of system_ieee_fixed is

  signal afp, bfp      : sfixed(INTEGER_WIDTH-1 downto -FRACTION_WIDTH) := (others => '0');
  signal sum_adder_sub : sfixed(afp'left + 1 downto afp'right) := (others => '0');
  signal sum_mult      : sfixed(afp'left+bfp'left+1 downto afp'right+bfp'right) := (others => '0');
  --signal sum_div : ufixed(afp'left-bfp'right+1 downto afp'right-bfp'left) := (others => '0');
  signal sum_div       : sfixed(sfixed_high(afp, '/', bfp) downto sfixed_low(afp, '/', bfp));

  type state_type is (s0, s1, s2);        --type of state machine.
  signal current_s, next_s : state_type;  --current and next state declaration.

begin

  afp <= to_sfixed(a, afp'left, afp'right);
  bfp <= to_sfixed(b, bfp'left, bfp'right);

  process (clk, reset)
  begin
    if (reset = '1') then
      current_s <= s0;      --default state on reset.
    elsif (rising_edge(clk)) then
      current_s <= next_s;  --state change.
    else
    end if;
  end process;

  process (current_s, operation_nd)
  begin
    case current_s is
      when s0 =>
        ready <= '0';
        if (operation_nd = '1') then
          next_s <= s1;
        else
          next_s <= s0;
        end if;
      when s1 =>
        ready <= '1';
      when s2 =>
    end case;
  end process;

  process (clk, reset)
  begin
    if (reset = '1') then
      sum_add_sub_o <= (others => '0');
      sum_mult_o <= (others => '0');
    elsif (rising_edge(clk)) then
      if (operation_nd = '1') then
        if (operation = "000000") then
          sum_adder_sub <= afp + bfp;
        elsif (operation = "000001") then
          sum_adder_sub <= afp - bfp;
        elsif (operation = "000010") then
          sum_mult <= afp * bfp;
        elsif (operation = "000011") then
          -- sum_div <= divide(l => afp,
          --                   r => bfp,
          --                   round_style => fixed_round,
          --                   guard_bits => 3);
          --sum_div <= afp / bfp;
        else
        end if;
      else
      end if;
    else
    end if;
    sum_add_sub_o <= to_slv(sum_adder_sub);
    sum_mult_o <= to_slv(sum_mult);
  end process;


end Behavioral;
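The signal declarations in the fixed-point listing size each result from the index ranges of the operands: the sum gains one integer bit, and the product range is the sum of the operand ranges plus one. The short Python sketch below reproduces that arithmetic for the default generics (INTEGER_WIDTH = 10, FRACTION_WIDTH = 22) so the port widths can be checked by hand. The helper names `sfixed_result_range` and `width` are hypothetical; the division rule is taken from the commented-out declaration in the listing and is an assumption about how the fixed-point package sizes a signed quotient.

```python
def sfixed_result_range(op, a, b):
    """Return (left, right) index bounds of a signed fixed-point result.

    a and b are (left, right) index pairs, e.g. (9, -22) for sfixed(9 downto -22).
    """
    (a_left, a_right), (b_left, b_right) = a, b
    if op in ("+", "-"):
        # One extra integer bit to hold the carry.
        return max(a_left, b_left) + 1, min(a_right, b_right)
    if op == "*":
        return a_left + b_left + 1, a_right + b_right
    if op == "/":
        # Assumed rule, matching the commented-out declaration in the listing.
        return a_left - b_right + 1, a_right - b_left
    raise ValueError("unsupported operation: " + op)


def width(index_range):
    """Number of bits covered by a (left, right) index pair."""
    left, right = index_range
    return left - right + 1


if __name__ == "__main__":
    operand = (10 - 1, -22)  # sfixed(INTEGER_WIDTH-1 downto -FRACTION_WIDTH)
    for op in ("+", "*", "/"):
        rng = sfixed_result_range(op, operand, operand)
        print(op, rng, width(rng))
```

For the default generics this gives a 33-bit sum and a 64-bit product, which agrees with the widths of the `sum_add_sub_o` and `sum_mult_o` ports declared in the entity.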
B.1.6 Configurable Adder and Subtractor
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;
use IEEE.std_logic_unsigned.ALL;

entity configurable_adder_sub is
  generic (
    OPERAND_LENGTH  : integer := 32;
    EXPONENT_LENGTH : integer := 8;
    MANTISSA_LENGTH : integer := 23
  );
  Port ( a         : in  STD_LOGIC_VECTOR (OPERAND_LENGTH-1 downto 0);
         b         : in  STD_LOGIC_VECTOR (OPERAND_LENGTH-1 downto 0);
         clk       : in  std_logic;
         reset     : in  std_logic;
         new_data  : in  std_logic;
         operation : in  std_logic_vector(2 downto 0);
         result    : out STD_LOGIC_VECTOR (OPERAND_LENGTH-1 downto 0);
         rdy       : out std_logic);
end configurable_adder_sub;

architecture Behavioral of configurable_adder_sub is

  type state_type is (s0, s1, s2, s3, s4, s5, s6, s7, s8);  --type of state machine.
  signal current_s, next_s : state_type;  --current and next state declaration.

  -- Signals for adder
  signal res_exponent  : std_logic_vector(EXPONENT_LENGTH-1 downto 0) := (others => '0');
  signal res_mantissa  : std_logic_vector(MANTISSA_LENGTH+1 downto 0) := (others => '0');
  signal res_sign      : std_logic := '0';
  signal diff_exponent : std_logic_vector(EXPONENT_LENGTH-1 downto 0) := (others => '0');

  signal exponent_1 : std_logic_vector(EXPONENT_LENGTH-1 downto 0) := (others => '0');
  signal exponent_2 : std_logic_vector(EXPONENT_LENGTH-1 downto 0) := (others => '0');
  signal mantissa_1 : std_logic_vector(MANTISSA_LENGTH-1 downto 0) := (others => '0');
  signal mantissa_2 : std_logic_vector(MANTISSA_LENGTH-1 downto 0) := (others => '0');

  signal local_op        : std_logic_vector(2 downto 0) := (others => '0');
  signal normalization   : std_logic := '0';
  signal de_normalized_1 : std_logic := '0';
  signal de_normalized_2 : std_logic := '0';



begin

  process (clk, reset)
  begin
    if (reset = '1') then
      current_s <= s0;      --default state on reset.
    elsif (rising_edge(clk)) then
      current_s <= next_s;  --state change.
    end if;
  end process;

  process (current_s, a, b, new_data, operation)
  begin
    case current_s is
      when s0 =>
        res_exponent <= (others => '0');
        res_mantissa <= (others => '0');
        res_sign <= '0';
        diff_exponent <= (others => '0');
        exponent_1 <= (others => '0');
        exponent_2 <= (others => '0');
        mantissa_1 <= (others => '0');
        mantissa_2 <= (others => '0');
        rdy <= '0';
        de_normalized_1 <= '0';
        de_normalized_2 <= '0';


        if new_data = '1' then
          next_s <= s1;
        else next_s <= s0;
        end if;
      when s1 =>

        if ((a(OPERAND_LENGTH-1) = '0' and b(OPERAND_LENGTH-1) = '0' and operation = "000")
         or (a(OPERAND_LENGTH-1) = '0' and b(OPERAND_LENGTH-1) = '1' and operation = "001")) then

          exponent_1 <= a(OPERAND_LENGTH-2 downto OPERAND_LENGTH-1-EXPONENT_LENGTH);
          exponent_2 <= b(OPERAND_LENGTH-2 downto OPERAND_LENGTH-1-EXPONENT_LENGTH);
          mantissa_1 <= a(MANTISSA_LENGTH-1 downto 0);
          mantissa_2 <= b(MANTISSA_LENGTH-1 downto 0);
          local_op <= "000";
          res_sign <= '0';
          next_s <= s3;

        elsif ((a(OPERAND_LENGTH-1) = '0' and b(OPERAND_LENGTH-1) = '0' and operation = "001")
            or (a(OPERAND_LENGTH-1) = '0' and b(OPERAND_LENGTH-1) = '1' and operation = "000")) then

          exponent_1 <= a(OPERAND_LENGTH-2 downto OPERAND_LENGTH-1-EXPONENT_LENGTH);
          exponent_2 <= b(OPERAND_LENGTH-2 downto OPERAND_LENGTH-1-EXPONENT_LENGTH);
          mantissa_1 <= a(MANTISSA_LENGTH-1 downto 0);
          mantissa_2 <= b(MANTISSA_LENGTH-1 downto 0);
          local_op <= "001";
          res_sign <= '0';
          next_s <= s2;

        elsif ((a(OPERAND_LENGTH-1) = '1' and b(OPERAND_LENGTH-1) = '0' and operation = "000")
            or (a(OPERAND_LENGTH-1) = '1' and b(OPERAND_LENGTH-1) = '1' and operation = "001")) then

          exponent_1 <= b(OPERAND_LENGTH-2 downto OPERAND_LENGTH-1-EXPONENT_LENGTH);
          exponent_2 <= a(OPERAND_LENGTH-2 downto OPERAND_LENGTH-1-EXPONENT_LENGTH);
          mantissa_1 <= b(MANTISSA_LENGTH-1 downto 0);
          mantissa_2 <= a(MANTISSA_LENGTH-1 downto 0);
          local_op <= "001";
          res_sign <= '1';
          next_s <= s2;

        else

          exponent_1 <= a(OPERAND_LENGTH-2 downto OPERAND_LENGTH-1-EXPONENT_LENGTH);
          exponent_2 <= b(OPERAND_LENGTH-2 downto OPERAND_LENGTH-1-EXPONENT_LENGTH);
          mantissa_1 <= a(MANTISSA_LENGTH-1 downto 0);
          mantissa_2 <= b(MANTISSA_LENGTH-1 downto 0);
          local_op <= "000";
          res_sign <= '1';
          next_s <= s3;

        end if;

      -- Diff with sign bit
      when s2 =>
        if ((exponent_1 & mantissa_1) > (exponent_2 & mantissa_2)) then
          res_sign <= '0';
          diff_exponent <= exponent_1 - exponent_2;
          next_s <= s4;
        else
          res_sign <= '1';
          diff_exponent <= exponent_2 - exponent_1;
          next_s <= s5;
        end if;
      -- Diff without sign bit
      when s3 =>
        if ((exponent_1 & mantissa_1) > (exponent_2 & mantissa_2)) then
          diff_exponent <= exponent_1 - exponent_2;
          next_s <= s4;
        else
          diff_exponent <= exponent_2 - exponent_1;
          next_s <= s5;
        end if;


      when s4 =>
        -- Shift the smallest mantissa x places to the right if the
        -- difference between the exponents is bigger than 0.
        -- If the mantissa is shifted, it is denormalized, going from
        -- 1.xxxx to 0.xxxx
        if (diff_exponent > 0 and diff_exponent <= MANTISSA_LENGTH-1) then
          mantissa_2 <= std_logic_vector(unsigned(mantissa_2) srl to_integer(unsigned(diff_exponent)));
          mantissa_2(MANTISSA_LENGTH-to_integer(unsigned(diff_exponent))) <= '1';
          de_normalized_2 <= '1';
          res_exponent <= exponent_1;
          next_s <= s6;
        elsif (diff_exponent > (MANTISSA_LENGTH-1)) then
          -- a >> b
          res_exponent <= exponent_1;
          res_mantissa(MANTISSA_LENGTH-1 downto 0) <= mantissa_1;
          next_s <= s8;
        else
          next_s <= s6;
        end if;



      when s5 =>
        -- Shift the smallest mantissa x places to the right if the
        -- difference between the exponents is bigger than 0.
        -- If the mantissa is shifted, it is denormalized, going from
        -- 1.xxxx to 0.xxxx
        if (diff_exponent > 0 and diff_exponent <= MANTISSA_LENGTH-1) then
          mantissa_1 <= std_logic_vector(unsigned(mantissa_1) srl to_integer(unsigned(diff_exponent)));
          mantissa_1(MANTISSA_LENGTH-to_integer(unsigned(diff_exponent))) <= '1';
          de_normalized_1 <= '1';
          res_exponent <= exponent_2;
          next_s <= s6;
        elsif (diff_exponent > (MANTISSA_LENGTH-1)) then
          -- a >> b
          res_exponent <= exponent_2;
          res_mantissa(MANTISSA_LENGTH-1 downto 0) <= mantissa_2;
          next_s <= s8;
        else
          next_s <= s6;
          res_exponent <= exponent_2;
        end if;


      when s6 =>
        -- Check for addition or subtraction
        if (local_op(0) = '0') then
          if (de_normalized_1 = '1') then
            res_mantissa <= ("00" & mantissa_1) + ("01" & mantissa_2);
          elsif (de_normalized_2 = '1') then
            res_mantissa <= ("01" & mantissa_1) + ("00" & mantissa_2);
          else
            res_mantissa <= ("01" & mantissa_1) + ("01" & mantissa_2);
          end if;

        elsif (local_op(0) = '1') then
          if (de_normalized_1 = '1') then
            res_mantissa <= ("01" & mantissa_2) - ("00" & mantissa_1);  --("00" & mantissa_1) - ("01" & mantissa_2);
          elsif (de_normalized_2 = '1') then
            res_mantissa <= ("01" & mantissa_1) - ("00" & mantissa_2);
          else
            res_mantissa(MANTISSA_LENGTH-1 downto 0)
              <= mantissa_1(MANTISSA_LENGTH-1 downto 0) - mantissa_2(MANTISSA_LENGTH-1 downto 0);
          end if;
        else
        end if;

        next_s <= s7;

      when s7 =>

        -- Test if the resulting mantissa is normalized
        if (res_mantissa(MANTISSA_LENGTH+1 downto MANTISSA_LENGTH) > 1) then
          -- Needs to be normalized, mantissa bigger than 2
          normalization <= '0';

          -- Exponent has to be added
          res_exponent <= res_exponent + res_mantissa(MANTISSA_LENGTH+1 downto MANTISSA_LENGTH) - 1;
          -- Resulting mantissa is shifted 1 to the right, normalized
          res_mantissa(MANTISSA_LENGTH-1 downto 0) <= res_mantissa(MANTISSA_LENGTH downto 1);

        elsif (res_mantissa(MANTISSA_LENGTH+1 downto MANTISSA_LENGTH) = 1) then
          -- Mantissa is normalized
          normalization <= '0';
          --res_mantissa <= res_mantissa(MANTISSA_LENGTH-1 downto 0);

        else
          normalization <= '0';

          -- Find the position of the leading one
          -- Is this the best way to do it????
          for i in MANTISSA_LENGTH-1 downto 0 loop
            if (res_mantissa(i) = '1') then
              -- Leading one found
              if ((MANTISSA_LENGTH-i) > res_exponent) then
                res_mantissa <= (others => '0');
              else
                res_mantissa <= std_logic_vector(unsigned(res_mantissa) sll (MANTISSA_LENGTH-i));
                res_exponent <= res_exponent - (MANTISSA_LENGTH-i);
              end if;

              -- a_adder(EXPONENT_LENGTH-1 downto 0) <= big_exponent;
              -- b_adder(4 downto 0) <= std_logic_vector(to_unsigned(MANTISSA_LENGTH-i, 5));
              -- add_adder <= '0';
              -- ce_adder <= '1';
              exit;
            else

            end if;
          end loop;

        end if;
        next_s <= s8;

      when s8 =>
        result <= res_sign & res_exponent & res_mantissa(MANTISSA_LENGTH-1 downto 0);
        rdy <= '1';
        next_s <= s0;
      when others =>

        next_s <= s0;
    end case;
  end process;


end Behavioral;
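The adder/subtractor listing follows the usual sequence for floating-point addition: order the operands by magnitude, shift the smaller mantissa right by the exponent difference, add or subtract the mantissas, and renormalize by searching for the leading one. The Python sketch below is a software reference for that sequence, not a bit-exact model of the state machine. The names `fp_add` and `f2b` are hypothetical, shifted-out bits are truncated, and zero, infinity, NaN, subnormal operands and exponent underflow are assumed not to occur.

```python
import struct


def f2b(x):
    """Bit pattern of x as an IEEE 754 single-precision float (helper for examples)."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def fp_add(a_bits, b_bits, exp_len=8, man_len=23):
    """Add two normal floating-point bit patterns, truncating shifted-out bits."""
    man_mask = (1 << man_len) - 1
    exp_mask = (1 << exp_len) - 1
    sign_shift = exp_len + man_len

    def unpack(x):
        sign = (x >> sign_shift) & 1
        exp = (x >> man_len) & exp_mask
        man = (x & man_mask) | (1 << man_len)  # restore the hidden leading one
        return sign, exp, man

    sa, ea, ma = unpack(a_bits)
    sb, eb, mb = unpack(b_bits)

    # Order the operands so that (sa, ea, ma) has the larger magnitude.
    if (ea, ma) < (eb, mb):
        sa, ea, ma, sb, eb, mb = sb, eb, mb, sa, ea, ma

    mb >>= ea - eb  # align the smaller mantissa to the larger exponent
    exp = ea

    if sa == sb:
        man = ma + mb
        if man >> (man_len + 1):  # sum reached 2.0 or more: shift right once
            man >>= 1
            exp += 1
    else:
        man = ma - mb
        if man == 0:
            return 0  # exact cancellation gives +0
        while not (man >> man_len):  # shift left until the leading one is restored
            man <<= 1
            exp -= 1

    return (sa << sign_shift) | ((exp & exp_mask) << man_len) | (man & man_mask)


if __name__ == "__main__":
    print(hex(fp_add(f2b(1.5), f2b(2.25))))  # bit pattern of 3.75 in single precision
```

Subtraction can be modelled by flipping the sign bit of the second operand before the call, which corresponds to the sign and operation decoding done in state s1 of the listing.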
B.1.7 Configurable Multiplier
The listing for the configurable multiplier (entity configurable_multiplier) is identical to the multiplier listing in Section B.1.3.
B.2 Floating-Point Unit Testbench
`timescale 1ns / 1ps

module system_tb;

  // Inputs
  reg [19:0] a;
  reg [19:0] b;
  reg [2:0] operation;
  reg operation_nd;
  reg clk;
  reg reset;

  // Outputs
  wire [19:0] result_ip;
  wire rdy;
  wire underflow;
  wire overflow;
  wire invalid_op;
  wire divide_by_zero;

  integer data_file;        // file handler
  integer data_file_result; // file handler
  integer scan_file;        // return value of $fscanf
  reg [2:0] state;
  integer counter;

  `define NULL 0

  parameter zero=0, one=1, two=2, three=3, four=4;

  // Instantiate the Unit Under Test (UUT)
  system uut (
    .clk(clk),
    .reset(reset),
    .operation_nd(operation_nd),
    .operation(operation),
    .a(a),
    .b(b),
    .result_ip(result_ip),
    .rdy(rdy),
    .underflow(underflow),
    .overflow(overflow),
    .invalid_op(invalid_op),
    .divide_by_zero(divide_by_zero)
  );

  initial begin
    // Initialize Inputs
    clk = 0;
    reset = 1;
    a = 0;
    b = 0;
    operation = 0;
    operation_nd = 0;
    counter = 0;

    // Open the operand file and the result file
    data_file = $fopen("fpu_custom.dat", "r");
    data_file_result = $fopen("../MATLAB/fpu_testbench_results_ip64.txt", "w");
    if (data_file == `NULL) begin
      $display("data_file handle is NULL");
      $finish;
    end

    if (data_file_result == `NULL) begin
      $display("data_file_result handle is NULL");
      $finish;
    end
    #200;

    reset = 0;

    // Run for 5000 clock cycles, then close the files and stop
    repeat (5000) @(posedge clk);
    $fclose(data_file);
    $fclose(data_file_result);
    reset = 1;
    #100
    $finish;

  end

  // Log each result together with the operation code
  always @(posedge rdy) begin
    #20
    $fwrite(data_file_result, "%h %d\n", result_ip, operation);
  end

  // Hold the testbench in reset once 100 operations have been issued
  always @(posedge (state == three)) begin
    if (counter == 100)
      reset = 1;
    else
      counter = counter + 1;
  end

  // Drive the stimulus according to the current state
  always @(state) begin
    case (state)
      zero:
        operation_nd = 0;
      one:
        scan_file = $fscanf(data_file, "%b %b\n", a, b);
      two:
        if (operation == 3)
          operation = 0;
        else
          operation = operation + 1;
      three:
        operation_nd = 1;
      four:
        operation_nd = 0;
    endcase
  end

  // State register and next-state logic
  always @(posedge clk or posedge reset) begin
    if (reset == 1)
      state = zero;
    else
      case (state)
        zero:
          if (reset == 1)
            state = zero;
          else if (rdy == 0)
            state = one;
          else
            state = zero;
        one:
          state = two;
        two:
          state = three;
        three:
          state = four;
        four:
          // state = zero;
          if (rdy == 0)
            state = four;
          else
            state = zero;
      endcase
  end

  // 100 MHz clock (10 ns period)
  always #5 clk = !clk;

endmodule
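The testbench reads one operand pair per line with the format "%b %b" and writes one result per line with the format "%h %d". The two Python helpers below are a minimal sketch of how such files could be produced and read outside the simulator. The names `format_stimulus` and `parse_result_line` are hypothetical, and the parser assumes that the logged result contains only hexadecimal digits (no x or z values).

```python
def format_stimulus(a, b, width=20):
    """Return one operand-file line: two zero-padded binary words and a newline."""
    return f"{a:0{width}b} {b:0{width}b}\n"


def parse_result_line(line):
    """Split one result-file line into (result, operation) as integers."""
    result_hex, operation = line.split()
    return int(result_hex, 16), int(operation)


# Example: one stimulus line for a 20-bit format, and one parsed result line.
print(format_stimulus(5, 3), end="")
print(parse_result_line("3c000 1\n"))
```

The default width of 20 matches the operand width declared in the testbench; other widths would need the same change in both places.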
Appendix C
Diagrams
Figure C.1: Diagram of Floating-Point Implementation.
Appendix D
Calculations
D.1 Calculated Mantissa Bit-Width for Floating-Point Numbers
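$m \geq E_{U_i} - \lceil \log_2(|\Delta U_i|) \rceil + 1$
177.mesa:
$m \geq \lceil \log_2(9.8658) \rceil - \lceil \log_2(10^{-4}) \rceil + 1 = 4 - (-13) + 1 = 18$
$m \geq \lceil \log_2(3.21 \times 10^{-4}) \rceil - \lceil \log_2(10^{-6}) \rceil + 1 = (-11) - (-19) + 1 = 9$
179.art:
$m \geq \lceil \log_2(99.2831228) \rceil - \lceil \log_2(10^{-7}) \rceil + 1 = 7 - (-23) + 1 = 31$
$m \geq \lceil \log_2(28.3296161) \rceil - \lceil \log_2(10^{-7}) \rceil + 1 = 5 - (-23) + 1 = 29$
183.equake:
$m \geq \lceil \log_2(32.6156) \rceil - \lceil \log_2(10^{-4}) \rceil + 1 = 6 - (-13) + 1 = 20$
$m \geq \lceil \log_2(9.04 \times 10^{-35}) \rceil - \lceil \log_2(10^{-37}) \rceil + 1 = (-113) - (-122) + 1 = 10$
188.ammp:
$m \geq \lceil \log_2(20421.656321) \rceil - \lceil \log_2(10^{-6}) \rceil + 1 = 15 - (-19) + 1 = 35$
$m \geq \lceil \log_2(0.2290) \rceil - \lceil \log_2(10^{-6}) \rceil + 1 = (-2) - (-19) + 1 = 18$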
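The mantissa widths above can be reproduced with a few lines of Python. This is a minimal sketch: the function name `mantissa_bits` and its arguments are hypothetical, and the exponent term is taken as the ceiling of the base-2 logarithm of the value, as in the worked lines.

```python
import math


def mantissa_bits(value, precision):
    """Minimum mantissa width for a value and a required absolute precision."""
    exponent = math.ceil(math.log2(abs(value)))            # exponent term
    precision_exp = math.ceil(math.log2(abs(precision)))   # precision term
    return exponent - precision_exp + 1


# Example: first 177.mesa case, value 9.8658 with precision 1E-4.
print(mantissa_bits(9.8658, 1e-4))
```

Calling the function with the other value and precision pairs gives the remaining entries.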
D.2 Calculated Fraction Bit-Width for Fixed-Point Numbers
$l \geq \left| \lceil \log_2(|\Delta U_i|) \rceil \right| + 1$
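177.mesa:
$l \geq \left| \lceil \log_2(10^{-6}) \rceil \right| + 1 = 19 + 1 = 20$
179.art:
$l \geq \left| \lceil \log_2(10^{-7}) \rceil \right| + 1 = 23 + 1 = 24$
183.equake:
$l \geq \left| \lceil \log_2(10^{-37}) \rceil \right| + 1 = 122 + 1 = 123$
188.ammp:
$l \geq \left| \lceil \log_2(10^{-6}) \rceil \right| + 1 = 19 + 1 = 20$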
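The fraction widths can be checked the same way. The sketch below uses a hypothetical function name, `fraction_bits`, and follows the worked lines in taking the absolute value of the rounded-up logarithm of the required precision.

```python
import math


def fraction_bits(precision):
    """Minimum fixed-point fraction width for a required absolute precision."""
    return abs(math.ceil(math.log2(abs(precision)))) + 1


# Example: 183.equake, precision 1E-37.
print(fraction_bits(1e-37))
```

The 183.equake case shows why fixed-point is impractical for that benchmark: the fraction alone would need more bits than a double-precision word provides.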
Appendix E
File Hierarchy
master-thesis
    fpu_core
        fpu_core.xise
    fpu_double
    fpu100
    MATLAB
    Presentation
    Report
        images
        Sources
        MasterThesis.pdf
    Result_tests
Attached to this thesis is a zip file containing the file hierarchy shown
above. The fpu_core folder contains the complete HDL design. The file
fpu_core.xise can be opened in Xilinx ISE Design Suite 14.7. The folders
fpu_double and fpu100 contain the double- and single-precision
floating-point units from OpenCores. The MATLAB folder contains all MATLAB
scripts and functions. The Presentation folder contains two presentations
that were used to present this thesis to younger students. The Report
folder contains all images, some sources and the LaTeX files used to
generate this thesis. The Result_tests folder contains all reports
generated by XPower for the different floating-point units.
