Posits: An Alternative to Floating Point Calculations by Wagner, Matt
Rochester Institute of Technology 
RIT Scholar Works 
Theses 
5-2020 
Posits: An Alternative to Floating Point Calculations 
Matt Wagner 
mw6500@rit.edu 
Recommended Citation 
Wagner, Matt, "Posits: An Alternative to Floating Point Calculations" (2020). Thesis. Rochester Institute of 
Technology. Accessed from 
This Master's Project is brought to you for free and open access by RIT Scholar Works. It has been accepted for 
inclusion in Theses by an authorized administrator of RIT Scholar Works. For more information, please contact 
ritscholarworks@rit.edu. 




Submitted in partial fulfillment




Mr. Mark A. Indovina, Senior Lecturer
Graduate Research Advisor, Department of Electrical and Microelectronic Engineering
Dr. Sohail A. Dianat, Professor
Department Head, Department of Electrical and Microelectronic Engineering
DEPARTMENT OF ELECTRICAL AND MICROELECTRONIC ENGINEERING
KATE GLEASON COLLEGE OF ENGINEERING
ROCHESTER INSTITUTE OF TECHNOLOGY
ROCHESTER, NEW YORK
MAY, 2020
I dedicate this paper to everyone along the way.
Declaration
I hereby declare that, except where specific reference is made to the work of others,
the contents of this Graduate Paper are original and have not been submitted in whole or
in part for consideration for any other degree or qualification in this, or any other, University.
This Graduate Project is the result of my own work and includes nothing which is the outcome




I would like to thank everyone that has helped me get to where I am. I would like to specifically
thank my advisor, Mark Indovina, for helping me put in the hours and always offering help.
Abstract
Floating point arithmetic is one of several methods of performing computations in digital
designs; others include integer and fixed point computations. Floating point utilizes a method
comparable to scientific notation in the binary domain. In terms of computations, floating point
is by far the most prevalent in today’s digital designs. Given the support offered by compilers,
as well as the availability of ready-to-use IP blocks, floating point units (FPUs) are a de facto
standard for most processors. Despite its prevalence in modern designs, floating point has many
flaws. One of the most common is the use of Not-a-Number values (NaNs). These are meant to
provide a way of signaling invalid operations; however, the excessive number of them wastes
usable bit patterns. As an alternative to floating point, a system named "Universal Numbers", or
Unums, was developed. This system consists of three different types; however, for hardware
compatibility, Type III provides the best stand-in for floating point. This system eliminates
the NaN problem by using only one bit pattern for it, and also provides many other inherent
benefits.
Contents
List of Figures
List of Tables
1 Introduction
1.1 Research Goals
1.2 Organization
2 Floating Point Representation
2.1 Development
2.2 Format
2.3 Conversion
2.3.1 Determine Sign
2.3.2 Determine Exponent
2.3.3 Determine Fractional Component
2.4 Floating Point Exceptions
2.4.1 Zero and Infinity
2.4.2 Not a Number (NaN)
2.4.3 Subnormal Numbers
2.5 Floating Point Problems
2.5.1 Meaningless Exceptions
2.5.2 Unnecessary Hardware
2.5.3 Overflow
2.5.4 Round-off
3 Unum Representation
3.1 Type I Unums
3.1.1 Format
3.1.1.1 U-Bit
3.1.1.2 Exponent & Fraction Size Bits
3.1.2 Type I Unum Characteristics
3.2 Type II Unums
3.2.1 Design
3.2.2 Operations
3.2.3 Accuracy
3.2.4 Hardware Cost
3.3 Interval Arithmetic
4 Posit Representation
4.1 Background
4.2 Format
4.3 Regime
4.4 Scaling Factor
4.5 Exponent
4.6 Fraction
4.7 Exceptions
4.8 Benefits
5 Floating Point Core Design
5.1 Hierarchical Design
5.2 Data Path
5.2.1 Unpack Module
5.2.2 Exception Handler
5.2.3 Adder/Subtractor
5.2.4 Multiplication
5.2.4.1 Hardware Multiplier
5.2.5 Divider
5.2.5.1 Hardware Divider
5.3 Post-Processor
5.3.1 Rounding Block
5.3.2 Packing Block
5.4 Timing
6 Posit Core Design
6.1 Hierarchical Design
6.1.1 Control Logic
6.1.2 Extraction Block
6.1.3 Scaling Block
6.1.3.1 Addition/Subtraction Scaling
6.1.3.2 Multiplication/Division Scaling
6.2 Operation Block
6.3 Decoding Block
6.4 Timing
7 Testbench Design
7.1 Testbench
7.1.1 UVM Testbench
7.1.1.1 Top Level
7.1.1.2 Test Case
7.1.1.3 UVM Sequence
7.1.1.4 UVM Environment
7.1.1.5 UVM Scoreboard
7.1.1.6 UVM Agent
7.1.1.7 UVM Sequencer
7.1.1.8 UVM Driver
7.1.1.9 UVM Monitor
7.2 Analysis
8 Hardware Results
8.1 Hardware Synthesis
8.2 DFT Insertion
8.3 Analysis Results
9 Conclusions and Further Research
9.1 Conclusions
9.2 Future Research
References
I FPU Source Code
I.1 FPU Top Level
I.2 FPU Data Path
I.3 FPU Unpack
I.4 FPU Exception Handler
I.5 FPU Adder/Subtractor
I.6 FPU Multiplier
I.7 FPU Divider
I.8 FPU Post Processor
I.9 FPU Rounding
I.10 FPU Packing
I.11 FPU Test Frame
II PAU Source Code
II.1 PAU Top Level
II.2 PAU Extraction
II.3 PAU Scaler
II.4 PAU Operator
II.5 PAU Decoder
II.6 PAU Test Frame
III Testbench Source Code
III.1 Test Case
III.2 Test Sequence
III.3 Test Input Packet
III.4 Test Output Packet
III.5 Test Environment
III.6 Test Agent
III.7 Test Sequencer
III.8 Test Driver
III.9 Test Monitor
III.10 Test Scoreboard
III.11 Test Interface
III.12 Test Package
IV Miscellaneous Code
IV.1 Analysis Script
IV.2 Common PAU/FPU Blocks
IV.2.1 Leading One Detector
IV.2.2 Vedic Multiplier
IV.2.3 Integer Divider
List of Figures
2.1 Floating Point Bit-Pattern
3.1 Type I Unum Bit Pattern
3.2 Type II Projected Number Line
3.3 Floating Point vs. Type II Accuracy
4.1 Posit Bit-Pattern
4.2 Detailed Posit Bit-Pattern
4.3 Floating Point vs. Posit Accuracy [1]
5.1 FPU Top Level Block Diagram
5.2 FPU Data Path
5.3 Vedic 2x2 Multiplier
5.4 Vedic 8x8 Multiplier Block
5.5 FPU Post-Processor
5.6 FPU Timing
6.1 PAU Block Diagram
6.2 PAU Pipeline Timing
7.1 Typical UVM Testbench Architecture
8.1 Decimal Error Results
List of Tables
2.1 Floating Point Formats
3.1 Type I Unum Sizing
4.1 Posit Regime Decoding
4.2 Posit useed
4.3 Floating Point vs. Posit Dynamic Range
5.1 FPU Interface
5.2 FP Exception Handler Operation
8.1 Hardware Synthesis Results
Chapter 1
Introduction
Digital processors require a format for representing non-integer numbers. The standard for the
past several decades has been floating point, a binary version of scientific notation. This format
has seen few modifications over the years, mainly different bit-widths and new operations such
as fused operations. The format itself was created by committee in the mid-eighties according
to the technology of the day. That technology has changed drastically while the format has not,
leaving a whole host of optimizations that could be made.
While various other formats exist for specialized applications, fixed point for example, few
exist to challenge floating point as a standard. Universal Numbers (Unums) were proposed as an
alternative. They were initially developed as two different types. The first type was created as
a superset of floating point, allowing greater range and accuracy; however, it requires even
more hardware than floating point. The second type treats bit patterns as positions rather than
as encoded data. This allows for extremely fast computations; however, it is based on look-up
tables, which limits the size of the operations. This eventually led to the formation of a third
Unum type: Posits.
This paper discusses a hardware implementation of the Posit number system as compared
to a floating point implementation. Both floating point and posit systems will be tested in a
random test environment and compared for accuracy, power and various other parameters.
1.1 Research Goals
The intent of designing and testing both a floating point and posit unit is to show a side-by-side
comparison in order to understand the costs and benefits of each. A summary of the research
tasks is listed below:
• Gain an understanding of floating point merits and problems.
• Gain an understanding of posit merits and problems.
• Design floating point and posit arithmetic cores.
• Fully test and analyze both cores.
• Compare analysis results and discuss.
1.2 Organization
The organization for the research is listed below:
• Chapter 2: A discussion of the floating point numeric system. This will also discuss the
merits and issues that arise from it.
• Chapter 3: A discussion of the Unum numeric system. This will cover both Type I and
II Unums and their respective properties.
• Chapter 4: A discussion of the Posit numeric system.
• Chapter 5: The design of the floating point core.
• Chapter 6: The design of the posit core.
• Chapter 7: The design of the testing and verification environment.
• Chapter 8: Results of hardware testing and analysis.




Chapter 2
Floating Point Representation
In this chapter, the Floating Point number system is discussed.
2.1 Development
The Floating Point (FP) number system was developed into a standard by the IEEE in the
mid-eighties with the introduction of IEEE-STD 754 [2]. Before this, it was up to the design team
or company to develop a numeric system for representing non-integer numbers. A common
method still used today for custom purposes is the fixed point system. This essentially specifies
a certain number of bits to represent the integer portion of a number, while the rest are devoted
to the fractional part. This system is very hardware friendly but very limited in its dynamic
range. The origins of floating point extend much earlier than the standard; however, the earlier
implementations are quite similar, with the bit-widths of the individual components being the
major difference.
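The fixed point idea described above can be sketched in a few lines. This is only an illustration: the 8 integer bits and 8 fraction bits, and the names `FRAC_BITS`, `to_fixed` and `from_fixed`, are assumptions made for the example and do not come from any particular design.

```python
# Hypothetical unsigned fixed point format: 8 integer bits, 8 fraction bits.
INT_BITS = 8
FRAC_BITS = 8
SCALE = 1 << FRAC_BITS  # one unit of the stored integer is worth 1/256


def to_fixed(x: float) -> int:
    """Quantize a non-negative real value to the nearest fixed point code."""
    code = round(x * SCALE)
    if not 0 <= code < (1 << (INT_BITS + FRAC_BITS)):
        raise OverflowError("value outside the fixed point range")
    return code


def from_fixed(code: int) -> float:
    """Recover the real value that a fixed point code represents."""
    return code / SCALE


largest = from_fixed((1 << (INT_BITS + FRAC_BITS)) - 1)  # top of the range
smallest_step = from_fixed(1)                            # resolution
print(to_fixed(3.75), from_fixed(to_fixed(0.1)), largest, smallest_step)
```

The ratio between `largest` and `smallest_step` is fixed by the total bit count, which is the limited dynamic range noted above.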
Table 2.1: Floating Point Formats
Format   Width (bits)   Sign (bits)   Exponent (bits)   Fraction (bits)   Range
Single   32             1             8                 23                $3.4 \times 10^{\pm 38}$
Double   64             1             11                52                $1.8 \times 10^{\pm 308}$
[Fields, left to right: sign | exponent | fraction]
Figure 2.1: Floating Point Bit-Pattern
2.2 Format
Floating Point numbers are essentially a binary version of scientific notation. The components
are similar, with certain bits dedicated to expressing the fraction and the exponent, and one bit
to express the sign of the number. There are currently two commonly used formats for Floating
Point: single and double precision. The bit-widths and properties of the components of each
format are shown in Table 2.1 and the bit pattern in Fig. 2.1.
With the computational requirements that have developed in recent times, new formats have
been added in a revision of the IEEE-754 standard [3]. These formats include half precision
(16-bit) and quad precision (128-bit); however, these are highly specialized and not
supported by many compilers and processors.
Floating Point numbers are represented by Eq. (2.1) for normal values. Exceptions are
discussed in Section 2.4.
$$(-1)^{S} \times 2^{E-B} \times 1.F \qquad (2.1)$$
where:
S is the sign bit
E is the exponent value
B is the exponent bias
F is the fractional component
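As a rough illustration of Eq. (2.1), the sketch below evaluates the formula for a single precision bit pattern and compares the result with the value Python's `struct` module decodes from the same bits. The function name `decode_single` and the test pattern are choices made for this example only.

```python
import struct


def decode_single(bits: int) -> float:
    """Evaluate Eq. (2.1) for a normal single precision bit pattern."""
    sign = (bits >> 31) & 0x1        # S: 1 bit
    exponent = (bits >> 23) & 0xFF   # E: 8 bits, stored with the bias
    fraction = bits & 0x7FFFFF       # F: 23 bits
    bias = 127                       # B for single precision
    if exponent in (0, 0xFF):
        raise ValueError("exception encoding: Eq. (2.1) covers normal values only")
    significand = 1 + fraction / 2**23  # 1.F with the implied leading one
    return (-1) ** sign * 2.0 ** (exponent - bias) * significand


bits = 0xC0490FDB  # single precision pattern closest to -pi
value = decode_single(bits)
reference = struct.unpack(">f", struct.pack(">I", bits))[0]
print(value, reference)
```

Both numbers agree because every step of the formula is exact when carried out in double precision.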
2.3 Conversion
The conversion between decimal values and Floating Point is the same for all precisions, the
only differences being the exponent bias and the size of the fraction part.
2.3.1 Determine Sign
Floating Point numbers are represented in a sign-magnitude format with a single sign bit for all
formats. The sign is the same as for most formats: a 1 for negative numbers and 0 for positive.
No other computations are necessary for the sign.
2.3.2 Determine Exponent
To compute the exponent, the number is divided by $2^{N}$, where $N$ may be a positive or
negative integer, in order to normalize its magnitude to a value between 1 and 2. The unbiased
exponent is $N$. The exponent bias changes depending on the format. The bias is calculated as
shown in Eq. (2.2) [3].
$$B = 2^{\mathit{exp\_width}-1} - 1 \qquad (2.2)$$
In the case of single and double precision, the biases are 127 and 1023 respectively. This
bias is then added to the unbiased exponent to get the Floating Point exponent. The reason
for biasing the exponent rather than using basic 2’s-complement notation is that it simplifies
numeric comparison. If a small group of bits in the middle of the bit-string were in
2’s-complement, the numeric comparisons required by the IEEE-754 standard [3] would need more
hardware to handle.
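A minimal sketch of the exponent step, assuming a non-zero normal input. The helper names `exponent_bias` and `biased_exponent` are hypothetical, and `math.frexp` is used only as a convenient way to find the normalizing power of two.

```python
import math


def exponent_bias(exp_width: int) -> int:
    """Eq. (2.2): bias for an exponent field of exp_width bits."""
    return 2 ** (exp_width - 1) - 1


def biased_exponent(x: float, exp_width: int) -> int:
    """Stored exponent for a non-zero value that is normal in the target format."""
    if x == 0:
        raise ValueError("zero is an exception encoding, not a normal value")
    # math.frexp returns m in [0.5, 1) with |x| = m * 2**e, so the power of two
    # that normalizes |x| into [1, 2) is e - 1.
    _, e = math.frexp(abs(x))
    unbiased = e - 1
    return unbiased + exponent_bias(exp_width)


print(exponent_bias(8), exponent_bias(11))
print(biased_exponent(6.5, 8))      # 6.5 = 1.625 * 2**2
print(biased_exponent(0.15625, 8))  # 0.15625 = 1.25 * 2**-3
```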
2.3.3 Determine Fractional Component
The last component that determines a Floating Point bit-pattern is the fraction, often
referred to as the mantissa. It is calculated in much the same way as a fixed-point fractional
component. For normal numbers, the fraction represents a value between 1 and 2. This normalized
number can be expressed approximately as shown in Eq. (2.3). The leading one to the left of
the binary point is always implied and thus is excluded from the bit pattern. Subnormal numbers











$\mathrm{bit}_{0}$: Fraction bit immediately to the right of the binary point
$\mathrm{bit}_{n-1}$: Least significant fraction bit
The fraction component cannot express many numbers exactly. This limitation applies to nearly
all binary number systems and is unavoidable, especially when the number is not finitely
representable even in decimal, e.g. $1/3 = 0.3333\ldots$
Combining the floating point components per Eq. (2.1) gives a binary
approximation to decimal numbers in the form of Fig. 2.1.
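The following sketch illustrates why 1/3 cannot be stored exactly. It assumes the fraction is weighted in the usual way, with bit $i$ worth $2^{-(i+1)}$ and an implied leading one, and it truncates rather than applying a rounding mode. The function names are hypothetical.

```python
from fractions import Fraction


def fraction_bits(significand: Fraction, n: int) -> list[int]:
    """Truncate a significand in [1, 2) to n fraction bits.

    Bit 0 is the bit immediately to the right of the binary point.
    """
    remainder = significand - 1  # drop the implied leading one
    bits = []
    for _ in range(n):
        remainder *= 2
        bit = int(remainder >= 1)
        bits.append(bit)
        remainder -= bit
    return bits


def significand_value(bits: list[int]) -> Fraction:
    """Rebuild 1.F from the fraction bits, with bit i weighted 2**-(i+1)."""
    return 1 + sum(Fraction(b, 2 ** (i + 1)) for i, b in enumerate(bits))


third = Fraction(1, 3)
normalized = third * 4             # 1/3 = (4/3) * 2**-2, and 4/3 lies in [1, 2)
bits = fraction_bits(normalized, 23)
approx = significand_value(bits) / 4
error = third - approx             # exact truncation error
print(bits[:8], float(error))
```

The bit pattern repeats 01 without end, so any finite fraction width leaves a non-zero `error`.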
2.4 Floating Point Exceptions
The Floating Point number system has many exception values, which can become an issue
as discussed in Section 2.4.2. The standard [3] requires the host system to handle exceptions;
however, it is up to the designer to determine how. Any of the following methods can be used,
each with its own merits and issues:
• Single exception flag and host decodes bit-pattern
• Flag for each exception
• Exception return codes
2.4.1 Zero and Infinity
A necessity of any number system is the ability to represent zero. To represent zero in floating
point, both the exponent and the fraction are zero. The sign bit can be either positive or negative.
The infinity exception is raised when infinity is used as an input, when a divide-by-zero
operation occurs, or in overflow cases. To represent infinity, the exponent is set to its
maximum value of all ones, with the fraction bits all set to zero. As expected, infinity can be
positive or negative.
The reasoning behind a positive and negative zero was to simplify hardware when it came
to division: dividing a positive value by positive zero results in positive infinity, and by
negative zero in negative infinity. Floating Point was developed at a time when computational
hardware was at a premium, which is no longer the case. This is one of the eccentricities of
Floating Point that remains outdated and unnecessary.
2.4.2 Not a Number (NaN)
A major exception handled by Floating Point is the Not a Number (NaN) condition. NaNs
represent the results of invalid calculations, which can arise from any number of causes. The
bit-pattern for a NaN is an exponent of all ones and a non-zero fraction. This wastes a large
number of bit-patterns: over 16 million for single precision. Since the fraction bits carry no
numeric value in a NaN, a popular approach is to use them to hold an error payload. However,
these kinds of issues can instead be handled by a host system, saving the bit-patterns for valid
data; an issue resolved by the posit system.
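The count quoted above follows directly from the encoding: one exponent value (all ones), any non-zero fraction, and either sign. A small sketch of that arithmetic, with hypothetical function names:

```python
def nan_pattern_count(frac_width: int) -> int:
    """Bit patterns with an all-ones exponent and a non-zero fraction."""
    nonzero_fractions = 2 ** frac_width - 1
    return 2 * nonzero_fractions  # either sign bit


def nan_share(exp_width: int, frac_width: int) -> float:
    """Fraction of all bit patterns of the format that encode a NaN."""
    total_patterns = 2 ** (1 + exp_width + frac_width)
    return nan_pattern_count(frac_width) / total_patterns


single_nans = nan_pattern_count(23)
double_nans = nan_pattern_count(52)
print(single_nans, double_nans, nan_share(8, 23))
```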
2.4.3 Subnormal Numbers
Subnormal numbers, while not an explicit exception, do deserve some mention. Subnormal
numbers are not required by the standard [3], however they are supported by a wide variety of
host systems.
Subnormal numbers were designed to extend the dynamic range of Floating Point numbers
at the lower end. A subnormal number is denoted by an exponent field of all zeros and a
non-zero fractional component. Under IEEE-754 [3], the scale of every subnormal number is
fixed at $2^{1-B}$, the same scale as the smallest normal exponent. What makes subnormal
numbers different is the fractional component: the implied hidden one is not used. Thus, for
calculating the fraction bit-pattern, the $1+$ component of Eq. (2.3) is dropped.
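The exception encodings of Section 2.4 can be summarized in a short classifier for single precision patterns. This is a sketch, not the exception handler of the FPU described later. It assumes the IEEE-754 convention that subnormal values are scaled by $2^{1-B}$ with no hidden one, and the function names are hypothetical.

```python
import struct


def classify_single(bits: int) -> str:
    """Name the class of a 32-bit single precision pattern."""
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent == 0:
        return "zero" if fraction == 0 else "subnormal"
    if exponent == 0xFF:
        return "infinity" if fraction == 0 else "nan"
    return "normal"


def decode_subnormal_single(bits: int) -> float:
    """Value of a subnormal pattern: fixed scale 2**(1 - 127), no hidden one."""
    sign = (bits >> 31) & 0x1
    fraction = bits & 0x7FFFFF
    return (-1) ** sign * 2.0 ** (1 - 127) * (fraction / 2**23)


smallest = decode_subnormal_single(0x00000001)
reference = struct.unpack(">f", struct.pack(">I", 0x00000001))[0]
print(classify_single(0x7FC00000), smallest, reference)
```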
2.5 Floating Point Problems
While Floating Point is heavily supported by most digital systems, that support remains one of
its few benefits. The standard was written decades ago and has seen few updates, while industry
and technology have changed drastically in the meantime, leaving many issues for designers to
handle.
2.5.1 Meaningless Exceptions
A major fault of Floating Point was outlined in Section 2.4 in the form of NaN values. They
are an excessive waste of bit-patterns that could instead be put to use to increase accuracy or
dynamic range.
2.5.2 Unnecessary Hardware
Another fault that arises from the use of excessive exceptions is the hardware toll. Handling
NaNs in the various operations and comparisons requires dedicated logic, as does handling a
signed zero. For a system devised in an era of costly hardware, Floating Point certainly requires
a fair amount of unnecessary logic.
2.5.3 Overflow
One aspect of Floating Point that could have been resolved even while the standard was being
developed is overflow. The standard requires that calculations that overflow go to infinity. For
most calculations, this is not the desired behavior despite being defined as such. Most
calculations would be better served by a large but finite error, returning the largest
representable value, than by the infinite error of overflowing to infinity.
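The difference between the two overflow policies can be seen with Python's double precision floats, whose multiplication overflows to infinity. The function `saturating_mul` below is a hypothetical alternative written for this illustration; it is not part of any standard.

```python
import math
import sys


def saturating_mul(a: float, b: float) -> float:
    """Multiply, but clamp an overflow to the largest finite magnitude."""
    result = a * b
    overflowed = math.isinf(result) and not (math.isinf(a) or math.isinf(b))
    if overflowed:
        return math.copysign(sys.float_info.max, result)
    return result


big = sys.float_info.max
ieee_result = big * 2.0                  # overflows to infinity
clamped_result = saturating_mul(big, 2.0)
print(ieee_result, clamped_result)
```

The clamped result is wrong by a finite amount, while the infinite result carries no information about the magnitude at all.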
This overflow issue has had drastic consequences. The Large Hadron Collider used a floating
point math library in its software to track collisions in the search for the Higgs boson
particle. Overflow issues in the library caused certain collisions to be missed or incorrectly
identified [4].
2.5.4 Round-off
Round-off error occurs in all numeric systems; however, it becomes rather problematic in
Floating Point. Round-off error occurs when a decimal number is not exactly representable in a
binary system and a bit-pattern is assigned according to a rounding mode [5]. Floating point has
four rounding modes:
• Round to nearest
• Round up (towards $+\infty$)
• Round down (towards $-\infty$)
• Round towards zero
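The four modes listed above can be compared on a simplified model. The sketch below rounds an exact rational to a grid with a fixed number of fraction bits, which is a simplification of real floating point rounding where the grid spacing depends on the exponent. Ties in round to nearest are resolved to the even neighbor, which is an assumption of this example, and `round_to_grid` is a hypothetical name.

```python
from fractions import Fraction


def round_to_grid(x: Fraction, frac_bits: int, mode: str) -> Fraction:
    """Round x to a multiple of 2**-frac_bits using the named mode."""
    scaled = x * 2 ** frac_bits
    floor = scaled.numerator // scaled.denominator  # rounds towards -infinity
    ceil = floor if scaled == floor else floor + 1
    if mode == "up":
        chosen = ceil
    elif mode == "down":
        chosen = floor
    elif mode == "zero":
        chosen = floor if x >= 0 else ceil
    elif mode == "nearest":
        distance = scaled - floor
        if distance > Fraction(1, 2):
            chosen = ceil
        elif distance < Fraction(1, 2):
            chosen = floor
        else:  # exact tie: pick the even neighbor
            chosen = floor if floor % 2 == 0 else ceil
    else:
        raise ValueError(f"unknown rounding mode: {mode}")
    return Fraction(chosen, 2 ** frac_bits)


tenth = Fraction(1, 10)
for mode in ("nearest", "up", "down", "zero"):
    print(mode, round_to_grid(tenth, 4, mode), round_to_grid(-tenth, 4, mode))
```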
This round-off issue became deadly during the Gulf War, when a U.S. Army base protected by a
missile defense system was attacked by an Iraqi SCUD missile. The tracking system used the time
since power-up, kept with a precision of tenths of a second, to perform velocity calculations.
A tenth of a second is not exactly representable in Floating Point and was subject to round-off.
Normally this had no negative impact; however, this system had been powered on for over 100
hours, resulting in significant accumulated rounding error. As a result, the counter-missile
missed the incoming missile by over 400 meters, resulting in over 20 deaths and 100 injuries.
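To give a feel for how a tiny representation error grows with uptime, the sketch below measures the exact error of 0.1 stored in single precision and scales it by the number of tenth-of-a-second ticks in 100 hours. It is an illustration only: it does not model the arithmetic of the actual tracking system, so the magnitude it prints is not the error of that system.

```python
import struct
from fractions import Fraction


def to_single(x: float) -> float:
    """Round a Python float to the nearest single precision value."""
    return struct.unpack(">f", struct.pack(">f", x))[0]


tick = to_single(0.1)                               # stored tick length in seconds
per_tick_error = Fraction(tick) - Fraction(1, 10)   # exact representation error
ticks = 100 * 3600 * 10                             # tenths of a second in 100 hours
drift_seconds = float(per_tick_error * ticks)
print(float(per_tick_error), drift_seconds)
```

An error of about a nanosecond per tick becomes milliseconds after 100 hours, and a fast-moving target covers a meaningful distance in that time.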
While these errors can occur in any number system, they do not have to occur to this
extent. The posit system, discussed in Chapter 4, was developed to address them.
Chapter 3
Unum Representation
In this chapter, Universal Numbers (Unums) are discussed. The Unums covered here are divided
into two types, Type I and Type II; the third type, Posits, is discussed in Chapter 4.
3.1 Type I Unums
The first type of Unum was developed as a superset of floating point representation. The
main goal was to create a dynamic format that would only use as much data space as needed.
The reasoning behind this is fairly straightforward: don’t waste memory you don’t need. Other
additions include an error flag as well as a system for implementing interval arithmetic.
3.1.1 Format
The format for Type I Unums (abbreviated to T1 from this point), as a superset of Floating
Point, is a fair bit more complicated. The following sections describe the individual
components that make up a T1 number. The full bit-pattern is shown in Fig. 3.1.
The sign, exponent and fraction bits are identical to floating point in terms of functionality,
as described in Sections 2.3.1 to 2.3.3.
[Fields, left to right: sign | exponent (es bits) | fraction (fs bits) | ubit | es−1 | fs−1]
Figure 3.1: Type I Unum Bit Pattern
3.1.1.1 U-Bit
One of the first steps to help the user of the T1 system was to implement a flag that shows
whether rounding, and thus error, has occurred. This bit is called the ubit and is simply
appended after the least significant bit of a floating point style number. This additional bit
can prevent costly errors based on the assumption that a calculation is correct [6].
If the ubit is clear, the number is exact and no rounding error has occurred. If the ubit is
set, the number has been rounded. The definition of the ubit [7] states that it acts as a flag
to indicate that there are more digits than can be expressed. This means that the actual value
lies somewhere between the number represented and the next higher value.
3.1.1.2 Exponent & Fraction Size Bits
In order for a number to have variable size, it is necessary to include information bits along
with the actual data. For T1 representation, these take the form of the exponent size and
fraction size fields (as defined in Section 3.1.1). The sizes of these two fields are the only
values necessary to create a computing environment. As defined in [7], a T1 environment is set
up by $\{x, y\}$, where $x$ is the bit-width of the es field and $y$ is the bit-width of the fs
field, known as esizesize and fsizesize respectively.
Table 3.1: Type I Unum Sizing
<esizesize, fsizesize>   <x, y>                                                   <3, 4>                         <4, 7>
Min. Bit-width           $1+1+1+1+x+y$                                            11                             15
Max. Bit-width           $1+2^{x}+2^{y}+1+x+y$                                    33                             157
Dynamic Range            $2 \times (2-2^{-(2^{y}-1)}) \times 2^{2^{2^{x}-1}}$     $\approx 1.4 \times 10^{39}$   $\approx 5.7 \times 10^{9864}$
This description uses the exponent size; the fraction size works in the same way to describe
the fraction bits. The bit-width of the exponent size field is defined by esizesize. For
esizesize $= n$, there are $2^{n}$ possible values for the size of the exponent field. As there
will always be at least one exponent bit, there is no need for a zero value, and thus the stored
value is offset by one to give an extra value. For example, with esizesize $= 4$, the stored
value falls in the range 0 to 15, resulting in a range of $1 \le es \le 16$. This offset also
applies to the fs and fsizesize fields.
The exponent sizes of all IEEE-754 [3] formats fit within esizesize $= 4$, with room for
even more range.
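The bit-width rows of Table 3.1 can be reproduced with a few lines of arithmetic. This sketch uses hypothetical names (`t1_environment`, `field_size`) and covers only the sizing expressions, not the dynamic range.

```python
def field_size(stored: int) -> int:
    """Size encoded by a size field: the stored value is offset by one."""
    return stored + 1


def t1_environment(x: int, y: int) -> dict:
    """Sizing of a Type I environment {x, y} = {esizesize, fsizesize}."""
    max_es = field_size(2 ** x - 1)  # largest exponent width, 2**x bits
    max_fs = field_size(2 ** y - 1)  # largest fraction width, 2**y bits
    utag_bits = 1 + x + y            # ubit + es size field + fs size field
    return {
        "min_bits": 1 + 1 + 1 + utag_bits,  # sign + 1 exponent bit + 1 fraction bit
        "max_bits": 1 + max_es + max_fs + utag_bits,
        "es_range": (1, max_es),
        "fs_range": (1, max_fs),
    }


for env in ((3, 4), (4, 7)):
    print(env, t1_environment(*env))
```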
3.1.2 Type I Unum Characteristics
The ubit, es and fs fields are collected into the utag. The utag is the only size overhead
compared to traditional Floating Point. While this overhead may seem costly, most uses will not
require the full length of the es or fs fields, which actually ends up saving space [7].
When tested against floating point for addition and multiplication, T1 numbers averaged
23 and 45 bits, compared to the 32 and 64 bits needed for Floating Point [8].
However, despite the space-saving benefits of T1 numbers, they come with a hardware cost.
This includes not only the overhead of the utag, which is counteracted by the space-saving
nature of average calculations, but also the hardware needed to decode the utag and its
corresponding bit-widths. The complex decoding logic makes T1 numbers rather inefficient in
terms of area and power. The benefits are far greater in software, where T1 numbers can be
implemented in math libraries for memory-scarce applications.
3.2 Type II Unums
In order to counteract the hardware intensity of Type I Unums, Type II was developed to
operate much more efficiently in hardware. These operate as a method of pointer arithmetic
rather than operating on actual data. This format completely breaks with traditional floating
point for the purposes of speed.
3.2.1 Design
Type II Unums (now referred to as T2) are based on the idea of the number line centered
around zero with negative infinity on the far left and positive infinity on the far right with all
real values on the line. To get a true idea of T2 numbers, imagine wrapping the number line
into a circle with zero at the bottom and both infinities meeting at the top as shown in Fig. 3.2
[1].
This implementation uses the U-bit as described in Section 3.1.1.1 which is used in the
least significant position. Values with a set ubit are shown as being in between exact values.
As per most systems, the most significant bit is allocated as the sign bit with negative
numbers being on the left half and positive on the right. Thus negating values can be seen as
being flipped around the vertical axis. Two’s complement suffices for the actual negation.
There also exists symmetry along the horizontal axis that lends itself towards reciprocation.
The integer values of ±1 exist at the East and West points of the circular number line with
values greater than one above and values less than one below. Reciprocation of values is
done simply by performing 2's complement of the value, disregarding the sign bit. Further
simplifying things, finding the negative reciprocal requires only flipping the sign bit.
Figure 3.2: Type II Projected Number Line
3.2.2 Operations
The format of T2 numbers allows extreme simplicity when performing operations and requires
similar resources for all four basic mathematical operations [9]. With Floating Point,
addition and subtraction use similar hardware, while multiplication and division use
drastically different hardware. This is not so with T2 numbers because of their pointer-type
structure.
One major benefit is that not all division by zero operations need to throw exceptions.
Normally this would require some algebraic manipulation but that occurs naturally with T2
calculations. Take for example Eq. (3.1) when x = 0. Floating Point would immediately throw
an exception for the divide by zero however that is not the case for T2 numbers. The first 1x










One aspect of the T2 style of pointer arithmetic is that the values represented don't have to
ascend in an exact linear order but can include "non-standard" values in order to improve
accuracy. A desired feature of a number system is tapered accuracy, which essentially means
that as magnitude increases, decimal accuracy becomes less necessary. For example, most
numbers on the order of billions or trillions really don't depend on the ones or tens place.
The ability to add in non-linear values helps provide a smooth accuracy transition as shown
in Figure 3.3 [9].
Figure 3.3: Floating Point vs. Type II Accuracy
3.2.4 Hardware Cost
While the actual arithmetic core for T2 numbers is relatively simple and therefore much faster
than typical arithmetic cores such as Floating Point, getting the actual data result becomes
quite taxing on the hardware.
Because the bit-patterns of T2 numbers represent pointers to the actual data, getting the
result requires look up tables to get the actual value. Considering the four basic operations for
16-bit numbers, a full look up table will require over 32 gigabytes of data space. While these
tables can be reduced quite a bit using symmetries and other methods, the data space required
is still in the megabyte range [9]. For this reason alone, T2 numbers are not suitable for the
majority of applications and may only gain prominence in the software domain.
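The 32 gigabyte figure can be checked with simple arithmetic. The sketch below is a
back-of-the-envelope calculation and not part of the original design. It assumes that each
table entry stores one result of the same width as the operands, that every ordered operand
pair gets its own entry, and that no symmetry reduction is applied. The function name
full_lut_bytes is hypothetical.

```python
def full_lut_bytes(width_bits: int, operations: int = 4) -> int:
    """Bytes needed for unreduced two-operand lookup tables.

    Assumption: one width_bits-wide result is stored for every ordered
    pair of operand bit-patterns, separately for each operation.
    """
    entries_per_operation = (2 ** width_bits) ** 2   # all (a, b) pairs
    bytes_per_entry = width_bits // 8                # result width in bytes
    return operations * entries_per_operation * bytes_per_entry


size_16 = full_lut_bytes(16)
size_16_gib = size_16 / 2 ** 30   # size expressed in binary gigabytes
```

Under these assumptions the total for 16-bit operands is 2^35 bytes, which is 32 GiB or
roughly 34.4 decimal gigabytes, consistent with the "over 32 gigabytes" stated above.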
3.3 Interval Arithmetic
One aspect of both T1 & T2 numbers is that they lend themselves quite handily to performing
interval arithmetic. While interval arithmetic is outside of the scope of this research, it is
interesting to note the abilities of each type as discussed in [7][9][1].
Chapter 4
Posit Representation
This chapter discusses the Posit version of the Unum number systems.
4.1 Background
In addition to the Type I & II Unums discussed in Chapter 3, a Type III Unum has been
defined, also known as the Posit. The previous types had their benefits, but for practical
purposes are mostly restricted to the software domain. To address this issue, Posits were
developed, making certain concessions in order to be hardware friendly [1].
One of the main problems with Floating Point is actually one of its few benefits: it is
massively ingrained into current systems and will be for the foreseeable future. Posits build
upon hardware that already exists for Floating Point, with a few changes to improve accuracy
and range, among other properties.
sign | regime | exponent | fraction
Figure 4.1: Posit Bit-Pattern
4.2 Format
Posits have a similar format to Floating Point with the addition of one extra field known as
the regime. The bit pattern is shown in Figure 4.1, where the exponent is similar to
Section 2.3.2 and the fraction is identical to Section 2.3.3.
Posits are defined by two sizes: the overall width and the exponent size es. The dashed
lines at the bit boundaries indicate that, with the exception of the single-bit sign, the
boundaries can shift depending on the size of the regime.
The sign bit is typical, with zero meaning positive and one meaning negative. Unlike
Floating Point, Posits are not represented in sign-magnitude notation. Negative numbers
require taking the 2's complement of the entire bit-pattern before any decoding.
4.3 Regime
The regime is a bit sequence unique to Posits that allows for tapered accuracy; this is one
of the factors that gave Type I & II Unums an edge over Floating Point. This allows accuracy
to decrease for extremely large numbers and to increase for smaller ones. The regime provides
a scaling factor, much like the exponent, that multiplies with the exponent and fraction.
This scaling factor isn't coded as an integer but rather as a count of identical digits
followed by an opposite terminating digit. The regime can extend from the minimum of two
digits all the way to the entire bit width. As the regime grows, fraction bits are "pushed"
off of the end of the bit-pattern, trading accuracy for range [1]. For exceptionally long
regimes, exponent bits can also be "pushed" off of the bit-pattern. A more descriptive
bit-pattern is shown in Figure 4.2.
s | r r r r ... r | e_1 e_2 e_3 ... e_es | f_1 f_2 f_3 f_4 f_5 ...
Figure 4.2: Detailed Posit Bit-Pattern
Table 4.1: Posit Regime Decoding
Bits   0000   0001   001x   01xx   10xx   110x   1110   1111
k      −4     −3     −2     −1     0      1      2      3
While it may seem counter-intuitive for negative regimes to have leading zeros instead of
ones, this allows for simple equality and greater/less than comparisons. Comparisons can even
be computed without taking the 2's complement of negative values, resulting in simplified
hardware.
An example of decoding the regime is shown in Table 4.1 for four bits.
These k values are determined according to the run length m of the leading digit. A leading
zero indicates a negative k and a leading one indicates a non-negative k. The computation for
k is shown in Eq. (4.1), where b0 is the value of the running digits and m is the run length.
\[
k = \begin{cases} -m & \text{if } b_0 = 0 \\ m - 1 & \text{if } b_0 = 1 \end{cases} \tag{4.1}
\]
A corner case exists when the regime takes up the entire width of the bit-pattern. This is
resolved the same as for any other regime with the only difference being that the terminating
digit isn’t required: the run length is considered the entire bit-width.
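The run-length rule of Eq. (4.1), including the corner case where no terminating digit is
present, can be sketched in a few lines. This is an illustrative software model only, not the
hardware implementation; the function name decode_regime and the use of a bit string as
input are assumptions made for readability.

```python
def decode_regime(bits: str) -> int:
    """Return the regime value k for the bits that follow the sign bit.

    bits is a string of '0'/'1' characters, most significant bit first.
    The run length m counts how many copies of the leading digit b0
    appear before the terminating digit (or the end of the pattern).
    """
    if not bits or set(bits) - {"0", "1"}:
        raise ValueError("bits must be a non-empty string of 0s and 1s")
    b0 = bits[0]
    m = len(bits) - len(bits.lstrip(b0))   # run length of the leading digit
    return -m if b0 == "0" else m - 1


# Four-bit patterns, with x bits filled in arbitrarily.
regime_examples = {p: decode_regime(p) for p in ("0000", "0001", "0010", "0111",
                                                   "1000", "1101", "1110", "1111")}
```

The patterns 0000 and 1111 exercise the corner case: the run fills the whole field, so m
equals the field width and no terminating digit is consumed.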
4.4 Scaling Factor
In order to convert the regime (k value) into the appropriate scaling factor, it is necessary to
define the useed. The useed is determined by the es exponent size value as shown in Eq. (4.2)
\[ useed = 2^{2^{es}} \tag{4.2} \]
Table 4.2: Posit useed
\[ sf = useed^{k} \tag{4.3} \]
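As a small numeric illustration of Eq. (4.3), the sketch below computes the regime scaling
factor exactly using rational arithmetic. It assumes the standard posit definition of the
useed, 2 raised to the power 2^es; the function names useed and regime_scale are
hypothetical.

```python
from fractions import Fraction


def useed(es: int) -> int:
    """useed for a given exponent size (assumed definition: 2**(2**es))."""
    return 2 ** (2 ** es)


def regime_scale(k: int, es: int) -> Fraction:
    """Scaling factor sf = useed**k contributed by the regime value k."""
    return Fraction(useed(es)) ** k


useed_values = [useed(es) for es in range(5)]   # es = 0 .. 4
```

For example, with es = 1 a regime of k = −2 contributes a factor of 1/16, while with es = 2
a regime of k = 3 contributes 16^3 = 4096.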
4.5 Exponent
The exponent is similar to that of the Floating Point representation, with the exponent bits
representing an integer value. The difference is that Posits do not require biasing as
Floating Point does. Exponents are represented as unsigned integers; negative powers of two
are handled by a negative regime, since a single regime step is larger than any scale the
exponent field can contribute.
4.6 Fraction
The fraction bit-pattern is identical to that of Floating Point, including the implied hidden one.
The only difference is that subnormal numbers do not exist for Posits, eliminating the extra
hardware required for detection in Floating Point.
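Putting the sign, regime, exponent and fraction rules together, a complete software decode of
a Posit bit-pattern can be sketched as follows. This is a reference model for illustration
under the assumptions stated in the comments, not the hardware algorithm; the name
decode_posit is hypothetical, and the single infinity pattern is returned as None.

```python
from fractions import Fraction


def decode_posit(pattern: int, n: int, es: int):
    """Decode an n-bit posit with exponent size es into an exact Fraction.

    Returns None for the single infinity pattern (1 followed by zeros).
    Exponent bits pushed off the end of the pattern are treated as zeros.
    """
    mask = (1 << n) - 1
    pattern &= mask
    if pattern == 0:
        return Fraction(0)
    if pattern == 1 << (n - 1):
        return None                          # infinity
    sign = pattern >> (n - 1)
    if sign:
        pattern = (-pattern) & mask          # 2's complement of the whole pattern
    body = format(pattern, f"0{n}b")[1:]     # the n-1 bits after the sign
    b0 = body[0]
    m = len(body) - len(body.lstrip(b0))     # regime run length
    k = -m if b0 == "0" else m - 1
    rest = body[m + 1:]                      # skip the terminating digit, if any
    exp_bits = rest[:es].ljust(es, "0")
    frac_bits = rest[es:]
    e = int(exp_bits, 2) if es else 0
    frac = Fraction(int(frac_bits, 2), 1 << len(frac_bits)) if frac_bits else Fraction(0)
    value = Fraction(2) ** (k * (1 << es) + e) * (1 + frac)   # hidden one added here
    return -value if sign else value
```

As a check against a hand calculation, the 16-bit pattern 0 0001 101 11011101 with es = 3 has
k = −3, an exponent of 5 and a fraction of 221/256, giving 2^−19 × 477/256 = 477/134217728.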
4.7 Exceptions
One of the major issues with Floating Point is its wasteful exception encodings, including
millions of NaN values and signed zero; Posits were designed to eliminate these. There is no
signed zero; a single infinity value takes the bit-pattern previously used for negative zero.
Logic to determine a positive or negative infinity is extremely low cost if necessary. NaN
values do not exist in the Posit system.
Another issue with Floating Point is overflow and underflow: otherwise valid calculations
overflow to infinity or underflow to zero, causing an exception to be thrown. This leads
to an infinite amount of error in the calculation. During the design of Posits, it was deemed
that significant error is much better than infinite error; thus any value that exceeds the
representable range is rounded to the closest non-exception value [10]. Infinity is reserved
only for division by zero and for operations with infinity as an input.
This eliminates nearly all hardware for detecting exception values, with infinity being the
only remaining case, which is a clear improvement over Floating Point.
4.8 Benefits
Compared to Floating Point, Posits have many benefits in their favor. By using a
variable-length scaling factor, Posits provide tapered accuracy, which is beneficial to
calculation. This property allows for much greater range at the extremes and much greater
accuracy at smaller ranges for the same bit width [11]. The range of Posits is summarized in
Table 4.3 [1].
For higher bit-widths such as 128 or 256, which are only ever used in very specialized
computations, Floating Point does have a slightly higher range; however, its accuracy is
worse than that of Posits. Figure 4.3 shows the tapered effect for both Floating Point and
Posits; it is easy to see the reduction of accuracy at the extremes that gives Posits an
edge [12].
Table 4.3: Floating Point vs. Posit Dynamic Range
Width   FP Exp Size   FP Dynamic Range          Posit es   Posit Dynamic Range
16      5             6×10^−8 to 7×10^4         1          4×10^−9 to 3×10^8
32      8             1×10^−45 to 3×10^38       3          6×10^−73 to 2×10^72
64      11            5×10^−324 to 2×10^308     4          2×10^−299 to 4×10^298
Figure 4.3: Floating Point vs. Posit Accuracy [1]
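The Posit columns of the dynamic range table can be reproduced from the format parameters
alone. The sketch below assumes the standard posit relations that the largest positive value
is useed^(n−2) and the smallest is its reciprocal, with useed = 2^(2^es); the function name
posit_range is hypothetical.

```python
from fractions import Fraction


def posit_range(n: int, es: int):
    """Return (minpos, maxpos) for an n-bit posit with exponent size es.

    Assumes maxpos = useed**(n - 2) and minpos = 1 / maxpos, where
    useed = 2**(2**es). Values are exact (Fraction and int).
    """
    useed = 2 ** (2 ** es)
    maxpos = useed ** (n - 2)
    return Fraction(1, maxpos), maxpos


# Print the three configurations listed in the table in scientific notation.
for width, es in ((16, 1), (32, 3), (64, 4)):
    minpos, maxpos = posit_range(width, es)
    print(f"{width:2d}-bit, es={es}: {float(minpos):.1e} to {float(maxpos):.1e}")
```

For the 16-bit case this gives 2^−28 to 2^28, which rounds to the 4×10^−9 and 3×10^8 entries
in the table.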
The only downfall of Posits is the current lack of hardware support as is expected for any
new system. The true difficulty is to gain enough traction to prove Posits outperform Floating
Point for most applications.
Chapter 5
Floating Point Core Design
This chapter describes the design of the Floating Point arithmetic core.
5.1 Hierarchical Design
The design of the Floating Point unit (FPU) is done in a hierarchical manner with specialized
blocks handling each aspect of the calculations as shown in Figure 5.1. The top level FPU
module is essentially a container to hold the Data Path (DP) and Post-Processor (PP) blocks.
The complete interface to the FPU is described in Table 5.1 and includes DFT scan-chain
ports for comprehensive test coverage as discussed in Chapter 8. Included in the interface
description is the specification for the operation selection.
Figure 5.1: FPU Top Level Block Diagram
Table 5.1: FPU Interface
Signal      Size   Direction   Description
clk         1      Input       System Clock
reset       1      Input       Asynchronous active high reset
start       1      Input       Strobe to start calculation
op          2      Input       Operation select
a           32     Input       Operand A
b           32     Input       Operand B
y           32     Output      Calculation Result
done        1      Output      Strobe to signal calculation completion
exc         1      Output      Flag to indicate exception condition
scan_in0    1      Input       DFT scan chain input
scan_en     1      Input       DFT scan chain enable
test_mode   1      Input       DFT test mode select
5.2 Data Path
The main block inside the FPU is the Data Path (DP) block. This contains the main
computational components as shown in Figure 5.2. Most of these blocks are implemented as
hierarchical modules with the exception of the control and output logic; these units are
implemented as registered control logic for the DP. This control logic essentially switches
between the arithmetic modules based on the input operation and gates the logic while
waiting for the start flag.
Figure 5.2: FPU Data Path
5.2.1 Unpack Module
The first blocks in the DP are the asynchronous unpacking modules; their purpose is to split
the incoming operands into their respective components: sign, exponent and fraction. They
are also responsible for checking the bit-patterns for exceptions, which are passed as flags
to the exception handler.
5.2.2 Exception Handler
The main task of the exception handler is to use the exception flags for each operand, if
any, to determine whether the output of the operation will be an exception. The exception
handler follows the IEEE-754 [3] standard to determine the result as shown in Table 5.2. The
resulting exception is passed to the control logic, which can directly pass the exception
bit-pattern to the output in order to expedite the operation.
5.2.3 Adder/Subtractor
Due to the extremely close nature between addition and subtraction, the two computational
units were combined with only minor additional logic to handle sign changes for subtraction.
This is common practice in arithmetic cores [13]. Both addition and subtraction follow the
same algorithm as shown in listing 5.1.
Table 5.2: FP Exception Handler Operation
Operand A Exc.   Operand B Exc.   Operation        Output Exc.   Commutative
NaN              X                All              NaN           Yes
± inf            ∓ inf            Addition         NaN           Yes
± inf            X                Addition         ± inf         Yes
± inf            X                Subtraction      ± inf         No
X                ± inf            Subtraction      ∓ inf         No
± inf            ± inf            Subtraction      NaN           Yes
± inf            0                Multiplication   NaN           Yes
± inf            X                Multiplication   ± inf         Yes
0                X                Multiplication   0             Yes
0                0                Division         NaN           Yes
± inf            ± inf            Division         NaN           Yes
X                0                Division         NaN           No
0                X                Division         0             No
X                ± inf            Division         0             No
5.2.4 Multiplication
The multiplier unit for the FPU is designed according to listing 5.2. This follows the usual
practice for multiplying numbers in scientific notation: exponents are added and fractions
are multiplied. The bias must be subtracted from the summed exponents because both exponents
are biased, so their sum has effectively been biased twice. It is also important to note
that, unlike addition and subtraction, multiplication always produces a fraction product in
the range 1 ≤ frac < 4, which limits the normalization shift to at most one position rather
than a variable amount.
Listing 5.1: FPU Addition/Subtraction Algorithm
    determine proper result sign
    prepend hidden one to fractions
    compare operand exponents
    shift operand with smaller exponent to right by difference
    perform addition/subtraction
    if carry
        shift result right by one
        increment exponent
    else if leading one is to right of hidden bit
        shift result left
        decrement exponent
    check exponent over/underflow
    remove hidden one
    combine fields into full result value
Listing 5.2: FPU Multiplication Algorithm
    xor signs to get result sign
    add exponents
    subtract exponent bias
    prepend fractions with hidden one
    multiply fractions
    if carry
        shift result right by one
        increment exponent
    else if leading one is to right of hidden one
        shift result left by one
        decrement exponent
    check exponent over/underflow
    remove hidden one
    combine fields into full result value
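The multiplication steps can be exercised in software on single precision bit-patterns. The
following is a simplified sketch rather than the FPU's RTL: it assumes both operands are
normal numbers, truncates instead of rounding, and omits the exception handler. The helper
names f2b, b2f and fp32_mul are hypothetical. For normal operands the product lies in
[1, 4), so only the carry branch of the normalization is needed.

```python
import struct

BIAS = 127  # single precision exponent bias


def f2b(x: float) -> int:
    """Bit-pattern of x as an IEEE-754 single."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


def b2f(bits: int) -> float:
    """Value of a 32-bit IEEE-754 single bit-pattern."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


def fp32_mul(a_bits: int, b_bits: int) -> int:
    """Multiply two normal singles (truncating, no exception handling)."""
    sign = (a_bits >> 31) ^ (b_bits >> 31)                 # xor signs
    exp = ((a_bits >> 23) & 0xFF) + ((b_bits >> 23) & 0xFF) - BIAS
    frac_a = (a_bits & 0x7FFFFF) | 0x800000                # prepend hidden one
    frac_b = (b_bits & 0x7FFFFF) | 0x800000
    product = frac_a * frac_b                              # 48-bit integer product
    if product >> 47:                                      # carry: product >= 2
        product >>= 1
        exp += 1
    if not 0 < exp < 255:
        raise OverflowError("exponent over/underflow")
    frac = (product >> 23) & 0x7FFFFF                      # drop hidden one, truncate
    return (sign << 31) | (exp << 23) | frac
```

For instance, b2f(fp32_mul(f2b(1.5), f2b(1.5))) takes the carry branch and evaluates to
2.25. Products that are not exactly representable will differ from a rounding
implementation in the last bit because of the truncation.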
Figure 5.3: Vedic 2x2 Multiplier
5.2.4.1 Hardware Multiplier
It is necessary to note the algorithm for actually performing the hardware multiplication.
For computing the multiplication, only an integer multiplier is necessary [14]. In this
case, Vedic multiplication is used. Vedic multiplication is a method that utilizes several
2x2 multipliers connected in a tree formation to create 4x4 multipliers, which are in turn
connected to create 8x8 multipliers and so on [15]. The 2x2 multiplier building block is
shown in Figure 5.3. An example of building up the multiplier is shown for an 8x8 multiplier
in Figure 5.4 [15], but the structure is expandable to any size. The mathematics behind
Vedic multiplication is outside of the scope of this research; however, the tree-style
hierarchy allows for even distribution of delay in the hardware, which is used to build a
single cycle multiplier.
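The tree construction can be modelled in software to show how larger multipliers are
assembled from the 2x2 building block. This sketch illustrates the hierarchy only, under the
assumption that the operand width is a power of two; it says nothing about the adder
structure or timing of the actual hardware, and the function names are hypothetical.

```python
def vedic_2x2(a: int, b: int) -> int:
    """2x2 multiplier built from AND gates and two half adders."""
    a0, a1 = a & 1, (a >> 1) & 1
    b0, b1 = b & 1, (b >> 1) & 1
    p0 = a0 & b0
    s1 = (a1 & b0) + (a0 & b1)        # first half adder
    p1, c1 = s1 & 1, s1 >> 1
    s2 = (a1 & b1) + c1               # second half adder
    p2, p3 = s2 & 1, s2 >> 1
    return p0 | (p1 << 1) | (p2 << 2) | (p3 << 3)


def vedic_multiply(a: int, b: int, width: int) -> int:
    """Multiply two width-bit unsigned values; width must be a power of two >= 2."""
    if width == 2:
        return vedic_2x2(a, b)
    half = width // 2
    mask = (1 << half) - 1
    a_lo, a_hi = a & mask, a >> half
    b_lo, b_hi = b & mask, b >> half
    # Four half-width multipliers, combined by shifted addition.
    ll = vedic_multiply(a_lo, b_lo, half)
    lh = vedic_multiply(a_lo, b_hi, half)
    hl = vedic_multiply(a_hi, b_lo, half)
    hh = vedic_multiply(a_hi, b_hi, half)
    return ll + ((lh + hl) << half) + (hh << width)
```

A call such as vedic_multiply(0xAB, 0xCD, 8) expands into four 4x4 multiplies, each of which
expands into four 2x2 multiplies, mirroring the 8x8 block diagram.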
5.2.5 Divider
The divider follows a very similar algorithm to the multiplier; however, instead of
completing in a single cycle, the division is handled over many cycles. The algorithm for
division is shown in listing 5.3. One difference to note is that the denominator is
pre-shifted so that it is never larger than the numerator; this helps with normalization by
ensuring that the result always lies between 1 and 2.
Figure 5.4: Vedic 8x8 Multiplier Block
5.2.5.1 Hardware Divider
The hardware divider is implemented as an integer divider that does not output the
remainder. Rather than using a fixed latency, it runs until either the maximum number of
cycles is reached or the remainder becomes zero, which means that an exact division was
achieved. This allows for a possible speed increase [16]. The divider architecture uses a
flag from the hardware divider to stall execution until the division is complete.
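The early-exit behavior can be illustrated with a bit-serial restoring divider that produces
one quotient bit per cycle. This is a sketch of the general technique under the assumption
that the operands have already been pre-shifted so that den ≤ num < 2·den; it is not claimed
to match the internal architecture of the divider used in the core, and the function name is
hypothetical.

```python
def divide_fractions(num: int, den: int, max_cycles: int):
    """Bit-serial division with early exit on a zero remainder.

    Returns (quotient, cycles_used). The quotient has max_cycles bits,
    with the first bit being the integer bit (value in [1, 2)).
    """
    if not 0 < den <= num < 2 * den:
        raise ValueError("operands must satisfy den <= num < 2*den")
    quotient = 0
    remainder = num
    cycles = 0
    while cycles < max_cycles:
        quotient <<= 1
        if remainder >= den:          # subtract succeeds: quotient bit is 1
            remainder -= den
            quotient |= 1
        remainder <<= 1               # move on to the next quotient bit
        cycles += 1
        if remainder == 0:            # exact result reached, stop early
            break
    quotient <<= max_cycles - cycles  # remaining quotient bits are all zero
    return quotient, cycles
```

Dividing 3 by 2 with max_cycles = 8 finishes after two cycles with the quotient 11000000
(1.5), whereas 4 divided by 3 never reaches a zero remainder and uses all eight cycles.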
5.3 Post-Processor
The post-processor for the FPU is very simple as shown in Figure 5.5; it contains only two
blocks: a rounding block and a packing block. These blocks convert the results from the
operations back into proper Floating Point format.
Listing 5.3: FPU Division Algorithm
    xor signs to get result sign
    subtract exponents
    add exponent bias
    prepend fractions with hidden one
    if op_b > op_a
        op_b >> 1
    divide fractions
    if op_b was biased
        decrement exponent
    check exponent over/underflow
    remove hidden one
    combine fields into full result value
Figure 5.5: FPU Post-Processor
5.3.1 Rounding Block
The rounding block performs rounding as described in Section 2.5.4. Most applications find
that round to nearest is quite adequate, so for simplicity of design it is the only rounding
mode implemented.
To facilitate rounding, three extra bits are appended to the fraction bits and carried
through the calculation as normal: guard, round and sticky bits, in that order. The first
two are regular bits; the sticky bit, however, carries a unique property: it is the
reduction OR of any lower-precision bits produced during the calculation. This bit indicates
that the value is actually greater than what is represented without including all extended
bits.
The first rounding case is the round down; this is indicated by an unset round bit. The next
case is the round up, which is determined by a set round bit and a set sticky bit. The last
case is the round to even, which is determined by a set round bit, an unset sticky bit and a
set guard bit.
The round up requires an addition of one to the fraction bits, not including the three
rounding bits, whereas the round down truncates these bits with no addition. The round to
even case selects the closest even value, which is a value that ends in a zero. This
requires only truncation and clearing of the LSB.
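For comparison, the conventional textbook form of round to nearest even with three extension
bits is sketched below. Note the assumption: in this convention the guard bit is the first
discarded bit and decides whether the value is at or above the halfway point, and a tie is
broken using the least significant kept bit. The roles assigned to the individual bits in
the core described above may differ, so this is not a model of its exact logic. The function
name is hypothetical.

```python
def round_nearest_even(extended: int) -> int:
    """Round a value whose low three bits are guard, round and sticky.

    Conventional scheme (an assumption, see the lead-in): round up when
    the discarded part is more than half an LSB, or exactly half an LSB
    with an odd kept value. The caller must handle a carry out of the
    kept bits by renormalizing.
    """
    kept = extended >> 3
    guard = (extended >> 2) & 1
    round_bit = (extended >> 1) & 1
    sticky = extended & 1
    if guard and (round_bit or sticky or (kept & 1)):
        kept += 1
    return kept
```

With a kept value of 1011 and extension bits 100 the input is an exact tie on an odd value,
so the result rounds up to 1100; with a kept value of 1010 the same extension bits leave the
result at 1010.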
5.3.2 Packing Block
This packing block is the simplest of all the blocks and merely combines the sign, exponent
and fraction fields into the full Floating Point bit-pattern. The only purpose is to provide a
hierarchical block for concatenation of the components.
5.4 Timing
The FPU was designed to complete one operation at a time, beginning with the setting of
the start flag and ending with completion indicated by the done flag. The timing diagram is
shown in Figure 5.6.
Figure 5.6: FPU Timing (surviving signal labels: op[1:0], a[31:0], b[31:0], y[31:0], done, exc)
Chapter 6
Posit Core Design
This chapter describes the design of the Posit arithmetic core.
6.1 Hierarchical Design
As opposed to the single operation design of the FPU in Chapter 5, the Posit arithmetic unit
(PAU) utilizes a pipelined design as shown in Figure 6.1. This effectively splits the design
into several stages: extraction, scaling, operation, and decoding, along with the necessary
control logic.
Figure 6.1: PAU Block Diagram
The interface for the PAU is identical to that in Table 5.1 in order to provide a one-to-one
drop-in replacement. The main difference is in the timing for pipeline operation as described
in Section 6.4.
6.1.1 Control Logic
The control logic is extremely simple and consists of a single three-bit register containing
the operation for that particular stage as well as a stall bit. The only operation requiring
a stall is division, as it is done using a variable latency divider.
As there is no control unit for the extraction block, during a stall it continues to update
whenever the input operands change, without passing data into the stalled scaling block that
follows. To pass the stall information to the host system, the done flag is deasserted when
the other blocks enter a stall; the host must check this flag in order to prevent loss of
data.
6.1.2 Extraction Block
The extraction block is responsible for taking a Posit input and extracting the sign, regime,
exponent and fraction fields. The algorithm for extracting this data is shown in
listing 6.1 [17]. This is identical for all operations.
6.1.3 Scaling Block
The scaling block performs the equivalent of Floating Point exponent calculations but for
Posits. This is operation dependent with major differences between addition/subtraction and
multiplication/division. There is only a minor difference between multiplication and division:
adding vs. subtracting the scaling factor which only necessitates a minor logic check rather
than new hardware.
Listing 6.1: PAU Extraction Algorithm
    if negative
        take 2's complement
    check for exception values
    remove sign bit
    if regime > 0
        count leading zeros
        regime = zero_count - 1
    else
        negate input
        count leading zeros
        regime = -zero_count
    temp = input << (zero_count - 1)
    exp = temp[top : top-es+1]
    frac = {1'b1, temp[top-es : end]}
Identical for all operations, the scaling factor is the combination of the regime scaling
factor and the exponent into a single value for each operand. How this is used is the difference
between operations. Calculating the scaling factor is shown by Eq. (6.1).
\[ \mathit{scale\_factor} = (\mathit{regime} \ll es) + \mathit{exp} \tag{6.1} \]
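Eq. (6.1) and its inverse can be checked with a short software model. The sketch below
assumes signed integers with an arithmetic right shift, which is how Python's shift operator
behaves on negative values; in hardware the scaling factor would need to be held as a signed
two's complement quantity for the same split to work. The function names are hypothetical.

```python
def pack_scale(regime: int, exp: int, es: int) -> int:
    """Combine regime value and exponent into one scaling factor (Eq. 6.1)."""
    return (regime << es) + exp


def unpack_scale(scale_factor: int, es: int):
    """Split a scaling factor back into (regime, exp).

    The lowest es bits form the unsigned exponent and the remaining
    upper bits form the signed regime value.
    """
    exp = scale_factor & ((1 << es) - 1)
    regime = scale_factor >> es       # arithmetic shift keeps the sign
    return regime, exp


# Multiplication adds scaling factors; any carry out of the exponent
# bits moves into the regime automatically.
sf_product = pack_scale(1, 3, 2) + pack_scale(-3, 2, 2)
```

Here the two operands have scaling factors 7 and −10, so the product's scaling factor is −3,
which splits back into a regime of −1 and an exponent of 1.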
6.1.3.1 Addition/Subtraction Scaling
Much like Floating Point, Posits align the operands to the largest scaling factor by
shifting the fraction with the smaller scaling factor to the right before performing the
operation. This is shown in listing 6.2, where the greatest scaling factor is passed to the
operation stage to be used in calculating the result scaling factor. The greater and smaller
fractions are also passed to the operation block, where the smaller will be shifted. The
absolute value of the difference between the scaling factors tells the operation block how
far to shift the smaller fraction.
Listing 6.2: Computing Scaling Factor for Addition/Subtraction
    shift_value = |sf_a - sf_b|
    if op_a > op_b
        greater_frac = a_frac
        smaller_frac = b_frac
        greatest_scaling_factor = sf_a
    else
        greater_frac = b_frac
        smaller_frac = a_frac
        greatest_scaling_factor = sf_b
6.1.3.2 Multiplication/Division Scaling
Scaling for multiplication and division is also rather simple and only requires the addition
or subtraction of the two scaling factors respectively. Unlike Floating Point which biases
the exponents, Posits do not and therefore do not require adding or eliminating biases when
combined.
6.2 Operation Block
Much like the scaling block, the operation block is dependent on the value stored in the
control register, which controls which operation is actually computed. This block handles
the fractional computations in a similar way to the Floating Point unit, since the main
difference between Floating Point and Posits is in the regime, which has no bearing on what
happens with the fraction component.
The computations are set up to operate on the maximum-sized fractions, i.e. those for the
smallest-width regime. The actual fractions are padded with zeros as necessary in order to
provide correct results.
The algorithm for computing addition/subtraction is the same as described in listing 5.1
with the only exception being the utilization of the greater scaling factor as the value to shift
the smaller fraction. The normalization is also the same, however instead of adjusting the
exponent, the entire scaling factor is adjusted.
Multiplication and division also follow listing 5.2 and listing 5.3 respectively. With divi-
sion, the flag that is used to indicate that the division is not yet complete is used to provide a
stall signal to the other stages by way of the control registers.
6.3 Decoding Block
The decoding block of the PAU provides the means to convert the results of the operation and
scaling blocks into a single Posit bit-pattern. The most laborious of these tasks is converting
the scaling factor back into a regime and exponent.
The first step is to determine the exponent; this is done by extracting the lowest es bits from
the scaling factor which becomes the resulting exponent. The bits that are left form the signed
regime value and the absolute value is taken.
The next step is to remove the hidden one from the fraction bits and prepend it with two
bits depending on the sign of the signed regime: 10 for positive and 01 for negative. This
allows for the correct format of the regime to be implemented by forming the last bit of the
regime and the terminating digit.
To fill out the regime, the prepended fraction bits are shifted by the absolute value of the
regime less one. The value of the bits shifted in match the leading digit of the prepended
fraction.
As the fraction bits are shifted out, they are used in a reduction or in order to determine the











Figure 6.2: PAU Pipeline Timing
The last step is to check if the resulting sign is negative; if so, the 2’s complement of the
bit-pattern is taken. The result is the final answer to the Posit calculation.
6.4 Timing
The timing for the PAU for a single operation is identical to that of the FPU as shown in
Section 5.4. For the pipelined operation, the start bit is held high as long as the host
needs to run operations. Once the pipeline is full and the first output appears, the done
flag is asserted along with the exception flag if necessary. If a division requires stalling
the pipeline, the done flag is deasserted until valid data is ready again. A typical
pipelined sequence is shown in Figure 6.2.
Chapter 7
Testbench Design
This chapter describes the design of the testbench and analysis environment for both the FPU
and PAU cores.
7.1 Testbench
In order to provide a thorough testing environment, SystemVerilog is used. SystemVerilog
(SV) is an extension of the previously standardized Verilog HDL [18]. The SV language
provides an object-oriented structure similar to C++ or Java, primarily to facilitate
complex and reusable testing environments.
A main advantage of SV in addition to the class structure is the ability to use constrained
as well as random testing [19]. This allows for a much greater range of testing than in
traditional Verilog by being able to target corner cases as well as full operation using
random data. These options help to create more completely verified designs.
A major addition included in SV worth noting is the ability to use test coverage. A
testbench designer can set test coverage points on various ports to ensure that the complete
range of data is being tested. This ensures that the design doesn't have unexpected corner
cases not foreseen by the designer, which is a distinct possibility for extremely complex
designs. These coverpoints can be grouped into coverage groups and even checked for
cross-coverage between groups.
7.1.1 UVM Testbench
With the creation of the class structure of SV, many libraries were created in order to make
testing easier and more comprehensive. The main library worth noting, which is used to
create the testbench for the work in this paper, is the Universal Verification Methodology
(UVM). The library includes pre-defined classes for typical components of the testbench as
well as more user-friendly methods of connecting them such as analysis ports and FIFOs. A
UVM testbench operates in phases: build, connect, run, report and others. Each phase is
responsible for certain tasks of the testbench; their exact uses are outside the scope of
this work.
The architecture of a typical UVM testbench is shown in Figure 7.1 [20] and is fairly
consistent across most testbench designs, with small modifications.
For simplicity, the testbench was designed for analysis rather than complex verification.
To ensure a perfect one to one replacement of the FPU for the PAU, only the single operation
method is used in testing rather than the pipelined method.
7.1.1.1 Top Level
The top level of the testbench is defined as a typical testbench module which contains the
design under test (DUT) and an interface. The interface is an SV construct that allows
grouping of signals into a single object. The individual signals are assigned to the DUT
port list in a manner similar to object members. The top level also calls the function to
begin running the test; this can call one or more specific tests depending on the testing
needs. This function starts the phases to build, connect and run the test.
Figure 7.1: Typical UVM Testbench Architecture
7.1.1.2 Test Case
In UVM there is a predefined test class that is used to define how the test will be run. Test
objects can be created for different types of testing: a random test, or a corner test for example.
Typically, the test object will contain two instances: an environment and a sequence as detailed
in Section 7.1.1.4 and Section 7.1.1.3 respectively.
The test case utilizes the build phase to create the environment and sequence objects and
the run phase to begin the sequence. Multiple sequences can be run sequentially to create more
comprehensive tests.
7.1.1.3 UVM Sequence
A UVM sequence is responsible for generating the inputs to the DUT; these can be explicitly
defined, random, or constrained depending on the testing needs. Typically, a sequence utilizes
a packet object to hold the data for the inputs. This allows for easier transmission of the data
7.1 Testbench 45
over analysis ports to the driver which is detailed in Section 7.1.1.8 Analysis ports in the most
basic sense are connections between testbench objects to pass data such as packets.
For the testbench being designed, the sequence loops through each operation for a set
number of iterations. For this test, each operation is performed 100,000 times for random
operands. The data for the PAU is completely random with no constraints; the FPU, however,
is constrained not to use NaN values for the inputs. This allows for more complete analysis,
as there is an exceptionally large number of NaN bit patterns in single precision. If the testbench
were designed for the sole purpose of verification, this constraint would not exist, but in order to
get meaningful analysis, NaN values are eliminated from the inputs. This, however, does not
eliminate them from the outputs, as certain operations may still produce NaN results, which do
provide meaningful analysis.
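The operand constraint described above can be sketched in a few lines. This is a minimal Python illustration rather than the SystemVerilog constraint used in the testbench; the function names and the rejection-sampling approach are assumptions made for the example.

```python
import random

EXP_MASK = 0x7F800000   # the eight exponent bits of an IEEE 754 single
FRAC_MASK = 0x007FFFFF  # the twenty-three fraction bits


def is_nan_bits(bits: int) -> bool:
    """True when a 32-bit pattern encodes a NaN (exponent all ones, fraction nonzero)."""
    return (bits & EXP_MASK) == EXP_MASK and (bits & FRAC_MASK) != 0


def random_fpu_operand(rng: random.Random) -> int:
    """Draw random 32-bit patterns and reject any NaN encoding (hypothetical helper)."""
    while True:
        bits = rng.getrandbits(32)
        if not is_nan_bits(bits):
            return bits


def random_pau_operand(rng: random.Random) -> int:
    """Posit operands are left unconstrained: any 32-bit pattern is allowed."""
    return rng.getrandbits(32)
```

Infinity (exponent all ones, fraction zero) is deliberately still allowed here, since the text only excludes NaN inputs.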
7.1.1.4 UVM Environment
The environment in a UVM testbench provides the skeleton for the entire structure: all of
the testbench blocks are instantiated and connected through the environment as well as the
analysis ports. A typical environment, such as the one utilized for testing the FPU and PAU
cores, instantiates agents and scoreboards described in Section 7.1.1.6 and Section 7.1.1.5
respectively.
7.1.1.5 UVM Scoreboard
The UVM scoreboard is responsible for providing the real-time verification of the design.
Usually, a model is used to determine the correct outputs for a given input set; the "golden"
values along with the DUT values are fed into the scoreboard where they are compared. The
designer can implement any reporting method that works for their design.
For the FPU and PAU cores, the scoreboard takes the input operands and operation generated
by the sequence, along with the output result and exception flag, and writes them to a data file.
This removes the requirement for a model, as the data file is analyzed by an external MATLAB
script.
7.1.1.6 UVM Agent
A UVM agent is essentially a container that holds blocks necessary for a specific interface
[20]. A testbench can have multiple agents in order to compartmentalize different interfaces
such as communication, memory, control, etc. For the FPU and PAU cores, only one agent is
necessary.
There are two types of agents used in UVM testbenches: active and passive. At least one
active agent is needed for a testbench, whereas passive agents can exist in any number or not at
all. The difference is that active agents drive inputs into the DUT, whereas passive agents do not.
Both have the capability to monitor outputs.
The objects contained within the agent are the sequencer, driver and monitor, as described in
Sections 7.1.1.7 to 7.1.1.9 respectively.
7.1.1.7 UVM Sequencer
The sequencer is an extremely simple block whose only purpose is to get the data generated
by a sequence as defined in Section 7.1.1.3. The sequencer does not need any active control
during the test. The test block described in Section 7.1.1.2 instantiates the sequence and passes
it a handle to the sequencer on which it will run. The data from the sequence is
received by the sequencer and is then passed to the driver block.
7.1.1.8 UVM Driver
The driver is responsible for converting the data packet generated by the sequence into actual
bit level signals. The driver is directly connected through the top level interface to the DUT.
The point of the driver is to provide a level of abstraction and limit the design specific protocols
to a single block.
In the case of both the FPU and the PAU, the driver is responsible for assigning the input
operands, the operation, as well as triggering the start flag. Once the data is written to the
DUT, it is also written through an analysis port to the scoreboard. The driver only supports
the single operation method rather than the pipelined method available in the PAU. As the PAU
also supports single operations, this is of no detriment to the analysis.
7.1.1.9 UVM Monitor
The monitor provides for the output what the driver does for the input: a design specific inter-
face for output signals. The monitor is also directly connected through the top level interface
to the DUT. The output signals are converted into an output packet which is passed through an
analysis port to the scoreboard for analysis.
For the FPU and PAU testbench, the monitor waits until the done flag is asserted, then
records the result and exception flag in the output packet.
7.2 Analysis
As mentioned in Section 7.1.1.5, a data file is created holding a list of operands, operations,
results and exception flags, one operation per line. There is also a header that specifies the
arithmetic core used, the width and the exponent size. This allows an analysis script to run on
both FPU and PAU data sets without any other outside information while also being configurable
on a test-by-test basis.
Listing 7.1: Analysis Algorithm
read all data from data file
decode header
for all operations
    convert bit-patterns to numerical
    calculate exact result
    calculate error between exact and result
plot error
Due to the support for very large data sets, MATLAB is used to perform the analysis of
the data. A single script is created to read both data sets, convert to extended precision values,
compute the exact result, determine error and plot. A detailed algorithm for each of these
stages is shown in Listing 7.1.
In order to run the analysis without completely filling up the host RAM, only 10,000
data points are loaded at any time. This is done using a feature of MATLAB that allows
reading and writing of data files directly on disk rather than loading them entirely into the
workspace, which would waste valuable RAM.
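The same idea of bounding memory use can be sketched outside MATLAB. The following Python generator is an illustrative stand-in, not the MATLAB feature the analysis script uses; the name `read_in_chunks` and its default chunk size are assumptions taken from the 10,000-point figure above.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def read_in_chunks(lines: Iterable[str], chunk_size: int = 10_000) -> Iterator[List[str]]:
    """Yield successive lists of at most chunk_size records from any line source."""
    iterator = iter(lines)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return  # source exhausted
        yield chunk
```

Passing an open file object as `lines` means only one chunk of records is held in memory at a time.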
In order to convert the Floating Point data into numerical values, the decimal values stored
in the data file are typecast using built-in MATLAB functions. For the Posit data, a custom
conversion function had to be implemented, as shown in Listing 7.2.
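For the Floating Point case, the typecast amounts to reinterpreting a stored integer as the IEEE 754 single it encodes. A minimal Python sketch of that step is given here; it uses the standard `struct` module in place of the MATLAB built-ins, and the function name is hypothetical.

```python
import struct


def bits_to_float32(bits: int) -> float:
    """Reinterpret an unsigned 32-bit integer as an IEEE 754 single-precision value."""
    return struct.unpack(">f", struct.pack(">I", bits & 0xFFFFFFFF))[0]
```

For example, `bits_to_float32(0x3F800000)` evaluates to 1.0 and `bits_to_float32(0xC0000000)` to -2.0.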
Computing the error between the exact value and the computed result is done using decimal
error as shown in Eq. (7.1). This error provides a logarithmic scaling between the two values
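A short sketch of a logarithmic error measure of this kind follows. It assumes the common definition of decimal error as the absolute difference of the base-10 logarithms of the computed and exact values; the function name and the handling of zeros, sign mismatches and non-finite values are illustrative assumptions rather than the thesis's own formulation.

```python
import math


def decimal_error(computed: float, exact: float) -> float:
    """Assumed definition: |log10(computed) - log10(exact)| for same-sign finite values."""
    if computed == exact:
        return 0.0
    if not (math.isfinite(computed) and math.isfinite(exact)):
        return math.inf  # overflow or NaN cannot be placed on a log scale
    if computed == 0.0 or exact == 0.0 or (computed < 0) != (exact < 0):
        return math.inf  # no finite ratio exists between the two values
    return abs(math.log10(abs(computed)) - math.log10(abs(exact)))
```

Under this definition an error of 1.0 means the computed value is off by a factor of ten, regardless of the magnitude of the operands.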





Listing 7.2: Posit Conversion
check for exception values
if (x < 0)
    negate x
convert decimal value to binary
remove sign bit
if (MSB = 1)
    count leading ones
    k = ones - 1
else
    count leading zeros
    k = -zeros
regime = useed ^ k
if (regime is entire bit pattern)
    return max value
remove regime from binary string
exp = 2 ^ (es MSBs of binary string)
fraction = remaining bits
compute exact fraction
pos = sign * regime * exp * frac
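The conversion steps can be made concrete with a short Python sketch. It is not the MATLAB function used for the analysis: the name `posit_to_fraction`, the default parameters (n = 32, es = 2), and the use of exact rational arithmetic are assumptions for illustration. Negative values are handled by two's complement negation, and a run of zeros of length m gives k = -m.

```python
from fractions import Fraction
from typing import Optional


def posit_to_fraction(bits: int, n: int = 32, es: int = 2) -> Optional[Fraction]:
    """Decode an n-bit posit exactly; returns None for the single exception pattern."""
    mask = (1 << n) - 1
    bits &= mask
    if bits == 0:
        return Fraction(0)
    if bits == 1 << (n - 1):
        return None  # exception value (1 followed by all zeros)
    sign = -1 if bits >> (n - 1) else 1
    if sign < 0:
        bits = (-bits) & mask  # two's complement negate
    remaining = n - 1
    body = bits & ((1 << remaining) - 1)  # sign bit removed

    # Regime: count the run of identical leading bits.
    first = (body >> (remaining - 1)) & 1
    run = 0
    while run < remaining and ((body >> (remaining - 1 - run)) & 1) == first:
        run += 1
    k = run - 1 if first else -run

    # Bits left after the regime run and its terminating bit.
    tail_len = max(remaining - run - 1, 0)
    tail = body & ((1 << tail_len) - 1)

    # Exponent: up to es bits; bits cut off by a long regime count as zero.
    exp_len = min(es, tail_len)
    exponent = (tail >> (tail_len - exp_len)) << (es - exp_len)

    # Fraction: whatever remains, with a hidden leading one.
    frac_len = tail_len - exp_len
    frac = tail & ((1 << frac_len) - 1)

    useed = 1 << (1 << es)  # 2^(2^es)
    scale = Fraction(useed) ** k * (1 << exponent)
    return sign * scale * (1 + Fraction(frac, 1 << frac_len))
```

As a check, the 16-bit pattern 0000110111011101 with es = 3 decodes to 477/134217728, which is 256^-3 * 2^5 * (1 + 221/256).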
Chapter 8
Hardware Results
This chapter discusses the results of the hardware synthesis and analysis.
8.1 Hardware Synthesis
After the FPU and PAU cores were designed and tested, they were synthesized into direct logic
components. This allows accurate timing and power analysis to be applied to the design in order
to verify functionality before fabrication. The synthesis for both cores was done using SAED 32nm
technology. This represents a slightly older technology size but is still reasonably close to newer
devices. As the goal is to provide a comparison between the two cores, the sizing is essentially
irrelevant.
The results of the hardware synthesis are shown in Table 8.1 for both the FPU and PAU
cores. As can be noted from the data, the PAU core utilizes roughly 25% more area than the FPU
core, but its dynamic power consumption is roughly 14% lower; leakage power is higher for the
PAU, leaving its total power marginally lower. Power consumption in modern devices
is becoming extremely critical given the mobile, battery-powered nature of many devices.
Lower power also brings other key benefits: better heat dissipation, higher speed, lower packaging
cost, and improved reliability [21].
Table 8.1: Hardware Synthesis Results
Parameter                FPU Value   PAU Value
Area (µm²)               33592       41882
Gate Count               22259       27752
Worst Case Slack (ns)    7.05        6.7381
Dynamic Power (mW)       1.1189      0.9665
Leakage Power (µW)       716.9227    862.0603
Test Coverage (%)        96.52       91.39
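The relative differences implied by Table 8.1 can be worked out directly. The short Python calculation below uses only the tabulated values; the variable and function names are illustrative.

```python
# Values copied from Table 8.1 (area in square micrometres, power in mW and uW).
fpu = {"area": 33592, "dynamic_mw": 1.1189, "leakage_uw": 716.9227}
pau = {"area": 41882, "dynamic_mw": 0.9665, "leakage_uw": 862.0603}


def percent_change(new: float, old: float) -> float:
    """Relative change of new with respect to old, in percent."""
    return 100.0 * (new - old) / old


def total_power_mw(core: dict) -> float:
    """Dynamic plus leakage power, with leakage converted from uW to mW."""
    return core["dynamic_mw"] + core["leakage_uw"] / 1000.0


area_change = percent_change(pau["area"], fpu["area"])
dynamic_change = percent_change(pau["dynamic_mw"], fpu["dynamic_mw"])
leakage_change = percent_change(pau["leakage_uw"], fpu["leakage_uw"])

print(f"area: {area_change:+.1f}%  dynamic: {dynamic_change:+.1f}%  "
      f"leakage: {leakage_change:+.1f}%")
print(f"total power: FPU {total_power_mw(fpu):.4f} mW, PAU {total_power_mw(pau):.4f} mW")
```

The dynamic saving and the leakage increase largely offset each other, so the totals differ by well under one percent.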
Another result to be noted is the slack: the FPU and PAU are relatively close in worst-case slack,
which indicates that they could be run at similar frequencies, making the PAU an even more
viable drop-in replacement for an FPU.
8.2 DFT Insertion
For most chip designs, a "flying probe" method can be used to test structural functionality. This
test essentially uses a series of microscopic probes to test various points in the semiconductor.
However, this has its limitations, as there are many layers that may not be reachable, so
valuable hardware checks are missed [22].
Design-for-Test (DFT) is a methodology that inserts hardware into the existing design
solely for the purpose of testing. A common practice is the use of scan chains, which provide
a connection through all of the registers by means of a multiplexed input. This allows for
verification that the register path is operational. This method provides an excellent coverage
rate; however, depending on the logic, it may not be able to fully test all asynchronous paths.
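The shift behaviour of a scan chain can be illustrated with a small behavioural model. This Python sketch is only an abstraction of the register path in scan mode, not the inserted DFT hardware; the class name, the `stuck_at_zero` fault parameter and the pass criterion are assumptions for the example.

```python
from typing import List, Optional


class ScanChain:
    """Registers connected head to tail, as they are when scan enable is asserted."""

    def __init__(self, length: int, stuck_at_zero: Optional[int] = None):
        self.flops: List[int] = [0] * length
        self.stuck_at_zero = stuck_at_zero  # index of a faulty register, if any

    def shift(self, scan_in: int) -> int:
        """One clock in scan mode: every register takes its neighbour's value."""
        scan_out = self.flops[-1]
        self.flops = [scan_in & 1] + self.flops[:-1]
        if self.stuck_at_zero is not None:
            self.flops[self.stuck_at_zero] = 0
        return scan_out


def chain_passes(chain: ScanChain, pattern: List[int]) -> bool:
    """Shift a pattern in, shift it back out, and check it arrives unchanged."""
    assert len(pattern) == len(chain.flops)
    for bit in pattern:
        chain.shift(bit)
    observed = [chain.shift(0) for _ in pattern]
    return observed == pattern
```

A healthy chain returns the pattern it was given, whereas a register stuck at zero corrupts every bit that passes through it.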
The results for the DFT scan coverage for the FPU and PAU cores are shown in Table 8.1.
Both are above 90%, which indicates decent coverage; the FPU, however, has over 96%
coverage, which is very good.
8.3 Analysis Results
The main goal of this research is to provide an analytical comparison between the two arith-
metic cores. After running a gate-level simulation, the analysis script described in Section 7.2
was run on the results. The decimal error for each operation is shown in Figure 8.1.
As can be seen for every operation, the PAU outperforms the FPU in terms of decimal
error. For addition, subtraction and multiplication (Figures 8.1a to 8.1c), the FPU error tracks
the PAU error until a certain point where the FPU error spikes. This is the point where the
calculations begin to either overflow or have significant rounding error due to the lack of
precision. It is also important to note that, for the division operation, the FPU error stops
tracking relatively early, as shown in Figure 8.1d. The point at which the error disappears from
the plot is where the result overflows to infinity, as that point is not plottable. This goes to
show how error-prone an FPU truly is and where the tapered accuracy of a PAU comes into play.
Figure 8.1: Decimal Error Results. (a) Addition Error, (b) Subtraction Error, (c) Multiplication Error, (d) Division Error
Chapter 9
Conclusions and Further Research
This chapter provides a summary of the research as well as possible future work in this area.
9.1 Conclusions
Throughout this research, it has been argued that a Posit arithmetic core has many benefits that
would provide reason to shift away from the use of Floating Point arithmetic cores. Floating
Point was standardized at a time when hardware was at a relatively high cost resulting in
concessions being made in accuracy and range in order to simplify the logic. With today’s
technology, this is no longer the case. If the hardware has changed drastically, so should the
number system.
The exception system alone is cause to rethink Floating Point: the excessive use of NaN
bit-patterns and the dual representations of infinity and zero are all unnecessary. Many software
tools utilizing floating point do not even care which exception is thrown; all the user needs to
know is that it happened. So the implementation of millions of exception bit-patterns simply
does not make sense.
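The scale of the NaN encoding space follows directly from the format definition. The Python sketch below counts the NaN patterns for a given exponent and fraction width; the function names are illustrative.

```python
def nan_pattern_count(frac_bits: int) -> int:
    """NaN encodings: either sign, exponent all ones, any nonzero fraction."""
    return 2 * ((1 << frac_bits) - 1)


def nan_share(exp_bits: int, frac_bits: int) -> float:
    """Fraction of all bit patterns of the format that encode a NaN."""
    total_patterns = 1 << (1 + exp_bits + frac_bits)
    return nan_pattern_count(frac_bits) / total_patterns


single_precision_nans = nan_pattern_count(23)  # 16,777,214 patterns
single_precision_share = nan_share(8, 23)      # just under 1/256 of all patterns
```

A Posit format of any width, by comparison, reserves a single bit pattern for all non-zero exceptions.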
With the advent of the Posit number system, all exceptions except for zero are combined
into one, resulting in many more bit-patterns to represent valid data. The creation of the regime
field allows for much greater range with a reduced fraction size. This tapered accuracy is a
practical match for the typical uses of computational units: either high precision around
lower numbers or much less precision for very large numbers. Rarely do applications require
extreme accuracy for extremely large values.
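Tapered accuracy can be quantified by counting how many fraction bits remain for a given regime value k. The sketch below assumes a 32-bit Posit with es = 2 purely for illustration; the function names and default parameters are not taken from the PAU design.

```python
def regime_length(k: int, n: int) -> int:
    """Bits used by the regime: a run of identical bits plus one terminating bit."""
    run = k + 1 if k >= 0 else -k
    return min(run + 1, n - 1)  # the regime can at most fill everything after the sign


def fraction_bits(k: int, n: int = 32, es: int = 2) -> int:
    """Fraction bits left after the sign, regime and exponent fields."""
    return max(0, n - 1 - regime_length(k, n) - es)


# Precision is highest near magnitude one (k = 0 or k = -1) and tapers off.
for k in (0, 4, 10, 20, 30):
    print(f"k = {k:2d}: {fraction_bits(k)} fraction bits")
```

With these parameters, values near one carry 27 fraction bits against the fixed 23 of single precision, while very large or very small magnitudes carry progressively fewer.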
Through the design of both a Floating Point and Posit core, the operating characteristics
were determined. The PAU required more area; however, the total power consumption was lower.
Considering that the PAU was designed for both single and pipelined operation, whereas the FPU
supports single operation only, the increase in area is easily explained. Even though the aim of
this research was to design for analysis rather than for verification and manufacturability, test
coverage was also measured. The FPU did slightly better, but a slightly different design could
provide improvements.
Overall, the Posit core performed extremely well, especially during the analysis. There
were far fewer errors and overflows, which results in much better computations. This alone should
provide cause for a move away from Floating Point numbers, but for now industry remains
heavily ingrained in its ways.
9.2 Future Research
The work done in this research can easily be expanded upon. A core can be integrated
into a full processor such as ARM and optimized for embedded applications. Beyond that, Posits
show exceptional promise in the realm of neural networks. The data processing for neural nets
is very computationally heavy and relies primarily on Floating Point units. Some alternatives
have been used, such as a variable precision FPU [23]. There has been some work using Posits
which performed significantly better than a comparable fixed-point computational unit [24]
or when using a dedicated Posit multiply-accumulator unit [21]. It would be interesting to
see Posits used in a custom processor core built specifically for neural nets and deep learning
applications.
References
[1] John L. Gustafson and Isaac Yonemoto. Beating Floating Point at its Own Game: Posit
Arithmetic. Supercomputing Frontiers and Innovations, 2017.
[2] IEEE Standard for Binary Floating-Point Arithmetic. ANSI/IEEE Std 754-1985, pages
1–20, 1985.
[3] IEEE Standard for Floating-Point Arithmetic. IEEE Std 754-2008, pages 1–70, 2008.
[4] D. Bailey. Numerical Reproducibility in High-Performance Computing. 2015.
[5] F. Hartwig and A. Lacroix. Floating Point Addition Errors and their Effect on the Roundoff
Noise in Digital Signal Processing. In Proceedings of IEEE International Symposium
on Circuits and Systems - ISCAS ’94, volume 2, pages 121–124, 1994.
[6] Rebecca Tan. The Oceans are Starting to Boil. Supercomputing Asia, 2017.
[7] John L. Gustafson. The End of Error: Unum Computing. CRC Press, 2015.
[8] J. Hou, Y. Zhu, Y. Shen, M. Li, H. Wu, and H. Song. Tackling Gaps in Floating-Point
Arithmetic: Unum Arithmetic Implementation on FPGA. In 2017 IEEE 19th International
Conference on High Performance Computing and Communications; IEEE
15th International Conference on Smart City; IEEE 3rd International Conference
on Data Science and Systems (HPCC/SmartCity/DSS), pages 615–616, 2017.
[9] John L. Gustafson. A Radical Approach to Computation with Real Numbers. A*STAR
Computational Resources Center, 2016.
[10] Posit Standard Documentation. PSD Release 3.2, 2018.
[11] M. K. Jaiswal and H. K.-H. So. Universal Number Posit Arithmetic Generator on FPGA.
In 2018 Design, Automation & Test in Europe Conference & Exhibition (DATE), pages 1159–
1162, 2018.
[12] E. Ternovoy, M. G. Popov, D. V. Kaleev, Y. V. Savchenko, and A. L. Pereverzev. Com-
parative Analysis of Floating-Point Accuracy of IEEE 754 and Posit Standards. In 2020
IEEE Conference of Russian Young Researchers in Electrical and Electronic Engineering
(EIConRus), pages 1883–186, 2020.
[13] S. Kukati, D. V. Sujana, S. Udaykumar, P. Jayakrishnan, and R. Dhanabal. Design and
Implementation of Low Power Floating Point Arithmetic Unit. In 2013 International
Conference on Green Computing, Communication and Conservation of Energy (ICGCE),
pages 205–208, 2013.
[14] S. Anjana and C. Pradeep. High Speed Integer Multiplier Designs for Reconfigurable
Systems. In 2014 International Conference on Control, Instrumentation, Communication
and Computational Technologies (ICCICCT), pages 393–397, 2014.
[15] D. K. Kahar and H. Mehta. High Speed Vedic Multiplier Using Vedic Mathematics. In
2017 International Conference on Intelligent Computing and Control Systems (ICICCS),
pages 356–359, 2017.
[16] E. Matthews, A. Lu, Z. Fang, and L. Shannon. Rethinking Integer Divider Design for
FPGA-Based Soft-Processors. In 2019 IEEE 27th Annual International Symposium on
Field-Programmable Custom Computing Machines (FCCM), pages 289–297, 2019.
[17] R. Chaurasiya, J. Gustafson, R. Shrestha, J. Neudorfer, S. Nambiar, K. Niyogi, F. Merchant,
and R. Leupers. Parameterized Posit Arithmetic Hardware Generator. In 2018
IEEE 36th International Conference on Computer Design (ICCD), pages 334–341, 2018.
[18] IEEE Standard for SystemVerilog–Unified Hardware Design, Specification, and Verifi-
cation Language. IEEE STD 1800-2009, pages 1–1285, 2009.
[19] M. Keaveney, A. McMahon, N. O’Keeffe, K. Keane, and J. O’Reilly. The Development
of Advanced Verification Environments Using System Verilog. In IET Irish Signals and
Systems Conference (ISSC 2008), pages 325–330, 2008.
[20] T. M. Pavithran and R. Bhakthavatchalu. UVM Based Testbench Architecture for Logic
Sub-System Verification. In 2017 International Conference on Technological Advancements
in Power and Energy (TAP Energy), pages 1–5, 2017.
[21] H. Zhang, J. He, and S. Ko. Efficient Posit Multiply-Accumulate Unit Generator for
Deep Learning Applications. In 2019 IEEE International Symposium on Circuits and
Systems (ISCAS), pages 1–5, 2019.
[22] H. Fang, K. Chakrabarty, and H. Fujiwara. RTL DFT Techniques to Enhance Defect Coverage
for Functional Test Sequences. In 2009 IEEE International High Level Design
Validation and Test Workshop, pages 160–165, 2009.
[23] M. Franceschi, A. Nannarelli, and M. Valle. Tunable Floating-Point for Artificial Neural
Networks. In 2018 25th IEEE International Conference on Electronics, Circuits and
Systems (ICECS), pages 289–292, 2018.
[24] S. H. Fatemi Langroudi, T. Pandit, and D. Kudithipudi. Deep Learning Inference on
Embedded Devices: Fixed-Point vs Posit. In 2018 1st Workshop on Energy Efficient
Machine Learning and Cognitive Computing for Embedded Applications (EMC2), pages
19–23, 2018.
[25] E. Morancho. Unum: Adaptive Floating-Point Arithmetic. In 2016 Euromicro Conference
on Digital System Design (DSD), pages 651–656, 2016.
[26] Y. Zhang, X. Hu, X. Feng, Y. Hu, and X. Tang. An Analysis of Power Dissipation
Analysis and Power Dissipation optimization Methods in Digital Chip Layout Design.




I.1 FPU Top Level
module fpu (
    clk, reset, start, done, op, a, b, y, exc,
    scan_in0, scan_en, test_mode, scan_out0);

    // SINGLE VS. DOUBLE PRECISION CONSTANTS
    parameter e_width = 8;
    parameter f_width = 23;
    localparam width = 1 + e_width + f_width;
    localparam t_width = f_width + 2;

    // OPCODES
    localparam op_add = 2'b00;
    localparam op_sub = 2'b01;
    localparam op_mul = 2'b10;
    localparam op_div = 2'b11;

    // INPUTS
    input clk, reset, start;
    input [1:0] op;
    input [width-1:0] a, b;

    // OUTPUTS
    output reg [width-1:0] y;
    output reg done, exc;

    input scan_in0, scan_en, test_mode;
    output scan_out0;

    // STRUCTURAL CONNECTIONS
    wire f_sign, f_done, f_exc_flag;
    wire [2:0] f_exc;
    wire [e_width-1:0] f_exp;
    wire [t_width-1:0] f_frac;
    wire [width-1:0] f;
    reg [width-1:0] f_a, f_b;
    reg f_start;

    // STRUCTURAL COMPONENTS
    fp_dp #(.e_width(e_width), .f_width(f_width)) data_path (
        .clk(clk),
        .reset(reset),
        .scan_in0(),
        .scan_en(scan_en),
        .test_mode(test_mode),
        .scan_out0(),
        .start(f_start),
        .op(op),
        .a(f_a),
        .b(f_b),
        .y_sign(f_sign),
        .y_exp(f_exp),
        .y_frac(f_frac),
        .y_exc(f_exc),
        .done(f_done));

    fp_pp #(.e_width(e_width), .f_width(f_width)) post_processor (
        .scan_in0(),
        .scan_en(scan_en),
        .test_mode(test_mode),
        .scan_out0(),
        .a_sign(f_sign),
        .a_exp(f_exp),
        .a_frac(f_frac),
        .a_exc(f_exc),
        .y(f),
        .exc_flag(f_exc_flag));

    // CONTROL UNIT
    always @(posedge clk or posedge reset) begin
        if (reset) begin
            f_a <= 0;
            f_b <= 0;
            f_start <= 0;
            done <= 0;
            exc <= 0;
            y <= 0;
        end
        else begin
            // Latch the operands and clear the status flags on start
            if (start) begin
                f_a <= a;
                f_b <= b;
                f_start <= start;
                done <= 1'b0;
                exc <= 1'b0;
            end
            // Capture the packed result once the data path finishes
            else if (f_done) begin
                y <= f;
                exc <= f_exc_flag;
                done <= 1'b1;
            end
            else begin
                f_start <= 1'b0;





I.2 FPU Data Path
module fp_dp (
    clk, start, reset, op, done,
    a, b, y_sign, y_exp, y_frac, y_exc,
    scan_in0, scan_en, test_mode, scan_out0);

    // SINGLE VS. DOUBLE PRECISION CONSTANTS
    parameter e_width = 8;
    parameter f_width = 23;
    localparam width = 1 + e_width + f_width;
    localparam o_width = f_width + 2;

    // OPCODES
    localparam op_add = 2'b00;
    localparam op_sub = 2'b01;
    localparam op_mul = 2'b10;
    localparam op_div = 2'b11;

    // INPUTS
    input clk, start, reset;
    input [1:0] op;
    input [width-1:0] a, b;

    // OUTPUTS
    output done;
    output y_sign;
    output [2:0] y_exc;
    output [e_width-1:0] y_exp;
    output [o_width-1:0] y_frac;

    input scan_in0, scan_en, test_mode;
    output scan_out0;

    // I/O STRUCTURAL CONNECTIONS
    wire a_sign, a_NaN, a_inf, a_zero;
    wire b_sign, b_NaN, b_inf, b_zero;
    wire y_NaN, y_inf, y_zero, y_ovflw, y_unflw;
    wire [e_width-1:0] a_exp, b_exp;
    wire [f_width-1:0] a_frac, b_frac;

    // EXCEPTION BYPASS STRUCTURAL CONNECTIONS
    wire [e_width-1:0] e_exp;
    wire [o_width-1:0] e_frac;

    // MULTIPLIER STRUCTURAL CONNECTIONS
    wire m_start, m_done, m_sign, m_ovflw, m_unflw;
    wire [e_width-1:0] m_exp;
    wire [o_width-1:0] m_frac;

    // DIVIDER STRUCTURAL CONNECTIONS
    wire d_start, d_done, d_sign, d_ovflw, d_unflw;
    wire [e_width-1:0] d_exp;
    wire [o_width-1:0] d_frac;

    // ADDER STRUCTURAL CONNECTIONS
    wire a_start, a_done, a_s, a_s_sign;
    wire [e_width-1:0] a_s_exp;
    wire [o_width-1:0] a_s_frac;

    // EXCEPTION BITMASK
    assign y_exc = {y_NaN, y_inf, y_zero};
    assign y_unflw = m_unflw | d_unflw;
    assign y_ovflw = m_ovflw | d_ovflw;

    // ARITHMETIC UNIT START FLAGS
    assign a_s = (op == op_add) ? 1'b0 : 1'b1;
    assign m_start = (op == op_mul) ? start : 1'b0;
    assign d_start = (op == op_div) ? start : 1'b0;
    assign a_start = ((op == op_add) | (op == op_sub)) ? start : 1'b0;

    // OUTPUT ASSIGNMENTS
    // Any exception bypasses the arithmetic units and completes immediately
    assign y_sign =
        (y_exc) ? 1'b0 :
        (op == op_mul) ? m_sign :
        (op == op_div) ? d_sign :
        a_s_sign;
    assign y_exp =
        (y_exc) ? e_exp :
        (op == op_mul) ? m_exp :
        (op == op_div) ? d_exp :
        a_s_exp;
    assign y_frac =
        (y_exc) ? e_frac :
        (op == op_mul) ? m_frac :
        (op == op_div) ? d_frac :
        a_s_frac;
    assign done =
        (y_exc) ? 1'b1 :
        (op == op_mul) ? m_done :
        (op == op_div) ? d_done :
        a_done;

    // STRUCTURAL COMPONENTS
    fp_unpack #(.e_width(e_width), .f_width(f_width)) unpack_a (
        .a(a),
        .sign(a_sign),
        .frac(a_frac),
        .exp(a_exp),
        .NaN(a_NaN),
        .inf(a_inf),
        .zero(a_zero),
        .scan_in0(),
        .scan_en(scan_en),
        .test_mode(test_mode),
        .scan_out0());

    fp_unpack #(.e_width(e_width), .f_width(f_width)) unpack_b (
        .a(b),
        .sign(b_sign),
        .frac(b_frac),
        .exp(b_exp),
        .NaN(b_NaN),
        .inf(b_inf),
        .zero(b_zero),
        .scan_in0(),
        .scan_en(scan_en),
        .test_mode(test_mode),
        .scan_out0());

    fp_exc #(.e_width(e_width), .f_width(f_width)) exc_handler (
        .op(op),
        .a_sign(a_sign),
        .a_NaN(a_NaN),
        .a_inf(a_inf),
        .a_zero(a_zero),
        .b_sign(b_sign),
        .b_NaN(b_NaN),
        .b_inf(b_inf),
        .b_zero(b_zero),
        .ovflw(y_ovflw),
        .unflw(y_unflw),
        .y_NaN(y_NaN),
        .y_inf(y_inf),
        .y_zero(y_zero),
        .y_exp(e_exp),
        .y_frac(e_frac),
        .scan_in0(),
        .scan_en(scan_en),
        .test_mode(test_mode),
        .scan_out0());

    fp_mult #(.e_width(e_width), .f_width(f_width)) multiplier (
        .clk(clk),
        .reset(reset),
        .start(m_start),
        .done(m_done),
        .a_sign(a_sign),
        .a_exp(a_exp),
        .a_frac(a_frac),
        .b_sign(b_sign),
        .b_exp(b_exp),
        .b_frac(b_frac),
        .y_sign(m_sign),
        .y_exp(m_exp),
        .y_frac(m_frac),
        .y_oflw(m_ovflw),
        .y_uflw(m_unflw),
        .scan_in0(),
        .scan_en(scan_en),
        .test_mode(test_mode),
        .scan_out0());

    fp_div #(.e_width(e_width), .f_width(f_width)) divider (
        .clk(clk),
        .reset(reset),
        .start(d_start),
        .done(d_done),
        .a_sign(a_sign),
        .a_exp(a_exp),
        .a_frac(a_frac),
        .b_sign(b_sign),
        .b_exp(b_exp),
        .b_frac(b_frac),
        .y_sign(d_sign),
        .y_exp(d_exp),
        .y_frac(d_frac),
        .y_oflw(d_ovflw),
        .y_uflw(d_unflw),
        .scan_in0(),
        .scan_en(scan_en),
        .test_mode(test_mode),
        .scan_out0());

    fp_add #(.e_width(e_width), .f_width(f_width)) add_sub (
        .clk(clk),
        .reset(reset),
        .start(a_start),
        .done(a_done),
        .add_sub(a_s),
        .a_sign(a_sign),
        .a_exp(a_exp),
        .a_frac(a_frac),
        .b_sign(b_sign),
        .b_exp(b_exp),
        .b_frac(b_frac),
        .y_sign(a_s_sign),
        .y_exp(a_s_exp),
        .y_frac(a_s_frac),
        .scan_in0(),
        .scan_en(scan_en),
        .test_mode(test_mode),
        .scan_out0());

endmodule
I.3 FPU Unpack
module fp_unpack (
    a, sign, exp, frac, NaN, inf, zero,
    scan_in0, scan_en, test_mode, scan_out0);

    // SINGLE VS. DOUBLE PRECISION CONSTANTS
    parameter e_width = 8;
    parameter f_width = 23;
    localparam width = 1 + e_width + f_width;

    // INPUTS
    input [width-1:0] a;

    // OUTPUTS
    output sign, NaN, inf, zero;
    output [e_width-1:0] exp;
    output [f_width-1:0] frac;

    input scan_in0, scan_en, test_mode;
    output scan_out0;

    // SPLIT INPUT INTO COMPONENTS
    assign sign = a[width-1];
    assign exp = a[width-2:f_width];
    assign frac = a[f_width-1:0];

    // CHECK FOR EXCEPTION CONDITIONS
    assign inf = ((exp == {(e_width){1'b1}}) && (frac == 0)) ? 1'b1 : 1'b0;
    assign zero = (a[width-2:0] == {(width-1){1'b0}}) ? 1'b1 : 1'b0;
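    // IEEE 754: an all-ones exponent with any nonzero fraction is a NaN, so the
    // whole fraction is tested (testing only frac[f_width-2:0] misses the
    // pattern whose only set fraction bit is the MSB)
    assign NaN = ((exp == {(e_width){1'b1}}) && (frac != 0)) ? 1'b1 : 1'b0;
endmodule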
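A software reference model can be useful for cross-checking the unpack stage against known bit patterns. The Python sketch below mirrors the field split and follows the IEEE 754 definitions of zero, infinity and NaN; the names `Unpacked` and `fp_unpack_model` are hypothetical and the model is not part of the original design flow.

```python
from typing import NamedTuple


class Unpacked(NamedTuple):
    sign: int
    exp: int
    frac: int
    nan: bool
    inf: bool
    zero: bool


def fp_unpack_model(a: int, e_width: int = 8, f_width: int = 23) -> Unpacked:
    """Split a packed floating point word into fields and classify it."""
    width = 1 + e_width + f_width
    a &= (1 << width) - 1
    sign = a >> (width - 1)
    exp = (a >> f_width) & ((1 << e_width) - 1)
    frac = a & ((1 << f_width) - 1)
    exp_all_ones = exp == (1 << e_width) - 1
    return Unpacked(
        sign=sign,
        exp=exp,
        frac=frac,
        nan=exp_all_ones and frac != 0,   # any nonzero fraction
        inf=exp_all_ones and frac == 0,
        zero=exp == 0 and frac == 0,      # either sign
    )
```

For instance, `fp_unpack_model(0x7FC00000)` is classified as a NaN and `fp_unpack_model(0xFF800000)` as negative infinity.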
I.4 FPU Exception Handler
module fp_exc (
    op,
    a_sign, a_NaN, a_inf, a_zero,
    b_sign, b_NaN, b_inf, b_zero,
    ovflw, unflw, y_exp, y_frac,
    y_NaN, y_inf, y_zero,
    scan_in0, scan_en, test_mode, scan_out0);

    // SINGLE VS. DOUBLE PRECISION CONSTANTS
    parameter e_width = 8;
    parameter f_width = 23;
    localparam o_width = f_width + 2;

    // OPCODES
    localparam op_add = 2'b00;
    localparam op_sub = 2'b01;
    localparam op_mul = 2'b10;
    localparam op_div = 2'b11;

    // INPUTS
    input [1:0] op;
    input ovflw, unflw;
    input a_sign, a_NaN, a_inf, a_zero;
    input b_sign, b_NaN, b_inf, b_zero;

    // OUTPUTS
    output reg y_NaN, y_inf, y_zero;
    output reg [e_width-1:0] y_exp;
    output reg [o_width-1:0] y_frac;

    input scan_in0, scan_en, test_mode;
    output scan_out0;

    always @* begin
        // INITIALIZE EXCEPTIONS
        y_NaN = 0;
        y_inf = 0;
        y_zero = 0;
        y_exp = 0;
        y_frac = 0;

        // CHECK OPCODE
        case (op)

            // MULTIPLICATION EXCEPTIONS
            op_mul: begin
                if (a_NaN | b_NaN | (a_inf & b_zero) | (a_zero & b_inf))
                    y_NaN = 1'b1;
                else if (a_inf | b_inf | ovflw)
                    y_inf = 1'b1;
                else if (a_zero | b_zero | unflw)
                    y_zero = 1'b1;
            end

            // DIVISION EXCEPTIONS
            op_div: begin
                if (a_NaN | b_NaN | (a_zero & b_zero) | (a_inf & b_inf))
                    y_NaN = 1'b1;
                else if (b_zero | ovflw)
                    y_inf = 1'b1;
                else if (a_zero | b_inf | unflw)
                    y_zero = 1'b1;
            end

            // ADDITION EXCEPTIONS
            op_add: begin
                if (a_NaN | b_NaN | ((a_sign ^ b_sign) & a_inf & b_inf))
                    y_NaN = 1'b1;
                else if (a_inf | b_inf)
                    y_inf = 1'b1;
            end

            // SUBTRACTION EXCEPTIONS
            op_sub: begin
                if (a_NaN | b_NaN | (!(a_sign ^ b_sign) & a_inf & b_inf))
                    y_NaN = 1'b1;
                else if (a_inf | b_inf)
                    y_inf = 1'b1;
            end

            // DEFAULT STATE
            default: begin
                y_NaN = 1'b0;
                y_inf = 1'b0;
                y_zero = 1'b0;
            end
        endcase

        // Drive the exponent and fraction that encode the selected exception
        if (y_NaN) begin
            y_exp = {(e_width){1'b1}};
            y_frac = {1'b1, {(o_width-1){1'b0}}};
        end
        else if (y_inf) begin
            y_exp = {(e_width){1'b1}};
            y_frac = {(o_width){1'b0}};
        end
        else if (y_zero) begin
            y_exp = 0;




I.5 FPU Adder/Subtractor
module fp_add(
    clk, reset, start, done, add_sub,
    a_sign, a_exp, a_frac,
    b_sign, b_exp, b_frac,
    y_sign, y_exp, y_frac,
    scan_in0, scan_en, test_mode, scan_out0);

    // SINGLE VS. DOUBLE PRECISION CONSTANTS
    parameter e_width = 8;
    parameter f_width = 23;
    localparam o_width = f_width + 2;
    localparam width = 1 + e_width + f_width;
    localparam lod_width = $clog2(width) + 1;

    // FSM STATES
    localparam st_start = 2'b00;
    localparam st_op = 2'b01;
    localparam st_norm = 2'b10;
    localparam st_done = 2'b11;

    // INPUTS
    input clk, reset, start, add_sub;
    input a_sign, b_sign;
    input [e_width-1:0] a_exp, b_exp;
    input [f_width-1:0] a_frac, b_frac;

    // OUTPUTS
    output reg done, y_sign;
    output reg [e_width-1:0] y_exp;
    output reg [o_width-1:0] y_frac;

    input scan_in0, scan_en, test_mode;
    output scan_out0;

    // INTERNAL CONNECTIONS
    wire a_norm, b_norm;
    wire [lod_width-1:0] lod_index;

    // INTERNAL VARIABLES
    reg add_x_sign, add_y_sign;
    reg [o_width+1:0] add_x, add_y;
    reg [o_width+1:0] add_result;
    reg [o_width:0] norm_result;
    reg [e_width-1:0] norm_exp;

    // STATE REGISTER
    reg [1:0] curr_state = st_start;

    // LEADING ONE DETECTOR FOR NORMALIZATION
    find_first #(o_width+2) a_lod(
        .in(add_result),
        .index(lod_index),
        .scan_in0(),
        .scan_en(scan_en),
        .test_mode(test_mode),
        .scan_out0());

    // ASSIGN HIDDEN BIT
    assign a_norm = (a_exp != 0) ? 1'b1 : 1'b0;
    assign b_norm = (b_exp != 0) ? 1'b1 : 1'b0;

    // ADDITION UNIT STATE MACHINE
    always @(posedge clk or posedge reset) begin
        if (reset) begin
            add_x <= 0;
            add_x_sign <= 0;
            add_y <= 0;
            add_y_sign <= 0;
            add_result <= 0;
            norm_result <= 0;
            norm_exp <= 0;
            done <= 0;
            y_sign <= 0;
            y_exp <= 0;
            y_frac <= 0;
            curr_state <= st_start;
        end
        else begin
            case (curr_state)
                // START STATE
                st_start: begin
                    done <= 0;
                    if (start == 1) begin
                        // DISTRIBUTE SIGNS
                        add_x_sign <= a_sign;
                        add_y_sign <= add_sub ^ b_sign;
                        // A+B CASE
                        if (a_exp == b_exp) begin
                            // ASSIGN FULL FRACTION
                            add_x <= {1'b0, a_norm, a_frac, 2'b0};
                            add_y <= {1'b0, b_norm, b_frac, 2'b0};
                            norm_exp <= a_exp;
                        end
                        // A-B CASE
                        else if (a_exp > b_exp) begin
                            // EQUALIZE EXPONENTS
                            add_x <= {1'b0, a_norm, a_frac, 2'b0};
                            add_y <= {1'b0, b_norm, b_frac, 2'b0} >> (a_exp - b_exp);
                            norm_exp <= a_exp;
                        end
                        // B-A CASE
                        else begin
                            // EQUALIZE EXPONENTS
                            add_x <= {1'b0, a_norm, a_frac, 2'b0} >> (b_exp - a_exp);
                            add_y <= {1'b0, b_norm, b_frac, 2'b0};
                            norm_exp <= b_exp;
                        end
                        // SET NEXT STATE
                        done <= 1'b0;
                        curr_state <= st_op;
                    end
                    // CLEAR VALUES WHEN IDLE
                    else begin
                        add_x_sign <= 0;
                        add_y_sign <= 0;
                        add_x <= 0;
                        add_y <= 0;
                        add_result <= 0;
                        norm_result <= 0;
                        norm_exp <= 0;




                // OPERATION STATE
                st_op: begin
                    // A+B OPERATION
                    if (add_x_sign == add_y_sign) begin
                        y_sign <= add_x_sign;
                        add_result <= add_x + add_y;
                    end
                    // A-B OPERATION
                    else if (add_x > add_y) begin
                        y_sign <= add_x_sign;
                        add_result <= add_x - add_y;
                    end
                    // A-B=0 OPERATION
                    else if (add_x == add_y) begin
                        y_sign <= 0;
                        add_result <= 0;
                    end
                    // B-A OPERATION
                    else begin
                        y_sign <= add_y_sign;
                        add_result <= add_y - add_x;
                    end
                    // SET NEXT STATE
                    curr_state <= st_norm;
                end

                // NORMALIZATION STATE
                st_norm: begin
                    // LEADING ONE IS HIDDEN BIT
                    if (lod_index == (o_width))
                        norm_result <= add_result[o_width:0];

                    // LEADING ONE IS CARRY BIT (SHIFT RIGHT BY ONE)
                    else if (lod_index > f_width) begin
                        norm_exp <= norm_exp + 1'b1;
                        norm_result <= add_result[o_width+1:1];
                    end

                    // LEADING ONE IS TO RIGHT OF HIDDEN BIT (SHIFT LEFT BY DIFFERENCE)
                    else begin
                        norm_exp <=
                            norm_exp - (o_width[e_width-1:0] - lod_index);
                        norm_result <=
                            add_result[o_width+1:0] << (o_width - lod_index);
                    end
                    // SET NEXT STATE
                    curr_state <= st_done;
                end

                // FINISHED STATE
                st_done: begin
                    // CONVERT NORMALIZED EXPONENT TO HIGHER PRECISION
                    y_exp <= norm_exp;
                    // CONVERT FRACTION TO HIGHER PRECISION
                    y_frac <=
                        norm_result[o_width-1:0];
                    // SET DONE FLAG AND RESET STATE
                    done <= 1'b1;
                    curr_state <= st_start;
                end




I.6 FPU Multiplier
module fp_mult(clk, start, done, reset,
    a_sign, a_exp, a_frac,
    b_sign, b_exp, b_frac,
    y_sign, y_exp, y_frac,
    y_oflw, y_uflw,
    scan_in0, scan_en, test_mode, scan_out0);

    // SINGLE VS. DOUBLE PRECISION CONSTANTS
    parameter e_width = 8;
    parameter f_width = 23;
    localparam width = 1 + e_width + f_width;
    localparam out_width = width + 2;
    localparam o_width = f_width + 2;
    localparam vedic_size = 2 ** $clog2(f_width);
    localparam bias_conv = (2 ** e_width) - 1;
    localparam norm_frac_width = 2 * f_width;
    localparam lod_width = $clog2(width) + 1;

    // FSM STATES
    localparam st_start = 2'b00;
    localparam st_op = 2'b01;
    localparam st_norm = 2'b10;
    localparam st_done = 2'b11;

    // INPUTS
    input clk, start, reset;
    input a_sign, b_sign;
    input [e_width-1:0] a_exp, b_exp;
    input [f_width-1:0] a_frac, b_frac;

    // OUTPUTS
    output reg done, y_sign, y_oflw, y_uflw;
    output reg [e_width-1:0] y_exp;
    output reg [o_width-1:0] y_frac;

    input scan_in0, scan_en, test_mode;
    output scan_out0;

    // INTERNAL CONNECTIONS
    wire a_norm, b_norm;
    wire [lod_width:0] lod_index;
    reg [norm_frac_width-1:0] norm_result;
    reg [e_width+1:0] norm_exp;

    // VEDIC MULTIPLIER CONNECTIONS
    reg [vedic_size-1:0] vedic_in_a, vedic_in_b;
    wire [2*vedic_size-1:0] vedic_out;

    // STATE REGISTER
    reg [2:0] curr_state;

    // VEDIC MULTIPLIER WITH EXTENDED INPUTS
    vedic_rec #(vedic_size) mult(
        .a(vedic_in_a),
        .b(vedic_in_b),
        .y(vedic_out),
        .scan_in0(),
        .scan_en(scan_en),
        .test_mode(test_mode),
        .scan_out0());

    // LEADING ONE DETECTOR FOR NORMALIZATION
    find_first #(2*vedic_size) m_lod(
        .in(vedic_out),
        .index(lod_index),
        .scan_in0(),
        .scan_en(scan_en),
        .test_mode(test_mode),
        .scan_out0());

    // ASSIGN HIDDEN BIT
    assign a_norm = (a_exp != 0) ? 1'b1 : 1'b0;
    assign b_norm = (b_exp != 0) ? 1'b1 : 1'b0;

    // MULTIPLIER STATE MACHINE
    always @(posedge clk or posedge reset) begin
        // RESET STATE
        if (reset) begin
            curr_state <= st_start;
            vedic_in_a <= 0;
            vedic_in_b <= 0;
            norm_exp <= 0;
            norm_result <= 0;
            y_oflw <= 0;
            y_uflw <= 0;
            y_sign <= 0;
            y_exp <= 0;
            y_frac <= 0;
            done <= 0;
        end
        else case (curr_state)
            // START STATE
            st_start: begin
                y_oflw <= 0;
                y_uflw <= 0;
                done <= 0;
                if (start == 1) begin
                    // ASSIGN FULL FRACTION
                    vedic_in_a <= {a_norm, a_frac};
                    vedic_in_b <= {b_norm, b_frac};
                    // SET NEXT STATE




            // OPERATION STATE
            st_op: begin
                // DETERMINE RESULT SIGN
                y_sign <= a_sign ^ b_sign;
                // ADD EXPONENTS
                norm_exp <=
                    {2'b0, a_exp} + {2'b0, b_exp} - bias_conv;
                // SET NEXT STATE
                curr_state <= st_norm;
            end

            // NORMALIZATION STATE
            st_norm: begin
                // LEADING ONE IS HIDDEN BIT
                if (lod_index == (norm_frac_width))
                    norm_result <= vedic_out[norm_frac_width-1:0];

                // LEADING ONE IS TO LEFT OF HIDDEN BIT (SHIFT RIGHT BY DIFFERENCE)
                else if (lod_index > norm_frac_width) begin
                    norm_exp <=
                        norm_exp + 1'b1;
                        // norm_exp + (lod_index - norm_frac_width);
                    norm_result <=
                        vedic_out[norm_frac_width:1] >> 1;
                        // vedic_out[norm_frac_width:1] >> (lod_index - norm_frac_width);
                end

                // LEADING ONE IS TO RIGHT OF HIDDEN BIT (SHIFT LEFT BY DIFFERENCE)
                else begin
                    norm_exp <=
                        norm_exp - 1'b1;
                        // norm_exp - (norm_frac_width - lod_index);
                    norm_result <=
                        vedic_out[norm_frac_width-1:0] << 1;
                        // vedic_out[norm_frac_width-1:0] << (norm_frac_width - lod_index);
                end
                // SET NEXT STATE
                curr_state <= st_done;
            end

            // FINISHED STATE
            st_done: begin
                // CHECK FOR EXPONENT UNDERFLOW
                if (norm_exp[e_width+1]) begin
                    y_exp <= 0;
                    y_frac <= 0;
                    y_uflw <= 1'b1;
                end
                // CHECK FOR EXPONENT OVERFLOW
                else if (norm_exp[e_width]) begin
                    // ASSIGN INF AND SET OVERFLOW FLAG
                    y_exp <= {(e_width){1'b1}};
                    y_frac <= 0;
                    y_oflw <= 1'b1;
                end
                else begin
                    // OUTPUT EXPONENT
                    y_exp <= norm_exp[e_width-1:0];
                    // OUTPUT FRACTION
                    y_frac <= norm_result[norm_frac_width-1:norm_frac_width-o_width];
                end
                // SET DONE FLAG AND RESET STATE
                done <= 1'b1;
                curr_state <= st_start;

            end
        endcase
    end
endmodule
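The exponent path through the st_op and st_done states of fp_mult can be mirrored in a small Python model. This is only a sketch for checking the arithmetic by hand and is not part of the thesis sources: the function names `mult_exponent` and `finish_exponent` are hypothetical, and the normalization adjustment driven by find_first as well as the whole fraction path are deliberately left out.

```python
def mult_exponent(a_exp, b_exp, e_width=8):
    """Model of: norm_exp <= {2'b0, a_exp} + {2'b0, b_exp} - bias_conv.

    norm_exp is (e_width + 2) bits wide, so the result wraps modulo
    2 ** (e_width + 2), exactly like the register assignment.
    """
    bias_conv = (1 << e_width) - 1          # localparam bias_conv
    mask = (1 << (e_width + 2)) - 1         # width of norm_exp
    return (a_exp + b_exp - bias_conv) & mask


def finish_exponent(norm_exp, e_width=8):
    """Model of the st_done checks; returns (y_exp, y_uflw, y_oflw)."""
    if (norm_exp >> (e_width + 1)) & 1:     # norm_exp[e_width+1]: underflow
        return 0, True, False
    if (norm_exp >> e_width) & 1:           # norm_exp[e_width]: overflow
        return (1 << e_width) - 1, False, True
    return norm_exp & ((1 << e_width) - 1), False, False


if __name__ == "__main__":
    print(finish_exponent(mult_exponent(200, 200)))  # in range
    print(finish_exponent(mult_exponent(100, 100)))  # wraps negative
```

In this model a sum below bias_conv wraps to a value with the top bit set, which is what the underflow check tests, while bit e_width acts as the overflow indicator.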
I.7 FPU Divider
module fp_div(clk, start, done, reset,
    a_sign, a_exp, a_frac,
    b_sign, b_exp, b_frac,
    y_sign, y_exp, y_frac,
    y_uflw, y_oflw,
    scan_in0, scan_en, test_mode, scan_out0);

    // SINGLE VS. DOUBLE PRECISION CONSTANTS
    parameter e_width = 8;
    parameter f_width = 23;
    localparam o_width = f_width + 2;
    localparam div_width = f_width + 4;
    localparam width = 1 + e_width + f_width;
    localparam out_width = width + 2;
    localparam bias_conv = (2 ** e_width) - 1;
    localparam lod_width = $clog2(width);

    // FSM STATES
    localparam st_start = 2'b00;
    localparam st_op = 2'b01;
    localparam st_norm = 2'b10;
    localparam st_done = 2'b11;

    // INPUTS
    input clk, start, reset;
    input a_sign, b_sign;
    input [e_width-1:0] a_exp, b_exp;
    input [f_width-1:0] a_frac, b_frac;

    // OUTPUTS
    output reg done, y_sign, y_uflw, y_oflw;
    output reg [e_width-1:0] y_exp;
    output reg [o_width-1:0] y_frac;

    input scan_in0, scan_en, test_mode;
    output scan_out0;

    // INTERNAL CONNECTIONS
    wire a_norm, b_norm;
    wire [lod_width:0] lod_index;
    reg lt_flag;
    reg [o_width:0] norm_result;
    reg [e_width+1:0] norm_exp;

    // DIVIDER CONNECTIONS
    reg div_en;
    wire div_done;
    reg [o_width:0] div_in_n, div_in_d;
    wire [o_width:0] div_out_q, div_out_r;

    // STATE REGISTER
    reg [2:0] curr_state = st_start;

    // COMBINATIONAL DIVIDER UNIT
    divider #(o_width+1) div(
        .clk(clk),
        .reset(reset),
        .scan_in0(),
        .scan_en(scan_en),
        .test_mode(test_mode),
        .scan_out0(),
        .en(div_en),
        .done(div_done),
        .num(div_in_n),
        .den(div_in_d),
        .q(div_out_q),
        .r(div_out_r));

    // LEADING ONE DETECTOR FOR NORMALIZATION
    find_first #(o_width+1, "lead") d_lod(
        .scan_in0(),
        .scan_en(scan_en),
        .test_mode(test_mode),
        .scan_out0(),
        .in(div_out_q),
        .index(lod_index));

    // ASSIGN HIDDEN BIT
    assign a_norm = (a_exp != 0) ? 1'b1 : 1'b0;
    assign b_norm = (b_exp != 0) ? 1'b1 : 1'b0;

    // DIVIDER STATE MACHINE
    always @(posedge clk or posedge reset) begin
        // RESET STATE
        if (reset) begin
            curr_state <= st_start;
            div_in_d <= 0;
            div_in_n <= 0;
            div_en <= 0;
            norm_result <= 0;
            norm_exp <= 0;
            lt_flag <= 0;
            y_uflw <= 0;
            y_oflw <= 0;
            y_sign <= 0;
            y_exp <= 0;
            y_frac <= 0;
            done <= 0;
        end
        else case (curr_state)
            // START STATE
            st_start: begin
                done <= 0;
                y_uflw <= 0;
                y_oflw <= 0;
                if (start == 1) begin
                    // ASSIGN FULL FRACTION; SHIFT SO NUM > DEN
                    div_in_n <= {a_norm, a_frac, 2'b0};
                    if (a_frac < b_frac) begin
                        div_in_d <= {1'b0, b_norm, b_frac, 1'b0};
                        lt_flag <= 1;
                    end
                    else
                        div_in_d <= {b_norm, b_frac, 2'b0};
                    div_en <= 1;
                    // SET NEXT STATE




            // OPERATION STATE
            st_op: begin
                if (div_done) begin
                    // DETERMINE RESULT SIGN
                    y_sign <= a_sign ^ b_sign;
                    // SUB EXPONENTS
                    norm_exp <=
                        {2'b0, a_exp} - {2'b0, b_exp} + bias_conv;
                    norm_result <= div_out_q;
                    // SET NEXT STATE
                    curr_state <= st_norm;
                end
                else
                    curr_state <= st_op;
            end

            // NORMALIZATION STATE
            st_norm: begin
                // NORMALIZE FRACTION
                norm_result <=
                    norm_result[o_width:0] << (o_width - lod_index);
                // CHECK IF DENOMINATOR WAS BIASED
                if (lt_flag) begin
                    // NORMALIZE FRACTION AND UNBIAS DENOMINATOR
                    norm_exp <= norm_exp - 1'b1;
                end
                div_en <= 0;
                // SET NEXT STATE
                curr_state <= st_done;
            end

            // FINISHED STATE
            st_done: begin
                // CHECK FOR EXPONENT UNDERFLOW
                if (norm_exp[e_width+1]) begin
                    // ASSIGN ZERO AND SET UNDERFLOW FLAG
                    y_exp <= 0;
                    y_frac <= 0;
                    y_uflw <= 1'b1;
                end
                // CHECK FOR EXPONENT OVERFLOW
                else if (norm_exp[e_width]) begin
                    y_exp <= {(e_width){1'b1}};
                    y_frac <= 0;
                    y_oflw <= 1'b1;
                end
                else begin
                    // CONVERT EXPONENT TO HIGHER PRECISION
                    y_exp <= norm_exp[e_width-1:0];
                    // CONVERT FRACTION TO HIGHER PRECISION [26:0]
                    y_frac <= norm_result[o_width-1:0];
                end
                // SET DONE FLAG AND RESET STATE
                done <= 1'b1;
                curr_state <= st_start;
            end
        endcase
    end
endmodule
I.8 FPU Post Processor
module fp_pp(
    a_sign, a_exp, a_frac, a_exc,
    y, exc_flag,
    scan_in0, scan_en, test_mode, scan_out0);

    // SINGLE VS. DOUBLE PRECISION CONSTANTS
    parameter e_width = 8;
    parameter f_width = 23;
    localparam width = 1 + e_width + f_width;
    localparam i_width = f_width + 2;

    // INPUTS
    input a_sign;
    input [2:0] a_exc;
    input [e_width-1:0] a_exp;
    input [i_width-1:0] a_frac;

    // OUTPUTS
    output exc_flag;
    output reg [width-1:0] y;

    input scan_in0, scan_en, test_mode;
    output scan_out0;

    // INDIVIDUAL EXCEPTION FLAGS
    wire y_NaN, y_inf, y_zero;

    // INTERNAL CONNECTIONS
    wire [e_width-1:0] y_exp;
    wire [f_width-1:0] y_frac;

    // SPLIT EXCEPTION MASK INTO COMPONENTS
    assign y_NaN = a_exc[2];
    assign y_inf = a_exc[1];
    assign y_zero = a_exc[0];

    // SET OUTPUT EXCEPTION FLAG
    assign exc_flag = y_NaN | y_inf | y_zero;

    // PRECISION DOWNCONVERTER
    fp_round #(.e_width(e_width), .f_width(f_width)) round_y(
        .a_exp(a_exp),
        .a_frac(a_frac),
        .y_exp(y_exp),
        .y_frac(y_frac),
        .scan_in0(),
        .scan_en(scan_en),
        .test_mode(test_mode),
        .scan_out0());

    // DATA PACKING
    fp_pack #(.e_width(e_width), .f_width(f_width)) pack_y(
        .sign(a_sign),
        .exp(y_exp),
        .frac(y_frac),
        .out(y),
        .scan_in0(),
        .scan_en(scan_en),
        .test_mode(test_mode),
        .scan_out0());

endmodule
I.9 FPU Rounding
module fp_round(
    a_exp, a_frac, y_frac, y_exp,
    scan_in0, scan_en, test_mode, scan_out0);

    // SINGLE VS. DOUBLE PRECISION CONSTANTS
    parameter e_width = 8;
    parameter f_width = 23;
    localparam i_width = f_width + 2;

    // INPUTS
    input [e_width-1:0] a_exp;
    input [i_width-1:0] a_frac;

    // OUTPUTS
    output [e_width-1:0] y_exp;
    output [f_width-1:0] y_frac;

    input scan_in0, scan_en, test_mode;
    output scan_out0;

    // FRACTION ROUNDING TO NEAREST
    assign y_frac =
        // ROUND DOWN CASE
        (!a_frac[1]) ? a_frac[i_width-1:2] :
        // ROUND UP CASE
        (a_frac[0]) ? a_frac[i_width-1:2] + 1'b1 :
        // ROUND TO EVEN CASE
        (a_frac[2]) ? a_frac[i_width-1:2] + 1'b1 : a_frac[i_width-1:2];

    // ADJUST EXPONENT IF ROUND UP RESULTS IN CARRY
    assign y_exp =
        // CHECK FOR ROUND UP
        ((a_frac[1] & a_frac[0]) | (a_frac[1] & !a_frac[0] & a_frac[2]))
        // AND CHECK FOR FRACTION OF ALL 1'S
        & (a_frac[i_width-1:0] == {(i_width){1'b1}}) ?
        a_exp + 1'b1 : a_exp;

endmodule
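The two continuous assignments in fp_round can be traced with a short Python model. This is a sketch that follows the listing bit for bit under the assumption that results are truncated to the declared port widths; the Python function name `fp_round` and the small f_width used in the demonstration are illustrative choices, not taken from the thesis.

```python
def fp_round(a_exp, a_frac, e_width=8, f_width=23):
    """Behavioral model of the fp_round listing; returns (y_exp, y_frac)."""
    i_width = f_width + 2
    kept = a_frac >> 2              # a_frac[i_width-1:2]
    guard = (a_frac >> 1) & 1       # a_frac[1]
    sticky = a_frac & 1             # a_frac[0]
    lsb = (a_frac >> 2) & 1         # a_frac[2]

    # Round up above the halfway point, or on a tie when the kept LSB is odd.
    round_up = bool(guard and (sticky or lsb))
    y_frac = (kept + (1 if round_up else 0)) & ((1 << f_width) - 1)

    # Exponent increments only when rounding up AND all i_width bits are 1.
    all_ones = a_frac == (1 << i_width) - 1
    bump = 1 if (round_up and all_ones) else 0
    y_exp = (a_exp + bump) & ((1 << e_width) - 1)
    return y_exp, y_frac


if __name__ == "__main__":
    # f_width = 3 keeps the bit patterns short enough to read.
    for frac in (0b10100, 0b10111, 0b10110, 0b10010, 0b11111, 0b11110):
        print(format(frac, "05b"), fp_round(10, frac, f_width=3))
```

One observation from this model: because the all-ones comparison covers both rounding bits, the tie input with every kept bit set (0b11110 for f_width = 3) wraps the fraction to zero without incrementing the exponent. Whether the surrounding design ever presents that input cannot be determined from the listing alone.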
I.10 FPU Packing
module fp_pack(
    sign, exp, frac, out,
    scan_in0, scan_en, test_mode, scan_out0);

    // SINGLE VS. DOUBLE PRECISION CONSTANTS
    parameter e_width = 8;
    parameter f_width = 23;
    localparam width = 1 + e_width + f_width;

    // INPUTS
    input sign;
    input [e_width-1:0] exp;
    input [f_width-1:0] frac;

    // OUTPUTS
    output [width-1:0] out;

    input scan_in0, scan_en, test_mode;
    output scan_out0;

    // COMBINE COMPONENTS
    assign out = {sign, exp, frac};

endmodule
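The concatenation in fp_pack can be checked against a host machine's float encoding. The Python sketch below assumes the default parameters (e_width = 8, f_width = 23) correspond to the IEEE 754 single-precision layout; the helper names `fp_pack` and `as_float` are hypothetical and exist only for this illustration.

```python
import struct


def fp_pack(sign, exp, frac, e_width=8, f_width=23):
    """Model of: assign out = {sign, exp, frac}."""
    exp &= (1 << e_width) - 1       # truncate to the declared port width
    frac &= (1 << f_width) - 1
    return ((sign & 1) << (e_width + f_width)) | (exp << f_width) | frac


def as_float(word):
    """Reinterpret a 32-bit word as an IEEE 754 single-precision value."""
    return struct.unpack(">f", word.to_bytes(4, "big"))[0]


if __name__ == "__main__":
    one = fp_pack(0, 127, 0)
    print(hex(one), as_float(one))
    neg_three = fp_pack(1, 128, 0x400000)
    print(hex(neg_three), as_float(neg_three))
```

With a biased exponent of 127 and a zero fraction the packed word decodes to 1.0, and sign = 1, exponent = 128, fraction = 0x400000 decodes to -3.0.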
I.11 FPU Test Frame
`include "uvm_macros.svh"
`include "tb_posit_fp_pkg.sv"
`include "tb_posit_fp_intf.sv"

module test;
    import uvm_pkg::*;
    import tb_posit_fp_pkg::*;

    bit clk;
    input_if in(clk);

    fpu top(
`ifndef LAYOUT
        .e_width(`es),
        .f_width(`frac_width))
`endif
        .reset(in.reset),
        .clk(in.clk),
        .scan_in0(in.scan_in0),
        .scan_en(in.scan_en),
        .test_mode(in.test_mode),
        .scan_out0(in.scan_out0),
        .start(in.start),
        .done(in.done),
        .op(in.op),
        .a(in.a),
        .b(in.b),
        .y(in.y),
        .exc(in.exc));

    initial begin
        $timeformat(-9, 2, "ns", 16);
        $set_coverage_db_name("fpu");
`ifdef SDFSCAN
        $sdf_annotate("sdf/fpu_saed32nm_scan.sdf", top);
`endif

        uvm_resource_db#(input_vif)::set(.scope("ifs"), .name("input_vif"), .val(in));

        run_test();
    end

    // 50 MHz clock
    always
        #10 clk = ~clk;

endmodule
Appendix II
PAU Source Code
II.1 PAU Top Level
module pau(
    clk, reset, start, done,
    op, a, b, y, exc,
    scan_in0, scan_en, test_mode, scan_out0);

    parameter width = 32;
    parameter es = 2;
    localparam r_width = $clog2(width);
    localparam f_width = width - es - 3;
    localparam s_width = es + r_width;

    // OPCODES
    localparam op_add = 2'b00;
    localparam op_sub = 2'b01;
    localparam op_mul = 2'b10;
    localparam op_div = 2'b11;

    // INPUTS
    input clk, reset, start;
    input [1:0] op;
    input [width-1:0] a, b;

    // OUTPUTS
    output reg done, exc;
    output reg [width-1:0] y;

    // DFT SCAN PORTS
    input scan_in0, scan_en, test_mode;
    output scan_out0;

    // CONTROL SIGNALS
    reg [2:0] ctrl_sc, ctrl_op, ctrl_dec;

    // INTERNAL CONNECTIONS
    wire a_sign, b_sign, add_sign, hold;
    wire g_sign, s_sign, a_gr_b;
    wire [r_width:0] a_reg, b_reg;
    wire [es-1:0] a_exp, b_exp;
    wire [f_width-1:0] a_frac, b_frac;
    wire [width-1:0] a_abs, b_abs;

    wire [f_width-1:0] gr_frac, sm_frac;
    wire [s_width:0] shift, gsf, asf;
    wire [width-1:0] add_result;

    wire [f_width:0] ans;

    // EXTRACT OPERAND COMPONENTS
    pos_extraction #(.width(width), .es(es)) extract_a(
        .clk(clk),
        .reset(reset),
        .a(a),
        .y_sign(a_sign),
        .y_reg(a_reg),
        .y_exp(a_exp),
        .y_frac(a_frac),
        .y_abs(a_abs),
        .y_exc(a_exc),
        .scan_in0(),
        .scan_en(scan_en),
        .test_mode(test_mode),
        .scan_out0());

    pos_extraction #(.width(width), .es(es)) extract_b(
        .clk(clk),
        .reset(reset),
        .a(b),
        .y_sign(b_sign),
        .y_reg(b_reg),
        .y_exp(b_exp),
        .y_frac(b_frac),
        .y_abs(b_abs),
        .y_exc(b_exc),
        .scan_in0(),
        .scan_en(scan_en),
        .test_mode(test_mode),
        .scan_out0());

    // DETERMINE ADDITION / SUBTRACTION SCALING FACTORS
    pos_scale #(.width(width), .es(es)) scaler(
        .clk(clk),
        .reset(reset),
        .en(ctrl_sc[2]),
        .op(ctrl_sc[1:0]),
        .a_sign(a_sign),
        .a_reg(a_reg),
        .a_exp(a_exp),
        .a_frac(a_frac),
        .a_abs(a_abs),
        .b_sign(b_sign),
        .b_reg(b_reg),
        .b_exp(b_exp),
        .b_frac(b_frac),
        .b_abs(b_abs),
        .gr_frac(gr_frac),
        .sm_frac(sm_frac),
        .g_sign(g_sign),
        .s_sign(s_sign),
        .shift(shift),
        .gsf(gsf),
        .scan_in0(),
        .scan_en(scan_en),
        .test_mode(test_mode),
        .scan_out0());

    // ADD OPERANDS
    pos_op #(.width(width), .es(es)) operator(
        .clk(clk),
        .reset(reset),
        .hold(hold),
        .en(ctrl_op[2]),
        .op(ctrl_op[1:0]),
        .shift(shift),
        .gsf(gsf),
        .a_sign(g_sign),
        .b_sign(s_sign),
        .gr_frac(gr_frac),
        .sm_frac(sm_frac),
        .result(add_result),
        .result_sign(add_sign),
        .asf(asf),
        .scan_in0(),
        .scan_en(scan_en),
        .test_mode(test_mode),
        .scan_out0());

    // DECODE RESULTS
    pos_decode #(.width(width), .es(es)) decoder(
        .clk(clk),
        .reset(reset),
        .en(ctrl_dec[2]),
        .zero(exc),
        .asf(asf),
        .frac_sign(add_sign),
        .frac(add_result),
        .ans(y),
        .rdy(done),
        .scan_in0(),
        .scan_en(scan_en),
        .test_mode(test_mode),
        .scan_out0());

    // CONTROL UNIT
    always @(posedge clk or posedge reset) begin
        if (reset) begin
            // PIPELINE CONTROL
            ctrl_sc <= 3'b100;
            ctrl_op <= 3'b100;
            ctrl_dec <= 3'b100;
        end
        else begin
            // SET CONTROL SIGNALS ON START
            if (start) begin
                ctrl_sc <= {1'b0, op[1:0]};
            end
            else begin
                ctrl_sc[2] <= 1'b1;
            end
            // STALL FOR DIVISION
            if (ctrl_op[2:0] == 3'b011 && hold) begin
                ctrl_dec[2] <= 1;
            end
            // RUN PIPE
            else begin
                ctrl_op <= ctrl_sc;





II.2 PAU Extraction
module pos_extraction(
    clk, reset, a,
    y_sign, y_reg, y_exp, y_frac, y_abs, y_exc,
    scan_in0, scan_en, test_mode, scan_out0);

    parameter width = 32;
    parameter es = 2;
    localparam f_width = width - es - 3;
    localparam r_width = $clog2(width);

    // INPUTS
    input clk, reset;
    input [width-1:0] a;

    // OUTPUTS
    output reg y_exc, y_sign;
    output reg [r_width:0] y_reg;
    output reg [es-1:0] y_exp;
    output reg [f_width-1:0] y_frac;
    output reg [width-1:0] y_abs;

    // DFT SCAN PORTS
    input scan_in0, scan_en, test_mode;
    output scan_out0;

    // INTERNAL VARIABLES
    wire [width-2:0] twos_comp;
    wire [width-2:0] abs_reg;
    wire [r_width:0] index;
    wire [r_width-1:0] zero_cnt;
    wire [width-4:0] temp;

    // SET INTERNAL VARIABLES
    assign twos_comp = ({(width){a[width-1]}} ^ a[width-1:0]) + a[width-1];
    assign abs_reg = {(width-1){twos_comp[width-2]}} ^ twos_comp[width-2:0];
    assign zero_cnt = width - index - 2;
    assign temp = twos_comp[width-4:0] << (zero_cnt - 1);

    // LEADING ONE DETECTOR
    find_first #(width-1) reg_decode(
        .in(abs_reg),
        .index(index),
        .scan_in0(),
        .scan_en(scan_en),
        .test_mode(test_mode),
        .scan_out0());

    // REGISTER OUTPUTS
    always @(posedge clk or posedge reset) begin
        if (reset) begin
            y_exc <= 0;
            y_sign <= 0;
            y_reg <= 0;
            y_exp <= 0;
            y_frac <= 0;
            y_abs <= 0;
        end
        else begin
            y_exc <= a[width-1] & ~|a[width-2:0];
            y_sign <= a[width-1];
            y_reg <= (twos_comp[width-2]) ? zero_cnt - 1 : -zero_cnt;
            y_exp <= temp[width-4:f_width];
            y_frac <= temp[f_width-1:0];




II.3 PAU Scaler
module pos_scale(
    clk, reset, en, op,
    a_sign, a_reg, a_exp, a_frac,
    b_sign, b_reg, b_exp, b_frac,
    a_abs, b_abs, g_sign, s_sign,
    gr_frac, sm_frac, shift, gsf,
    scan_in0, scan_en, test_mode, scan_out0);

    parameter width = 32;
    parameter es = 2;
    localparam r_width = $clog2(width);
    localparam f_width = width - es - 3;

    // INPUTS
    input clk, reset, en, a_sign, b_sign;
    input [1:0] op;
    input [r_width:0] a_reg, b_reg;
    input [es-1:0] a_exp, b_exp;
    input [f_width-1:0] a_frac, b_frac;
    input [width-1:0] a_abs, b_abs;

    // OUTPUTS
    output reg g_sign, s_sign;
    output reg [f_width-1:0] gr_frac, sm_frac;
    output reg [es+r_width:0] shift, gsf;

    // DFT SCAN PORTS
    input scan_in0, scan_en, test_mode;
    output scan_out0;

31 / / INTERNAL VARIABLES
32 / / w i r e a_gr_b ;
33 wi r e [ e s + r _ w i d t h : 0 ] s f_a , s f_b , s f_a2 , s f _ b 2 ;
34
35 a s s i g n a_gr_b = ( a_abs > b_abs ) ? 1 ’ b1 : 1 ’ b0 ;
36
37 / / GET SIGNED SCALING FACTORS
38 a s s i g n s f _ a = ( a _ r e g << es ) + a_exp ;
39 a s s i g n s f _ b = ( b_reg << es ) + b_exp ;
40
41 / / GET ABSOLUTE SCALING FACTORS
42 a s s i g n s f _ a 2 = ( { ( e s + r _ w i d t h +1) { s f _ a [ e s + r _ w i d t h ] } } ^ s f _ a [ e s
+ r _ w i d t h : 0 ] ) + s f _ a [ e s + r _ w i d t h ] ;
43 a s s i g n s f _ b 2 = ( { ( e s + r _ w i d t h +1) { s f _ b [ e s + r _ w i d t h ] } } ^ s f _ b [ e s
+ r _ w i d t h : 0 ] ) + s f _ b [ e s + r _ w i d t h ] ;
44
45 a lways @( posedge c l k o r posedge r e s e t ) b e g i n
46 / / RESET STATE
II.3 PAU Scaler II-14
47 i f ( r e s e t ) b e g i n
48 g r _ f r a c <= 0 ;
49 sm_f rac <= 0 ;
50 s h i f t <= 0 ;
51 g s f <= 0 ;
52 g _ s i g n <= 0 ;
53 s _ s i g n <= 0 ;
54 end
55 / / OPERATION STATE
56 e l s e i f ( ! en ) b e g i n
57 / / MULTIPLICATION / DIVISION
58 i f ( op [ 1 ] ) b e g i n
59 i f ( op [ 0 ] ) b e g i n
60 g s f <= s f _ a − s f _ b ;
61 end
62 e l s e b e g i n
63 g s f <= s f _ a + s f _ b ;
64 end
65 g r _ f r a c <= a _ f r a c ;
66 sm_f rac <= b _ f r a c ;
67 g _ s i g n <= a _ s i g n ;
68 s _ s i g n <= b _ s i g n ;
69 end
70 / / ADDITION / SUBTRACTION
71 e l s e b e g i n
II.3 PAU Scaler II-15
72 / / CHECK FOR LARGER OPERAND
73 i f ( a_gr_b ) b e g i n
74 / / SET FRACTION SIZES
75 g r _ f r a c <= a _ f r a c ;
76 sm_f rac <= b _ f r a c ;
77 g _ s i g n <= a _ s i g n ;
78 s _ s i g n <= b _ s i g n ;
79 / / SET GREATEST SCALING FACTOR
80 g s f <= s f _ a ;
81 / / PERFORM | SF_A − SF_B |
82 i f ( ! s f _ a [ e s + r _ w i d t h ] & ! s f _ b [ e s + r _ w i d t h ] ) b e g i n
83 s h i f t <= s f _ a − s f _ b ;
84 end
85 e l s e i f ( ! s f _ a [ e s + r _ w i d t h ] & s f _ b [ es + r _ w i d t h ] ) b e g i n
86 s h i f t <= s f _ a 2 + s f _ b 2 ;
87 end
88 e l s e b e g i n
89 s h i f t <= s f _ b 2 − s f _ a 2 ;
90 end
91 end
92 e l s e b e g i n
93 / / SET FRACTION SIZES
94 g r _ f r a c <= b _ f r a c ;
95 sm_f rac <= a _ f r a c ;
96 g _ s i g n <= b _ s i g n ;
II.3 PAU Scaler II-16
97 s _ s i g n <= a _ s i g n ;
98 / / SET GREATEST SCALING FACTOR
99 g s f <= s f _ b ;
100 / / PERFORM | SF_A − SF_B |
101 i f ( ! s f _ a [ e s + r _ w i d t h ] & ! s f _ b [ e s + r _ w i d t h ] ) b e g i n
102 s h i f t <= s f _ b − s f _ a ;
103 end
104 e l s e i f ( s f _ a [ e s + r _ w i d t h ] & ! s f _ b [ e s + r _ w i d t h ] ) b e g i n
105 s h i f t <= s f _ a 2 + s f _ b 2 ;
106 end
107 e l s e b e g i n







II.4 PAU Operator
module pos_op (
  clk, reset, en, op, a_sign, b_sign,
  gr_frac, sm_frac, shift, gsf,
  result, result_sign, asf, hold,
  scan_in0, scan_en, test_mode, scan_out0);

  parameter width = 32;
  parameter es = 2;
  localparam r_width = $clog2(width);
  localparam f_width = width - es - 3;
  localparam s_width = es + r_width;
  localparam y_width = 2 * width;
  localparam i_width = $clog2(y_width+1);
  localparam d_width = width - es;

  // INPUTS
  input clk, reset, en, a_sign, b_sign;
  input [1:0] op;
  input [f_width-1:0] gr_frac, sm_frac;
  input [s_width:0] shift, gsf;

  // OUTPUTS
  output hold;
  output reg result_sign;
  output reg [width-1:0] result;
  output reg [s_width:0] asf;

  // DFT SCAN PORTS
  input scan_in0, scan_en, test_mode;
  output scan_out0;

  // INTERNAL VARIABLES
  wire add_a_sign, add_b_sign;
  wire [y_width-1:0] add_a, add_b, add_result, mul_y, f_result;
  wire [i_width-1:0] index;
  wire [d_width-1:0] div_n, div_d, div_q, div_r;
  wire lt_flag, div_en, div_done;

  assign hold = ((op[1:0] == 2'b11) && !div_done);

  // SHIFT OPERANDS
  assign add_a = {2'b01, gr_frac, {(y_width-f_width-2){1'b0}}};
  assign add_b = {2'b01, sm_frac, {(y_width-f_width-2){1'b0}}} >> shift;
  assign add_b_sign = b_sign ^ op[0];

  // PERFORM OPERATION
  assign add_result = (a_sign == add_b_sign) ?
                      add_a + add_b :
                      add_a - add_b;

  assign lt_flag = ((gr_frac < sm_frac) && (&op[1:0])) ? 1'b1 : 1'b0;
  assign div_n = {1'b1, gr_frac, 2'b0};
  assign div_d = (lt_flag) ? {2'b01, sm_frac, 1'b0} : {1'b1, sm_frac, 2'b0};
  assign div_en = op[1] && op[0] && !div_done;

  assign f_result = (!op[1]) ? add_result :
                    (!op[0]) ? mul_y : {div_q, {(y_width-d_width){1'b0}}};

  // LEADING ONE DETECTOR FOR NORMALIZATION
  find_first #(y_width) norm_add (
    .in(f_result),
    .index(index),
    .scan_in0(),
    .scan_en(scan_en),
    .test_mode(test_mode),
    .scan_out0());

  // MULTIPLIER
  vedic_rec #(width) mult (
    .a({1'b1, sm_frac, {(width-f_width-1){1'b0}}}),
    .b({1'b1, gr_frac, {(width-f_width-1){1'b0}}}),
    .y(mul_y),
    .scan_in0(),
    .scan_en(scan_en),
    .test_mode(test_mode),
    .scan_out0());

  // DIVIDER
  divider #(d_width) div (
    .clk(clk),
    .reset(reset),
    .en(div_en),
    .done(div_done),
    .num(div_n),
    .den(div_d),
    .q(div_q),
    .r(div_r),
    .scan_in0(),
    .scan_en(scan_en),
    .test_mode(test_mode),
    .scan_out0());


  always @(posedge clk or posedge reset) begin
    // RESET STATE
    if (reset) begin
      result <= 0;
      result_sign <= 0;
      asf <= 0;
    end
    else if (!en) begin
      // DETERMINE SIGN BASED ON OPERATION
      if (op[1]) begin
        result_sign <= a_sign ^ b_sign;
      end
      else begin
        result_sign <= a_sign;
      end

      // NORMALIZE WHEN LEADING ONE IS CARRY
      if (index == (y_width-1)) begin
        result[width-1:1] <= f_result[y_width-1:(y_width-width+1)];
        // SET STICKY BIT
        result[0] <= |f_result[(y_width-width):0];
        // ADJUST SCALING FACTOR
        asf <= gsf + 1'b1 - lt_flag;
      end

      // WHEN LEADING ONE IS HIDDEN BIT
      else if (index == (y_width-2)) begin
        result[width-1:1] <= f_result[y_width-2:(y_width-width)];
        // SET STICKY BIT
        result[0] <= |f_result[(y_width-width-1):0];
        // KEEP SCALING FACTOR
        asf <= gsf - lt_flag;
      end

      // WHEN LEADING ONE IS TO RIGHT OF HIDDEN BIT (SHIFT LEFT BY DIFFERENCE)
      else begin
        result <= f_result >> (index - (y_width - width) + 1);
        // SET STICKY BIT
        result[0] <= |(f_result << (width + (y_width - index)));
        // ADJUST SCALING FACTOR




endmodule
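The scaler listing (II.3) folds each operand's regime value and exponent into one signed scaling factor, (reg << es) + exp, and then produces either a combined scale for multiplication and division or an alignment shift for addition and subtraction. The Python sketch below restates that arithmetic for reference. It is an illustration under stated assumptions, not the hardware itself: the function names are hypothetical, unbounded Python integers stand in for the fixed-width wires (no wrap-around is modeled), the opcode values follow the opcode_t encoding in III.12, and the shift is returned as None for multiply and divide because the listing leaves that register unchanged in those cases.

```python
# Opcode encoding taken from the opcode_t enum in the test package (III.12).
OP_ADD, OP_SUB, OP_MUL, OP_DIV = 0, 1, 2, 3


def scaling_factor(k, e, es):
    """Combine regime value k and exponent e into one signed scale.

    Mirrors sf = (reg << es) + exp, i.e. k * 2**es + e.
    """
    return (k << es) + e


def scale_operands(op, sf_a, sf_b):
    """Return (gsf, shift) for two operand scaling factors.

    Hypothetical helper: gsf is the scale passed on to the operator stage,
    shift is the right-shift applied to the smaller operand's fraction.
    """
    if op == OP_MUL:
        return sf_a + sf_b, None   # scales add under multiplication
    if op == OP_DIV:
        return sf_a - sf_b, None   # scales subtract under division
    # Addition / subtraction: keep the larger scale, align the smaller operand.
    return max(sf_a, sf_b), abs(sf_a - sf_b)


if __name__ == "__main__":
    sf_a = scaling_factor(2, 1, es=2)    # 2*4 + 1 = 9
    sf_b = scaling_factor(-1, 3, es=2)   # -1*4 + 3 = -1
    print(scale_operands(OP_ADD, sf_a, sf_b))
    print(scale_operands(OP_MUL, sf_a, sf_b))
    print(scale_operands(OP_DIV, sf_a, sf_b))
```

The three sign cases in the listing (both scales non-negative, mixed signs, both negative) all reduce to the same magnitude difference, which is why the sketch can use a single abs() call.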
II.5 PAU Decoder
module pos_decode (
  clk, reset, en, zero, asf,
  frac, frac_sign, ans, rdy,
  scan_in0, scan_en, test_mode, scan_out0);

  parameter width = 32;
  parameter es = 2;
  localparam r_width = $clog2(width);
  localparam r_max = width - 1 - es;
  localparam s_width = es + r_width;
  localparam f_width = width - es;
  localparam t_width = 2 * width;

  // INPUTS
  input clk, reset, en, frac_sign;
  input [width-1:0] frac;
  input [s_width:0] asf;

  // OUTPUTS
  output reg rdy, zero;
  output reg [width-1:0] ans;

  // DFT SCAN PORTS
  input scan_in0, scan_en, test_mode;
  output scan_out0;

  // INTERNAL VARIABLES
  wire chk_rnd, guard, round, sticky;
  wire [1:0] reg_pre;
  wire [s_width:0] abs_sf;
  wire [es-1:0] exp;
  wire [r_width:0] regime, reg_shft;
  wire [t_width-1:0] tmp_ans, tmp_shft;
  wire [width-1:0] rnd_ans;

  // GET ABSOLUTE SCALING FACTOR
  assign abs_sf = ({(s_width+1){asf[s_width]}} ^ asf[s_width:0]) + asf[s_width];

  // GET EXPONENT AND REGIME K-VALUE
  assign exp = (asf[s_width]) ? asf[es-1:0] : abs_sf[es-1:0];
  assign regime = (asf[s_width]) ? ~(asf >> es) + 1'b1 : abs_sf >> es;

  // PREP TO SHIFT IN REGIME
  assign reg_pre = (asf[s_width]) ? 2'b01 : 2'b10;
  assign tmp_ans[t_width-1:0] = {reg_pre, exp, frac[width-2:0], {(r_max){1'b0}}};

  // TAKE CARE OF MAX REGIME CORNER CASE
  assign reg_shft = (regime == {(r_width){1'b1}}) ? regime : regime - 1;

  // SHIFT IN LEADING REGIME VALUE
  assign tmp_shft = (asf[s_width]) ? $signed(tmp_ans) >>> reg_shft : $signed(tmp_ans) >>> regime;

  // ASSIGN GRS BITS
  assign guard = tmp_shft[width];
  assign round = tmp_shft[width-1];
  assign sticky = |tmp_shft[width-2:0];
  assign chk_rnd = ((tmp_shft[width+1] & guard) | (guard & (round | sticky)));

  // PERFORM ROUNDING INCLUDING SIGN BIT
  assign rnd_ans = {1'b0, tmp_shft[t_width-1:width+1]} + chk_rnd;

  always @(posedge clk or posedge reset) begin
    // RESET STATE
    if (reset) begin
      ans <= 0;
      zero <= 0;
      rdy <= 0;
    end
    else if (!en) begin
      ans <= ({(width){frac_sign}} ^ rnd_ans) + frac_sign;
      zero <= ~|rnd_ans;
      rdy <= 1'b1;
    end
    else begin






II.6 PAU Test Frame
`include "uvm_macros.svh"
`include "tb_posit_fp_pkg.sv"
`include "tb_posit_fp_intf.sv"

module test;
  import uvm_pkg::*;
  import tb_posit_fp_pkg::*;

  bit clk;
  input_if in(clk);

  pau
`ifndef LAYOUT
  #(.width(`width),
    .es(`es))
`endif
  top (
    .reset(in.reset),
    .clk(in.clk),
    .scan_in0(in.scan_in0),
    .scan_en(in.scan_en),
    .test_mode(in.test_mode),
    .scan_out0(in.scan_out0),
    .start(in.start),
    .done(in.done),
    .op(in.op),
    .a(in.a),
    .b(in.b),
    .y(in.y),
    .exc(in.exc));

  initial begin
    $timeformat(-9, 2, " ns", 16);
    $set_coverage_db_name("fpu");
`ifdef SDFSCAN
    $sdf_annotate("sdf/pau_saed32nm_scan.sdf", top);
`endif
    uvm_resource_db#(input_vif)::set(.scope("ifs"), .name("input_vif"), .val(in));

    run_test();
  end

  // 50 MHz clock
  always
    #10 clk = ~clk;

endmodule




III.1 Test Case
class test extends uvm_test;
  `uvm_component_utils(test)

  env envir;
  sequence_in seq;

  function new(string name, uvm_component parent);
    super.new(name, parent);
  endfunction : new

  function void build_phase(uvm_phase phase);
    envir = env::type_id::create("envir", this);
    seq = sequence_in::type_id::create("seq", this);
  endfunction : build_phase

  task run_phase(uvm_phase phase);
    phase.raise_objection(this);
    seq.start(envir.agnt.sqr);
    phase.drop_objection(this);
  endtask : run_phase

endclass : test
III.2 Test Sequence
class sequence_in extends uvm_sequence #(packet_in);
  `uvm_object_utils(sequence_in)

  int i;

  function new(string name="sequence_in");
    super.new(name);
  endfunction : new

  task body;
    packet_in data_in;

    // The factory call already constructs the item, so no separate new() is needed
    data_in = packet_in::type_id::create("data_in");
    // 100000 random operand pairs for each of the four opcodes
    for (i = 0; i < 4; i++) begin
      repeat (100000) begin
        start_item(data_in);
        void'(data_in.randomize());
        // Explicit cast: an int cannot be assigned to an enum directly
        data_in.op = opcode_t'(i);
        finish_item(data_in);
      end
    end
    $display("/////////////////////////////////////////////////////////");
    $display("Test Complete");
    $display("/////////////////////////////////////////////////////////");
  endtask : body
endclass : sequence_in
III.3 Test Input Packet
class packet_in extends uvm_sequence_item;
  `uvm_object_utils(packet_in)

  rand logic [`width-1:0] a;
  rand logic [`width-1:0] b;
  rand opcode_t op;

`ifdef FPU
  // CONSTRAINED TO NORMALIZED VALUES
  constraint a_norm {(a[`frac_width-1:0] == 0) -> (a[`width-2:`frac_width] == 0);}
  constraint b_norm {(b[`frac_width-1:0] == 0) -> (b[`width-2:`frac_width] == 0);}
  // CONSTRAIN TO NOT NaN
  constraint a_valid {(&a[`frac_width-1:0] == 1'b1) -> (&a[`width-2:`frac_width] == 1'b0);}
  constraint b_valid {(&b[`frac_width-1:0] == 1'b1) -> (&b[`width-2:`frac_width] == 1'b0);}
`endif

  function new(string name = "packet_in");
    super.new(name);
  endfunction : new
endclass : packet_in
III.4 Test Output Packet
class packet_out extends uvm_sequence_item;
  `uvm_object_utils(packet_out)

  logic [`width-1:0] y;
  logic exc_flag;

  function new(string name = "packet_out");
    super.new(name);
  endfunction : new

endclass : packet_out
III.5 Test Environment
class env extends uvm_env;
  `uvm_component_utils(env)

  agent agnt;
  scoreboard sb;

  function new(string name, uvm_component parent);
    super.new(name, parent);
  endfunction : new

  virtual function void build_phase(uvm_phase phase);
    super.build_phase(phase);
    agnt = agent::type_id::create("agnt", this);
    sb = scoreboard::type_id::create("sb", this);
  endfunction : build_phase

  virtual function void connect_phase(uvm_phase phase);
    super.connect_phase(phase);
    agnt.item_collected_port.connect(sb.output_fifo.analysis_export);
    agnt.item_generated_port.connect(sb.input_fifo.analysis_export);
  endfunction : connect_phase

endclass : env
III.6 Test Agent
class agent extends uvm_agent;
  `uvm_component_utils(agent)

  sequencer sqr;
  driver drv;
  monitor mon;

  uvm_analysis_port #(packet_out) item_collected_port;
  uvm_analysis_port #(packet_in) item_generated_port;

  function new(string name = "agent", uvm_component parent = null);
    super.new(name, parent);
    item_collected_port = new("item_collected_port", this);
    item_generated_port = new("item_generated_port", this);
  endfunction : new

  virtual function void build_phase(uvm_phase phase);
    super.build_phase(phase);
    mon = monitor::type_id::create("mon", this);
    sqr = sequencer::type_id::create("sqr", this);
    drv = driver::type_id::create("drv", this);
  endfunction : build_phase

  virtual function void connect_phase(uvm_phase phase);
    super.connect_phase(phase);
    mon.item_collected_port.connect(item_collected_port);
    drv.item_generated_port.connect(item_generated_port);
    drv.seq_item_port.connect(sqr.seq_item_export);
  endfunction : connect_phase

endclass : agent
III.7 Test Sequencer
class sequencer extends uvm_sequencer #(packet_in);
  `uvm_component_utils(sequencer)

  function new(string name = "sequencer", uvm_component parent = null);
    super.new(name, parent);
  endfunction : new
endclass : sequencer
III.8 Test Driver
typedef virtual input_if input_vif;

class driver extends uvm_driver #(packet_in);
  `uvm_component_utils(driver)

  covergroup in_cov @(posedge vif.start);
    a_data : coverpoint vif.a;
    b_data : coverpoint vif.b;
  endgroup

  input_vif vif;
  packet_in driver_data;
  uvm_analysis_port #(packet_in) item_generated_port;
  integer done_cnt;

  function new(string name = "driver", uvm_component parent = null);
    super.new(name, parent);
    item_generated_port = new("item_generated_port", this);
  endfunction : new

  virtual function void build_phase(uvm_phase phase);
    super.build_phase(phase);
    void'(uvm_resource_db#(input_vif)::read_by_name(.scope("ifs"), .name("input_vif"), .val(vif)));
    driver_data = packet_in::type_id::create("tr", this);
  endfunction : build_phase

  virtual task run_phase(uvm_phase phase);
    super.run_phase(phase);
    initialize();
    forever begin
      start_operation();
      hold();
    end
  endtask : run_phase

  virtual protected task initialize();
    vif.test_mode = 0;
    vif.scan_in0 = 0;
    vif.scan_en = 0;
    vif.start = 0;
    vif.reset = 0;
    vif.op = 0;
    vif.a = 0;
    vif.b = 0;
    @(negedge vif.clk);
    vif.reset = 1;
    @(negedge vif.clk);
    vif.reset = 0;
    @(negedge vif.clk);
  endtask : initialize

  virtual protected task start_operation();
    // if (!vif.hold) begin
    seq_item_port.get(driver_data);
    item_generated_port.write(driver_data);
    @(posedge vif.clk);
    vif.a = driver_data.a;
    vif.b = driver_data.b;
    vif.op = driver_data.op;
    vif.start = 1;
    @(posedge vif.clk);
    vif.start = 0;
    // end
  endtask : start_operation

  virtual protected task hold();
    @(posedge vif.done);
  endtask : hold

endclass : driver
III.9 Test Monitor
class monitor extends uvm_monitor;
  `uvm_component_utils(monitor)

  input_vif vif;
  packet_out rx;
  uvm_analysis_port #(packet_out) item_collected_port;

  function new(string name, uvm_component parent);
    super.new(name, parent);
    item_collected_port = new("item_collected_port", this);
    rx = packet_out::type_id::create("rx", this);
  endfunction : new

  virtual function void build_phase(uvm_phase phase);
    super.build_phase(phase);
    void'(uvm_resource_db#(input_vif)::read_by_name(.scope("ifs"), .name("input_vif"), .val(vif)));
  endfunction : build_phase

  virtual task run_phase(uvm_phase phase);
    super.run_phase(phase);
    collect_transactions(phase);
  endtask : run_phase

  virtual task collect_transactions(uvm_phase phase);
    wait(vif.reset == 1);
    @(negedge vif.reset);
    forever begin
      @(posedge vif.done)
        rx.y = vif.y;
      rx.exc_flag = vif.exc;
      item_collected_port.write(rx);
    end
  endtask : collect_transactions

endclass : monitor
III.10 Test Scoreboard
class scoreboard extends uvm_scoreboard;
  `uvm_component_utils(scoreboard)

  uvm_tlm_analysis_fifo #(packet_out) output_fifo;
  uvm_tlm_analysis_fifo #(packet_in) input_fifo;

  input_vif vif;
  packet_in dut_data;
  packet_out dut_result;

  longint count;
  float a, b, out;
  logic out_flag;
  opcode_t op;

  int fp;

  function new(string name, uvm_component parent);
    super.new(name, parent);
  endfunction : new

  virtual function void build_phase(uvm_phase phase);
    super.build_phase(phase);
    input_fifo = new("input_fifo");
    output_fifo = new("output_fifo");
    dut_data = new("dut_data");
    dut_result = new("dut_result");
    void'(uvm_resource_db#(input_vif)::read_by_name(.scope("ifs"), .name("input_vif"), .val(vif)));
    $display("/////////////////////////////////////////////////////////");
    $display("Beginning Test");
    $display("/////////////////////////////////////////////////////////");
    fp = $fopen("data.txt", "w");
    $fwrite(fp, "%s, %d, %d, 0, 0\n", `SYS, `width, `es);
  endfunction : build_phase

  task run_phase(uvm_phase phase);
    super.run_phase(phase);
    forever begin
      get_input(phase);
      get_output(phase);
    end
  endtask : run_phase


  protected virtual task get_input(uvm_phase phase);
    input_fifo.get(dut_data);
    phase.raise_objection(this);
    a = dut_data.a;
    b = dut_data.b;
    op = dut_data.op;
    $fwrite(fp, "%d, %d, %d, ", a, b, op);
  endtask : get_input

  protected virtual task get_output(uvm_phase phase);
    output_fifo.get(dut_result);
    phase.drop_objection(this);
    out = dut_result.y;
    out_flag = dut_result.exc_flag;
    $fwrite(fp, "%d, %d\n", out, out_flag);
  endtask : get_output

endclass : scoreboard
III.11 Test Interface
interface input_if (input clk);
  var logic scan_in0;
  var logic scan_out0;
  var logic scan_en;
  var logic test_mode;
  var logic reset;
  var logic start;
  var logic done;
  var logic exc;
  var logic [`width-1:0] a;
  var logic [`width-1:0] b;
  var logic [`width-1:0] y;
  var logic [1:0] op;
endinterface : input_if
III.12 Test Package
package tb_posit_fp_pkg;

  import uvm_pkg::*;
  `include "uvm_macros.svh"
  `define LAYOUT
  `define SINGLE
  `define FPU

  `ifdef HALF
    `define width 16
    `ifdef FPU
      `define SYS "fpu"
      `define es 5
    `else
      `define SYS "pau"
      `define es 1
    `endif
    `define frac_width 10
  `elsif SINGLE
    `define width 32
    `ifdef FPU
      `define SYS "fpu"
      `define es 8
    `else
      `define SYS "pau"
      `define es 3
    `endif
    `define frac_width 23
  `else
    `define width 64
    `ifdef FPU
      `define SYS "fpu"
      `define es 11
    `else
      `define SYS "pau"
      `define es 4
    `endif
    `define frac_width 52
  `endif

  typedef logic [`width-1:0] float;

  typedef enum logic [1:0] {
    op_add = 2'b00,
    op_sub = 2'b01,
    op_mul = 2'b10,
    op_div = 2'b11
  } opcode_t;

  import "DPI-C" function longint time_start();
  import "DPI-C" function longint time_stop(longint);
  import "DPI-C" function void convert_exe(longint);
  import "DPI-C" function void alert(int);

  `include "tb_packet_in.sv"
  `include "tb_packet_out.sv"
  `include "tb_sequence.sv"
  `include "tb_sequencer.sv"
  `include "tb_driver.sv"
  `include "tb_monitor.sv"
  `include "tb_agent.sv"
  `include "tb_scoreboard.sv"
  `include "tb_env.sv"
  `include "test.sv"
65




IV.1 Analysis Script
total_time = tic;
fpu_time = tic;
fpu = analyze_fpu();
toc(fpu_time)
pau_time = tic;
pau = analyze_pau();
toc(pau_time)
plot_data(fpu, pau);
toc(total_time)

function rd_data(type)
    file = append('icc_pd/', type, '/data.txt');
    str = readmatrix(file, 'OutputType', 'string');
    sys = str(1,1);
    width = double(str(1,2));
    es = double(str(1,3));
    ops = double([str(2:end,1) str(2:end,2)]);
    res = double(str(2:end,4));
    op = double(str(2:end,3));
    if (type == 'pau')
        save 'pau_data.mat' str sys width es ops res op -v7.3;
    else
        save 'fpu_data.mat' str sys width es ops res op -v7.3;
    end
    return;
end

function pos = pos_conv(x, width, es)
    useed = 2 ^ (2 ^ es);
    center = 2 ^ (width - 1);

    % Check for Exceptions
    if (x == 0)
        pos = 0;
        return;
    elseif (x == center)
        pos = 2^(2*width);
        return;
    end

    % Check Sign
    if (x < (center + 1))
        s = 1;
    else
        s = -1;
        x = 2 * center - x;
    end
    b = flip(de2bi(x, width-1));

    % Calculate regime
    if (b(1) == 0)
        k = 0;
        for i = 1:length(b)
            if (b(i) == 0)
                k = k - 1;
            else
                break
            end
        end
    else
        k = 0;
        for i = 2:length(b)
            if (b(i) == 1)
                k = k + 1;
            else




    reg = useed ^ k;

    % Corner Case
    if (i == (width-1))
        pos = reg;
        return;
    end

    % Get Exponent
    if ((width - i) > es)
        exp = 2 ^ bi2de(flip(b(i+1:i+es)));
    else
        exp = 2 ^ bi2de(flip(b(i+1:end)));
    end

    % Calculate Fraction
    frac = b(i+1+es:end);
    frac_d = 1;
    for i = 1:length(frac)
        if (frac(i) == 1)
            frac_d = frac_d + (2 ^ (-i));
        end
    end
    pos = vpa(s * reg * exp * frac_d);
    return
end
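
As a cross-check on pos_conv, the following Python sketch decodes a posit bit pattern into an exact rational value. It is not the author's script: the name posit_to_fraction, the use of fractions.Fraction, and returning None for the NaR ("center") pattern are assumptions made for illustration. It follows the usual posit field order (sign, regime, up to es exponent bits, fraction) and, as a deliberate difference from the listing, pads a truncated exponent field with zeros on the right.

```python
from fractions import Fraction


def posit_to_fraction(bits, width, es):
    """Decode a width-bit posit pattern (an unsigned int) into an exact value."""
    mask = (1 << width) - 1
    bits &= mask
    if bits == 0:
        return Fraction(0)
    if bits == 1 << (width - 1):
        return None  # NaR pattern; returning None is an illustrative choice

    sign = 1
    if bits >> (width - 1):          # sign bit set: take the two's complement
        sign = -1
        bits = (-bits) & mask

    n = width - 1                    # number of bits after the sign bit
    first = (bits >> (n - 1)) & 1    # leading regime bit
    pos = n - 1
    run = 0
    while pos >= 0 and ((bits >> pos) & 1) == first:
        run += 1
        pos -= 1
    k = run - 1 if first else -run   # regime value

    remaining = max(pos, 0)          # bits left after the regime terminator
    rest = bits & ((1 << remaining) - 1)

    e_bits = min(es, remaining)      # the exponent field may be truncated
    exponent = (rest >> (remaining - e_bits)) << (es - e_bits)

    f_bits = remaining - e_bits
    fraction = rest & ((1 << f_bits) - 1)

    # useed^k * 2^exponent, with useed = 2^(2^es)
    scale = Fraction(2) ** (k * (1 << es) + exponent)
    return sign * scale * (1 + Fraction(fraction, 1 << f_bits))
```

For an 8-bit posit with es = 1, the pattern 0x7F decodes to 4096 and 0x01 to 1/4096, the largest and smallest positive values of that format.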

function fp = fp_conv(x, width)
    switch width
        case 32




function conv_data(obj)
    width = obj.width;
    es = obj.es;
    splits = size(obj, 'ops', 1) / 10000;
    sys = obj.sys;
    op_cnt = size(obj, 'ops', 1) / 4;
    c_ops = zeros(10000, 2);
    c_res = zeros(10000, 1);
    fprintf('\tCompleted Data Points: ');

    for i = 0:(splits-1)
        base = (i*10000)+1;
        next = (i+1)*10000;
        op_a = obj.ops(base:next, 1);
        op_b = obj.ops(base:next, 2);
        res = obj.res(base:next, 1);
        op = obj.op(base, 1)+1;
        start = mod(base, op_cnt);
        stop = start + 9999;

        if (sys == "pau")
            for j = 1:10000
                c_ops(j,1) = pos_conv(op_a(j), width, es);
                c_ops(j,2) = pos_conv(op_b(j), width, es);
                c_res(j,1) = pos_conv(res(j), width, es);
            end
        else
            for j = 1:10000
                c_ops(j,1) = fp_conv(op_a(j), width);
                c_ops(j,2) = fp_conv(op_b(j), width);
                c_res(j,1) = fp_conv(res(j), width);
            end
        end
        obj.c_op_a(start:stop, op) = c_ops(:,1);
        obj.c_op_b(start:stop, op) = c_ops(:,2);
        obj.c_res(start:stop, op) = c_res;
        fprintf(repmat('\b', 1, 6));
        fprintf('%6d', (i+1)*10000);
    end
    fprintf('\n');
    clear width es splits sys op_cnt base next op_a op_b res op c_ops c_res;
    return;
end

function comp(obj)
    e_res = zeros(10000, 1);
    op_cnt = size(obj, 'ops', 1) / 4;
    splits = op_cnt / 10000;
    operations = ["Addition" "Subtraction" "Multiplication" "Division"];

    fprintf('\tCompleted Operation: ');

    for i = 1:4
        for j = 0:splits-1
            base = (j*10000)+1;
            next = (j+1)*10000;
            op_a = obj.c_op_a(base:next, 1);
            op_b = obj.c_op_b(base:next, 1);

            for k = 1:10000
                switch (i)
                    case 1
                        e_res(k) = op_a(k) + op_b(k);
                    case 2
                        e_res(k) = op_a(k) - op_b(k);
                    case 3
                        e_res(k) = op_a(k) * op_b(k);
                    case 4




            obj.e_res(base:next, i) = e_res(:);
        end
        fprintf(repmat('\b', 1, 14));
        fprintf('%14s', operations(i));
    end
    fprintf('\n');
    clear e_res op_cnt splits operations i j base next op_a op_b k
    return;
end

function err(obj)
    err = zeros(10000, 1);
    op_cnt = size(obj, 'ops', 1) / 4;
    splits = op_cnt / 10000;
    fprintf('\tCalculated Errors: ');

    for i = 1:4
        for j = 0:splits-1
            base = (j*10000)+1;
            next = (j+1)*10000;
            c_res = obj.c_res(base:next, i);
            e_res = obj.e_res(base:next, i);
            for k = 1:10000
                err(k) = -log10(abs(log10(c_res(k) / e_res(k))));
            end
            obj.err(base:next, i) = err(:);
            fprintf(repmat('\b', 1, 6));
            fprintf('%6d', i * (j+1)*10000);
        end
    end
    fprintf('\n');
    return;
end
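
The quantity computed inside err, -log10(|log10(computed / exact)|), can be read as the number of decimal digits on which the hardware result and the exact result agree. A minimal Python sketch of the same formula follows. The name decimal_accuracy, returning infinity for an exact match, and rejecting non-positive ratios are assumptions of this sketch; the MATLAB listing applies abs to whatever log10 returns and has no such checks.

```python
import math


def decimal_accuracy(computed, exact):
    """Decimal digits of agreement between a computed and an exact value."""
    ratio = computed / exact
    if ratio <= 0:
        # Sign mismatch or zero result: the real logarithm is undefined.
        raise ValueError("decimal accuracy is undefined for non-positive ratios")
    distance = abs(math.log10(ratio))
    if distance == 0.0:
        return math.inf  # exact match (illustrative convention)
    return -math.log10(distance)
```

For example, approximating pi by 3.14 gives a decimal accuracy of about 3.66, slightly more than the three digits that visibly match.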

function pau_obj = analyze_pau()
    disp('Reading PAU Data');
    %rd_data('pau');
    pau_obj = matfile('pau_data.mat', 'Writable', true);

    disp('Converting PAU Data');
    %conv_data(pau_obj);

    disp('Calculating Exact PAU Data');
    %comp(pau_obj);

    disp('Calculating PAU Error');
    %err(pau_obj);

    %disp('PAU Analysis Complete');
    return;
end

function fpu_obj = analyze_fpu()
    disp('Reading FPU Data');
    %rd_data('fpu');
    fpu_obj = matfile('fpu_data.mat', 'Writable', true);

    disp('Converting FPU Data');
    %conv_data(fpu_obj);

    disp('Calculating Exact FPU Data');
    %comp(fpu_obj);

    disp('Calculating FPU Error');
    %err(fpu_obj);

    disp('FPU Analysis Complete');
    return;
end

function plot_data(fpu, pau)
    x = 1:size(fpu.err, 1);
    titles = ["Addition" "Subtraction" "Multiplication" "Division"];
    path = 'MatthewWagnerGradPaper/figs/';
    figure(1)
    for i = 1:4
        figure(i);
        title(append('Computational Error for ', titles(i)));
        hold on;
        y = sortrows(abs(fpu.err(:,i)));
        plot(x, y, 'r', 'DisplayName', 'FPU')
        y = sortrows(abs(pau.err(:,i)));
        plot(x, y, 'g', 'DisplayName', 'PAU')
        legend;
        xlabel('Computation');
        ylabel('Decimal Error');
        grid on;
        savefig(append(path, titles(i), '.fig'));
        f = gcf;
        exportgraphics(f, append(path, titles(i), '.png'), 'Resolution', 300);
    end
end
IV.2 Common PAU/FPU Blocks
IV.2.1 Leading One Detector
module find_first (
    in, index, scan_in0, scan_en, test_mode, scan_out0);

    // LEADING ONE DETECTOR SIZE CONSTANTS
    parameter width = 8;
    parameter direction = "lead";
    localparam index_width = $clog2(width);

    // INPUTS
    input [width-1:0] in;

    // OUTPUTS
    output reg [index_width:0] index;

    input scan_in0, scan_en, test_mode;
    output scan_out0;

    // INTERNAL VARIABLES
    integer i;
    generate
        // COMBINATIONAL BLOCK
        if (direction == "lead") begin : lod
            always @* begin
                // SET INITIAL/NOT FOUND VALUES
                index[index_width] = 1'b1;
                index[index_width-1:0] = {(index_width){1'b0}};
                // LOOP THROUGH INPUT VECTOR FROM LSB TO MSB
                for (i=0; i < width; i=i+1) begin
                    // CHECK IF INPUT BIT IS SET
                    if (in[i]) begin
                        // CLEAR NOT FOUND BIT
                        index[index_width] = 1'b0;
                        // SET OUTPUT TO CURRENT BIT POSITION





        else begin : tod
            always @* begin
                // SET INITIAL/NOT FOUND VALUES
                index[index_width] = 1'b1;
                index[index_width-1:0] = {(index_width){1'b0}};
                // LOOP THROUGH INPUT VECTOR FROM MSB TO LSB
                for (i=width-1; i >= 0; i=i-1) begin
                    // CHECK IF INPUT BIT IS SET
                    if (in[i]) begin
                        // CLEAR NOT FOUND BIT
                        index[index_width] = 1'b0;
                        // SET OUTPUT TO CURRENT BIT POSITION





    endgenerate
endmodule
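
The behaviour of find_first can be summarised with a small software model. The Python sketch below is an illustration under stated assumptions, not part of the design: the statement that stores the bit position is missing from the listing above, so the model assumes, based on the comment "SET OUTPUT TO CURRENT BIT POSITION", that the loop index i is written to the low bits of index. The function name and its defaults simply mirror the module's parameters.

```python
def find_first(value, width=8, direction="lead"):
    """Software model of the find_first module.

    Returns the position of the highest set bit for direction "lead",
    otherwise the position of the lowest set bit. If no bit is set, only
    the extra "not found" bit (bit index_width) of the result is set.
    """
    index_width = (width - 1).bit_length()   # same value as $clog2(width)
    result = 1 << index_width                # not-found flag set, position 0

    if direction == "lead":
        positions = range(width)             # LSB to MSB: last hit is the MSB
    else:
        positions = range(width - 1, -1, -1) # MSB to LSB: last hit is the LSB

    for i in positions:
        if (value >> i) & 1:
            result = i                       # clears the flag, stores position
    return result
```

Because the last matching iteration wins, scanning upward yields the leading one and scanning downward yields the trailing one, with no early exit needed in the combinational loop.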
IV.2.2 Vedic Multiplier
module vedic_rec (
    a, b, y, scan_in0, scan_en, test_mode, scan_out0);

    // VEDIC MULTIPLIER SIZING CONSTANTS
    parameter width = 4;
    localparam in_width = width;
    localparam out_width = 2*width;

    // INPUTS
    input [width-1:0] a, b;

    // OUTPUTS
    output [2*width-1:0] y;

    input scan_in0, scan_en, test_mode;
    output scan_out0;

    // RECURSIVE GENERATE
    generate
        // 2X2 MULTIPLIER BASE UNIT
        if (width == 2) begin : w_eq_2
            wire [3:0] acc;
            assign acc[0] = a[0] & b[0];
            assign acc[1] = (a[1] & b[0]) ^ (a[0] & b[1]);
            assign acc[2] = (a[0] & b[1]) & (a[1] & b[0]) ^ (a[1] & b[1]);
            assign acc[3] = (a[0] & b[1]) & (a[1] & b[0]) & (a[1] & b[1]);
            assign y = {acc[3], acc[2], acc[1], acc[0]};
        end

        // 3X3 MULTIPLIER BASE UNIT
        else if (width == 3) begin : w_eq_3
            wire [3:0] vout_1, vout_2, vout_3, vout_4, aout_2;
            wire [5:0] aout_1;

            // CREATES 4X4 MULTIPLIER WITH MSB'S SET TO ZERO
            vedic_rec #(2) m1(
                .scan_in0(),
                .scan_en(scan_en),
                .test_mode(test_mode),
                .scan_out0(),
                .a({1'b0, a[2]}),
                .b({1'b0, b[2]}),
                .y(vout_1[3:0]));
            vedic_rec #(2) m2(
                .scan_in0(),
                .scan_en(scan_en),
                .test_mode(test_mode),
                .scan_out0(),
                .a(a[1:0]),
                .b({1'b0, b[2]}),
                .y(vout_2[3:0]));
            vedic_rec #(2) m3(
                .scan_in0(),
                .scan_en(scan_en),
                .test_mode(test_mode),
                .scan_out0(),
                .a({1'b0, a[2]}),
                .b(b[1:0]),
                .y(vout_3[3:0]));
            vedic_rec #(2) m4(
                .scan_in0(),
                .scan_en(scan_en),
                .test_mode(test_mode),
                .scan_out0(),
                .a(a[1:0]),
                .b(b[1:0]),
                .y(vout_4[3:0]));

            // ADD CROSS PRODUCTS
            assign aout_1 = {vout_1[3:0], 2'b0} + {2'b0, vout_2[3:0]};
            assign aout_2 = vout_3[3:0] + {2'b0, vout_4[3:2]};
            assign y = {aout_1[3:0] + aout_2[3:0], vout_4[1:0]};
        end

        // RECURSIVE MULTIPLIER
        else begin : w_gt_3
            wire [in_width-1:0] vout_1, vout_2, vout_3, vout_4, aout_2;
            wire [in_width+(in_width/2)-1:0] aout_1;

            // CREATES HALF-WIDTH MULTIPLIERS
            vedic_rec #(width >> 1) m1(
                .scan_in0(),
                .scan_en(scan_en),
                .test_mode(test_mode),
                .scan_out0(),
                .a(a[in_width-1:in_width>>1]),
                .b(b[in_width-1:in_width>>1]),
                .y(vout_1[in_width-1:0]));
            vedic_rec #(width >> 1) m2(
                .scan_in0(),
                .scan_en(scan_en),
                .test_mode(test_mode),
                .scan_out0(),
                .a(a[(in_width>>1)-1:0]),
                .b(b[in_width-1:in_width>>1]),
                .y(vout_2[in_width-1:0]));
            vedic_rec #(width >> 1) m3(
                .scan_in0(),
                .scan_en(scan_en),
                .test_mode(test_mode),
                .scan_out0(),
                .a(a[in_width-1:in_width>>1]),
                .b(b[(in_width>>1)-1:0]),
                .y(vout_3[in_width-1:0]));
            vedic_rec #(width >> 1) m4(
                .scan_in0(),
                .scan_en(scan_en),
                .test_mode(test_mode),
                .scan_out0(),
                .a(a[(in_width>>1)-1:0]),
                .b(b[(in_width>>1)-1:0]),
                .y(vout_4[in_width-1:0]));

            // ADD CROSS PRODUCTS
            assign aout_1 =
                {vout_1[in_width-1:0], {(in_width>>1){1'b0}}} +
                {{(in_width>>1){1'b0}}, vout_2[in_width-1:0]};

            assign aout_2 =
                vout_3[in_width-1:0] +
                {{(in_width>>1){1'b0}}, vout_4[in_width-1:(in_width>>1)]};

            assign y =
                {aout_1[in_width+(in_width/2)-1:0] + aout_2[in_width-1:0],
                 vout_4[(in_width>>1)-1:0]};
        end
    endgenerate
endmodule
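
The recursion in vedic_rec splits each operand into high and low halves, forms four half-width products, and recombines them with two additions and a concatenation. The Python sketch below models that data flow so it can be checked against ordinary integer multiplication. It is an illustrative model, not generated from the RTL: the function names are hypothetical, and it only handles power-of-two widths (the 3x3 base unit is omitted).

```python
def vedic_2x2(a, b):
    """2x2 base unit, written with the same gate equations as the listing."""
    a0, a1 = a & 1, (a >> 1) & 1
    b0, b1 = b & 1, (b >> 1) & 1
    acc0 = a0 & b0
    acc1 = (a1 & b0) ^ (a0 & b1)
    carry = (a0 & b1) & (a1 & b0)
    acc2 = carry ^ (a1 & b1)
    acc3 = carry & (a1 & b1)
    return (acc3 << 3) | (acc2 << 2) | (acc1 << 1) | acc0


def vedic_multiply(a, b, width):
    """Recursive model for power-of-two widths (an assumption of this sketch)."""
    if width < 2 or width & (width - 1):
        raise ValueError("width must be a power of two and at least 2")
    if width == 2:
        return vedic_2x2(a & 3, b & 3)

    half = width >> 1
    mask = (1 << half) - 1
    a_hi, a_lo = (a >> half) & mask, a & mask
    b_hi, b_lo = (b >> half) & mask, b & mask

    vout_1 = vedic_multiply(a_hi, b_hi, half)   # high x high
    vout_2 = vedic_multiply(a_lo, b_hi, half)   # low  x high
    vout_3 = vedic_multiply(a_hi, b_lo, half)   # high x low
    vout_4 = vedic_multiply(a_lo, b_lo, half)   # low  x low

    # Same recombination as the "ADD CROSS PRODUCTS" assignments.
    aout_1 = (vout_1 << half) + vout_2
    aout_2 = vout_3 + (vout_4 >> half)
    return ((aout_1 + aout_2) << half) | (vout_4 & mask)
```

Checking all 256 operand pairs at width 4 against a * b is a quick way to confirm that the recombination matches the product high*2^width + (cross terms)*2^(width/2) + low.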
IV.2.3 Integer Divider
module divider (
    clk, reset, en, done, num, den, q, r,
    scan_in0, scan_en, test_mode, scan_out0);

    // DIVIDER SIZING PARAMETER
    parameter width = 25;

    // INPUTS
    input clk, reset, en;
    input [width-1:0] num, den;

    // OUTPUTS
    output reg done;
    output reg [width-1:0] q, r;

    input scan_in0, scan_en, test_mode;
    output scan_out0;

    // LOOP COUNTER
    integer cnt;

    // INTERNAL VARIABLES
    reg [width-1:0] d;
    wire [width:0] s;

    assign s = {1'b0, r[width-1:0]} - {1'b0, d[width-1:0]};

    always @(posedge clk or posedge reset) begin
        if (reset) begin
            d <= 0;
            q <= 0;
            r <= 0;
            cnt <= 0;
            done <= 0;
        end
        else if (!en) begin
            d <= 0;
            q <= 0;
            r <= 0;
            cnt <= 0;
            done <= 0;
        end
        else begin
            if (!cnt) begin
                r <= num[width-1:0];
                q <= 0;
                d <= den;
                cnt <= cnt + 1;
            end
            else if (cnt < width) begin
                if (r == 0) begin
                    done <= 1;
                    cnt <= width;
                end
                else if (s[width]) begin
                    r[width-1:0] <= {r[width-2:0], q[width-1]};
                    q[width-1:0] <= q[width-1:0] << 1;
                    cnt <= cnt + 1;
                end
                else begin
                    r[width-1:0] <= {s[width-2:0], q[width-1]};
                    q[width-1:0] <= {q[width-2:0], 1'b1};
                    cnt <= cnt + 1;
                end
            end
            else begin
                done <= 1;
            end
        end
    end
endmodule
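
The register updates in the divider are easier to follow in software. The Python sketch below is a behavioural model written for that purpose, not part of the design: divider_model is a hypothetical name, and the for loop stands in for the clocked cnt counter, with one iteration per enabled clock after the load cycle. As read from the listing, each step compares r with d, shifts one quotient bit into q, and shifts the (possibly reduced) remainder left, so q collects the bits of num/den most significant first and the loop stops early once the remainder reaches zero.

```python
def divider_model(num, den, width=25):
    """Step-by-step model of the shift-and-subtract divider registers.

    Returns the final (q, r) register values once 'done' would be asserted.
    """
    mask = (1 << width) - 1
    r, q, d = num & mask, 0, den & mask      # load cycle (cnt == 0)

    for _ in range(1, width):                # cnt = 1 .. width-1
        if r == 0:
            break                            # early exit: remainder exhausted
        q_msb = (q >> (width - 1)) & 1       # old q[width-1] is shifted into r
        if r < d:                            # borrow: s[width] is set
            r = ((r << 1) | q_msb) & mask
            q = (q << 1) & mask
        else:                                # subtract succeeded
            r = (((r - d) << 1) | q_msb) & mask
            q = ((q << 1) | 1) & mask
    return q, r
```

For example, with width = 8, dividing 6 by 4 stops after two quotient bits with q = 3 (binary 1.1, that is 1.5), while dividing 1 by 3 runs all seven steps and gives q = 21, which equals floor(2^6 / 3). Note that after an early exit q holds fewer quotient bits, so its alignment differs from the full-length case.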
