The floating point unit is an integral unit of advanced processors. Arithmetic operations on floating point numbers are quite complicated. The numbers are represented in IEEE 754 format in either a 32-bit format (single precision) or a 64-bit format (double precision). They are extensively used in high end processors for applications such as mathematical analysis and formulation, signal processing, etc. This paper describes in detail the computation of addition, subtraction and multiplication on floating point numbers. The unit has been designed using VHDL. The design has been simulated and synthesized to identify the area occupied and its performance in terms of delay.
I. INTRODUCTION
The real numbers may be described informally as numbers that can be given by an infinite decimal representation, such as 2.48717733398724433.... They include both rational numbers, such as 56 and −23/129, and irrational numbers, such as π and the square root of 2, and can be represented as points along an infinitely long number line. Real numbers can be given a fixed point or a floating point representation. Computation on floating point numbers needs advanced processing techniques. Advanced processors have a dedicated floating point unit capable of performing arithmetic operations on real numbers in single precision (32-bit format) [1] or double precision (64-bit format). Floating point notation has the following form [2]:
$n = b^{e} \times m$
where n = the number to be represented, b = base, e = exponent, m = mantissa.
The value of b is 2 for binary numbers, 8 for octal numbers, 10 for decimal numbers and 16 for hexadecimal numbers.
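As a small illustration of this notation for b = 2, the following Python sketch splits a number into a mantissa and an exponent using the standard library function `math.frexp`. The helper name `decompose_base2` is hypothetical. Note that `math.frexp` scales the mantissa to the range 0.5 ≤ |m| < 1, which differs from the 1.f normalization used by IEEE 754.

```python
import math

def decompose_base2(n: float):
    """Return (m, e) such that n == 2**e * m (base b = 2)."""
    m, e = math.frexp(n)  # for nonzero n, 0.5 <= |m| < 1
    return m, e

m, e = decompose_base2(6.5)
print(m, e)        # mantissa and exponent of 6.5 in base 2
print(2**e * m)    # reconstructs the original number
```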
In floating point arithmetic [3], the user can round the results of computations as required, but IEEE standard 754 defines rounding rules that lead to the same result for the same computation. This prevents different computations from producing different results for the same input.
Single precision 32-bit floating point format
The 32-bit floating point representation defined by IEEE follows the layout shown in Fig. 1.
Fig. 1: IEEE 754 standard for single precision representation
The most significant bit is on the left. The number represented in the single precision format is
$\text{Value} = (-1)^{s} \times 2^{e} \times 1.f$ (normalized) when E > 0, else
$\text{Value} = (-1)^{s} \times 2^{-126} \times 0.f$ (denormalized)
where
$f = b_{22} 2^{-1} + b_{21} 2^{-2} + \dots + b_{i} 2^{i-23} + \dots + b_{0} 2^{-23}$, with each $b_{i}$ = 1 or 0
s = sign (0 is positive; 1 is negative)
E = biased exponent; Emax = 255, Emin = 0. E = 255 and E = 0 are used to represent special values.
e = unbiased exponent; e = E − 127 (bias)
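The field layout and value formula can be checked with a short software model. The following Python sketch is only an illustration and is not part of the VHDL design; the function names `float_to_bits` and `decode_single` are hypothetical, and the special values with E = 255 are deliberately not evaluated.

```python
import struct

def float_to_bits(x: float) -> int:
    """Return the 32-bit single precision pattern of x as an integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

def decode_single(bits: int):
    """Split a 32-bit pattern into (s, E, fraction bits, value)."""
    s = (bits >> 31) & 0x1          # sign bit
    E = (bits >> 23) & 0xFF         # biased exponent, 8 bits
    f_bits = bits & 0x7FFFFF        # 23 fraction bits
    f = f_bits / 2**23              # fraction as a value in [0, 1)
    if E == 255:
        value = None                # infinity / NaN: not modeled here
    elif E > 0:
        value = (-1)**s * 2.0**(E - 127) * (1 + f)   # normalized: 1.f
    else:
        value = (-1)**s * 2.0**(-126) * f            # denormalized: 0.f
    return s, E, f_bits, value

print(decode_single(float_to_bits(-6.5)))
```

For −6.5 the model reports s = 1, E = 129 (e = 2) and a fraction of 0.625, that is −1.625 × 2^2.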
A bias of 127 is added to the actual exponent so that negative exponents can be represented without a sign bit. For example, if the value 105 is stored in the exponent field, the actual exponent is −22 (105 − 127). Also, the leading bit before the binary point is implicit and is 1 or 0 depending on the exponent, which saves one bit. After the arithmetic computation of a number it is required to
Exceptions in floating point Unit
Various exceptions are defined by IEEE standard 754, which help in implementing the arithmetic at the hardware level [5]. These exceptions are listed below:
Invalid operations: Some arithmetic operations are invalid, such as the division of zero by zero or the square root of a negative number. The result of an invalid operation shall be a NaN. There are two types of NaN, quiet NaN (QNaN) and signaling NaN (SNaN). They have the following format, where s is the sign bit:
QNaN = s 11111111 10000000000000000000000
SNaN = s 11111111 00000000000000000000001
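The two NaN patterns can be told apart by inspecting the exponent and fraction fields. The Python sketch below assumes the common convention, consistent with the patterns above, that the most significant fraction bit distinguishes a quiet NaN from a signaling NaN; the function name `classify` and the constant names are hypothetical.

```python
QNAN_BITS = 0x7FC00000  # s = 0, E = 11111111, f = 100...0
SNAN_BITS = 0x7F800001  # s = 0, E = 11111111, f = 000...1

def classify(bits: int) -> str:
    """Classify a 32-bit pattern as finite, infinity, qnan or snan."""
    E = (bits >> 23) & 0xFF
    f = bits & 0x7FFFFF
    if E != 255:
        return "finite"
    if f == 0:
        return "infinity"
    # Assumption: the top fraction bit set marks a quiet NaN.
    return "qnan" if f & 0x400000 else "snan"

print(classify(QNAN_BITS), classify(SNAN_BITS))
```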
Division by Zero
The division of a number (except zero) by zero gives infinity as a result. However, other arithmetic operations such as addition or multiplication may also give infinity as a result. Therefore, to differentiate between the two cases, a divide-by-zero exception was implemented. Other exceptions defined by the IEEE standard are inexact, underflow, overflow, infinity and zero. The rounding modes used are round-to-nearest-even, round-to-zero, round-up and round-down.
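The four rounding modes differ only in how the discarded low-order bits affect the kept bits. The following Python sketch models this on an unsigned integer significand; it is an illustration under stated assumptions, not the rounding logic of the VHDL design. The function name `round_significand` and the mode strings are hypothetical, and a carry out of the kept bits is left to the caller to renormalize.

```python
def round_significand(sig: int, drop: int, mode: str, negative: bool = False) -> int:
    """Drop the `drop` low bits of the magnitude `sig` using rounding `mode`."""
    kept = sig >> drop
    rem = sig & ((1 << drop) - 1)      # the bits being discarded
    if rem == 0:
        return kept                    # exact, nothing to round
    half = 1 << (drop - 1)
    if mode == "nearest_even":
        # Round up above the halfway point; on a tie pick the even result.
        if rem > half or (rem == half and kept & 1):
            kept += 1
    elif mode == "to_zero":
        pass                           # plain truncation of the magnitude
    elif mode == "up":                 # toward +infinity
        if not negative:
            kept += 1
    elif mode == "down":               # toward -infinity
        if negative:
            kept += 1
    else:
        raise ValueError(f"unknown rounding mode: {mode}")
    return kept

for mode in ("nearest_even", "to_zero", "up", "down"):
    print(mode, round_significand(0b1011, 2, mode))
```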
II. ARITHMETIC OPERATIONS
Addition / Subtraction [4]: Addition and subtraction follow a similar procedure, so a single unit is used for both operations. Table 1 shows an example of two operands considered for the computation. For subtraction the same procedure is followed, except that the fraction parts are subtracted. Fig. 2 shows the flow chart for addition and subtraction of floating point numbers.
Fig. 2: Flow chart for addition / subtraction

Multiplication:
A separate block is to be provided for the multiplication of the floating point numbers.
Steps followed for the process of multiplication are as follows:
2. The hidden bit is set to '1' and three zeros are appended at the end, which helps to prevent the loss of data during rounding and shifting.
3. Checks which exponent is larger and finds their difference.
4. Sends the larger exponent as the output.
5. Shifts the fraction of the operand with the smaller exponent to the right.
6. Sends the fractions to the add / sub unit.
Fig. 6 shows the pre-normalize unit, and its inputs/outputs are described in Table 2. The post-normalization unit is shown in Fig. 8, and Table 4 describes its inputs/outputs.
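The alignment steps of the pre-normalize unit for addition / subtraction can be sketched in software as follows. This Python model is a minimal illustration assuming normalized operands; the function and constant names are hypothetical, and bits shifted out past the three appended zeros are simply dropped here, which may differ from the VHDL design.

```python
HIDDEN_BIT = 1 << 23   # implicit leading '1' of a normalized significand
EXTRA_BITS = 3         # zeros appended below the 23-bit fraction

def pre_normalize_addsub(exp_a: int, frac_a: int, exp_b: int, frac_b: int):
    """Align two operands; return (larger exponent, significand a, significand b)."""
    # Restore the hidden bit and append three zero bits.
    sig_a = (HIDDEN_BIT | frac_a) << EXTRA_BITS
    sig_b = (HIDDEN_BIT | frac_b) << EXTRA_BITS
    # Find the larger exponent and shift the other significand right
    # by the exponent difference.
    if exp_a >= exp_b:
        exp_out = exp_a
        sig_b >>= exp_a - exp_b
    else:
        exp_out = exp_b
        sig_a >>= exp_b - exp_a
    return exp_out, sig_a, sig_b

# 2.0 (E = 128) and 1.0 (E = 127): the second significand moves right by one.
print(pre_normalize_addsub(128, 0, 127, 0))
```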
It performs the normalization as follows:
1. Counts the number of leading zeros in the fraction part, starting from the hidden bit.
2. Decrements the exponent by the same number.
3. Left-shifts the fraction by the same number of bits.
4. Makes the hidden bit finally '1'.
5. Takes the rounding mode decision depending on the mode_in_fpu input to the main entity and rounds the fraction part.
6. Truncates the fraction part.
7. Sends the outputs to the exception unit.
8. Checks whether any data has been lost during rounding.
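The normalization steps can be modeled with a short Python sketch. It assumes a 27-bit significand (hidden bit, 23 fraction bits and three extra low bits) with no carry out of the hidden-bit position, uses truncation in place of the selectable rounding mode, and does not handle exponent underflow. The function name `post_normalize` and its parameters are hypothetical.

```python
def post_normalize(exp: int, sig: int, width: int = 27, extra: int = 3):
    """Return (exponent, 23-bit fraction, data_lost) for a `width`-bit significand."""
    if sig >= (1 << width):
        raise ValueError("significand wider than expected")
    if sig == 0:
        return 0, 0, False                    # exact zero result
    zeros = width - sig.bit_length()          # leading zeros from the hidden bit
    sig <<= zeros                             # hidden bit becomes '1'
    exp -= zeros                              # exponent follows the shift
    lost = (sig & ((1 << extra) - 1)) != 0    # nonzero extra bits are discarded
    frac = (sig >> extra) & 0x7FFFFF          # truncate and drop the hidden bit
    return exp, frac, lost

# A significand with two leading zeros is shifted left by two places.
print(post_normalize(127, 1 << 24))
```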
Fig. 8: Post Normalization Unit
Pre-Normalize unit for multiplication:
The input operands to the FPU main entity are fed to the pre-normalization unit shown in Fig. 9 and are described in 
Multiplication Unit
It takes the fraction part from the pre-normalize unit for multiplication as the input and gives the product as the output, as shown in Fig. 10 and described in Table 6.
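For orientation, the following Python sketch models a complete single precision multiplication of two normalized operands. It is an assumption-laden illustration, not the VHDL data path: it uses the standard scheme of XORing the signs, adding the biased exponents and subtracting the bias, and multiplying the 24-bit significands; results are truncated, and zero, denormalized, infinite and NaN operands as well as exponent overflow and underflow are not handled. All function names are hypothetical.

```python
import struct

def f2b(x: float) -> int:
    """Float to 32-bit single precision pattern."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

def b2f(bits: int) -> float:
    """32-bit single precision pattern to float."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

def fmul_sketch(a_bits: int, b_bits: int) -> int:
    sa, ea, fa = a_bits >> 31, (a_bits >> 23) & 0xFF, a_bits & 0x7FFFFF
    sb, eb, fb = b_bits >> 31, (b_bits >> 23) & 0xFF, b_bits & 0x7FFFFF
    sign = sa ^ sb                       # sign of the product
    exp = ea + eb - 127                  # add exponents, remove one bias
    # 24-bit x 24-bit significands give a 48-bit product in [1, 4).
    prod = ((1 << 23) | fa) * ((1 << 23) | fb)
    if prod & (1 << 47):                 # product >= 2: renormalize
        exp += 1
        frac = (prod >> 24) & 0x7FFFFF   # truncate the low bits
    else:
        frac = (prod >> 23) & 0x7FFFFF
    return (sign << 31) | (exp << 23) | frac

print(b2f(fmul_sketch(f2b(1.5), f2b(-2.5))))
```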
Post-normalization unit for multiplication
It is shown in Fig. 11 and is described in Table 7. It performs the following functions:
1. Counts the number of zeros starting from the left.
2. Decrements the value of the exponent accordingly.
3. Shifts the fraction part to the left by the number of zeros.
4. Rounds the result depending on the mode_in_fpu signal of the FPU unit.
5. Truncates the fraction part.
6. Checks whether there is any loss of data.
7. Sends the sign bit, exponent, fraction part and information about the loss of data to the exception unit.
Inputs to FPU
Two 32-bit operands in IEEE-754 floating point format, along with the opcode, rounding mode select and clock, are given as inputs to the FPU. The table below shows the inputs and their functions.
Fig. 12 Exception Unit
FPU Unit
The interface of the top level entity is shown in Fig. 12 and is described in Table 9.1 and Table 9.2.
IV. SIMULATION RESULTS
The design has been simulated and synthesized using Xilinx ISE Design Suite 13.1. It has been synthesized on a Virtex-5 FPGA. Fig. 13(a), (b), (c) and (d) show the simulated waveforms for the pre-normalization, addition / subtraction and post-normalization units and the output of addition from the FPU top module. Table 10 shows the hardware requirement of the design.
V. CONCLUSION
In this paper, a floating point unit has been designed, simulated and synthesized in order to obtain its performance in terms of area occupied and delay on a Virtex-5 FPGA. For the data path opb_in_fpu to opb_in_sig_0, the total combinational logic and routing delay is 1.154 ns, and the total overflow to overflow delay is 3.259 ns. Hardware requirements have also been specified in the paper. The pre-normalization and post-normalization units of the FPU can be further optimized to reduce the hardware requirement as well as the delay.
