Abstract: Traditional computer systems often suffer from roundoff error and catastrophic cancellation in floating point computations. These systems produce apparently high precision results with little or no indication of the accuracy. This paper presents hardware designs, arithmetic algorithms, and software support for a family of variable-precision, interval arithmetic processors. These processors give the programmer the ability to detect and, if desired, to correct implicit errors in finite precision numerical computations. They also provide the ability to solve problems that cannot be solved efficiently using traditional floating point computations. Execution time estimates indicate that these processors are two to three orders of magnitude faster than software packages that provide similar functionality.
INTRODUCTION
The number of transistors on a processor doubles approximately every 18 months, which corresponds to an annual increase of around 60 percent [1]. The rapid increase in transistor count gives processor hardware the ability to perform operations that previously were only supported in software. An example of this is found in recent processors that feature hardware and instruction set support for operations commonly used in multimedia applications [2], [3].
In many applications, such as scientific computing, the large number of arithmetic operations and the reliance placed on the results make it extremely important to provide accurate and reliable numerical computations. Unfortunately, roundoff error and catastrophic cancellation can quickly lead to results that are completely inaccurate [4] , [5] . Consequently, an efficient method is needed for monitoring and controlling errors in floating point computations. Most computer systems, however, do not provide this capability. Furthermore, as the complexity of applications and the number of floating point operations increase, traditional methods for error analysis become infeasible.
Interval arithmetic provides an efficient method for monitoring errors in floating point computations, by producing two values for each result [6] . The two values correspond to the lower and upper endpoints of an interval such that the true result is guaranteed to lie on this interval. The width of the interval, i.e., the distance between the two endpoints, indicates the accuracy of the result. In addition to monitoring roundoff errors, interval arithmetic can also be used to monitor the effects of approximation errors and errors that occur due to nonexact inputs [7] , [8] .
When performing interval arithmetic on a computer, one or both of the interval endpoints may be nonrepresentable. In this case, the interval endpoints are computed by outward rounding. Outward rounding requires that the lower endpoint be rounded toward negative infinity and the upper endpoint be rounded toward positive infinity. Outward rounding ensures that the resulting interval encloses the true result. Although naive application of interval arithmetic can lead to wide intervals, efficient algorithms that produce intervals with narrow widths have recently been developed [9] , [10] .
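Outward rounding can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it works on IEEE doubles rather than variable-precision numbers, and it emulates directed rounding by stepping each endpoint one representable value outward with `math.nextafter` (Python 3.9 or later). The helper name `add_outward` is hypothetical. Hardware with true directed rounding would produce tighter endpoints.

```python
import math
from fractions import Fraction

def add_outward(x, y):
    """Add two intervals (lo, hi) of floats and widen the result outward."""
    lo = x[0] + y[0]
    hi = x[1] + y[1]
    # Round-to-nearest is off by at most half an ulp, so moving each endpoint
    # one representable value outward always encloses the exact sum.
    return (math.nextafter(lo, -math.inf), math.nextafter(hi, math.inf))

tenth = (0.1, 0.1)
fifth = (0.2, 0.2)
lo, hi = add_outward(tenth, fifth)

# Exact sum of the two stored doubles, computed with rational arithmetic.
exact = Fraction(0.1) + Fraction(0.2)
encloses = Fraction(lo) <= exact <= Fraction(hi)
print(lo, hi, encloses)
```

The enclosure check uses exact rationals so that the test itself is free of roundoff.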
Variable-precision, interval arithmetic improves the accuracy and efficiency of conventional interval arithmetic. With variable-precision, interval arithmetic, each interval endpoint is represented using a variable-precision floating point number. This allows the precision of the computation to be varied based on the problem to be solved and the required accuracy of the results. Thus, problems that are numerically stable and require low accuracy can use low precision arithmetic, which is relatively fast, while problems that are numerically unstable or require high accuracy can use higher precision arithmetic, which is slower. Variable-precision, interval arithmetic can help ensure that intervals remain narrow, even for problems that are numerically unstable [11] .
In addition to its ability to monitor and control errors in numerical computations, variable-precision, interval arithmetic can also be used to solve problems that cannot be efficiently solved using traditional floating point computations. It has been extremely successful in solving systems of nonlinear equations, determining eigenvalues and eigenvectors of matrices, finding roots of functions, and performing global optimization. These and other applications of interval arithmetic and variable-precision, interval arithmetic are discussed in [10] , [12] .
To overcome the numerical limitations of existing computer systems, several software implementations for accurate and reliable arithmetic have been developed [13]. These tools include variable-precision interval arithmetic packages [14], interval arithmetic libraries [15], interval-enhanced compilers [16], and language extensions for scientific computations [17], [18], [19]. The main disadvantage of software implementations is their slow speed. They incur tremendous overhead due to function calls, memory management, error and range checking, changing rounding modes, and exception handling. Typically, software implementations are one to four orders of magnitude slower than functionally equivalent floating point implementations [13].
Hardware implementations help overcome the speed disadvantage of software realizations [13]. Previous research in this area includes the design of accurate dot product processors [20], [21], functional units for rational arithmetic [22], variable-precision integer processors [23], and variable-precision floating point processors [24]. Although these processors help improve the accuracy of the computation, they do not provide an efficient method to determine how accurate the results are. More recently, combined interval and floating point units have been developed [25], [26]. These units provide high-speed interval arithmetic operations, but they do not allow the precision of the operands to vary.
This paper presents a family of variable-precision, interval arithmetic processors. These processors allow the programmer to specify the precision of the computation, determine the accuracy of the results, and recompute inaccurate results with higher precision. Section 2 discusses the hardware design and implementation of three variable-precision, interval arithmetic processors. Section 3 presents the arithmetic algorithms employed by the processors. Section 4 discusses the software interface to the processors and gives a sample variable-precision interval arithmetic program. Conclusions are presented in Section 5. This paper is an extension of the research presented in [27], [28]. More detailed information about the processors is given in [29].
PROCESSOR ARCHITECTURE
This section gives an overview of the number representation and hardware design for the variable-precision, interval arithmetic processors (VPIAPs). The hardware is designed to handle the common case quickly while still providing correct results and acceptable performance when extremely high precision is required. In the following discussion, the data path size (i.e., the number of bits per word in the significand of the variable-precision numbers) is denoted by $m$ and a VPIAP with a data path of $m$ bits is referred to as an $m$-bit VPIAP. For the family of processors presented in this paper, $m$ = 16, 32, or 64.
Number Representation
The format for variable-precision numbers is shown in Fig. 1. Intervals are represented by two variable-precision numbers that correspond to the interval endpoints. Each variable-precision number consists of a 16-bit exponent field ($E$), a sign bit ($S$), a 2-bit type field ($T$), a 5-bit significand length field ($L$), and a significand ($F$) composed of $L + 1$ significand words ($F_0$ to $F_L$). The exponent, sign, type, and length fields make up the header word of the variable-precision number. The exponent is represented as a two's complement integer. The sign bit is zero if the number is positive and one if it is negative. The type field indicates if a number is normalized, infinite, zero, or not-a-number. The length field specifies the number of $m$-bit words in the significand. The words of the significand are stored from most significant ($F_0$) to least significant ($F_L$). The significand is normalized between 1/2 and 1. The value of a variable-precision floating point number is
\[ VP = (-1)^S \times F \times 2^E. \tag{1} \]
The format for IEEE double-precision numbers [30] is shown in Fig. 2. IEEE double-precision numbers consist of a sign bit ($S$), an 11-bit exponent ($E$), and a 52-bit significand ($F$). The exponent is represented with a bias of 1,023. A normalized IEEE double-precision number has a significand between 1 and 2 and uses a hidden one to represent the leading bit. The value of a normalized IEEE double-precision number is $DP = (-1)^S \times 1.F \times 2^{E-1023}$. Normalized IEEE double-precision floating point numbers have a precision of 53 bits and their range is approximately $[2^{-1022}, 2^{1024}] \approx [10^{-307}, 10^{308}]$.
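The field description above can be turned into a small decoder. The sketch below is illustrative only: the function name `vp_value` is hypothetical, and the weight assigned to each significand word, $2^{-m(i+1)}$ for word $F_i$, is an assumption that follows from storing the words most significant first with the significand normalized to the range 1/2 to 1.

```python
from fractions import Fraction

def vp_value(sign, exponent, words, m):
    """Exact value of a normalized variable-precision number.

    sign     -- 0 for positive, 1 for negative (field S)
    exponent -- signed integer exponent (field E)
    words    -- significand words F_0..F_L, most significant first
    m        -- number of bits per significand word
    """
    f = Fraction(0)
    for i, w in enumerate(words):
        # Assumed weighting: word i contributes w * 2^(-m(i+1)).
        f += Fraction(w, 1 << (m * (i + 1)))
    assert Fraction(1, 2) <= f < 1, "significand must lie in [1/2, 1)"
    return (-1) ** sign * f * Fraction(2) ** exponent

# 0.75 * 2^3 stored in two 16-bit significand words.
print(vp_value(0, 3, [0xC000, 0x0000], 16))  # prints 6
```

Using `Fraction` keeps the decoded value exact for any number of significand words.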
Hardware Design
A block diagram of the data path of the variable-precision, interval arithmetic processor is shown in Fig. 3 . The significand and header word data paths are depicted as bold and plain lines, respectively. The main functional units of the processor are the register file, an m-bit by m-bit multiplier, a 2m-bit adder, a 2m-bit operand selector, a 64-word by m-bit long accumulator, a 2m-bit shifter, a 24-bit header word generator, and a 64-bit memory interface unit.
A detailed description of each of the functional units and the processor's instruction set architecture is given in [29] . The register file consists of two memory units: a 64-word by 24-bit header memory and a 256-word by m-bit significand memory. The header word memory stores the exponent, sign, type, and length of the variable-precision numbers. The significand memory stores the significand words of the variable-precision numbers from most significant to least significant. The significand memory has two read ports and two write ports and the header memory has two read ports and one write port. The register file allows up to 64 variable-precision numbers or 32 variable-precision intervals to be stored.
Significand words that are read from the register file go into the operand selector or the multiplier. The operand selector determines which values go into the adder, the shifter, and the memory interface unit. The multiplier takes two $m$-bit significand words as inputs and computes their $2m$-bit product. The adder takes two $2m$-bit numbers and a carry-in bit as inputs and produces a $2m$-bit sum and a carry-out bit. The shifter takes a $2m$-bit number and shifts it by up to $m - 1$ bits to the right or to the left. It contains intermediate storage to allow variable-precision numbers to be shifted.
The long accumulator stores intermediate variable-precision results. It functions as a long fixed point register that limits roundoff error in variable-precision arithmetic operations. The long accumulator consists of a 64-word by $m$-bit dual-port RAM, and rounding and normalization logic. Temporary variable-precision values are stored in the dual-port RAM, which has one read port and one write port. Values are written to the RAM from the adder. Values read from the RAM either go directly into the register file or are fed back into the operand selector.
The header word generator forms the header words of the variable-precision numbers. It consists of an exponent adder and logic for producing the type, sign, and length of the variable-precision result. It also allows two variable-precision numbers to be compared, based on their exponents, signs, and types. Outputs from the header word generator go to the header word memory or to the memory interface unit.
The memory interface unit reads data and instructions from memory and writes data to memory. It also performs conversions between IEEE double-precision numbers and variable-precision floating point numbers. Outputs from the memory interface unit go to the header word generator, the adder, or memory.
When an IEEE double-precision number is converted to a variable-precision number, the exponent and significand are checked to see if the floating point number is zero, not-a-number, infinity, normalized, or denormalized. This information is used to set the type field of the variable-precision number. If the number is normalized or denormalized, the exponent and significand of the variable-precision number are set based on the exponent and significand of the IEEE double-precision number, as described in [29]. A similar process is used to convert a variable-precision number to an IEEE double-precision number [29]. If the variable-precision number cannot be exactly represented as an IEEE double-precision number, it is rounded based on bits in the processor status and control register. In this case, it is also necessary to test for overflow and underflow.
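A software analogue of the double-to-variable-precision conversion might look as follows. This is a sketch, not the circuit described in [29]: the function name `double_to_vp`, the string type tags, and the choice to left-justify the 53 significand bits in the smallest whole number of $m$-bit words are all assumptions. It relies on `math.frexp`, which returns a fraction in the range 1/2 to 1 and so matches the normalization used by the variable-precision format; denormalized inputs therefore come out normalized.

```python
import math

def double_to_vp(x, m=32):
    """Split a float into (type, sign, exponent, significand words)."""
    sign = 1 if math.copysign(1.0, x) < 0 else 0
    if math.isnan(x):
        return ("nan", sign, 0, [])
    if math.isinf(x):
        return ("infinite", sign, 0, [])
    if x == 0.0:
        return ("zero", sign, 0, [])
    # abs(x) == frac * 2**exponent with 0.5 <= frac < 1, also for denormals.
    frac, exponent = math.frexp(abs(x))
    nwords = -(-53 // m)                      # words needed for 53 bits
    # frac * 2^53 is an exact integer; left-justify it in nwords * m bits.
    scaled = int(frac * (1 << 53)) << (nwords * m - 53)
    mask = (1 << m) - 1
    words = [(scaled >> (m * (nwords - 1 - i))) & mask for i in range(nwords)]
    return ("normalized", sign, exponent, words)

print(double_to_vp(6.0, 16))
```

For 6.0 with 16-bit words, the fraction is 0.75 and the exponent is 3, so the first significand word is `0xC000` and the remaining three are zero.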
Area, Delay, and Transistor Count Estimates
This section gives area, delay, and transistor count estimates for three VPIAPs with data paths of 16, 32, and 64 bits. Although these estimates are for a particular technology, they are useful in comparing the three VPIAPs to one another and determining which hardware components are likely to contribute most to the area and critical delay path of the VPIAPs. Table 1 gives the hardware requirements for the three VPIAPs. For each functional unit, the number of words is denoted by w and the number of bits is denoted by b. For example, the long accumulator for the 32-bit VPIAP consists of 64 words, each of which is 32 bits.
Area estimates are given in Table 2, based on data from a 1.0 micron CMOS standard cell library [31]. The estimates for the multiplier assume that multiplication is implemented using a Reduced Area Multiplier [32], [33]. The area of each functional unit is estimated by calculating the total size of the macrocells, e.g., AND gates, full adders, half adders, etc., that make up the functional unit and then adding an additional 50 percent for internal wiring. The total area is estimated as the sum of the functional unit areas plus an additional 60 percent for buses and unused space. The total estimated areas for the 16-bit, 32-bit, and 64-bit VPIAPs are 58 mm², 102 mm², and 212 mm², respectively. In comparison, an IEEE double-precision processor in the same technology has an estimated area of 101 mm² [28].
Delay and cycle time estimates are given in Table 3. The delay of each functional unit is computed by taking the worst case delay of the critical path and adding 25 percent for process variations and clock skew. The multipliers use a two cycle pipeline. In the first cycle, the partial products are generated and reduced to two numbers. In the second cycle, these two numbers are added together to produce the product. Since each design is pipelined, the cycle time is the sum of the multiplier reduction delay and the latch delay. The cycle times for the 16-bit, 32-bit, and 64-bit VPIAPs are 12.5 ns, 16.0 ns, and 22.0 ns, respectively. An IEEE double-precision processor in the same technology has an estimated cycle time of 20.0 ns [28].
Transistor count estimates for the VPIAPs are shown in Table 4. These values are calculated from the macrocell schematics and related data given in [31]. The total number of transistors for the 16-bit, 32-bit, and 64-bit VPIAPs are 122,734, 211,252, and 437,512, respectively. An IEEE double-precision processor in the same technology with the hardware components described in [28] requires 205,466 transistors. Relative to the total number of transistors on modern microprocessors, this is a small percentage. For example, the PowerPC 620 and the Digital 21164 described in [1] have approximately 6.9 and 9.3 million transistors, respectively. Adding the 32-bit VPIAP to the PowerPC 620 would result in an increase in transistor count of only 3 percent. Although this would result in only a modest increase in area, the overall increase to the cost of the processor would be larger due to additional costs for designing and testing the combined systems.
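The quoted percentage can be checked directly from the transistor counts given above. The second ratio, for the Digital 21164, is not stated in the text and is derived here by the same arithmetic:

\[
\frac{211{,}252}{6.9 \times 10^{6}} \approx 0.031,
\qquad
\frac{211{,}252}{9.3 \times 10^{6}} \approx 0.023 .
\]

That is, the 32-bit VPIAP would add about 3 percent to the PowerPC 620 and a little over 2 percent to the Digital 21164.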
Incorporating a VPIAP in a real data processing architecture environment requires additional resources. The amount of resources required and the overall performance achieved depend on the design of the original system and how the VPIAP is interfaced with the central processing unit (CPU). For example, if the VPIAP is implemented as part of the CPU, then the CPU needs to be able to recognize VPIAP instructions and transfer them to the VPIAP for processing, and buses are required to transfer data between the VPIAP and the CPU. Based on the area estimates, it is reasonable to assume that adding a 32-bit VPIAP to the CPU will require approximately the same amount of area as adding an IEEE double precision floating point processor.
Although the addition of a VPIAP to the main CPU will improve the performance of variable-precision, interval arithmetic programs, it may decrease the performance of the CPU for regular programs due to the need for more complex control logic. The impact on regular programs depends on the CPU architecture and implementations, as well as the types of programs being executed. Also, the design of the VPIAP may need to be modified to allow its clock rate to match the clock rate of the CPU. If the VPIAP is implemented on a separate chip from the CPU, it is likely to need its own bus interface unit and memory cache. This approach will also decrease the performance for variable-precision, interval arithmetic programs due to the overhead needed for transferring data between the CPU and the VPIAP.
VARIABLE-PRECISION INTERVAL ARITHMETIC ALGORITHMS
This section gives an overview of the hardware algorithms for variable-precision, interval arithmetic. A complete description of each of these algorithms is given in [29]. For the variable-precision arithmetic operations, the two operands are denoted by $A$ and $B$, with significands $F_A$ and $F_B$ and exponents $E_A$ and $E_B$, respectively. For variable-precision, interval arithmetic operations, the intervals are $X = [x_l, x_u]$ and $Y = [y_l, y_u]$. Intervals are stored in the register file using a pair of consecutive header words, with the header word for the lower endpoint stored first.
Addition and Subtraction
To perform variable-precision floating point addition, $E_A$ and $E_B$ are compared using the header word generator. The operand with the larger exponent has its significand words written into the long accumulator. If it is assumed that $E_B \ge E_A$, the words of $F_B$ are read two at a time from the significand memory, passed through the operand selector and adder, and written into the long accumulator. In the subsequent cycles, $F_A$ is added to the long accumulator, using a series of $2m$-bit additions in which the carry-out of the $i$th addition is the carry-in of the $(i+1)$th addition. To accomplish these additions, the words from $F_A$ are read from the significand memory two at a time, passed through the operand selector, shifted by an appropriate amount, added to two words from the long accumulator, and then written back to the long accumulator. The difference between $E_A$ and $E_B$ is used to select the appropriate words from the long accumulator and to determine the number of bits that each word of $F_A$ is shifted. If addition is performed on operands with different signs, or subtraction is performed on operands with the same sign, the number with the smaller magnitude is subtracted from the number with the larger magnitude and the sign of the result is set to the sign of the number with the larger magnitude. The header word generator determines the exponent, sign, type, and length of the result. After the final result is computed, the long accumulator is normalized and rounded to a specified precision using the rounding and normalization logic in the long accumulator. The significand of the result is then written from the long accumulator to the significand memory, two words at a time.
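The chained $2m$-bit additions can be sketched as follows. This illustration covers only the carry chain: it assumes the two significands are already aligned and have the same even number of words, and it omits the exponent comparison, shifting, normalization, and rounding steps described above. The function name `add_words` is hypothetical.

```python
def add_words(acc, addend, m):
    """Add two aligned significands stored as lists of m-bit words.

    Words are most significant first. Two words are consumed per step, so
    each step is one 2m-bit addition whose carry-out feeds the next step.
    Returns (sum_words, carry_out).
    """
    assert len(acc) == len(addend) and len(acc) % 2 == 0
    mask = (1 << m) - 1
    mask2 = (1 << (2 * m)) - 1
    out = [0] * len(acc)
    carry = 0
    for i in range(len(acc) - 2, -1, -2):     # least significant pair first
        a = (acc[i] << m) | acc[i + 1]
        b = (addend[i] << m) | addend[i + 1]
        total = a + b + carry
        carry = total >> (2 * m)              # carry-out of this 2m-bit add
        total &= mask2
        out[i], out[i + 1] = total >> m, total & mask
    return out, carry

result, carry_out = add_words([0x0000, 0xFFFF, 0xFFFF, 0xFFFF],
                              [0x0000, 0x0000, 0x0000, 0x0001], 16)
print(result, carry_out)  # prints [1, 0, 0, 0] 0
```

In the example, the carry generated by the low pair of words propagates into the high pair, which is the case the chained carry-in exists to handle.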
Interval addition and subtraction are defined [6] as
\[ X + Y = [x_l + y_l,\; x_u + y_u], \qquad X - Y = [x_l - y_u,\; x_u - y_l]. \tag{2} \]
Thus, interval addition (subtraction) requires two variable-precision additions (subtractions). The lower endpoint is computed and rounded toward negative infinity. The upper endpoint is then computed and rounded toward positive infinity. The processor automatically changes the rounding direction of the long accumulator depending on whether the lower or upper interval endpoint is being computed.
Multiplication
For floating point multiplication, the significands of the two operands are multiplied and the exponents are added. The sign of the result is positive if the signs of the multiplier and the multiplicand are the same and negative if they are different. Since the significand of the product is between 1/4 and 1, it may be necessary to shift the significand left one position and decrement the exponent to normalize the product.
Variable-precision multiplication is performed by using the multiplier, adder, and long accumulator to generate and accumulate partial products. The long accumulator, which is initially set to zero, stores the sum of the partial products. In each iteration, one word ($m$ bits) of the multiplier and one word of the multiplicand are read from the significand memory and multiplied to produce a two-word partial product. While this multiplication is taking place, two words of the previously accumulated partial products are read from the long accumulator. These two words go through the operand selector and the shifter, without being modified, and are added to the newly generated partial product. The result is then stored back in the long accumulator for the next iteration. To multiply two $n$-word variable-precision numbers, $n^2$ partial products are generated and accumulated.
Fig. 4 shows the multiplication $P = A \cdot B$, where $A$ and $B$ are both four words and $P$ is eight words. The order for generating and accumulating the partial products is indicated by the numbers shown in parentheses. Each pair of rows in Fig. 4 corresponds to multiplying one word of $B$ by $A$. The significand words of $B$ are accessed from least significant to most significant, i.e., $B_3$ to $B_0$. If $n$ is even, then $B_i$ is first multiplied by the odd words of $A$, followed by the even words of $A$. If $n$ is odd, then $B_i$ is first multiplied by the even words of $A$, followed by the odd words of $A$. With this technique, the carry-out of one addition is used as the carry-in of the next addition.
The multiplication of $A \cdot B_3$, which corresponds to the first four partial products in Fig. 4, is used to illustrate the data flow for variable-precision multiplication. Initially, the long accumulator is set to zero. In the first iteration, $A_3$ and $B_3$ are read from the significand memory and multiplied together. Simultaneously, two words, $P_6$ and $P_7$, are read from the long accumulator. These two words, which are both zero, go through the operand selector and the shifter without being modified and are added to $A_3 \times B_3$. The result is then stored in the long accumulator words $P_6$ and $P_7$. Similarly, in the second iteration, $A_1$ and $B_3$ are multiplied together and added to the long accumulator words $P_4$ and $P_5$, which are both zero. The result, $A_1 \times B_3$, is stored back in the long accumulator words $P_4$ and $P_5$. In the third iteration, $A_2$ and $B_3$ are multiplied together and added to the long accumulator words $P_5$ and $P_6$, which have the values computed in the previous two iterations. In the fourth iteration, $A_0$ and $B_3$ are multiplied together and added to the long accumulator words $P_3$ (which is zero) and $P_4$ (which was computed in the second iteration). For this addition, the carry-out from the previous iteration is used as the carry-in, and the result is stored in the long accumulator words $P_3$ and $P_4$.
To perform the entire multiplication of $A \cdot B$, this process continues until all 16 partial products have been generated and accumulated. After this, the significand is normalized, rounded, and written from the long accumulator to the significand memory, two words at a time. In parallel with the multiplication, the header word generator determines the exponent, sign, type, and length of the result and writes these values to the header word memory. If one or both of the operands is zero, infinity, or not-a-number, the partial products are not generated and accumulated since the result can be calculated based only on the types and signs of the input operands [29].
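The partial-product schedule described above can be sketched in Python. The sketch visits the words of $A$ in the odd-then-even (or even-then-odd) order and places the product $A_i \times B_j$ in accumulator words $P_{i+j}$ and $P_{i+j+1}$, which reproduces the $P_6$/$P_7$, $P_4$/$P_5$, $P_5$/$P_6$, $P_3$/$P_4$ sequence of the worked example. One simplification is an assumption of this sketch rather than the hardware behavior: a carry out of a $2m$-bit addition is rippled into the higher accumulator words immediately instead of being held as the carry-in of the next addition. The function names are hypothetical.

```python
def word_order(n):
    """Order in which the words of A are visited for each word of B."""
    first = 1 if n % 2 == 0 else 0            # odd indices first when n is even
    idx = list(range(n - 1, -1, -1))          # highest index = least significant
    return [i for i in idx if i % 2 == first] + [i for i in idx if i % 2 != first]

def multiply(a, b, m):
    """Multiply two n-word significands (most significant word first).

    Returns the 2n-word product and the number of partial products formed.
    """
    n = len(a)
    mask = (1 << m) - 1
    acc = [0] * (2 * n)                       # long accumulator P_0..P_{2n-1}
    count = 0
    for j in range(n - 1, -1, -1):            # words of B, least significant first
        for i in word_order(n):
            partial = a[i] * b[j]             # m-bit by m-bit multiply
            k = i + j + 1                     # low word of the partial product
            total = ((acc[k - 1] << m) | acc[k]) + partial
            acc[k - 1], acc[k] = (total >> m) & mask, total & mask
            carry, k = total >> (2 * m), k - 2
            while carry:                      # simplified immediate carry ripple
                s = acc[k] + carry
                acc[k], carry = s & mask, s >> m
                k -= 1
            count += 1
    return acc, count

def to_int(words, m):
    """Interpret a word list (most significant first) as one big integer."""
    value = 0
    for w in words:
        value = (value << m) | w
    return value

a = [0x8000, 0x0001, 0xFFFF, 0x1234]
b = [0xFFFF, 0xFFFF, 0x0000, 0x8000]
p, count = multiply(a, b, 16)
print(word_order(4), count, to_int(p, 16) == to_int(a, 16) * to_int(b, 16))
```

Comparing against Python's built-in big-integer product confirms that the accumulated words hold the full double-length result.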
Interval multiplication is defined [6] as
\[ X \times Y = [\min(x_l y_l,\, x_l y_u,\, x_u y_l,\, x_u y_u),\; \max(x_l y_l,\, x_l y_u,\, x_u y_l,\, x_u y_u)]. \tag{3} \]
Rather than computing all four products and then comparing the results, the endpoints to be multiplied to form the lower and upper endpoints of the product are determined by examining the sign bits of $x_l$, $x_u$, $y_l$, and $y_u$ [6]. With this technique, only two variable-precision multiplications are required to perform interval multiplication unless $0 \in X$ and $0 \in Y$. Performing the endpoint selection in software would require at least nine conditional branches. Instead, a specialized endpoint selection circuit selects the appropriate interval endpoints. The endpoint selection circuit requires only 16 logic gates and is also used to select the endpoints for interval division [29].
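The sign-based endpoint selection can be written out as the familiar nine-case table. The sketch below is a software illustration, not the 16-gate circuit of [29]: the comparisons against zero stand in for sign-bit tests, integer endpoints are used so that no rounding is involved, and the function names are hypothetical. It is checked exhaustively against the min/max definition on a small grid of intervals.

```python
def mul_by_signs(x, y):
    """Interval product using two multiplications except when both contain 0."""
    (xl, xu), (yl, yu) = x, y
    if xl >= 0:                               # X is non-negative
        if yl >= 0:
            return (xl * yl, xu * yu)
        if yu <= 0:
            return (xu * yl, xl * yu)
        return (xu * yl, xu * yu)
    if xu <= 0:                               # X is non-positive
        if yl >= 0:
            return (xl * yu, xu * yl)
        if yu <= 0:
            return (xu * yu, xl * yl)
        return (xl * yu, xl * yl)
    if yl >= 0:                               # X straddles zero
        return (xl * yu, xu * yu)
    if yu <= 0:
        return (xu * yl, xl * yl)
    # Both intervals contain zero: four products are needed.
    return (min(xl * yu, xu * yl), max(xl * yl, xu * yu))

def mul_reference(x, y):
    """Direct use of the min/max definition."""
    products = [a * b for a in x for b in y]
    return (min(products), max(products))

values = range(-3, 4)
intervals = [(lo, hi) for lo in values for hi in values if lo <= hi]
agree = all(mul_by_signs(x, y) == mul_reference(x, y)
            for x in intervals for y in intervals)
print(agree)  # prints True
```

The nested conditionals make the branching cost of a software implementation visible, which is the overhead the selection circuit avoids.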
Division and Square Root
The algorithm for division is a variation of the short reciprocal divide algorithm [34], which has been modified for variable-precision, interval arithmetic. The variable-precision division algorithm uses an approximation to the reciprocal of the divisor to generate and accumulate successive quotient digits. It requires $n^2 + 4n + 4$ single precision multiplications and $3n^2 + 6n + 4$ single precision additions to divide two $n$-word numbers and produce a correctly rounded $n$-word quotient [29].
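As a worked instance, reading the operation counts as $n^2 + 4n + 4$ multiplications and $3n^2 + 6n + 4$ additions, a four-word division ($n = 4$, for example 128 bits on the 32-bit VPIAP) costs

\[
n^2 + 4n + 4 = (n + 2)^2 = 36 \text{ multiplications},
\qquad
3n^2 + 6n + 4 = 48 + 24 + 4 = 76 \text{ additions},
\]

compared with the $n^2 = 16$ partial products needed for a four-word multiplication.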
Interval division is defined [6] as
\[ X / Y = [\min(x_l / y_l,\, x_l / y_u,\, x_u / y_l,\, x_u / y_u),\; \max(x_l / y_l,\, x_l / y_u,\, x_u / y_l,\, x_u / y_u)]. \tag{4} \]
Similar to interval multiplication, the sign bits are examined to determine which endpoints are divided to compute the endpoints of the quotient and only two variable-precision divisions are required. If $0 \in Y$, the quotient is $[-\infty, +\infty]$ and the types and signs of the interval quotient's endpoints are set without performing division. A similar algorithm is used to compute square roots. It requires $n^2 + 5n + 4$ single precision multiplications and $4n^2 + 7n + 4$ single precision additions to compute the square root of an $n$-word number to $n$ words of precision [29]. Since the square root is monotonically increasing, an interval square root is defined [6] as
\[ \sqrt{X} = [\sqrt{x_l},\; \sqrt{x_u}], \]
provided that $x_l \ge 0$. Otherwise, one or both endpoints of the result is not-a-number.
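Interval division and square root can be sketched along the same lines. The code below is an illustration under several assumptions: endpoints are integers, quotients are kept as exact rationals so no rounding is applied, the square root is enclosed to a chosen number of fractional bits with `math.isqrt`, and a `ValueError` stands in for the not-a-number result that the hardware produces when $x_l < 0$. The function names are hypothetical.

```python
import math
from fractions import Fraction

def div_interval(x, y):
    """Interval quotient, choosing the two divisions from signs alone."""
    (xl, xu), (yl, yu) = x, y
    if yl <= 0 <= yu:                         # divisor contains zero
        return (-math.inf, math.inf)          # no division is performed
    if yl > 0:
        if xl >= 0:
            lo, hi = (xl, yu), (xu, yl)
        elif xu <= 0:
            lo, hi = (xl, yl), (xu, yu)
        else:
            lo, hi = (xl, yl), (xu, yl)
    else:
        if xl >= 0:
            lo, hi = (xu, yu), (xl, yl)
        elif xu <= 0:
            lo, hi = (xu, yl), (xl, yu)
        else:
            lo, hi = (xu, yu), (xl, yu)
    return (Fraction(*lo), Fraction(*hi))

def sqrt_interval(x, bits):
    """Enclose the square root of [xl, xu] using `bits` fractional bits."""
    xl, xu = x
    if xl < 0:
        raise ValueError("lower endpoint is negative")
    scale = 1 << bits
    lo = math.isqrt(xl * scale * scale)       # floor of sqrt(xl) * 2^bits
    hi = math.isqrt(xu * scale * scale)
    if hi * hi != xu * scale * scale:
        hi += 1                               # round the upper endpoint up
    return (Fraction(lo, scale), Fraction(hi, scale))

print(div_interval((1, 3), (1, 2)), sqrt_interval((4, 9), 8))
```

Because the lower root is rounded down and the upper root is rounded up, the returned interval always contains the true square roots of both endpoints.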
Other Operations
To support variable-precision, interval arithmetic, several other operations are provided. These include interval intersection, hull, width, and midpoint, which are defined [6] as
\[ \mathrm{intersect}(X, Y) = [\max(x_l, y_l),\; \min(x_u, y_u)] \tag{5} \]
\[ \mathrm{hull}(X, Y) = [\min(x_l, y_l),\; \max(x_u, y_u)] \tag{6} \]
\[ \mathrm{width}(X) = x_u - x_l \tag{7} \]
\[ \mathrm{midpoint}(X) = (x_l + x_u)/2. \tag{8} \]
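These four definitions translate directly into code. The following sketch uses exact rational endpoints; returning `None` for an empty intersection is a convention chosen for this illustration, since the text does not say how the processor reports that case.

```python
from fractions import Fraction

def intersect(x, y):
    """Common part of two intervals, or None if they do not overlap."""
    lo, hi = max(x[0], y[0]), min(x[1], y[1])
    return (lo, hi) if lo <= hi else None

def hull(x, y):
    """Smallest interval containing both x and y."""
    return (min(x[0], y[0]), max(x[1], y[1]))

def width(x):
    """Distance between the endpoints."""
    return x[1] - x[0]

def midpoint(x):
    """Center of the interval, kept exact as a Fraction."""
    return (Fraction(x[0]) + Fraction(x[1])) / 2

X, Y = (1, 4), (3, 6)
print(intersect(X, Y), hull(X, Y), width(X), midpoint(X))
# prints (3, 4) (1, 6) 3 5/2
```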
The interval intersection and hull operations take two variable-precision intervals and return a variable-precision interval. To determine the minimum and maximum values, the header word generator and the significand word adder are used. If the two numbers being compared have the same sign, type, and exponent, the significand words are compared from most significant to least significant. It takes at most $2n$ significand comparisons to determine the lower and upper endpoints for interval hull. Interval intersection uses at most $3n$ significand comparisons since, after the lower and upper endpoints are determined, a test is made to ensure that the upper endpoint is greater than or equal to the lower endpoint. The interval width and midpoint operations take one variable-precision interval and return a variable-precision floating point number. The midpoint and width operations are implemented using the variable-precision addition and subtraction algorithms, respectively. The division by two in the midpoint operation is implemented by decrementing the exponent of $(x_l + x_u)$ by one.
Performance Estimates
Table 5 shows the number of cycles required for operations on both variable-precision point (VP) and interval (VPI) operands. The number of $m$-bit words in each operand is denoted by $n$. The cycle counts reported include the cycles needed for instruction fetch, instruction decode, reading the operands from the register file, performing the operation, rounding the result, and storing the rounded result back into the register file. The cycle counts given assume that the operands are already in the register file; the processor uses a load-store architecture in which all values are loaded into the register file before being used. These estimates were verified by a cycle-accurate VHDL model of the processor [29].
TABLE 5 Cycle Counts
The algorithms for addition, subtraction, hull, intersection, midpoint, and width require $O(n)$ cycles, and the algorithms for multiplication, division, and square root require $O(n^2)$ cycles. Although algorithms with better asymptotic complexities exist, they require more control logic and are slower for commonly used precisions. Hardware support and efficient implementation of the interval operations allow them to be executed in approximately twice as many cycles as the equivalent operations on point operands.
Tables 6, 7, 8, and 9 show execution times for variable-precision interval addition, multiplication, division, and square root on each of the VPIAPs. The number of bits ($n \cdot m$) is varied from 64 to 1,024 bits. For comparison, the execution times of the VPI software package (VPI-SP) are also given [14]. The ratio of the VPI-SP's execution time to the corresponding processor's execution time is given in parentheses.
For the VPIAPs, the execution time is computed as the product of the number of cycles and the cycle time. The execution times of the VPI-SP are determined by running 1,000 iterations of the operation on a 40 MHz Sparc IPX processor and taking the average execution time. When the precision of the computation is low, the 64-bit and 32-bit VPIAPs have comparable execution times. This occurs because, although the 64-bit VPIAP requires fewer cycles to perform the operations, the 32-bit VPIAP has a shorter cycle time. As the precision of the operands increases, the number of cycles becomes the dominant factor and the 64-bit VPIAP has the shortest execution times. Based on these estimates, the VPIAPs are two to three orders of magnitude faster than the VPI-SP.
The 16-bit VPIAP has the shortest cycle time and uses the least amount of area, but has the longest execution times for the operations examined. The 64-bit VPIAP has the shortest execution times for most operations, but uses the largest amount of area and has the longest cycle time. The 32-bit VPIAP offers a good compromise between these two designs. It uses less than half the area of the 64-bit VPIAP and has comparable execution times for low to moderate precisions.
SOFTWARE INTERFACE AND SAMPLE PROGRAM
This section describes a software interface to the variable-precision, interval arithmetic processors and gives a sample program that demonstrates its use. The software interface, which is an extension of the C++ programming language, is similar to the interface provided by the VPI software package [14]. To support numerical computations, the software interface provides instructions and operations for twelve data types. These include combinations of variable-precision floating point numbers, intervals, complex numbers, vectors, and matrices. The numerical data types supported by the software interface are shown in Table 10. For example, the data type vp_cimatrix is a matrix of complex, variable-precision intervals.
Fig. 5 gives a sample program that shows how variable-precision, interval arithmetic is used to recompute inaccurate results. The program computes $z = x^4 - 4y^4 - 4y^2$. Initially, the maximum precision and result precision are both set to MAXPREC words, which is 16 in this example. After this, the intervals $x$ and $y$ and the error tolerance are read from standard input. Next, a loop is entered in which $z$ is computed and the width of $z$ is tested to see if it is less than the error tolerance. Once the width of $z$ is less than the error tolerance or the maximum precision is exceeded, the loop is exited and $\mathrm{width}(z)$ is output.
Fig. 6 shows the results of running the program with values of $m = 32$, $x = [665857, 665857]$, $y = [470832, 470832]$, and tolerance $= 10^{-10}$. When the precision is one or two words, roundoff error and catastrophic cancellation lead to extremely large intervals. When the precision reaches four words (128 bits), the width of $z$ is less than $10^{-10}$ and the loop is exited. Although the correct result is $z = 1$, the result computed with IEEE double-precision arithmetic is $z = 1.18856 \cdot 10^{7}$ [17].
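The recompute-until-narrow loop can be emulated in software. The sketch below is not the program of Fig. 5: it is a Python stand-in that models each endpoint as an exact rational rounded outward to a given number of significand bits, and all names (`round_to`, `mul`, `sub`, `z_interval`, `solve`) are hypothetical. Doubling the precision on each pass (1, 2, 4, ... words) is an assumption made here; the schedule used by the actual program may differ.

```python
import math
from fractions import Fraction

def round_to(v, bits, up):
    """Round v to `bits` significant bits, toward +inf if up else -inf."""
    if v == 0:
        return v
    n, d = abs(v.numerator), v.denominator
    e = n.bit_length() - d.bit_length()
    if abs(v) >= Fraction(2) ** e:
        e += 1                                # now 2^(e-1) <= |v| < 2^e
    quantum = Fraction(2) ** (e - bits)
    scaled = v / quantum
    return (math.ceil(scaled) if up else math.floor(scaled)) * quantum

def mul(x, y, bits):
    """Outward-rounded interval product."""
    p = [a * b for a in x for b in y]
    return (round_to(min(p), bits, False), round_to(max(p), bits, True))

def sub(x, y, bits):
    """Outward-rounded interval difference."""
    return (round_to(x[0] - y[1], bits, False), round_to(x[1] - y[0], bits, True))

def z_interval(x, y, bits):
    """Enclosure of z = x^4 - 4y^4 - 4y^2 at the given precision."""
    x2 = mul(x, x, bits)
    x4 = mul(x2, x2, bits)
    y2 = mul(y, y, bits)
    y4 = mul(y2, y2, bits)
    four = (Fraction(4), Fraction(4))
    return sub(sub(x4, mul(four, y4, bits), bits), mul(four, y2, bits), bits)

def solve(x, y, tol, m=32, max_words=16):
    """Double the precision until width(z) < tol or max_words is reached."""
    words = 1
    while True:
        z = z_interval(x, y, words * m)
        if z[1] - z[0] < tol or words >= max_words:
            return words, z
        words *= 2

x = (Fraction(665857), Fraction(665857))
y = (Fraction(470832), Fraction(470832))
words, z = solve(x, y, Fraction(1, 10**10))
print(words, z[0], z[1])  # prints 4 1 1
```

With these inputs $x^4$ needs 78 significand bits to be represented exactly, so the one-word and two-word passes return wide enclosures, and the four-word pass is the first to meet the tolerance.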
CONCLUSIONS
This paper presents hardware designs, arithmetic algorithms, and software support for a family of variable-precision, interval arithmetic processors. These processors allow the programmer to set the initial precision of the computation, determine the accuracy of the results, and recompute with higher precision. With these processors, variable-precision, interval arithmetic algorithms can be used to solve problems that cannot be efficiently solved using traditional floating point computation. Direct hardware support for variable-precision, interval arithmetic greatly improves the accuracy and dependability of numerical computations and is much faster than existing software techniques for monitoring and controlling numerical error. Based on execution time estimates, the processors are two to three orders of magnitude faster than the VPI software package for variable-precision floating point and interval operations. The processor designs and arithmetic algorithms have been specified and simulated at the behavioral level in VHDL [29].
