A new conception of flexible calculation is presented that allows a sum to be adjusted to the computation time available. More specifically, the objective is to obtain a calculation model that makes the trade-off between processing time and precision more flexible. The addition method is based on the carry-select adder scheme, and the proposed design uses precalculated data stored in look-up tables, which provide, above all, quality results and systematization in the implementation of the low-level primitives that parameterize the processing time. We report an evaluation of the architecture in terms of area, delay and computation error, as well as an FPGA implementation that validates the design.
INTRODUCTION
There are a great number of applications that are difficult to fit into the rigid calculation schemes of conventional arithmetic architectures. For these applications it would be advantageous to have operators that provide control over the results and act on both the quality of the result and the processing cost, according to the specific computational requirements of each case [1], [2].
We can find several examples in which data provided by peripherals is processed intensively. In these cases, strong coordination between the sensors and the rest of the system is necessarily required. For example, in guidance systems for mobile objects, when the speed of the object increases, the system has less time to process the information received from the sensors and to make decisions about its movement. In this application, a fast answer delivered in time, which allows decisions to be made at every moment, may be advisable, even at the expense of less precision in the results.
In this paper, we propose a method of flexible arithmetic sum. Its main characteristic is the variable quality of the result based on the available time. The algorithm is based on the use of strategies that contribute determinism to the response time and, at the same time, allow for parallel designs.
DESIGN PRINCIPLES
The proposed method combines two techniques: obtaining the result through successive processing stages and using precalculated data stored in look-up tables.
Response quality is related to the number of calculated stages of the sum, and it is therefore possible to act on the time-quality-parallelism relationship. This approach forms a new architecture that implicitly incorporates flexibility in order to adapt the duration of the calculation to time availability, which is the instrument for real-time management. This characteristic provides capabilities for successive refinement of the solution.
* This work is being backed by grant DPI2002-04434-CM-01 from the Ministerio de Ciencia y Tecnología of the Spanish Government.
The precalculated data memories (LUT, look-up table) have interesting characteristics for real-time processing: they work in a totally deterministic way and they can incorporate error detection and correction mechanisms. Note that this operation is only possible for some k values, as will be explained below.
ALGORITHM

Addition Method
The proposed addition method is based on the carry-select adder scheme [6] and it is made up of the following steps:
1. Fragmentation of the operands into k-size blocks. This is immediate from the original operands. For numbers of m bits (with m > k), we can divide the number into n blocks of k bits, so that n·k ≥ m.
2. Addition of the corresponding pairs of blocks. The partial additions are obtained directly from a look-up table containing the precalculated results: the LUT-Adder. The carry is processed directly by obtaining the sum and its successor from a Compound LUT-Adder (Figure 1). By designing multiple memory access routes, simultaneous access can be gained without the need for several memory chips.
3. Ordered concatenation of the partial additions taking the carry logic into account. The selection of each block is a function of the carry bit of the preceding selected block, according to the carry-select adder algorithm [6]. The Compound LUT-Adder is used to consider the carry in a direct way, since adding the carry to a block is the same as obtaining its successor in the LUT.
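The three steps above can be sketched in software as follows. This is only a minimal illustration, not the hardware design: the function names (`build_compound_lut`, `split_blocks`, `lut_add`) are hypothetical, the table is held as a nested Python list, and the operands are assumed to be non-negative integers of at most n·k bits.

```python
def build_compound_lut(k):
    """Precompute, for every pair of k-bit blocks, the sum and its successor."""
    size = 1 << k
    return [[(x + y, x + y + 1) for y in range(size)] for x in range(size)]


def split_blocks(value, k, n):
    """Step 1: fragment an operand into n blocks of k bits, least significant first."""
    mask = (1 << k) - 1
    return [(value >> (i * k)) & mask for i in range(n)]


def lut_add(a, b, k, n, lut=None):
    """Add two operands of at most n*k bits using only table look-ups and selection."""
    lut = lut or build_compound_lut(k)
    mask = (1 << k) - 1
    # Step 2: one table access per pair of blocks yields both candidates
    # (sum for carry-in 0, successor for carry-in 1).
    pairs = [lut[x][y] for x, y in zip(split_blocks(a, k, n), split_blocks(b, k, n))]
    # Step 3: the carry out of the selected block picks the next candidate.
    result, carry = 0, 0
    for i, candidates in enumerate(pairs):
        chosen = candidates[carry]
        result |= (chosen & mask) << (i * k)
        carry = chosen >> k
    return result | (carry << (n * k))
```

For instance, `lut_add(0xBEEF, 0x1234, 4, 4)` splits each 16-bit operand into four 4-bit blocks and returns the same value as the ordinary sum.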
Flexible addition
The addition operation based on look-up tables offers predictable response times. The basic idea consists of performing the sum only on those blocks for which time is available. This design therefore has real-time properties: depending on the time available, the system adapts the quality of the response. As the number of iterations increases, the error rate decreases.
The flexible adder design is based on the previous algorithm with the special feature that only part of the blocks obtained from the operands are combined, according to the time availability.
In a scheme of sequential concatenation of the partial additions, it is proposed that the combination of the blocks begins with block i, depending on the availability of time, and moves towards the left (Figure 2). It is possible to obtain fast approximations of the result by selecting only the leftmost blocks and choosing the rest at random or taking the upper ones.
An arithmetic unit must have an operation control that translates the requirements into the number of processed stages. The operation control module consists of a combinational circuit that has the problem conditions at its inputs and the number of operation stages at its outputs, for example, a coder, a multiplexor or a table circuit.
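One possible reading of the sequential flexible scheme is sketched below. It rests on assumptions that are not stated in the text: the carry chain is resolved only from block `start` leftwards, every carry into blocks 1 to `start` is replaced by a fixed `guess` (0 to take the sum, 1 to take the successor, that is, the "upper" candidate), and the names `flexible_add`, `start` and `guess` are hypothetical.

```python
def build_compound_lut(k):
    """Precompute the sum and its successor for every pair of k-bit blocks."""
    size = 1 << k
    return [[(x + y, x + y + 1) for y in range(size)] for x in range(size)]


def flexible_add(a, b, k, n, start, guess=0):
    """Approximate a + b by resolving carries only from block `start` leftwards.

    start = 0 gives the exact sum; start = n - 1 resolves no carry at all.
    """
    lut = build_compound_lut(k)
    mask = (1 << k) - 1
    result, carry = 0, 0
    for i in range(n):
        x = (a >> (i * k)) & mask
        y = (b >> (i * k)) & mask
        if 0 < i <= start:
            carry = guess  # carry not resolved for lack of time: use the guess
        chosen = lut[x][y][carry]
        result |= (chosen & mask) << (i * k)
        carry = chosen >> k
    return result | (carry << (n * k))
```

With `guess=0`, each unresolved block boundary can only lose a carry, so under these assumptions the approximation never exceeds the exact sum. For example, `flexible_add(0x0FFF, 0x0001, 4, 4, start=3)` yields `0x0FF0` instead of `0x1000`.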
ARCHITECTURE
This architecture is suitable for specific purpose applications where time restrictions are present. Figure 4 shows the proposed architecture. We assume, for example, the numbers are fragmented into 4 blocks. The main features of this architecture are:
Design
Access to the Compound LUT-Adder provides the result and its successor for all the pairs of blocks.
The selection circuit has a simple design since the effective sum is carried out in the memory with precalculated results.
The selection tree combines the partial results until the final result of the complete operation is obtained. Three results with different degrees of quality and delay are extracted from the selection tree circuit.
The operation control circuit selects the partial result that best fits the conditions of the problem. At first sight, the time saved by one incomplete sum is not significant. Nevertheless, when the number of sums to be made is high, the architecture becomes more relevant.
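The behaviour of the selection tree can be illustrated with the following sketch, which returns one result per tree level. It is a software model under stated assumptions rather than the circuit itself: n is taken to be a power of two, every carry that is still unresolved at a given level is taken as 0, the compound LUT access is replaced by plain arithmetic, and the names `tree_results`, `combine` and `assemble` are hypothetical.

```python
def combine(low, high, width):
    """Merge two adjacent groups; each group holds its sum for carry-in 0 and 1."""
    m = (1 << width) - 1
    merged = []
    for cin in (0, 1):
        lo = low[cin]
        hi = high[lo >> width]  # carry out of the low group selects the high variant
        merged.append((lo & m) | (hi << width))
    return tuple(merged)


def assemble(nodes, width):
    """Concatenate the carry-in-0 variant of every group into one result."""
    m = (1 << width) - 1
    value = 0
    for i, node in enumerate(nodes):
        # Only the most significant group keeps its carry-out bit.
        part = node[0] if i == len(nodes) - 1 else node[0] & m
        value |= part << (i * width)
    return value


def tree_results(a, b, k, n):
    """Return one approximate sum per tree level; the last one is exact."""
    mask = (1 << k) - 1
    nodes = []
    for i in range(n):
        x = (a >> (i * k)) & mask
        y = (b >> (i * k)) & mask
        nodes.append((x + y, x + y + 1))  # stands in for a compound LUT access
    width = k
    results = [assemble(nodes, width)]
    while len(nodes) > 1:
        nodes = [combine(nodes[j], nodes[j + 1], width) for j in range(0, len(nodes), 2)]
        width *= 2
        results.append(assemble(nodes, width))
    return results
```

For four blocks the function returns three values, one per level including the leaves, which mirrors the three results of different quality mentioned above; an operation control would then pick the deepest level that fits the available time.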
EVALUATION OF THE ARCHITECTURE
In this section, we present estimates of the area cost, execution time and computation error of the architecture proposed in the previous section. The power consumption of the circuit is outside the scope of this research and is therefore not dealt with in this paper. It will be studied in depth should the design be implemented in a chip.
Area estimations
The main contribution to the area of the architecture comes from the compound LUT-adder. The area of the selection circuit is small compared to the area of the LUT. The model we use for the area estimations is taken from [4], [5] and [7]. The unit used is the size of a complex gate, $\tau_a$, since the areas of the compound LUT-adder, the selection circuits and the multiplexor are easily expressed in this unit. The LUT-adder is the component of the architecture that occupies the greatest area. The others have a marginal area in comparison and, therefore, the estimation focuses only on the LUT-adder.
Data storage imposes severe restrictions on the block size k. As we can see in Table 1, the area cost increases exponentially with the value of k. Therefore, we have to achieve a balance between the memory required and the complexity of the circuit. Table 2 shows the cost in terms of $\tau_a$ for the most common sizes. As shown in these tables, the area is much greater than in conventional adder designs based on simple combinational circuits. Nevertheless, this architecture is still suitable for applications in which the size of the circuit is not a problem.
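To make the exponential growth with k concrete, the sketch below counts raw storage bits for a compound table. It is not the $\tau_a$ area model of [4], [5] and [7]: it assumes that the table is addressed by the 2k bits of a block pair and that each entry stores both the sum and its successor in k + 1 bits each. The function name `compound_lut_bits` is hypothetical.

```python
def compound_lut_bits(k):
    """Raw storage, in bits, of a table holding sum and successor for k-bit blocks."""
    entries = 1 << (2 * k)        # one entry per pair of k-bit blocks
    bits_per_entry = 2 * (k + 1)  # sum and successor, each with its carry bit
    return entries * bits_per_entry


for k in (2, 4, 6, 8):
    print(k, compound_lut_bits(k))
```

Under these assumptions, going from k = 4 to k = 8 multiplies the storage by a factor of more than 400, which is why the block size has to be balanced against the depth of the selection circuit.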
Delay estimations
The delay of the complete addition is divided into:
Access time to the LUT-Adder in order to obtain the precalculated results. This time is determined only by the memory access time $T_{LUT}$. Let $\tau_t$ be the delay of a complex gate, such as one full-adder. According to the analysis in [5], [7]¹, we assume a delay of about $T_{LUT} = 3.5\tau_t$ for tables with 8 input bits, $T_{LUT} = 5\tau_t$ for tables with 12-13 input bits and $T_{LUT} = 6.5\tau_t$ for tables with 16 input bits.
Selection of the blocks that make up the result. In the case of tree concatenation, the total selection time is obtained by taking into account that all the selections at one tree level are carried out in parallel and that the total number of tree levels is $\log_2 n$.
Let $T_{sel}$ be the time taken by the selection at one tree level, so the expression for the total selection time is $T_{sel} \cdot \log_2 n$. A selection step consists of two simple gates (AND, OR), or one complex gate $\tau_t$, so $T_{sel} = 1\,\tau_t$. For operands of m = n·k bits, the proposed algorithm calculates the addition in $T_{add} = T_{LUT} + T_{sel} \cdot \log_2(m/k)$ time units.
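As a quick numerical check of the delay expression, the following sketch evaluates the table access time plus one selection delay per tree level, in units of the complex-gate delay. The function name `addition_delay` is hypothetical, and rounding the number of blocks and tree levels up when m/k is not a power of two is an assumption of this sketch, not something stated in the text.

```python
def addition_delay(m, k, t_lut, t_sel=1.0):
    """Estimated addition delay, in complex-gate delays, for m-bit operands."""
    n = -(-m // k)                   # number of k-bit blocks, rounded up
    levels = (n - 1).bit_length()    # ceil(log2(n)) selection levels
    return t_lut + t_sel * levels


# 32-bit operands with 4-bit blocks: an 8-input table (3.5 gate delays)
# followed by log2(8) = 3 selection levels.
print(addition_delay(32, 4, t_lut=3.5))
```

Doubling the operand length to 64 bits with the same block size adds only one selection level, which is the logarithmic growth the expression describes.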
We perform a comparison of the proposed architecture with other known adder algorithms.
First, in terms of asymptotic time complexity, the delay of those adders grows at the same rate as that of the proposed design. Table 3 shows the time complexity of several adder designs [8], where m is the operand length in bits. The addition algorithms differ in the constants that modify the general cost expression. In the TLA algorithm, the performance of the compound LUT-Adder plays a fundamental role in the final calculation time.
In addition, the circuit delays depend on the technology used and on the implementation itself. To show this, TLA-4 and TLA-8 have been implemented in VHDL and tested on FPGA, in comparison with the implementations provided by [9]. The LUT implementation corresponds to the design presented in [10], [11], and has been integrated into the selection circuit. Table 4 shows the results obtained after the synthesis and simulation of each adder for several word lengths, including the block length k. A COSA implementation is not available in [9]; we did not implement it ourselves, in order to keep the results objective.
¹ Implementation using a family of standard gates from the AMS 0.35 µm CMOS library.
The previous results demonstrate that, for this particular implementation, the proposed adder design presents a delay similar to or better than that of the conventional designs, and they show how strongly performance depends on the technology.
Error computation analysis in flexible addition
The independent additions test consists of calculating the average error rate in 10' additions of two random rational numbers.
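A software experiment in the spirit of this error analysis might look like the sketch below. It is not the authors' test bench: operands are drawn as random fixed-point integers of n·k bits rather than rational numbers, unresolved carries are taken as 0, the error is reported relative to the exact sum, and the names `approx_add` and `mean_relative_error` and the `start` parameter are hypothetical.

```python
import random


def approx_add(a, b, k, n, start):
    """Block-wise sum in which carries into blocks 1..start are taken as 0."""
    mask = (1 << k) - 1
    result, carry = 0, 0
    for i in range(n):
        if 0 < i <= start:
            carry = 0  # unresolved carry, taken "by default"
        s = ((a >> (i * k)) & mask) + ((b >> (i * k)) & mask) + carry
        result |= (s & mask) << (i * k)
        carry = s >> k
    return result | (carry << (n * k))


def mean_relative_error(trials, k, n, start, seed=0):
    """Average relative error of approx_add over independent random additions."""
    rng = random.Random(seed)
    top = 1 << (k * n)
    total = 0.0
    for _ in range(trials):
        a, b = rng.randrange(1, top), rng.randrange(1, top)
        exact = a + b
        total += (exact - approx_add(a, b, k, n, start)) / exact
    return total / trials


for start in range(4):
    print(start, mean_relative_error(10_000, k=4, n=4, start=start))
```

With `start=0` every carry is resolved and the measured error is zero; larger values of `start` trade accuracy for fewer selection steps.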
The successive additions test is aimed at empirically analyzing error propagation while adding inaccurate values consecutively. In this case, the average error is calculated over 1,000 sets of 1,000 successive additions of random rational numbers within the interval [0,1) for each loop of the operation; that is, the result of each addition acts as an addend of the following addition, and so on. The numbers are generated in a positive interval, so positive errors are not compensated by negative ones in the successive additions. However, the unselected partial results are selected alternately by excess or by default in order to provide compensation at the level of the complete operation.
APPLICATION EXAMPLE
The application consists of calculating, in real time, the final force formed by the combination of individual forces. This is the sum of all of them:
$F = \sum_{i} F_i$ (1)
We propose the following architecture to evaluate expression (1):
Table 5 shows the number of stages in the selection tree, the time saved over the five sums and the computation error. The simulation is run on FPGA for a set of 1000 series of five consecutive sums.
The simulation results demonstrate that this technique saves considerable time in cases in which a fast response is necessary. The error is kept within acceptable margins. Although the value of the final force is not obtained with absolute precision, the result can be sufficient to make a movement decision.
CONCLUSIONS
The following conclusions have been drawn from the research described in this paper:
The use of precalculated results in stored logic permits the construction of fast operators comparable to existing methods and lays the foundations for the design of high-performance architectures. Adder complexity turns out to be logarithmic in the number of blocks: $T_{add} \in O(\log n)$. The proposal equals the asymptotic time complexity of present-day high-performance adders. Technological improvements in manufacture or in communication with the selection circuit will tend to reduce the $T_{LUT}$ access time and, therefore, the total addition time.
The adder behavior, which produces more and more precise results as the number of iterations increases, is suitable for the construction of systems with time/precision restrictions, in which result quality is exchanged for response determinism and speed. The methodology developed for the adder, thanks to its high performance and its ability to obtain imprecise calculations with a bounded and decreasing error, can be used in the development of other arithmetic operations with time restrictions. Finally, the error analysis carried out shows that the algorithm provides bounded results in addition operations, even in cases in which successive calculations are made with imprecise operands.
