An algorithm for a fast decimal addition is proposed. The addition is performed in two steps. First, the result of addition is produced in a decimal signed-digit format. Second, the decimal signed-digit result is converted into the non-redundant form BCD. The conversion uses a borrow generating scheme based on a parallel-prefix network. Using the flexible features of the decimal signed-digit representation, the proposed decimal addition is changed in order to perform a decimal subtraction. An architectural implementation for the combined decimal addition and subtraction is proposed. The design is evaluated and compared with some decimal adders available in the literature and improved performance is reported.
INTRODUCTION
Performing manual calculations using decimal arithmetic is part of human nature. Typical computers, on the other hand, support binary arithmetic more readily. The ENIAC, which became operational in 1945 at the University of Pennsylvania, was one of the early attempts to use radix 10 calculations in digital computers [Hill et al. 2000]. A look at recent processors reveals that the IBM eServer z900 seems to be the only one capable of performing decimal instructions in hardware [Schwarz et al. 2002; Busaba et al. 2001]. However, its decimal computation capability is limited to integer operands. Recently, decimal arithmetic has become more important in the financial and commercial world, including banking, tax calculation, currency conversion, insurance and accounting. The following facts may explain this recent interest.
-A survey of commercial databases [Tsang and Olschanowsky 1991] shows that 98.6% of the numbers stored are decimal or integer, while more than half of them are represented in pure decimal format.
-It is well understood that, when converting between decimal and binary formats, most fractional decimal numbers are only approximately represented in binary floating-point (FP) representation and may therefore lose precision [Gay 1990; Clinger 1990]. This means that using binary FP numbers in financial applications, which cannot tolerate errors, does not necessarily guarantee correct results.
-Regulations such as that for the European Commission Directorate General II [European Commission Directorate General II 1999] specify decimal digits for currency calculations.
Authors' address: School of Electrical and Electronic Engineering, The University of Adelaide, Adelaide, SA 5005, Australia.
The importance of decimal arithmetic has led to a proposed revision to the IEEE 754 standard for FP arithmetic to include specifications for decimal arithmetic [IEEE Standard Committee 2004]. This means that even though computers are still carrying out decimal FP (DFP) calculations using software libraries [Sun Microsystems 2001; Python Software Foundation 2004] and binary FP numbers, it is likely that in the near future most high-end processors will perform decimal operations directly on DFP operands using dedicated DFP units. Decimal adders are critically important building blocks of almost all decimal operations. For example, decimal subtraction $A - B$ is defined as decimal addition of the minuend $A$ and the 9's complemented subtrahend $B$. Decimal multiplication $A \times B$ can be understood as $B$ repeated additions of the multiplicand $A$. In the same manner, decimal division $A / B$ can be performed using repeated subtraction of the divisor $B$ from the dividend $A$ [Parhami 2000].
This paper presents the algorithm and the implementation of a decimal adder. The input operands and the final sum are represented in BCD format. A simple method and the corresponding implementation are proposed to use the decimal adder as a decimal subtractor. The adder is optimised in order to minimise the critical path delay. The design can be used as the coefficient adder in a DFP unit.
The paper is organised as follows. After the introduction in Section 1, a discussion on the decimal adders available in the literature is given in Section 2. Section 3 presents a signed-digit arithmetic for representing redundant decimal numbers. This arithmetic is used for developing the proposed decimal addition. The decimal addition and the implementation of the major components involved in the design are discussed in detail in Section 4. In Section 7, the design is evaluated and its features are discussed.
BACKGROUND
In the traditional approach for decimal addition of BCD operands [Hwang 1979 ], the two BCD digits are added using a 4-bit binary adder. Then, if the result is found to be larger than 9, it is corrected by adding 6. This generates a carry, which must be added to the next decimal position.
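The traditional scheme described above can be illustrated with a short Python sketch. It is a behavioural model only, not a hardware description; the function name `bcd_add` and the least-significant-digit-first list layout are assumptions made for the illustration.

```python
def bcd_add(x_digits, y_digits, carry_in=0):
    """Add two equal-length lists of BCD digits, least significant digit first.

    Returns (sum_digits, carry_out). Illustrative model of the
    add-then-correct-by-6 scheme, not taken from any cited design.
    """
    result = []
    carry = carry_in
    for x, y in zip(x_digits, y_digits):
        s = x + y + carry            # plain binary addition of the two digits
        if s > 9:                    # result is not a valid BCD digit
            s = (s + 6) & 0xF        # add 6 and keep the low 4 bits
            carry = 1                # carry into the next decimal position
        else:
            carry = 0
        result.append(s)
    return result, carry


# 47 + 58 = 105: digits are listed least significant first
print(bcd_add([7, 4], [8, 5]))
```

The carry produced by one position is needed before the next position can be corrected, which is the serial dependency that the designs discussed next try to remove.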
There are several reports of decimal adders that provide concurrency between these two stages. They precompute the possible values of each result digit while a carry-lookahead or parallel-prefix network determines whether the result digit can be represented in BCD format. Then, the carry bits calculated by the network are used to select one of the values as the sum digit.
One of the earliest attempts to implement a decimal adder was proposed by Schmookler and Weinberger [Schmookler and Weinberger 1971]. The adder produces BCD sum digits without producing the binary sum digits. Unlike the traditional approaches, their technique does not need the additional decimal correction of the result digit. Schmookler and Weinberger correct each sum digit as soon as a carry is produced. In their approach, decimal addition is performed in two separate steps: redundant decimal addition and redundant to BCD (non-redundant) conversion. In the redundant decimal format used by Schmookler and Weinberger, each redundant result digit $z_i$ is represented in the decimal carry-save format as
where $c_{i+1}$ is a single-bit carry-out to the next addition position and $s_i$ is a BCD sum digit with weight $\frac{1}{10}$ times the weight of $c_{i+1}$. Although the decimal carry-save system requires only 5 bits to represent a digit, negating numbers incurs a considerable delay overhead.
More instances of decimal adder implementation can be found in [Tague and Woods 1989; Bultmann et al. 2001; Busaba et al. 2001]. Haller et al. [Haller et al. 1999] propose a combined binary/decimal adder unit. The unit generates all pre-sum values for each result digit. Meanwhile, it calculates the carries over all the addition positions. In the Haller et al. approach, as in the well-known binary carry-select adder, both pre-sum values resulting from a carry-in equal to 0 and 1 are calculated using the +6 correction logic. Then, one is selected once the value of the corresponding carry is known. This results in a shorter response time; however, due to the representation of redundant numbers, selecting between decimal addition and subtraction requires hardware that adds a considerable delay overhead to the circuit response time.
DECIMAL SIGNED-DIGIT ARITHMETIC
In this section a decimal signed-digit (DSD) arithmetic is described. This DSD arithmetic will be used in Section 4 to construct the decimal adder.
Related Work
Traditionally, a conventional decimal digit $z_i$, where
$$z_i \in \{0, 1, \dots, 9\} \tag{2}$$
is represented in a 4-bit format, known as binary coded decimal or BCD [Hwang 1979]. However, this method is insufficient, in its original form, to represent a DSD $z_i$ of the form
Although redundant binary arithmetic is a mature subject with a research history dating back to 1961 [Avizienis 1961 ], redundant decimal, specifically DSD systems, seem not to have received similar attention. Shirazi et al. [Shirazi et al. 1989 ] present a balanced signed-digit (SD) representation for BCD numbers, named RBCD. In the RBCD arithmetic, every digit
where $z_i$ is stored in the 2's complement form using a 4-bit binary array. Although this system requires a small number of bits to represent an RBCD digit, it suffers from a delay overhead incurred by a BCD to RBCD converter. The conversion is needed when conventional decimal numbers are applied to the inputs of the system.
DSD Presentation
In the proposed DSD arithmetic, a DSD number $Z$ expressed as
is an $n$-digit array with digit $z_i$ selected from the maximally redundant set [Ercegovac and Lang 2004]
$$z_i \in \{-9, -8, \dots, 8, 9\}. \tag{6}$$
The digit $z_i$ is represented by a 4-digit binary SD (BSD) vector as
$$z_i = (z_i^+, z_i^-) \tag{7}$$
where
$$z_i = z_i^+ - z_i^- \tag{8}$$
and "−" refers to decimal subtraction. This representation is a natural extension of the well-known BCD format; therefore, a BCD to DSD format conversion can be performed in zero time with no use of hardware.
DSD Negation
The DSD number $Z = (Z^+, Z^-)$ can be negated simply as
where "¬" denotes a 1's complement (invert) function.
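The representation and its negation can be checked with a small Python sketch. It assumes that a DSD digit is a pair of 4-bit arrays $(z^+, z^-)$ with value $z^+ - z^-$, and that negation inverts both components bitwise; this is one reading consistent with the "¬" operator mentioned above, not a reproduction of the original equation. The names `bcd_to_dsd`, `dsd_value` and `dsd_negate` are hypothetical.

```python
MASK = 0xF  # assumption: each component of a DSD digit is a 4-bit array


def bcd_to_dsd(bcd_digit):
    """A BCD digit is already a DSD digit with an all-zero negative part."""
    return (bcd_digit, 0)


def dsd_value(digit):
    """Value of a DSD digit (z_plus, z_minus), taken as z_plus - z_minus."""
    z_plus, z_minus = digit
    return z_plus - z_minus


def dsd_negate(digit):
    """Negate by inverting both 4-bit components (assumed reading of the text).

    (15 - z_plus) - (15 - z_minus) = z_minus - z_plus = -(z_plus - z_minus)
    """
    z_plus, z_minus = digit
    return (~z_plus & MASK, ~z_minus & MASK)


five = bcd_to_dsd(5)
print(dsd_value(five), dsd_value(dsd_negate(five)))
```

Under these assumptions the two offsets of 15 introduced by the inversions cancel, so no carry or correction step is needed.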
DECIMAL ADDITION ALGORITHM
In this section, an algorithm for the decimal addition $Z = X + Y$, where $X = x_{n-1} x_{n-2} \cdots x_1 x_0$ and $Y = y_{n-1} y_{n-2} \cdots y_1 y_0$ are two $n$-digit BCD numbers, is proposed. The decimal addition is performed in two steps: DSD addition (DSDA) and DSD to BCD (DSD2BCD) conversion.
DSDA
DSDA adds corresponding digits of the operands and produces the sum in DSD format. Considering $x_i$ and $y_i$ as two BCD digits of $X$ and $Y$ in set (2), a 1-digit decimal addition is expressed as
$$z_i = x_i + y_i \tag{10}$$
where
$$z_i \in \{0, 1, \dots, 17, 18\}. \tag{11}$$
As can be seen from (11), the result digit is no longer in the set (6). Hence it must be converted if it is to be a DSD. This problem is fixed by performing a digit-set conversion using the idea of "generalized SD number system" [Parhami 1990 ] as follows.
(1) The position sum $p_i$ is computed as
(2) The transfer bit $t_{out}$ is derived from $p_i$ to calculate the final result digit as
where $t_{in}$ is the transfer bit from the adjacent addition position on the right.
From (13) it is obtained that if
then for every value of $p_i$, the DSD $z_i$ will satisfy (6). As mentioned before, the transfer bit from the previous addition position, $t_{in}$, is always set to 1 unless $t_{in}$ is applied to the least significant position of the decimal adder. In that case, the carry-in signal applied to the $n$-digit addition determines the value of $t_{in}$. Having applied the digit-set conversion, the result digit $z_i$ is now in set (6) and is known to be a DSD digit. Due to the use of the carry-free BSD addition, all the result digits $z_i$, where $i = 0, 1, \dots, n-1$, are produced in parallel. It should be noted that, as $t_{out} = 1$ in the most significant DSDA position, representation overflow [Parhi and Srinivas 1995] occurs and therefore the DSD sum obtained from DSDA is represented in $n + 1$ digits. The additional digit is shown as
$$z_n = (z_n^+, z_n^-) = (000c_{out}, 0000) = (0001, 0000). \tag{15}$$
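The DSDA step can be sketched behaviourally in Python. The sketch assumes the reading suggested by the implementation section, namely that every position emits $t_{out} = 1$ and computes $z_i = x_i + y_i - 10 + t_{in}$, with the carry-in acting as $t_{in}$ of the least significant position and an extra digit equal to 1 appended at the top. Digits are modelled as plain integers rather than $(z^+, z^-)$ bit pairs, and the names `dsda` and `dsd_value` are hypothetical.

```python
def dsda(x_digits, y_digits, carry_in=0):
    """Digitwise DSD addition of two BCD digit lists (least significant first).

    Assumes t_out = 1 at every position, so each digit is
    x_i + y_i - 10 + t_in, with no dependency between positions.
    Returns n + 1 signed digit values.
    """
    z = []
    for i, (x, y) in enumerate(zip(x_digits, y_digits)):
        t_in = carry_in if i == 0 else 1   # transfer from the right is always 1
        z.append(x + y - 10 + t_in)
    z.append(1)                            # extended digit from the top t_out
    return z


def dsd_value(z_digits):
    """Numeric value of a list of signed decimal digits."""
    return sum(d * 10 ** i for i, d in enumerate(z_digits))


z = dsda([1, 2], [3, 0])        # 21 + 3
print(z, dsd_value(z))
```

Each loop iteration uses only the operand digits of its own position, which mirrors the carry-free property stated above. In this model the least significant digit can reach -10 when the carry-in is 0 and both operand digits are 0; that value is still representable by a pair of 4-bit components.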
DSD2BCD Conversion
Once DSDA calculates the result digits in DSD format, DSD2BCD conversion reformats the sum representation from DSD to the conventional BCD form.
4.2.1 Definition. Considering $Z$ as an $n$-digit integer DSD number described in Subsection 3.2, the DSD2BCD converting function $f$ is defined as
where $z'_i$ is a BCD digit represented as
It is clear that $Z$ and $Z'$ differ just in their representation, not in value.
Algorithm.
The conversion from DSD to BCD can be represented as a prefix problem [Zimmermann 1998]. This means that the corresponding parallel-prefix algorithms can be used in order to minimise the conversion cycle time. Similar to the conventional binary parallel-prefix algorithms, the DSD2BCD function is described through the generate-propagate scheme [Weste and Harris 2004].
For an $n$-digit DSD number $Z$ as represented in Subsection 3.2, the propagate and generate signals are respectively defined as
and
In addition, a new signal, namely borrow, is introduced as
where "$\wedge$" represents the logical AND and "$\vee$" denotes the logical OR. Signal $b_i$, whose definition is the opposite of carry, indicates whether a value equal to 1 should be subtracted from the next digit. To represent (20) as a parallel-prefix algorithm, the group propagate and group generate signals are recursively defined as
Also, the group borrow can be defined as
Now, using the defined parallel-prefix scheme, the algorithm of the DSD2BCD function can be defined as follows.
(1) Each digit $z_i$ of the DSD number $Z$ is converted to the 2's complement form of
In this operation, $n$ digitwise conversions are carried out simultaneously. Meanwhile, for every $i = 0, 1, \dots, n-1$, values
, and $z3'_i$
are generated in the 2's complement format. Values $z0'_i$
(2) Considering (18) and (19), it is understood that $p_i$ and $g_i$ can be produced as
where "$\odot$" is the logical XNOR, and
where $cout_{z0'_i}$ is the most significant carry generated when converting $z_i$ to $z0'_i$.
(3) Using (21), (22) and (23), all the group propagate and the group generate signals required for producing $B_i$, where $i = 0, 1, \dots, n$, are calculated.
(4) Finally, either $z0'_i$
(29)
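The conversion steps can be illustrated with a sequential reference model in Python. Because the defining equations are not reproduced here, the sketch assumes the usual signed-digit reading: a digit generates a borrow when it is negative, propagates an incoming borrow when it is zero, and the output digit is one of four precomputed candidates selected by the incoming and outgoing borrows. The function name `dsd_to_bcd` and the use of plain integers for digits are assumptions.

```python
def dsd_to_bcd(z_digits):
    """Convert signed decimal digits (least significant first) to BCD digits.

    Assumed signal definitions (not taken verbatim from the paper):
      generate  g = (z < 0)
      propagate p = (z == 0)
      b_out     = g or (p and b_in)
    Returns (bcd_digits, final_borrow).
    """
    bcd = []
    b_in = 0
    for z in z_digits:
        g = z < 0
        p = z == 0
        b_out = int(g or (p and b_in))
        # Four candidates that hardware could prepare in parallel,
        # indexed by (incoming borrow, outgoing borrow).
        candidates = {(0, 0): z, (1, 0): z - 1, (0, 1): z + 10, (1, 1): z + 9}
        bcd.append(candidates[(b_in, b_out)])
        b_in = b_out
    return bcd, b_in


# DSD digits of 21 + 3 as produced by a digitwise DSD addition: -6, -7, 1
print(dsd_to_bcd([-6, -7, 1]))
```

The loop resolves the borrows one position at a time; the parallel-prefix formulation computes the same borrow signals in logarithmic depth, while the candidate values do not depend on the borrows and can be prepared concurrently.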
DECIMAL SUBTRACTION ALGORITHM
In the proposed DSD arithmetic, subtraction $Z = X - Y$, where $X$ and $Y$ are two $n$-digit BCD numbers, can be performed with no hardware and therefore with no time delay as
$$Z = (Z^+, Z^-) = (X, Y). \tag{30}$$
To show this feature, the subtraction operation can be expressed digitwise as
$$z_i = x_i - y_i \tag{31}$$
where $i = 0, 1, \dots, n-1$. It has been stated already that the result of the subtraction $x_i - y_i$, where $x_i$ and $y_i$ are two binary numbers, can be interpreted as the BSD number $(x_i, y_i)$. On the other hand, since $x_i$ and $y_i$ are two BCD numbers, it is known that
$$x_i, y_i \in \{0, 1, \dots, 9\}.$$
Therefore, digitwise subtraction (31) results in
$$z_i \in \{-9, -8, \dots, 8, 9\}.$$
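A short Python sketch can make the zero-cost subtraction concrete. It assumes that the DSD difference is formed by pairing the operand digits as $(z_i^+, z_i^-) = (x_i, y_i)$ and that the pairs are then converted to BCD by a borrow-resolving pass; the sequential loop stands in for the parallel-prefix converter, and both function names are hypothetical.

```python
def dsd_subtract(x_digits, y_digits):
    """Form the DSD difference by pairing digits; no arithmetic is performed."""
    return list(zip(x_digits, y_digits))


def dsd_pairs_to_bcd(pairs):
    """Resolve borrows over (z_plus, z_minus) pairs, least significant first.

    Sequential stand-in for the DSD2BCD converter.
    Returns (bcd_digits, final_borrow).
    """
    bcd = []
    borrow = 0
    for z_plus, z_minus in pairs:
        d = z_plus - z_minus - borrow
        if d < 0:
            d += 10
            borrow = 1
        else:
            borrow = 0
        bcd.append(d)
    return bcd, borrow


# 52 - 17 = 35, digits listed least significant first
print(dsd_pairs_to_bcd(dsd_subtract([2, 5], [7, 1])))
```

In this model a final borrow of 1 indicates that the minuend is smaller than the subtrahend, in which case the digit list holds the 10's complement of the magnitude.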
IMPLEMENTATION
DSDA
Figure 1 displays the implementation of DSDA, the circuit producing the DSD digit $z_i$. It consists of the following components.
-A 5-digit BSD adder calculates $z_i = x_i + y_i - 10$ in the BSD format. In fact, the BSD adder performs the calculation
where $(x_i, y_i)$ is interpreted as an SD digit applied to the adder.
-An adjust circuit reformats $z_i$ from 6-digit to 4-digit representation. The circuit is developed based on the following observation.
, can take only one of the sets ((0, 0), (0, 1)), ((0, 0), (1, 1)), ((0, 0), (1, 0)), ((0, 0), (0, 0)) and ((1, 0), (0, 1)).
Figure 2 shows an $n$-digit DSDA, which is built using $n$ blocks of 1-digit DSDA. The BSD adders can be implemented using the circuit depicted in Figure 3 or otherwise [Kornerup 1999].
Fig. 1. The structure of a 1-digit DSDA. For nonzero addition positions, $t_{in} = 1$. The unit tagged as BSDA represents a BSD adder. Notations <n> and <n : m> represent the n-th bit and the n-th to m-th bits of the corresponding bit array.
Fig. 2. An $n$-digit DSDA implemented using 1-digit DSDA building blocks. The extended digit of the result is $Z\langle n \rangle = z_n = (z_n^+, z_n^-) = (000c_{out}, 0000) = (0001, 0000)$.
Fig. 3. An implementation for the $n$-digit BSD adder with a BSD augend and a 2's complement addend. The units tagged as FA represent single-bit full adders.
DSD2BCD Conversion
Based on the discussed algorithm, an implementation for 1-digit DSD2BCD conversion is designed and shown in Figure 4. As shown in the figure, $z0'_i$ is obtained from $z_i$ using a 4-bit binary adder with $a = z_i^+$, $b = \neg z_i^-$ and $c_{in} = 1$. The identity between BSD to binary (2's complement) conversion and the binary addition processes is well understood [Uya et al. 1984; Vandemeulebroecke et al. 1990; Srinivas and Parhi 1992; Parhi 1997]. Consider binary subtraction $Z = X - Y$, where $X = x_{n-1} x_{n-2} \cdots x_1 x_0$ and $Y = y_{n-1} y_{n-2} \cdots y_1 y_0$ represent two $n$-bit 2's complement numbers. Using the BSD representation definition, the composite number $X - Y$ can be interpreted as a BSD number, where each BSD digit $(x_i, y_i)$ has a value $x_i - y_i$. This means that algorithms developed for the binary subtraction can be directly used for the BSD to binary conversion. Figure 4 shows that once the three 4-digit BSD adders perform additions (26) and deliver the results, the following 4-bit binary adders convert the BSD results into the 2's complement format. The 4-bit binary adders can be realised through either carry-propagating or parallel-prefix approaches.
To produce the group propagate ($P_{i:j}$) and the group generate ($G_{i:j}$) signals (and consequently, the $B_i$ signals), any parallel-prefix method, including Sklansky [Sklansky 1960], Brent-Kung [Brent and Kung 1982], Kogge-Stone [Kogge and Stone 1973] and Han-Carlson [Han and Carlson 1987], is applicable.
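As an illustration of one such network, the following Python sketch computes all borrows with a Kogge-Stone style scan. The signal definitions are assumptions in the usual generate-propagate form (generate when a digit is negative, propagate when it is zero); the function name `prefix_borrows` and the integer digit model are hypothetical, and the list updates only mimic the levels of a hardware network.

```python
def prefix_borrows(z_digits, b0=0):
    """Compute borrows B[0..n] for signed decimal digits, least significant first.

    Assumed definitions: g_i = (z_i < 0), p_i = (z_i == 0).
    Prefix operator: (G, P) o (G', P') = (G or (P and G'), P and P').
    """
    n = len(z_digits)
    G = [z < 0 for z in z_digits]
    P = [z == 0 for z in z_digits]
    dist = 1
    while dist < n:                       # about log2(n) levels
        G_next, P_next = G[:], P[:]
        for i in range(dist, n):          # all cells of a level act in parallel
            G_next[i] = G[i] or (P[i] and G[i - dist])
            P_next[i] = P[i] and P[i - dist]
        G, P = G_next, P_next
        dist *= 2
    # After the scan, G[i] and P[i] cover positions i down to 0.
    return [b0] + [int(G[i] or (P[i] and b0)) for i in range(n)]


print(prefix_borrows([-6, -7, 1]))
```

Kogge-Stone is chosen here only because it is compact to express; the other networks listed above compute the same group signals with different trade-offs between depth, fan-out and wiring.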
Combined Decimal Adder/Subtractor
In order to have a circuit capable of carrying out the decimal addition and subtraction, it is sufficient to insert a MUX 2:1 (2 to 1 multiplexer) between DSDA and DSD2BCD converter as shown in Figure 5 .
EVALUATION
Investigating the implementation proposed for the decimal adder/subtractor reveals that for decimal input operands with conventional lengths, i.e. 7, 16 and 34 digits based on the IEEE 754R standard [IEEE Standard Committee 2004], the critical path of the design passes through DSDA, the MUX 2:1 selecting between decimal addition and subtraction, the parallel-prefix network producing $B_i$ and the MUX 2:1 placed at the end of DSD2BCD conversion. This means that, unlike some implementations such as the BCD and binary adder/subtractor introduced by Fischer and Rohsaint [Fischer and Rohsaint 1992] with propagation delay of $O(n)$, the delay of the proposed approach is proportional to $\log_2 n$, which is considerably lower. Bultmann et al. [Bultmann et al. 2001] present a binary and decimal adder unit with a digit carry network implemented using the carry lookahead approaches with propagation delay of $O(\log_2 n)$. In this design, due to the algorithm used for defining decimal addition, digitwise +6 adders are employed prior to the multiplexors selecting between decimal addition and subtraction. In the proposed combined decimal adder/subtractor, this binary adder is replaced with DSDA, which contributes a shorter delay to the circuit response time due to its simple structure. This improvement is made possible only because of the new DSD arithmetic and the new algorithms proposed for performing decimal addition and subtraction.
Fig. 4. An implementation for one digit of the proposed (n+1)-digit DSD2BCD converter. Signals $p_i$ and $g_i$, where $i = 0, 1, \dots, n$, are applied to the parallel-prefix network to generate $B_i$.
CONCLUSION
In this paper, an implementation of a design capable of performing either decimal addition or decimal subtraction of two $n$-digit BCD operands is proposed. The algorithm upon which the decimal adder is based comprises an addition of the operands in DSD format followed by a DSD to BCD conversion. The proposed DSD representation makes the circuit simple and therefore more flexible. Decimal subtraction is even easier to implement because it does not require the first step of the addition in DSD format. A parallel-prefix network is used to implement the borrow generating circuit of the conversion. Compared to recently reported designs, the proposed combined decimal adder/subtractor shows a faster response time.
