We present a hardware implementation of the log-add algorithm, a simple method of computing ln(A + B) given ln(A) and ln(B), as used in speech recognition.
Introduction
As part of the speech recognition process [1], we are required to compute probabilities based on Gaussian mixtures. Each mixture has components which are computed in the log domain, but which must be added in the normal domain.
While we could use large look-up tables [2] or CORDIC (co-ordinate rotation digital computer) [3] to convert between domains, a convenient algorithm exists for this specific problem [4]. It removes the need to perform a conversion at all, relying instead on a look-up table significantly smaller than those needed for the logarithm or exponential operations, along with some simple arithmetic. It is hence well suited to implementation in hardware.
Accordingly, in this Letter we describe the theory behind this algorithm and give details of our novel implementation on a field-programmable gate array (FPGA), which forms part of a hardware speech recognition system. This implementation requires less data storage than domain conversion by look-up table and has a lower latency than CORDIC.
Theory
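Given two values ln(A) and ln(B), we would like to compute ln(A + B):
\[ \ln(A + B) = \ln\left(A\left(1 + \frac{B}{A}\right)\right) = \ln(A) + \ln\left(1 + \frac{B}{A}\right) \qquad (1) \]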
To compute the result, we work out ln(B/A), which is simply equal to ln(B) - ln(A), and then use a look-up table to map that value to ln(1 + B/A). Since the values in this 
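As a minimal floating-point sketch of equation (1), and not of the hardware design itself, the identity can be checked numerically. The function name `log_add` is illustrative, and `math.log1p` stands in for the look-up table.

```python
import math

def log_add(ln_a, ln_b):
    """Return ln(A + B) given ln(A) and ln(B), staying in the log domain."""
    # Take A as the larger operand so that B/A <= 1 and exp() cannot overflow.
    hi, lo = max(ln_a, ln_b), min(ln_a, ln_b)
    # ln(A + B) = ln(A) + ln(1 + B/A), where ln(B/A) = ln(B) - ln(A).
    return hi + math.log1p(math.exp(lo - hi))

# Example: A = 0.2, B = 0.3, so A + B = 0.5.
result = log_add(math.log(0.2), math.log(0.3))
print(result, math.log(0.5))
```

The two printed values should agree to within floating-point rounding.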
Data representation
The calculations for speech recognition are best performed in the negative log domain, as it reduces the many multiplications associated with the process to additions, for which hardware is better suited.
Recognition is a statistical process, and so the values used are probabilities. If a probability A is converted to the negative log domain by computing -ln(A), a 16-bit log-domain integer value would represent the probabilities from 3 × 10^-28462 to 1, a range which is far too broad for our purposes. A more reasonable range is 10^-12 to 1, which can be achieved by computing -K ln(A), where K equals -2371.8. This approach is used by the HTK speech development toolkit [5], which was used to generate the speech models that we used, and to verify the results of our system.
We found that for the more complex speech models, 16 bits were not sufficient to maintain accuracy, and so 24-bit values were used instead, resulting in the range of probabilities being 10^-3072 to 1. The value of K was kept constant in order to maintain compatibility with HTK.
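The ranges quoted above can be checked with a few lines of arithmetic. The sketch below assumes an unsigned integer code of the given width and a scale factor equal to the magnitude of K; the function name `min_probability_log10` is hypothetical.

```python
import math

K_MAGNITUDE = 2371.8  # magnitude of the scaling constant K quoted in the text

def min_probability_log10(bits, scale=1.0):
    """Base-10 exponent of the smallest probability that an unsigned
    `bits`-wide code can hold when the stored value is scale * (-ln p)."""
    max_code = 2**bits - 1
    return -max_code / (scale * math.log(10))

print(min_probability_log10(16))               # unscaled 16-bit code
print(min_probability_log10(16, K_MAGNITUDE))  # scaled 16-bit code
print(min_probability_log10(24, K_MAGNITUDE))  # scaled 24-bit code
```

These evaluate to roughly -28461.5, -12.0 and -3072.0, consistent with the ranges given in the text.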
Implementation
This algorithm seems ideal for hardware implementation, since it relies on functions that are easily realised on a chip. However, to give it a significant advantage over the alternative methods described below, the look-up table needs to be kept as small as possible without adversely affecting accuracy.
Hence the first step of the implementation was to analyse the data to be used in the table. Software was written to perform the calculations directly, and produce a full set of values. These were then inspected in order to identify patterns which could be used to produce a more efficient design. This involved keeping to a minimum both the number of entries in the look-up table, and the amount of additional logic required.
Inspection of this data revealed that when Kln(B/A) is 0, Kln(1+B/A) is -1644. Since all of the outputs are negative, we ignore the sign at this stage. So taking the outputs as positive numbers, as the input value increases, the output decreases, initially at the rate of 1 for every 2 increments of the input, and then more and more slowly. The first consequence of this was that we could ignore the least significant bit (LSB) of the input, as it did not affect the output by more than ± 1. The other was that for all values of the input above 16,384, the output changed only twice, decreasing from 2 to 1 at 17,471, and then to 0 at 20,077.
The result of this was that a table 8,192 entries deep and 11 bits wide (a total of 88 Kbit, i.e. 11 Kbyte) was sufficient to represent all values of the input from 0 to 16,384 (discarding the LSB), with the two values above this handled using a couple of comparators, as shown in Fig. 1.
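A software model of such a table might look like the sketch below. The entry formula, the round-to-nearest rule and all names are assumptions made for illustration; only the table dimensions and the two cutoff values (17,471 and 20,077) are taken from the text. The exact entries depend on the rounding rule and on the precision of K, neither of which is specified here, so they may differ slightly from the table actually used.

```python
import math

K_MAGNITUDE = 2371.8   # magnitude of K from the text
TABLE_DEPTH = 8192     # entries, addressed by the input with its LSB discarded

def correction(diff):
    """Assumed entry formula: magnitude of K*ln(1 + B/A) for a scaled
    difference diff = K*ln(B/A) >= 0, rounded to the nearest integer."""
    return round(K_MAGNITUDE * math.log1p(math.exp(-diff / K_MAGNITUDE)))

# One entry per pair of input values, since the LSB is ignored.
TABLE = [correction(2 * i) for i in range(TABLE_DEPTH)]

def lookup(diff):
    """Table read for small inputs; two comparators cover the tail."""
    if diff < 16384:
        return TABLE[diff >> 1]   # discard the LSB to form the address
    if diff < 17471:              # cutoff quoted in the text
        return 2
    if diff < 20077:              # cutoff quoted in the text
        return 1
    return 0

print(TABLE[0], max(TABLE).bit_length(), TABLE_DEPTH * 11)
```

Under these assumptions the first entry is 1644, every entry fits in 11 bits, and the table occupies 90,112 bits.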
The only other processing required was a comparator for the two inputs Kln(A) and Kln(B), a subtractor to compute their difference, and a second subtractor to subtract the output of the look-up table from the smaller (i.e. more negative) input. This is equivalent to adding the smaller input to the negative of the value from the look-up table, and is required because the numbers stored in the look-up table are positive, the minus sign having been discarded. The architecture of the log-add block is shown in Fig. 2.
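The datapath just described (comparator, difference, table read, final subtraction) can be modelled in integer arithmetic as follows. This is a behavioural sketch under stated assumptions, not the authors' design: scores are taken to be non-negative integers round(K ln(p)) with K = -2371.8, the table entries are generated by an assumed formula, and every function name is hypothetical.

```python
import math

K = -2371.8  # scaling constant from the text; score(p) = round(K * ln(p)) >= 0

# Assumed table contents: magnitude of K*ln(1 + B/A), one entry per two inputs.
TABLE = [round(-K * math.log1p(math.exp(2 * i / K))) for i in range(8192)]

def table_value(diff):
    """Look-up table plus the two tail comparators (cutoffs from the text)."""
    if diff < 16384:
        return TABLE[diff >> 1]
    if diff < 17471:
        return 2
    if diff < 20077:
        return 1
    return 0

def log_add(score_a, score_b):
    """Return the score of A + B given the scores of A and B."""
    # Comparator: the smaller score belongs to the more probable operand.
    smaller, larger = min(score_a, score_b), max(score_a, score_b)
    diff = larger - smaller               # first subtractor
    return smaller - table_value(diff)    # second subtractor

def score(p):
    """Convert a probability to its scaled log-domain score."""
    return round(K * math.log(p))

print(log_add(score(1e-3), score(5e-4)), score(1.5e-3))
```

The two printed scores should agree to within a count or two, the residual coming from input rounding, the discarded LSB and the rounding of the table entries.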
Domain conversion
The alternative to this algorithm is to convert the data from the log domain to the normal domain (i.e. take the exponential), perform the summation, and then take the log of the result. 
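For comparison, the conversion route can be sketched in double-precision floating point. This is an illustration only: the function name is hypothetical, scores are assumed to be round(K ln(p)) with K = -2371.8, and a hardware implementation would use look-up tables or CORDIC rather than library calls.

```python
import math

K = -2371.8  # scaling constant from the text; score(p) = round(K * ln(p))

def log_add_by_conversion(score_a, score_b):
    """Exponentiate both scores, add in the normal domain, take the log."""
    a = math.exp(score_a / K)   # log domain -> normal domain
    b = math.exp(score_b / K)
    return round(K * math.log(a + b))   # normal domain -> log domain

# Mid-range scores (about 1e-3 and 5e-4) convert without difficulty.
print(log_add_by_conversion(16384, 18028))
```

For scores near the top of the 24-bit range, both exponentials underflow to zero in double precision and `math.log` raises a `ValueError`, which illustrates why the representation of data in the normal domain needs care on this route.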
Conclusion
We have shown one way in which the log-add algorithm can be implemented in hardware, in this case as part of a speech recognition system. We have shown that this method requires significantly less data storage than domain conversion based on look-up tables and has a shorter latency than CORDIC, and that in both cases it avoids the problem of how best to represent data in the normal domain.
While the size of the look-up table and the cutoff values are specific to our implementation, the nature of the data ensures that optimisations can be made in the manner described.
Using an existing theory, our design demonstrates that what would otherwise be a complex calculation can be reduced to elements well-suited to hardware, with no significant loss of accuracy. 
