    Options for Denormal Representation in Logarithmic Arithmetic

    Economical hardware often uses a FiXed-point Number System (FXNS), whose constant absolute precision is acceptable for many signal-processing algorithms. The almost-constant relative precision of the more expensive Floating-Point (FP) number system simplifies design, for example, by eliminating worries about FXNS overflow, because the range of FP is much larger than that of FXNS for the same wordsize; however, primitive FP introduces another problem: underflow. The conventional Signed Logarithmic Number System (SLNS) offers range and precision similar to FP with much better performance (in terms of power, speed and area) for multiplication, division, powers and roots. Moderate-precision addition in SLNS uses table lookup with properties similar to FP (including underflow). This paper proposes a new number system, called the Denormal LNS (DLNS), which is a hybrid of the properties of FXNS and SLNS. The inspiration for DLNS comes from the denormal (aka subnormal) numbers found in IEEE-754 (that provide better, gradual underflow) and the μ-law often used for speech encoding; the novel DLNS circuit here allows arithmetic to be performed directly on such encoded data. The proposed approach allows customizing the range in which gradual underflow occurs. A wide gradual underflow range acts like FXNS; a narrow one acts like SLNS. The DLNS approach is most affordable for applications involving addition, subtraction and multiplication by constants, such as the Fast Fourier Transform (FFT). Simulation of an FFT application illustrates a moderate gradual underflow decreasing bit-switching activity by 15% compared to underflow-free SLNS, at the cost of increasing application error by 30%. DLNS reduces switching activity 5% to 20% more than an abruptly-underflowing SLNS with one-half the error. 
Synthesis shows the novel circuit primarily consists of traditional SLNS addition and subtraction tables, with additional datapaths that allow the novel ALU to act on conventional SLNS as well as DLNS and mixed data, for a worst-case area overhead of 26%. For similar range and precision, simulation of Taylor-series computations suggests subnormal values in DLNS behave similarly to those in the IEEE-754 FP standard. Unlike SLNS, the DLNS approach is quite costly for general (non-constant) multiplication, division and roots. To overcome this difficulty, this paper proposes two variations called Denormal Mitchell LNS (DMLNS) and Denormal Offset Mitchell LNS (DOMLNS), in which the well-known Mitchell's method makes the cost of general multiplication, division and roots closer to that of SLNS. Taylor-series computations suggest subnormal values in DMLNS and DOMLNS also behave similarly to those in the IEEE-754 FP standard. Synthesis shows that DMLNS and DOMLNS respectively have average area overheads of 25% and 17% compared to an equivalent SLNS 5-operation unit.
Economical integrated circuits often use fixed-point number systems, whose constant absolute precision is acceptable for many signal-processing algorithms. The almost-constant relative precision of the more expensive floating-point system simplifies design, notably by eliminating the risk of overflow, since the dynamic range of floating point is much larger than that of fixed point. However, primitive floating point introduces another problem: underflow. The conventional logarithmic number system (SLNS) offers dynamic range and precision similar to floating point, with much better performance (in terms of power consumption, speed and area) for multiplication, division, powers and roots. Moderate-precision addition in SLNS is based on table lookups, with properties similar to floating point (including underflow). This paper proposes three variations on a new number representation system, called Denormal LNS (DLNS), Denormal Mitchell LNS (DMLNS) and Denormal Offset Mitchell LNS (DOMLNS) respectively, all of which are hybrids of the properties of fixed point and SLNS. The inspiration for D(OM)LNS comes from the denormal (or subnormal) numbers of the IEEE-754 standard, which provide gradual underflow, and from the μ-law encoding used in voice transmission. The proposed new DLNS circuit allows computation directly on the encoded data. The proposed approach allows adjusting the range in which gradual underflow occurs. A wide range behaves like fixed point; a narrow one behaves like SLNS. The DLNS approach is the most economical for applications involving additions, subtractions and multiplications by constants, such as Fast Fourier Transforms (FFT). Our first implementation builds on the existing basic building blocks of SLNS. Synthesis shows that the new circuit consists mainly of the traditional SLNS addition tables, with additional datapaths that allow the new unit to operate on SLNS, DLNS or mixed data, for a worst-case area overhead of 26%. Unlike SLNS, this implementation of DLNS remains costly for general multiplication, division and roots. To overcome this difficulty, this paper proposes the DMLNS and DOMLNS variations, in which Mitchell's method brings the cost of general multiplications, divisions and roots closer to that of their SLNS equivalents. Taylor-series computations suggest that subnormal values in DMLNS and DOMLNS also behave similarly to those of the IEEE-754 standard. Synthesis shows that DMLNS and DOMLNS have respective overheads of 25% and 17% compared to an equivalent 5-operation SLNS unit.
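The conventional SLNS operations that the abstract builds on can be illustrated with a minimal Python sketch. It assumes the usual textbook formulation, in which a nonzero value is stored as a sign plus the base-2 logarithm of its magnitude; all function names are hypothetical, floating-point math stands in for the hardware lookup tables, and nothing here models the DLNS, DMLNS or DOMLNS circuits themselves. The last function shows Mitchell's approximation log2(1 + f) ≈ f for 0 ≤ f < 1.

```python
import math

def slns_encode(x):
    """Encode a nonzero real as (is_negative, log2|x|). Assumed format."""
    if x == 0:
        raise ValueError("this sketch has no representation for zero")
    return (x < 0, math.log2(abs(x)))

def slns_decode(v):
    """Convert (is_negative, log2|x|) back to a float."""
    negative, e = v
    return -(2.0 ** e) if negative else 2.0 ** e

def slns_mul(a, b):
    """Multiplication is cheap: XOR the signs, add the logarithms."""
    return (a[0] != b[0], a[1] + b[1])

def slns_add(a, b):
    """Addition needs a nonlinear function of z = log2|b| - log2|a| <= 0.

    Hardware reads that function from a table; here it is computed directly.
    """
    if a[1] < b[1]:
        a, b = b, a                      # ensure |a| >= |b|
    z = b[1] - a[1]
    if a[0] == b[0]:
        # Same signs: log2(|a| + |b|) = log2|a| + log2(1 + 2^z)
        return (a[0], a[1] + math.log2(1.0 + 2.0 ** z))
    if z == 0:
        raise ValueError("exact cancellation: zero is not representable here")
    # Opposite signs: log2(|a| - |b|) = log2|a| + log2(1 - 2^z)
    return (a[0], a[1] + math.log2(1.0 - 2.0 ** z))

def mitchell_log2(x):
    """Mitchell's approximation of log2(x) for x > 0.

    Write x = 2^k * (1 + f) with 0 <= f < 1, then return k + f.
    """
    m, e = math.frexp(x)                 # x = m * 2^e with 0.5 <= m < 1
    return (e - 1) + (2.0 * m - 1.0)
```

For example, `mitchell_log2(3.0)` returns 1.5, while the exact value of log2(3) is about 1.585; the approximation is exact at powers of two.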

    Improving the Hardware Performance of Arithmetic Circuits using Approximate Computing

    An application that can produce a useful result despite some level of computational error is said to be error-resilient. Approximate computing can be applied to error-resilient applications by intentionally introducing error into the computation in order to improve performance, and it has been shown that approximation is especially well suited to arithmetic computing hardware. In this thesis, novel approximate arithmetic architectures are proposed for three different operations, namely multiplication, division, and the multiply-accumulate (MAC) operation. For all designs, accuracy is evaluated in terms of mean relative error distance (MRED) and normalized mean error distance (NMED), while hardware performance is reported in terms of critical path delay, area, and power consumption. Three approximate Booth multipliers (ABM-M1, ABM-M2, ABM-M3) are designed in which two novel inexact partial product generators are used to reduce the dimensions of the partial product matrix. The proposed multipliers are compared to other state-of-the-art designs in terms of both accuracy and hardware performance, and are found to reduce power consumption by up to 56% when compared to the exact multiplier. The function of the multipliers is verified in several image processing applications. Two approximate restoring dividers (AXRD-M1, AXRD-M2) are proposed along with a novel inexact restoring divider cell. In the first divider, the conventional cells are replaced with the proposed inexact cells in several columns. The second divider computes only a subset of the trial subtractions, after which the divisor and partial remainder are rounded and encoded so that they may be used to estimate the remaining quotient bits. The proposed dividers are evaluated for accuracy and hardware performance alongside several benchmarking designs, and their function is verified using change detection and foreground extraction applications. 
An approximate MAC unit is presented in which the multiplication is implemented using a modified version of ABM-M3. The delay is reduced by using a fused architecture where the accumulator is summed as part of the multiplier compression. The accuracy and hardware savings of the MAC unit are measured against several works from the literature, and the design is utilized in a number of convolution operations.
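The two accuracy metrics named in the abstract can be sketched in Python as follows. This assumes the definitions commonly used in the approximate-computing literature: the error distance is |approximate - exact|, MRED averages the error distance divided by the exact result, and NMED divides the mean error distance by the largest possible exact output. The truncating multiplier is only a hypothetical stand-in so that the metrics have something to measure; it is not one of the ABM designs, and skipping zero-valued exact results in MRED is likewise an assumption.

```python
from itertools import product

def truncated_mul(a, b, k=2):
    """Stand-in approximate multiplier: clear the k low bits of each operand."""
    mask = ~((1 << k) - 1)
    return (a & mask) * (b & mask)

def error_metrics(approx, exact, operands, max_out):
    """Return (MRED, NMED) of `approx` against `exact` over `operands`."""
    error_distances = []
    relative_errors = []
    for a, b in operands:
        reference = exact(a, b)
        ed = abs(approx(a, b) - reference)
        error_distances.append(ed)
        if reference != 0:               # assumption: skip undefined ratios
            relative_errors.append(ed / abs(reference))
    mred = sum(relative_errors) / len(relative_errors)
    nmed = (sum(error_distances) / len(error_distances)) / max_out
    return mred, nmed

# Exhaustive evaluation over all pairs of unsigned 8-bit operands.
n = 8
ops = list(product(range(1 << n), repeat=2))
max_out = ((1 << n) - 1) ** 2
mred, nmed = error_metrics(truncated_mul, lambda a, b: a * b, ops, max_out)
print(f"MRED = {mred:.4f}, NMED = {nmed:.6f}")
```

Exhaustive enumeration is practical only for small word sizes; for wider operands a random sample of input pairs would be the usual substitute.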