Search CORE

74 research outputs found

Application-Specific Number Representation

Author: Fu Haohuan
Fu Haohuan
Publication venue: Computing, Imperial College London
Publication date: 01/02/2009
Field of study

Reconfigurable devices, such as Field Programmable Gate Arrays (FPGAs), enable application- specific number representations. Well-known number formats include fixed-point, floating- point, logarithmic number system (LNS), and residue number system (RNS). Such different number representations lead to different arithmetic designs and error behaviours, thus produc- ing implementations with different performance, accuracy, and cost. To investigate the design options in number representations, the first part of this thesis presents a platform that enables automated exploration of the number representation design space. The second part of the thesis shows case studies that optimise the designs for area, latency or throughput from the perspective of number representations. Automated design space exploration in the first part addresses the following two major issues: ² Automation requires arithmetic unit generation. This thesis provides optimised arithmetic library generators for logarithmic and residue arithmetic units, which support a wide range of bit widths and achieve significant improvement over previous designs. ² Generation of arithmetic units requires specifying the bit widths for each variable. This thesis describes an automatic bit-width optimisation tool called R-Tool, which combines dynamic and static analysis methods, and supports different number systems (fixed-point, floating-point, and LNS numbers). Putting it all together, the second part explores the effects of application-specific number representation on practical benchmarks, such as radiative Monte Carlo simulation, and seismic imaging computations. Experimental results show that customising the number representations brings benefits to hardware implementations: by selecting a more appropriate number format, we can reduce the area cost by up to 73.5% and improve the throughput by 14.2% to 34.1%; by performing the bit-width optimisation, we can further reduce the area cost by 9.7% to 17.3%. On the performance side, hardware implementations with customised number formats achieve 5 to potentially over 40 times speedup over software implementations

Spiral - Imperial College Digital Repository

Options for Denormal Representation in Logarithmic Arithmetic

Author: Arnold Mark G.
Collange Caroline
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 06/01/2014
Field of study

International audienceEconomical hardware often uses a FiXed-point Number System (FXNS), whose constant absolute precision is acceptable for many signal-processing algorithms. The almost-constant relative precision of the more expensive Floating-Point (FP) number system simplifies design, for example, by eliminating worries about FXNS overflow because the range of FP is much larger than FXNS for the same wordsize; however, primitive FP introduces another problem: underflow. The conventional Signed Logarithmic Number System (SLNS) offers similar range and precision as FP with much better performance (in terms of power, speed and area) for multiplication, division, powers and roots. Moderate-precision addition in SLNS uses table lookup with properties similar to FP (including underflow). This paper proposes a new number system, called the Denormal LNS (DLNS), which is a hybrid of the properties of FXNS and SLNS. The inspiration for DLNS comes from the denormal (aka subnormal) numbers found in IEEE-754 (that provide better, gradual underflow) and the μ-law often used for speech encoding; the novel DLNS circuit here allows arithmetic to be performed directly on such encoded data. The proposed approach allows customizing the range in which gradual underflow occurs. A wide gradual underflow range acts like FXNS; a narrow one acts like SLNS. The DLNS approach is most affordable for applications involving addition, subtraction and multiplication by constants, such as the Fast Fourier Transform (FFT). Simulation of an FFT application illustrates a moderate gradual underflow decreasing bit-switching activity 15% compared to underflow-free SLNS, at the cost of increasing application error by 30%. DLNS reduces switching activity 5% to 20% more than an abruptly-underflowing SLNS with one-half the error. Synthesis shows the novel circuit primarily consists of traditional SLNS addition and subtraction tables, with additional datapaths that allow the novel ALU to act on conventional SLNS as well as DLNS and mixed data, for a worst-case area overhead of 26%. For similar range and precision, simulation of Taylor-series computations suggest subnormal values in DLNS behave similarly to those in the IEEE-754 FP standard. Unlike SLNS, DLNS approach is quite costly for general (non-constant) multiplication, division and roots. To overcome this difficulty, this paper proposes two variation called Denormal Mitchell LNS (DMLNS) and Denormal Offset Mitchell LNS (DOMLNS), in which the well-known Mitchell's method makes the cost of general multiplication, division and roots closer to that of SLNS. Taylor-series computations suggest subnormal values in DMLNS and DOMLNS also behave similarly to those in the IEEE-754 FP standard. Synthesis shows that DMLNS and DOMLNS respectively have average area overheads of 25% and 17% compared to an equivalent SLNS 5-operation unit.Les circuits intégrés économiques utilisent souvent des systèmes de numération en virgule fixe, dont la précision absolue constante est acceptable pour de nombreux algorithmes de traitement du signal. La précision relative quasi-constante du système virgule flottante, plus coûteux, simplifie la conception, en éliminant notamment le risque de débordement par le haut, la dynamique du flottant étant bien plus grande qu'en virgule fixe. Cependant, le flottant primitif induit un autre problème : le débordement par le bas (underflow). Le système logarithmique conventionnel (SLNS) offre une dynamique et une précision similaire au flottant, pour des performances bien meilleures (en termes de consommation, vitesse et surface) pour la multiplication, la division, les puissances et les racines. L'addition en précision moyenne en SLNS est basées sur des accès à des tables, avec des propriétés similaires au flottant (incluant le débordement par le bas). Cet article propose trois variations autour d'un nouveau système de représentation des nombres, respectivement appelées Denormal LNS (DLNS), Denormal Mitchell LNS (DMLNS) et Denormal Offset Mitchell LNS (DOMLNS), qui sont toutes des hybrides des propriétés de la virgule fixe et du SLNS. L'inspiration de D(OM)LNS vient des nombre dénormaux (ou sous-normaux) de la norme IEEE-754, qui fournissent un débordement par le bas graduel, et le codage µ-law utilisé dans la transmission de la voix. Le nouveau circuit DLNS proposé permet de calculer directement sur les données codées. L'approche proposée permet d'ajuster l'intervalle dans lequel le débordement progressif intervient. Une plage large se comporte comme la virgule fixe, une étroite comme le SLNS. L'approche DLNS est la plus économique pour les applications impliquant des additions, soustractions et multiplications par des constantes, telles que les transformées de Fourier rapides (FFT). Notre première mise en {\oe}uvre s'appuie sur les blocs de base existant d SLNS. Des synthèses montrent que le nouveau circuit est constitué principalement des tables d'additions SLNS traditionnelles, avec des chemins de données supplémentaires qui permettent à la nouvelle unité d'opérer sur des données SLNS, DLNS ou mixtes, pour un surcoût en surface de 26% dans le pire cas. Contrairement au SLNS, cette réalisation de DLNS reste coûteuse pour la multiplication générique, la division et les racines. Pour surmonter cette difficulté, cet article propose les variations DMLNS et DOMLNS, pour lesquelles la méthode de Mitchell rapproche le coût des multiplications génériques, divisions et racines de leurs équivalents en SLNS. Des calculs sur des séries de Taylor suggèrent que les valeurs sous-normales en DMLNS et DOMLNS se comportent également de manière similaires à celles de la norme IEEE-754. Des synthèses montrent que DMLNS et DOMLNS offrent des surcoûts respectifs de 25% et 17% par rapport à une unité SLNS à 5 opérations équivalente

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

HAL-Rennes 1

Fast, area-efficient 32-bit LNS for computer arithmetic operations

Author: Che Ismail Rizalafande
Publication venue: Newcastle University
Publication date: 01/01/2012
Field of study

PhD ThesisThe logarithmic number system has been proposed as an alternative to floating-point. Multiplication, division and square-root operations are accomplished with fixedpoint arithmetic, but addition and subtraction are considerably more challenging. Recent work has demonstrated that these operations too can be done with similar speed and accuracy to their floating-point equivalents, but the necessary circuitry is complex. In particular, it is dominated by the need for large lookup tables for the storage of a non-linear function. This thesis describes the architectures required to implement a newly design approach for producing fast and area-efficient 32-bit LNS arithmetic unit. The designs are structured based on two different algorithms. At first, a new cotransformation procedure is introduced in the singularity region whilst performing subtractions in which the technique capable to generate less total storage than the cotransformation method in the previous LNS architecture. Secondly, improvement to an existing interpolation process is proposed, that also reduce the total tables to an extent that allows their easy synthesis in logic. Consequently, the total delays in the system can be significantly reduced. According to the comparison analysis with previous best LNS design and floating-point units, it is shown that the new LNS architecture capable to offer significantly better in speed while sustaining its accuracy within floating-point limit. In addition, its implementation is more economical than previous best LNS system and almost equivalent with existing floating-point arithmetic unit.University Malaysia Perlis: Ministry of Higher Education, Malaysia

Newcastle University eTheses

Diseño de un Microprocesador Logarítmico Orientado al Procesamiento Digital de Señales /

Author: Universidad Autónoma del Caribe: Centro de Investigaciones.
Publication venue: Universidad Autónoma del Caribe.
Publication date: 01/05/2016
Field of study

Diseñar un microprocesador logarítmico orientado al procesamiento digital de señales.El sistema numérico logarítmico (LNS por Logarithmic Number System), es un sistema numérico no convencional que combina la facilidad de implementación y precisión que proporciona el sistema numérico de punto fijo y el intervalo de representación numérico que ofrece el sistema numérico de punto flotante [1]. Debido a que el LNS basa su funcionamiento en las propiedades de las funciones logaritmo y antilogaritmo, este sistema numérico permite realizar operaciones de multiplicación, división y raíz cuadrada haciendo uso de sumas, restas y desplazamientos lógicos en el formato de punto fijo, reduciendo de esta forma la complejidad de la arquitectura hardware de dichas operaciones [2], [3]. A pesar de que en la actualidad existen diversas arquitecturas basadas en aritmética convencional que permiten la implementación en hardware de este tipo de operaciones aritméticas, esto sigue siendo un problema abierto, dado que las soluciones existentes presentan un elevado consumo de área, potencia y excesivo número de ciclos de reloj para llevar a cabo dichas operaciones [4]-[5].Universidad Autónoma del Carib

Comparison of logarithmic and floating-point number systems implemented on Xilinx Virtex-II field-programmable gate arrays

Author: Lee Barry Roland
Publication venue
Publication date
Field of study

The aim of this thesis is to compare the implementation of parameterisable LNS (logarithmic number system) and floating-point high dynamic range number systems on FPGA. The Virtex/Virtex-II range of FPGAs from Xilinx, which are the most popular FPGA technology, are used to implement the designs. The study focuses on using the low level primitives of the technology in an efficient way and so initially the design issues in implementing fixed-point operators are considered. The four basic operations of addition, multiplication, division and square root are considered. Carry- free adders, ripple-carry adders, parallel multipliers and digit recurrence division and square root are discussed. The floating-point operators use the word format and exceptions as described by the IEEE std-754. A dual-path adder implementation is described in detail, as are floating-point multiplier, divider and square root components. Results and comparisons with other works are given. The efficient implementation of function evaluation methods is considered next. An overview of current FPGA methods is given and a new piecewise polynomial implementation using the Taylor series is presented and compared with other designs in the literature. In the next section the LNS word format, accuracy and exceptions are described and two new LNS addition/subtraction function approximations are described. The algorithms for performing multiplication, division and powering in the LNS domain are also described and are compared with other designs in the open literature. Parameterisable conversion algorithms to convert to/from the fixed-point domain from/to the LNS and floating-point domain are described and implementation results given. In the next chapter MATLAB bit-true software models are given that have the exact functionality as the hardware models. The interfaces of the models are given and a serial communication system to perform low speed system tests is described. A comparison of the LNS and floating-point number systems in terms of area and delay is given. Different functions implemented in LNS and floating-point arithmetic are also compared and conclusions are drawn. The results show that when the LNS is implemented with a 6-bit or less characteristic it is superior to floating-point. However, for larger characteristic lengths the floating-point system is more efficient due to the delay and exponential area increase of the LNS addition operator. The LNS is beneficial for larger characteristics than 6-bits only for specialist applications that require a high portion of division, multiplication, square root, powering operations and few additions

Online Research @ Cardiff

Unrestricted Faithful Rounding is Good Enough for Some LNS Applications

Author: Colin Walter
Mark G. Arnold
Publication venue
Publication date
Field of study

We propose relaxing the restricted form of faithful rounding used in prior 32-bit Logarithmic Number System (LNS) implementations. Unrestricted faithful rounding yields a three- to six-fold savings in VLSI ROM size (or four- to six- fold savings in FGPA table size) with only modest increase in error. This can be acceptable for the DSP and multimedia applications in which the non-standard LNS is a candidate for adoption. Such applications are cost sensitive, and the tremendous cost reduction of the LNS model proposed here should argue very favourably for its adoption

CiteSeerX

Novel Architectures for Offloading and Accelerating Computations in Artificial Intelligence and Big Data

Author: Weber Lukas Max
Publication venue
Publication date: 19/10/2023
Field of study

Due to the end of Moore's Law and Dennard Scaling, performance gains in general-purpose architectures have significantly slowed in recent years. While raising the number of cores has been a viable approach for further performance increases, Amdahl's Law and its implications on parallelization also limit further performance gains. Consequently, research has shifted towards different approaches, including domain-specific custom architectures tailored to specific workloads. This has led to a new golden age for computer architecture, as noted in the Turing Award Lecture by Hennessy and Patterson, which has spawned several new architectures and architectural advances specifically targeted at highly current workloads, including Machine Learning. This thesis introduces a hierarchy of architectural improvements ranging from minor incremental changes, such as High-Bandwidth Memory, to more complex architectural extensions that offload workloads from the general-purpose CPU towards more specialized accelerators. Finally, we introduce novel architectural paradigms, namely Near-Data or In-Network Processing, as the most complex architectural improvements. This cumulative dissertation then investigates several architectural improvements to accelerate Sum-Product Networks, a novel Machine Learning approach from the class of Probabilistic Graphical Models. Furthermore, we use these improvements as case studies to discuss the impact of novel architectures, showing that minor and major architectural changes can significantly increase performance in Machine Learning applications. In addition, this thesis presents recent works on Near-Data Processing, which introduces Smart Storage Devices as a novel architectural paradigm that is especially interesting in the context of Big Data. We discuss how Near-Data Processing can be applied to improve performance in different database settings by offloading database operations to smart storage devices. Offloading data-reductive operations, such as selections, reduces the amount of data transferred, thus improving performance and alleviating bandwidth-related bottlenecks. Using Near-Data Processing as a use-case, we also discuss how Machine Learning approaches, like Sum-Product Networks, can improve novel architectures. Specifically, we introduce an approach for offloading Cardinality Estimation using Sum-Product Networks that could enable more intelligent decision-making in smart storage devices. Overall, we show that Machine Learning can benefit from developing novel architectures while also showing that Machine Learning can be applied to improve the applications of novel architectures

TUbiblio

tuprints