197 research outputs found

    Truncated Binary Multipliers with minimum Mean Square Error: analytical characterization, circuit implementation and applications

    In the wireless multimedia world, DSP systems are ubiquitous. DSP algorithms are computationally intensive and test the limits of battery life in portable devices such as cell phones, hearing aids, MP3 players, digital video recorders and so on. Multiplication and squaring are the main operations in many signal processing algorithms (filtering, convolution, FFT, DCT, Euclidean distance, etc.), hence efficient parallel multipliers are desirable. A full-width digital n×n bit multiplier computes the 2n-bit output as a weighted sum of partial products. A multiplier whose output is represented on n bits is useful, for example, in DSP datapaths that save the output in the same n-bit registers as the input. Note that truncated multipliers are useful not only for DSP but also for computationally intensive digital ASICs, where the bit widths at the output of the arithmetic blocks are chosen on the basis of system-related accuracy issues. Hence 2n bits of precision at the multiplier output are very often more than required. A truncated multiplier is an n×n multiplier with an n-bit output. Since in a truncated multiplier the n least-significant bits of the full-width product are discarded, some of the partial products are removed and replaced by a suitable compensation function, to trade off accuracy against hardware cost. Several techniques have been proposed in the literature following this basic idea. The difference between the various circuits is in the choice and the implementation of the compensation circuit. The correction techniques proposed in the literature are obtained through exhaustive search. This means that the results are only available for small values of n and that the proposed approaches are not extendable to greater bit widths. Furthermore, the analytical characterization of the error is not possible. In this dissertation an innovative solution for the design and characterization of truncated multipliers is presented. 
The proposed circuits are based on the analytical calculation of the error of the truncated multiplier. This approach yields a multiplier with minimum mean square error and a fast, low-power VLSI implementation. Furthermore, the analytical approach leads to closed-form expressions of the mean square error and maximum absolute error for the proposed truncated multipliers, so the output error is known a priori. The errors are known for every bit width of the multiplier, and it is also possible to decide, for a given bit width, which correction circuit has to be used in order to obtain a certain error. This analytical relation between the error and the parameters of the hardware implementation is extremely important for the digital designer, since it is now possible to select a suitable implementation as a function of the desired accuracy. The proposed truncated multipliers outperform previously proposed truncated multipliers, since they provide lower error, lower power dissipation, lower area occupation and a higher working frequency. The circuits are also easily implemented and allow an automatic HDL description as a function of bit width and desired error. The complete description of the errors for the truncated multipliers allows the use of these circuits as building blocks for more complex systems. It will be shown how the proposed multiplier can be used to design low-area FIR filters and an efficient PI temperature controller
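
To make the quantities in this abstract concrete, here is a minimal behavioural sketch of an unsigned truncated multiplier with a constant compensation term. It is not the dissertation's circuit: the function names, the parameter `k` (number of extra columns kept below the output LSB) and the constant `comp` are all assumptions chosen for illustration, and the error statistics are obtained by brute force, which is exactly the approach the abstract says does not scale to large n.

```python
from itertools import product


def truncated_multiply(a: int, b: int, n: int, k: int, comp: int) -> int:
    """Unsigned n x n multiply returning only the n most-significant bits.

    Partial-product bits a_i*b_j in columns i + j < n - k are never formed.
    `comp` is a hypothetical constant, in units of 2**(n-k), added to
    compensate for the dropped columns before the final cut.
    """
    cut = n - k  # lowest column that is still summed
    acc = 0
    for i in range(n):
        for j in range(n):
            if i + j >= cut and (a >> i) & 1 and (b >> j) & 1:
                acc += 1 << (i + j)
    acc += comp << cut
    return acc >> n


def error_stats(n: int, k: int, comp: int):
    """Exhaustive (mean square error, max absolute error) in output LSBs."""
    sq_sum = 0.0
    max_abs = 0.0
    for a, b in product(range(1 << n), repeat=2):
        exact = a * b / (1 << n)  # full-width product scaled to the output LSB
        err = truncated_multiply(a, b, n, k, comp) - exact
        sq_sum += err * err
        max_abs = max(max_abs, abs(err))
    return sq_sum / (1 << (2 * n)), max_abs


def best_compensation(n: int, k: int) -> int:
    """Brute-force choice of the constant that minimises the mean square error."""
    return min(range(n - k + 2), key=lambda c: error_stats(n, k, c)[0])


if __name__ == "__main__":
    n, k = 6, 1
    c = best_compensation(n, k)
    print("comp =", c, "(mse, max) =", error_stats(n, k, c))
```

With `k = n` and `comp = 0` no column is dropped and the function reduces to plain truncation of the full product, which is a convenient sanity check.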

    Design of Energy-Efficient Approximate Arithmetic Circuits

    Energy consumption has become one of the most critical design challenges in integrated circuit design. Arithmetic computing circuits, in particular array-based arithmetic computing circuits such as adders, multipliers and squarers, have been widely used. In many cases, array-based arithmetic computing circuits consume a significant amount of energy in a chip design. Hence, reduction of energy consumption of array-based arithmetic computing circuits is an important design consideration. To this end, designing low-power arithmetic circuits by intelligently trading off processing precision for energy saving in error-resilient applications such as DSP, machine learning and neuromorphic circuits provides a promising solution to the energy dissipation challenge of such systems. To solve the chip’s energy problem, especially for those applications with inherent error resilience, array-based approximate arithmetic computing (AAAC) circuits that produce errors while having improved energy efficiency have been proposed. Specifically, a number of approximate adders, multipliers and squarers have been presented in the literature. However, the chief limitation of these designs is their un-optimized processing accuracy, which is largely due to the current lack of systematic guidance for array-based AAAC circuit design pertaining to optimal tradeoffs between error, energy and area overhead. Therefore, in this research, our first contribution is to propose a general model for array-based approximate arithmetic computing to guide the minimization of processing error. As part of this model, the Error Compensation Unit (ECU) is identified as a key building block for a wide range of AAAC circuits. We develop theoretical analysis geared towards addressing two critical design problems of the ECU, namely, determination of optimal error compensation values and identification of the optimal error compensation scheme. 
We demonstrate how this general AAAC model can be leveraged to derive practical design insights that may lead to optimal tradeoffs between accuracy, energy dissipation and area overhead. To further minimize energy consumption, delay and area of AAAC circuits, we perform ECU logic simplification by introducing don't cares. By applying the proposed model, we propose an approximate 16x16 fixed-width Booth multiplier that consumes 44.85% less energy and 28.33% less area than the theoretically most accurate fixed-width Booth multiplier when implemented using a 90nm CMOS standard cell library. Furthermore, it reduces average error, max error and mean square error by 11.11%, 28.11% and 25.00%, respectively, when compared with the best reported approximate Booth multiplier, and outperforms the best reported approximate design by 19.10% in terms of the energy-delay-mean square error product (EDE_(ms)). Using the same approach, significant energy consumption, area and error reduction is achieved for a squarer unit, with more than 20.00% EDE_(ms) reduction over existing fixed-width squarer designs. To further reduce error and cost by utilizing extra signatures and don't cares, we demonstrate a 16-bit fixed-width squarer that improves the energy-delay-max error (EDE_(max)) by 15.81%
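
The composite figures of merit named in this abstract can be written out explicitly. The notation below is an assumption made for illustration, not taken from the work: $P$ is the exact product scaled to the output word, $\hat{P}$ the approximate fixed-width output, $E$ the energy per operation and $D$ the delay. The abstract does not state how the errors are normalized, so only the product structure should be read as grounded in the text.

```latex
\[
\begin{aligned}
\varepsilon &= \hat{P} - P, &
\varepsilon_{ms} &= \mathbb{E}\left[\varepsilon^{2}\right], &
\varepsilon_{max} &= \max \lvert \varepsilon \rvert, \\
\mathrm{EDE}_{ms} &= E \cdot D \cdot \varepsilon_{ms}, &
\mathrm{EDE}_{max} &= E \cdot D \cdot \varepsilon_{max}.
\end{aligned}
\]
```

Under this reading, a design that lowers energy, delay and error at the same time improves the product multiplicatively, which is why a single percentage can summarize the tradeoff.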

    Hardware Implementation of the Logarithm Function using Improved Parabolic Synthesis

    This thesis presents a design that approximates the fractional part of the base-two logarithm function using Improved Parabolic Synthesis, including its CMOS VLSI implementations. Improved Parabolic Synthesis is a novel methodology for implementing unary functions (e.g., trigonometric, logarithm, square root) in hardware. It evolves Parabolic Synthesis by combining it with second-degree interpolation. In the thesis, the design explores a simple, parallel architecture for fast timing and optimizes word lengths in the computing stages for a small design. The error behavior of the design is described and characterized to meet the desired error metrics. This implementation is compared to other approaches, e.g. Parabolic Synthesis and CORDIC, using 65nm standard cell libraries, and it is shown to have better performance in terms of smaller chip area, lower dynamic power, and shorter critical path

    Floating Point Calculation of the Cube Function on FPGAs

    © 2023 IEEE. This version of the paper has been accepted for publication. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. The final published paper is available online at: https://doi.org/10.1109/TPDS.2022.3220039 [Abstract]: Specialized arithmetic units allow fast and efficient computation of lesser used mathematical functions. The overall impact of those units would be negligible in a general purpose processor, as added circuitry makes chips more complex while most software would seldom make use of it. On the other hand, custom computing machines are built for a specific task, and they can always benefit from specialized units if they are available. In this work, floating point architectures are proposed for computing the cube on Intel and Xilinx FPGAs. Those implementations reduce the cost and latency compared to using simple floating point multiplications and squarers. This work was supported by the Ministry of Science and Innovation of Spain (PID2019-104184RB-I00 / AEI / 10.13039/501100011033), and by Xunta de Galicia and FEDER funds of the EU (Centro de Investigación de Galicia accreditation 2019–2022, ref. ED431G 2019/01; Consolidation Program of Competitive Reference Groups, ref. ED431C 2021/30). Xunta de Galicia; ED431G 2019/01. Xunta de Galicia; ED431C 2021/3

    Distributed-parameter state-variable techniques applied to communication over dispersive channels.

    Massachusetts Institute of Technology. Dept. of Electrical Engineering. Thesis. 1969. Sc.D. MICROFICHE COPY ALSO AVAILABLE IN BARKER ENGINEERING LIBRARY. Vita. Bibliography: leaves 275-277. Sc.D

    Optimized linear, quadratic and cubic interpolators for elementary function hardware implementations

    This paper presents a method for designing linear, quadratic and cubic interpolators that compute elementary functions using truncated multipliers, squarers and cubers. Initial coefficient values are obtained using a Chebyshev series approximation. A direct search algorithm is then used to optimize the quantized coefficient values to meet a user-specified error constraint. The algorithm minimizes coefficient lengths to reduce lookup table requirements, maximizes the number of truncated columns to reduce the area, delay and power of the arithmetic units, and minimizes the maximum absolute error of the interpolator output. The method can be used to design interpolators to approximate any function to a user-specified accuracy, up to and beyond 53 bits of precision (e.g., IEEE double precision significand). Linear, quadratic and cubic interpolator designs that approximate reciprocal, square root, reciprocal square root and sine are presented and analyzed. Area, delay and power estimates are given for 16, 24 and 32-bit interpolators that compute the reciprocal function, targeting a 65 nm CMOS technology from IBM. Results indicate the proposed method uses smaller arithmetic units and has reduced lookup table sizes compared to previously proposed methods. The method can be used to optimize coefficients in other systems while accounting for coefficient quantization as well as truncation and rounding effects of multiple arithmetic units. Peer reviewed. Electrical and Computer Engineerin
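
The first step described in this abstract, obtaining initial coefficients from a Chebyshev series and then quantizing them, can be sketched as follows. This is an illustrative approximation only: the function names, the single interval [1, 2], the 64 sample nodes and the round-to-nearest quantization are all assumptions, and the paper's direct search over quantized coefficients and its truncated arithmetic units are not modelled.

```python
import math


def chebyshev_coeffs(f, degree, lo, hi, nodes=64):
    """Coefficients c_0..c_degree of f on [lo, hi] in the basis T_j(t), t in [-1, 1]."""
    coeffs = []
    for j in range(degree + 1):
        s = 0.0
        for k in range(nodes):
            theta = math.pi * (k + 0.5) / nodes  # Chebyshev node angle
            x = 0.5 * (hi - lo) * math.cos(theta) + 0.5 * (hi + lo)
            s += f(x) * math.cos(j * theta)
        coeffs.append((1.0 if j == 0 else 2.0) * s / nodes)
    return coeffs


def cheb_eval(coeffs, x, lo, hi):
    """Evaluate sum_j c_j * T_j(t) using the three-term recurrence."""
    t = (2.0 * x - lo - hi) / (hi - lo)
    t_prev, t_cur = 1.0, t
    total = coeffs[0]
    for c in coeffs[1:]:
        total += c * t_cur
        t_prev, t_cur = t_cur, 2.0 * t * t_cur - t_prev
    return total


def quantize(coeffs, frac_bits):
    """Round each coefficient to frac_bits fractional bits (naive, no search)."""
    scale = 1 << frac_bits
    return [round(c * scale) / scale for c in coeffs]


def max_abs_error(f, coeffs, lo, hi, samples=2001):
    """Maximum absolute error of the approximation on a uniform grid."""
    worst = 0.0
    for i in range(samples):
        x = lo + (hi - lo) * i / (samples - 1)
        worst = max(worst, abs(cheb_eval(coeffs, x, lo, hi) - f(x)))
    return worst


if __name__ == "__main__":
    recip = lambda x: 1.0 / x
    c2 = chebyshev_coeffs(recip, 2, 1.0, 2.0)
    print("quadratic:", max_abs_error(recip, c2, 1.0, 2.0))
    print("quantized:", max_abs_error(recip, quantize(c2, 16), 1.0, 2.0))
```

A practical interpolator would split the input range into sub-intervals and store one coefficient set per sub-interval in a lookup table; the single-interval version here only shows how the starting coefficients and the quantization error interact.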

    Adaptive and hybrid schemes for efficient parallel squaring and cubing units

    Squaring (X^2) and cubing (X^3) are special cases of multiplication used in many applications, such as image compression, equalization, decoding and demodulation, 3D graphics, scientific computing, artificial neural networks, logarithmic number systems, and multimedia applications. They can also be an efficient way to compute other basic functions. Therefore, improving their performance is a goal for many researchers. This dissertation discusses modifications to algorithms for parallel squaring and cubing units in both signed and unsigned representations. A truncation technique is then applied to improve their performance. The area and delay of each unit are estimated using a linear evaluation model. A C program was written to generate Hardware Description Language files for each unit. These units are verified in simulation. Moreover, area, delay, and power consumption are calculated for each unit and compared with those of previous approaches for both Xilinx Virtex 5 FPGA and IBM 65nm ASIC technologies

    Aerospace guidance computer analysis and synthesis,

    Bibliography: p. 38. AFAL-TR-68-69. Contract AF-33(657)-11311. M.I.T. DSR Project no. 79891. By Robert W. Roig, John O. Silvey, John E. Ward et al

    Performance analysis of symbol timing recovery circuits employed in digital communications systems

    An analytical approach is presented for the jitter performance of a timing recovery circuit consisting of a prefilter, a zero-memory nonlinear device, and a narrow-band postfilter tuned to the pulse repetition frequency. Assuming first a squarer type of nonlinearity, analytical expressions for the rms jitter in the timing wave are obtained as a function of the pre- and postfiltering characteristics. These expressions are suitable for judging the case where the baseband signal is bandlimited. For some specific examples, the jitter performance of this kind of symbol timing recovery (STR) circuit is also evaluated. Secondly, a general type of nonlinearity is assumed, and the rms jitter expressions are obtained in terms of the higher order moments of the input signal. The higher moments themselves are shown to be computable iteratively. Finally, some numerical results are obtained for the fourth-order nonlinearity, and the rms jitter curves are plotted as a function of the excess bandwidth factor γ for several values of the quality factor Q of the postfilter

    A digital signal processor based optical position sensor and its application to flexible beam control

    A Digital Signal Processor (DSP) based optical position sensor was developed. The sensor system consists of the following components: 1) analog electronics, 2) the DSP based synchronous demodulation software, 3) PC based interface software which samples and saves the data, and 4) PC based control codes for a flexible beam experiment. The ability of the system to determine the distance from the optical sensor to the power modulated light source was assessed by the following tests: 1) a stationary drift test to evaluate the system's noise, 2) a short-range test to determine the resolution of the optical sensor over a 25mm range, and 3) a long-range test to evaluate the ability of the system to predict the location of the optical sensor over a 600mm range. It was found that the resolution of the system is approximately 0.5mm for the short-range test and 5mm for the long-range test. Finally, the sensor was deployed for the position feedback of a flexible beam experiment. Performance indices used to evaluate the response of the system were: 1) the sum of the squared position error, 2) the final steady state position error of the end of the flexible beam, and 3) the 5% settling time of the flexible beam. A number of control laws were evaluated and it was determined that a variable PID controller produced the best overall performance. The system can consistently position the end of the flexible beam from +/-20cm to within 5mm of the command position in approximately 8 seconds with a properly tuned controller
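
The three performance indices listed in this abstract can be computed from a sampled step response roughly as follows. This is a sketch under assumptions: the function name and arguments are hypothetical, and the 5% band is taken relative to the size of the commanded step, which the abstract does not specify.

```python
import math


def performance_indices(times, positions, command, band=0.05):
    """Return (sum of squared error, final error, settling time).

    The settling time is measured from times[0] to the first sample after
    which the error stays inside band * |command - positions[0]|.
    It is None if the response is still outside the band at the last sample.
    """
    errors = [command - p for p in positions]
    sse = sum(e * e for e in errors)
    final_error = errors[-1]

    tol = band * abs(command - positions[0])
    last_outside = -1
    for idx, e in enumerate(errors):
        if abs(e) > tol:
            last_outside = idx  # remember the latest excursion out of the band

    if last_outside == len(errors) - 1:
        settling_time = None
    else:
        settling_time = times[last_outside + 1] - times[0]
    return sse, final_error, settling_time


if __name__ == "__main__":
    # Synthetic first-order response toward a unit command, for illustration.
    ts = [i * 0.01 for i in range(1001)]
    ps = [1.0 - math.exp(-t) for t in ts]
    print(performance_indices(ts, ps, command=1.0))
```

Comparing control laws then amounts to running each controller on the same command and ranking the resulting triples; how the three indices are weighted against each other is a design choice the abstract leaves open.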