606 research outputs found

    Design of Energy-Efficient Approximate Arithmetic Circuits

    Get PDF
    Energy consumption has become one of the most critical design challenges in integrated circuit design. Arithmetic computing circuits, in particular array-based arithmetic computing circuits such as adders, multipliers, squarers, have been widely used. In many cases, array-based arithmetic computing circuits consume a significant amount of energy in a chip design. Hence, reduction of energy consumption of array-based arithmetic computing circuits is an important design consideration. To this end, designing low-power arithmetic circuits by intelligently trading off processing precision for energy saving in error-resilient applications such as DSP, machine learning and neuromorphic circuits provides a promising solution to the energy dissipation challenge of such systems. To solve the chip’s energy problem, especially for those applications with inherent error resilience, array-based approximate arithmetic computing (AAAC) circuits that produce errors while having improved energy efficiency have been proposed. Specifically, a number of approximate adders, multipliers and squarers have been presented in the literature. However, the chief limitation of these designs is their un-optimized processing accuracy, which is largely due to the current lack of systemic guidance for array-based AAAC circuit design pertaining to optimal tradeoffs between error, energy and area overhead. Therefore, in this research, our first contribution is to propose a general model for approximate array-based approximate arithmetic computing to guide the minimization of processing error. As part of this model, the Error Compensation Unit (ECU) is identified as a key building block for a wide range of AAAC circuits. We develop theoretical analysis geared towards addressing two critical design problems of the ECU, namely, determination of optimal error compensation values and identification of the optimal error compensation scheme. We demonstrate how this general AAAC model can be leveraged to derive practical design insights that may lead to optimal tradeoffs between accuracy, energy dissipation and area overhead. To further minimize energy consumption, delay and area of AAAC circuits, we perform ECU logic simplification by introducing don't cares. By applying the proposed model, we propose an approximate 16x16 fixed-width Booth multiplier that consumes 44.85% and 28.33% less energy and area compared with theoretically the most accurate fixed-width Booth multiplier when implemented using a 90nm CMOS standard cell library. Furthermore, it reduces average error, max error and mean square error by 11.11%, 28.11% and 25.00%, respectively, when compared with the best reported approximate Booth multiplier and outperforms the best reported approximate design significantly by 19.10% in terms of the energy-delay-mean square error product (EDE_(ms)). Using the same approach, significant energy consumption, area and error reduction is achieved for a squarer unit, with more than 20.00% EDE_(ms) reduction over existing fixed-width squarer designs. To further reduce error and cost by utilizing extra signatures and don't cares, we demonstrate a 16-bit fixed-width squarer that improves the energy-delay-max error (EDE_(max)) by 15.81%

    Square-rich fixed point polynomial evaluation on FPGAs

    Get PDF
    Polynomial evaluation is important across a wide range of application domains, so significant work has been done on accelerating its computation. The conventional algorithm, referred to as Horner's rule, involves the least number of steps but can lead to increased latency due to serial computation. Parallel evaluation algorithms such as Estrin's method have shorter latency than Horner's rule, but achieve this at the expense of large hardware overhead. This paper presents an efficient polynomial evaluation algorithm, which reforms the evaluation process to include an increased number of squaring steps. By using a squarer design that is more efficient than general multiplication, this can result in polynomial evaluation with a 57.9% latency reduction over Horner's rule and 14.6% over Estrin's method, while consuming less area than Horner's rule, when implemented on a Xilinx Virtex 6 FPGA. When applied in fixed point function evaluation, where precision requirements limit the rounding of operands, it still achieves a 52.4% performance gain compared to Horner's rule with only a 4% area overhead in evaluating 5th degree polynomials

    Truncated Binary Multipliers with minimum Mean Square Error: analytical characterization, circuit implementation and applications

    Get PDF
    In the wireless multimedia word, DSP systems are ubiquitous. DSP algorithms are computationally intensive and test the limits of battery life in portable device such as cell phones, hearing aids, MP3 players, digital video recorders and so on. Multiplication and squaring are the main operation in many signal processing algorithms (filtering, convolution, FFT, DCT, euclidean distance etc.), hence efficient parallel multipliers are desirable. A full-width digital nxn bits multiplier computes the 2n bits output as a weighted sum of partial products. A multiplier with the output represented on n bits output is useful, as example, in DSP datapaths which saves the output in the same n bits registers of the input. Note that the truncated multipliers are useful not only for DSP but also for digital, computational intensive, ASICs where the bit-widths at the output of the arithmetic blocks are chosen on the basis of system-related accuracy issues. Hence 2n bits of precision at the multiplier output are very often more than required. A truncated multiplier is an nxn multiplier with n bits output. Since in a truncated multiplier the n less-significant bits of the full-width product are discarded, some of the partial products are removed and replaced by a suitable compensation function, to trade-off accuracy with hardware cost. Several techniques have been proposed in the Literature following this basic idea. The difference between the various circuits is in the choice and the implementation of the compensation circuit. The correction techniques proposed in the Literature are obtained through exhaustive search. This means that the results are only available for small n values and that the proposed approach are not extendable to greater bit widths. Furthermore the analytical characterization of the error is not possible. In this dissertation an innovative solution for the design and characterization of truncated multipliers is presented. The proposed circuits are based on the analytical calculation of the error of the truncated multiplier. This approach allows to have the description of a multiplier characterized by a minimum mean square error which gives a fast and low power VLSI implementation. Furthermore the analytical approach yields to a closed form expression of the mean square error and maximum absolute error for the proposed truncated multipliers. In this way the a priori knowledge of the output error is available. The errors are known for every bit width of the multiplier and it is also possible to decide, for a given bit width, which correction circuit has to be used in order to obtain a certain error. This analytical relation between the error and the parameters of hardware implementation is extremely important for the digital designer, since now it is possible to select the suitable implementation as a function of the desired accuracy. Proposed truncated multipliers overcome the previously proposed truncated multipliers since provide lower error, lower power dissipation, lower area occupation and also provide higher working frequency. The circuits are also easily implemented and allow an automatic HDL description as a function of bit width and desired error. The complete description of the errors for the truncated multipliers allows the use of these circuits as building blocks for more complex systems. It will be shown how the proposed multiplier can be used to design low area occupation FIR filters and an efficient PI temperature controller

    Phase error statistics of a phase-locked loop synchronized direct detection optical PPM communication system

    Get PDF
    Receiver timing synchronization of an optical Pulse-Position Modulation (PPM) communication system can be achieved using a phased-locked loop (PLL), provided the photodetector output is suitably processed. The magnitude of the PLL phase error is a good indicator of the timing error at the receiver decoder. The statistics of the phase error are investigated while varying several key system parameters such as PPM order, signal and background strengths, and PPL bandwidth. A practical optical communication system utilizing a laser diode transmitter and an avalanche photodiode in the receiver is described, and the sampled phase error data are presented. A linear regression analysis is applied to the data to obtain estimates of the relational constants involving the phase error variance and incident signal power

    A VHDL-AMS Simulation Environment for an UWB Impulse Radio Transceiver

    Get PDF
    Ultra-Wide-Band (UWB) communication based on the impulse radio paradigm is becoming increasingly popular. According to the IEEE 802.15 WPAN Low Rate Alternative PHY Task Group 4a, UWB will play a major role in localization applications, due to the high time resolution of UWB signals which allow accurate indirect measurements of distance between transceivers. Key for the successful implementation of UWB transceivers is the level of integration that will be reached, for which a simulation environment that helps take appropriate design decisions is crucial. Owing to this motivation, in this paper we propose a multiresolution UWB simulation environment based on the VHDL-AMS hardware description language, along with a proper methodology which helps tackle the complexity of designing a mixed-signal UWB System-on-Chip. We applied the methodology and used the simulation environment for the specification and design of an UWB transceiver based on the energy detection principle. As a by-product, simulation results show the effectiveness of UWB in the so-called ranging application, that is the accurate evaluation of the distance between a couple of transceivers using the two-way-ranging metho

    An Investigation into Neuromorphic ICs using Memristor-CMOS Hybrid Circuits

    Full text link
    The memristance of a memristor depends on the amount of charge flowing through it and when current stops flowing through it, it remembers the state. Thus, memristors are extremely suited for implementation of memory units. Memristors find great application in neuromorphic circuits as it is possible to couple memory and processing, compared to traditional Von-Neumann digital architectures where memory and processing are separate. Neural networks have a layered structure where information passes from one layer to another and each of these layers have the possibility of a high degree of parallelism. CMOS-Memristor based neural network accelerators provide a method of speeding up neural networks by making use of this parallelism and analog computation. In this project we have conducted an initial investigation into the current state of the art implementation of memristor based programming circuits. Various memristor programming circuits and basic neuromorphic circuits have been simulated. The next phase of our project revolved around designing basic building blocks which can be used to design neural networks. A memristor bridge based synaptic weighting block, a operational transconductor based summing block were initially designed. We then designed activation function blocks which are used to introduce controlled non-linearity. Blocks for a basic rectified linear unit and a novel implementation for tan-hyperbolic function have been proposed. An artificial neural network has been designed using these blocks to validate and test their performance. We have also used these fundamental blocks to design basic layers of Convolutional Neural Networks. Convolutional Neural Networks are heavily used in image processing applications. The core convolutional block has been designed and it has been used as an image processing kernel to test its performance.Comment: Bachelor's thesi

    A digital signal processor based optical position sensor and its application to flexible beam control

    Get PDF
    A Digital Signal Processor (DSP) based optical position sensor was developed. The sensor system consists of the following components: 1) analog electronics, 2) the DSP based synchronous demodulation software, 3) PC based interface software which samples and saves the data, and 4) PC based control codes for a flexible beam. experiment. The ability of the system to determine the distance from the optical sensor to the power modulated light source was assessed by the following tests: 1) a stationary drift test to evaluate the system\u27s noise, 2) a short-range test to determine the resolution of the optical sensor over a 25mm range and, 3) a long-range test to evaluate the ability of the system to predict the location of the optical sensor over a 600mm range. It was found that the resolution of the system is approximately 0.5mm for the short range test and 5mm for the long range test. Finally, the sensor was deployed for the position feedback of a flexible beam experiment. Performance indices used to evaluate the response of the system were: 1) the sum of the squared position error, 2) the final steady state position error of the end of the flexible beam, and 3) the 5% settling time of the flexible beam. A number of control laws were evaluated and it was determined that a variable PID controller produced the best overall performance. The system can consistently position the end of the flexible beam from a +1-20cm to within 5mm of the command position in approximately 8 seconds with a properly tuned controller

    Multidimensional Pareto optimization of touchscreen keyboards for speed, familiarity and improved spell checking

    Get PDF
    The paper presents a new optimization technique for keyboard layouts based on Pareto front optimization. We used this multifactorial technique to create two new touchscreen phone keyboard layouts based on three design metrics: minimizing finger travel distance in order to maximize text entry speed, a new metric to maximize the quality of spell correction quality by minimizing neighbouring key ambiguity, and maximizing familiarity through a similarity function with the standard Qwerty layout. The paper describes the optimization process and resulting layouts for a standard trapezoid shaped keyboard and a more rectangular layout. Fitts' law modelling shows a predicted 11% improvement in entry speed without taking into account the significantly improved error correction potential and the subsequent effect on speed. In initial user tests typing speed dropped from approx. 21wpm with Qwerty to 13wpm (64%) on first use of our layout but recovered to 18wpm (85%) within four short trial sessions, and was still improving. NASA TLX forms showed no significant difference on load between Qwerty and our new layout use in the fourth session. Together we believe this shows the new layouts are faster and can be quickly adopted by users

    Dual-band FSK receiver and building block design for UWB impulse radio

    Get PDF
    Master'sMASTER OF ENGINEERIN
    corecore