
    Truncated Binary Multipliers with minimum Mean Square Error: analytical characterization, circuit implementation and applications

    Get PDF
    In the wireless multimedia world, DSP systems are ubiquitous. DSP algorithms are computationally intensive and test the limits of battery life in portable devices such as cell phones, hearing aids, MP3 players, digital video recorders and so on. Multiplication and squaring are the main operations in many signal processing algorithms (filtering, convolution, FFT, DCT, Euclidean distance, etc.), hence efficient parallel multipliers are desirable. A full-width digital n×n-bit multiplier computes the 2n-bit output as a weighted sum of partial products. A multiplier whose output is represented on n bits is useful, for example, in DSP datapaths that save the output in the same n-bit registers used for the input. Note that truncated multipliers are useful not only for DSP but also for computationally intensive digital ASICs, where the bit-widths at the output of the arithmetic blocks are chosen on the basis of system-level accuracy requirements. Hence 2n bits of precision at the multiplier output are very often more than required. A truncated multiplier is an n×n multiplier with an n-bit output. Since in a truncated multiplier the n less-significant bits of the full-width product are discarded, some of the partial products are removed and replaced by a suitable compensation function, to trade off accuracy against hardware cost. Several techniques have been proposed in the literature following this basic idea; the difference between the various circuits lies in the choice and the implementation of the compensation circuit. The correction techniques proposed in the literature are obtained through exhaustive search. This means that results are available only for small values of n, that the proposed approaches cannot be extended to larger bit widths, and that an analytical characterization of the error is not possible. In this dissertation an innovative solution for the design and characterization of truncated multipliers is presented. The proposed circuits are based on the analytical calculation of the error of the truncated multiplier. This approach yields a multiplier characterized by minimum mean square error, which gives a fast and low-power VLSI implementation. Furthermore, the analytical approach provides closed-form expressions for the mean square error and the maximum absolute error of the proposed truncated multipliers, so that a priori knowledge of the output error is available. The errors are known for every bit width of the multiplier, and it is also possible to decide, for a given bit width, which correction circuit has to be used in order to obtain a given error. This analytical relation between the error and the parameters of the hardware implementation is extremely important for the digital designer, who can now select the suitable implementation as a function of the desired accuracy. The proposed truncated multipliers outperform previously proposed ones, providing lower error, lower power dissipation, lower area occupation and higher working frequency. The circuits are also easily implemented and allow an automatic HDL description as a function of bit width and desired error. The complete description of the errors of the truncated multipliers allows these circuits to be used as building blocks for more complex systems. It is shown how the proposed multiplier can be used to design low-area FIR filters and an efficient PI temperature controller.
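
    As a rough illustration of the truncation idea described above, the following Python sketch models an n×n multiplier that keeps only the partial-product columns of weight 2^n and above and adds a simple constant correction; the constant used here is an arbitrary assumption, not the analytically derived minimum-MSE compensation of the dissertation.

```python
# Hypothetical behavioural model of a truncated n x n multiplier (assumption:
# a single constant compensation; the dissertation derives an analytically
# optimal, minimum-MSE correction instead).
import itertools

def full_mult(a: int, b: int, n: int) -> int:
    """Exact n x n multiplication: the 2n-bit result as a weighted sum of partial products."""
    return sum(((a >> i) & 1) * (b << i) for i in range(n))

def truncated_mult(a: int, b: int, n: int, comp: int) -> int:
    """Keep only partial-product bits of weight >= 2**n, add a compensation
    constant, and return the n most-significant bits of the product."""
    kept = 0
    for i in range(n):              # bit i of a
        for j in range(n):          # bit j of b
            if i + j >= n:          # this column survives truncation
                kept += (((a >> i) & 1) * ((b >> j) & 1)) << (i + j)
    return (kept + comp) >> n

n = 8
comp = 1 << (n - 1)                 # crude constant compensation (assumption)
errs = [full_mult(a, b, n) // (1 << n) - truncated_mult(a, b, n, comp)
        for a, b in itertools.product(range(0, 2 ** n, 7), repeat=2)]
print("sampled mean square error:", sum(e * e for e in errs) / len(errs))
```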

    FPGA Implementation of Gaussian Mixture Model Algorithm for 47 fps Segmentation of 1080p Video

    Get PDF
    Circuits and systems able to process high-quality video in real time are fundamental in today's imaging systems. The circuit proposed in the paper, aimed at the robust identification of the background in video streams, implements the improved formulation of the Gaussian Mixture Model (GMM) algorithm that is included in the OpenCV library. An innovative, hardware-oriented formulation of the GMM equations, the use of truncated binary multipliers, and ROM compression techniques allow reduced hardware complexity and increased processing capability. The proposed circuit has been designed with commercial FPGA devices as the target and achieves speed and logic-resource occupation that surpass previously proposed implementations. The circuit, when implemented on Virtex6 or StratixIV, processes more than 45 frames per second in 1080p format and uses only a few percent of the FPGA logic resources.
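
    For orientation, here is a minimal per-pixel Python sketch of the classical GMM background test and update that circuits of this kind implement; the number of Gaussians, learning rate, match threshold and background threshold below are illustrative assumptions, not the paper's hardware-oriented reformulation of the OpenCV variant.

```python
# Per-pixel Gaussian Mixture Model background test and update (classical
# Stauffer/Grimson-style formulation; all constants are illustrative).
from dataclasses import dataclass, field

K, ALPHA, T_BG = 3, 0.01, 0.7

@dataclass
class PixelModel:
    w:   list = field(default_factory=lambda: [1.0 / K] * K)        # mixture weights
    mu:  list = field(default_factory=lambda: [0.0, 128.0, 255.0])  # component means
    var: list = field(default_factory=lambda: [900.0] * K)          # component variances

def update(m: PixelModel, x: float) -> bool:
    """Update the per-pixel model with value x; return True if x is background."""
    matched = None
    for k in range(K):
        if (x - m.mu[k]) ** 2 < 6.25 * m.var[k]:      # |x - mu| < 2.5 sigma
            matched = k
            break
    for k in range(K):                                # weight update
        m.w[k] = (1 - ALPHA) * m.w[k] + ALPHA * (1.0 if k == matched else 0.0)
    if matched is None:                               # replace the weakest Gaussian
        k = min(range(K), key=lambda i: m.w[i])
        m.mu[k], m.var[k], m.w[k] = x, 900.0, ALPHA
    else:                                             # update the matched Gaussian
        rho = ALPHA / max(m.w[matched], 1e-6)
        m.mu[matched]  += rho * (x - m.mu[matched])
        m.var[matched] += rho * ((x - m.mu[matched]) ** 2 - m.var[matched])
    s = sum(m.w)
    m.w = [wk / s for wk in m.w]                      # renormalise weights
    # background = most reliable Gaussians whose cumulative weight reaches T_BG
    order = sorted(range(K), key=lambda i: m.w[i] / m.var[i] ** 0.5, reverse=True)
    cum, bg = 0.0, set()
    for k in order:
        bg.add(k)
        cum += m.w[k]
        if cum > T_BG:
            break
    return matched in bg
```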

    A fast approach for overcomplete sparse decomposition based on smoothed L0 norm

    Full text link
    In this paper, a fast algorithm for overcomplete sparse decomposition, called SL0, is proposed. The algorithm is essentially a method for obtaining sparse solutions of underdetermined systems of linear equations, and its applications include underdetermined Sparse Component Analysis (SCA), atomic decomposition on overcomplete dictionaries, compressed sensing, and decoding real field codes. Contrary to previous methods, which usually solve this problem by minimizing the L1 norm using Linear Programming (LP) techniques, our algorithm tries to directly minimize the L0 norm. It is experimentally shown that the proposed algorithm is about two to three orders of magnitude faster than the state-of-the-art interior-point LP solvers, while providing the same (or better) accuracy. Comment: Accepted in IEEE Transactions on Signal Processing. For MATLAB codes, see http://ee.sharif.ir/~SLzero. File replaced because Fig. 5 was missing erroneously.
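
    A compact Python sketch of the SL0 iteration described above: the L0 norm is smoothed by Gaussian functions of width sigma, the smoothed objective is improved by gradient steps projected back onto {s : As = x}, and sigma is annealed downward. The step size and sigma schedule below are illustrative; the authors' MATLAB reference implementation is at the URL above.

```python
# SL0-style sparse recovery for an underdetermined system A s = x.
import numpy as np

def sl0(A: np.ndarray, x: np.ndarray, sigma_min=1e-3, sigma_decay=0.5,
        mu=2.0, inner_iters=3) -> np.ndarray:
    A_pinv = A.T @ np.linalg.inv(A @ A.T)     # right pseudo-inverse (A has full row rank)
    s = A_pinv @ x                            # minimum-L2-norm feasible starting point
    sigma = 2.0 * np.max(np.abs(s))
    while sigma > sigma_min:
        for _ in range(inner_iters):
            delta = s * np.exp(-s ** 2 / (2 * sigma ** 2))   # gradient direction
            s = s - mu * delta                               # move toward sparsity
            s = s - A_pinv @ (A @ s - x)                     # project back onto A s = x
        sigma *= sigma_decay
    return s

# toy usage: recover a 5-sparse vector of length 100 from 40 random measurements
rng = np.random.default_rng(0)
A = rng.standard_normal((40, 100))
s_true = np.zeros(100)
s_true[rng.choice(100, 5, replace=False)] = rng.standard_normal(5)
s_hat = sl0(A, A @ s_true)
print("support recovered:",
      np.allclose(np.sort(np.argsort(-np.abs(s_hat))[:5]),
                  np.sort(np.flatnonzero(s_true))))
```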

    High-level power optimisation for Digital Signal Processing in Reconfigurable Logic

    No full text
    This thesis is concerned with the optimisation of Digital Signal Processing (DSP) algorithm implementations on reconfigurable hardware via the selection of appropriate word-lengths for the signals in these algorithms, in order to minimise system power consumption. Whilst existing word-length optimisation work has concentrated on the minimisation of the area of algorithm implementations, this work introduces the first set of power consumption models that can be evaluated quickly enough to be used within the search of the enormous design space of multiple word-length optimisation problems. These models achieve their speed by estimating both the power consumed within the arithmetic components of an algorithm and the power in the routing wires that connect these components, using only a high-level description of the algorithm itself. Trading off a small reduction in power model accuracy for a large increase in speed is one of the major contributions of this thesis. In addition to the work on power consumption modelling, this thesis also develops a new technique for selecting the appropriate word-lengths for an algorithm implementation in order to minimise its cost in terms of power (or some other metric for which models are available). The method developed is able to provide tight lower and upper bounds on the optimal cost that can be obtained for a particular word-length optimisation problem and can, as a result, find provably near-optimal solutions to word-length optimisation problems without resorting to an NP-hard search of the design space. Finally, the costs of systems optimised via the proposed technique are compared to those obtainable by word-length optimisation for the minimisation of other metrics (such as logic area), providing greater insight into the nature of word-length optimisation problems and the extent of the improvements obtainable by them.
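
    As a toy illustration of the multiple word-length optimisation problem the thesis addresses, the sketch below chooses fraction-bit widths for a few hypothetical signals to minimise a crude cost proxy subject to an output-noise budget; the signal names, gains, quadratic cost model and exhaustive search are all assumptions, standing in for the thesis's high-level power models and bound-based search.

```python
# Toy multiple word-length optimisation: minimise a cost proxy subject to a
# quantisation-noise budget at the output.
import itertools

SIGNALS = ["x_in", "coeff_mul", "acc"]                          # hypothetical datapath signals
GAIN_TO_OUTPUT = {"x_in": 1.0, "coeff_mul": 0.8, "acc": 1.0}    # assumed noise gains to output
NOISE_BUDGET = 1e-6

def output_noise(widths):
    # uniform quantisation noise 2^(-2b)/12 injected at each signal,
    # scaled by the (assumed) gain from that point to the output
    return sum(GAIN_TO_OUTPUT[s] * 2.0 ** (-2 * b) / 12.0
               for s, b in zip(SIGNALS, widths))

def cost(widths):
    # crude area/power proxy: multiplier-like blocks grow roughly quadratically in width
    return sum(b * b for b in widths)

best = min((w for w in itertools.product(range(4, 17), repeat=len(SIGNALS))
            if output_noise(w) <= NOISE_BUDGET), key=cost)
print(dict(zip(SIGNALS, best)), "cost =", cost(best))
```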

    Designing Approximate Computing Circuits with Scalable and Systematic Data-Driven Techniques

    Get PDF
    Semiconductor feature size has been shrinking significantly in the past decades. This decreasing trend of feature size leads to faster processing speed as well as lower area and power consumption. Among these attributes, power consumption has emerged as the primary concern in the design of integrated circuits in recent years, due to the rapidly increasing demand for energy-efficient Internet of Things (IoT) devices. As a result, low-power design approaches for digital circuits have become very attractive in the past few years. To this end, approximate computing has emerged as a promising hardware design technique. It provides design opportunities to improve timing and energy efficiency by relaxing computing quality. This technique is feasible because of the error resiliency of many emerging resource-hungry computational applications such as multimedia processing and machine learning. Thus, it is reasonable to exploit this characteristic to trade an acceptable amount of computing quality for energy savings. In the literature, most prior works on approximate circuit design focus on manual design strategies to redesign fundamental computational blocks such as adders and multipliers. However, manual design techniques are not suitable for system-level hardware due to the much higher design complexity. In order to tackle this challenge, we focus on designing scalable, systematic and general design methodologies that are applicable to any circuit. In this work, we present two novel approximate circuit design methods based on machine learning techniques. Both methods skip the complicated manual analysis steps and primarily look at the given input-error pattern to generate approximate circuits. Our first work presents a framework, based on feature selection, for designing the compensation block, an essential component in many approximate circuits. Our second work further extends and optimizes this framework and integrates data-driven considerations into the design. Several case studies on fixed-width multipliers and other approximate circuits are presented to demonstrate the effectiveness of the proposed design methods. The experimental results show that both of the proposed methods are able to automatically and efficiently design low-error approximate circuits.
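
    The following Python sketch illustrates, in spirit only, the first framework: it fits a small compensation function from observed (input bits, truncation error) pairs after a simple correlation-based feature selection. The toy error source, the selection rule and the linear fit are assumptions, not the paper's actual procedure.

```python
# Data-driven compensation sketch: select the k input bits most correlated with
# the observed error and fit a linear correction by least squares.
import numpy as np

def truncation_error(a, b, n):
    """Error of dropping the n LSBs of an n x n product (toy error source)."""
    return (a * b) % (1 << n)

n, k = 8, 4
rng = np.random.default_rng(1)
a = rng.integers(0, 2 ** n, 4000)
b = rng.integers(0, 2 ** n, 4000)
bits = np.stack([(a >> i) & 1 for i in range(n)] +
                [(b >> i) & 1 for i in range(n)], axis=1).astype(float)
err = truncation_error(a, b, n).astype(float)

# feature selection: keep the k bits with the largest |correlation| with the error
corr = np.abs([np.corrcoef(bits[:, j], err)[0, 1] for j in range(bits.shape[1])])
sel = np.argsort(-corr)[:k]

# fit a linear compensation on the selected bits (least squares)
X = np.hstack([bits[:, sel], np.ones((len(err), 1))])
w, *_ = np.linalg.lstsq(X, err, rcond=None)
pred = X @ w

const_rms = float(np.sqrt(np.mean((err - err.mean()) ** 2)))   # constant correction only
sel_rms = float(np.sqrt(np.mean((err - pred) ** 2)))           # selected-bit correction
print(f"residual RMS error: constant correction {const_rms:.1f}, "
      f"selected-bit correction {sel_rms:.1f}")
```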

    Modeling, simulation and analysis of the reaction field for electrostatic interactions in aqueous solution

    Get PDF
    How to deal with long-range electrostatic interactions theoretically and computationally has been well studied, owing to their importance in biological processes and their time-consuming summations in computer simulations. The main focus of our research has been the design and application of a new type of hybrid model that combines the explicit and implicit solvent models using a reaction field (RF) approach, for accurate and efficient electrostatic calculations. This hybrid model, named the Image Charge Solvation Model (ICSM), replaces an infinite Coulomb summation by two finite sums: direct interactions plus image charges for the RF. To characterize the ICSM, the electrostatic torques and forces obtained with different model parameters are compared through various histogram distributions. The contributions of the RF are 20% and 2% of the total electrostatic torques and forces, respectively, suggesting that the main effect of the RF is to maintain the orientation of water dipoles in the solution. Considering the systematic artifacts of the discontinuous dielectric constant at the edge of the cavity, we modified the image charge formula in an optimal way to better account for the continuously changing dielectric profile near the boundary, which provides a computational procedure to determine the most accurate RF possible for a specified water model. The periodic boundary conditions (PBC) in the ICSM reduce the size of the productive region and introduce unphysical correlations between ions in ionic solution. By combining finite boundary conditions, mean-field theory for short-range forces and multiple constraint forces applied to water molecules in a buffer layer, bulk water properties are maintained, without problems from imaged ions, in a much larger usable region than before. To summarize, the results presented in this work provide a complete characterization, optimization and improvement of the ICSM for electrostatic calculations.
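
    For context, the sketch below shows the classical single-image (Kelvin/Friedman) approximation on which image-charge reaction-field methods build: a charge inside a spherical cavity acquires an image charge outside it. The ICSM itself uses an optimised multiple-image construction, so this is only the leading-order picture, and the numbers in the example are arbitrary.

```python
# Single-image (Kelvin/Friedman) approximation for a charge inside a spherical
# dielectric cavity of radius a, inner permittivity eps_in, outer eps_out.
def kelvin_image(q: float, s: float, a: float, eps_in: float, eps_out: float):
    """Return (image charge, image distance from the cavity centre) for a
    source charge q located a distance s < a from the centre."""
    q_im = -q * (eps_out - eps_in) / (eps_out + eps_in) * (a / s)
    r_im = a * a / s                      # image lies on the same radial line, outside the sphere
    return q_im, r_im

# example: a unit charge 0.4 nm off-centre in a 1 nm cavity, water-like exterior
q_im, r_im = kelvin_image(q=1.0, s=0.4, a=1.0, eps_in=1.0, eps_out=80.0)
print(f"image charge {q_im:+.3f} e at {r_im:.2f} nm from the centre")
```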

    Combining High- and Low-Level Electronic Structure Theories for the Efficient Exploration of Potential Energy Surfaces

    Get PDF
    The efficient exploration and characterization of potential energy surfaces paves the way for the theoretical elucidation of complex chemical processes. A potential energy surface arises from the application of the Born-Oppenheimer approximation when solving the Schrödinger equation for a molecular system. The extraction of energies and nuclear gradients from the Schrödinger equation is typically cost-prohibitive, which has inspired a plethora of approximations. In this thesis, we present the development of embedding and machine learning methodologies that provide fast and accurate energies and nuclear gradients for different chemical classes by combining high- and low-level electronic structure theories. If a chemical change occurs in a spatially localized region, embedding strategies offer an effective approach for balancing accuracy and computational cost. We first consider embedded mean-field theory (EMFT), which seamlessly combines different mean-field theories for different subsystems to describe the whole molecular system. We analyze the errors in EMFT calculations that occur when subsystems employ different atomic-orbital basis sets. These errors can be alleviated by a Fock-matrix correction scheme or by following general basis set recommendations. Systems exhibiting a more complicated electronic structure require a systematically improvable level of theory for the subsystems, which can be realized by projection-based embedding. Projection-based embedding enables the description of a small part of a molecular system at the level of a correlated wavefunction method while the remainder of the system is described at the mean-field level. We go on to derive and numerically demonstrate the analytical nuclear gradients for projection-based embedding. If description of the entire system at the high level of theory is deemed necessary, molecular-orbital-based machine learning (MOB-ML) offers a framework to predict accurate correlation energies at the cost of obtaining molecular orbitals. We go on to present the derivation, implementation, and numerical demonstration of MOB-ML analytical nuclear gradients. We demonstrate the developed methodologies by exploring potential energy surfaces of organic and transition-metal-containing molecules.
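
    As a schematic of the general idea of combining a high- and a low-level theory on a subsystem A, a generic subtractive combination can be written as below; this ONIOM-style expression is only an illustration, since the EMFT and projection-based embedding energies developed in the thesis are formulated at the level of density and Fock-matrix blocks rather than this simple difference.

```latex
% Generic subtractive combination of a high- and a low-level method on a
% subsystem A of a larger system (illustration only):
E_{\mathrm{total}} \approx E_{\mathrm{low}}(\mathrm{full\ system})
  + \bigl[\, E_{\mathrm{high}}(A) - E_{\mathrm{low}}(A) \,\bigr]
```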

    Design Exploration of an FPGA-Based Multivariate Gaussian Random Number Generator

    No full text
    Monte Carlo simulation is one of the most widely used techniques for computationally intensive simulations in a variety of applications, including mathematical analysis and modeling and statistical physics. A multivariate Gaussian random number generator (MVGRNG) is one of the main building blocks of such a system. Field Programmable Gate Arrays (FPGAs) are gaining increased popularity as an alternative to traditional general-purpose processors for accelerating the computationally expensive random number generator block, due to their fine-grain parallelism, reconfigurability and lower power consumption. As well as achieving hardware designs with high throughput, it is also desirable to produce designs with the flexibility to control the resource usage in order to meet given resource constraints. This work proposes a novel approach for mapping an MVGRNG onto an FPGA by optimizing the computational path in terms of hardware resource usage, subject to an acceptable error in the approximation of the distribution of interest. An analysis of the impact of the error due to truncation/rounding operations along the computational path is performed, and an analytical expression for the error inserted into the system is presented. The proposed algorithm is further extended by a novel methodology to map many multivariate Gaussian random number generators onto a single FPGA; the effective resource-sharing techniques introduced in this thesis allow a further reduction in hardware resource usage. MVGRNGs are used in a wide range of applications, especially financial applications involving many correlated assets. In this work it is demonstrated that the choice of the objective function employed for the hardware optimization of the MVGRNG core has a considerable impact on the final performance of the application of interest. Two of the most important financial applications, Value-at-Risk estimation and option pricing, are considered in this work.
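
    The standard construction underlying such a generator is sketched below in Python: i.i.d. standard normals are correlated through a lower-triangular factor of the target covariance, here quantised to fixed point to imitate the truncation/rounding error that the thesis analyses. The covariance matrix, word-length and decomposition choice are illustrative assumptions, not the proposed hardware mapping.

```python
# Multivariate Gaussian generation via a fixed-point-quantised Cholesky factor.
import numpy as np

def fixed_point(x: np.ndarray, frac_bits: int) -> np.ndarray:
    """Round to a signed fixed-point grid with the given number of fraction bits."""
    return np.round(x * 2 ** frac_bits) / 2 ** frac_bits

rng = np.random.default_rng(7)
target_cov = np.array([[1.0, 0.6, 0.3],
                       [0.6, 1.0, 0.5],
                       [0.3, 0.5, 1.0]])
L = np.linalg.cholesky(target_cov)          # target_cov = L @ L.T
L_q = fixed_point(L, frac_bits=6)           # coefficients stored with 6 fraction bits

u = rng.standard_normal((3, 200_000))       # i.i.d. N(0, 1) inputs
z = L_q @ u                                 # correlated outputs
emp_cov = np.cov(z)
print("max covariance error due to coefficient quantisation and sampling:",
      float(np.abs(emp_cov - target_cov).max()))
```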