Truncated Binary Multipliers with minimum Mean Square Error: analytical characterization, circuit implementation and applications
In the wireless multimedia world, DSP systems are ubiquitous. DSP algorithms are computationally intensive and test the limits of battery life in portable devices such as cell phones, hearing aids, MP3 players, digital video recorders and so on. Multiplication and squaring are the main operations in many signal processing algorithms (filtering, convolution, FFT, DCT, Euclidean distance, etc.), hence efficient parallel multipliers are desirable. A full-width digital n×n-bit multiplier computes the 2n-bit output as a weighted sum of partial products. A multiplier whose output is represented on n bits is useful, for example, in DSP datapaths that save the output in the same n-bit registers used for the inputs. Note that truncated multipliers are useful not only for DSP but also for digital, computationally intensive ASICs, where the bit-widths at the output of the arithmetic blocks are chosen on the basis of system-level accuracy requirements. Hence 2n bits of precision at the multiplier output are very often more than required. A truncated multiplier is an n×n multiplier with an n-bit output. Since in a truncated multiplier the n less-significant bits of the full-width product are discarded, some of the partial products are removed and replaced by a suitable compensation function, to trade off accuracy against hardware cost. Several techniques following this basic idea have been proposed in the literature; the various circuits differ in the choice and implementation of the compensation circuit. The correction techniques proposed in the literature are obtained through exhaustive search. This means that results are available only for small values of n, that the approaches do not extend to larger bit-widths, and that an analytical characterization of the error is not possible. In this dissertation an innovative solution for the design and characterization of truncated multipliers is presented.
The proposed circuits are based on the analytical calculation of the error of the truncated multiplier. This approach yields the description of a multiplier characterized by a minimum mean square error, which gives a fast and low-power VLSI implementation. Furthermore, the analytical approach yields closed-form expressions for the mean square error and the maximum absolute error of the proposed truncated multipliers, so that a priori knowledge of the output error is available. The errors are known for every bit-width of the multiplier, and it is also possible to decide, for a given bit-width, which correction circuit has to be used in order to obtain a certain error. This analytical relation between the error and the parameters of the hardware implementation is extremely important for the digital designer, who can now select the suitable implementation as a function of the desired accuracy. The proposed truncated multipliers outperform previously proposed ones, providing lower error, lower power dissipation, lower area occupation and higher working frequency. The circuits are also easily implemented and allow an automatic HDL description as a function of bit-width and desired error. The complete description of the errors of the truncated multipliers allows their use as building blocks for more complex systems. It is shown how the proposed multiplier can be used to design low-area FIR filters and an efficient PI temperature controller.
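The truncate-and-compensate scheme described above can be sketched in software. The single guard column (`k = 1`) and the constant rounding-style compensation below are illustrative assumptions, not the minimum-MSE correction derived in the dissertation.

```python
def truncated_mult(a, b, n, k=1):
    """Model of an n x n truncated multiplier with n-bit output.

    Partial-product bits a_i * b_j of weight 2**(i+j) are kept only when
    i + j >= n - k (k guard columns); the discarded low columns are
    replaced by a constant compensation term. The constant used here is
    an assumed rounding-style value, not the dissertation's minimum-MSE
    correction.
    """
    kept = 0
    for i in range(n):
        for j in range(n):
            if i + j >= n - k:
                kept += (((a >> i) & 1) * ((b >> j) & 1)) << (i + j)
    compensation = 1 << (n - 1)
    return (kept + compensation) >> n
```

An exhaustive sweep for small n shows the output stays within a few ULPs of the full-width product shifted right by n, which is the accuracy/hardware trade-off the abstract refers to.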
FPGA Implementation of Gaussian Mixture Model Algorithm for 47 fps Segmentation of 1080p Video
Circuits and systems able to process high-quality video in real time are fundamental in today's imaging systems. The circuit proposed in the paper, aimed at the robust identification of the background in video streams, implements the improved formulation of the Gaussian Mixture Model (GMM) algorithm that is included in the OpenCV library. An innovative, hardware-oriented formulation of the GMM equations, the use of truncated binary multipliers, and ROM compression techniques allow reduced hardware complexity and increased processing capability. The proposed circuit has been designed with commercial FPGA devices as the target and achieves speed and logic-resource occupation that surpass previously proposed implementations. The circuit, when implemented on a Virtex-6 or Stratix IV, processes more than 45 frames per second in 1080p format and uses only a few percent of the FPGA logic resources.
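The per-pixel GMM background model that such circuits implement can be sketched in software. The constants below (learning rate, match threshold, background weight threshold) and the scalar gray-level formulation are illustrative assumptions, not the values used in the paper or in OpenCV.

```python
import numpy as np

def gmm_update(pixel, weights, means, variances,
               alpha=0.01, match_thresh=2.5, bg_thresh=0.7):
    """One per-pixel update of a Gaussian mixture background model
    (Stauffer-Grimson style, in the spirit of OpenCV's MOG2).
    Operates in place on a scalar gray value; all constants are
    illustrative. Returns True if the pixel is classified as background.
    """
    d = np.abs(pixel - means) / np.sqrt(variances)
    k = int(np.argmin(d))
    if d[k] < match_thresh:                      # matched component k
        weights *= (1.0 - alpha)
        weights[k] += alpha
        rho = alpha                              # simplified update rate
        means[k] += rho * (pixel - means[k])
        variances[k] += rho * ((pixel - means[k]) ** 2 - variances[k])
        matched = True
    else:                                        # no match: replace weakest
        k = int(np.argmin(weights))
        weights[k], means[k], variances[k] = alpha, float(pixel), 900.0
        matched = False
    weights /= weights.sum()
    return bool(matched and weights[k] > bg_thresh)
```

Feeding a constant gray value for a few hundred frames makes the matched component's weight dominate, so the pixel is classified as background; a sudden new value then matches only a low-weight component and is flagged as foreground.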
A fast approach for overcomplete sparse decomposition based on smoothed L0 norm
In this paper, a fast algorithm for overcomplete sparse decomposition, called
SL0, is proposed. The algorithm is essentially a method for obtaining sparse
solutions of underdetermined systems of linear equations, and its applications
include underdetermined Sparse Component Analysis (SCA), atomic decomposition
on overcomplete dictionaries, compressed sensing, and decoding real field
codes. Contrary to previous methods, which usually solve this problem by
minimizing the L1 norm using Linear Programming (LP) techniques, our algorithm
tries to directly minimize the L0 norm. It is experimentally shown that the
proposed algorithm is about two to three orders of magnitude faster than the
state-of-the-art interior-point LP solvers, while providing the same (or
better) accuracy.
Comment: Accepted in IEEE Transactions on Signal Processing. For MATLAB code, see http://ee.sharif.ir/~SLzero. File replaced because Fig. 5 was erroneously missing.
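The SL0 iteration can be sketched as follows. The parameter defaults roughly follow the authors' published MATLAB code, but this is an illustrative reimplementation, not the reference one.

```python
import numpy as np

def sl0(A, x, sigma_min=1e-3, sigma_decrease=0.5, mu=2.0, L=3):
    """Sketch of SL0: maximize the smoothed sparsity measure
    F_sigma(s) = sum_i exp(-s_i**2 / sigma**2) on the solution set of
    A s = x, gradually shrinking sigma so F_sigma approaches the
    (negated) L0 norm. Defaults are illustrative.
    """
    A_pinv = np.linalg.pinv(A)
    s = A_pinv @ x                        # start from the min-L2-norm solution
    sigma = 2.0 * np.max(np.abs(s))
    while sigma > sigma_min:
        for _ in range(L):
            delta = s * np.exp(-s ** 2 / sigma ** 2)
            s = s - mu * delta            # gradient step on F_sigma
            s = s - A_pinv @ (A @ s - x)  # project back onto A s = x
        sigma *= sigma_decrease
    return s
```

On a well-conditioned underdetermined system with a sufficiently sparse ground truth, the iterate converges to the sparse solution rather than the dense minimum-norm one.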
High-level power optimisation for Digital Signal Processing in Reconfigurable Logic
This thesis is concerned with the optimisation of Digital Signal Processing (DSP) algorithm
implementations on reconfigurable hardware via the selection of appropriate word-lengths
for the signals in these algorithms, in order to minimise system power consumption. Whilst
existing word-length optimisation work has concentrated on the minimisation of the area of
algorithm implementations, this work introduces the first set of power consumption models
that can be evaluated quickly enough to be used within the search of the enormous design
space of multiple word-length optimisation problems. These models achieve their speed by
estimating both the power consumed within the arithmetic components of an algorithm
and the power in the routing wires that connect these components, using only a high-level
description of the algorithm itself. Trading off a small reduction in power model accuracy
for a large increase in speed is one of the major contributions of this thesis.
In addition to the work on power consumption modelling, this thesis also develops a
new technique for selecting the appropriate word-lengths for an algorithm implementation
in order to minimise its cost in terms of power (or some other metric for which models
are available). The method developed is able to provide tight lower and upper bounds on
the optimal cost that can be obtained for a particular word-length optimisation problem
and can, as a result, find provably near-optimal solutions to word-length optimisation
problems without resorting to an NP-hard search of the design space.
Finally, the costs of systems optimised via the proposed technique are compared to
those obtainable by word-length optimisation for the minimisation of other metrics (such as
logic area), providing greater insight into the nature of word-length
optimisation problems and the extent of the improvements obtainable by them.
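The word-length trade-off at the heart of the thesis rests on the standard quantization-noise model, sketched below. This is the usual textbook model, stated here as an assumed baseline; the thesis's actual power models are not reproduced.

```python
# Assumed textbook word-length noise model: rounding a signal to b
# fractional bits injects white noise of variance 2**(-2b) / 12, which
# reaches the output scaled by the squared L2 gain of the path from the
# rounding point to the output. Total output noise power is the sum of
# the per-signal contributions.
def output_noise_power(fractional_bits, path_l2_gains):
    return sum((2.0 ** (-2 * b)) / 12.0 * g * g
               for b, g in zip(fractional_bits, path_l2_gains))
```

Because each extra fractional bit quarters a signal's noise contribution, a word-length optimiser is effectively trading noise power against the area and power cost of wider arithmetic.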
Designing Approximate Computing Circuits with Scalable and Systematic Data-Driven Techniques
Semiconductor feature size has been shrinking significantly over the past decades. This decreasing trend in feature size leads to faster processing speed as well as lower area and power consumption. Among these attributes, power consumption has emerged as the primary concern in the design of integrated circuits in recent years, due to the rapidly increasing demand for energy-efficient Internet of Things (IoT) devices. As a result, low-power design approaches for digital circuits have attracted great interest in the past few years. To this end, approximate computing has emerged as a promising hardware design technique. It provides opportunities to improve timing and energy efficiency by relaxing computing quality. This technique is feasible because of the error resiliency of many emerging resource-hungry computational applications, such as multimedia processing and machine learning. Thus, it is reasonable to exploit this characteristic and trade an acceptable amount of computing quality for energy savings.
In the literature, most prior works on approximate circuit design focus on manual design strategies for redesigning fundamental computational blocks such as adders and multipliers. However, manual design techniques are not suitable for system-level hardware due to its much higher design complexity. To tackle this challenge, we focus on scalable, systematic and general design methodologies that are applicable to any circuit. In this work, we present two novel approximate circuit design methods based on machine learning techniques. Both methods skip the complicated manual analysis steps and primarily look at the given input-error pattern to generate approximate circuits. Our first work presents a framework, based on feature selection, for designing the compensation block, an essential component in many approximate circuits. Our second work extends and optimizes this framework and integrates data-driven considerations into the design. Several case studies on fixed-width multipliers and other approximate circuits demonstrate the effectiveness of the proposed design methods. The experimental results show that both of the proposed methods are able to automatically and efficiently design low-error approximate circuits.
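The data-driven flow can be illustrated with a toy experiment: sample multiplier inputs, record the exact truncation error, and fit a linear compensation function by least squares. The feature choice (bits of the most-significant discarded partial-product column) and the use of plain least squares are assumptions for illustration, not the paper's method.

```python
import numpy as np

def column_features(a, b, n):
    # Bits of the most-significant discarded partial-product column
    # (weight 2**(n-1)) -- an assumed, commonly used feature choice.
    return [((a >> i) & 1) & ((b >> (n - 1 - i)) & 1) for i in range(n)]

# Toy data-driven flow (a sketch of the general idea, not the paper's
# exact pipeline): sample inputs, compute the exact value lost when the
# low n product bits are dropped, and fit a linear compensator.
n = 8
rng = np.random.default_rng(1)
A = rng.integers(0, 1 << n, size=4000)
B = rng.integers(0, 1 << n, size=4000)
X = np.array([column_features(a, b, n) + [1] for a, b in zip(A, B)], float)
y = ((A * B) & ((1 << n) - 1)).astype(float)   # exact truncation error
w = np.linalg.lstsq(X, y, rcond=None)[0]       # learned compensation

pred = X @ w
mse_plain = np.mean(y ** 2)            # truncation with no compensation
mse_comp = np.mean((y - pred) ** 2)    # truncation with learned compensation
```

The learned linear compensation reduces the mean square truncation error relative to plain truncation, which is the quantity such data-driven methods optimize.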
Modeling, simulation and analysis of the reaction field for electrostatic interactions in aqueous solution
How to deal with long-range electrostatic interactions, both theoretically and computationally, has been well studied, owing to their importance in biological processes and to the time-consuming summations they require in computer simulations. The main focus of our research has been the design and application of a new type of hybrid model that combines the explicit and implicit solvent models using a reaction field (RF) approach, for accurate and efficient electrostatic calculations. This hybrid model, named the Image Charge Solvation Model (ICSM), replaces an infinite Coulomb summation by two finite sums over direct interactions plus image charges for the RF. To characterize the ICSM, the electrostatic torques and forces obtained with different model parameters are compared through various histogram distributions. The contributions of the RF are 20% and 2% of the total electrostatic torques and forces, respectively, suggesting that the main effect of the RF is to maintain the orientation of water dipoles in the solution. Considering the systematic artifacts of the discontinuous dielectric constant at the edge of the cavity, we modified the image charge formula in an optimal way to better account for the continuously changing dielectric profile near the boundary, which provides a computational procedure to determine the most accurate RF possible for a specified water model. The periodic boundary conditions (PBC) in the ICSM reduce the size of the productive region and introduce unphysical correlations between ions in ionic solution. By combining finite boundary conditions, mean-field theory for short-range forces and multiple constraint forces applied to water molecules in a buffer layer, bulk water properties are maintained, without problems from imaged ions, in a much bigger usable region than before. To summarize, the results presented in this work provide a complete characterization, optimization and improvement of the ICSM for electrostatic calculations.
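The image-charge construction that the ICSM builds on can be illustrated with Friedman's classic single-image approximation for a spherical cavity; the modified, optimized image formula developed in this work is not reproduced here.

```python
def friedman_image(q, s, a, eps_in=1.0, eps_out=80.0):
    """Friedman's single-image approximation for the reaction field of a
    point charge q at distance s (< a) from the centre of a spherical
    cavity of radius a, with dielectric eps_in inside and eps_out
    outside. Returns (image_charge, image_distance_from_centre). This is
    the classic approximation, not the ICSM's modified formula.
    """
    gamma = (eps_out - eps_in) / (eps_out + eps_in)
    return -gamma * q * (a / s), a * a / s
```

For water outside the cavity (eps_out of about 80), gamma is close to 1, so the image charge is nearly the Kelvin image of a conducting boundary; the RF at any interior point is then the Coulomb field of this image.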
Combining High- and Low-Level Electronic Structure Theories for the Efficient Exploration of Potential Energy Surfaces
The efficient exploration and characterization of potential energy surfaces paves the way for the theoretical elucidation of complex chemical processes. A potential energy surface arises from the application of the Born-Oppenheimer approximation when solving the Schrödinger equation for a molecular system. The extraction of energies and nuclear gradients from the Schrödinger equation is typically cost-prohibitive, which has inspired a plethora of approximations. In this thesis, we present the development of embedding and machine learning methodologies that provide fast and accurate energies and nuclear gradients for different chemical classes by combining high- and low-level electronic structure theories. If a chemical change occurs in a spatially localized region, embedding strategies offer an effective approach for balancing accuracy and computational cost. We first consider embedded mean-field theory (EMFT), which seamlessly combines different mean-field theories for different subsystems to describe the whole molecular system. We analyze the errors in EMFT calculations that occur when subsystems employ different atomic-orbital basis sets. These errors can be alleviated by a Fock-matrix correction scheme or by following general basis set recommendations. Systems exhibiting a more complicated electronic structure require a systematically improvable level of theory for the subsystems, which can be realized by projection-based embedding. Projection-based embedding enables the description of a small part of a molecular system at the level of a correlated wavefunction method while the remainder of the system is described at the mean-field level. We go on to derive and numerically demonstrate the analytical nuclear gradients for projection-based embedding. 
If a description of the entire system at the high level of theory is deemed necessary, molecular-orbital-based machine learning (MOB-ML) offers a framework to predict accurate correlation energies at the cost of obtaining molecular orbitals. We go on to present the derivation, implementation, and numerical demonstration of MOB-ML analytical nuclear gradients. We demonstrate the developed methodologies by exploring potential energy surfaces of organic and transition-metal-containing molecules.
Design Exploration of an FPGA-Based Multivariate Gaussian Random Number Generator
Monte Carlo simulation is one of the most widely used techniques for computationally intensive simulations in a variety of applications, including mathematical analysis, modeling and statistical physics. A multivariate Gaussian random number generator (MVGRNG) is one of the main building blocks of such a system. Field Programmable Gate Arrays (FPGAs) are gaining popularity as an alternative to traditional general-purpose processors for accelerating the computationally expensive random number generator block, thanks to their fine-grain parallelism, their reconfigurability and their lower power consumption.
As well as achieving hardware designs with high throughput, it is also desirable to produce designs with the flexibility to control resource usage in order to meet given resource constraints. This work proposes a novel approach for mapping an MVGRNG onto an FPGA by optimizing the computational path in terms of hardware resource usage, subject to an acceptable error in the approximation of the distribution of interest. An analysis of the impact of the error due to truncation/rounding operations along the computational path is performed, and an analytical expression for the error inserted into the system is presented.
An extra dimension is added to the proposed algorithm by introducing a novel methodology for mapping many multivariate Gaussian random number generators onto a single FPGA. The effective resource-sharing techniques introduced in this thesis allow a further reduction in hardware resource usage.
MVGRNGs are used in a wide range of applications, especially financial applications involving many correlated assets. In this work it is demonstrated that the choice of the objective function employed for the hardware optimization of the MVGRNG core has a considerable impact on the final performance of the application of interest. Two of the most important financial applications, Value-at-Risk estimation and option pricing, are considered in this work.
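The mathematical core of an MVGRNG is the lower-triangular (Cholesky) transform of independent Gaussian samples; the software sketch below models that computation, not the FPGA word-length optimization itself.

```python
import numpy as np

def mvgrng(cov, n_samples, rng=None):
    """Generate samples from N(0, cov) via Cholesky factorization:
    if z ~ N(0, I) and cov = L @ L.T, then L @ z ~ N(0, cov).
    A software model of the MVGRNG datapath, which in hardware maps the
    lower-triangular multiply onto DSP blocks.
    """
    rng = np.random.default_rng(rng)
    L = np.linalg.cholesky(cov)               # lower-triangular factor
    z = rng.standard_normal((n_samples, cov.shape[0]))
    return z @ L.T                            # each row is one sample
```

The empirical covariance of a large batch of samples converges to the target covariance, which is the property the hardware designs trade off against resource usage when the multiply is truncated or rounded.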