535 research outputs found

    High throughput spatial convolution filters on FPGAs

    Get PDF
    Digital signal processing (DSP) on field- programmable gate arrays (FPGAs) has long been appealing because of the inherent parallelism in these computations that can be easily exploited to accelerate such algorithms. FPGAs have evolved significantly to further enhance the mapping of these algorithms, included additional hard blocks, such as the DSP blocks found in modern FPGAs. Although these DSP blocks can offer more efficient mapping of DSP computations, they are primarily designed for 1-D filter structures. We present a study on spatial convolutional filter implementations on FPGAs, optimizing around the structure of the DSP blocks to offer high throughput while maintaining the coefficient flexibility that other published architectures usually sacrifice. We show that it is possible to implement large filters for large 4K resolution image frames at frame rates of 30–60 FPS, while maintaining functional flexibility

    A Survey on Approximate Multiplier Designs for Energy Efficiency: From Algorithms to Circuits

    Full text link
    Given the stringent requirements of energy efficiency for Internet-of-Things edge devices, approximate multipliers, as a basic component of many processors and accelerators, have been constantly proposed and studied for decades, especially in error-resilient applications. The computation error and energy efficiency largely depend on how and where the approximation is introduced into a design. Thus, this article aims to provide a comprehensive review of the approximation techniques in multiplier designs ranging from algorithms and architectures to circuits. We have implemented representative approximate multiplier designs in each category to understand the impact of the design techniques on accuracy and efficiency. The designs can then be effectively deployed in high-level applications, such as machine learning, to gain energy efficiency at the cost of slight accuracy loss.Comment: 38 pages, 37 figure

    VLSI Design

    Get PDF
    This book provides some recent advances in design nanometer VLSI chips. The selected topics try to present some open problems and challenges with important topics ranging from design tools, new post-silicon devices, GPU-based parallel computing, emerging 3D integration, and antenna design. The book consists of two parts, with chapters such as: VLSI design for multi-sensor smart systems on a chip, Three-dimensional integrated circuits design for thousand-core processors, Parallel symbolic analysis of large analog circuits on GPU platforms, Algorithms for CAD tools VLSI design, A multilevel memetic algorithm for large SAT-encoded problems, etc

    A Study on Efficient Designs of Approximate Arithmetic Circuits

    Get PDF
    Approximate computing is a popular field where accuracy is traded with energy. It can benefit applications such as multimedia, mobile computing and machine learning which are inherently error resilient. Error introduced in these applications to a certain degree is beyond human perception. This flexibility can be exploited to design area, delay and power efficient architectures. However, care must be taken on how approximation compromises the correctness of results. This research work aims to provide approximate hardware architectures with error metrics and design metrics analyzed and their effects in image processing applications. Firstly, we study and propose unsigned array multipliers based on probability statistics and with approximate 4-2 compressors, full adders and half adders. This work deals with a new design approach for approximation of multipliers. The partial products of the multiplier are altered to introduce varying probability terms. Logic complexity of approximation is varied for the accumulation of altered partial products based on their probability. The proposed approximation is utilized in two variants of 16-bit multipliers. Synthesis results reveal that two proposed multipliers achieve power savings of 72% and 38% respectively compared to an exact multiplier. They have better precision when compared to existing approximate multipliers. Mean relative error distance (MRED) figures are as low as 7.6% and 0.02% for the proposed approximate multipliers, which are better than the previous state-of-the-art works. Performance of the proposed multipliers is evaluated with geometric mean filtering application, where one of the proposed models achieves the highest peak signal to noise ratio (PSNR). Second, approximation is proposed for signed Booth multiplication. Approximation is introduced in partial product generation and partial product accumulation circuits. In this work, three multipliers (ABM-M1, ABM-M2, and ABM-M3) are proposed in which the modified Booth algorithm is approximated. In all three designs, approximate Booth partial product generators are designed with different variations of approximation. The approximations are performed by reducing the logic complexity of the Booth partial product generator, and the accumulation of partial products is slightly modified to improve circuit performance. Compared to the exact Booth multiplier, ABM-M1 achieves up to 15% reduction in power consumption with an MRED value of 7.9 Ă— 10-4. ABM-M2 has power savings of up to 60% with an MRED of 1.1 Ă— 10-1. ABM-M3 has power savings of up to 50% with an MRED of 3.4 Ă— 10-3. Compared to existing approximate Booth multipliers, the proposed multipliers ABM-M1 and ABM-M3 achieve up to a 41% reduction in power consumption while exhibiting very similar error metrics. Image multiplication and matrix multiplication are used as case studies to illustrate the high performance of the proposed approximate multipliers. Third, distributed arithmetic based sum of products units approximation is analyzed. Sum of products units are key elements in many digital signal processing applications. Three approximate sum of products models which are based on distributed arithmetic are proposed. They are designed for different levels of accuracy. First model of approximate sum of products achieves an improvement up to 64% on area and 70% on power, when compared to conventional unit. Other two models provide an improvement of 32% and 48% on area and 54% and 58% on power, respectively, with a reduced error rate compared to the first model. Third model achieves MRED and normalized mean error distance (NMED) as low as 0.05% and 0.009%. Performance of approximate units is evaluated with a noisy image smoothing application, where the proposed models are capable of achieving higher PSNR than existing state of the art techniques. Fourth, approximation is applied in division architecture. Two approximation models are proposed for restoring divider. In the first design, approximation is performed at circuit level, where approximate divider cells are utilized in place of exact ones by simplifying the logic equations. In the second model, restoring divider is analyzed strategically and number of restoring divider cells are reduced by finding the portions of divisor and dividend with significant information. An approximation factor pp is used in both designs. In model 1, the design with p=8 has a 58% reduction in both area and power consumption compared to exact design, with a Q-MRED of 1.909 Ă— 10-2 and Q-NMED of 0.449 Ă— 10-2. The second model with an approximation factor p=4 has 54% area savings and 62% power savings compared to exact design. The proposed models are found to have better error metrics compared to existing designs, with better performance at similar error values. A change detection image processing application is used for real time assessment of proposed and existing approximate dividers and one of the models achieves a PSNR of 54.27 dB

    Improving the Hardware Performance of Arithmetic Circuits using Approximate Computing

    Get PDF
    An application that can produce a useful result despite some level of computational error is said to be error resilient. Approximate computing can be applied to error resilient applications by intentionally introducing error to the computation in order to improve performance, and it has been shown that approximation is especially well-suited for application in arithmetic computing hardware. In this thesis, novel approximate arithmetic architectures are proposed for three different operations, namely multiplication, division, and the multiply accumulate (MAC) operation. For all designs, accuracy is evaluated in terms of mean relative error distance (MRED) and normalized mean error distance (NMED), while hardware performance is reported in terms of critical path delay, area, and power consumption. Three approximate Booth multipliers (ABM-M1, ABM-M2, ABM-M3) are designed in which two novel inexact partial product generators are used to reduce the dimensions of the partial product matrix. The proposed multipliers are compared to other state-of-the-art designs in terms of both accuracy and hardware performance, and are found to reduce power consumption by up to 56% when compared to the exact multiplier. The function of the multipliers is verified in several image processing applications. Two approximate restoring dividers (AXRD-M1, AXRD-M2) are proposed along with a novel inexact restoring divider cell. In the first divider, the conventional cells are replaced with the proposed inexact cells in several columns. The second divider computes only a subset of the trial subtractions, after which the divisor and partial remainder are rounded and encoded so that they may be used to estimate the remaining quotient bits. The proposed dividers are evaluated for accuracy and hardware performance alongside several benchmarking designs, and their function is verified using change detection and foreground extraction applications. An approximate MAC unit is presented in which the multiplication is implemented using a modified version of ABM-M3. The delay is reduced by using a fused architecture where the accumulator is summed as part of the multiplier compression. The accuracy and hardware savings of the MAC unit are measured against several works from the literature, and the design is utilized in a number of convolution operations

    NASA Space Engineering Research Center for VLSI System Design

    Get PDF
    This annual report outlines the activities of the past year at the NASA SERC on VLSI Design. Highlights for this year include the following: a significant breakthrough was achieved in utilizing commercial IC foundries for producing flight electronics; the first two flight qualified chips were designed, fabricated, and tested and are now being delivered into NASA flight systems; and a new technology transfer mechanism has been established to transfer VLSI advances into NASA and commercial systems
    • …
    corecore