
    AxOMaP: Designing FPGA-based Approximate Arithmetic Operators using Mathematical Programming

    With the increasing application of machine learning (ML) algorithms in embedded systems, there is a rising necessity to design low-cost computer arithmetic for these resource-constrained systems. As a result, emerging models of computation, such as approximate and stochastic computing, that leverage the inherent error-resilience of such algorithms are being actively explored for implementing ML inference on resource-constrained systems. Approximate computing (AxC) aims to provide disproportionate gains in the power, performance, and area (PPA) of an application by allowing some level of reduction in its behavioral accuracy (BEHAV). Using approximate operators (AxOs) for computer arithmetic is one of the more prevalent methods of implementing AxC. AxOs provide additional scope for finer-grained optimization compared to precision scaling of computer arithmetic alone. To this end, designing platform-specific and cost-efficient approximate operators forms an important research goal. Recently, multiple works have reported using AI/ML-based approaches for synthesizing novel FPGA-based AxOs. However, most such works limit the use of AI/ML to designing ML-based surrogate functions used during iterative optimization processes. In contrast, we propose a novel data analysis-driven, mathematical programming-based approach to synthesizing approximate operators for FPGAs. Specifically, we formulate mixed-integer quadratically constrained programs based on the results of correlation analysis of the characterization data and use the solutions to enable a more directed search for evolutionary optimization algorithms. Compared to traditional evolutionary-algorithm-based optimization, we report up to 21% improvement in the hypervolume, for joint optimization of PPA and BEHAV, in the design of signed 8-bit multipliers.
    Comment: 23 pages, Under review at ACM TRET
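    The hypervolume indicator referred to above measures the region of objective space dominated by a Pareto front relative to a reference point, so a larger hypervolume means a better joint PPA/BEHAV trade-off. As a minimal, self-contained sketch (not the paper's implementation; the two-objective simplification, reference point, and objective values below are illustrative assumptions), a 2-D hypervolume for a minimization front can be computed as follows:

        def hypervolume_2d(front, ref):
            """Hypervolume of a two-objective minimization front w.r.t. reference point ref.

            front: iterable of (f1, f2) points, e.g. (normalized PPA cost, BEHAV error);
            ref must be worse than every point in both objectives.
            """
            pts = list(front)
            # Keep only non-dominated points (smaller is better in both objectives).
            nd = [p for p in pts
                  if not any(q != p and q[0] <= p[0] and q[1] <= p[1] for q in pts)]
            nd.sort()  # ascending in f1, hence descending in f2
            hv, prev_f2 = 0.0, ref[1]
            for f1, f2 in nd:
                hv += (ref[0] - f1) * (prev_f2 - f2)  # add one rectangular slab
                prev_f2 = f2
            return hv

        # Hypothetical (made-up) fronts: baseline evolutionary search vs. directed search.
        baseline = [(0.90, 0.20), (0.70, 0.35), (0.50, 0.60)]
        directed = [(0.85, 0.15), (0.60, 0.30), (0.40, 0.55)]
        ref = (1.0, 1.0)
        improvement = hypervolume_2d(directed, ref) / hypervolume_2d(baseline, ref) - 1.0
        print(f"hypervolume improvement: {improvement:.1%}")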

    Performance comparison of single-precision SPICE Model-Evaluation on FPGA, GPU, Cell, and multi-core processors

    Automated code generation and performance tuning techniques for concurrent architectures such as GPUs, Cell, and FPGAs can provide integer-factor speedups over multi-core processor organizations for data-parallel, floating-point computation in SPICE model-evaluation. Our Verilog-AMS compiler produces code for parallel evaluation of non-linear circuit models suitable for use in SPICE simulations, where the same model is evaluated several times for all the devices in the circuit. Our compiler uses architecture-specific parallelization strategies (OpenMP for multi-core, PThreads for Cell, CUDA for GPU, statically scheduled VLIW for FPGA) when producing code for these different architectures. We automatically explore different implementation configurations (e.g. unroll factor, vector length) using our performance tuner to identify the best possible configuration for each architecture. We demonstrate speedups of 3-182× for a Xilinx Virtex-5 LX330T, 1.3-33× for an IBM Cell, and 3-131× for an NVIDIA 9600 GT GPU over a 3 GHz Intel Xeon 5160 implementation for a variety of single-precision device models.
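    The parallelism exploited here comes from SPICE model evaluation being data-parallel: the same non-linear device equations are evaluated independently for every device instance in the circuit. The sketch below illustrates that pattern with a Shockley diode model vectorized over many instances; it is only an illustration of the kind of computation being parallelized (the model, parameter values, and device count are placeholders, not output of the Verilog-AMS compiler described above):

        import numpy as np

        # Shockley diode equation as a stand-in for a SPICE device model.
        def diode_current(v, i_s=1e-14, n=1.0, v_t=0.025852):
            return i_s * (np.exp(v / (n * v_t)) - 1.0)

        def diode_conductance(v, i_s=1e-14, n=1.0, v_t=0.025852):
            # dI/dV of the same model, needed to stamp the linearized device
            # into the circuit matrix at each Newton iteration.
            return (i_s / (n * v_t)) * np.exp(v / (n * v_t))

        # One evaluation pass over 100,000 hypothetical diode instances: the same
        # function applied to every device, i.e. the data-parallel kernel that
        # OpenMP/PThreads/CUDA/VLIW targets would execute concurrently.
        voltages = np.random.uniform(0.3, 0.7, size=100_000).astype(np.float32)
        currents = diode_current(voltages)
        conductances = diode_conductance(voltages)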

    Object-oriented domain specific compilers for programming FPGAs

    ASC: A stream compiler for computing with FPGAs

    Characterization and Implementation of a Real-World Target Tracking Algorithm on Field Programmable Gate Arrays with Kalman Filter Test Case

    A one-dimensional Kalman filter algorithm provided in MATLAB is used as the basis for a Very High Speed Integrated Circuit Hardware Description Language (VHDL) model. The Java programming language is used to create the VHDL code that describes the Kalman filter in hardware, which allows for maximum flexibility. A one-dimensional behavioral model of the Kalman filter is described, as well as a one-dimensional, synthesizable register transfer level (RTL) model with optimizations for speed, area, and power. These optimizations are achieved through a focus on parallelization as well as careful selection of the Kalman filter's sub-module algorithms. Newton-Raphson reciprocal is the algorithm chosen for the division operations at the core of the Kalman filter, allowing efficient high-speed computation of reciprocals within the overall system. The Newton-Raphson method is also expanded for use in calculating square roots in an optimized, synthesizable two-dimensional VHDL implementation of the Kalman filter. The two-dimensional Kalman filter expands on the one-dimensional implementation, allowing for the tracking of targets in a real-world Cartesian coordinate system.
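    The Newton-Raphson arithmetic mentioned above can be sketched numerically. The snippet below shows the Newton-Raphson reciprocal iteration and the common division-free inverse-square-root variant for obtaining square roots with multiplies and adds only; the initial guesses and iteration counts are illustrative assumptions, not values from the paper's VHDL implementation (where the seed would typically come from a small lookup table):

        def nr_reciprocal(a, x0, iterations=4):
            """Approximate 1/a by Newton-Raphson using only multiply and subtract."""
            x = x0  # rough seed; must lie in (0, 2/a) for convergence
            for _ in range(iterations):
                x = x * (2.0 - a * x)  # error roughly squares every iteration
            return x

        def nr_sqrt(a, y0, iterations=4):
            """Approximate sqrt(a) via Newton-Raphson on the inverse square root."""
            y = y0  # rough seed for 1/sqrt(a)
            for _ in range(iterations):
                y = y * (1.5 - 0.5 * a * y * y)
            return a * y  # sqrt(a) = a * (1/sqrt(a))

        print(nr_reciprocal(3.0, 0.3))  # ~0.333333
        print(nr_sqrt(2.0, 0.7))        # ~1.414214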

    Wordlength optimization for linear digital signal processing
