17,213 research outputs found

    High throughput spatial convolution filters on FPGAs

    Get PDF
    Digital signal processing (DSP) on field- programmable gate arrays (FPGAs) has long been appealing because of the inherent parallelism in these computations that can be easily exploited to accelerate such algorithms. FPGAs have evolved significantly to further enhance the mapping of these algorithms, included additional hard blocks, such as the DSP blocks found in modern FPGAs. Although these DSP blocks can offer more efficient mapping of DSP computations, they are primarily designed for 1-D filter structures. We present a study on spatial convolutional filter implementations on FPGAs, optimizing around the structure of the DSP blocks to offer high throughput while maintaining the coefficient flexibility that other published architectures usually sacrifice. We show that it is possible to implement large filters for large 4K resolution image frames at frame rates of 30–60 FPS, while maintaining functional flexibility

    A high throughput adaptive DFE for HIPERLAN

    Get PDF

    Image Processing using Approximate Data-path Units

    Get PDF
    abstract: In this work, we present approximate adders and multipliers to reduce data-path complexity of specialized hardware for various image processing systems. These approximate circuits have a lower area, latency and power consumption compared to their accurate counterparts and produce fairly accurate results. We build upon the work on approximate adders and multipliers presented in [23] and [24]. First, we show how choice of algorithm and parallel adder design can be used to implement 2D Discrete Cosine Transform (DCT) algorithm with good performance but low area. Our implementation of the 2D DCT has comparable PSNR performance with respect to the algorithm presented in [23] with ~35-50% reduction in area. Next, we use the approximate 2x2 multiplier presented in [24] to implement parallel approximate multipliers. We demonstrate that if some of the 2x2 multipliers in the design of the parallel multiplier are accurate, the accuracy of the multiplier improves significantly, especially when two large numbers are multiplied. We choose Gaussian FIR Filter and Fast Fourier Transform (FFT) algorithms to illustrate the efficacy of our proposed approximate multiplier. We show that application of the proposed approximate multiplier improves the PSNR performance of 32x32 FFT implementation by 4.7 dB compared to the implementation using the approximate multiplier described in [24]. We also implement a state-of-the-art image enlargement algorithm, namely Segment Adaptive Gradient Angle (SAGA) [29], in hardware. The algorithm is mapped to pipelined hardware blocks and we synthesized the design using 90 nm technology. We show that a 64x64 image can be processed in 496.48 µs when clocked at 100 MHz. The average PSNR performance of our implementation using accurate parallel adders and multipliers is 31.33 dB and that using approximate parallel adders and multipliers is 30.86 dB, when evaluated against the original image. The PSNR performance of both designs is comparable to the performance of the double precision floating point MATLAB implementation of the algorithm.Dissertation/ThesisM.S. Computer Science 201

    A Scalable Correlator Architecture Based on Modular FPGA Hardware, Reuseable Gateware, and Data Packetization

    Full text link
    A new generation of radio telescopes is achieving unprecedented levels of sensitivity and resolution, as well as increased agility and field-of-view, by employing high-performance digital signal processing hardware to phase and correlate large numbers of antennas. The computational demands of these imaging systems scale in proportion to BMN^2, where B is the signal bandwidth, M is the number of independent beams, and N is the number of antennas. The specifications of many new arrays lead to demands in excess of tens of PetaOps per second. To meet this challenge, we have developed a general purpose correlator architecture using standard 10-Gbit Ethernet switches to pass data between flexible hardware modules containing Field Programmable Gate Array (FPGA) chips. These chips are programmed using open-source signal processing libraries we have developed to be flexible, scalable, and chip-independent. This work reduces the time and cost of implementing a wide range of signal processing systems, with correlators foremost among them,and facilitates upgrading to new generations of processing technology. We present several correlator deployments, including a 16-antenna, 200-MHz bandwidth, 4-bit, full Stokes parameter application deployed on the Precision Array for Probing the Epoch of Reionization.Comment: Accepted to Publications of the Astronomy Society of the Pacific. 31 pages. v2: corrected typo, v3: corrected Fig. 1

    A low-complexity feed-forward I/Q imbalance compensation algorithm

    Get PDF
    This paper presents a low-complexity adaptive feed- forward I/Q imbalance compensation algorithm. The feed-forward so- lution has guaranteed stability. Due to its blind nature the algorithm is easily incorporated into an existing receiver design. The algorithm uses three estimators to obtain the necessary parameters for the I/Q imbal- ance compensation structure. The algorithm complexity is low due to 1-bit quantization in the estimators. Simulations show that the compen- sation algorithm is able to attain an image-rejection ratio (IRR) of up to 65 [dB] under various imbalance conditions

    A wideband linear tunable CDTA and its application in field programmable analogue array

    Get PDF
    This document is the Accepted Manuscript version of the following article: Hu, Z., Wang, C., Sun, J. et al. ‘A wideband linear tunable CDTA and its application in field programmable analogue array’, Analog Integrated Circuits and Signal Processing, Vol. 88 (3): 465-483, September 2016. Under embargo. Embargo end date: 6 June 2017. The final publication is available at Springer via https://link.springer.com/article/10.1007%2Fs10470-016-0772-7 © Springer Science+Business Media New York 2016In this paper, a NMOS-based wideband low power and linear tunable transconductance current differencing transconductance amplifier (CDTA) is presented. Based on the NMOS CDTA, a novel simple and easily reconfigurable configurable analogue block (CAB) is designed. Moreover, using the novel CAB, a simple and versatile butterfly-shaped FPAA structure is introduced. The FPAA consists of six identical CABs, and it could realize six order current-mode low pass filter, second order current-mode universal filter, current-mode quadrature oscillator, current-mode multi-phase oscillator and current-mode multiplier for analog signal processing. The Cadence IC Design Tools 5.1.41 post-layout simulation and measurement results are included to confirm the theory.Peer reviewedFinal Accepted Versio
    corecore