High throughput spatial convolution filters on FPGAs
Digital signal processing (DSP) on field-programmable gate arrays (FPGAs) has long been appealing because the inherent parallelism in these computations can be easily exploited to accelerate such algorithms. FPGAs have evolved significantly to further enhance the mapping of these algorithms, including additional hard blocks, such as the DSP blocks found in modern FPGAs. Although these DSP blocks can offer more efficient mapping of DSP computations, they are primarily designed for 1-D filter structures. We present a study on spatial convolutional filter implementations on FPGAs, optimizing around the structure of the DSP blocks to offer high throughput while maintaining the coefficient flexibility that other published architectures usually sacrifice. We show that it is possible to implement large filters for large 4K resolution image frames at frame rates of 30–60 FPS, while maintaining functional flexibility.
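The abstract's point that DSP blocks target 1-D filter structures can be illustrated in software. The following is a minimal sketch, not the authors' architecture: it builds a 2-D spatial filter by summing one 1-D FIR per kernel row, and checks the pixel rate a 4K stream implies. The function names are hypothetical, "4K" is assumed to mean 3840x2160, and the filter is written in correlation form (no kernel flip).

```python
def fir_1d(row, taps):
    """Valid-mode 1-D FIR in correlation form: out[x] = sum_k taps[k] * row[x + k]."""
    n = len(row) - len(taps) + 1
    return [sum(t * row[x + k] for k, t in enumerate(taps)) for x in range(n)]


def filter_2d_as_rows(image, kernel):
    """2-D filter built from 1-D filters: one FIR per kernel row, then a column-wise sum."""
    kh = len(kernel)
    out = []
    for y in range(len(image) - kh + 1):
        # Each kernel row filters the matching image row independently.
        partial = [fir_1d(image[y + r], kernel[r]) for r in range(kh)]
        out.append([sum(col) for col in zip(*partial)])
    return out


def filter_2d_direct(image, kernel):
    """Reference 2-D filter (same correlation form), used only to check the row version."""
    kh, kw = len(kernel), len(kernel[0])
    return [
        [
            sum(kernel[r][c] * image[y + r][x + c] for r in range(kh) for c in range(kw))
            for x in range(len(image[0]) - kw + 1)
        ]
        for y in range(len(image) - kh + 1)
    ]


def pixel_rate(width, height, fps):
    """Pixels per second that must be produced to sustain a given frame rate."""
    return width * height * fps
```

Under the 3840x2160 assumption, `pixel_rate(3840, 2160, 60)` gives 497,664,000 pixels per second, which is the sustained output rate a 60 FPS design would need.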
Efficient hardware implementations of low bit depth motion estimation algorithms
In this paper, we present efficient hardware implementations of multiplication-free one-bit transform (MF1BT) based and constrained one-bit transform (C-1BT) based motion estimation (ME) algorithms, in order to provide low bit-depth representation based full search block ME hardware for real-time video encoding. We used a source pixel based linear array (SPBLA) hardware architecture for low bit-depth ME for the first time in the literature. The proposed SPBLA based implementation results in a genuine data flow scheme which significantly reduces the number of data reads from the current block memory, which in turn reduces the power consumption by at least 50% compared to the conventional 1BT based ME hardware architectures presented in the literature. Because of the binary nature of low bit-depth ME algorithms, their hardware architectures are more efficient than existing 8 bits/pixel representation based ME architectures.
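The efficiency claim about binary representations can be made concrete with a small sketch comparing the two matching costs. This is an illustration only: the helper names are hypothetical, and a fixed threshold stands in for the filtered reference image that the one-bit transform variants actually compare against. With one bit per pixel, the block difference reduces to XOR plus a bit count instead of 8-bit subtractions and absolute values.

```python
def binarize(block, threshold):
    """Pack each block row into an int, one bit per pixel (1 if pixel >= threshold).

    The fixed threshold is a stand-in assumption for the per-pixel filtered value
    used by one-bit transform algorithms.
    """
    rows = []
    for row in block:
        bits = 0
        for pixel in row:
            bits = (bits << 1) | (1 if pixel >= threshold else 0)
        rows.append(bits)
    return rows


def nnmp(bits_a, bits_b):
    """Number of non-matching points: XOR the packed rows and count the set bits."""
    return sum(bin(a ^ b).count("1") for a, b in zip(bits_a, bits_b))


def sad(block_a, block_b):
    """Sum of absolute differences on full 8 bits/pixel data, for comparison."""
    return sum(abs(a - b) for ra, rb in zip(block_a, block_b) for a, b in zip(ra, rb))
```

For example, with `cur = [[10, 200], [220, 30]]`, `ref = [[12, 190], [40, 35]]` and a threshold of 128, the binary cost is 1 mismatching pixel while the 8-bit SAD is 197; both identify the same large difference at the lower-left pixel.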
Implementation of the Trigonometric LMS Algorithm using Original Cordic Rotation
The LMS algorithm is one of the most successful adaptive filtering
algorithms. It uses the instantaneous value of the square of the error signal
as an estimate of the mean-square error (MSE). The LMS algorithm changes
(adapts) the filter tap weights so that the error signal is minimized in the
mean square sense. Trigonometric LMS (TLMS) and Hyperbolic LMS (HLMS), two
new versions of the LMS algorithm, use the same formulation as the LMS
algorithm, except that the filter tap weights are expressed using
trigonometric and hyperbolic formulations for TLMS and HLMS,
respectively. This is where the CORDIC algorithm comes in, as it can
efficiently compute trigonometric, hyperbolic, linear and logarithmic functions. While
hardware-efficient algorithms often exist, the dominance of the software
systems has kept those algorithms out of the spotlight. Among these
hardware-efficient algorithms, CORDIC is an iterative solution for trigonometric and
other transcendental functions. Earlier research used the CORDIC algorithm to
observe the convergence behavior of the Trigonometric LMS (TLMS) algorithm and
obtained satisfactory convergence performance. However, that work used the
CORDIC block output directly in its simulations, ignoring the internal
step-by-step rotations of the CORDIC processor. The convergence performance of
the TLMS algorithm therefore needs to be verified when it is implemented with
step-by-step CORDIC rotations. This research work does so. It focuses on the
internal operations of the CORDIC hardware and implements the Trigonometric LMS
(TLMS) and Hyperbolic LMS (HLMS) algorithms using actual CORDIC rotations. The
simulation results are highly satisfactory and also show that the convergence
behavior of HLMS is much better than that of TLMS.
Comment: 12 pages, 5 figures, 1 table. Published in IJCNC;
http://airccse.org/journal/cnc/0710ijcnc08.pdf,
http://airccse.org/journal/ijc2010.htm
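The two building blocks this abstract combines can be sketched separately. The code below is a minimal illustration under stated assumptions, not the paper's TLMS or HLMS implementation: the abstract does not spell out how the tap weights are mapped to trigonometric or hyperbolic functions, so only textbook rotation-mode CORDIC (with its step-by-step micro-rotations) and the standard LMS update are shown. Function names, the iteration count and the step size `mu` are hypothetical choices.

```python
import math


def cordic_rotate(angle, iterations=16):
    """Return (cos(angle), sin(angle)) using rotation-mode CORDIC micro-rotations.

    Converges for |angle| up to roughly 1.74 rad. Floating point is used for
    clarity; hardware would use fixed-point shifts and adds.
    """
    # Table of elementary angles atan(2^-i), as a hardware lookup table would hold.
    table = [math.atan(2.0 ** -i) for i in range(iterations)]
    # Pre-compensate the known CORDIC gain so the result has unit magnitude.
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    x, y, z = 1.0 / gain, 0.0, angle
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0  # rotate toward zero residual angle
        shift = 2.0 ** -i
        x, y = x - d * y * shift, y + d * x * shift
        z -= d * table[i]
    return x, y


def lms(inputs, desired, num_taps, mu):
    """Standard LMS: adapt tap weights to minimize the instantaneous squared error."""
    weights = [0.0] * num_taps
    errors = []
    for n in range(num_taps - 1, len(inputs)):
        window = [inputs[n - k] for k in range(num_taps)]  # x[n], x[n-1], ...
        output = sum(w * x for w, x in zip(weights, window))
        error = desired[n] - output
        # Some texts write this update with 2*mu; the factor is absorbed into mu here.
        weights = [w + mu * error * x for w, x in zip(weights, window)]
        errors.append(error)
    return weights, errors
```

A convergence study of the kind the abstract describes would replace direct function evaluation inside the weight computation with the iterated rotations in `cordic_rotate`, so that the finite number of micro-rotations affects the adaptation.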
A high performance hardware architecture for one bit transform based motion estimation
Motion Estimation (ME) is the most computationally intensive part of video compression and video enhancement systems. One bit transform (1BT) based ME algorithms have low computational complexity. Therefore, in this paper, we propose a high performance systolic hardware architecture for 1BT based ME. The proposed hardware performs full search ME for 4 Macroblocks in parallel and it is the fastest 1BT based ME hardware reported in the literature. In addition, it uses less on-chip memory than the previous 1BT based ME hardware by using a novel data reuse scheme and memory organization. The proposed hardware is implemented in Verilog HDL. It consumes 34% of the slices in a Xilinx XC2VP30-7 FPGA. It works at 115 MHz in the same FPGA and is capable of processing 50 1920x1080 full High Definition frames per second. Therefore, it can be used in consumer electronics products that require real-time video processing or compression.
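Full search ME on one-bit data, as described in this abstract, can be sketched in software as follows. This is a sequential illustration with hypothetical names, not the systolic architecture of the paper: it assumes the frames have already been reduced to one bit per pixel and evaluates every displacement in a square search window, keeping the one with the fewest mismatching bits.

```python
def mismatch_count(cur_bits, ref_bits, bx, by, dx, dy, size):
    """Count differing bits between the current block and a displaced reference block."""
    return sum(
        cur_bits[by + r][bx + c] ^ ref_bits[by + dy + r][bx + dx + c]
        for r in range(size)
        for c in range(size)
    )


def full_search(cur_bits, ref_bits, bx, by, size, search):
    """Return (cost, dx, dy) of the best match within +/- search pixels.

    Displacements that would read outside the reference frame are skipped.
    Ties keep the first candidate found in raster order.
    """
    height, width = len(ref_bits), len(ref_bits[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            x0, y0 = bx + dx, by + dy
            if x0 < 0 or y0 < 0 or x0 + size > width or y0 + size > height:
                continue
            cost = mismatch_count(cur_bits, ref_bits, bx, by, dx, dy, size)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best
```

Each candidate costs only XORs and a bit count, which is what makes it practical for hardware to evaluate many candidates, or several macroblocks, in parallel.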
Hierarchical stack filtering: a bitplane-based algorithm for massively parallel processors
With the development of novel parallel architectures for image processing, the implementation of well-known image operators needs to be reformulated to take advantage of the so-called massive parallelism. In this work, we propose a general algorithm that implements a large class of nonlinear filters, called stack filters, with a 2D-array processor. The proposed method consists of decomposing an image into bitplanes with the bitwise decomposition, and then processing every bitplane hierarchically. The filtered image is reconstructed by simply stacking the filtered bitplanes according to their order of significance. Owing to its hierarchical structure, our algorithm allows us to trade off image quality against processing time, and to significantly reduce the computation time of low-entropy images. Also, experimental tests show that the processing time of our method is substantially lower than that of classical methods when using large structuring elements. All these features are of interest to a variety of real-time applications based on morphological operations such as video segmentation and video enhancement.
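For readers unfamiliar with stack filters, the sketch below shows the classical threshold-decomposition view on a 1-D signal, using the median as the example filter. It is not the hierarchical bitplane algorithm proposed in the paper, whose details the abstract does not give; it only illustrates the defining property that such a filter can be computed on binary slices and the results stacked. The bitplane split and merge helpers show the decomposition and reconstruction steps the abstract mentions. All names, the 1-D simplification and the edge-replication padding are assumptions.

```python
def split_bitplanes(values, nbits):
    """Decompose integers into bitplanes, most significant plane first."""
    return [[(v >> b) & 1 for v in values] for b in range(nbits - 1, -1, -1)]


def merge_bitplanes(planes):
    """Rebuild integers by stacking bitplanes in order of significance."""
    out = [0] * len(planes[0])
    for plane in planes:
        out = [(o << 1) | bit for o, bit in zip(out, plane)]
    return out


def pad_edges(signal, half):
    """Replicate the border samples so every window is full."""
    return [signal[0]] * half + list(signal) + [signal[-1]] * half


def median_direct(signal, window):
    """Reference median filter computed by sorting each window."""
    half = window // 2
    padded = pad_edges(signal, half)
    return [sorted(padded[i:i + window])[half] for i in range(len(signal))]


def median_by_thresholds(signal, window, levels):
    """Median as a stack filter: majority vote on each thresholded slice, then sum."""
    half = window // 2
    padded = pad_edges(signal, half)
    out = [0] * len(signal)
    for t in range(1, levels):
        binary = [1 if v >= t else 0 for v in padded]
        for i in range(len(signal)):
            if sum(binary[i:i + window]) > half:  # majority is the Boolean median
                out[i] += 1
    return out
```

For `[3, 7, 1, 5, 5, 0, 6, 2]` with a window of 3 and 8 gray levels, both median routines return `[3, 3, 5, 5, 5, 5, 2, 2]`. The threshold version needs one binary pass per gray level, which is the cost a bitplane formulation aims to reduce to one pass per bit.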