
    Hardware Implementation of Compressed Sensing based Low Complex Video Encoder

    This paper presents a memory-efficient VLSI architecture for a low-complexity video encoder based on the three-dimensional (3-D) wavelet transform and Compressed Sensing (CS), intended for space and low-power video applications. Most conventional video coding schemes are based on a hybrid model, which requires complex operations such as transform coding (DCT), motion estimation, and a deblocking filter at the encoder. The proposed encoder reduces complexity by replacing these operations with the 3-D DWT and CS. The 3-D DWT provides scalability through the levels of wavelet decomposition and exploits both spatial and temporal redundancies, while CS provides good error resilience and coding efficiency. In the first stage of the encoder, the 3-D DWT (a lifting-based 2-D DWT in the spatial domain and a Haar wavelet in the temporal domain) is applied to each frame of the group of frames (GOF); in the second stage, the CS module exploits the sparsity of the wavelet coefficients. A small set of linear measurements is extracted by projecting the sparse 3-D wavelet coefficients onto a random Bernoulli matrix at the encoder. Compared with the best existing 3-D DWT architectures, the proposed 3-D DWT architecture requires less memory and provides high throughput. For an N×N image, it consumes a total of only 2×(3N + 40P) words of on-chip memory for one level of decomposition. The proposed encoder architecture is the first of its kind, and to the best of my knowledge, no existing architecture is available for comparison. The proposed VLSI architecture of the encoder has been synthesized in a 90-nm CMOS process technology; the results show that it consumes 90.08 mW of power and occupies an area equivalent to 416.799K gates at a frequency of 158 MHz. Comment: Submitted to IEEE Transactions on VLS
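
The two encoder stages described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's architecture: it applies only the temporal Haar step to a pair of frames (the spatial lifting-based 2-D DWT is omitted), assumes an average/half-difference Haar normalization, and draws the Bernoulli matrix from a seeded pseudo-random generator. The helper names `haar_temporal`, `bernoulli_matrix`, and `cs_measure` are hypothetical.

```python
import random

def haar_temporal(frame_a, frame_b):
    """One-level temporal Haar step on two frames (assumed average/half-difference form)."""
    low = [[(a + b) / 2 for a, b in zip(ra, rb)] for ra, rb in zip(frame_a, frame_b)]
    high = [[(a - b) / 2 for a, b in zip(ra, rb)] for ra, rb in zip(frame_a, frame_b)]
    return low, high  # frame_a = low + high, frame_b = low - high

def bernoulli_matrix(m, n, seed=0):
    """Random m x n matrix with i.i.d. +1/-1 entries, reproducible from the seed."""
    rng = random.Random(seed)
    return [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(m)]

def cs_measure(phi, coeffs):
    """Compute y = Phi x, where x is the row-major flattening of a coefficient array."""
    x = [v for row in coeffs for v in row]
    return [sum(p * v for p, v in zip(row, x)) for row in phi]

# Usage: two nearly identical 4x4 frames give a sparse temporal high band.
frame_a = [[10, 10, 10, 10], [10, 10, 10, 10], [10, 10, 10, 10], [10, 10, 10, 10]]
frame_b = [[10, 10, 10, 10], [10, 10, 6, 10], [10, 10, 10, 10], [10, 10, 10, 10]]
low, high = haar_temporal(frame_a, frame_b)
phi = bernoulli_matrix(6, 16, seed=42)  # 6 measurements of 16 coefficients
y = cs_measure(phi, high)
```

Because consecutive frames differ in only one pixel here, the temporal high band has a single nonzero coefficient, which is the kind of sparsity the CS stage relies on to take far fewer measurements than coefficients.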

    A Novel Reconfigurable Architecture of a DSP Processor for Efficient Mapping of DSP Functions using Field Programmable DSP Arrays

    Advances in integrated circuit technology make it feasible to build cheaper, faster, and smaller special-purpose signal processing circuits. Digital signal processing functions are generally implemented either on ASICs, which are inflexible, or on FPGAs, which suffer from a relatively low utilization factor or lower speed compared with ASICs. The proposed Field Programmable DSP Array (FPDA) is a DSP-dedicated device similar to an FPGA, but built from fixed common modules (CMs) such as adders, subtractors, multipliers, scaling units, and shifters instead of CLBs. This paper presents a reconfigurable system architecture, centered on the FPDA, that integrates DSP functions such as the DFT, FFT, DCT, FIR, IIR, and DWT. Switching between DSP functions is achieved by reconfiguring the interconnections between CMs. The proposed architecture has been validated on a Virtex-5 FPGA, and it provides sufficient flexibility, parallelism, and scalability. Comment: 8 Pages, 12 Figures, ACM SIGARCH Computer Architecture News. arXiv admin note: substantial text overlap with arXiv:1305.325
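
The idea that one pool of fixed common modules can realize different DSP functions just by changing the interconnect can be illustrated with a toy model. This is a hypothetical sketch, not the FPDA's configuration format: the CM set, the `Config` route list, and the `run` evaluator are assumptions made for illustration only.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical pool of fixed common modules (CMs): each is a two-input operator.
CMS: Dict[str, Callable[[float, float], float]] = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
}

# Hypothetical configuration: ordered routes (output_net, cm_name, input_net_a, input_net_b).
Config = List[Tuple[str, str, str, str]]

def run(config: Config, nets: Dict[str, float]) -> Dict[str, float]:
    """Evaluate a configuration by routing named nets through the CMs in order."""
    values = dict(nets)
    for out, cm, a, b in config:
        values[out] = CMS[cm](values[a], values[b])
    return values

# 2-tap FIR: y = h0*x0 + h1*x1
FIR2: Config = [
    ("p0", "mul", "h0", "x0"),
    ("p1", "mul", "h1", "x1"),
    ("y", "add", "p0", "p1"),
]

# Radix-2 butterfly with a real twiddle factor w: X0 = a + w*b, X1 = a - w*b
BUTTERFLY: Config = [
    ("wb", "mul", "w", "b"),
    ("X0", "add", "a", "wb"),
    ("X1", "sub", "a", "wb"),
]
```

Both configurations draw on the same adder, subtractor, and multiplier primitives; only the routing table changes, which mirrors how the abstract describes switching between DSP functions.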

    Algorithm/Architecture Co-design of Proportionate-type LMS Adaptive Filters for Sparse System Identification

    This paper investigates the hardware implementation of the proportionate-type LMS family of algorithms for sparse adaptive filtering applications, especially network echo cancellation. Using an algorithm-architecture co-design methodology, we derive a re-formulated proportionate-type algorithm that can be pipelined and has an efficient hardware architecture. Before implementing them in hardware, we study the convergence, steady-state, and tracking performance of these re-formulated algorithms for white, colored, and speech inputs. To the best of our knowledge, this is the first attempt to implement proportionate-type algorithms in hardware. We show that the Delayed μ-law Proportionate LMS (DMPLMS) algorithm for white input and the Delayed Wavelet MPLMS (DWMPLMS) algorithm for colored input are robust VLSI solutions for network echo cancellation, where the sparsity of the echo paths can vary with time. We implemented all designs using a 16-bit fixed-point representation and synthesized them. The synthesis results show that DMPLMS, with an ≈25% increase in hardware over the conventional DLMS architecture, achieves a 3× improvement in convergence rate for white input, and DWMPLMS, with an ≈58% increase in hardware, achieves a 15× improvement in convergence rate for correlated input. Comment: Under communication
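
For readers unfamiliar with proportionate-type filters, the sketch below shows the standard (non-delayed) μ-law PNLMS (MPLMS) update in floating point, in which each tap receives a step-size gain proportional to a μ-law-compressed coefficient magnitude. It is not the paper's re-formulated delayed DMPLMS or its 16-bit fixed-point datapath; the parameter values (`mu=1000`, `rho=0.01`, `delta=0.01`, `step=0.5`) and the function names are assumptions chosen for illustration.

```python
import math
import random

def mu_law(w, mu=1000.0):
    """Mu-law compression of a coefficient magnitude (maps |w| = 1 to 1)."""
    return math.log(1.0 + mu * abs(w)) / math.log(1.0 + mu)

def mplms_identify(x, d, taps, step=0.5, rho=0.01, delta=0.01, eps=1e-6):
    """Standard (non-delayed) mu-law PNLMS; returns the final weight vector."""
    w = [0.0] * taps
    buf = [0.0] * taps  # buf[k] holds x(n - k)
    for xn, dn in zip(x, d):
        buf = [xn] + buf[:-1]
        e = dn - sum(wk * bk for wk, bk in zip(w, buf))  # a priori error
        f = [mu_law(wk) for wk in w]
        g_floor = rho * max(delta, max(f))  # keeps small taps adapting
        g = [max(g_floor, fk) for fk in f]
        mean_g = sum(g) / taps
        g = [gk / mean_g for gk in g]  # normalized proportionate gains
        norm = sum(gk * bk * bk for gk, bk in zip(g, buf)) + eps
        w = [wk + step * gk * e * bk / norm for wk, gk, bk in zip(w, g, buf)]
    return w

# Usage: identify a sparse 8-tap system from noise-free white Gaussian input.
random.seed(0)
h = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, -0.5, 0.0]
x = [random.gauss(0.0, 1.0) for _ in range(3000)]
d = [sum(h[k] * x[n - k] for k in range(len(h)) if n >= k) for n in range(len(x))]
w = mplms_identify(x, d, taps=len(h))
```

Large taps receive larger gains and therefore adapt faster than the near-zero ones, which is the mechanism behind the faster convergence that proportionate-type filters target on sparse echo paths.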

    A Low Complexity VLSI Architecture for Multi-Focus Image Fusion in DCT Domain

    Due to the limited depth of field of optical sensors, focusing all objects in a scene with a single sensor is difficult. Image fusion methods are therefore used in multi-focus settings. The Discrete Cosine Transform (DCT) is a widely used image compression transform, so image fusion in the DCT domain is an efficient approach. This paper presents a low-complexity approach to multi-focus image fusion and its VLSI implementation using the DCT. The proposed method is evaluated with reference and non-reference fusion measures, and the results confirm its effectiveness. The design reaches a maximum synthesized frequency of 221 MHz on FPGA and uses 42% of the FPGA resources. The proposed method consumes very little power and can process 4K-resolution images at 60 frames per second, which makes the hardware suitable for handheld portable devices such as camera modules and wireless image sensors. Comment: Submitting to journal
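
One common low-complexity rule for DCT-domain multi-focus fusion selects, for each 8×8 block, the source block with the larger AC energy, since well-focused regions contain more high-frequency detail and, for an orthonormal DCT, the AC energy equals the block's sum of squared deviations from its mean. The abstract does not state the paper's selection rule, so this variance-based rule, the block size, and the helper names `dct2`, `ac_energy`, and `fuse_blocks` are assumptions.

```python
import math

def dct2(block):
    """Orthonormal 2-D DCT-II of an n x n block (direct O(n^4) form, for clarity)."""
    n = len(block)

    def c(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = c(u) * c(v) * s
    return out

def ac_energy(coeffs):
    """Sum of squared AC coefficients (total energy minus the DC term)."""
    total = sum(v * v for row in coeffs for v in row)
    return total - coeffs[0][0] * coeffs[0][0]

def fuse_blocks(block_a, block_b):
    """Keep the DCT coefficients of whichever source block has more AC energy."""
    ca, cb = dct2(block_a), dct2(block_b)
    return ca if ac_energy(ca) >= ac_energy(cb) else cb
```

A full fusion pass would tile both source images into 8×8 blocks, call `fuse_blocks` per tile, and either inverse-transform the result or hand the coefficients directly to a DCT-based coder.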

    A VLSI architecture of JPEG2000 encoder

    Copyright © 2004 IEEE. This paper proposes a VLSI architecture of a JPEG2000 encoder, which functionally consists of two parts: the discrete wavelet transform (DWT) and embedded block coding with optimized truncation (EBCOT). For the DWT, a spatial combinative lifting algorithm (SCLA)-based scheme with both the 5/3 reversible and 9/7 irreversible filters is adopted, reducing multiplication computations by 50% and 42%, respectively, compared with the conventional lifting-based implementation (LBI). For EBCOT, a dynamic memory control (DMC) strategy for Tier-1 encoding reduces the on-chip wavelet coefficient storage by 60%, and a subband parallel-processing method speeds up the EBCOT context formation (CF) process. A Tier-2 encoding architecture is also presented that reduces on-chip bitstream buffering from full-tile size down to three code blocks and largely eliminates the iterations of rate-distortion (RD) truncation. This work was supported in part by the China National High Technologies Research Program (863) under Grant 2002AA1Z142
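
The conventional lifting-based implementation (LBI) that the SCLA scheme is compared against can be sketched, for the reversible 5/3 filter, as the integer predict/update steps of JPEG2000 Part 1. This is the standard 1-D lifting form with symmetric extension, not the paper's spatial combinative scheme; it assumes an even-length integer signal of at least two samples, and the function names are hypothetical.

```python
def fwd_53(x):
    """Forward reversible 5/3 lifting on an even-length integer list.
    Returns (s, d): low-pass and high-pass coefficients."""
    n = len(x)
    half = n // 2
    d = []
    for i in range(half):
        # Predict: symmetric extension mirrors x[n] to x[n - 2].
        right = x[2 * i + 2] if 2 * i + 2 < n else x[n - 2]
        d.append(x[2 * i + 1] - ((x[2 * i] + right) >> 1))  # >> floors, also for negatives
    s = []
    for i in range(half):
        # Update: symmetric extension gives d[-1] = d[0].
        left = d[i - 1] if i > 0 else d[0]
        s.append(x[2 * i] + ((left + d[i] + 2) >> 2))
    return s, d

def inv_53(s, d):
    """Inverse lifting: undo the update, then undo the predict."""
    half = len(s)
    n = 2 * half
    x = [0] * n
    for i in range(half):
        left = d[i - 1] if i > 0 else d[0]
        x[2 * i] = s[i] - ((left + d[i] + 2) >> 2)
    for i in range(half):
        right = x[2 * i + 2] if 2 * i + 2 < n else x[n - 2]
        x[2 * i + 1] = d[i] + ((x[2 * i] + right) >> 1)
    return x
```

Because every step adds or subtracts a rounded function of already-known samples, the inverse reproduces the input exactly, which is what makes the 5/3 path usable for lossless coding; the predict step uses only shifts and additions.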

    High Speed and Area Efficient 2D DWT Processor based Image Compression (Signal & Image Processing)

    This paper presents a high-speed, area-efficient DWT processor design for image compression applications. The design uses a pipelined, partially serial architecture to increase speed while making optimal use of the resources available on the target FPGA. The model was designed and simulated with Simulink and System Generator blocks, synthesized with the Xilinx Synthesis Tool (XST), and implemented on Spartan 2 and Spartan 3 target devices (XC2S100-5tq144 and XC3S500E-4fg320). The results show that the design can operate at a maximum frequency of 231 MHz on Spartan 3, consuming 117 mW at a junction temperature of 28 °C. Comparison of the results shows a 15% improvement in speed. Comment: 10 Pages, 9 figures

    VLSI Friendly Framework for Scalable Video Coding based on Compressed Sensing

    This paper presents a new VLSI-friendly framework for scalable video coding based on Compressed Sensing (CS). It achieves scalability through the 3-Dimensional Discrete Wavelet Transform (3-D DWT) and a better compression ratio by using CS to exploit the inherent sparsity of the high-frequency wavelet sub-bands. Using the 3-D DWT and a proposed adaptive measurement scheme (AMS) at the encoder improves the compression ratio and reduces the complexity of the decoder. The proposed video codec uses only 7% of the multipliers needed in a conventional CS-based video coding system. A codebook of Bernoulli matrices of different sizes, corresponding to predefined sparsity levels, is maintained at both the encoder and the decoder. Based on the calculated l0-norm of the input vector, one of sixteen possible Bernoulli matrices is selected for taking the CS measurements, and its index is transmitted along with the measurements. At the decoder, this index identifies the Bernoulli matrix used by the CS reconstruction algorithm to recover the high-frequency wavelet sub-bands. A new Enhanced Approximate Message Passing (EAMP) algorithm is proposed at the decoder to reconstruct the wavelet coefficients, after which the inverse wavelet transform restores the video frames. Simulation results establish the superiority of the proposed framework over existing schemes and its suitability for VLSI implementation. Moreover, the coded video is scalable with an increasing number of wavelet decomposition levels.
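
The adaptive measurement scheme described above can be sketched as a shared, seeded codebook of sixteen Bernoulli matrices indexed by a sparsity bucket derived from the l0-norm. The vector length (64), the bucketing rule in `select_level`, the measurement-count rule in `measurements_for_level`, and the seeding scheme are all hypothetical; the abstract does not specify the paper's matrix sizes or thresholds, and reconstruction is not shown.

```python
import random

N = 64       # hypothetical length of a high-frequency coefficient vector
LEVELS = 16  # sixteen Bernoulli matrices, as in the abstract

def measurements_for_level(level, n=N, levels=LEVELS):
    """Hypothetical rule: the measurement count grows linearly with the sparsity bucket."""
    return max(1, (level + 1) * n // levels)

def bernoulli_matrix(level, n=N):
    """Seeded +1/-1 matrix, so encoder and decoder can build identical codebooks."""
    rng = random.Random(1000 + level)
    m = measurements_for_level(level, n)
    return [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(m)]

CODEBOOK = [bernoulli_matrix(level) for level in range(LEVELS)]

def select_level(x, levels=LEVELS):
    """Map the l0-norm (count of nonzeros) of x to one of the sparsity buckets."""
    k = sum(1 for v in x if v != 0)
    return min(levels - 1, k * levels // len(x))

def encode(x):
    """Return (matrix index, measurements y = Phi x) for one input vector."""
    level = select_level(x)
    phi = CODEBOOK[level]
    return level, [sum(p * v for p, v in zip(row, x)) for row in phi]
```

The decoder can rebuild `CODEBOOK` from the same seeds, read the transmitted index, and pass the matching matrix to its reconstruction algorithm (EAMP in the paper). Sparser vectors map to smaller matrices and therefore to fewer measurements.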

    High speed VLSI architectures for DWT in biometric image compression: A study

    Biometrics is a field that navigates a vast database and extracts only the qualifying data to accelerate biometric authentication and recognition. Image compression is a vital part of this process. Various Very Large Scale Integration (VLSI) architectures have emerged to meet the real-time requirements of online processing in these applications. This paper studies techniques that enable fast operation of the transform stage of image compression. The survey considers parameters involved in optimizing for high speed, such as computing time, silicon area, and memory size.

    Fast Continuous Haar and Fourier Transforms of Rectilinear Polygons from VLSI Layouts

    We develop the pruned continuous Haar transform and the fast continuous Fourier series, two fast and efficient algorithms for rectilinear polygons. Rectilinear polygons are used in VLSI processes to describe the design and mask layouts of integrated circuits. The Fourier representation is at the heart of many of these processes, and the Haar transform is expected to play a major role in techniques envisioned to speed up VLSI design. To ensure correct printing of ever-shrinking transistors while handling their increasingly large number, ever more computationally intensive techniques are needed; efficient algorithms for the Haar and Fourier transforms are therefore vital. We derive the complexity of both algorithms and compare it to that of the discrete transforms traditionally used in VLSI. We find a significant reduction in complexity when the polygons have few vertices, as is the case in VLSI layouts. The analysis is complemented by an implementation and a benchmark of the continuous algorithms and their discrete counterparts. On the tested VLSI layouts, the pruned continuous Haar transform is 5 to 25 times faster, and the fast continuous Fourier series is 1.5 to 3 times faster. Comment: 10 pages, 10 figures
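
Why complexity tracks the number of vertices can be seen in one dimension: for the indicator function of an interval [a, b) on [0, 1), a continuous Haar wavelet coefficient can be nonzero only if the wavelet's support contains an endpoint, so each level has at most two nonzero coefficients regardless of resolution. The sketch below computes those coefficients in closed form from the endpoints, using the L2-normalized Haar wavelet (the scaling coefficient, b - a, is omitted). It is a simplified 1-D illustration of the pruning idea, not the authors' 2-D algorithm for polygons, and the function names are hypothetical.

```python
import math

def interval_overlap(a, b, lo, hi):
    """Length of the intersection of [a, b) and [lo, hi)."""
    return max(0.0, min(b, hi) - max(a, lo))

def haar_coeffs_interval(a, b, max_level):
    """Nonzero continuous Haar wavelet coefficients of the indicator of [a, b),
    for 0 <= a < b <= 1, using psi_{j,k}(t) = 2^(j/2) psi(2^j t - k)."""
    coeffs = {}
    for j in range(max_level + 1):
        width = 2.0 ** -j
        scale = 2.0 ** (j / 2)
        # Only wavelets whose support contains an endpoint can be nonzero,
        # so at most two candidates are examined per level.
        candidates = {min(int(a / width), 2 ** j - 1), min(int(b / width), 2 ** j - 1)}
        for k in candidates:
            lo = k * width
            mid = lo + width / 2
            hi = lo + width
            c = scale * (interval_overlap(a, b, lo, mid) - interval_overlap(a, b, mid, hi))
            if c != 0.0:
                coeffs[(j, k)] = c
    return coeffs
```

The work per level is constant rather than proportional to the sampling resolution, which is the intuition behind the reported speed-ups on layouts whose polygons have few vertices.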

    Implementation of DWT Integrated Log Based FPU with SPIHT Coders on FPGA

    In this work, an architecture is designed to integrate a lifting-based discrete wavelet transform (DWT) structure with logarithm-based floating-point arithmetic units. Among the many algorithms proposed for coding wavelet coefficients in image compression, the Set Partitioning in Hierarchical Trees (SPIHT) algorithm is widely used because of its low computational complexity and good compression performance. However, it requires a large amount of memory, which lowers its throughput. This work overcomes that drawback by adopting a modified SPIHT algorithm, the block-based pass-parallel SPIHT (BPS) algorithm. The designed architecture is compared with multi-precision floating-point arithmetic units, and synthesis results are presented. The synthesis results show that integrating the DWT structure with the log-based FPU core and the BPS coder on FPGA devices yields an efficient area and high computation speed. The architecture is designed in Verilog HDL and synthesized on various Xilinx FPGA devices. It is useful for compressing images with a good compression ratio and better image resolution, and for obtaining a high peak signal-to-noise ratio (PSNR).
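
A logarithm-based arithmetic unit typically turns multiplication and division into addition and subtraction of logarithms, which is why it can save area in multiplier-heavy blocks such as a DWT. The sketch below assumes a logarithmic number system (LNS) representation of (sign, log2|x|) held in ordinary floats; the abstract does not describe the paper's FPU format, the function names are hypothetical, and LNS addition (which needs a lookup table or approximation of log2(1 ± 2^d)) is omitted.

```python
import math

def to_lns(x):
    """Encode x as (sign, log2|x|); zero is (0, -inf) in this simplified sketch."""
    if x == 0:
        return (0, float("-inf"))
    return (1 if x > 0 else -1, math.log2(abs(x)))

def from_lns(v):
    """Decode (sign, log2|x|) back to a float."""
    sign, lg = v
    return 0.0 if sign == 0 else sign * 2.0 ** lg

def lns_mul(a, b):
    """Multiplication: multiply signs, add logarithms."""
    return (a[0] * b[0], a[1] + b[1])

def lns_div(a, b):
    """Division: multiply signs, subtract logarithms."""
    if b[0] == 0:
        raise ZeroDivisionError("LNS division by zero")
    return (a[0] * b[0], a[1] - b[1])
```

In hardware the logarithms would be fixed-point values, so `lns_mul` reduces to one integer adder plus a sign XOR, in place of a full multiplier.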