Hardware Implementation of Compressed Sensing based Low Complex Video Encoder
This paper presents a memory-efficient VLSI architecture of a low-complexity video
encoder based on the three-dimensional (3-D) wavelet transform and Compressed
Sensing (CS), intended for space and low-power video applications. The majority of
conventional video coding schemes are based on a hybrid model, which requires
complex operations such as transform coding (DCT), motion estimation and
deblocking filtering at the encoder. The complexity of the proposed encoder is
reduced by replacing those operations with the 3-D DWT and CS. The
proposed architecture uses 3-D DWT to enable the scalability with levels of
wavelet decomposition and also to exploit the spatial and the temporal
redundancies. CS provides good error resilience and coding efficiency. In
the first stage of the proposed encoder architecture, the 3-D DWT is
applied (lifting-based 2-D DWT in the spatial domain and Haar wavelet in the
temporal domain) to each frame of the group of frames (GOF); in the second
stage, the CS module exploits the sparsity of the wavelet coefficients. A small
set of linear measurements is extracted by projecting the sparse 3-D wavelet
coefficients onto a random Bernoulli matrix at the encoder. Compared with the
best existing 3-D DWT architectures, the proposed 3-D DWT architecture requires
less memory and provides high throughput. For an N×N image, the proposed 3-D DWT
architecture consumes a total of only 2×(3N + 40P) words of on-chip memory for
one level of decomposition. The proposed encoder architecture is the
first of its kind and, to the best of my knowledge, no architecture is available
for comparison. The proposed VLSI architecture of the encoder has been synthesized
in a 90-nm CMOS process technology, and the results show that it consumes 90.08 mW
of power and occupies an area of 416.799 K equivalent gates at a frequency
of 158 MHz. Comment: Submitted in IEEE transactions on VLS
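The two encoder stages described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' hardware design: the unnormalized average/difference form of the Haar step, the helper names, the seed and the matrix size are all hypothetical, and the abstract does not define the parameter P used in the memory formula.

```python
import random

def haar_temporal(frame_a, frame_b):
    """One-level Haar step along time for two equally sized frames (flat lists).

    Uses the unnormalized average/difference form (an assumption).
    """
    low = [(a + b) / 2 for a, b in zip(frame_a, frame_b)]
    high = [(a - b) / 2 for a, b in zip(frame_a, frame_b)]
    return low, high

def bernoulli_matrix(m, n, seed=0):
    """m x n matrix with entries drawn uniformly from {-1, +1}."""
    rng = random.Random(seed)
    return [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(m)]

def measure(phi, coeffs):
    """CS measurements y = phi * coeffs (only additions and sign flips are needed)."""
    return [sum(p * c for p, c in zip(row, coeffs)) for row in phi]

def dwt3d_memory_words(n, p):
    """On-chip memory in words for one decomposition level, 2 x (3N + 40P)."""
    return 2 * (3 * n + 40 * p)

# Two nearly identical frames give a sparse temporal high-pass band.
low, high = haar_temporal([10, 10, 12, 8], [10, 10, 10, 8])
y = measure(bernoulli_matrix(2, 4), high)   # 2 measurements of 4 coefficients
print(high, y, dwt3d_memory_words(512, 2))  # P = 2 is a hypothetical value
```

Because the matrix entries are ±1, each measurement needs no multiplier, which is one reason Bernoulli matrices are attractive for hardware.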
A Novel Reconfigurable Architecture of a DSP Processor for Efficient Mapping of DSP Functions using Field Programmable DSP Arrays
Development of modern integrated circuit technologies makes it feasible to
develop cheaper, faster and smaller special purpose signal processing function
circuits. Digital signal processing functions are generally implemented either
on ASICs, which are inflexible, or on FPGAs, which suffer from a relatively low
utilization factor or lower speed compared to ASICs. The Field Programmable DSP
Array (FPDA) is the proposed DSP-dedicated device, similar to an FPGA but with
basic fixed common modules (CMs) (such as adders, subtractors, multipliers,
scaling units and shifters) instead of CLBs. This paper introduces the development
of a reconfigurable system architecture with a focus on the FPDA that integrates
different DSP functions such as DFT, FFT, DCT, FIR, IIR and DWT. Switching
between DSP functions is achieved by reconfiguring the
interconnection between CMs. The proposed architecture has been
validated on a Virtex5 FPGA. The architecture provides a sufficient amount of
flexibility, parallelism and scalability. Comment: 8 Pages, 12 Figures, ACM SIGARCH Computer Architecture News. arXiv
admin note: substantial text overlap with arXiv:1305.325
Algorithm/Architecture Co-design of Proportionate-type LMS Adaptive Filters for Sparse System Identification
This paper investigates the problem of implementing the proportionate-type LMS
family of algorithms in hardware for sparse adaptive filtering applications,
especially network echo cancellation. We derive a re-formulated
proportionate type algorithm through algorithm-architecture co-design
methodology that can be pipelined and has an efficient architecture for
hardware implementation. We study the convergence, steady-state and tracking
performances of these re-formulated algorithms for white, colored and speech
inputs before implementing them in hardware. To the best of our knowledge this
is the first attempt to implement proportionate-type algorithms in hardware. We
show that Delayed μ-law Proportionate LMS (DMPLMS) algorithm for white
input and Delayed Wavelet MPLMS (DWMPLMS) for colored input are the robust VLSI
solutions for network echo cancellation where the sparsity of the echo paths
can vary with time. We implemented all the designs considering -bit fixed
point representation in hardware, synthesized the designs and synthesis results
show that DMPLMS algorithm with increase in hardware over
conventional DLMS architecture, achieves improvement in convergence rate
for white input and DWMPLMS algorithm with increase in hardware
achieves improvement in convergence rate for correlated input conditions. Comment: Under communicatio
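For orientation, a proportionate-type update can be sketched as follows. This is the textbook normalized proportionate (PNLMS) form, not the delayed, re-formulated DMPLMS or DWMPLMS algorithms of the paper, and it runs in floating point rather than fixed point. The function name, step size, regularization constants and the noise-free white input are all assumptions made for the illustration.

```python
import random

def pnlms_identify(h_true, n_samples=4000, mu=0.5, rho=0.01, delta=0.01,
                   eps=1e-6, seed=1):
    """Identify a sparse FIR system h_true with a PNLMS adaptive filter."""
    rng = random.Random(seed)
    length = len(h_true)
    w = [0.0] * length            # adaptive filter weights
    x = [0.0] * length            # tap-delay line, newest sample first
    for _ in range(n_samples):
        x = [rng.gauss(0.0, 1.0)] + x[:-1]               # white input
        d = sum(h * xi for h, xi in zip(h_true, x))      # desired (echo) signal
        e = d - sum(wi * xi for wi, xi in zip(w, x))     # a priori error
        # Proportionate gains: large taps get large step sizes,
        # with a floor so that small taps keep adapting.
        floor = rho * max(delta, max(abs(wi) for wi in w))
        gamma = [max(floor, abs(wi)) for wi in w]
        g_sum = sum(gamma)
        g = [length * gi / g_sum for gi in gamma]        # mean gain equals 1
        norm = sum(gi * xi * xi for gi, xi in zip(g, x)) + eps
        w = [wi + mu * gi * xi * e / norm
             for wi, gi, xi in zip(w, g, x)]
    return w

# Sparse 16-tap echo path with two active coefficients (hypothetical values).
h = [0.0] * 16
h[3], h[9] = 1.0, -0.5
w = pnlms_identify(h)
print(max(abs(a - b) for a, b in zip(w, h)))   # final coefficient error
```

The per-tap gain is what distinguishes this family from plain LMS: when the echo path is sparse, most of the adaptation effort goes to the few active taps.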
A Low Complexity VLSI Architecture for Multi-Focus Image Fusion in DCT Domain
Due to the confined focal length of optical sensors, focusing all objects in
a scene with a single sensor is a difficult task. To handle such a situation,
image fusion methods are used in multi-focus environments. The Discrete Cosine
Transform (DCT) is a widely used image compression transform, and image fusion in
the DCT domain is an efficient method. This paper presents a low complexity
approach for multi-focus image fusion and its VLSI implementation using DCT.
The proposed method is evaluated using reference/non-reference fusion measure
criteria and the obtained results assert its effectiveness. The maximum
synthesized frequency on FPGA is found to be 221 MHz, and the design consumes 42%
of FPGA resources. The proposed method consumes very little power and can process 4K
resolution images at the rate of 60 frames per second, which makes the hardware
suitable for handheld portable devices such as camera modules and wireless image
sensors. Comment: Submitting to journa
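The throughput claim can be sanity-checked with simple arithmetic. The sketch below assumes that "4K" means 3840×2160 pixels and that the design runs at the reported maximum frequency of 221 MHz; the abstract states neither the exact resolution nor the degree of pixel parallelism, so the result is only an estimate of the average work per clock cycle.

```python
def required_pixels_per_cycle(width, height, fps, clock_hz):
    """Average number of pixels that must be handled per clock cycle."""
    return width * height * fps / clock_hz

# Assumed 4K UHD resolution at 60 frames per second and a 221 MHz clock.
rate = required_pixels_per_cycle(3840, 2160, 60, 221e6)
print(f"{rate:.2f} pixels per clock cycle")   # prints "2.25 pixels per clock cycle"
```

Under these assumptions the hardware has to process a little more than two pixels per cycle on average, which suggests some degree of parallel or block-wise processing.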
A VLSI architecture of JPEG2000 encoder
Copyright © 2004 IEEE. This paper proposes a VLSI architecture of a JPEG2000 encoder, which functionally consists of two parts: discrete wavelet transform (DWT) and embedded block coding with optimized truncation (EBCOT). For DWT, a spatial combinative lifting algorithm (SCLA)-based scheme with both 5/3 reversible and 9/7 irreversible filters is adopted to reduce multiplication computations by 50% and 42%, respectively, compared with the conventional lifting-based implementation (LBI). For EBCOT, a dynamic memory control (DMC) strategy of Tier-1 encoding is adopted to reduce the scale of the on-chip wavelet coefficient storage by 60%, and a subband parallel-processing method is employed to speed up the EBCOT context formation (CF) process; an architecture of Tier-2 encoding is presented to reduce the scale of on-chip bitstream buffering from full-tile size down to three-code-block size and considerably eliminate the iterations of the rate-distortion (RD) truncation. This work was supported in part by the China National High Technologies Research Program (863) under Grant 2002AA1Z142
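As background for the 5/3 reversible filter mentioned above, the conventional lifting form (the baseline the abstract calls LBI, not the SCLA scheme the paper proposes) can be sketched in one dimension. The function names are hypothetical, the input is assumed to have even length, and boundaries use symmetric extension.

```python
def fwd53(x):
    """One level of the reversible integer 5/3 lifting transform (1-D)."""
    assert len(x) >= 2 and len(x) % 2 == 0
    half = len(x) // 2
    even, odd = x[0::2], x[1::2]
    # Predict step: high-pass = odd sample minus the floor-average of its
    # even neighbours (right neighbour mirrored at the boundary).
    d = [odd[n] - ((even[n] + even[min(n + 1, half - 1)]) >> 1)
         for n in range(half)]
    # Update step: low-pass = even sample plus a rounded quarter of the
    # neighbouring high-pass values (left neighbour mirrored at the boundary).
    s = [even[n] + ((d[max(n - 1, 0)] + d[n] + 2) >> 2)
         for n in range(half)]
    return s, d

def inv53(s, d):
    """Exact inverse of fwd53: undo the update step, then the predict step."""
    half = len(s)
    even = [s[n] - ((d[max(n - 1, 0)] + d[n] + 2) >> 2) for n in range(half)]
    odd = [d[n] + ((even[n] + even[min(n + 1, half - 1)]) >> 1)
           for n in range(half)]
    x = [0] * (2 * half)
    x[0::2], x[1::2] = even, odd
    return x

samples = [10, 12, 14, 200, 3, 3, 7, 0]
low, high = fwd53(samples)
assert inv53(low, high) == samples   # integer arithmetic makes it lossless
```

Only shifts and additions are used, which is why the 5/3 filter is the lossless option in JPEG2000; the 9/7 irreversible filter needs real-valued multiplications, and those are what the abstract's multiplication savings refer to.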
High Speed and Area Efficient 2D DWT Processor based Image Compression (Signal & Image Processing)
This paper presents a high-speed and area-efficient DWT processor based
design for image compression applications. In the proposed design, a pipelined,
partially serial architecture is used to enhance the speed while making
optimal use of the resources available on the target FPGA. The proposed model
has been designed and simulated using Simulink and System Generator blocks,
synthesized with the Xilinx Synthesis Tool (XST) and implemented on Spartan 2 and 3
based XC2S100-5tq144 and XC3S500E-4fg320 target devices. The results show that
the proposed design can operate at a maximum frequency of 231 MHz on Spartan 3
while consuming 117 mW of power at a junction temperature of 28 °C. The result
comparison shows an improvement of 15% in speed. Comment: 10 Pages, 9 figure
VLSI Friendly Framework for Scalable Video Coding based on Compressed Sensing
This paper presents a new VLSI friendly framework for scalable video coding
based on Compressed Sensing (CS). It achieves scalability through 3-Dimensional
Discrete Wavelet Transform (3-D DWT) and better compression ratio by exploiting
the inherent sparsity of the high-frequency wavelet sub-bands through CS. By
using 3-D DWT and a proposed adaptive measurement scheme called AMS at the
encoder, one can succeed in improving the compression ratio and reducing the
complexity of the decoder. The proposed video codec uses only 7% of the total
number of multipliers needed in a conventional CS-based video coding system. A
codebook of Bernoulli matrices with different sizes corresponding to the
predefined sparsity levels is maintained at both the encoder and the decoder.
Based on the calculated l0-norm of the input vector, one of the sixteen
possible Bernoulli matrices will be selected for taking the CS measurements and
its index will be transmitted along with the measurements. Based on this index,
the corresponding Bernoulli matrix has been used in CS reconstruction algorithm
to get back the high-frequency wavelet sub-bands at the decoder. At the
decoder, a new Enhanced Approximate Message Passing (EAMP) algorithm has been
proposed to reconstruct the wavelet coefficients and apply the inverse wavelet
transform for restoring back the video frames. Simulation results have
established the superiority of the proposed framework over the existing schemes
and have increased its suitability for VLSI implementation. Moreover, the coded
video is found to be scalable with an increase in the number of levels of wavelet
decomposition.
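The codebook idea described above (count the non-zero coefficients, pick one of sixteen Bernoulli matrices, send its index with the measurements) can be sketched as follows. Everything specific here is an assumption for illustration: the vector length, the even spacing of sparsity levels, the rule of two measurements per allowed non-zero, the shared seed and the function names. The abstract does not give the actual AMS rules.

```python
import random

N = 64        # length of a sub-band vector (hypothetical)
LEVELS = 16   # number of codebook entries, as stated in the abstract

def build_codebook(n=N, levels=LEVELS, seed=7):
    """Same seed at encoder and decoder yields identical ±1 matrices."""
    rng = random.Random(seed)
    book = []
    for i in range(levels):
        k_max = (i + 1) * n // levels     # largest sparsity served by entry i
        m = min(n, 2 * k_max)             # assumed rule: 2 measurements per non-zero
        book.append([[rng.choice((-1, 1)) for _ in range(n)] for _ in range(m)])
    return book

def select_index(vec, n=N, levels=LEVELS):
    """Map the l0-norm (number of non-zeros) to a codebook index."""
    k = sum(1 for v in vec if v != 0)
    idx = -(-k * levels // n) - 1         # ceil(k * levels / n) - 1
    return min(max(idx, 0), levels - 1)

def encode(vec, book):
    """Return the matrix index and the CS measurements taken with it."""
    idx = select_index(vec)
    y = [sum(p * v for p, v in zip(row, vec)) for row in book[idx]]
    return idx, y

encoder_book = build_codebook()
decoder_book = build_codebook()           # rebuilt independently at the decoder
vec = [0] * N
for pos, val in ((1, 3), (5, -2), (20, 7), (33, 1), (60, -4)):
    vec[pos] = val                        # five non-zero coefficients
idx, y = encode(vec, encoder_book)
print(idx, len(y))                        # index sent with the measurements
```

Sending only the index keeps the side information to four bits per vector, since the decoder can look up the same matrix in its own copy of the codebook.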
High speed VLSI architectures for DWT in biometric image compression: A study
Biometrics is a field that navigates through a vast database and extracts only the qualifying data to accelerate the processes of biometric authentication/recognition. Image compression is a vital part of the process. Various Very Large Scale Integration (VLSI) architectures have emerged to satisfy the real-time requirements of the online processing of these applications. This paper studies various techniques that help in realizing fast operation of the transform stage of the image compression process. Various parameters that may be involved in optimizing for high speed, such as computing time, silicon area and memory size, are considered in the survey.
Fast Continuous Haar and Fourier Transforms of Rectilinear Polygons from VLSI Layouts
We develop the pruned continuous Haar transform and the fast continuous
Fourier series, two fast and efficient algorithms for rectilinear polygons.
Rectilinear polygons are used in VLSI processes to describe design and mask
layouts of integrated circuits. The Fourier representation is at the heart of
many of these processes and the Haar transform is expected to play a major role
in techniques envisioned to speed up VLSI design. To ensure correct printing of
the constantly shrinking transistors and simultaneously handle their
increasingly large number, ever more computationally intensive techniques are
needed. Therefore, efficient algorithms for the Haar and Fourier transforms are
vital. We derive the complexity of both algorithms and compare it to that of
discrete transforms traditionally used in VLSI. We find a significant reduction
in complexity when the number of vertices of the polygons is small, as is the
case in VLSI layouts. This analysis is completed by an implementation and a
benchmark of the continuous algorithms and their discrete counterpart. We show
that on tested VLSI layouts the pruned continuous Haar transform is 5 to 25
times faster, while the fast continuous Fourier series is 1.5 to 3 times
faster. Comment: 10 pages, 10 figure
Implementation of DWT Integrated Log Based FPU with SPIHT Coders on FPGA
In this work, an architecture is designed that integrates a lifting-based discrete wavelet transform (DWT) structure with logarithm-based floating point arithmetic units. Among the many algorithms proposed for coding wavelet coefficients for image compression, the set partitioning in hierarchical trees (SPIHT) algorithm is widely used because of its low computational complexity and good compression performance. However, it suffers from the drawback of occupying a large amount of memory and hence delivering low throughput. This drawback is overcome in this work by adopting a modified SPIHT algorithm termed the block-based pass-parallel SPIHT (BPS) algorithm. The designed architecture is compared with multi-precision floating point arithmetic units and the synthesis results are presented. The experimental synthesis results show that integrating the DWT structure with the log-based FPU core and BPS coder on FPGA devices provides efficient area and high computation speed. The proposed architecture is designed using Verilog HDL and synthesized on various Xilinx FPGA devices. The architecture is useful for compressing images with a good compression ratio, better image resolution and a high peak signal-to-noise ratio.