
    New memory-efficient hardware architecture of 2-D dual-mode lifting-based discrete wavelet transform for JPEG2000

    This work presents new algorithms and hardware architectures to improve the critical issues of the 2-D dual-mode (supporting 5/3 lossless and 9/7 lossy coding) lifting-based discrete wavelet transform (LDWT). The proposed 2-D dual-mode LDWT architecture has the advantages of low transpose memory, low latency, and a regular signal flow, making it suitable for VLSI implementation. The transpose memory requirement of the N × N 2-D 5/3-mode LDWT is 2N, and that of the 2-D 9/7-mode LDWT is 4N. According to the comparison results, the proposed hardware architecture surpasses previous lifting-based architectures in terms of transpose memory size. It can be applied to real-time visual operations such as JPEG2000, MPEG-4 still texture object decoding, and wavelet-based scalable video coding.
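    To make the lifting formulation concrete, the sketch below shows the two lifting steps (predict and update) of the reversible 5/3 filter used in JPEG2000 lossless coding; the 9/7 lossy mode adds two further predict/update pairs and a scaling step. This is a behavioural software model with an assumed even signal length and symmetric border extension, not the low-transpose-memory hardware architecture proposed in the paper.

import numpy as np

def lifting_53_forward(x):
    # One level of the reversible 5/3 lifting DWT on a 1-D integer signal
    # (even length and symmetric border extension assumed).
    x = np.asarray(x, dtype=np.int64)
    s, d = x[0::2].copy(), x[1::2].copy()      # split into even / odd samples
    # Predict: d[n] -= floor((s[n] + s[n+1]) / 2)
    d -= (s + np.append(s[1:], s[-1])) >> 1
    # Update:  s[n] += floor((d[n-1] + d[n] + 2) / 4)
    s += (np.insert(d[:-1], 0, d[0]) + d + 2) >> 2
    return s, d                                # approximation band, detail band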

    Development of Lifting-based VLSI Architectures for Two-Dimensional Discrete Wavelet Transform

    Two-dimensional discrete wavelet transform (2-D DWT) has evolved into an essential part of a modern compression system. It offers superior compression with good image quality and overcomes the disadvantage of the discrete cosine transform, which suffers from block artifacts that reduce the quality of the image. The amount of computation involved in the 2-D DWT is enormous and cannot be handled by general-purpose processors when real-time processing is required. Therefore, high-speed, low-power VLSI architectures that compute the 2-D DWT effectively are needed. In this research, several VLSI architectures have been developed that meet the real-time requirements of 2-D DWT applications. The research initially started with a software simulation program that decorrelates the original image and reconstructs it from the decorrelated image. Then, based on the insight gained from implementing the simulation program, a new approach for designing lifting-based VLSI architectures for the 2-D forward DWT is introduced. As a result, two high-performance VLSI architectures that perform the 2-D DWT for the 5/3 and 9/7 filters are developed, based on overlapped and non-overlapped scan methods. An intermediate architecture is then developed, which aims at reducing the power consumption of the overlapped areas without using the expensive line buffer. In order to best meet real-time 2-D DWT applications with demanding speed and throughput requirements, parallelism is explored. The single-pipelined intermediate and overlapped architectures are extended to 2-, 3-, and 4-parallel architectures to achieve speedup factors of 2, 3, and 4, respectively. To further demonstrate the effectiveness of the approach, single and parallel VLSI architectures for the 2-D inverse discrete wavelet transform (2-D IDWT) are developed. Furthermore, 2-D DWT memory architectures, which have been overlooked in the literature, are also developed. Finally, to show that the architectural models developed for the 2-D DWT are simple to control, the control algorithm for the 4-parallel architecture based on the first scan method is developed. To validate the architectures developed in this work, five architectures are implemented and simulated on an Altera FPGA. In compliance with the terms of the Copyright Act 1987 and the IP Policy of the university, the copyright of this thesis has been reassigned by the author to the legal entity of the university, Institute of Technology PETRONAS Sdn Bhd. Due acknowledgement shall always be made of the use of any material contained in, or derived from, this thesis
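    As a point of reference for the lifting-based architectures described above, the sketch below computes one level of the separable 2-D 5/3 DWT in software by a row pass followed by a column pass, producing the LL, HL, LH, and HH subbands. It assumes even image dimensions and symmetric borders, and it models only the arithmetic, not the overlapped/non-overlapped scan methods or line buffers of the thesis.

import numpy as np

def dwt53_1d(x):
    # Reversible 5/3 lifting along the last axis (even length, symmetric borders).
    s, d = x[..., 0::2].copy(), x[..., 1::2].copy()
    d -= (s + np.concatenate([s[..., 1:], s[..., -1:]], axis=-1)) >> 1      # predict
    s += (np.concatenate([d[..., :1], d[..., :-1]], axis=-1) + d + 2) >> 2  # update
    return s, d

def dwt53_2d(image):
    # One decomposition level by the separable row-column approach:
    # a horizontal (row) pass followed by a vertical (column) pass.
    img = np.asarray(image, dtype=np.int64)
    lo, hi = dwt53_1d(img)                   # horizontal pass -> L and H bands
    ll, lh = dwt53_1d(lo.swapaxes(-1, -2))   # vertical pass on the L band
    hl, hh = dwt53_1d(hi.swapaxes(-1, -2))   # vertical pass on the H band
    return (ll.swapaxes(-1, -2), hl.swapaxes(-1, -2),
            lh.swapaxes(-1, -2), hh.swapaxes(-1, -2))

    Calling dwt53_2d on an even-sized integer image returns four quarter-size subbands; applying it again to the LL subband yields further decomposition levels.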

    Guest Editorial: Special Issue On Multirate Systems, Filter Banks, Wavelets, And Applications

    The last decade has seen a tremendous amount of activity and emergence of applications in the areas of filter banks and wavelets. These topics are of such wide interest that there have been papers in many different journals, conferences, and workshops in diverse disciplines. However, many aspects of the theory, design, and application of filter banks and wavelets are of great interest to the circuits and systems community as well. Our editorial team felt that this was a perfect time to put together a special issue with state-of-the-art papers on these popular topics. All of the papers have been peer-reviewed according to the usual practice of this TRANSACTIONS. Almost in parallel, there is also a similar special issue (April 1998) by the IEEE TRANSACTIONS ON SIGNAL PROCESSING, with a slightly greater emphasis on applications. Many well-known authors have contributed articles to these special issues and we expect these to serve as valuable references for a long time to come

    Development Of Efficient Multi-Level Discrete Wavelet Transform Hardware Architecture For Image Compression

    Focusing on the intensive computations involved in the discrete wavelet transform (DWT), the design of efficient hardware architectures for a fast computation of the transform has become imperative, especially for real-time applications
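    The sketch below illustrates the multi-level structure referred to above, using PyWavelets as a software reference model: each level re-decomposes the LL subband of the previous one. The specific wavelet ('bior2.2', the 5/3 filter pair), the three levels, and the random test image are assumptions for illustration only, not details from the thesis.

import numpy as np
import pywt  # PyWavelets, used here only as a software reference model

# A multi-level 2-D DWT repeatedly re-decomposes the low-frequency (LL) subband,
# which is where most of the image energy concentrates for compression.
image = np.random.randint(0, 256, size=(256, 256)).astype(np.float64)
coeffs = pywt.wavedec2(image, wavelet='bior2.2', level=3, mode='periodization')

ll3 = coeffs[0]                                  # coarsest approximation (LL3)
for i, (ch, cv, cd) in enumerate(coeffs[1:], start=1):
    # horizontal, vertical and diagonal detail subbands, coarsest level first
    print(f"level {4 - i}: detail subbands of shape {ch.shape}")
print("LL3 shape:", ll3.shape)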

    High-Speed Pipeline VLSI Architectures for Discrete Wavelet Transforms

    The discrete wavelet transform (DWT) has been widely used in many fields, such as image compression, speech analysis, and pattern recognition, because of its capability of decomposing a signal at multiple resolution levels. Due to the intensive computations involved in this transform, the design of efficient VLSI architectures for its fast computation has become essential, especially for real-time applications and those requiring processing of high-speed data. The objective of this thesis is to develop a scheme for the design of hardware-resource-efficient, high-speed pipeline architectures for the computation of the DWT. The goal of high speed is achieved by maximizing the operating frequency and minimizing the number of clock cycles required for the DWT computation, with little or no overhead in hardware resources. In this thesis, an attempt is made to reach this goal by enhancing the inter-stage and intra-stage parallelism through a systematic exploitation of the characteristics inherent in discrete wavelet transforms. In order to enhance the inter-stage parallelism, a study is undertaken to determine the number of pipeline stages required for the DWT computation so as to synchronize their operations and utilize their hardware resources efficiently. This is achieved by optimally distributing the computational load associated with the various resolution levels over an optimum number of pipeline stages. The study determines that the optimum choices are two pipeline stages for the 1-D DWT computation, with the first stage performing the task of the first resolution level and the second stage that of all the remaining resolution levels, and three pipeline stages for the 2-D DWT computation, with the first and second stages performing the tasks of the first and second resolution levels and the third stage that of the remaining resolution levels. The enhancement of the intra-stage parallelism is based on two main ideas. The first idea, which stems from the fact that at each successive resolution level the input data are decimated by a factor of two along each dimension, is to decompose the filtering operation into subtasks that can be performed in parallel by operating on even- and odd-numbered samples along each dimension of the data. It is shown that each subtask, which is essentially a set of multiply-accumulate operations, can be performed by a MAC-cell network consisting of a two-dimensional array of bit-wise adders. The second idea for enhancing the intra-stage parallelism is to maximally extend the bit-wise addition operations of this network horizontally, through a suitable arrangement of bit-wise adders, so as to minimize the delay of its critical path. In order to validate the proposed scheme, the design and implementation of two specific examples of pipeline architectures for the 1-D and 2-D DWT computations are considered. The simulation results show that the pipeline architectures designed using the proposed scheme are able to operate at high clock frequencies, and that their performance, in terms of processing speed and area-time product, is superior to that of architectures designed using other schemes and utilizing a similar or larger amount of hardware resources. Finally, the two pipeline architectures designed using the proposed scheme are implemented on an FPGA. The test results of the FPGA implementations validate the feasibility and effectiveness of the proposed scheme for designing DWT pipeline architectures
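    The even/odd decomposition of the filtering into independent multiply-accumulate subtasks can be illustrated in software as a polyphase evaluation of a decimated FIR filter; each branch below corresponds to one subtask that a MAC-cell network could compute in parallel. This is a behavioural sketch with an arbitrary test filter, not the bit-level adder arrays of the proposed architectures.

import numpy as np

def decimated_fir_even_odd(x, h):
    # Polyphase (even/odd) evaluation of y[n] = sum_k h[k] * x[2n - k].
    he, ho = h[0::2], h[1::2]          # even / odd filter phases
    xe, xo = x[0::2], x[1::2]          # even / odd input samples
    ye = np.convolve(xe, he)           # subtask 1: even-phase MACs
    yo = np.convolve(xo, ho)           # subtask 2: odd-phase MACs
    n = min(len(ye), len(yo) + 1)
    y = ye[:n].copy()
    y[1:n] += yo[:n - 1]               # odd branch is delayed by one output sample
    return y

# Check against the direct (non-polyphase) decimated convolution
rng = np.random.default_rng(0)
x, h = rng.normal(size=64), rng.normal(size=9)
direct = np.convolve(x, h)[::2]
poly = decimated_fir_even_odd(x, h)
m = min(len(direct), len(poly))
assert np.allclose(direct[:m], poly[:m])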

    Exploiting parallelism within multidimensional multirate digital signal processing systems

    The intense requirements for high processing rates of multidimensional Digital Signal Processing systems in practical applications justify application-specific integrated circuit (ASIC) designs and parallel processing implementations. In this dissertation, we propose novel theories, methodologies, and architectures for designing high-performance VLSI implementations of general multidimensional multirate Digital Signal Processing systems by exploiting the parallelism within those applications. To systematically exploit the parallelism within multidimensional multirate DSP algorithms, we develop novel transformations including (1) nonlinear I/O data space transforms, (2) intercalation transforms, and (3) multidimensional multirate unfolding transforms. These transformations are applied to the algorithms, leading to systematic methodologies for high-performance architectural design. With these design methodologies, we develop several architectures with parallel and distributed processing features for implementing multidimensional multirate applications. Experimental results show that these architectures are much more efficient in terms of execution time and/or hardware cost than existing hardware implementations
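    As a loose illustration of the data-level parallelism such systems expose, the sketch below splits the input of a 2-D decimator into disjoint polyphase cosets that independent processing elements could handle concurrently; it does not implement the nonlinear I/O data space, intercalation, or unfolding transforms of the dissertation, and the decimation matrix and toy input are assumptions.

import numpy as np

# Downsampling by an integer matrix M keeps only the samples on the lattice {M n};
# the input splits into |det M| disjoint cosets (polyphase components) that can be
# filtered concurrently. Here M = diag(2, 2), i.e. separable decimation by two.
x = np.arange(64, dtype=float).reshape(8, 8)

cosets = {(p, q): x[p::2, q::2] for p in range(2) for q in range(2)}
assert sum(c.size for c in cosets.values()) == x.size   # cosets partition the data
retained = cosets[(0, 0)]                                # samples kept by M = diag(2, 2)
print(retained.shape)                                    # (4, 4)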

    Algorithms and architectures for the multirate additive synthesis of musical tones

    In classical Additive Synthesis (AS), the output signal is the sum of a large number of independently controllable sinusoidal partials. The advantages of AS for music synthesis are well known, as is its high computational cost. This thesis is concerned with the computational optimisation of AS by multirate DSP techniques. In note-based music synthesis, the expected bounds of the frequency trajectory of each partial in a finite-lifecycle tone determine critical, time-invariant, partial-specific sample rates which are lower than the conventional rate (in excess of 40 kHz), resulting in computational savings. Scheduling and interpolation (to suppress quantisation noise) for many sample rates are required, leading to the concept of Multirate Additive Synthesis (MAS), where these overheads are minimised by synthesis filterbanks which quantise the set of available sample rates. Alternative AS optimisations are also appraised. It is shown that a hierarchical interpretation of the QMF filterbank preserves AS generality and permits efficient context-specific adaptation of computation to required note dynamics. Practical QMF implementation and the modifications necessary for MAS are discussed. QMF transition widths can be logically excluded from the MAS paradigm, at a cost; therefore a novel filterbank is evaluated where transition widths are physically excluded. Benchmarking of a hypothetical orchestral synthesis application provides a tentative quantitative analysis of the performance improvement of MAS over AS. The mapping of MAS into VLSI is opened by a review of sine computation techniques. Then the functional specification and high-level design of a conceptual MAS Coprocessor (MASC) is developed, which functions with high autonomy in a loosely coupled master-slave configuration with a host CPU that executes the filterbanks in software. Standard hardware optimisation techniques such as pipelining are used, based upon the principle of an application-specific memory hierarchy which maximises MASC throughput
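    A minimal software sketch of the underlying idea follows: each partial is synthesized at its own reduced, frequency-dependent sample rate and interpolated up to the output rate before summation. The output rate, guard factor, and linear interpolation are illustrative assumptions and stand in for the filterbank-based scheduling and interpolation of the MAS scheme.

import numpy as np

FS_OUT = 44_100                 # assumed output sample rate (Hz)
DURATION = 0.5                  # seconds
t_out = np.arange(int(FS_OUT * DURATION)) / FS_OUT

def render_partial(freq_hz, amp, guard=4.0):
    # Synthesize one sinusoidal partial at a reduced, partial-specific rate
    # (an assumed guard factor above the Nyquist minimum), then upsample by
    # crude linear interpolation to the output rate.
    fs_low = min(FS_OUT, max(2.0 * guard * freq_hz, 1000.0))
    t_low = np.arange(int(fs_low * DURATION)) / fs_low
    y_low = amp * np.sin(2 * np.pi * freq_hz * t_low)
    return np.interp(t_out, t_low, y_low)

# Sum a handful of harmonically related partials with decaying amplitudes
partials = [(220.0 * k, 1.0 / k) for k in range(1, 9)]
tone = sum(render_partial(f, a) for f, a in partials)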