1,644 research outputs found

    Time and frequency domain algorithms for speech coding

    The promise of digital hardware economies (due to recent advances in VLSI technology) has focussed much attention on more complex and sophisticated speech coding algorithms which offer improved quality at relatively low bit rates. This thesis describes the results (obtained from computer simulations) of research into various efficient (time and frequency domain) speech encoders operating at a transmission bit rate of 16 Kbps. In the time domain, Adaptive Differential Pulse Code Modulation (ADPCM) systems employing both forward and backward adaptive prediction were examined. A number of algorithms were proposed and evaluated, including several variants of the Stochastic Approximation Predictor (SAP). A Backward Block Adaptive (BBA) predictor was also developed and found to outperform the conventional stochastic methods, even though its complexity in terms of signal processing requirements is lower. A simplified Adaptive Predictive Coder (APC) employing a single-tap pitch predictor, considered next, provided a slight improvement in performance over ADPCM, but with rather greater complexity. The ultimate test of any speech coding system is the perceptual performance of the received speech. Recent research has indicated that this may be enhanced by suitable control of the noise spectrum according to the theory of auditory masking. Various noise shaping ADPCM configurations were examined, and it was demonstrated that a proposed pre-/post-filtering arrangement, which exploits the predictor-quantizer interaction to advantage, leads to the best subjective performance in both forward and backward prediction systems. Adaptive quantization is instrumental to the performance of ADPCM systems. Both the forward adaptive quantizer (AQF) and the backward one-word memory adaptation (AQJ) were examined.
In addition, a novel method of decreasing quantization noise in ADPCM-AQJ coders, which involves applying a correction to the decoded speech samples, provided reduced output noise across the spectrum, with considerable high frequency noise suppression. More powerful (and inevitably more complex) frequency domain speech coders such as the Adaptive Transform Coder (ATC) and the Sub-band Coder (SBC) offer good quality speech at 16 Kbps. To reduce complexity and coding delay, whilst retaining the advantage of sub-band coding, a novel transform based split-band coder (TSBC) was developed and found to compare closely in performance with the SBC. To prevent the heavy side information requirement associated with a large number of bands in split-band coding schemes from impairing coding accuracy, without forgoing the efficiency provided by adaptive bit allocation, a method employing AQJs to code the sub-band signals together with vector quantization of the bit allocation patterns was also proposed. Finally, 'pipeline' methods of bit allocation and step size estimation (using the Fast Fourier Transform (FFT) on the input signal) were examined. Such methods, although less accurate, are nevertheless useful in limiting the coding delay associated with SBC schemes employing Quadrature Mirror Filters (QMF).
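As a rough illustration of the backward one-word memory adaptation (AQJ) mentioned in this abstract, the sketch below pairs a 2-bit quantizer with a fixed first-order predictor. It is not the thesis's coder: the predictor coefficient, the step multipliers, the step limits, and all function names are assumptions chosen for illustration. The point it shows is that the step size is updated from the previously transmitted code word alone, so the decoder can track it without side information.

```python
import math

# All constants below are illustrative assumptions, not values from the thesis.
MULTIPLIERS = (0.8, 1.6)        # step multiplier for the inner / outer magnitude level
STEP_MIN, STEP_MAX = 1e-3, 1.0  # limits that keep the step size bounded
PRED_COEF = 0.85                # fixed first-order predictor coefficient


def quantize(e, step):
    """2-bit midrise quantizer: returns (sign, level) with level in {0, 1}."""
    sign = -1 if e < 0 else 1
    level = 0 if abs(e) < step else 1
    return sign, level


def dequantize(sign, level, step):
    """Reconstruction levels are +/-0.5*step and +/-1.5*step."""
    return sign * (level + 0.5) * step


def adapt(step, level):
    """One-word memory adaptation: the new step depends only on the last code word."""
    return min(STEP_MAX, max(STEP_MIN, step * MULTIPLIERS[level]))


def encode(samples, step0=0.05):
    codes, step, recon = [], step0, 0.0
    for x in samples:
        pred = PRED_COEF * recon              # predict from the reconstructed past
        sign, level = quantize(x - pred, step)
        codes.append((sign, level))
        recon = pred + dequantize(sign, level, step)
        step = adapt(step, level)
    return codes


def decode(codes, step0=0.05):
    out, step, recon = [], step0, 0.0
    for sign, level in codes:
        recon = PRED_COEF * recon + dequantize(sign, level, step)
        out.append(recon)
        step = adapt(step, level)             # same update as the encoder
    return out


# Synthetic test tone: 200 samples of a sine wave with a 40-sample period.
samples = [0.5 * math.sin(2 * math.pi * n / 40) for n in range(200)]
decoded = decode(encode(samples))
power = sum(x * x for x in samples) / len(samples)
noise = sum((x - y) ** 2 for x, y in zip(samples, decoded)) / len(samples)
snr_db = 10 * math.log10(power / noise)
```

At an assumed 8 kHz sampling rate, two bits per sample corresponds to the 16 Kbps rate quoted above. Because `decode` repeats the encoder's step update exactly, only the code words need to be transmitted.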

    WAVELET-DCT BASED IMAGE CODER FOR VIDEO CODING APPLICATIONS

    This project concerns the implementation of a Wavelet-DCT intra-frame coder for video coding applications. Wavelet-DCT is a novel algorithm that uses the Forward Discrete Wavelet Transform (DWT) to compute the DCT. The algorithm has been shown to give better compression performance for difference images than the conventional DCT. This is possible because the algorithm allows insignificant DWT coefficients to be discarded while computing the DCT, a step more commonly known as thresholding the DWT coefficients. In video coder applications, Wavelet-DCT is capable of achieving greater compression. This project is a feasibility study of the performance of Wavelet-DCT in video coder applications. A SIMULINK model of a conventional intra-frame coder is developed and tested, with very significant data bit reduction achieved. The conventional DCT block is then replaced with a Wavelet-DCT block. In the study, one experiment is conducted on a difference image with the conventional intra-frame coder, and another on the same difference image with the Wavelet-DCT based intra-frame coder. The thresholding algorithm is used to remove some of the insignificant DWT coefficients from the difference image. The main objective is to achieve a better compression capability for difference images within video coding applications. The project's experimental results support the claim that implementing Wavelet-DCT in an intra-frame coder within a video coding application can improve the system's performance, giving a greater compression ratio at the same Mean Squared Error.

    Low bit rate digital speech signal processing systems

    Imperial Users only

    Orthogonal transforms and their application to image coding

    Imperial Users only

    Contributions in image and video coding

    Advisor: Max Henrique Machado Costa. Thesis (doctorate), Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de Computação.
    Abstract: The image and video coding community has been working on advances that go beyond traditional image and video coding techniques. This work is a set of contributions to various topics that have received increasing attention from researchers in the community, namely scalable coding, low-complexity coding for portable devices, multiview video coding and run-time adaptive coding. The first contribution studies the performance of three fast block-based 3-D transforms in a low complexity video codec, named the Fast Embedded Video Codec (FEVC). New implementation methods and scanning orders are proposed for the transforms. The 3-D coefficients are encoded bit-plane by bit-plane by entropy coders, producing a fully embedded output bitstream. The whole implementation uses 16-bit integer arithmetic. Only additions and bit shifts are necessary, which lowers computational complexity. Even with these constraints, reasonable rate versus distortion performance can be achieved, and the encoding time is significantly shorter (by a factor of around 160) than that of the H.264/AVC standard. The second contribution is the optimization of a recently proposed approach to multiview video coding in videoconferencing and other similar unicast-like applications. The target scenario in this approach is providing realistic 3-D video with free viewpoint at good compression rates. To achieve this objective, weights are computed for each view and mapped into quantization parameters. In this work, the previously proposed ad-hoc mapping between weights and quantization parameters is shown to be quasi-optimum for a Gaussian source, and an optimum mapping is derived for typical video sources. The third contribution explores several strategies for adaptive scanning of transform coefficients in the JPEG XR standard. The original global adaptive scanning order applied in JPEG XR is compared with the localized and hybrid scanning methods proposed in this work. These new orders require no changes to the other coding and decoding stages or to the bitstream definition. The fourth and last contribution proposes a hierarchical signal-dependent block-based transform. Hierarchical transforms usually exploit the residual cross-level information at the entropy coding step, but not at the transform step. The transform proposed in this work is an energy compaction technique that also exploits these cross-resolution-level structural similarities. The core idea of the technique is to include in the hierarchical transform a number of adaptive basis functions derived from the lower resolution of the signal. A full image codec is developed in order to measure the performance of the new transform, and the results obtained are discussed in this work.
    Degree: Doctorate, Telecommunications and Telematics; Doctor of Electrical Engineering.

    Implementation of BMA based motion estimation hardware accelerator in HDL

    Motion Estimation in MPEG (Moving Picture Experts Group) video is a temporal prediction technique. The basic principle of motion estimation is that in most cases, consecutive video frames will be similar except for changes induced by objects moving within the frames. Motion Estimation performs a comprehensive 2-dimensional spatial search for each luminance macroblock (16x16 pixel block). MPEG does not define how this search should be performed; this is a detail that the system designer can choose to implement in one of many possible ways. It is well known that a full, exhaustive search over a wide 2-dimensional area yields the best matching results in most cases, but this performance comes at an extreme computational cost to the encoder. Some lower cost encoders might choose to limit the pixel search range, or use other techniques, usually at some cost to the video quality, which gives rise to a trade-off. Such algorithms used in image processing are generally computationally expensive. FPGAs are capable of running graphics algorithms at speeds comparable to dedicated graphics chips. At the same time, they are configurable through hardware description languages, e.g. Verilog and VHDL. The work presented focuses entirely upon a Hardware Accelerator capable of performing Motion Estimation based upon the Block Matching Algorithm. The SAD based Full Search Motion Estimation, coded in Verilog HDL, relies upon a 32x32 pixel search area to find the best match for a single 16x16 macroblock. Keywords: Motion Estimation, MPEG, macroblock, FPGA, SAD, Verilog, VHDL
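The full-search block matching described in this abstract can be sketched in software as follows. This is a plain Python illustration rather than the Verilog design: the function names, the synthetic pixel data, and the convention of reporting offsets relative to the top-left corner of the search window are all assumptions. It tests every placement of a 16x16 macroblock inside a 32x32 search area and keeps the one with the lowest sum of absolute differences (SAD).

```python
import random


def sad(block, window, dy, dx):
    """Sum of absolute differences between block and the patch of window at (dy, dx)."""
    n = len(block)
    return sum(
        abs(block[r][c] - window[dy + r][dx + c])
        for r in range(n)
        for c in range(n)
    )


def full_search(block, window):
    """Try every placement of block inside window; return (best_sad, dy, dx)."""
    n = len(block)
    span = len(window) - n + 1  # 17 candidate offsets per axis for 16x16 in 32x32
    best = None
    for dy in range(span):
        for dx in range(span):
            cost = sad(block, window, dy, dx)
            if best is None or cost < best[0]:
                best = (cost, dy, dx)
    return best


# Synthetic data (an assumption for illustration): a random 32x32 search area,
# with the macroblock cut out of it at row 5, column 9.
random.seed(0)
window = [[random.randrange(256) for _ in range(32)] for _ in range(32)]
block = [row[9:25] for row in window[5:21]]
result = full_search(block, window)  # (best SAD, row offset, column offset)
```

With these sizes the search evaluates 17 x 17 = 289 candidate positions of 256 pixel differences each, which is the regular, repetitive workload that makes the method a natural target for hardware acceleration.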

    DCT Implementation on GPU

    There has been great progress in the field of graphics processors. Since the speed of conventional CPUs is no longer rising, designers are turning to multi-core, parallel processors. Because of their strength in parallel processing, GPUs are becoming more and more attractive for many applications. With the increasing demand for GPUs, there is a great need to develop operating systems that use the GPU to its full capacity. GPUs offer a very efficient environment for many image processing applications. This thesis explores the processing power of GPUs for digital image compression using the discrete cosine transform (DCT).

    Dynamically Reconfigurable Architectures and Systems for Time-varying Image Constraints (DRASTIC) for Image and Video Compression

    In the current era of booming information, image and video consumption is ubiquitous. The associated image and video coding operations require significant computing resources, both on small-scale computing systems and over larger network systems. In different scenarios, power, bitrate and image quality can impose significant time-varying constraints. For example, mobile devices (e.g., phones, tablets, laptops, UAVs) come with significant constraints on energy and power. Similarly, computer networks provide time-varying bandwidth that can depend on signal strength (e.g., wireless networks) or network traffic conditions. Alternatively, users can impose different constraints on image quality based on their interests. Traditional image and video coding systems have focused on rate-distortion optimization. More recently, distortion measures (e.g., PSNR) are being replaced by more sophisticated image quality metrics. However, these systems are based on fixed hardware configurations that provide limited options over power consumption. The use of dynamic partial reconfiguration with Field Programmable Gate Arrays (FPGAs) provides an opportunity to effectively control dynamic power consumption by jointly considering software-hardware configurations. This dissertation extends traditional rate-distortion optimization to rate-quality-power/energy optimization and demonstrates a wide variety of applications in both image and video compression. In each application, a family of Pareto-optimal configurations is developed that allows fine control in the rate-quality-power/energy optimization space. The term Dynamically Reconfigurable Architectures and Systems for Time-varying Image Constraints (DRASTIC) is used to describe the derived systems.
DRASTIC covers both software-only and software-hardware configurations to achieve fine optimization over a set of general modes that include: (i) maximum image quality, (ii) minimum dynamic power/energy, (iii) minimum bitrate, and (iv) a typical mode over a set of opposing constraints to guarantee satisfactory performance. In joint software-hardware configurations, DRASTIC provides an effective approach for dynamic power optimization. For software configurations, DRASTIC provides an effective method for energy consumption optimization by controlling processing times. The dissertation provides several applications. First, stochastic methods are given for computing quantization tables that are optimal in the rate-quality space and demonstrated on standard JPEG compression. Second, a DRASTIC implementation of the DCT is used to demonstrate the effectiveness of the approach on motion JPEG. Third, a reconfigurable deblocking filter system is investigated for use in current H.264/AVC systems. Fourth, the dissertation develops DRASTIC for all 35 intra-prediction modes as well as intra-encoding for the emerging High Efficiency Video Coding (HEVC) standard.

    Side information exploitation, quality control and low complexity implementation for distributed video coding

    Distributed video coding (DVC) is a new video coding methodology that shifts the highly complex motion search components from the encoder to the decoder. Such a video coder has a great advantage in encoding speed and is still able to achieve rate-distortion performance similar to that of conventional coding solutions. Applications include wireless video sensor networks, mobile video cameras and wireless video surveillance. Although much progress has been made in DVC over the past ten years, there is still a gap in RD performance between conventional video coding solutions and DVC. The latest development of DVC is still far from standardization and practical use. The key problems remain in areas such as accurate and efficient side information generation and refinement, quality control between Wyner-Ziv frames and key frames, correlation noise modelling and decoder complexity. In this context, this thesis proposes solutions to improve the state-of-the-art side information refinement schemes, enable consistent quality control over decoded frames during the coding process and implement a highly efficient DVC codec. This thesis investigates the impact of reference frames on side information generation and reveals that reference frames have the potential to be better side information than the extensively used interpolated frames. Based on this investigation, we also propose a motion range prediction (MRP) method to exploit reference frames and precisely guide the statistical motion learning process. Extensive simulation results show that choosing reference frames as SI performs competitively with, and sometimes even better than, interpolated frames. Furthermore, the proposed MRP method is shown to significantly reduce the decoding complexity without degrading RD performance.
To minimize block artifacts and achieve consistent improvement in both the subjective and objective quality of side information, we propose a novel side information synthesis framework working at pixel granularity. We synthesize the SI at pixel level to minimize block artifacts and adaptively change the correlation noise model according to the new SI. Furthermore, we have fully implemented a state-of-the-art DVC decoder with the proposed framework using serial and parallel processing technologies to identify bottlenecks and areas where the decoding complexity can be reduced further, which is another major challenge for future practical DVC system deployments. The performance is evaluated based on the latest transform domain DVC codec and compared with different standard codecs. Extensive experimental results show substantial and consistent rate-distortion gains over standard video codecs and significant speedup over the serial implementation. In order to bring the state-of-the-art DVC one step closer to practical use, we address the problem of distortion variation introduced by typical rate control algorithms, especially in a variable bit rate environment. Simulation results show that the proposed quality control algorithm is capable of meeting a user-defined target distortion while maintaining a rather small variation for sequences with slow motion, and performs similarly to fixed quantization for fast motion sequences, at the cost of some RD performance. Finally, we propose the first implementation of a distributed video encoder on a Texas Instruments TMS320DM6437 digital signal processor. The WZ encoder is efficiently implemented using rate-adaptive low-density parity-check accumulate (LDPCA) codes, exploiting the hardware features and optimization techniques to improve the overall performance. Implementation results show that the WZ encoder is able to encode at 134M instruction cycles per QCIF frame on a TMS320DM6437 DSP running at 700MHz.
This makes the encoder 29 times faster than the non-optimized encoder implementation. We also implemented a highly efficient DVC decoder using both serial and parallel technology based on a PC-HPC (high-performance cluster) architecture, where the encoder runs on a general-purpose PC and the decoder runs on a multicore HPC. The experimental results show that the parallelized decoder can achieve about a 10 times speedup under various bit-rates and GOP sizes compared to the serial implementation, and significant RD gains with regard to the state-of-the-art DISCOVER codec.
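To put the WZ encoder figures quoted in this abstract into more familiar units, the cycle count can be converted to time per frame. This is a back-of-the-envelope reading, assuming one instruction cycle per clock cycle at 700 MHz and that the 29 times speedup refers to encoding time on the same processor; the abstract does not state either.

```latex
\[
t_{\text{frame}} = \frac{134 \times 10^{6}\ \text{cycles}}{700 \times 10^{6}\ \text{cycles/s}} \approx 0.191\ \text{s},
\qquad
\frac{1}{t_{\text{frame}}} \approx 5.2\ \text{QCIF frames/s},
\]
\[
t_{\text{frame}}^{\text{non-optimized}} \approx 29 \times 0.191\ \text{s} \approx 5.6\ \text{s}.
\]
```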