1,644 research outputs found
Time and frequency domain algorithms for speech coding
The promise of digital hardware economies (due to recent advances in
VLSI technology), has focussed much attention on more complex and sophisticated
speech coding algorithms which offer improved quality at relatively
low bit rates.
This thesis describes the results (obtained from computer simulations)
of research into various efficient (time and frequency domain) speech
encoders operating at a transmission bit rate of 16 Kbps.
In the time domain, Adaptive Differential Pulse Code Modulation (ADPCM)
systems employing both forward and backward adaptive prediction were
examined. A number of algorithms were proposed and evaluated, including
several variants of the Stochastic Approximation Predictor (SAP). A
Backward Block Adaptive (BBA) predictor was also developed and found to
outperform the conventional stochastic methods, even though its complexity
in terms of signal processing requirements is lower. A simplified
Adaptive Predictive Coder (APC) employing a single tap pitch predictor
considered next provided a slight improvement in performance over ADPCM,
but with rather greater complexity.
The ultimate test of any speech coding system is the perceptual performance
of the received speech. Recent research has indicated that this
may be enhanced by suitable control of the noise spectrum according to
the theory of auditory masking. Various noise shaping ADPCM
configurations were examined, and it was demonstrated that a proposed
pre-/post-filtering arrangement which exploits advantageously the
predictor-quantizer interaction, leads to the best subjective
performance in both forward and backward prediction systems.
Adaptive quantization is instrumental to the performance of ADPCM systems.
Both the forward adaptive quantizer (AQF) and the backward oneword
memory adaptation (AQJ) were examined. In addition, a novel method
of decreasing quantization noise in ADPCM-AQJ coders, which involves the
application of correction to the decoded speech samples, provided
reduced output noise across the spectrum, with considerable high frequency
noise suppression.
More powerful (and inevitably more complex) frequency domain speech
coders such as the Adaptive Transform Coder (ATC) and the Sub-band Coder
(SBC) offer good quality speech at 16 Kbps. To reduce complexity and
coding delay, whilst retaining the advantage of sub-band coding, a novel
transform based split-band coder (TSBC) was developed and found to compare
closely in performance with the SBC.
To prevent the heavy side information requirement associated with a
large number of bands in split-band coding schemes from impairing coding
accuracy, without forgoing the efficiency provided by adaptive bit
allocation, a method employing AQJs to code the sub-band signals together
with vector quantization of the bit allocation patterns was also
proposed.
Finally, 'pipeline' methods of bit allocation and step size estimation
(using the Fast Fourier Transform (FFT) on the input signal) were examined.
Such methods, although less accurate, are nevertheless useful in
limiting coding delay associated with SRC schemes employing Quadrature
Mirror Filters (QMF)
WAVELET-DCT BASED IMAGE CODER FOR VIDEO CODING APPLICATIONS
This project is about the implementation ofWavelet-DCT intra-frame coder for video
coding applications. Wavelet-DCT is a novel algorithm that uses Forward Discrete
Wavelet Transform (DWT) to compute DCT. It is proved that the algorithm has better
compression performance for difference images compared to conventional DCT. This
is possible since the algorithm allows discarding insignificant DWT coefficients or
more popularly known thresholding the DWT coefficients while computing the DCT.
In video coder applications, wavelet-DCT is capable to achieve greater compression.
This project is a feasibility study on the performance ofWavelet-DCT in video coder
applications. ASIMULINK model for conventional intra-frame coder is developed
and tested, with very significant data bit reduction achieved. Then, the conventional
DCT block has been replaced with a Wavelet-DCT block. In the study, on one hand,
experiment is conducted on difference image for conventional intra-frame coder; on
the other, the same difference image with Wavelet-DCT based intra-frame coder. The
thresholding algorithm is used to remove some ofthe insignificant DWT coefficients
from the difference image. The main objective is to achieve a better compression
capability for difference image within video coding applications. The project's
experimental results supports our claim that implementation ofWavelet-DCT in intraframe
coder within a video coding application could improve the system's
performance with a greater compression ratio at the same Mean Squared Error
Low bit rate digital apeech signal processing systems
Imperial Users onl
Orthogonal transforms and their application to image coding
Imperial Users onl
Contributions in image and video coding
Orientador: Max Henrique Machado CostaTese (doutorado) - Universidade Estadual de Campinas, Faculdade de Engenharia ElĂ©trica e de ComputaçãoResumo: A comunidade de codificação de imagens e vĂdeo vem tambĂ©m trabalhando em inovações que vĂŁo alĂ©m das tradicionais tĂ©cnicas de codificação de imagens e vĂdeo. Este trabalho Ă© um conjunto de contribuições a vários tĂłpicos que tĂŞm recebido crescente interesse de pesquisadores na comunidade, nominalmente, codificação escalável, codificação de baixa complexidade para dispositivos mĂłveis, codificação de vĂdeo de mĂşltiplas vistas e codificação adaptativa em tempo real. A primeira contribuição estuda o desempenho de trĂŞs transformadas 3-D rápidas por blocos em um codificador de vĂdeo de baixa complexidade. O codificador recebeu o nome de Fast Embedded Video Codec (FEVC). Novos mĂ©todos de implementação e ordens de varredura sĂŁo propostos para as transformadas. Os coeficiente 3-D sĂŁo codificados por planos de bits pelos codificadores de entropia, produzindo um fluxo de bits (bitstream) de saĂda totalmente embutida. Todas as implementações sĂŁo feitas usando arquitetura com aritmĂ©tica inteira de 16 bits. Somente adições e deslocamentos de bits sĂŁo necessários, o que reduz a complexidade computacional. Mesmo com essas restrições, um bom desempenho em termos de taxa de bits versus distorção pĂ´de ser obtido e os tempos de codificação sĂŁo significativamente menores (em torno de 160 vezes) quando comparados ao padrĂŁo H.264/AVC. A segunda contribuição Ă© a otimização de uma recente abordagem proposta para codificação de vĂdeo de mĂşltiplas vistas em aplicações de video-conferĂŞncia e outras aplicações do tipo "unicast" similares. O cenário alvo nessa abordagem Ă© fornecer vĂdeo com percepção real em 3-D e ponto de vista livre a boas taxas de compressĂŁo. Para atingir tal objetivo, pesos sĂŁo atribuĂdos a cada vista e mapeados em parâmetros de quantização. Neste trabalho, o mapeamento ad-hoc anteriormente proposto entre pesos e parâmetros de quantização Ă© mostrado ser quase-Ăłtimo para uma fonte Gaussiana e um mapeamento Ăłtimo Ă© derivado para fonte tĂpicas de vĂdeo. A terceira contribuição explora várias estratĂ©gias para varredura adaptativa dos coeficientes da transformada no padrĂŁo JPEG XR. A ordem de varredura original, global e adaptativa do JPEG XR Ă© comparada com os mĂ©todos de varredura localizados e hĂbridos propostos neste trabalho. Essas novas ordens nĂŁo requerem mudanças nem nos outros estágios de codificação e decodificação, nem na definição da bitstream A quarta e Ăşltima contribuição propõe uma transformada por blocos dependente do sinal. As transformadas hierárquicas usualmente exploram a informação residual entre os nĂveis no estágio da codificação de entropia, mas nĂŁo no estágio da transformada. A transformada proposta neste trabalho Ă© uma tĂ©cnica de compactação de energia que tambĂ©m explora as similaridades estruturais entre os nĂveis de resolução. A idĂ©ia central da tĂ©cnica Ă© incluir na transformada hierárquica um nĂşmero de funções de base adaptativas derivadas da resolução menor do sinal. Um codificador de imagens completo foi desenvolvido para medir o desempenho da nova transformada e os resultados obtidos sĂŁo discutidos neste trabalhoAbstract: The image and video coding community has often been working on new advances that go beyond traditional image and video architectures. This work is a set of contributions to various topics that have received increasing attention from researchers in the community, namely, scalable coding, low-complexity coding for portable devices, multiview video coding and run-time adaptive coding. The first contribution studies the performance of three fast block-based 3-D transforms in a low complexity video codec. The codec has received the name Fast Embedded Video Codec (FEVC). New implementation methods and scanning orders are proposed for the transforms. The 3-D coefficients are encoded bit-plane by bit-plane by entropy coders, producing a fully embedded output bitstream. All implementation is performed using 16-bit integer arithmetic. Only additions and bit shifts are necessary, thus lowering computational complexity. Even with these constraints, reasonable rate versus distortion performance can be achieved and the encoding time is significantly smaller (around 160 times) when compared to the H.264/AVC standard. The second contribution is the optimization of a recent approach proposed for multiview video coding in videoconferencing applications or other similar unicast-like applications. The target scenario in this approach is providing realistic 3-D video with free viewpoint video at good compression rates. To achieve such an objective, weights are computed for each view and mapped into quantization parameters. In this work, the previously proposed ad-hoc mapping between weights and quantization parameters is shown to be quasi-optimum for a Gaussian source and an optimum mapping is derived for a typical video source. The third contribution exploits several strategies for adaptive scanning of transform coefficients in the JPEG XR standard. The original global adaptive scanning order applied in JPEG XR is compared with the localized and hybrid scanning methods proposed in this work. These new orders do not require changes in either the other coding and decoding stages or in the bitstream definition. The fourth and last contribution proposes an hierarchical signal dependent block-based transform. Hierarchical transforms usually exploit the residual cross-level information at the entropy coding step, but not at the transform step. The transform proposed in this work is an energy compaction technique that can also exploit these cross-resolution-level structural similarities. The core idea of the technique is to include in the hierarchical transform a number of adaptive basis functions derived from the lower resolution of the signal. A full image codec is developed in order to measure the performance of the new transform and the obtained results are discussed in this workDoutoradoTelecomunicações e TelemáticaDoutor em Engenharia ElĂ©tric
Implementation of BMA based motion estimation hardware accelerator in HDL
Motion Estimation in MPEG (Motion Pictures Experts Group) video is a temporal prediction technique. The basic principle of motion estimation is that in most cases, consecutive video frames will be similar except for changes induced by objects moving within the frames. Motion Estimation performs a comprehensive 2-dimensional spatial search for each luminance macroblock (16x16 pixel block). MPEG does not define how this search should be performed. This is a detail that the system designer can choose to implement in one of many possible ways. It is well known that a full, exhaustive search over a wide 2-dimensional area yields the best matching results in most cases, but this performance comes at an extreme computational cost to the encoder. Some lower cost encoders might choose to limit the pixel search range, or use other techniques usually at some cost to the video quality which gives rise to a trade-off; Such algorithms used in image processing are generally computationally expensive. FPGAs are capable of running graphics algorithms at the speed comparable to dedicated graphics chips. At the same time they are configurable through high-level programming languages, e.g. Verilog, VHDL. The work presented entirely focuses upon a Hardware Accelerator capable of performing Motion Estimation, based upon Block Matching Algorithm. The SAD based Full Search Motion Estimation coded using Verilog HDL, relies upon a 32x32 pixel search area to find the best match for single 16x16 macroblock; Keywords. Motion Estimation, MPEG, macroblock, FPGA, SAD, Verilog, VHDL
DCT Implementation on GPU
There has been a great progress in the field of graphics processors. Since, there is no rise in the speed of the normal CPU processors; Designers are coming up with multi-core, parallel processors. Because of their popularity in parallel processing, GPUs are becoming more and more attractive for many applications. With the increasing demand in utilizing GPUs, there is a great need to develop operating systems that handle the GPU to full capacity. GPUs offer a very efficient environment for many image processing applications. This thesis explores the processing power of GPUs for digital image compression using Discrete cosine transform
Dynamically Reconfigurable Architectures and Systems for Time-varying Image Constraints (DRASTIC) for Image and Video Compression
In the current information booming era, image and video consumption is ubiquitous. The associated image and video coding operations require significant computing resources for both small-scale computing systems as well as over larger network systems. For different scenarios, power, bitrate and image quality can impose significant time-varying constraints. For example, mobile devices (e.g., phones, tablets, laptops, UAVs) come with significant constraints on energy and power. Similarly, computer networks provide time-varying bandwidth that can depend on signal strength (e.g., wireless networks) or network traffic conditions. Alternatively, the users can impose different constraints on image quality based on their interests. Traditional image and video coding systems have focused on rate-distortion optimization. More recently, distortion measures (e.g., PSNR) are being replaced by more sophisticated image quality metrics. However, these systems are based on fixed hardware configurations that provide limited options over power consumption. The use of dynamic partial reconfiguration with Field Programmable Gate Arrays (FPGAs) provides an opportunity to effectively control dynamic power consumption by jointly considering software-hardware configurations. This dissertation extends traditional rate-distortion optimization to rate-quality-power/energy optimization and demonstrates a wide variety of applications in both image and video compression. In each application, a family of Pareto-optimal configurations are developed that allow fine control in the rate-quality-power/energy optimization space. The term Dynamically Reconfiguration Architecture Systems for Time-varying Image Constraints (DRASTIC) is used to describe the derived systems. DRASTIC covers both software-only as well as software-hardware configurations to achieve fine optimization over a set of general modes that include: (i) maximum image quality, (ii) minimum dynamic power/energy, (iii) minimum bitrate, and (iv) typical mode over a set of opposing constraints to guarantee satisfactory performance. In joint software-hardware configurations, DRASTIC provides an effective approach for dynamic power optimization. For software configurations, DRASTIC provides an effective method for energy consumption optimization by controlling processing times. The dissertation provides several applications. First, stochastic methods are given for computing quantization tables that are optimal in the rate-quality space and demonstrated on standard JPEG compression. Second, a DRASTIC implementation of the DCT is used to demonstrate the effectiveness of the approach on motion JPEG. Third, a reconfigurable deblocking filter system is investigated for use in the current H.264/AVC systems. Fourth, the dissertation develops DRASTIC for all 35 intra-prediction modes as well as intra-encoding for the emerging High Efficiency Video Coding standard (HEVC)
Side information exploitation, quality control and low complexity implementation for distributed video coding
Distributed video coding (DVC) is a new video coding methodology that shifts the highly complex motion search components from the encoder to the decoder, such a video coder would have a great advantage in encoding speed and it is still able to achieve similar rate-distortion performance as the conventional coding solutions. Applications include wireless video sensor networks, mobile video cameras and wireless video surveillance, etc. Although many progresses have been made in DVC over the past ten years, there is still a gap in RD performance between conventional video coding solutions and DVC. The latest development of DVC is still far from standardization and practical use. The key problems remain in the areas such as accurate and efficient side information generation and refinement, quality control between Wyner-Ziv frames and key frames, correlation noise modelling and decoder complexity, etc.
Under this context, this thesis proposes solutions to improve the state-of-the-art side information refinement schemes, enable consistent quality control over decoded frames during coding process and implement highly efficient DVC codec.
This thesis investigates the impact of reference frames on side information generation and reveals that reference frames have the potential to be better side information than the extensively used interpolated frames. Based on this investigation, we also propose a motion range prediction (MRP) method to exploit reference frames and precisely guide the statistical motion learning process. Extensive simulation results show that choosing reference frames as SI performs competitively, and sometimes even better than interpolated frames. Furthermore, the proposed MRP method is shown to significantly reduce the decoding complexity without degrading any RD performance.
To minimize the block artifacts and achieve consistent improvement in both subjective and objective quality of side information, we propose a novel side information synthesis framework working on pixel granularity. We synthesize the SI at pixel level to minimize the block artifacts and adaptively change the correlation noise model according to the new SI. Furthermore, we have fully implemented a state-of-the-art DVC decoder with the proposed framework using serial and parallel processing technologies to identify bottlenecks and areas to further reduce the decoding complexity, which is another major challenge for future practical DVC system deployments. The performance is evaluated based on the latest transform domain DVC codec and compared with different standard codecs. Extensive experimental results show substantial and consistent rate-distortion gains over standard video codecs and significant speedup over serial implementation.
In order to bring the state-of-the-art DVC one step closer to practical use, we address the problem of distortion variation introduced by typical rate control algorithms, especially in a variable bit rate environment. Simulation results show that the proposed quality control algorithm is capable to meet user defined target distortion and maintain a rather small variation for sequence with slow motion and performs similar to fixed quantization for fast motion sequence at the cost of some RD performance.
Finally, we propose the first implementation of a distributed video encoder on a Texas Instruments TMS320DM6437 digital signal processor. The WZ encoder is
efficiently implemented, using rate adaptive low-density-parity-check accumulative (LDPCA) codes, exploiting the hardware features and optimization techniques to improve the overall performance. Implementation results show that the WZ encoder is able to encode at 134M instruction cycles per QCIF frame on a TMS320DM6437 DSP running at 700MHz. This results in encoder speed 29 times faster than non-optimized encoder implementation. We also implemented a highly efficient DVC decoder using both serial and parallel technology based on a PC-HPC (high performance cluster) architecture, where the encoder is running in a general purpose PC and the decoder is running in a multicore HPC. The experimental results show that the parallelized decoder can achieve about 10 times speedup under various bit-rates and GOP sizes compared to the serial implementation and significant RD gains with regards to the state-of-the-art DISCOVER codec
- …