35 research outputs found

    MIJ2K: Enhanced video transmission based on conditional replenishment of JPEG2000 tiles with motion compensation

    A video compressed as a sequence of JPEG2000 images can achieve the scalability, flexibility, and accessibility that are lacking in current predictive motion-compensated video coding standards. However, streaming JPEG2000-based sequences consumes considerably more bandwidth. To address this problem, this paper describes a new patent-pending method called MIJ2K. MIJ2K reduces the inter-frame redundancy present in common JPEG2000 sequences (also called MJP2). We apply a real-time motion detection system to perform conditional tile replenishment: only the tiles that change in each JPEG2000 frame are transmitted. This significantly reduces the bit rate needed to transmit JPEG2000 video sequences while also improving their quality. The MIJ2K technique can be used both to improve JPEG2000-based real-time video streaming services and as a new codec for video storage. MIJ2K relies on a fast motion compensation technique designed specifically for real-time video streaming. This paper describes and evaluates the proposed method for real-time tile change detection, as well as the overall MIJ2K architecture. We compare MIJ2K against other intra-frame codecs, namely standard Motion JPEG2000, Motion JPEG, and the latest H.264-Intra, in terms of compression ratio and video quality, measured by standard peak signal-to-noise ratio, structural similarity, and visual quality metrics. This work was supported in part by Projects CICYT TIN2008–06742-C02–02/TSI, CICYT TEC2008–06732-C02–02/TEC, SINPROB, CAM MADRINET S-0505/TIC/0255 and DPS2008–07029-C02–02.
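
    The conditional tile replenishment idea described above can be illustrated with a minimal sketch. It is not the MIJ2K detector: the abstract does not specify the change-detection rule, so the mean-absolute-difference test, the `tile_size` and `threshold` parameters, and all function names below are assumptions chosen for illustration only.

```python
def split_into_tiles(frame, tile_size):
    """Yield ((tile_row, tile_col), tile) for each square tile of a 2D frame."""
    height, width = len(frame), len(frame[0])
    for top in range(0, height, tile_size):
        for left in range(0, width, tile_size):
            tile = [row[left:left + tile_size] for row in frame[top:top + tile_size]]
            yield (top // tile_size, left // tile_size), tile


def mean_abs_diff(tile_a, tile_b):
    """Mean absolute pixel difference between two equally sized tiles."""
    total = 0
    count = 0
    for row_a, row_b in zip(tile_a, tile_b):
        for a, b in zip(row_a, row_b):
            total += abs(a - b)
            count += 1
    return total / count


def changed_tiles(reference, current, tile_size=2, threshold=4.0):
    """Return the indices of tiles that differ noticeably from the reference.

    Only these tiles would be re-encoded and transmitted; the receiver keeps
    its previously decoded copy of every other tile. The threshold rule is a
    hypothetical stand-in for the paper's motion detection system.
    """
    ref_tiles = dict(split_into_tiles(reference, tile_size))
    return [
        index
        for index, tile in split_into_tiles(current, tile_size)
        if mean_abs_diff(tile, ref_tiles[index]) > threshold
    ]


# Toy 4x4 grayscale frames: the top-left tile has slight noise,
# and only the bottom-right 2x2 tile really changes.
reference_frame = [
    [10, 10, 20, 20],
    [10, 10, 20, 20],
    [30, 30, 40, 40],
    [30, 30, 40, 40],
]
current_frame = [
    [10, 11, 20, 20],
    [10, 10, 20, 20],
    [30, 30, 90, 90],
    [30, 30, 90, 90],
]

print(changed_tiles(reference_frame, current_frame))  # [(1, 1)]
```

    In this toy case one tile out of four is resent, which is where the bit-rate saving would come from; the threshold trades bandwidth against the risk of leaving a stale tile on screen.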

    Motion compensation with minimal residue dispersion matching criteria

    Dissertation (master's), Universidade de Brasília, Faculdade de Tecnologia, 2016. With the ever-growing demand for video services, video compression techniques have become a technology of central importance for communication systems. Industry standards for video coding have emerged, allowing integration between these services and the most diverse devices. Almost all of these standards adopt a hybrid coding model that combines differential and transform coding methods, with block-based motion compensation (BMC) at the core of the prediction step. The BMC method has become the single most important technique for exploiting the strong temporal redundancy typical of most video sequences. In fact, much of the improvement in video coding efficiency over the past two decades can be attributed to incremental refinements to the BMC technique. In this work, we propose another such refinement. A key issue in the BMC framework is motion estimation (ME), i.e., the selection of appropriate motion vectors (MVs). Coding standards tend to strictly regulate the coding syntax and decoding processes for MVs and residual information, but the ME algorithm itself is left to the discretion of the codec designers. However, although virtually any MV selection criterion allows for correct decoding, judicious MV selection is critical to overall codec performance and gives the encoder a competitive edge in the market. Most ME algorithms rely on minimizing a cost function over the candidate prediction blocks for a given target block, usually the sum of absolute differences (SAD) or the sum of squared differences (SSD). Minimizing either of these cost functions selects the prediction that results in the smallest residual, each in a different but well-defined sense. In this work, we show that the prediction of minimal residue dispersion is frequently more efficient than the usual prediction of minimal residue size. As a proof of concept, we propose the double matching criterion algorithm (DMCA), a simple two-pass algorithm that exploits both of these MV selection criteria in turn. Dispersion-minimizing and size-minimizing predictions are carried out independently. The encoder then compares these predictions in terms of rate-distortion performance and outputs only the more efficient one. For the dispersion-minimizing pass of the DMCA, we also propose the total absolute deviation from the mean (TADM) as the measure of residue dispersion to be minimized in ME. The usual SAD is used as the ME cost function in the size-minimizing pass. The DMCA with SAD/TADM was implemented in a modified version of the JM reference software encoder for the widely popular H.264/AVC coding standard. Full compliance with the standard was maintained, so that no modifications on the decoder side were necessary. Results show significant improvements over the unmodified H.264/AVC encoder.
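
    The difference between the two matching criteria can be sketched on a toy one-dimensional block. The abstract names the TADM but does not give its formula, so the definition below (the sum of absolute deviations of the residual from its own mean) is a literal reading of the name and should be treated as an assumption; the function names and sample values are hypothetical, and the rate-distortion comparison that the DMCA performs between the two passes is not modelled.

```python
def residual(target, candidate):
    """Element-wise prediction residual: target minus candidate."""
    return [t - c for t, c in zip(target, candidate)]


def sad(target, candidate):
    """Sum of absolute differences: measures the size of the residual."""
    return sum(abs(r) for r in residual(target, candidate))


def tadm(target, candidate):
    """Total absolute deviation from the mean: measures residual dispersion.

    Assumed definition: sum over samples of |r - mean(r)|.
    """
    res = residual(target, candidate)
    mean = sum(res) / len(res)
    return sum(abs(r - mean) for r in res)


def best_candidate(target, candidates, cost):
    """Index of the candidate block that minimizes the given cost function."""
    return min(range(len(candidates)), key=lambda i: cost(target, candidates[i]))


target = [100, 102, 104, 106]
candidates = [
    [101, 100, 105, 104],  # small but uneven residual: [-1, 2, -1, 2]
    [90, 92, 94, 96],      # large but perfectly flat residual: [10, 10, 10, 10]
]

print(best_candidate(target, candidates, sad))   # 0
print(best_candidate(target, candidates, tadm))  # 1
```

    The two criteria disagree here: SAD prefers the candidate with the smaller residual, while the dispersion measure prefers the flat one, whose residual would reduce to a single DC coefficient under a block transform. Which of the two is cheaper to code is exactly what the encoder's rate-distortion comparison would have to decide.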

    Video compression based on wavelet transform

    This diploma thesis examines current possibilities for using the wavelet transform in video signal compression. The first part of the work covers the theory needed for the practical implementation: it describes the video signal and its properties, the wavelet transform, and compression methods. The second part describes the selected compression method, the SPIHT (Set Partitioning In Hierarchical Trees) algorithm, which is intended for the compression of still image data. The algorithm is modified for video signal compression, where temporal redundancy is characteristic. Because the algorithm works in both the spatial and the temporal domain, it is called 3D SPIHT. It is implemented in the MATLAB programming environment, which provides sophisticated support for the wavelet transform (Wavelet Toolbox). For simple and intuitive control of the encoder, an application with a graphical user interface (GUI) was developed. The user can change the encoder parameters and observe the resulting changes in the displayed frame previews and measured graphs. Four test image sequences are available, containing different scenes with different characteristic features. The final part of the work tests the proposed coding scheme on the different test sequences and with different encoder settings. The measured values are displayed graphically and evaluated.

    Optimising Faster R-CNN training to enable video camera compression for assisted and automated driving systems

    Advanced driving assistance systems based on only one camera or one RADAR are evolving into the current assisted and automated driving functions delivering SAE Level 2 and above capabilities. A suite of environmental perception sensors is required to achieve safe and reliable planning and navigation in future vehicles equipped with these capabilities. The sensor suite, based on several cameras, LiDARs, RADARs and ultrasonic sensors, needs to provide sufficient (and redundant, depending on the level of driving automation) spatial and temporal coverage of the environment around the vehicle. However, the amount of data produced by the sensor suite can easily exceed a few tens of Gb/s, with a single ‘average’ automotive camera producing more than 3 Gb/s. It is therefore important to consider leveraging traditional video compression techniques, as well as to investigate novel ones, to reduce the amount of video camera data to be transmitted to the vehicle processing unit(s). In this paper, we demonstrate that lossy compression schemes with high compression ratios (up to 1:1,000) can be applied safely to the camera video data stream when machine learning based object detection is used to consume the sensor data. We show that transfer learning can be used to re-train a deep neural network with H.264 and H.265 compliant compressed data, and that it allows the network performance to be optimised based on the compression level of the generated sensor data. Moreover, this form of transfer learning improves the neural network performance when evaluating uncompressed data, increasing its robustness to real world variations of the data.
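
    To give a sense of scale for the figures quoted above, the following back-of-the-envelope sketch computes a raw camera bit rate and the rate left after 1:1,000 compression. The camera parameters (3840x2160 pixels, 16 bits per pixel, 30 frames per second) are hypothetical values picked only to land in the "more than 3 Gb/s" range; they are not taken from the paper.

```python
def raw_bit_rate(width, height, bits_per_pixel, frames_per_second):
    """Uncompressed video bit rate in bits per second."""
    return width * height * bits_per_pixel * frames_per_second


def compressed_bit_rate(raw_rate, compression_ratio):
    """Bit rate after compression at a ratio of 1:compression_ratio."""
    return raw_rate / compression_ratio


# Hypothetical camera: 3840x2160 pixels, 16 bits per pixel, 30 frames/s.
raw = raw_bit_rate(3840, 2160, 16, 30)
compressed = compressed_bit_rate(raw, 1000)

print(f"raw:        {raw / 1e9:.2f} Gb/s")         # raw:        3.98 Gb/s
print(f"compressed: {compressed / 1e6:.2f} Mb/s")  # compressed: 3.98 Mb/s
```

    Under these assumed parameters a single camera drops from roughly 4 Gb/s to roughly 4 Mb/s, which is why compression ratios of this order matter for in-vehicle data links.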

    Serviços Web para o processamento e gestão de conteúdo A/V em ambientes profissionais de TV

    Internship carried out at MOG Solutions, S.A. Integrated master's thesis. Electrical and Computer Engineering - Major in Telecommunications. Faculdade de Engenharia. Universidade do Porto. 200

    Selected topics on distributed video coding

    Distributed Video Coding (DVC) is a new paradigm for video compression based on the information theoretical results of Slepian and Wolf (SW), and Wyner and Ziv (WZ). While conventional coding has a rigid complexity allocation as most of the complex tasks are performed at the encoder side, DVC enables a flexible complexity allocation between the encoder and the decoder. The most novel and interesting case is low complexity encoding and complex decoding, which is the opposite of conventional coding. While the latter is suitable for applications where the cost of the decoder is more critical than the encoder's one, DVC opens the door for a new range of applications where low complexity encoding is required and the decoder's complexity is not critical. This is interesting with the deployment of small and battery-powered multimedia mobile devices all around in our daily life. Further, since DVC operates as a reversed-complexity scheme when compared to conventional coding, DVC also enables the interesting scenario of low complexity encoding and decoding between two ends by transcoding between DVC and conventional coding. More specifically, low complexity encoding is possible by DVC at one end. Then, the resulting stream is decoded and conventionally re-encoded to enable low complexity decoding at the other end. Multiview video is attractive for a wide range of applications such as free viewpoint television, which is a system that allows viewing the scene from a viewpoint chosen by the viewer. Moreover, multiview can be beneficial for monitoring purposes in video surveillance. The increased use of multiview video systems is mainly due to the improvements in video technology and the reduced cost of cameras. While a multiview conventional codec will try to exploit the correlation among the different cameras at the encoder side, DVC allows for separate encoding of correlated video sources. 
Therefore, DVC requires no communication between the cameras in a multiview scenario. This is an advantage since communication is time consuming (i.e. more delay) and requires complex networking. Another appealing feature of DVC is the fact that it is based on a statistical framework. Moreover, DVC behaves as a natural joint source-channel coding solution. This results in an improved error resilience performance when compared to conventional coding. Further, DVC-based scalable codecs do not require a deterministic knowledge of the lower layers. In other words, the enhancement layers are completely independent from the base layer codec. This is called the codec-independent scalability feature, which offers a high flexibility in the way the various layers are distributed in a network. This thesis addresses the following topics: First, the theoretical foundations of DVC as well as the practical DVC scheme used in this research are presented. The potential applications for DVC are also outlined. DVC-based schemes use conventional coding to compress parts of the data, while the rest is compressed in a distributed fashion. Thus, different conventional codecs are studied in this research as they are compared in terms of compression efficiency for a rich set of sequences. This includes fine tuning the compression parameters such that the best performance is achieved for each codec. Further, DVC tools for improved Side Information (SI) and Error Concealment (EC) are introduced for monoview DVC using a partially decoded frame. The improved SI results in a significant gain in reconstruction quality for video with high activity and motion. This is done by re-estimating the erroneous motion vectors using the partially decoded frame to improve the SI quality. The latter is then used to enhance the reconstruction of the finally decoded frame. 
Further, the introduced spatio-temporal EC improves the quality of decoded video in the case of erroneously received packets, outperforming both spatial and temporal EC. Moreover, it also outperforms error-concealed conventional coding in different modes. Then, multiview DVC is studied in terms of SI generation, which differentiates it from the monoview case. More specifically, different multiview prediction techniques for SI generation are described and compared in terms of prediction quality, complexity and compression efficiency. Further, a technique for iterative multiview SI is introduced, where the final SI is used in an enhanced reconstruction process. The iterative SI outperforms the other SI generation techniques, especially for high motion video content. Finally, techniques for fusing temporal and inter-view side information are introduced as well, which improve the performance of multiview DVC over monoview coding. DVC is also used to enable scalability for image and video coding. Since DVC is based on a statistical framework, the base and enhancement layers are completely independent, which is an interesting property called codec-independent scalability. Moreover, the introduced DVC scalable schemes show good robustness to errors, as the quality of decoded video degrades gradually as the error rate increases. Conventional coding, on the other hand, exhibits a cliff effect: its performance drops dramatically beyond a certain error rate. Further, the issue of privacy protection is addressed for DVC by transform domain scrambling, which is used to alter regions of interest in video such that the scene is still understood while privacy is preserved. The proposed scrambling techniques are shown to provide a good level of security without impairing the performance of the DVC scheme when compared to the one without scrambling. This is particularly attractive for video surveillance scenarios, which are among the most promising applications for DVC. 
Finally, a practical DVC demonstrator built during this research is described, where the main requirements as well as the observed limitations are presented. Furthermore, it is set up to be as close as possible to a complete real application scenario. This shows that it is actually possible to implement a complete end-to-end practical DVC system relying only on realistic assumptions. Even though DVC is, for the moment, inferior to state-of-the-art conventional coding in terms of compression efficiency, its strengths reside in its good error resilience properties and the codec-independent scalability feature. Therefore, DVC offers promising possibilities for video compression when transmission over error-prone environments is required, as it significantly outperforms conventional coding in this case.

    Metadata-driven multimedia access

    With the growing ubiquity and mobility of multimedia-enabled devices, universal multimedia access (UMA) is emerging as one of the important components for the next generation of multimedia applications. The basic concept underlying UMA is universal or seamless access to multimedia content, by automatic selection and adaptation of content based on the user's environment. UMA promises an integration of these different perspectives into a new class of content adaptive applications that could allow users to access multimedia content without concern for specific coding formats, terminal capabilities, or network conditions. We discuss methods that support UMA and the tools provided by MPEG-7 to achieve this. We also discuss the inclusion of metadata in JPEG 2000 encoded images. We present these methods in the typical order that they may be used in an actual application. Therefore, we first discuss the (personalized) selection of desired content from all available content, followed by the organization of related variations of a single piece of content. Then, we discuss segmentation and summarization of audio video (AV) content, and finally, transcoding of AV content

    Scalable video compression with optimized visual performance and random accessibility

    This thesis is concerned with maximizing the coding efficiency, random accessibility and visual performance of scalable compressed video. The unifying theme behind this work is the use of finely embedded localized coding structures, which govern the extent to which these goals may be jointly achieved. The first part focuses on scalable volumetric image compression. We investigate 3D transform and coding techniques which exploit inter-slice statistical redundancies without compromising slice accessibility. Our study shows that the motion-compensated temporal discrete wavelet transform (MC-TDWT) practically achieves an upper bound to the compression efficiency of slice transforms. From a video coding perspective, we find that most of the coding gain is attributed to offsetting the learning penalty in adaptive arithmetic coding through 3D code-block extension, rather than inter-frame context modelling. The second aspect of this thesis examines random accessibility. Accessibility refers to the ease with which a region of interest is accessed (subband samples needed for reconstruction are retrieved) from a compressed video bitstream, subject to spatiotemporal code-block constraints. We investigate the fundamental implications of motion compensation for random access efficiency and the compression performance of scalable interactive video. We demonstrate that inclusion of motion compensation operators within the lifting steps of a temporal subband transform incurs a random access penalty which depends on the characteristics of the motion field. The final aspect of this thesis aims to minimize the perceptual impact of visible distortion in scalable reconstructed video. We present a visual optimization strategy based on distortion scaling which raises the distortion-length slope of perceptually significant samples. 
This alters the codestream embedding order during post-compression rate-distortion optimization, thus allowing visually sensitive sites to be encoded with higher fidelity at a given bit-rate. For visual sensitivity analysis, we propose a contrast perception model that incorporates an adaptive masking slope. This versatile feature provides a context which models perceptual significance. It enables scene structures that otherwise suffer significant degradation to be preserved at lower bit-rates. The novelty in our approach derives from a set of "perceptual mappings" which account for quantization noise shaping effects induced by motion-compensated temporal synthesis. The proposed technique reduces wavelet compression artefacts and improves the perceptual quality of video