2,150 research outputs found

    Motion estimation and CABAC VLSI co-processors for real-time high-quality H.264/AVC video coding

    Real-time, high-quality video coding is attracting wide interest in the research and industrial communities for a variety of applications. H.264/AVC, a recent standard for high-performance video coding, can be successfully exploited in several scenarios, including digital video broadcasting, high-definition TV and DVD-based systems, which must sustain up to tens of Mbit/s. To that end, this paper proposes optimized architectures for the two most critical H.264/AVC tasks: motion estimation and context-adaptive binary arithmetic coding (CABAC). Post-synthesis results on sub-micron CMOS standard-cell technologies show that the proposed architectures can process 720 × 480 video sequences at 30 frames/s in real time and sustain more than 50 Mbit/s. The achieved circuit complexity and power consumption budgets make them suitable for integration in complex VLSI multimedia systems based either on an AHB bus-centric on-chip communication system or on novel Network-on-Chip (NoC) infrastructures for MPSoCs (Multi-Processor Systems on Chip).
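
    The motion-estimation task accelerated by such co-processors is, at its heart, block matching against a reference frame. A minimal Python sketch of full-search matching with a sum-of-absolute-differences (SAD) criterion follows; it is illustrative only (block size, search range and function names are assumptions), not the paper's VLSI architecture:

```python
import numpy as np

def full_search_sad(cur_block, ref_frame, bx, by, search_range=8):
    """Full-search block matching: return the motion vector minimizing SAD.

    cur_block: NxN block taken from the current frame at position (by, bx).
    ref_frame: reference (previous) frame as a 2D array.
    """
    n = cur_block.shape[0]
    best_mv, best_sad = (0, 0), float("inf")
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = by + dy, bx + dx
            if y < 0 or x < 0 or y + n > ref_frame.shape[0] or x + n > ref_frame.shape[1]:
                continue  # candidate window falls outside the reference frame
            cand = ref_frame[y:y + n, x:x + n]
            sad = int(np.abs(cur_block.astype(np.int32) - cand.astype(np.int32)).sum())
            if sad < best_sad:
                best_sad, best_mv = sad, (dx, dy)
    return best_mv, best_sad
```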

    Practical Full Resolution Learned Lossless Image Compression

    We propose the first practical learned lossless image compression system, L3C, and show that it outperforms the popular engineered codecs PNG, WebP and JPEG 2000. At the core of our method is a fully parallelizable hierarchical probabilistic model for adaptive entropy coding which is optimized end-to-end for the compression task. In contrast to recent autoregressive discrete probabilistic models such as PixelCNN, our method i) models the image distribution jointly with learned auxiliary representations instead of exclusively modeling the image distribution in RGB space, and ii) only requires three forward passes to predict all pixel probabilities instead of one per pixel. As a result, L3C obtains over two orders of magnitude speedup when sampling compared to the fastest PixelCNN variant (Multiscale-PixelCNN). Furthermore, we find that learning the auxiliary representation is crucial and significantly outperforms predefined auxiliary representations such as an RGB pyramid. Comment: Updated preprocessing and Table 1, see A.1 in supplementary. Code and models: https://github.com/fab-jul/L3C-PyTorch
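
    The connection between a learned probabilistic model and the size of the compressed file is the usual one for adaptive entropy coding: each symbol ideally costs about -log2 of its predicted probability. A hedged sketch of that accounting is shown below (array names and shapes are assumptions; a real system such as L3C feeds these probabilities to an adaptive arithmetic coder rather than just summing code lengths):

```python
import numpy as np

def ideal_code_length_bits(pixels, probs):
    """Ideal code length, in bits, of `pixels` under a predicted distribution.

    pixels: (H, W) array of integer symbols in [0, 255].
    probs:  (H, W, 256) array of predicted per-pixel probabilities,
            e.g. produced by a hierarchical model in a few forward passes.
    """
    h, w = pixels.shape
    p = probs[np.arange(h)[:, None], np.arange(w)[None, :], pixels]
    return float(-np.log2(np.clip(p, 1e-12, 1.0)).sum())

# Example: a uniform model costs exactly 8 bits per pixel.
img = np.random.randint(0, 256, size=(4, 4))
uniform = np.full((4, 4, 256), 1 / 256)
print(ideal_code_length_bits(img, uniform) / img.size)  # -> 8.0
```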

    Towards visualization and searching: a dual-purpose video coding approach

    In modern video applications, the role of the decoded video goes far beyond filling a screen for visualization. To offer powerful video-enabled applications, it is increasingly critical not only to visualize the decoded video but also to provide efficient searching capabilities for similar content. Video surveillance and personal communication applications are key examples of these dual visualization and searching requirements. However, current video coding solutions are strongly biased towards the visualization needs. In this context, the goal of this work is to propose a dual-purpose video coding solution targeting both visualization and searching needs by adopting a hybrid coding framework where the usual pixel-based coding approach is combined with a novel feature-based coding approach. In this dual-purpose video coding solution, some frames are coded using a set of keypoint matches, which not only allow decoding for visualization, but also provide the decoder with valuable feature-related information, extracted at the encoder from the original frames, that is instrumental for efficient searching. The proposed solution is based on a flexible joint Lagrangian optimization framework where pixel-based and feature-based processing are combined to find the most appropriate trade-off between visualization and searching performance. Extensive experimental results for the assessment of the proposed dual-purpose video coding solution under meaningful test conditions are presented. The results show the flexibility of the proposed coding solution in achieving different optimization trade-offs, notably competitive performance with respect to the state-of-the-art HEVC standard in terms of both visualization and searching performance.
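
    The joint Lagrangian optimization mentioned above can be pictured with standard rate-distortion machinery. The sketch below chooses, per frame, between pixel-based and feature-based coding by minimizing a combined cost; the weighting of the searching term, the numbers and all names are illustrative assumptions rather than the thesis's exact formulation:

```python
def lagrangian_cost(distortion_vis, distortion_search, rate_bits, lam, gamma):
    """Combined cost J = (D_vis + gamma * D_search) + lam * R (illustrative)."""
    return (distortion_vis + gamma * distortion_search) + lam * rate_bits

def pick_coding_mode(candidates, lam=0.1, gamma=1.0):
    """candidates: list of (mode_name, D_vis, D_search, rate_bits) tuples.

    Returns the mode name with the lowest combined Lagrangian cost.
    """
    return min(candidates,
               key=lambda c: lagrangian_cost(c[1], c[2], c[3], lam, gamma))[0]

# Example: feature-based coding hurts visualization slightly but helps searching.
modes = [("pixel-based",   2.0, 8.0, 1200),
         ("feature-based", 3.5, 1.0, 400)]
print(pick_coding_mode(modes))  # -> "feature-based" for these illustrative numbers
```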

    Current video compression algorithms: Comparisons, optimizations, and improvements

    Compression algorithms have evolved significantly in recent years. Audio, still images, and video can be compressed substantially by taking advantage of the natural redundancies that occur within them. Video compression in particular has made significant advances. MPEG-1 and MPEG-2, two of the major video compression standards, allowed video to be compressed at very low bit rates compared to the original video. The compression ratio for video that is perceptually lossless (losses can't be visually perceived) can be as high as 40 or 50 to 1 for certain videos, and videos with a small degradation in quality can be compressed at 100 to 1 or more. Although the MPEG standards provided low-bit-rate compression, even higher-quality compression is required for efficient transmission over limited-bandwidth networks, wireless networks, and broadcast media. Significant gains have been made over the current MPEG-2 standard in a newly developed standard called Advanced Video Coding, also known as H.264 and MPEG-4 Part 10. (Abstract shortened by UMI.)
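
    As a rough sanity check on ratios like "40 or 50 to 1", one can compare raw and compressed bit rates directly. The small sketch below assumes SD resolution, 4:2:0 chroma subsampling and 8-bit samples (12 bits/pixel on average):

```python
def raw_bitrate_bps(width, height, fps, bits_per_pixel=12):
    """Raw bit rate of uncompressed 4:2:0 video (12 bits/pixel on average)."""
    return width * height * fps * bits_per_pixel

def compression_ratio(width, height, fps, compressed_bps):
    """Ratio of raw to compressed bit rate for the same video."""
    return raw_bitrate_bps(width, height, fps) / compressed_bps

# Example: 720x480 at 30 fps compressed to ~3 Mbit/s (a typical SD rate).
raw = raw_bitrate_bps(720, 480, 30)                      # ~124.4 Mbit/s raw
print(raw / 1e6, compression_ratio(720, 480, 30, 3e6))   # ratio ~41:1
```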

    Context Adaptive Space Quantization for Image Coding

    One of the most widely used lossy image compression formats in the world is JPEG. It operates by splitting an image into blocks, applying a frequency transform to each block, quantizing each transformed block, and entropy coding the resulting quantized values. Its popularity is a result of its simple technical description and its ability to achieve very good compression ratios. Given the enormous popularity of JPEG, much work has been done over the past two decades on quantizer optimization. Early works focused on optimizing the table of quantizer step sizes in JPEG in an adaptive manner, yielding significant gains in rate-distortion (RD) performance compared to using the sample quantization table provided in the JPEG standard; this type of quantizer optimization is referred to as hard decision quantization (HDQ). To address the problem of local adaptivity in JPEG, optimization of the quantized values themselves was then considered in addition to optimizing the quantization table; this type of optimization is referred to as soft decision quantization (SDQ). But even SDQ methods cannot fully overcome the problem of local adaptivity in JPEG; nonetheless, the results from SDQ optimization suggest that overcoming this problem offers potentially significant gains in RD performance. In this thesis, we propose a new kind of quantization called context adaptive space quantization (CASQ), where each block in an image is quantized and subsequently entropy coded conditioned on a quantization context. This facilitates the use of different quantizers for different parts of an image. If an image contains regions with varying amounts of detail, for example, then those regions may be assigned different quantization contexts so that they may be quantized differently; quantizer optimization may then be performed over local regions of an image rather than over the entire image at once. In this sense, CASQ provides the ability to overcome the problem of local adaptivity. We also formulate and solve the problem of quantizer optimization in both the HDQ and SDQ settings using CASQ. We then propose a practical image coder based on JPEG using CASQ optimization. With our coder, significant gains in RD performance are observed. On average, in the case of Huffman coding under HDQ, we see a gain of 1.78 dB PSNR compared to baseline JPEG and 0.23 dB compared to the state-of-the-art method. In the worst cases, our image coder performs no worse than state-of-the-art methods. Furthermore, the additional computational complexity of our image coder compared to baseline JPEG encoding without optimization is very small, on the order of 150 ms for a 2048 × 2560 image in the HDQ case and 4000 ms in the SDQ case.
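
    To make the CASQ idea concrete in a simplified form, the sketch below assigns each 8 x 8 block a quantization context from its AC energy and then quantizes it with a context-specific step size. The context rule, the scaling factors and all names are illustrative assumptions, not the thesis's optimized quantizers:

```python
import numpy as np
from scipy.fftpack import dct

# Illustrative context-dependent scaling of a single base step size.
CONTEXT_SCALE = {0: 0.5, 1: 1.0, 2: 2.0}   # smooth, medium, detailed regions

def dct2(block):
    """2-D orthonormal DCT of an 8x8 block."""
    return dct(dct(block, axis=0, norm="ortho"), axis=1, norm="ortho")

def block_context(coeffs):
    """Assign a quantization context from the block's AC energy (assumed rule)."""
    ac_energy = (coeffs ** 2).sum() - coeffs[0, 0] ** 2
    return 0 if ac_energy < 1e3 else (1 if ac_energy < 1e5 else 2)

def quantize_block(block, base_step=16.0):
    """Quantize one 8x8 block with a step size chosen by its context."""
    coeffs = dct2(block.astype(float))
    ctx = block_context(coeffs)
    step = base_step * CONTEXT_SCALE[ctx]
    return ctx, np.round(coeffs / step).astype(int)
```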

    Seminar on the MPEG-4 Standard: usage and implementation aspects

    One of the key technologies that enabled the great development of digital television is video compression. The video coding technology known as MPEG-2, developed in the early 1990s, became the DTV (Digital TV) transmission standard, both satellite and terrestrial, in almost every country in the world. Since then, microprocessor speed and the memory capacity of encoding and decoding hardware have improved significantly, making it possible to develop and implement innovative coding algorithms able to push well beyond the compression limits of the MPEG-2 standard. These innovations, which culminated in 2003 in the MPEG-4 AVC (Advanced Video Coding) standard, did not preserve backward compatibility with MPEG-2, and this initially limited their introduction into DTV transmission systems. In recent years, however, MPEG-4 AVC coding has spread rapidly: it has been adopted by the DVB project and, more recently, by ATSC, and it is the coding standard for IPTV. The aim of this two-day seminar is to present the MPEG-4 AVC coding standard, with particular attention to the implementation aspects of the video coding layer. Seminar date and venue: 2008-11-18, Sardegna Ricerche, Edificio 2, Località Piscinamanna, 09010 Pula (CA), Italy.

    Image and Video Coding/Transcoding: A Rate Distortion Approach

    Due to the lossy nature of image/video compression and the expensive bandwidth and computation resources in a multimedia system, one of the key design issues for image and video coding/transcoding is to optimize the trade-off among distortion, rate, and/or complexity. This thesis studies the application of rate-distortion (RD) optimization approaches to image and video coding/transcoding, both to explore the best RD performance of a video codec compatible with the newest video coding standard, H.264, and to design computationally efficient down-sampling algorithms with high visual fidelity in the discrete cosine transform (DCT) domain. RD optimization for video coding in this thesis considers two objectives: to achieve the best encoding efficiency in terms of minimizing the actual RD cost, and to maintain decoding compatibility with H.264. By the actual RD cost, we mean a cost based on the final reconstruction error and the entire coding rate. Specifically, an operational RD method is proposed based on a soft decision quantization (SDQ) mechanism, which has its roots in a fundamental RD-theoretic study of fixed-slope lossy data compression. Using SDQ instead of hard decision quantization, we establish a general framework in which motion prediction, quantization, and entropy coding in a hybrid video coding scheme such as H.264 are jointly designed to minimize the actual RD cost on a frame basis. The proposed framework can be used to optimize any hybrid video coding scheme, provided that specific algorithms are designed to match the coding syntax of the given standard codec, so as to maintain compatibility with the standard. Corresponding to the baseline profile syntax and the main profile syntax of H.264, respectively, we propose three RD algorithms: a graph-based algorithm for SDQ given motion prediction and quantization step sizes; an algorithm for residual coding optimization given motion prediction; and an iterative overall algorithm for jointly optimizing motion prediction, quantization, and entropy coding. These are embedded, one within the next, in the indicated order. Among the three algorithms, the SDQ design is the core, and it is developed for a given entropy coding method. Specifically, two SDQ algorithms have been developed, based on the context adaptive variable length coding (CAVLC) of the H.264 baseline profile and the context adaptive binary arithmetic coding (CABAC) of the H.264 main profile, respectively. Experimental results for the H.264 baseline codec optimization show that, for a set of typical test sequences, the proposed RD method for H.264 baseline coding achieves a better trade-off between rate and distortion, i.e., 12% rate reduction on average at the same distortion (ranging from 30 dB to 38 dB in PSNR) compared with the RD optimization method implemented in the H.264 baseline reference codec. Experimental results for optimizing H.264 main profile coding with CABAC show a 10% rate reduction over a main profile reference codec using CABAC, which also corresponds to a 20% rate reduction over the RD optimization method implemented in the H.264 baseline reference codec, leading to our claim of having developed the best codec in terms of RD performance while maintaining compatibility with H.264. By investigating the trade-off between distortion and complexity, we also propose a design framework for image/video transcoding with spatial resolution reduction, i.e., down-sampling compressed images/video with an arbitrary ratio in the DCT domain. First, we derive a set of DCT-domain down-sampling methods which can be represented by a linear transform with double-sided matrix multiplication (LTDS) in the DCT domain. Then, for a pre-selected pixel-domain down-sampling method, we formulate an optimization problem for finding an LTDS that approximates the given pixel-domain method and achieves the best trade-off between visual quality and computational complexity. The problem is solved by modeling an LTDS with a multi-layer perceptron network and training the network with a structural learning with forgetting algorithm. Finally, by selecting a pixel-domain reference method built on the popular Butterworth lowpass filtering and cubic B-spline interpolation, the proposed framework discovers an LTDS with better visual quality and lower computational complexity than state-of-the-art methods in the literature.
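
    The LTDS form referred to above is simply Y = A X B, a linear transform applied from both sides of the block. The sketch below shows a pixel-domain analogue (averaging-based 2x downsampling written as double-sided matrix multiplication); in the thesis the matrices act on DCT-domain data and are learned rather than fixed:

```python
import numpy as np

def downsample_by_2_ltds(x):
    """Down-sample an image by 2 in each dimension as Y = A @ X @ B.

    A averages adjacent row pairs and B averages adjacent column pairs,
    so the result is the 2x2 block mean of the input.
    """
    h, w = x.shape
    A = np.zeros((h // 2, h))
    B = np.zeros((w, w // 2))
    for i in range(h // 2):
        A[i, 2 * i] = A[i, 2 * i + 1] = 0.5      # average adjacent rows
    for j in range(w // 2):
        B[2 * j, j] = B[2 * j + 1, j] = 0.5      # average adjacent columns
    return A @ x @ B

img = np.arange(16, dtype=float).reshape(4, 4)
print(downsample_by_2_ltds(img))   # 2x2 block averages of the 4x4 input
```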

    Efficient compression of motion compensated residuals

    EThOS - Electronic Theses Online Service, United Kingdom.
    • 
