
    Video Traffic Characteristics of Modern Encoding Standards: H.264/AVC with SVC and MVC Extensions and H.265/HEVC

    Video encoding for multimedia services over communication networks has significantly advanced in recent years with the development of the highly efficient and flexible H.264/AVC video coding standard and its SVC extension. The emerging H.265/HEVC video coding standard as well as 3D video coding further advance video coding for multimedia communications. This paper first gives an overview of these new video coding standards and then examines their implications for multimedia communications by studying the traffic characteristics of long videos encoded with the new coding standards. We review video coding advances from MPEG-2 and MPEG-4 Part 2 to H.264/AVC and its SVC and MVC extensions as well as H.265/HEVC. For single-layer (nonscalable) video, we compare H.265/HEVC and H.264/AVC in terms of video traffic and statistical multiplexing characteristics. Our study is the first to examine the H.265/HEVC traffic variability for long videos. We also illustrate the video traffic characteristics and statistical multiplexing of scalable video encoded with the SVC extension of H.264/AVC as well as 3D video encoded with the MVC extension of H.264/AVC. View the article as published at https://www.hindawi.com/journals/tswj/2014/189481
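    The traffic and statistical multiplexing characteristics examined in this paper can be computed directly from an encoded frame-size trace. The Python sketch below is illustrative only: the toy trace, frame rate, and link capacity are invented rather than taken from the paper. It derives the mean bit rate, peak-to-mean ratio, and coefficient of variation of a stream, and contrasts peak-rate allocation with mean-rate statistical multiplexing on a shared link.

```python
import numpy as np

def traffic_stats(frame_sizes_bytes, fps=30):
    """Basic variability metrics for an encoded video trace (illustrative)."""
    sizes = np.asarray(frame_sizes_bytes, dtype=float)
    mean_rate = sizes.mean() * 8 * fps        # mean bit rate (bit/s)
    peak_rate = sizes.max() * 8 * fps         # peak bit rate (bit/s)
    cov = sizes.std() / sizes.mean()          # coefficient of variation of frame sizes
    return mean_rate, peak_rate, cov

def streams_supported(link_bps, mean_rate, peak_rate):
    """Compare peak-rate allocation with statistical multiplexing on the mean rate."""
    by_peak = int(link_bps // peak_rate)      # worst-case provisioning, no multiplexing gain
    by_mean = int(link_bps // mean_rate)      # optimistic bound exploiting aggregate smoothing
    return by_peak, by_mean

if __name__ == "__main__":
    # Hypothetical frame sizes in bytes (e.g., an I-P-P-P pattern); not real trace data.
    trace = [42000, 9000, 8500, 9200, 40000, 8800, 9100, 8700]
    mean_rate, peak_rate, cov = traffic_stats(trace)
    print(f"mean={mean_rate/1e6:.2f} Mbit/s  peak/mean={peak_rate/mean_rate:.2f}  CoV={cov:.2f}")
    print("streams on a 100 Mbit/s link (peak vs. mean):",
          streams_supported(100e6, mean_rate, peak_rate))
```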


    Contributions to the solution of the rate-distortion optimization problem in video coding

    In the last two decades, we have witnessed significant changes in the demands placed on video codecs. The diversity of services has significantly increased, high definition (HD) and beyond-HD resolutions have become a reality, the video traffic coming from mobile devices and tablets is increasing, video-on-demand services are now playing a prominent role, and so on. All of these advances have converged to demand more powerful standard video codecs, the more recent ones being H.264/Advanced Video Coding (H.264/AVC) and the latest High Efficiency Video Coding (HEVC), both generated by the Joint Collaborative Team on Video Coding (JCT-VC), a partnership of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG). These two standards (and many others, starting with ITU-T H.261) rely on a hybrid model known as the Differential Pulse Code Modulation (DPCM)/Discrete Cosine Transform (DCT) hybrid video coder, which involves a motion estimation and compensation phase followed by transformation and quantization stages and an entropy coder. Moreover, each of these main subsystems is made of a number of interdependent and parametric modules that can be adapted to the particular video content. The main problem arising from this approach is how to best choose the combination of the different parametrizations so as to achieve the most efficient coding of the current content. To address this problem, one of the proposed solutions (and the one adopted in both the H.264/AVC and the HEVC reference encoder implementations) is the process referred to as rate-distortion optimization, which chooses a parametrization of the encoder by minimizing a cost function that considers the trade-off between rate and distortion, weighted by a Lagrange multiplier (λ) that has been empirically obtained for both the H.264/AVC and the HEVC reference encoder implementations, aiming to provide a robust solution for a variety of video contents. In this PhD thesis, an exhaustive study of the influence of this Lagrangian parameter on different video sequences reveals that there are some common features, appearing frequently in video sequences, for which the adopted λ model (the reference model) becomes ineffective. Furthermore, we have found a notable margin of improvement in the coding efficiency of both coders when using a more suitable model for the Lagrangian parameter. The contributions of this thesis are thus the following: (i) to prove that the reference Lagrangian model becomes ineffective in certain common situations; and (ii) to propose generalized solutions that improve the robustness of the reference model, both for the H.264/AVC and the HEVC standards, obtaining important improvements in coding efficiency. Both proposals take into account changes in the nature of the video sequence over time, proposing models that adapt to the video content while minimizing the increase in computational complexity.
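    As a concrete illustration of the rate-distortion optimization process described in this abstract, the Python sketch below selects, among a few candidate parametrizations, the one minimizing the Lagrangian cost J = D + λR. The candidate modes, the SSD distortion, and the λ values are hypothetical placeholders, not the reference-encoder models studied in the thesis.

```python
import numpy as np

def lagrangian_cost(distortion, rate_bits, lam):
    """Rate-distortion cost J = D + lambda * R."""
    return distortion + lam * rate_bits

def select_mode(original, candidates, lam):
    """Pick the candidate (label, reconstruction, rate) with the lowest RD cost.

    `candidates` stands in for the outcomes of trying different encoder
    parametrizations; the values here are invented for illustration.
    """
    best = None
    for label, recon, rate in candidates:
        ssd = float(np.sum((original - recon) ** 2))   # distortion measured as SSD
        cost = lagrangian_cost(ssd, rate, lam)
        if best is None or cost < best[1]:
            best = (label, cost)
    return best

if __name__ == "__main__":
    block = np.full((8, 8), 100.0)
    candidates = [
        ("skip",  np.full((8, 8), 103.0),   2),   # cheap in rate but more distorted
        ("intra", np.full((8, 8), 100.5), 120),   # accurate but expensive in rate
    ]
    for lam in (1.0, 50.0):                        # a larger lambda favors lower rate
        print(lam, select_mode(block, candidates, lam))
```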

    Non-local Attention Optimized Deep Image Compression

    This paper proposes a novel Non-Local Attention Optimized Deep Image Compression (NLAIC) framework, built on top of the popular variational auto-encoder (VAE) structure. Our NLAIC framework embeds non-local operations in the encoders and decoders for both the image and the latent feature probability information (known as the hyperprior) to capture both local and global correlations, and applies an attention mechanism to generate masks that weight the features of the image and hyperprior, implicitly adapting the bit allocation across features according to their importance. Furthermore, both the hyperpriors and the spatial-channel neighbors of the latent features are used to improve entropy coding. The proposed model outperforms existing methods on the Kodak dataset, including learned (e.g., Balle2019, Balle2018) and conventional (e.g., BPG, JPEG2000, JPEG) image compression methods, for both the PSNR and MS-SSIM distortion metrics.
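    As a rough illustration of the attention-masking idea (not the exact NLAIC architecture; the layer widths and placement here are assumptions), the PyTorch sketch below builds a generic non-local block whose sigmoid output reweights the input features.

```python
import torch
import torch.nn as nn

class AttentionMask(nn.Module):
    """Generic non-local attention block producing a feature-reweighting mask."""
    def __init__(self, channels):
        super().__init__()
        self.theta = nn.Conv2d(channels, channels // 2, 1)
        self.phi = nn.Conv2d(channels, channels // 2, 1)
        self.g = nn.Conv2d(channels, channels, 1)
        self.mask = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.theta(x).flatten(2)                          # B x C/2 x HW
        k = self.phi(x).flatten(2)                            # B x C/2 x HW
        v = self.g(x).flatten(2)                              # B x C x HW
        attn = torch.softmax(q.transpose(1, 2) @ k, dim=-1)   # B x HW x HW pairwise weights
        nl = (v @ attn.transpose(1, 2)).view(b, c, h, w)      # non-local (global) response
        m = torch.sigmoid(self.mask(nl))                      # importance mask in [0, 1]
        return x * m                                          # reweighted features

if __name__ == "__main__":
    x = torch.randn(1, 64, 16, 16)
    print(AttentionMask(64)(x).shape)   # torch.Size([1, 64, 16, 16])
```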

    Towards visualization and searching: a dual-purpose video coding approach

    In modern video applications, the role of the decoded video is much more than filling a screen for visualization. To offer powerful video-enabled applications, it is increasingly critical not only to visualize the decoded video but also to provide efficient searching capabilities for similar content. Video surveillance and personal communication applications are key examples of these dual visualization and searching requirements. However, current video coding solutions are strongly biased towards the visualization needs. In this context, the goal of this work is to propose a dual-purpose video coding solution targeting both visualization and searching needs by adopting a hybrid coding framework where the usual pixel-based coding approach is combined with a novel feature-based coding approach. In this novel dual-purpose video coding solution, some frames are coded using a set of keypoint matches, which not only allow decoding for visualization but also provide the decoder with valuable feature-related information, extracted at the encoder from the original frames, that is instrumental for efficient searching. The proposed solution is based on a flexible joint Lagrangian optimization framework where pixel-based and feature-based processing are combined to find the most appropriate trade-off between visualization and searching performance. Extensive experimental results assessing the proposed dual-purpose video coding solution under meaningful test conditions are presented. The results show the flexibility of the proposed coding solution in achieving different optimization trade-offs, notably competitive performance with respect to the state-of-the-art HEVC standard in terms of both visualization and searching performance.
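    One way to picture the joint Lagrangian optimization described above is a per-frame cost that weighs visualization distortion, searching (feature-matching) distortion, and rate together. The Python sketch below is only a schematic reading of that idea with invented weights and measurements, not the thesis' actual cost formulation.

```python
def joint_cost(vis_distortion, search_distortion, rate_bits, lam, eta):
    """Schematic joint cost trading visualization quality, searching quality, and rate.

    `eta` weighs the searching distortion against the visualization distortion and
    `lam` weighs the rate; both are hypothetical knobs for this illustration.
    """
    return vis_distortion + eta * search_distortion + lam * rate_bits

def choose_coding_mode(frame_stats, lam, eta):
    """Pick pixel-based or feature-based coding for a frame by its joint cost."""
    best_mode, best_cost = None, float("inf")
    for mode, (d_vis, d_search, rate) in frame_stats.items():
        cost = joint_cost(d_vis, d_search, rate, lam, eta)
        if cost < best_cost:
            best_mode, best_cost = mode, cost
    return best_mode, best_cost

if __name__ == "__main__":
    # Invented per-frame measurements: (visualization distortion, searching distortion, rate bits).
    frame_stats = {
        "pixel-based":   (10.0, 40.0, 1500),   # better for visualization
        "feature-based": (25.0,  8.0, 1200),   # better for searching
    }
    for eta in (0.1, 2.0):   # increasing eta shifts the trade-off towards searching
        print(eta, choose_coding_mode(frame_stats, lam=0.01, eta=eta))
```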

    Resource management for power-constrained HEVC transcoding using reinforcement learning

    The advent of online video streaming applications and services, along with users' demand for high-quality content, requires High Efficiency Video Coding (HEVC), which provides higher video quality and more compression at the cost of increased complexity. On one hand, HEVC exposes a set of dynamically tunable parameters that provide trade-offs among Quality-of-Service (QoS), performance, and power consumption of multi-core servers in the video providers' data centers. On the other hand, resource management of modern multi-core servers is in charge of adapting system-level parameters, such as operating frequency and multithreading, to deal with concurrent applications and their requirements. Therefore, efficient multi-user HEVC streaming necessitates joint adaptation of application- and system-level parameters. Nonetheless, dealing with such a large and dynamic design space is challenging and difficult to address through conventional resource management strategies. Thus, in this work, we develop a multi-agent Reinforcement Learning framework to jointly adjust application- and system-level parameters at runtime to satisfy the QoS of multi-user HEVC streaming in power-constrained servers. In particular, the design space, composed of all design parameters, is split into smaller independent sub-spaces. Each design sub-space is assigned to a particular agent so that it can be explored faster, yet accurately. The benefits of our approach are revealed in terms of adaptability and quality (with up to 4x improvement in QoS compared to a static resource management scheme) and learning time (6x faster than an equivalent single-agent implementation). Finally, we show that the formulated power-capping techniques outperform hardware-based power capping with respect to quality.
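    To make the sub-space splitting concrete, the sketch below pairs two independent learners, one per design sub-space (a system-level frequency knob and an application-level threading knob), against a toy reward that favors QoS and penalizes exceeding a power cap. The agents are a bandit-style simplification of the paper's multi-agent Reinforcement Learning framework, and the actions, power model, and reward weights are all invented for illustration.

```python
import random

class SubspaceAgent:
    """Epsilon-greedy learner responsible for one design sub-space (simplified)."""
    def __init__(self, actions, alpha=0.05, eps=0.1):
        self.actions, self.alpha, self.eps = actions, alpha, eps
        self.value = {a: 0.0 for a in actions}    # running value estimate per action

    def act(self):
        if random.random() < self.eps:            # occasional exploration
            return random.choice(self.actions)
        return max(self.actions, key=self.value.get)

    def update(self, action, reward):
        self.value[action] += self.alpha * (reward - self.value[action])

def reward(freq_ghz, threads, power_cap_w=60.0):
    """Toy reward: QoS-like throughput term, penalized when estimated power exceeds the cap."""
    qos = freq_ghz * threads                      # pretend throughput scales with both knobs
    power = 10.0 * freq_ghz * threads             # crude, invented power model
    return qos - 0.5 * max(0.0, power - power_cap_w)

if __name__ == "__main__":
    freq_agent = SubspaceAgent([1.2, 2.0, 3.0])   # system-level knob: operating frequency (GHz)
    thread_agent = SubspaceAgent([1, 2, 4])       # application-level knob: encoder threads
    for _ in range(5000):
        f, t = freq_agent.act(), thread_agent.act()
        r = reward(f, t)
        freq_agent.update(f, r)                   # each agent learns only its own sub-space
        thread_agent.update(t, r)
    best = (max(freq_agent.actions, key=freq_agent.value.get),
            max(thread_agent.actions, key=thread_agent.value.get))
    print("learned (frequency GHz, threads):", best)   # toy run; result may vary
```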