8 research outputs found

    Image and Video Coding Techniques for Ultra-low Latency

    Get PDF
    The next generation of wireless networks fosters the adoption of latency-critical applications such as XR, connected industry, or autonomous driving. This survey gathers implementation aspects of different image and video coding schemes and discusses their trade-offs. Standardized video coding technologies such as HEVC or VVC provide a high compression ratio, but their enormous complexity sets the scene for alternative approaches such as still-image, mezzanine, or texture compression in scenarios with tight resource or latency constraints. Regardless of the coding scheme, we found inter-device memory transfers and the lack of sub-frame coding to be limitations of current full-system and software-programmable implementations.
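    The benefit of sub-frame coding mentioned above can be illustrated with a back-of-the-envelope latency estimate. The sketch below is not from the survey; the stage count and frame rate are hypothetical, and it only models the pipelining effect of splitting a frame into independently processed units (e.g. slices).

```python
def pipeline_latency_ms(frame_ms, units_per_frame, stages=3):
    """Latency of a capture -> encode -> transmit pipeline when each stage
    must receive a full coding unit before the next stage can start.
    With one unit per frame, nothing overlaps; with several units
    (sub-frame coding), later units are pipelined behind the first."""
    unit_ms = frame_ms / units_per_frame
    # The first unit traverses all stages; the remaining units finish
    # one unit-duration apart behind it.
    return stages * unit_ms + (units_per_frame - 1) * unit_ms

frame_ms = 1000 / 60                                  # 60 fps capture
full_frame = pipeline_latency_ms(frame_ms, 1)         # ~50.0 ms
sliced = pipeline_latency_ms(frame_ms, 8)             # ~20.8 ms
```

Under these toy assumptions, splitting each frame into 8 slices cuts the end-to-end latency by more than half, which is why the lack of sub-frame coding is flagged as a limitation.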

    Neural Video Compression with Diverse Contexts

    Full text link
    For any video codec, coding efficiency relies heavily on whether the current signal to be encoded can find relevant contexts in the previously reconstructed signals. Traditional codecs have verified that more contexts bring substantial coding gain, but in a time-consuming manner. For the emerging neural video codec (NVC), however, contexts are still limited, leading to a low compression ratio. To boost NVC, this paper proposes increasing context diversity in both the temporal and spatial dimensions. First, we guide the model to learn hierarchical quality patterns across frames, which enriches long-term yet high-quality temporal contexts. Furthermore, to tap the potential of the optical-flow-based coding framework, we introduce group-based offset diversity, where cross-group interaction is proposed for better context mining. In addition, this paper also adopts a quadtree-based partition to increase spatial context diversity when encoding the latent representation in parallel. Experiments show that our codec obtains a 23.5% bitrate saving over the previous SOTA NVC. Better yet, our codec surpasses ECM, the next-generation traditional codec still under development, in both the RGB and YUV420 colorspaces in terms of PSNR. The code is at https://github.com/microsoft/DCVC. Comment: Accepted by CVPR 2023.
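    To make the quadtree-based partition idea concrete, here is a minimal sketch of recursively splitting a latent grid into quadrants whose leaves could then be entropy-coded in parallel. This is an illustration of the general technique, not the paper's exact partition scheme; the grid size and leaf size below are arbitrary.

```python
def quadtree_leaves(x0, y0, w, h, max_size):
    """Recursively split a (w x h) region into quadrants until every
    leaf is at most max_size on a side. Each leaf (x, y, w, h) is a
    block that could be coded independently, i.e. in parallel."""
    if w <= max_size and h <= max_size:
        return [(x0, y0, w, h)]
    leaves = []
    hw, hh = w // 2, h // 2
    for dx, dy, sw, sh in [(0, 0, hw, hh), (hw, 0, w - hw, hh),
                           (0, hh, hw, h - hh), (hw, hh, w - hw, h - hh)]:
        leaves += quadtree_leaves(x0 + dx, y0 + dy, sw, sh, max_size)
    return leaves

# A 16x16 latent grid with 4x4 leaves yields 16 parallel coding groups.
parts = quadtree_leaves(0, 0, 16, 16, max_size=4)
```

Each leaf sees the already-decoded neighboring leaves as spatial context, which is the diversity the abstract refers to.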

    Learned Video Compression via Heterogeneous Deformable Compensation Network

    Full text link
    Learned video compression has recently emerged as an essential research topic in developing advanced video compression technologies, where motion compensation is considered one of the most challenging issues. In this paper, we propose a learned video compression framework with a heterogeneous deformable compensation strategy (HDCVC) to tackle the unstable compression performance caused by single-size deformable kernels in the downsampled feature domain. More specifically, instead of utilizing optical-flow warping or single-size-kernel deformable alignment, the proposed algorithm extracts features from the two adjacent frames to estimate content-adaptive heterogeneous deformable (HetDeform) kernel offsets. Then we transform the reference features with the HetDeform convolution to accomplish motion compensation. Moreover, we design a Spatial-Neighborhood-Conditioned Divisive Normalization (SNCDN) to achieve more effective data Gaussianization combined with the Generalized Divisive Normalization. Furthermore, we propose a multi-frame enhanced reconstruction module that exploits context and temporal information for final quality enhancement. Experimental results indicate that HDCVC outperforms recent state-of-the-art learned video compression approaches.
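    For readers unfamiliar with divisive normalization, the following NumPy sketch shows the plain Generalized Divisive Normalization (GDN) transform that SNCDN builds on. It is a generic illustration of GDN, not the paper's SNCDN variant, and the parameter values are placeholders for what would normally be learned.

```python
import numpy as np

def gdn(x, beta, gamma):
    """Generalized Divisive Normalization: each channel is divided by a
    learned combination of the squared activations of all channels,
    which Gaussianizes the local statistics of natural images.
    x: (C, H, W), beta: (C,), gamma: (C, C)."""
    # denom[c] = sqrt(beta[c] + sum_k gamma[c, k] * x[k]**2), per pixel.
    denom = np.sqrt(beta[:, None, None] +
                    np.tensordot(gamma, x ** 2, axes=([1], [0])))
    return x / denom
```

SNCDN, per the abstract, additionally conditions this denominator on a spatial neighborhood rather than only on co-located channel values.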

    Neighbor Correspondence Matching for Flow-based Video Frame Synthesis

    Full text link
    Video frame synthesis, which consists of interpolation and extrapolation, is an essential video processing technique that can be applied to various scenarios. However, most existing methods cannot handle small objects or large motion well, especially in high-resolution videos such as 4K. To eliminate these limitations, we introduce a neighbor correspondence matching (NCM) algorithm for flow-based frame synthesis. Since the current frame is not available in video frame synthesis, NCM is performed in a current-frame-agnostic fashion to establish multi-scale correspondences in the spatial-temporal neighborhood of each pixel. Building on the powerful motion representation capability of NCM, we further propose to estimate intermediate flows for frame synthesis in a heterogeneous coarse-to-fine scheme. Specifically, the coarse-scale module is designed to leverage neighbor correspondences to capture large motion, while the fine-scale module is more computationally efficient to speed up the estimation process. Both modules are trained progressively to eliminate the resolution gap between the training dataset and real-world videos. Experimental results show that NCM achieves state-of-the-art performance on several benchmarks. In addition, NCM can be applied to various practical scenarios, such as video compression, to achieve better performance. Comment: Accepted to ACM MM 202
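    The core idea of current-frame-agnostic correspondence matching can be sketched with classical block matching between the two previous frames: motion found between frames t-2 and t-1 can be extrapolated toward the missing frame t. This toy single-scale SAD search is only a conceptual stand-in for NCM's learned multi-scale matching; patch and search sizes are arbitrary.

```python
import numpy as np

def best_offset(prev, prev2, y, x, patch=3, radius=2):
    """For the patch centered at (y, x) in frame t-1 (`prev`), find the
    offset into frame t-2 (`prev2`) with minimum sum of absolute
    differences. Negating that offset extrapolates the motion toward
    the (unavailable) current frame t."""
    p = patch // 2
    ref = prev[y - p:y + p + 1, x - p:x + p + 1]
    best, best_dxy = np.inf, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            cand = prev2[y + dy - p:y + dy + p + 1,
                         x + dx - p:x + dx + p + 1]
            sad = np.abs(ref - cand).sum()
            if sad < best:
                best, best_dxy = sad, (dy, dx)
    return best_dxy
```

NCM replaces this exhaustive single-pixel search with multi-scale correspondences gathered over a spatio-temporal neighborhood and feeds them to the coarse-to-fine flow estimator.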

    Neural Video Compression with Temporal Layer-Adaptive Hierarchical B-frame Coding

    Full text link
    Neural video compression (NVC) is a rapidly evolving video coding research area, with some models achieving superior coding efficiency compared to the latest video coding standard, Versatile Video Coding (VVC). In conventional video coding standards, hierarchical B-frame coding, which utilizes a bidirectional prediction structure for higher compression, has been well studied and exploited. In NVC, however, limited research has investigated the hierarchical B scheme. In this paper, we propose an NVC model exploiting hierarchical B-frame coding with temporal layer-adaptive optimization. We first extend an existing unidirectional NVC model to a bidirectional model, which achieves a -21.13% BD-rate gain over the unidirectional baseline. However, this model faces challenges when applied to sequences with complex or large motions, leading to performance degradation. To address this, we introduce temporal layer-adaptive optimization, incorporating methods such as temporal layer-adaptive quality scaling (TAQS) and temporal layer-adaptive latent scaling (TALS). The final model with the proposed methods achieves an impressive BD-rate gain of -39.86% against the baseline. It also resolves the challenges in sequences with large or complex motions, with up to -49.13% more BD-rate gain than the simple bidirectional extension. This improvement is attributed to allocating more bits to lower temporal layers, thereby enhancing overall reconstruction quality with fewer bits. Since our method has little dependency on a specific NVC model architecture, it can serve as a general tool for extending unidirectional NVC models to ones with hierarchical B-frame coding.
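    The hierarchical B structure the abstract relies on can be made concrete with a small helper that derives the coding order and temporal layer of each frame in a GOP: anchors are coded first (layer 0), then midpoints recursively, each predicted bidirectionally from its two surrounding, already-coded frames. This is the standard structure from conventional codecs, sketched here for illustration.

```python
def hierarchical_b_order(gop):
    """Return (frame_index, temporal_layer) pairs in coding order for a
    hierarchical B GOP. Frames 0 and `gop` are the anchors (layer 0);
    each recursion level codes the midpoint of an already-coded pair."""
    order = [(0, 0), (gop, 0)]
    def recurse(lo, hi, layer):
        if hi - lo < 2:
            return
        mid = (lo + hi) // 2
        order.append((mid, layer))          # B-frame between lo and hi
        recurse(lo, mid, layer + 1)
        recurse(mid, hi, layer + 1)
    recurse(0, gop, 1)
    return order

# gop=4 -> [(0, 0), (4, 0), (2, 1), (1, 2), (3, 2)]
```

Lower layers are referenced by more frames, which is why the paper's TAQS/TALS methods allocate them more bits.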

    Deep Video Compression

    Get PDF

    Approaches to video content preparation for video-on-demand (VoD) streaming with DASH

    Get PDF
    The consumption of multimedia content over the Internet, especially video, is growing steadily and has become a daily activity for people around the world. In this context, several studies in recent years have focused on the preparation, distribution, and transmission of multimedia content, especially in the field of video on demand (VoD). This thesis proposes several contributions in the field of video coding for VoD delivered using the Dynamic Adaptive Streaming over HTTP (DASH) standard. The goal is to find a balance between the efficient use of computational resources and the guarantee of a high quality of experience (QoE) for the end viewer.
    As a starting point, a comprehensive survey of research on video encoding and transcoding techniques in the cloud is provided, focusing especially on the evolution of streaming and the relevance of the encoding process. In addition, proposals are examined according to the type of virtualization and the content delivery modality. Two quality-based adaptive coding approaches are developed with the objective of adjusting the quality of the entire video sequence to a desired level. The results indicate that the proposed solutions can reduce the video size while maintaining the same quality across all video segments. In addition, a scene-based coding solution is proposed, and the impact of using downscaled video for scene detection is analyzed in terms of time, quality, and size. The results show that the required encoding time, computational resource consumption, and the size of the encoded video are all reduced. The research also presents an architecture that parallelizes the jobs involved in DASH content preparation using the FaaS (Function-as-a-Service) paradigm on a serverless platform. This architecture is tested with three functions encapsulated in containers to encode videos and analyze their quality, obtaining promising results in terms of scalability and job distribution. Finally, a tool called VQMTK is developed, which integrates 14 video quality metrics in a Docker container, facilitating the evaluation of video quality in diverse environments. This tool can be useful in the field of video coding, in generating datasets to train deep neural networks, and in scientific and educational settings. In summary, the thesis offers innovative solutions and tools to improve efficiency and quality in the preparation and transmission of multimedia content in the cloud, providing a solid foundation for future research and development in this constantly evolving field.
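    A common way to realize the quality-based adaptive coding described above is to search for the highest CRF (i.e. smallest file) whose measured quality still meets a target. The sketch below is a generic illustration of that idea, not the thesis's actual method: `measure_quality` stands in for encoding a segment and scoring it (e.g. with VMAF), and the linear quality model is purely hypothetical.

```python
def pick_crf(measure_quality, target, lo=18, hi=40):
    """Binary-search the largest CRF whose quality score still meets
    `target` (higher CRF -> smaller file but lower quality). Assumes
    measure_quality is monotonically decreasing in CRF and that
    measure_quality(lo) already meets the target."""
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if measure_quality(mid) >= target:
            lo = mid   # still acceptable: try a higher CRF (smaller file)
        else:
            hi = mid   # too low: back off
    return lo

# Hypothetical monotonic quality model, for illustration only.
crf = pick_crf(lambda c: 120 - 1.2 * c, target=90)
```

Run per segment, this keeps quality roughly constant across the whole sequence while spending no more bits than needed, which matches the size reductions reported above.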