8 research outputs found
Image and Video Coding Techniques for Ultra-low Latency
The next generation of wireless networks fosters the adoption of latency-critical applications such as XR, connected industry, or autonomous driving. This survey gathers implementation aspects of different image and video coding schemes and discusses their tradeoffs. Standardized video coding technologies such as HEVC or VVC provide a high compression ratio, but their enormous complexity sets the scene for alternative approaches such as still-image, mezzanine, or texture compression in scenarios with tight resource or latency constraints. Regardless of the coding scheme, we found inter-device memory transfers and the lack of sub-frame coding to be limitations of current full-system and software-programmable implementations.
Neural Video Compression with Diverse Contexts
For any video codec, coding efficiency relies heavily on whether the
current signal to be encoded can find relevant contexts in the previously
reconstructed signals. Traditional codecs have shown that more contexts bring
substantial coding gains, albeit at considerable computational cost. For the
emerging neural video codec (NVC), however, the available contexts are still
limited, leading to a low compression ratio. To boost NVC, this paper proposes increasing context
diversity in both temporal and spatial dimensions. First, we guide the model to
learn hierarchical quality patterns across frames, which provides long-term
yet high-quality temporal contexts. Furthermore, to tap the potential of the
optical flow-based coding framework, we introduce a group-based offset
diversity where the cross-group interaction is proposed for better context
mining. In addition, this paper also adopts a quadtree-based partition to
increase spatial context diversity when encoding the latent representation in
parallel. Experiments show that our codec obtains a 23.5% bitrate saving over
the previous SOTA NVC. Better yet, our codec surpasses ECM, the next-generation
traditional codec currently under development, in terms of PSNR in both the RGB
and YUV420 color spaces. The code is at https://github.com/microsoft/DCVC. (Accepted by CVPR 2023.)
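The quadtree-based partition used to increase spatial context diversity during parallel latent coding can be illustrated with a small sketch. The recursion below is a generic dyadic split, an illustration only and not the DCVC implementation:

```python
def quadtree_blocks(h, w, min_size):
    """Split an h x w latent map into quadrants recursively until blocks
    reach min_size; returns (y, x, height, width) tuples describing
    blocks that could be entropy-coded in parallel."""
    if h <= min_size and w <= min_size:
        return [(0, 0, h, w)]
    hh, hw = h // 2, w // 2
    quads = [(0, 0, hh, hw), (0, hw, hh, w - hw),
             (hh, 0, h - hh, hw), (hh, hw, h - hh, w - hw)]
    blocks = []
    for oy, ox, qh, qw in quads:
        # recurse into each quadrant and offset its blocks
        for y, x, bh, bw in quadtree_blocks(qh, qw, min_size):
            blocks.append((oy + y, ox + x, bh, bw))
    return blocks

blocks = quadtree_blocks(16, 16, 4)
print(len(blocks))  # 16 blocks of 4x4
```

Each leaf block can then be assigned to a different decoding group, which is the property that enables parallel encoding of the latent representation.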
Learned Video Compression via Heterogeneous Deformable Compensation Network
Learned video compression has recently emerged as an essential research topic
in developing advanced video compression technologies, where motion
compensation is considered one of the most challenging issues. In this paper,
we propose a learned video compression framework with a heterogeneous deformable
compensation strategy (HDCVC) to tackle the unstable compression
performance caused by single-size deformable kernels in the downsampled feature
domain. More specifically, instead of using optical-flow warping or
single-size-kernel deformable alignment, the proposed algorithm extracts
features from the two adjacent frames to estimate content-adaptive
heterogeneous deformable (HetDeform) kernel offsets. Then we transform the
reference features with the HetDeform convolution to accomplish motion
compensation. Moreover, we design a Spatial-Neighborhood-Conditioned Divisive
Normalization (SNCDN) to achieve more effective data Gaussianization combined
with the Generalized Divisive Normalization. Furthermore, we propose a
multi-frame enhanced reconstruction module for exploiting context and temporal
information for final quality enhancement. Experimental results indicate that
HDCVC outperforms recent state-of-the-art learned video compression
approaches.
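Divisive normalization of the kind SNCDN builds on normalizes each activation by the energy of its neighborhood. The NumPy sketch below is a generic spatial variant for illustration only; the paper's SNCDN conditions on spatial neighbors in a learned way:

```python
import numpy as np

def spatial_divisive_norm(x, beta=1.0, k=3):
    """Normalize each value by its k x k spatial neighborhood energy:
    y = x / sqrt(beta + mean(x^2 over the neighborhood)).
    Edge padding keeps the output the same shape as the input."""
    h, w = x.shape
    pad = k // 2
    xp = np.pad(x, pad, mode="edge")
    energy = np.empty_like(x, dtype=float)
    for i in range(h):
        for j in range(w):
            patch = xp[i:i + k, j:j + k]
            energy[i, j] = np.mean(patch ** 2)
    return x / np.sqrt(beta + energy)

x = np.array([[0.0, 10.0], [10.0, 0.0]])
y = spatial_divisive_norm(x)
# large activations are damped toward a common scale ("Gaussianization")
```

With beta >= 1, the denominator never falls below one, so normalized magnitudes never exceed the originals; this damping of locally large responses is the Gaussianization effect the abstract refers to.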
Neighbor Correspondence Matching for Flow-based Video Frame Synthesis
Video frame synthesis, which consists of interpolation and extrapolation, is
an essential video processing technique that can be applied to various
scenarios. However, most existing methods cannot handle small objects or large
motion well, especially in high-resolution videos such as 4K videos. To
eliminate such limitations, we introduce a neighbor correspondence matching
(NCM) algorithm for flow-based frame synthesis. Since the current frame is not
available in video frame synthesis, NCM is performed in a
current-frame-agnostic fashion to establish multi-scale correspondences in the
spatial-temporal neighborhoods of each pixel. Based on the powerful motion
representation capability of NCM, we further propose to estimate intermediate
flows for frame synthesis in a heterogeneous coarse-to-fine scheme.
Specifically, the coarse-scale module is designed to leverage neighbor
correspondences to capture large motion, while the fine-scale module is kept
computationally efficient to speed up the estimation process. Both modules are
trained progressively to eliminate the resolution gap between the training
dataset and real-world videos. Experimental results show that NCM achieves
state-of-the-art performance on several benchmarks. In addition, NCM can be
applied to various practical scenarios such as video compression to achieve
better performance. Comment: Accepted to ACM MM 202
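The matching idea can be illustrated with a toy block-matching sketch between the two available frames: for each pixel, search a small spatial neighborhood in the other frame and keep the displacement with the lowest cost. This is a drastically simplified, hypothetical stand-in for NCM's multi-scale correspondence matching:

```python
import numpy as np

def neighbor_match(f0, f1, radius=2):
    """For each pixel in f0, search a (2*radius+1)^2 neighborhood in f1
    and return the integer displacement (dy, dx) with the smallest
    absolute intensity difference."""
    h, w = f0.shape
    flow = np.zeros((h, w, 2), dtype=int)
    fp = np.pad(f1, radius, mode="edge")
    for i in range(h):
        for j in range(w):
            window = fp[i:i + 2 * radius + 1, j:j + 2 * radius + 1]
            cost = np.abs(window - f0[i, j])
            dy, dx = np.unravel_index(np.argmin(cost), cost.shape)
            flow[i, j] = (dy - radius, dx - radius)
    return flow

f0 = np.arange(36, dtype=float).reshape(6, 6)
f1 = np.roll(f0, 1, axis=1)   # every pixel moves one column to the right
flow = neighbor_match(f0, f1)
# interior pixels recover the (0, +1) displacement
```

Real flow-based synthesis would match features rather than raw intensities, at several scales; this sketch only conveys the neighborhood-search structure.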
Neural Video Compression with Temporal Layer-Adaptive Hierarchical B-frame Coding
Neural video compression (NVC) is a rapidly evolving video coding research
area, with some models achieving superior coding efficiency compared to the
latest video coding standard, Versatile Video Coding (VVC). In conventional
video coding standards, hierarchical B-frame coding, which utilizes a
bidirectional prediction structure for higher compression, has been well
studied and exploited. In NVC, however, limited research has investigated
the hierarchical B scheme. In this paper, we propose an NVC model exploiting
hierarchical B-frame coding with temporal layer-adaptive optimization. We first
extend an existing unidirectional NVC model to a bidirectional model, which
achieves -21.13% BD-rate gain over the unidirectional baseline model. However,
this model faces challenges when applied to sequences with complex or large
motions, leading to performance degradation. To address this, we introduce
temporal layer-adaptive optimization, incorporating methods such as temporal
layer-adaptive quality scaling (TAQS) and temporal layer-adaptive latent
scaling (TALS). The final model with the proposed methods achieves an
impressive BD-rate gain of -39.86% against the baseline. It also resolves the
challenges in sequences with large or complex motions with up to -49.13% more
BD-rate gains than the simple bidirectional extension. This improvement is
attributed to the allocation of more bits to lower temporal layers, thereby
enhancing overall reconstruction quality with smaller bits. Since our method
has little dependency on a specific NVC model architecture, it can serve as a
general tool for extending unidirectional NVC models to the ones with
hierarchical B-frame coding
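For reference, the temporal layer of a frame in a dyadic hierarchical-B GOP follows a standard assignment; the helper below is a generic illustration, not the paper's code. Lower layers, which are referenced more often, are the ones that receive more bits under the proposed layer-adaptive optimization:

```python
def temporal_layer(idx, gop_size):
    """Temporal layer of frame idx in a dyadic hierarchical-B GOP.
    Layer 0 holds the GOP boundary frames; each deeper layer holds the
    midpoints of the spans defined by the layers above it."""
    if idx % gop_size == 0:
        return 0
    layer, step = 1, gop_size // 2
    while idx % step != 0:
        step //= 2
        layer += 1
    return layer

# GOP of 8: frames 0..8 map to layers [0, 3, 2, 3, 1, 3, 2, 3, 0]
print([temporal_layer(i, 8) for i in range(9)])
```

Methods such as TAQS and TALS can then condition quality scaling on this layer index, which is why the approach transfers across NVC architectures.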
Approaches to Video Content Preparation for Video-on-Demand (VoD) Streaming with DASH
The consumption of multimedia content over the Internet, especially video, is growing steadily, becoming a daily activity among people around the world. In this context, several studies have been developed in recent years focused on the preparation, distribution, and transmission of multimedia content, especially in the field of video on demand (VoD).
This thesis proposes different contributions in the field of video coding for transmission in VoD scenarios using the Dynamic Adaptive Streaming over HTTP (DASH) standard.
The goal is to find a balance between the efficient use of computational resources and the guarantee of delivering a high quality of experience (QoE) to the end viewer.
As a starting point, a comprehensive survey on research related to video encoding and transcoding techniques in the cloud is provided, focusing especially on the evolution of streaming and the relevance of the encoding process. In addition, proposals are examined as a function of the type of virtualization and content delivery modalities.
Two quality-based adaptive coding approaches are developed with the objective of adjusting the quality of the entire video sequence to a desired level. The results indicate that the proposed solutions can reduce the video size while maintaining the same quality throughout all video segments.
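One common way to realize such quality-based adaptive coding is to search, per segment, for the encoder quality parameter that hits a target metric score. The sketch below is a hypothetical illustration (the thesis's actual algorithms may differ), assuming a `measure_quality` probe that is monotonically decreasing in CRF:

```python
def pick_crf(measure_quality, target, crf_lo=18, crf_hi=40, tol=0.5):
    """Binary-search a CRF value so a segment's measured quality lands
    within tol of target. measure_quality(crf) is assumed to be
    monotonically decreasing in CRF (e.g. a fast probe encode scored
    with a quality metric)."""
    while crf_lo < crf_hi:
        mid = (crf_lo + crf_hi) // 2
        q = measure_quality(mid)
        if abs(q - target) <= tol:
            return mid
        if q > target:       # quality too high: can compress harder
            crf_lo = mid + 1
        else:                # quality too low: back off
            crf_hi = mid - 1
    return crf_lo

# toy quality model: score drops 1.5 points per CRF step from 110
crf = pick_crf(lambda c: 110 - 1.5 * c, target=80)
print(crf)  # 20
```

Running this per segment keeps the delivered quality roughly constant across the sequence, which is what allows the encoder to spend fewer bits on easy segments.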
In addition, a scene-based coding solution is proposed, and the impact of using downscaled video for scene detection is analyzed in terms of time, quality, and size. The results show that the required encoding time, computational resource consumption, and the size of the encoded video are all reduced.
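Scene detection on downscaled frames can be sketched as follows; this toy version uses plain subsampling and a mean-absolute-difference threshold, both illustrative choices rather than the thesis's method:

```python
import numpy as np

def detect_scene_cuts(frames, scale=4, threshold=30.0):
    """Detect scene cuts by the mean absolute difference between
    consecutive frames, computed on frames downscaled by 'scale'
    (simple subsampling). Returns indices where a new scene starts."""
    small = [f[::scale, ::scale].astype(float) for f in frames]
    cuts = []
    for i in range(1, len(small)):
        if np.mean(np.abs(small[i] - small[i - 1])) > threshold:
            cuts.append(i)
    return cuts

# two constant "scenes": three dark frames, then three bright frames
frames = [np.zeros((64, 64))] * 3 + [np.full((64, 64), 200.0)] * 3
print(detect_scene_cuts(frames))  # [3]
```

Because the difference is computed on a fraction of the pixels, the detection pass is much cheaper than full-resolution analysis, which is the source of the reported time savings.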
The research also presents an architecture that parallelizes the jobs involved in DASH content preparation using the FaaS (Function-as-a-Service) paradigm on a serverless platform. This architecture is tested with three functions encapsulated in containers to encode videos and analyze their quality, obtaining promising results in terms of scalability and job distribution.
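The per-segment fan-out behind such a FaaS architecture can be mimicked locally with a thread pool; `encode_segment` below is a hypothetical placeholder for one containerized function invocation:

```python
from concurrent.futures import ThreadPoolExecutor

def encode_segment(seg):
    """Placeholder for one FaaS invocation: encode a single DASH
    segment. In the serverless setup each call would run as an
    independent containerized function."""
    return f"{seg}.m4s"

segments = [f"seg_{i:04d}" for i in range(8)]
# DASH segments are independent, so they can be dispatched in
# parallel, mirroring the per-segment fan-out of the architecture
with ThreadPoolExecutor(max_workers=4) as pool:
    outputs = list(pool.map(encode_segment, segments))
print(outputs[0])  # seg_0000.m4s
```

Segment independence is what makes the workload embarrassingly parallel, so scaling comes down to how many function instances the serverless platform can spin up.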
Finally, a tool called VQMTK is developed, which integrates 14 video quality metrics in a Docker container, facilitating the evaluation of video quality in diverse environments. This tool can be of great use in the field of video coding, in the generation of datasets for training deep neural networks, and in scientific and educational environments.
In summary, the thesis offers innovative solutions and tools to improve efficiency and quality in the preparation and transmission of multimedia content in the cloud, providing a solid foundation for future research and development in this constantly evolving field.