Towards Hybrid-Optimization Video Coding
Video coding is, in essence, a mathematical optimization problem over rate and
distortion. Two popular frameworks have been developed to solve it: block-based
hybrid video coding and end-to-end learned video coding. Viewed from the
perspective of optimization, these two frameworks represent two directions of
solution. Block-based hybrid coding is the discrete optimization solution,
because its independent coding modes are mathematically discrete. It searches
for the best among multiple starting points (i.e., modes), but the search is
not efficient enough. End-to-end learned coding, on the other hand, is the
continuous optimization solution, because gradient descent operates on a
continuous function. It optimizes a group of model parameters efficiently with
a numerical algorithm but, limited to a single starting point, it easily falls
into a local optimum. To better solve the problem, we propose to regard video
coding as a hybrid of discrete and continuous optimization, and to use both
search and a numerical algorithm to solve it. Our idea is to provide multiple
discrete starting points in the global space and to efficiently find the local
optimum around each point with a numerical algorithm. Finally, we search for
the global optimum among those local optima. Guided by this idea, we design a
hybrid-optimization video coding framework that is built entirely on continuous
deep networks and also contains some discrete modes. We conduct a comprehensive
set of experiments. Compared to the continuous optimization framework, our
method outperforms pure learned video coding methods. Meanwhile, compared to
the discrete optimization framework, our method achieves performance comparable
to the HEVC reference software HM16.10 in
PSNR
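The hybrid idea described in this abstract (several discrete starting points, a numerical local optimization around each, then a search among the resulting local optima) can be illustrated with a toy sketch. This is not the authors' framework: the one-dimensional objective `loss`, its gradient, the step size, and the list of starting points are all hypothetical stand-ins chosen only to show the control flow.

```python
import math


def loss(x: float) -> float:
    # Hypothetical non-convex objective with several local minima
    # (a stand-in for a rate-distortion cost, not a real codec loss).
    return 0.1 * x * x + math.sin(3.0 * x)


def grad(x: float) -> float:
    # Analytic derivative of the toy objective above.
    return 0.2 * x + 3.0 * math.cos(3.0 * x)


def descend(x0: float, lr: float = 0.01, steps: int = 2000) -> float:
    # Continuous part: plain gradient descent from one starting point.
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x


def hybrid_search(starts: list[float]) -> float:
    # Discrete part: refine every starting point locally, then pick the
    # best local optimum among the candidates.
    candidates = [descend(s) for s in starts]
    return min(candidates, key=loss)


if __name__ == "__main__":
    starts = [-4.0, -2.0, 0.0, 2.0, 4.0]  # assumed "modes"
    best = hybrid_search(starts)
    single = descend(4.0)  # single-start baseline
    print(f"hybrid: x={best:.4f} loss={loss(best):.4f}")
    print(f"single: x={single:.4f} loss={loss(single):.4f}")
```

In this toy setting a single start at 4.0 settles in a nearby shallow minimum, while the multi-start search reaches a deeper one; how the starting points are chosen in a real codec is outside what this sketch covers.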
3D coding tools final report
Deliverable D4.3 of the ANR PERSEE project. This report was produced as part of the ANR PERSEE project (no. ANR-09-BLAN-0170). Specifically, it corresponds to deliverable D4.3 of the project. Its title: 3D coding tools final report
Deliverable D5.2 of the PERSEE project: 2D/3D Codec architecture
Deliverable D5.2 of the ANR PERSEE project. This report was produced as part of the ANR PERSEE project (no. ANR-09-BLAN-0170). Specifically, it corresponds to deliverable D5.2 of the project. Its title: 2D/3D Codec architecture
Low complexity in-loop perceptual video coding
The tradition of broadcast video is today complemented by user-generated content, as portable devices support video coding. Similarly, computing is becoming ubiquitous, with the Internet of Things (IoT) incorporating heterogeneous networks to communicate with personal and/or infrastructure devices. In either case, the emphasis is on bandwidth and processor efficiency, which means increasing the signalling options in video encoding. Consequently, the assessment of pixel differences applies a uniform cost to remain processor efficient; in contrast, the human visual system (HVS) has non-uniform sensitivity based on lighting, edges and textures. Existing perceptual assessments are natively incompatible and processor demanding, making perceptual video coding (PVC) unsuitable for these environments. This research enables existing perceptual assessment at the native level using low-complexity techniques, before producing new pixel-based image quality assessments (IQAs). To manage these IQAs, a framework was developed and implemented in the High Efficiency Video Coding (HEVC) encoder. This resulted in bit redistribution, where more bits and smaller partitioning were allocated to perceptually significant regions. Using an HEVC-optimised processor, the timing increase was < +4% and < +6% for video streaming and recording applications respectively, one third that of an existing low-complexity PVC solution. Future work should be directed towards perceptual quantisation, which offers the potential for perceptual coding gain
Contributions to the solution of the rate-distorsion optimization problem in video coding
In the last two decades, we have witnessed significant changes in the demand for video codecs. The diversity of services has increased significantly, high definition (HD) and beyond-HD resolutions have become a reality, video traffic from mobile devices and tablets is increasing, video-on-demand services now play a prominent role, and so on. All of these advances have converged to demand more powerful standard video codecs, the most recent being H.264/Advanced Video Coding (H.264/AVC) and the latest High Efficiency Video Coding (HEVC), both generated by the Joint Collaborative Team on Video Coding (JCT-VC), a partnership of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG).
These two standards (and many others, starting with ITU-T H.261) rely on a hybrid model known as the Differential Pulse Code Modulation (DPCM)/Discrete Cosine Transform (DCT) hybrid video coder, which involves a motion estimation and compensation phase followed by transformation and quantization stages and an entropy coder. Moreover, each of these main subsystems is made up of a number of interdependent, parametric modules that can be adapted to the particular video content.
The main problem arising from this approach is how to choose, as well as possible, the combination of the different parametrizations to achieve the most efficient coding of the current content. To solve this problem, one of the solutions proposed (and the one adopted in both the H.264/AVC and the HEVC reference encoder implementations) is the process referred to as rate-distortion optimization, which chooses a parametrization of the encoder based on the minimization of a cost function that considers the trade-off between rate and distortion, weighted by a Lagrange multiplier (λ) which has been empirically obtained for both the H.264/AVC and the HEVC reference encoder implementations, aiming to provide a robust solution for a variety of video contents.
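As a rough illustration of the rate-distortion optimization described above, the sketch below picks the candidate that minimizes the Lagrangian cost J = D + λR. The candidate names, their distortion and rate figures, and the λ values are invented for the example; they are not taken from the thesis or from any reference encoder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str          # hypothetical coding mode label
    distortion: float  # e.g. sum of squared differences
    rate: float        # bits needed to signal and code the block


def rd_cost(c: Candidate, lam: float) -> float:
    # Lagrangian cost J = D + lambda * R.
    return c.distortion + lam * c.rate


def best_mode(candidates: list[Candidate], lam: float) -> Candidate:
    # Exhaustive search over the discrete set of parametrizations.
    return min(candidates, key=lambda c: rd_cost(c, lam))


# Made-up numbers, chosen only so that the decision changes with lambda.
CANDIDATES = [
    Candidate("skip", distortion=900.0, rate=2.0),
    Candidate("inter_16x16", distortion=400.0, rate=40.0),
    Candidate("intra_4x4", distortion=150.0, rate=120.0),
]

if __name__ == "__main__":
    for lam in (1.0, 5.0, 50.0):
        chosen = best_mode(CANDIDATES, lam)
        print(f"lambda={lam:5.1f} -> {chosen.name} (J={rd_cost(chosen, lam):.1f})")
```

With these numbers a small λ favours the low-distortion, high-rate candidate and a large λ favours the cheapest one, which is why the choice of λ model has a direct effect on coding efficiency.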
In this PhD thesis, an exhaustive study of the influence of this Lagrangian parameter on different video sequences reveals that there are some common features that appear frequently in video sequences for which the adopted λ model (the reference model) becomes ineffective. Furthermore, we have found a notable margin of improvement in the coding efficiency of both coders when using a more adequate model for the Lagrangian parameter.
Thus, the contributions of this thesis are the following: (i) to prove that the reference Lagrangian model becomes ineffective in certain common situations; and (ii) to propose generalized solutions that improve the robustness of the reference model, for both the H.264/AVC and the HEVC standards, obtaining important improvements in coding efficiency. Both proposals take into account changes in the nature of the video sequence, proposing models that adapt to the video content while minimizing the increase in computational complexity.
Official Doctoral Program in Multimedia and Communications. President: José Prades Nebot. Secretary: Carmen Peláez Moreno. Member: Julián Cabrera Quesad