
    Visual Content Characterization Based on Encoding Rate-Distortion Analysis

    Visual content characterization is a fundamentally important but underexploited step in dataset construction, which in turn is essential to solving many image processing and computer vision problems. In the era of machine learning this has become ever more important: with today's explosion of image and video content, scrutinizing all potential content is impossible and source content selection has become increasingly difficult. In particular, in image/video coding and quality assessment, it is highly desirable to characterize and select source content, and subsequently to construct image/video datasets, with strong representativeness and diversity of the visual world, so that the coding and quality assessment methods developed from and validated on such datasets generalize well. Encoding Rate-Distortion (RD) analysis is essential to many multimedia applications; examples that use it explicitly include image encoder RD optimization, video quality assessment (VQA), and Quality-of-Experience (QoE) optimization of streaming videos. However, encoding RD analysis has not been well investigated in the context of visual content characterization. This thesis applies encoding RD analysis as a visual source content characterization method with image/video coding and quality assessment applications in mind. We first conduct a subjective video quality evaluation experiment for state-of-the-art video encoder performance analysis and comparison, where our observations reveal severe problems that motivate the need for better source content characterization and selection methods. Then, the effectiveness of RD analysis in visual source content characterization is demonstrated through a proposed quality control mechanism for video coding, based on eigen-analysis in the space of General Quality Parameter (GQP) functions. Finally, by combining encoding RD analysis with submodular set function optimization, we propose a novel method for automating the selection of representative source content, which boosts the RD performance of visual encoders trained on the selected content.
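    The selection step can be illustrated with the standard greedy algorithm for monotone submodular maximization. The sketch below is a minimal, generic instance: the facility-location objective, the inner-product similarity, and the idea of building feature vectors from sampled RD-curve points are all illustrative assumptions, not the thesis's exact formulation.

        import numpy as np

        def facility_location_gain(sim, selected, candidate):
            # Marginal gain of adding `candidate` to `selected` under the
            # facility-location objective F(S) = sum_i max_{j in S} sim[i, j].
            if not selected:
                return sim[:, candidate].sum()
            covered = sim[:, selected].max(axis=1)
            return np.maximum(covered, sim[:, candidate]).sum() - covered.sum()

        def greedy_select(sim, k):
            # Classic greedy maximization; for a monotone submodular objective
            # it reaches at least (1 - 1/e) of the optimal value.
            selected, remaining = [], set(range(sim.shape[1]))
            for _ in range(k):
                best = max(remaining, key=lambda c: facility_location_gain(sim, selected, c))
                selected.append(best)
                remaining.discard(best)
            return selected

        # Hypothetical features: a few sampled (rate, distortion) points per clip.
        rng = np.random.default_rng(0)
        feats = rng.random((50, 8))
        sim = feats @ feats.T           # inner-product similarity between clips
        print(greedy_select(sim, k=5))  # indices of 5 representative clips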

    Beyond Transmitting Bits: Context, Semantics, and Task-Oriented Communications

    Communication systems to date primarily aim at reliably communicating bit sequences. Such an approach provides efficient engineering designs that are agnostic to the meaning of the messages or to the goal that the message exchange aims to achieve. Next-generation systems, however, can potentially be enriched by folding message semantics and the goals of communication into their design. Further, these systems can be made cognizant of the context in which the communication exchange takes place, providing avenues for novel design insights. This tutorial summarizes the efforts to date on semantic-aware and task-oriented communications, starting from their early adaptations and covering the foundations, algorithms, and potential implementations. The focus is on approaches that use information theory to provide the foundations, as well as on the significant role of learning in semantics- and task-aware communications.
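    Among the information-theoretic foundations such approaches build on is channel capacity, which for any discrete memoryless channel can be computed with the classic Blahut-Arimoto algorithm. The sketch below is a generic illustration of that foundation, not code from the tutorial; the binary-symmetric-channel test case is an assumption for demonstration.

        import numpy as np

        def blahut_arimoto(P, iters=200):
            # Capacity (in bits) of a discrete memoryless channel with
            # transition matrix P[x, y] = P(y|x); rows must sum to 1.
            r = np.full(P.shape[0], 1.0 / P.shape[0])   # input dist., start uniform
            cap = 0.0
            for _ in range(iters):
                q = r @ P                               # output marginal q(y)
                # c(x) = exp(KL(P(.|x) || q)), treating 0 log 0 as 0.
                with np.errstate(divide="ignore", invalid="ignore"):
                    kl = np.where(P > 0, P * np.log(P / q), 0.0).sum(axis=1)
                c = np.exp(kl)
                cap = np.log2(r @ c)                    # lower bound, tight at convergence
                r = r * c / (r @ c)                     # multiplicative update
            return cap

        # Binary symmetric channel, crossover 0.1: C = 1 - H(0.1) ~ 0.531 bits.
        p = 0.1
        print(blahut_arimoto(np.array([[1 - p, p], [p, 1 - p]])))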

    Algorithms for complexity management in video coding

    Nowadays, applications based on video services are becoming very popular, e.g., the transmission of video sequences over the Internet or mobile networks, or the increasingly common use of High Definition (HD) video signals in television or Blu-ray systems. Thanks to this popularity, video coding has become an essential tool for transmitting and storing digital video sequences. Standardization organizations have developed several video coding standards, the most recent being H.264/AVC and HEVC. Both standards achieve great results in compressing the video signal by virtue of a set of spatio-temporal predictive techniques. Nevertheless, the efficacy of these techniques comes at the cost of a large increase in the computational complexity of the video coding process. Because of this high complexity, a variety of algorithms have been developed that attempt to control the computational burden of video coding. The goal of these algorithms is to control encoder complexity, using a specific amount of coding resources while keeping the coding efficiency as high as possible. In this PhD thesis, we propose two algorithms devoted to controlling the complexity of the H.264/AVC and HEVC standards. Relying on the statistical properties of video sequences, we demonstrate that the developed methods are able to control the computational burden while avoiding significant losses in coding efficiency. Moreover, our proposals are designed to adapt their behavior to the video content, as well as to different target complexities. The proposed methods have been thoroughly tested and compared with other state-of-the-art proposals over a variety of video resolutions, sequences, and coding configurations. The results show that our methods outperform other approaches and are suitable for practical implementations of coding standards, where computational complexity is a key factor in a proper system design.
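    To make the idea concrete, the sketch below shows the simplest form a complexity controller can take: a per-frame feedback loop that adjusts an encoder "effort" knob (e.g., maximum partition depth or motion-search range) so that measured coding time tracks a per-frame budget. It is purely illustrative; the thesis's algorithms are driven by sequence statistics rather than this bare feedback rule, and the encode interface is a hypothetical stand-in.

        def control_complexity(frames, encode, target_ms, min_effort=1, max_effort=4):
            # encode(frame, effort) runs the encoder at the given effort level
            # and returns the elapsed time for that frame in milliseconds.
            effort = max_effort                # start at full effort
            for frame in frames:
                elapsed = encode(frame, effort)
                # Feedback with a 10% dead zone to avoid oscillation:
                # over budget -> reduce effort; well under budget -> raise it.
                if elapsed > 1.10 * target_ms and effort > min_effort:
                    effort -= 1
                elif elapsed < 0.90 * target_ms and effort < max_effort:
                    effort += 1
            return effort

        # Mock encoder whose cost grows linearly with effort, for demonstration.
        mock_encode = lambda frame, effort: 10.0 * effort
        print(control_complexity(range(10), mock_encode, target_ms=20.0))  # -> 2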

    Adaptive Streaming: From Bitrate Maximization to Rate-Distortion Optimization

    The fundamental conflict between increasing consumer demand for better Quality-of-Experience (QoE) and the limited supply of network resources has become a significant challenge for modern video delivery systems. State-of-the-art adaptive bitrate (ABR) streaming algorithms are dedicated to draining the available bandwidth in the hope of improving viewers' QoE, resulting in inefficient use of network resources. In this thesis, we develop an alternative design paradigm, namely rate-distortion optimized streaming (RDOS), to balance the contrasting demands of video consumers and service providers. Distinct from the traditional bitrate-maximization paradigm, RDOS must be able to operate at any given point along the rate-distortion curve, as specified by a trade-off parameter. The new paradigm finds plausible explanations in information theory, economics, and visual perception. To instantiate the new philosophy, we decompose adaptive streaming algorithms into three mutually independent components: the throughput predictor, the reward function, and the bitrate selector. We provide a unified framework for understanding the connections among all existing ABR algorithms; the new perspective also exposes the fundamental limitations of each algorithm by examining its underlying assumptions. Based on these insights, we propose novel improvements to each of the three functional components. To alleviate a series of unrealistic assumptions behind bitrate-based QoE models, we develop a theoretically grounded objective QoE model that combines information from subject-rated streaming videos with prior knowledge about the human visual system (HVS) in a principled way. By analyzing a corpus of psychophysical experiments, we show that QoE function estimation can be formulated as a projection-onto-convex-sets problem. The proposed model generalizes well over a broad range of source contents, video encoders, and viewing conditions. Most importantly, the QoE model disentangles bitrate from quality, making it an ideal component in the RDOS framework. In contrast to existing throughput estimators, which approximate the marginal probability distribution over all connections, we optimize the throughput predictor conditioned on each client. Although training data are scarce for any individual Internet Protocol connection, we leverage the latest advances in meta-learning to incorporate the knowledge embedded in similar tasks: with a deliberately designed objective function, the algorithm learns to identify shared structure among different network characteristics from millions of realistic throughput traces, and at test time it adapts to the connection-level network characteristics of novel streaming clients with only a few gradient steps on a small amount of data. The enormous space of streaming videos, constantly evolving encoding schemes, and great diversity of throughput characteristics make it extremely challenging for modern data-driven bitrate selectors, trained with limited samples, to generalize well. To this end, we propose a Bayesian bitrate selection algorithm that adaptively fuses an online, robust, short-term-optimal controller with an offline, susceptible, long-term-optimal planner; depending on the reliability of the two controllers in a given system state, the algorithm dynamically prioritizes one of the two decision rules.
    To faithfully evaluate the performance of RDOS, we construct a large-scale streaming video dataset, the Waterloo Streaming Video database, which contains a wide variety of high-quality source contents, encoders, encoding profiles, realistic throughput traces, and viewing devices. Extensive objective evaluation demonstrates that the proposed algorithm delivers QoE identical to that of state-of-the-art ABR algorithms at a much lower cost, an improvement also supported by the largest subjective video quality assessment experiment conducted to date.
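    The core selection rule can be sketched in a few lines: among renditions the predicted throughput can sustain without stalling, pick the one minimizing distortion + lambda * rate rather than the highest feasible bitrate. The bitrate ladder, distortion values, and buffer feasibility rule below are illustrative assumptions, not the thesis's actual QoE model or controller.

        def rdos_select(ladder, predicted_kbps, buffer_s, chunk_s, lam):
            # ladder: list of (bitrate_kbps, distortion) pairs, one per rendition.
            # lam trades provider cost (rate) against viewer distortion.
            best, best_cost = None, float("inf")
            for rate, dist in ladder:
                # Skip renditions expected to stall: downloading the next chunk
                # must not take longer than the buffer currently holds.
                if rate * chunk_s / max(predicted_kbps, 1e-9) > buffer_s:
                    continue
                cost = dist + lam * rate          # the RD trade-off objective
                if cost < best_cost:
                    best, best_cost = (rate, dist), cost
            return best                           # None if every rendition risks a stall

        # lam = 0 recovers quality maximization over feasible renditions; larger
        # lam moves the operating point down the rate-distortion curve.
        ladder = [(750, 0.40), (1500, 0.25), (3000, 0.15), (6000, 0.10)]
        print(rdos_select(ladder, predicted_kbps=4000, buffer_s=12, chunk_s=4, lam=1e-4))
        # -> (1500, 0.25): a mid-rate rendition beats the highest feasible bitrate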
