826 research outputs found

    MASCOT : metadata for advanced scalable video coding tools : final report

    Get PDF
    The goal of the MASCOT project was to develop new video coding schemes and tools that provide both an increased coding efficiency as well as extended scalability features compared to technology that was available at the beginning of the project. Towards that goal the following tools would be used: - metadata-based coding tools; - new spatiotemporal decompositions; - new prediction schemes. Although the initial goal was to develop one single codec architecture that was able to combine all new coding tools that were foreseen when the project was formulated, it became clear that this would limit the selection of the new tools. Therefore the consortium decided to develop two codec frameworks within the project, a standard hybrid DCT-based codec and a 3D wavelet-based codec, which together are able to accommodate all tools developed during the course of the project

    Resource-Constrained Low-Complexity Video Coding for Wireless Transmission

    Get PDF

    Efficient Scalable Video Coding Based on Matching Pursuits

    Get PDF

    Motion Scalability for Video Coding with Flexible Spatio-Temporal Decompositions

    Get PDF
    PhDThe research presented in this thesis aims to extend the scalability range of the wavelet-based video coding systems in order to achieve fully scalable coding with a wide range of available decoding points. Since the temporal redundancy regularly comprises the main portion of the global video sequence redundancy, the techniques that can be generally termed motion decorrelation techniques have a central role in the overall compression performance. For this reason the scalable motion modelling and coding are of utmost importance, and specifically, in this thesis possible solutions are identified and analysed. The main contributions of the presented research are grouped into two interrelated and complementary topics. Firstly a flexible motion model with rateoptimised estimation technique is introduced. The proposed motion model is based on tree structures and allows high adaptability needed for layered motion coding. The flexible structure for motion compensation allows for optimisation at different stages of the adaptive spatio-temporal decomposition, which is crucial for scalable coding that targets decoding on different resolutions. By utilising an adaptive choice of wavelet filterbank, the model enables high compression based on efficient mode selection. Secondly, solutions for scalable motion modelling and coding are developed. These solutions are based on precision limiting of motion vectors and creation of a layered motion structure that describes hierarchically coded motion. The solution based on precision limiting relies on layered bit-plane coding of motion vector values. The second solution builds on recently established techniques that impose scalability on a motion structure. The new approach is based on two major improvements: the evaluation of distortion in temporal Subbands and motion search in temporal subbands that finds the optimal motion vectors for layered motion structure. Exhaustive tests on the rate-distortion performance in demanding scalable video coding scenarios show benefits of application of both developed flexible motion model and various solutions for scalable motion coding

    Scalable light field representation and coding

    Get PDF
    This Thesis aims to advance the state-of-the-art in light field representation and coding. In this context, proposals to improve functionalities like light field random access and scalability are also presented. As the light field representation constrains the coding approach to be used, several light field coding techniques to exploit the inherent characteristics of the most popular types of light field representations are proposed and studied, which are normally based on micro-images or sub-aperture-images. To encode micro-images, two solutions are proposed, aiming to exploit the redundancy between neighboring micro-images using a high order prediction model, where the model parameters are either explicitly transmitted or inferred at the decoder, respectively. In both cases, the proposed solutions are able to outperform low order prediction solutions. To encode sub-aperture-images, an HEVC-based solution that exploits their inherent intra and inter redundancies is proposed. In this case, the light field image is encoded as a pseudo video sequence, where the scanning order is signaled, allowing the encoder and decoder to optimize the reference picture lists to improve coding efficiency. A novel hybrid light field representation coding approach is also proposed, by exploiting the combined use of both micro-image and sub-aperture-image representation types, instead of using each representation individually. In order to aid the fast deployment of the light field technology, this Thesis also proposes scalable coding and representation approaches that enable adequate compatibility with legacy displays (e.g., 2D, stereoscopic or multiview) and with future light field displays, while maintaining high coding efficiency. Additionally, viewpoint random access, allowing to improve the light field navigation and to reduce the decoding delay, is also enabled with a flexible trade-off between coding efficiency and viewpoint random access.Esta Tese tem como objetivo avançar o estado da arte em representação e codificação de campos de luz. Neste contexto, são também apresentadas propostas para melhorar funcionalidades como o acesso aleatório ao campo de luz e a escalabilidade. Como a representação do campo de luz limita a abordagem de codificação a ser utilizada, são propostas e estudadas várias técnicas de codificação de campos de luz para explorar as características inerentes aos seus tipos mais populares de representação, que são normalmente baseadas em micro-imagens ou imagens de sub-abertura. Para codificar as micro-imagens, são propostas duas soluções, visando explorar a redundância entre micro-imagens vizinhas utilizando um modelo de predição de alta ordem, onde os parâmetros do modelo são explicitamente transmitidos ou inferidos no decodificador, respetivamente. Em ambos os casos, as soluções propostas são capazes de superar as soluções de predição de baixa ordem. Para codificar imagens de sub-abertura, é proposta uma solução baseada em HEVC que explora a inerente redundância intra e inter deste tipo de imagens. Neste caso, a imagem do campo de luz é codificada como uma pseudo-sequência de vídeo, onde a ordem de varrimento é sinalizada, permitindo ao codificador e decodificador otimizar as listas de imagens de referência para melhorar a eficiência da codificação. Também é proposta uma nova abordagem de codificação baseada na representação híbrida do campo de luz, explorando o uso combinado dos tipos de representação de micro-imagem e sub-imagem, em vez de usar cada representação individualmente. A fim de facilitar a rápida implantação da tecnologia de campo de luz, esta Tese também propõe abordagens escaláveis de codificação e representação que permitem uma compatibilidade adequada com monitores tradicionais (e.g., 2D, estereoscópicos ou multivista) e com futuros monitores de campo de luz, mantendo ao mesmo tempo uma alta eficiência de codificação. Além disso, o acesso aleatório de pontos de vista, permitindo melhorar a navegação no campo de luz e reduzir o atraso na descodificação, também é permitido com um equilíbrio flexível entre eficiência de codificação e acesso aleatório de pontos de vista

    Receiver-Driven Video Adaptation

    Get PDF
    In the span of a single generation, video technology has made an incredible impact on daily life. Modern use cases for video are wildly diverse, including teleconferencing, live streaming, virtual reality, home entertainment, social networking, surveillance, body cameras, cloud gaming, and autonomous driving. As these applications continue to grow more sophisticated and heterogeneous, a single representation of video data can no longer satisfy all receivers. Instead, the initial encoding must be adapted to each receiver's unique needs. Existing adaptation strategies are fundamentally flawed, however, because they discard the video's initial representation and force the content to be re-encoded from scratch. This process is computationally expensive, does not scale well with the number of videos produced, and throws away important information embedded in the initial encoding. Therefore, a compelling need exists for the development of new strategies that can adapt video content without fully re-encoding it. To better support the unique needs of smart receivers, diverse displays, and advanced applications, general-use video systems should produce and offer receivers a more flexible compressed representation that supports top-down adaptation strategies from an original, compressed-domain ground truth. This dissertation proposes an alternate model for video adaptation that addresses these challenges. The key idea is to treat the initial compressed representation of a video as the ground truth, and allow receivers to drive adaptation by dynamically selecting which subsets of the captured data to receive. In support of this model, three strategies for top-down, receiver-driven adaptation are proposed. First, a novel, content-agnostic entropy coding technique is implemented in which symbols are selectively dropped from an input abstract symbol stream based on their estimated probability distributions to hit a target bit rate. Receivers are able to guide the symbol dropping process by supplying the encoder with an appropriate rate controller algorithm that fits their application needs and available bandwidths. Next, a domain-specific adaptation strategy is implemented for H.265/HEVC coded video in which the prediction data from the original source is reused directly in the adapted stream, but the residual data is recomputed as directed by the receiver. By tracking the changes made to the residual, the encoder can compensate for decoder drift to achieve near-optimal rate-distortion performance. Finally, a fully receiver-driven strategy is proposed in which the syntax elements of a pre-coded video are cataloged and exposed directly to clients through an HTTP API. Instead of requesting the entire stream at once, clients identify the exact syntax elements they wish to receive using a carefully designed query language. Although an implementation of this concept is not provided, an initial analysis shows that such a system could save bandwidth and computation when used by certain targeted applications.Doctor of Philosoph

    Flexible distribution of complexity by hybrid predictive-distributed video coding

    Get PDF
    There is currently limited flexibility for distributing complexity in a video coding system. While rate-distortion-complexity (RDC) optimization techniques have been proposed for conventional predictive video coding with encoder-side motion estimation, they fail to offer true flexible distribution of complexity between encoder and decoder since the encoder is assumed to have always more computational resources available than the decoder. On the other hand, distributed video coding solutions with decoder-side motion estimation have been proposed, but hardly any RDC optimized systems have been developed. To offer more flexibility for video applications involving multi-tasking or battery-constrained devices, in this paper, we propose a codec combining predictive video coding concepts and techniques from distributed video coding and show the flexibility of this method in distributing complexity. We propose several modes to code frames, and provide complexity analysis illustrating encoder and decoder computational complexity for each mode. Rate distortion results for each mode indicate that the coding efficiency is similar. We describe a method to choose which mode to use for coding each inter frame, taking into account encoder and decoder complexity constraints, and illustrate how complexity is distributed more flexibly
    corecore