157 research outputs found

    High Performance Multiview Video Coding

    Get PDF
    Following the standardization of the latest video coding standard High Efficiency Video Coding in 2013, in 2014, multiview extension of HEVC (MV-HEVC) was published and brought significantly better compression performance of around 50% for multiview and 3D videos compared to multiple independent single-view HEVC coding. However, the extremely high computational complexity of MV-HEVC demands significant optimization of the encoder. To tackle this problem, this work investigates the possibilities of using modern parallel computing platforms and tools such as single-instruction-multiple-data (SIMD) instructions, multi-core CPU, massively parallel GPU, and computer cluster to significantly enhance the MVC encoder performance. The aforementioned computing tools have very different computing characteristics and misuse of the tools may result in poor performance improvement and sometimes even reduction. To achieve the best possible encoding performance from modern computing tools, different levels of parallelism inside a typical MVC encoder are identified and analyzed. Novel optimization techniques at various levels of abstraction are proposed, non-aggregation massively parallel motion estimation (ME) and disparity estimation (DE) in prediction unit (PU), fractional and bi-directional ME/DE acceleration through SIMD, quantization parameter (QP)-based early termination for coding tree unit (CTU), optimized resource-scheduled wave-front parallel processing for CTU, and workload balanced, cluster-based multiple-view parallel are proposed. The result shows proposed parallel optimization techniques, with insignificant loss to coding efficiency, significantly improves the execution time performance. This , in turn, proves modern parallel computing platforms, with appropriate platform-specific algorithm design, are valuable tools for improving the performance of computationally intensive applications

    Low Complexity Multiview Video Coding

    Get PDF
    3D video is a technology that has seen a tremendous attention in the recent years. Multiview Video Coding (MVC) is an extension of the popular H.264 video coding standard and is commonly used to compress 3D videos. It offers an improvement of 20% to 50% in compression efficiency over simulcast encoding of multiview videos using the conventional H.264 video coding standard. However, there are two important problems associated with it: (i) its superior compression performance comes at the cost of significantly higher computational complexity which hampers the real-world realization of MVC encoder in applications such as 3D live broadcasting and interactive Free Viewpoint Television (FTV), and (ii) compressed 3D videos can suffer from packet loss during transmission, which can degrade the viewing quality of the 3D video at the decoder. This thesis aims to solve these problems by presenting techniques to reduce the computational complexity of the MVC encoder and by proposing a consistent error concealment technique for frame losses in 3D video transmission. The thesis first analyses the complexity of the MVC encoder. It then proposes two novel techniques to reduce the complexity of motion and disparity estimation. The first method achieves complexity reduction in the disparity estimation process by exploiting the relationship between temporal levels, type of macroblocks and search ranges while the second method achieves it by exploiting the geometrical relation- ship between motion and disparity vectors in stereo frames. These two methods are then combined with other state-of-the-art methods in a unique framework where gains add up. Experimental results show that the proposed low-complexity framework can reduce the encoding time of the standard MVC encoder by over 93% while maintaining similar compression efficiency performance. The addition of new View Synthesis Prediction (VSP) modes to the MVC encoding framework improves the compression efficiency of MVC. However, testing additional modes comes at the cost of increased encoding complexity. In order to reduce the encoding complexity, the thesis, next, proposes a bayesian early mode decision technique for a VSP enhanced MVC coder. It exploits the statistical similarities between the RD costs of the VSP SKIP mode in neighbouring views to terminate the mode decision process early. Results indicate that the proposed technique can reduce the encoding time of the enhanced MVC coder by over 33% at similar compression efficiency levels. Finally, compressed 3D videos are usually required to be broadcast to a large number of users where transmission errors can lead to frame losses which can degrade the video quality at the decoder. A simple reconstruction of the lost frames can lead to inconsistent reconstruction of the 3D scene which may negatively affect the viewing experience of a user. In order to solve this problem, the thesis proposes, at the end, a consistency model for recovering frames lost during transmission. The proposed consistency model is used to evaluate inter-view and temporal consistencies while selecting candidate blocks for concealment. Experimental results show that the proposed technique is able to recover the lost frames with high consistency and better quality than two standard error concealment methods and a baseline technique based on the boundary matching algorithm

    Scalable light field representation and coding

    Get PDF
    This Thesis aims to advance the state-of-the-art in light field representation and coding. In this context, proposals to improve functionalities like light field random access and scalability are also presented. As the light field representation constrains the coding approach to be used, several light field coding techniques to exploit the inherent characteristics of the most popular types of light field representations are proposed and studied, which are normally based on micro-images or sub-aperture-images. To encode micro-images, two solutions are proposed, aiming to exploit the redundancy between neighboring micro-images using a high order prediction model, where the model parameters are either explicitly transmitted or inferred at the decoder, respectively. In both cases, the proposed solutions are able to outperform low order prediction solutions. To encode sub-aperture-images, an HEVC-based solution that exploits their inherent intra and inter redundancies is proposed. In this case, the light field image is encoded as a pseudo video sequence, where the scanning order is signaled, allowing the encoder and decoder to optimize the reference picture lists to improve coding efficiency. A novel hybrid light field representation coding approach is also proposed, by exploiting the combined use of both micro-image and sub-aperture-image representation types, instead of using each representation individually. In order to aid the fast deployment of the light field technology, this Thesis also proposes scalable coding and representation approaches that enable adequate compatibility with legacy displays (e.g., 2D, stereoscopic or multiview) and with future light field displays, while maintaining high coding efficiency. Additionally, viewpoint random access, allowing to improve the light field navigation and to reduce the decoding delay, is also enabled with a flexible trade-off between coding efficiency and viewpoint random access.Esta Tese tem como objetivo avançar o estado da arte em representação e codificação de campos de luz. Neste contexto, são também apresentadas propostas para melhorar funcionalidades como o acesso aleatório ao campo de luz e a escalabilidade. Como a representação do campo de luz limita a abordagem de codificação a ser utilizada, são propostas e estudadas várias técnicas de codificação de campos de luz para explorar as características inerentes aos seus tipos mais populares de representação, que são normalmente baseadas em micro-imagens ou imagens de sub-abertura. Para codificar as micro-imagens, são propostas duas soluções, visando explorar a redundância entre micro-imagens vizinhas utilizando um modelo de predição de alta ordem, onde os parâmetros do modelo são explicitamente transmitidos ou inferidos no decodificador, respetivamente. Em ambos os casos, as soluções propostas são capazes de superar as soluções de predição de baixa ordem. Para codificar imagens de sub-abertura, é proposta uma solução baseada em HEVC que explora a inerente redundância intra e inter deste tipo de imagens. Neste caso, a imagem do campo de luz é codificada como uma pseudo-sequência de vídeo, onde a ordem de varrimento é sinalizada, permitindo ao codificador e decodificador otimizar as listas de imagens de referência para melhorar a eficiência da codificação. Também é proposta uma nova abordagem de codificação baseada na representação híbrida do campo de luz, explorando o uso combinado dos tipos de representação de micro-imagem e sub-imagem, em vez de usar cada representação individualmente. A fim de facilitar a rápida implantação da tecnologia de campo de luz, esta Tese também propõe abordagens escaláveis de codificação e representação que permitem uma compatibilidade adequada com monitores tradicionais (e.g., 2D, estereoscópicos ou multivista) e com futuros monitores de campo de luz, mantendo ao mesmo tempo uma alta eficiência de codificação. Além disso, o acesso aleatório de pontos de vista, permitindo melhorar a navegação no campo de luz e reduzir o atraso na descodificação, também é permitido com um equilíbrio flexível entre eficiência de codificação e acesso aleatório de pontos de vista

    Research and developments of distributed video coding

    Get PDF
    This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University.The recent developed Distributed Video Coding (DVC) is typically suitable for the applications such as wireless/wired video sensor network, mobile camera etc. where the traditional video coding standard is not feasible due to the constrained computation at the encoder. With DVC, the computational burden is moved from encoder to decoder. The compression efficiency is achieved via joint decoding at the decoder. The practical application of DVC is referred to Wyner-Ziv video coding (WZ) where the side information is available at the decoder to perform joint decoding. This join decoding inevitably causes a very complex decoder. In current WZ video coding issues, many of them emphasise how to improve the system coding performance but neglect the huge complexity caused at the decoder. The complexity of the decoder has direct influence to the system output. The beginning period of this research targets to optimise the decoder in pixel domain WZ video coding (PDWZ), while still achieves similar compression performance. More specifically, four issues are raised to optimise the input block size, the side information generation, the side information refinement process and the feedback channel respectively. The transform domain WZ video coding (TDWZ) has distinct superior performance to the normal PDWZ due to the exploitation in spatial direction during the encoding. However, since there is no motion estimation at the encoder in WZ video coding, the temporal correlation is not exploited at all at the encoder in all current WZ video coding issues. In the middle period of this research, the 3D DCT is adopted in the TDWZ to remove redundancy in both spatial and temporal direction thus to provide even higher coding performance. In the next step of this research, the performance of transform domain Distributed Multiview Video Coding (DMVC) is also investigated. Particularly, three types transform domain DMVC frameworks which are transform domain DMVC using TDWZ based 2D DCT, transform domain DMVC using TDWZ based on 3D DCT and transform domain residual DMVC using TDWZ based on 3D DCT are investigated respectively. One of the important applications of WZ coding principle is error-resilience. There have been several attempts to apply WZ error-resilient coding for current video coding standard e.g. H.264/AVC or MEPG 2. The final stage of this research is the design of WZ error-resilient scheme for wavelet based video codec. To balance the trade-off between error resilience ability and bandwidth consumption, the proposed scheme emphasises the protection of the Region of Interest (ROI) area. The efficiency of bandwidth utilisation is achieved by mutual efforts of WZ coding and sacrificing the quality of unimportant area. In summary, this research work contributed to achieves several advances in WZ video coding. First of all, it is targeting to build an efficient PDWZ with optimised decoder. Secondly, it aims to build an advanced TDWZ based on 3D DCT, which then is applied into multiview video coding to realise advanced transform domain DMVC. Finally, it aims to design an efficient error-resilient scheme for wavelet video codec, with which the trade-off between bandwidth consumption and error-resilience can be better balanced

    Robust density modelling using the student's t-distribution for human action recognition

    Full text link
    The extraction of human features from videos is often inaccurate and prone to outliers. Such outliers can severely affect density modelling when the Gaussian distribution is used as the model since it is highly sensitive to outliers. The Gaussian distribution is also often used as base component of graphical models for recognising human actions in the videos (hidden Markov model and others) and the presence of outliers can significantly affect the recognition accuracy. In contrast, the Student's t-distribution is more robust to outliers and can be exploited to improve the recognition rate in the presence of abnormal data. In this paper, we present an HMM which uses mixtures of t-distributions as observation probabilities and show how experiments over two well-known datasets (Weizmann, MuHAVi) reported a remarkable improvement in classification accuracy. © 2011 IEEE

    Architectures for Adaptive Low-Power Embedded Multimedia Systems

    Get PDF
    This Ph.D. thesis describes novel hardware/software architectures for adaptive low-power embedded multimedia systems. Novel techniques for run-time adaptive energy management are proposed, such that both HW & SW adapt together to react to the unpredictable scenarios. A complete power-aware H.264 video encoder was developed. Comparison with state-of-the-art demonstrates significant energy savings while meeting the performance constraint and keeping the video quality degradation unnoticeable

    End to end Multi-Objective Optimisation of H.264 and HEVC Codecs

    Get PDF
    All multimedia devices now incorporate video CODECs that comply with international video coding standards such as H.264 / MPEG4-AVC and the new High Efficiency Video Coding Standard (HEVC) otherwise known as H.265. Although the standard CODECs have been designed to include algorithms with optimal efficiency, large number of coding parameters can be used to fine tune their operation, within known constraints of for e.g., available computational power, bandwidth, consumer QoS requirements, etc. With large number of such parameters involved, determining which parameters will play a significant role in providing optimal quality of service within given constraints is a further challenge that needs to be met. Further how to select the values of the significant parameters so that the CODEC performs optimally under the given constraints is a further important question to be answered. This thesis proposes a framework that uses machine learning algorithms to model the performance of a video CODEC based on the significant coding parameters. Means of modelling both the Encoder and Decoder performance is proposed. We define objective functions that can be used to model the performance related properties of a CODEC, i.e., video quality, bit-rate and CPU time. We show that these objective functions can be practically utilised in video Encoder/Decoder designs, in particular in their performance optimisation within given operational and practical constraints. A Multi-objective Optimisation framework based on Genetic Algorithms is thus proposed to optimise the performance of a video codec. The framework is designed to jointly minimize the CPU Time, Bit-rate and to maximize the quality of the compressed video stream. The thesis presents the use of this framework in the performance modelling and multi-objective optimisation of the most widely used video coding standard in practice at present, H.264 and the latest video coding standard, H.265/HEVC. When a communication network is used to transmit video, performance related parameters of the communication channel will impact the end-to-end performance of the video CODEC. Network delays and packet loss will impact the quality of the video that is received at the decoder via the communication channel, i.e., even if a video CODEC is optimally configured network conditions will make the experience sub-optimal. Given the above the thesis proposes a design, integration and testing of a novel approach to simulating a wired network and the use of UDP protocol for the transmission of video data. This network is subsequently used to simulate the impact of packet loss and network delays on optimally coded video based on the framework previously proposed for the modelling and optimisation of video CODECs. The quality of received video under different levels of packet loss and network delay is simulated, concluding the impact on transmitted video based on their content and features
    corecore