    Ongoing research on Distributed Video Coding (DVC) focuses on flexibility and increased rate-distortion (RD) performance. Besides Slepian-Wolf (SW) coding and coder control, high-quality side information (SI) is essential for high RD performance. The SI is typically generated by temporal inter-/extrapolation, for which fast motion is still a challenge. Two global-motion-guided adaptive temporal inter-/extrapolation (GTIE/GpTIE) schemes are proposed. They account for fast camera motion through global motion estimation and subsequent refinement. The issue of occlusion and revelation at the frame border is solved by an adaptive temporal inter-/extrapolation method. Simulation results show an RD performance increase of up to 3.1 dB

    Distributed Video Coding for Multiview and Video-plus-depth Coding

    Fusion of Global and Local Motion Estimation Using Foreground Objects for Distributed Video Coding

    The side information in distributed video coding is estimated from the available decoded frames and exploited for the decoding and reconstruction of other frames. The quality of the side information has a strong impact on the performance of distributed video coding. Here we propose a new approach that combines global and local side information to improve coding performance. Since the background pixels in a frame are assigned to global estimation and the foreground objects to local estimation, the foreground objects in the side information must be estimated from the backward and forward foreground objects. The background pixels are taken directly from the global side information. Specifically, elastic curves and local motion compensation are used to generate the foreground object masks in the side information. Experimental results show that, in terms of rate-distortion performance, the proposed approach achieves a PSNR improvement of up to 1.39 dB for a GOP size of 2, and up to 4.73 dB for larger GOP sizes, with respect to the reference DISCOVER codec. A. ABOU-ELAILAH, F. DUFAUX, M. CAGNAZZO, and B. PESQUET-POPESCU are with the Signal and Image Processin
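    The fusion step described above (background from the global side information, foreground from the local one) can be illustrated with a minimal sketch. It assumes the foreground mask is already available; the abstract's elastic-curve mask generation is not reproduced here, and the function name `fuse_side_information` and the nested-list frame layout are hypothetical choices made only for illustration.

```python
def fuse_side_information(global_si, local_si, fg_mask):
    """Combine two side-information frames pixel by pixel.

    global_si, local_si: frames as lists of rows of pixel values (same shape).
    fg_mask: same shape, truthy where a pixel belongs to a foreground object.
    Foreground pixels come from local_si, background pixels from global_si.
    """
    fused = []
    for g_row, l_row, m_row in zip(global_si, local_si, fg_mask):
        fused.append([l if m else g for g, l, m in zip(g_row, l_row, m_row)])
    return fused


# Toy 2x3 frames: the mask marks a single foreground pixel in each row.
global_si = [[10, 10, 10], [10, 10, 10]]
local_si = [[50, 60, 70], [80, 90, 99]]
fg_mask = [[0, 1, 0], [0, 0, 1]]
fused = fuse_side_information(global_si, local_si, fg_mask)
```

    In this toy case only the two masked pixels are taken from the local estimate; everything else keeps the global estimate.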

    Distributed Video Coding: Iterative Improvements

    Distributed Video Coding (DVC) is suited to applications where conventional video coding is not feasible because of its inherently high-complexity encoding. Examples include video surveillance using wireless/wired video sensor networks and applications using mobile cameras. With DVC, the complexity is shifted from the encoder to the decoder. The practical application of DVC is referred to as Wyner-Ziv (WZ) video coding, where an estimate of the original frame, called "side information", is generated using motion compensation at the decoder. Compression is achieved by sending only the extra information needed to correct this estimate. An error-correcting code is used under the assumption that the estimate is a noisy version of the original frame, and the rate needed is a certain amount of parity bits. The side information is assumed to have become available at the decoder through a virtual channel. Due to the limitations of the compensation method, the predicted frame, or side information, is expected to have varying degrees of success. These limitations stem from location-specific non-stationary estimation noise. To cope with this, conventional video coders such as MPEG partition the frame and allocate the optimum coder to each partition, thereby achieving better rate-distortion performance. This has not, however, been used in DVC because it increases the encoder complexity. This work proposes partitioning the considered frame into many coding units (regions), each of which is encoded differently. The partitioning is, however, done at the decoder while generating the side information, and the region map is sent to the encoder at very little rate penalty. The partitioning allows appropriate DVC coding parameters (virtual channel, rate, and quantizer) to be allocated to each region. 
The resulting region map is compressed with a quadtree algorithm and communicated to the encoder via the feedback channel. Rate control in DVC is performed by channel coding techniques (turbo codes, LDPC, etc.). The performance of the channel code depends heavily on the accuracy of the virtual channel model that models the estimation error for each region. In this work, a turbo code is used, and an adaptive WZ DVC is designed in both the transform domain and the pixel domain. Transform Domain Wyner-Ziv (TDWZ) video coding performs distinctly better than the normal Pixel Domain Wyner-Ziv (PDWZ) coding, since it exploits the spatial redundancy during encoding. The performance evaluations show that the proposed system is superior to existing distributed video coding solutions. Although the proposed system requires extra bits representing the "region map" to be transmitted, the rate gain is still noticeable, and it outperforms the state-of-the-art frame-based DVC by 0.6-1.9 dB. The feedback channel (FC) serves to adapt the bit rate to the changing statistics between the side information and the frame to be encoded. In the unidirectional scenario, the encoder must perform the rate control. To correctly estimate the rate, the encoder must calculate typical side information. However, the rate cannot be calculated exactly at the encoder; it can only be estimated. This work also proposes a feedback-free region-based adaptive DVC solution in the pixel domain, based on a machine learning approach to estimate the side information. Although the performance evaluations show a rate penalty, it is acceptable considering the simplicity of the proposed algorithm.
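    To illustrate why a quadtree keeps the region-map overhead small, the following is a minimal sketch of a generic quadtree encoder and decoder for a square label map. It is not the thesis's actual algorithm: the function names, the assumption of a power-of-two side length, and the nested-list tree representation (an integer for a uniform block, a list of four children for a split block) are all illustrative assumptions.

```python
def quadtree_encode(region_map, top=0, left=0, size=None):
    """Encode a square, power-of-two-sized label map as a nested quadtree.

    A uniform block becomes its integer label; a mixed block becomes a list
    of four children in the order NW, NE, SW, SE.
    """
    if size is None:
        size = len(region_map)
    first = region_map[top][left]
    uniform = all(
        region_map[r][c] == first
        for r in range(top, top + size)
        for c in range(left, left + size)
    )
    if uniform:
        return first
    half = size // 2
    return [
        quadtree_encode(region_map, top + dr, left + dc, half)
        for dr in (0, half)
        for dc in (0, half)
    ]


def quadtree_decode(node, size):
    """Rebuild the size x size label map from a quadtree node."""
    if not isinstance(node, list):
        return [[node] * size for _ in range(size)]
    half = size // 2
    nw, ne, sw, se = (quadtree_decode(child, half) for child in node)
    upper = [a + b for a, b in zip(nw, ne)]
    lower = [a + b for a, b in zip(sw, se)]
    return upper + lower


def count_leaves(node):
    """Number of uniform blocks that would have to be signalled."""
    if not isinstance(node, list):
        return 1
    return sum(count_leaves(child) for child in node)


# Toy 4x4 map with two region labels.
region_map = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
]
tree = quadtree_encode(region_map)
leaves = count_leaves(tree)
restored = quadtree_decode(tree, len(region_map))
```

    In this toy map, three of the four quadrants are uniform, so 7 leaf labels describe all 16 entries. The saving grows with larger homogeneous regions.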

    A Contribution to Pixel-Based Distributed Video Coding: Side Information Generation, WZ Coding, and Flexible Decoding

    Modern application scenarios, such as the individual transmission of video data between mobile devices, pose new challenges for the video transmission system. A particular focus lies on low video encoder complexity. This requirement can be met with Distributed Video Coding. The present work focuses on very low encoder complexity as well as on increasing the performance and improving the flexibility of the decoding process. One of the main contributions of the work concerns improving the side information quality through temporal interpolation

    Highly efficient low-level feature extraction for video representation and retrieval.

    Witnessing the omnipresence of digital video media, the research community has raised the question of its meaningful use and management. Stored in immense multimedia databases, digital videos need to be retrieved and structured in an intelligent way, relying on the content and the rich semantics involved. Current Content Based Video Indexing and Retrieval systems face the problem of the semantic gap between the simplicity of the available visual features and the richness of user semantics. This work focuses on the issues of efficiency and scalability in video indexing and retrieval to facilitate a video representation model capable of semantic annotation. A highly efficient algorithm for temporal analysis and key-frame extraction is developed. It is based on the prediction information extracted directly from the compressed domain features and on robust scalable analysis in the temporal domain. Furthermore, a hierarchical quantisation of the colour features in the descriptor space is presented. Derived from the extracted set of low-level features, a video representation model that enables semantic annotation and contextual genre classification is designed. Results demonstrate the efficiency and robustness of the temporal analysis algorithm, which runs in real time while maintaining high precision and recall in the detection task. Adaptive key-frame extraction and summarisation achieve a good overview of the visual content, while the colour quantisation algorithm efficiently creates a hierarchical set of descriptors. Finally, the video representation model, supported by the genre classification algorithm, achieves excellent results in an automatic annotation system by linking the video clips with a limited lexicon of related keywords

    Intra-Key-Frame Coding and Side Information Generation Schemes in Distributed Video Coding

    This thesis proposes improved schemes for intra-key-frame coding and side information (SI) generation in a distributed video coding (DVC) framework. DVC developments in the last few years show that schemes put more emphasis on intra-frame coding and on generating better-quality SI. The two are interrelated, since SI generation depends on the quality of the decoded key frames. Hence, superior-quality key frames produced by intra-key-frame coding are in turn used to generate good-quality SI frames. As a result, DVC needs fewer parity bits to reconstruct the WZ frames at the decoder. With this in mind, we propose two schemes for intra-key-frame coding, namely (a) Burrows-Wheeler Transform based H.264/AVC (Intra) intra-frame coding (BWT-H.264/AVC (Intra)) and (b) dictionary-based H.264/AVC (Intra) intra-frame coding using orthogonal matching pursuit (DBOMP-H.264/AVC (Intra)). The BWT-H.264/AVC (Intra) scheme is a modified version of the H.264/AVC (Intra) scheme in which a regularized bit stream is generated prior to compression. This scheme results in higher compression efficiency as well as high-quality decoded key frames. The DBOMP-H.264/AVC (Intra) scheme is based on an adaptive dictionary and H.264/AVC (Intra) intra-frame coding. The traditional transform is replaced with a dictionary trained with the K-singular value decomposition (K-SVD) algorithm. The dictionary elements are coded using orthogonal matching pursuit (OMP). Further, two side information generation schemes are suggested, namely (a) multilayer perceptron based side information generation (MLP-SI) and (b) multivariable support vector regression based side information generation (MSVR-SI). The MLP-SI scheme uses a multilayer perceptron (MLP) to estimate SI frames from the decoded key frames block by block. The network is trained offline using training patterns from different frames collected from standard video sequences. 
The MSVR-SI scheme uses an optimized multivariable support vector regression (M-SVR) to generate SI frames from decoded key frames block by block. As with the MLP, the M-SVR is trained offline with training patterns known a priori. Both the intra-key-frame coding and the SI generation schemes are embedded in the Stanford-based DVC architecture and studied individually to compare their performance with competing schemes. Visual as well as quantitative evaluations are made to show the efficacy of the schemes. To exploit the usefulness of the intra-frame coding schemes in SI generation, four hybrid schemes are formulated by combining the suggested schemes as follows: (a) the BWT-MLP scheme, which uses the BWT-H.264/AVC (Intra) intra-frame coding scheme and the MLP-SI side information generation scheme; (b) the BWT-MSVR scheme, which uses BWT-H.264/AVC (Intra) for intra-frame coding followed by MSVR-SI side information generation; (c) the DBOMP-MLP scheme, which combines DBOMP-H.264/AVC (Intra) intra-frame coding with MLP-SI side information generation; and (d) the DBOMP-MSVR scheme, which combines DBOMP-H.264/AVC (Intra) intra-frame coding with MSVR-SI side information generation. The hybrid schemes are also incorporated into the Stanford-based DVC architecture, and simulations are carried out on standard video sequences. Performance is analysed with respect to overall rate distortion, number of requests per SI frame, temporal evaluation, and decoding time requirement to derive an overall conclusion
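    For readers unfamiliar with the Burrows-Wheeler Transform named above, the following is a minimal textbook sketch on a character string. It does not show how the thesis integrates the transform into H.264/AVC (Intra), which the abstract does not describe; the function names and the use of "$" as an end-of-text sentinel are assumptions made for illustration.

```python
def bwt_forward(text, sentinel="$"):
    """Return the last column of the sorted rotations of text + sentinel."""
    if sentinel in text:
        raise ValueError("input must not contain the sentinel character")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def bwt_inverse(last_column, sentinel="$"):
    """Invert the transform by repeatedly prepending and re-sorting columns."""
    table = [""] * len(last_column)
    for _ in range(len(last_column)):
        table = sorted(ch + row for ch, row in zip(last_column, table))
    for row in table:
        if row.endswith(sentinel):
            return row[:-1]
    raise ValueError("sentinel not found in transformed input")


transformed = bwt_forward("banana")
recovered = bwt_inverse(transformed)
```

    The transform is lossless and tends to group identical symbols into runs, which is what makes the reordered stream easier for a subsequent entropy coder to compress. The naive inverse shown here favours readability over speed.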

    Video modeling via implicit motion representations

    Video modeling refers to the development of analytical representations for explaining the intensity distribution in video signals. Based on the analytical representation, we can develop algorithms for accomplishing particular video-related tasks. Video modeling therefore provides a foundation for bridging video data and related tasks. Although many video models have been proposed in the past decades, the rise of new applications calls for more efficient and accurate video modeling approaches. Most existing video modeling approaches are based on explicit motion representations, where motion information is explicitly expressed by correspondence-based representations (i.e., motion velocity or displacement). Although this is conceptually simple, the limitations of those representations and the suboptimality of motion estimation techniques can degrade such video modeling approaches, especially when handling complex motion or non-ideally observed video data. In this thesis, we propose to investigate video modeling without explicit motion representation. Motion information is implicitly embedded into the spatio-temporal dependency among pixels or patches instead of being explicitly described by motion vectors. Firstly, we propose a parametric model based on spatio-temporal adaptive localized learning (STALL). We formulate video modeling as a linear regression problem in which motion information is embedded within the regression coefficients. The coefficients are adaptively learned within a local space-time window based on the LMMSE criterion. By incorporating spatio-temporal resampling and a Bayesian fusion scheme, we can enhance the modeling capability of STALL on more general videos. Under the STALL framework, we can develop video processing algorithms for a variety of applications by adjusting model parameters (i.e., the size and topology of the model support and training window). We apply STALL to three video processing problems. 
The simulation results show that motion information can be efficiently exploited by our implicit motion representation, and that the resampling and fusion do help to enhance the modeling capability of STALL. Secondly, we propose a nonparametric video modeling approach that does not depend on explicit motion estimation. Assuming the video sequence is composed of many overlapping space-time patches, we propose to embed motion-related information into the relationships among video patches and develop a generic sparsity-based prior for typical video sequences. First, we extend block matching to more general kNN-based patch clustering, which provides an implicit and distributed representation of motion information. We propose to enforce the sparsity constraint on a higher-dimensional data array, which is generated by packing the patches in the similar-patch set. Then we solve the inference problem by iteratively updating the kNN array and the wanted signal. Finally, we present a Bayesian fusion approach to fuse multiple-hypothesis inferences. Simulation results in video error concealment, denoising, and deartifacting are reported to demonstrate its modeling capability. Finally, we summarize the two proposed video modeling approaches. We also point out the perspectives of implicit motion representations in applications ranging from low-level to high-level problems
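    The kNN-based patch clustering mentioned above can be sketched as an exhaustive search for the k space-time positions whose patches are most similar to a reference patch. This is only an illustration of the general idea: the sum of squared differences as similarity measure, the exhaustive search, and the names `extract_patch`, `ssd`, and `knn_patches` are assumptions, and the sparsity prior and iterative inference of the thesis are not reproduced.

```python
def extract_patch(frame, top, left, size):
    """Flatten the size x size patch whose top-left corner is (top, left)."""
    return [
        frame[r][c]
        for r in range(top, top + size)
        for c in range(left, left + size)
    ]


def ssd(a, b):
    """Sum of squared differences between two flattened patches."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_patches(frames, ref, size, k):
    """Return the k best matches as (distance, frame index, top, left).

    frames: list of frames, each a list of rows of pixel values.
    ref: (frame index, top, left) of the reference patch.
    The reference patch itself is included, with distance 0.
    """
    t0, r0, c0 = ref
    target = extract_patch(frames[t0], r0, c0, size)
    candidates = []
    for t, frame in enumerate(frames):
        rows, cols = len(frame), len(frame[0])
        for r in range(rows - size + 1):
            for c in range(cols - size + 1):
                distance = ssd(target, extract_patch(frame, r, c, size))
                candidates.append((distance, t, r, c))
    candidates.sort()
    return candidates[:k]


def make_frame(shift):
    """4x6 black frame with a textured 2x2 block shifted right by `shift`."""
    frame = [[0] * 6 for _ in range(4)]
    block = [[9, 8], [7, 6]]
    for dr in range(2):
        for dc in range(2):
            frame[1 + dr][1 + shift + dc] = block[dr][dc]
    return frame


frames = [make_frame(shift) for shift in range(3)]
matches = knn_patches(frames, ref=(0, 1, 1), size=2, k=3)
```

    In this toy sequence the block moves one pixel to the right per frame, and the three nearest patches trace that trajectory without any motion vector being estimated explicitly.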