243 research outputs found

    Edge-preserving depth-map coding using graph-based wavelets

    Final degree project carried out in collaboration with the University of Southern California. This thesis presents a new wavelet transform specifically designed for the coding of depth images which are used in view synthesis operations. Two basic properties of these images can be leveraged: first, errors in pixels located near the edges of objects have a greater perceptual impact on the synthesized view; second, they can be approximated as piecewise-planar signals. We make use of these facts to define a discrete wavelet transform using lifting that avoids filtering across edges. The filters are designed to fit the planar shape of the signal. This leads to an efficient representation of the image while preserving the sharpness of the edges. By preserving the edge information, we are able to improve the quality of the synthesized views, as compared to existing methods.
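To make the idea of lifting that avoids filtering across edges concrete, here is a minimal 1D sketch. It is not the thesis's graph-based transform: the function names, the `edges` convention (`edges[k]` is true when an edge separates samples `k` and `k + 1`), and the simple averaging predict/update filters are all assumptions chosen for illustration.

```python
def _neighbors(i, n, edges):
    """Adjacent indices of i that are not separated from i by an edge."""
    out = []
    if i - 1 >= 0 and not edges[i - 1]:
        out.append(i - 1)
    if i + 1 < n and not edges[i]:
        out.append(i + 1)
    return out


def forward_lifting(x, edges):
    """One lifting level: odd slots hold details, even slots hold smooth values."""
    n = len(x)
    c = [float(v) for v in x]
    # Predict: each odd sample minus the mean of its same-side even neighbors.
    for i in range(1, n, 2):
        nb = _neighbors(i, n, edges)
        if nb:
            c[i] -= sum(c[j] for j in nb) / len(nb)
    # Update: each even sample plus half the mean of its connected details.
    for i in range(0, n, 2):
        nb = _neighbors(i, n, edges)
        if nb:
            c[i] += sum(c[j] for j in nb) / (2 * len(nb))
    return c


def inverse_lifting(c, edges):
    """Undo the update step, then the predict step, using the same edge map."""
    n = len(c)
    x = list(c)
    for i in range(0, n, 2):
        nb = _neighbors(i, n, edges)
        if nb:
            x[i] -= sum(x[j] for j in nb) / (2 * len(nb))
    for i in range(1, n, 2):
        nb = _neighbors(i, n, edges)
        if nb:
            x[i] += sum(x[j] for j in nb) / len(nb)
    return x


# Toy depth row: two flat surfaces with a depth edge between samples 3 and 4.
depth = [10, 10, 10, 10, 50, 50, 50, 50]
edge_map = [False, False, False, True, False, False, False]
no_edges = [False] * 7

aware = forward_lifting(depth, edge_map)   # details at odd indices are all zero
blind = forward_lifting(depth, no_edges)   # detail at index 3 is -20.0
restored = inverse_lifting(aware, edge_map)
```

With the edge map, every detail coefficient of this piecewise-constant row vanishes. Ignoring the edge produces a large detail at the discontinuity, and it is coefficients like this that blur the edge once they are quantized. The decoder needs the same edge map to invert the transform.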

    Toward sparse and geometry adapted video approximations

    Video signals are sequences of natural images, where images are often modeled as piecewise-smooth signals. Hence, video can be seen as a 3D piecewise-smooth signal made of piecewise-smooth regions that move through time. Based on the piecewise-smooth model and on related theoretical work on the rate-distortion performance of wavelet and oracle-based coding schemes, one can better analyze the coding strategies that adaptive video codecs need to implement in order to be efficient. Efficient video representations for coding purposes require adaptive signal decompositions able to capture the structure and redundancy appearing in video signals. Adaptivity must allow signals to be modeled so that they are represented with the lowest possible coding cost. Video is a very structured signal with high geometric content. This includes temporal geometry (normally represented by motion information) as well as spatial geometry. Most past and present strategies used to represent video signals do not properly exploit its spatial geometry. As in the case of images, a very interesting approach seems to be the decomposition of video using large over-complete libraries of basis functions able to represent salient geometric features of the signal. In the framework of video, these features should model 2D geometric video components as well as their temporal evolution, forming spatio-temporal 3D geometric primitives. In this PhD dissertation, different aspects of the use of adaptivity in video representation are studied, with a view to exploiting both aspects of video: its piecewise nature and its geometry. The first part of this work studies the use of localized temporal adaptivity in subband video coding. This is done considering two transformation schemes used for video coding: 3D wavelet representations and motion compensated temporal filtering.
A theoretical R-D analysis as well as empirical results demonstrate how temporal adaptivity improves the coding performance of moving edges in 3D transform based video coding (without motion compensation). At the same time, adaptivity allows redundancy in non-moving video areas to be exploited equally well. The analogy between motion compensated video and 1D piecewise-smooth signals is studied as well. This motivates the introduction of local length adaptivity within frame-adaptive motion compensated lifted wavelet decompositions. This allows optimal rate-distortion performance when video motion trajectories are shorter than the transformation "Group Of Pictures", or when efficient motion compensation cannot be ensured. After studying temporal adaptivity, the second part of this thesis is dedicated to understanding the fundamentals of how temporal and spatial geometry can be jointly exploited. This work builds on some previous results that considered the representation of spatial geometry in video (but not temporal geometry, i.e., without motion). In order to obtain flexible and efficient (sparse) signal representations using redundant dictionaries, highly non-linear decomposition algorithms, like Matching Pursuit, are required. General signal representation using these techniques is still largely unexplored. For this reason, prior to the study of video representation, some aspects of non-linear decomposition algorithms and the efficient decomposition of images using Matching Pursuits and a geometric dictionary are investigated. Part of this investigation concerns the influence of using a priori models within non-linear approximation algorithms. Dictionaries with a high internal coherence have difficulty yielding optimally sparse signal representations when used with Matching Pursuits.
It is proved, theoretically and empirically, that inserting a priori models into this algorithm improves its capacity to obtain sparse signal approximations, mainly when coherent dictionaries are used. Another point discussed in this preliminary study on the use of Matching Pursuits concerns the approach used in this work for the decomposition of video frames and images. The technique proposed in this thesis improves on previous work, where the authors had to resort to sub-optimal Matching Pursuit strategies (using Genetic Algorithms), given the size of the function library. In this work the use of full search strategies is made possible, while approximation efficiency is significantly improved and computational complexity is reduced. Finally, a priori based Matching Pursuit geometric decompositions are investigated for geometric video representations. Regularity constraints are taken into account to recover the temporal evolution of spatial geometric signal components. The results obtained for coding and multi-modal (audio-visual) signal analysis clarify many unknowns and are promising, encouraging further research on the subject.
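For readers unfamiliar with the greedy decomposition referred to above, the following is a minimal sketch of plain Matching Pursuit with full search over a small dictionary. It does not include the a priori models or the geometric dictionary studied in the thesis. The helper names and the toy Haar-like atoms are assumptions made for illustration only.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(p * q for p, q in zip(a, b))


def normalize(v):
    """Scale a vector to unit Euclidean norm."""
    norm = math.sqrt(dot(v, v))
    return [p / norm for p in v]


def matching_pursuit(signal, atoms, n_iter):
    """Greedy decomposition: repeatedly pick the atom most correlated
    with the residual and subtract its projection.
    Atoms are assumed to have unit norm."""
    residual = [float(v) for v in signal]
    picks = []  # list of (atom index, coefficient)
    for _ in range(n_iter):
        corr = [dot(residual, a) for a in atoms]
        # Full search over the dictionary for the best-matching atom.
        k = max(range(len(atoms)), key=lambda i: abs(corr[i]))
        if corr[k] == 0.0:
            break
        picks.append((k, corr[k]))
        residual = [r - corr[k] * a for r, a in zip(residual, atoms[k])]
    return picks, residual


# Toy dictionary of four unit-norm atoms (hypothetical, Haar-like).
atoms = [
    normalize([1, 1, 0, 0]),
    normalize([0, 0, 1, 1]),
    normalize([1, -1, 0, 0]),
    normalize([0, 0, 1, -1]),
]
signal = [3, 1, 0, 0]
picks, residual = matching_pursuit(signal, atoms, n_iter=2)
# picks selects atom 0 first, then atom 2; the residual is numerically zero.
```

This toy dictionary is orthonormal, so two iterations recover the signal exactly. With a redundant, highly coherent dictionary the greedy choice can select atoms that are individually well correlated but jointly inefficient, which is the difficulty the abstract attributes to coherent dictionaries.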

    Multimedia Applications of the Wavelet Transform

    This dissertation investigates novel applications of the wavelet transform in the analysis and compression of audio, still images, and video. Most recently, some surveys have been published on the restoration of noisy audio signals. Based on these, we have developed a wavelet-based denoising program for audio signals that allows flexible parameter settings. The multiscale property of the wavelet transform can successfully be exploited for the detection of semantic structures in images: a comparison of the coefficients allows the extraction of a predominant structure. This idea forms the basis of our semiautomatic edge detection algorithm. Empirical evaluations and the resulting recommendations follow. In the context of the teleteaching project Virtual University of the Upper Rhine Valley (VIROR), many lectures were transmitted between remote locations. We thus encountered the problem of scalability of a video stream for different access bandwidths in the Internet. A substantial contribution of this dissertation is the introduction of the wavelet transform into hierarchical video coding and the recommendation of parameter settings based on empirical surveys. Furthermore, a prototype implementation demonstrates the feasibility in principle of a wavelet-based, nearly arbitrarily scalable application. Mathematical transformations constitute a commonly underestimated problem for students in their first semesters of study. Motivated by the VIROR project, we spent a considerable amount of time and effort on the exploration of approaches to enhance mathematical topics with multimedia; both the technical design and the didactic integration into the curriculum are discussed. In a large field trial on "traditional teaching versus multimedia-enhanced teaching", the objective knowledge gained by the students was measured. This allows us to objectively rate the efficiency of our teaching modules as positive.

    Robust density modelling using the student's t-distribution for human action recognition

    The extraction of human features from videos is often inaccurate and prone to outliers. Such outliers can severely affect density modelling when the Gaussian distribution is used as the model, since it is highly sensitive to outliers. The Gaussian distribution is also often used as the base component of graphical models for recognising human actions in videos (hidden Markov model and others), and the presence of outliers can significantly affect the recognition accuracy. In contrast, the Student's t-distribution is more robust to outliers and can be exploited to improve the recognition rate in the presence of abnormal data. In this paper, we present an HMM which uses mixtures of t-distributions as observation probabilities and show that experiments on two well-known datasets (Weizmann, MuHAVi) yielded a remarkable improvement in classification accuracy. © 2011 IEEE
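The robustness claim can be illustrated with a small univariate sketch that compares log-densities at a typical point and at an outlier. This is not the paper's HMM with mixtures of t-distributions. The function names and the choice of 3 degrees of freedom are assumptions for illustration. The weight function is the standard per-sample weight from EM fitting of a univariate t-distribution.

```python
import math


def gaussian_logpdf(x, mu=0.0, sigma=1.0):
    """Log-density of a univariate Gaussian."""
    z = (x - mu) / sigma
    return -0.5 * math.log(2.0 * math.pi) - math.log(sigma) - 0.5 * z * z


def student_t_logpdf(x, mu=0.0, sigma=1.0, nu=3.0):
    """Log-density of a univariate Student's t with location mu,
    scale sigma and nu degrees of freedom."""
    z = (x - mu) / sigma
    return (math.lgamma((nu + 1.0) / 2.0) - math.lgamma(nu / 2.0)
            - 0.5 * math.log(nu * math.pi) - math.log(sigma)
            - 0.5 * (nu + 1.0) * math.log1p(z * z / nu))


def t_em_weight(x, mu=0.0, sigma=1.0, nu=3.0):
    """Weight a sample receives when fitting a t-distribution by EM;
    samples far from mu are down-weighted."""
    z = (x - mu) / sigma
    return (nu + 1.0) / (nu + z * z)


typical, outlier = 0.0, 8.0
gap_gauss = gaussian_logpdf(typical) - gaussian_logpdf(outlier)
gap_t = student_t_logpdf(typical) - student_t_logpdf(outlier)
# gap_gauss is much larger than gap_t: the Gaussian penalizes the outlier
# quadratically, the t-distribution only logarithmically.
```

Because the t log-density falls off logarithmically in the tails, a single outlying feature vector pulls the fitted parameters far less than it would under a Gaussian. The EM weight makes the same point from the estimation side.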