
    Selected topics on distributed video coding

    Distributed Video Coding (DVC) is a new paradigm for video compression based on the information-theoretic results of Slepian and Wolf (SW), and Wyner and Ziv (WZ). While conventional coding has a rigid complexity allocation, as most of the complex tasks are performed at the encoder side, DVC enables a flexible complexity allocation between the encoder and the decoder. The most novel and interesting case is low complexity encoding and complex decoding, which is the opposite of conventional coding. While the latter is suitable for applications where the cost of the decoder is more critical than that of the encoder, DVC opens the door to a new range of applications where low complexity encoding is required and the decoder's complexity is not critical. This is of particular interest given the spread of small, battery-powered multimedia mobile devices in daily life. Further, since DVC operates as a reversed-complexity scheme when compared to conventional coding, DVC also enables the interesting scenario of low complexity encoding and decoding between two ends by transcoding between DVC and conventional coding. More specifically, low complexity encoding is possible by DVC at one end. Then, the resulting stream is decoded and conventionally re-encoded to enable low complexity decoding at the other end. Multiview video is attractive for a wide range of applications such as free viewpoint television, which is a system that allows viewing the scene from a viewpoint chosen by the viewer. Moreover, multiview can be beneficial for monitoring purposes in video surveillance. The increased use of multiview video systems is mainly due to the improvements in video technology and the reduced cost of cameras. While a multiview conventional codec will try to exploit the correlation among the different cameras at the encoder side, DVC allows for separate encoding of correlated video sources.
Therefore, DVC requires no communication between the cameras in a multiview scenario. This is an advantage since communication is time consuming (i.e. more delay) and requires complex networking. Another appealing feature of DVC is the fact that it is based on a statistical framework. Moreover, DVC behaves as a natural joint source-channel coding solution. This results in an improved error resilience performance when compared to conventional coding. Further, DVC-based scalable codecs do not require a deterministic knowledge of the lower layers. In other words, the enhancement layers are completely independent from the base layer codec. This is called the codec-independent scalability feature, which offers a high flexibility in the way the various layers are distributed in a network. This thesis addresses the following topics: First, the theoretical foundations of DVC as well as the practical DVC scheme used in this research are presented. The potential applications for DVC are also outlined. DVC-based schemes use conventional coding to compress parts of the data, while the rest is compressed in a distributed fashion. Thus, different conventional codecs are studied in this research as they are compared in terms of compression efficiency for a rich set of sequences. This includes fine tuning the compression parameters such that the best performance is achieved for each codec. Further, DVC tools for improved Side Information (SI) and Error Concealment (EC) are introduced for monoview DVC using a partially decoded frame. The improved SI results in a significant gain in reconstruction quality for video with high activity and motion. This is done by re-estimating the erroneous motion vectors using the partially decoded frame to improve the SI quality. The latter is then used to enhance the reconstruction of the finally decoded frame. 
Further, the introduced spatio-temporal EC improves the quality of decoded video in the case of erroneously received packets, outperforming both spatial and temporal EC. Moreover, it also outperforms error-concealed conventional coding in different modes. Then, multiview DVC is studied in terms of SI generation, which differentiates it from the monoview case. More specifically, different multiview prediction techniques for SI generation are described and compared in terms of prediction quality, complexity and compression efficiency. Further, a technique for iterative multiview SI is introduced, where the final SI is used in an enhanced reconstruction process. The iterative SI outperforms the other SI generation techniques, especially for high motion video content. Finally, techniques for fusing temporal and inter-view side information are introduced as well, which improve the performance of multiview DVC over monoview coding. DVC is also used to enable scalability for image and video coding. Since DVC is based on a statistical framework, the base and enhancement layers are completely independent, which is an interesting property called codec-independent scalability. Moreover, the introduced DVC scalable schemes show good robustness to errors, as the quality of decoded video decreases steadily as the error rate increases. On the other hand, conventional coding exhibits a cliff effect, as the performance drops dramatically after a certain error rate value. Further, the issue of privacy protection is addressed for DVC by transform domain scrambling, which is used to alter regions of interest in video such that the scene is still understood and privacy is preserved as well. The proposed scrambling techniques are shown to provide a good level of security without impairing the performance of the DVC scheme when compared to the one without scrambling. This is particularly attractive for video surveillance scenarios, which are among the most promising applications for DVC.
Finally, a practical DVC demonstrator built during this research is described, where the main requirements as well as the observed limitations are presented. Furthermore, it is set up to be as close as possible to a complete real application scenario. This shows that it is actually possible to implement a complete end-to-end practical DVC system relying only on realistic assumptions. Even though DVC is, for the moment, inferior to state-of-the-art conventional coding in terms of compression efficiency, the strengths of DVC reside in its good error resilience properties and the codec-independent scalability feature. Therefore, DVC offers promising possibilities for video compression where transmission over error-prone environments is required, as it significantly outperforms conventional coding in this case.

    Towards a High Quality Real-Time Graphics Pipeline

    Modern graphics hardware pipelines create photorealistic images with high geometric complexity in real time. The quality is constantly improving and advanced techniques from feature film visual effects, such as high dynamic range images and support for higher-order surface primitives, have recently been adopted. Visual effect techniques have large computational costs and significant memory bandwidth usage. In this thesis, we identify three problem areas and propose new algorithms that increase the performance of a set of computer graphics techniques. Our main focus is on efficient algorithms for the real-time graphics pipeline, but parts of our research are equally applicable to offline rendering. Our first focus is texture compression, which is a technique to reduce the memory bandwidth usage. The core idea is to store images in small compressed blocks which are sent over the memory bus and are decompressed on-the-fly when accessed. We present compression algorithms for two types of texture formats. High dynamic range images capture environment lighting with luminance differences over a wide intensity range. Normal maps store perturbation vectors for local surface normals, and give the illusion of high geometric surface detail. Our compression formats are tailored to these texture types and have compression ratios of 6:1, high visual fidelity, and low-cost decompression logic. Our second focus is tessellation culling. Culling is a commonly used technique in computer graphics for removing work that does not contribute to the final image, such as completely hidden geometry. By discarding rendering primitives from further processing, substantial arithmetic computations and memory bandwidth can be saved. Modern graphics processing units include flexible tessellation stages, where rendering primitives are subdivided for increased geometric detail. Images with highly detailed models can be synthesized, but the incurred cost is significant. 
We have devised a simple remapping technique that allows for better tessellation distribution in screen space. Furthermore, we present programmable tessellation culling, where bounding volumes for displaced geometry are computed and used to conservatively test if a primitive can be discarded before tessellation. We introduce a general tessellation culling framework, and an optimized algorithm for rendering of displaced Bézier patches, which is expected to be a common use case for graphics hardware tessellation. Our third and final focus is forward-looking, and relates to efficient algorithms for stochastic rasterization, a rendering technique where camera effects such as depth of field and motion blur can be faithfully simulated. We extend a graphics pipeline with stochastic rasterization in spatio-temporal space and show that stochastic motion blur can be rendered with rather modest pipeline modifications. Furthermore, backface culling algorithms for motion blur and depth of field rendering are presented, which are directly applicable to stochastic rasterization. Hopefully, our work in this field brings us closer to high quality real-time stochastic rendering.

    Image Registration Workshop Proceedings

    Automatic image registration has often been considered a preliminary step for higher-level processing, such as object recognition or data fusion. But with the unprecedented amounts of data which are being and will continue to be generated by newly developed sensors, automatic image registration has itself become an important research topic. This workshop presents a collection of very high quality work which has been grouped in four main areas: (1) theoretical aspects of image registration; (2) applications to satellite imagery; (3) applications to medical imagery; and (4) image registration for computer vision research.

    Multiresolution image models and estimation techniques


    4D (3D Dynamic) statistical models of conversational expressions and the synthesis of highly-realistic 4D facial expression sequences

    In this thesis, a novel approach for modelling 4D (3D Dynamic) conversational interactions and synthesising highly-realistic expression sequences is described. To achieve these goals, a fully-automatic, fast, and robust pre-processing pipeline was developed, along with an approach for tracking and inter-subject registering 3D sequences (4D data). A method for modelling and representing sequences as single entities is also introduced. These sequences can be manipulated and used for synthesising new expression sequences. Classification experiments and perceptual studies were performed to validate the methods and models developed in this work. To achieve the goals described above, a 4D database of natural, synced, dyadic conversations was captured. This database is the first of its kind in the world. Another contribution of this thesis is the development of a novel method for modelling conversational interactions. Our approach takes into account the time-sequential nature of the interactions, and encompasses the characteristics of each expression in an interaction, as well as information about the interaction itself. Classification experiments were performed to evaluate the quality of our tracking, inter-subject registration, and modelling methods. To evaluate our ability to model, manipulate, and synthesise new expression sequences, we conducted perceptual experiments. For these perceptual studies, we manipulated modelled sequences by modifying their amplitudes, and had human observers evaluate the level of expression realism and image quality. To evaluate our coupled modelling approach for conversational facial expression interactions, we performed a classification experiment that differentiated predicted frontchannel and backchannel sequences, using the original sequences in the training set. We also used the predicted backchannel sequences in a perceptual study in which human observers rated the level of similarity of the predicted and original sequences. 
The results of these experiments support our methods and our claim that we can produce highly-realistic 4D expression sequences that compete with state-of-the-art methods.

    Adaptively subsampled image coding with warped polynomials


    Object Recognition

    Vision-based object recognition tasks are very familiar in our everyday activities, such as keeping a car in the correct lane while driving. We do these tasks effortlessly in real time. In recent decades, with the advancement of computer technology, researchers and application developers have been trying to mimic the human capability of visual recognition. Such a capability would allow machines to free humans from boring or dangerous jobs.