
    Matching pursuits video coding: dictionaries and fast implementation


    On unifying sparsity and geometry for image-based 3D scene representation

    Demand has emerged for next-generation visual technologies that go beyond conventional 2D imaging. Such technologies should capture and communicate all perceptually relevant three-dimensional information about an environment to a distant observer, providing a satisfying, immersive experience. Camera networks offer a low-cost solution to the acquisition of 3D visual information, by capturing multi-view images from different viewpoints. However, the camera's representation of the data is not ideal for common tasks such as data compression or 3D scene analysis, as it does not make the 3D scene geometry explicit. Image-based scene representations fundamentally require a multi-view image model that facilitates extraction of the underlying geometrical relationships between the cameras and scene components. Developing new, efficient multi-view image models is thus one of the major challenges in image-based 3D scene representation. This dissertation focuses on defining and exploiting a new method for multi-view image representation from which the 3D geometry information is easily extractable and which is additionally highly compressible. The method is based on sparse image representation using an overcomplete dictionary of geometric features, where a single image is represented as a linear combination of a few fundamental image structure features (edges, for example). We construct the dictionary by applying a unitary operator to an analytic function, which introduces a composition of geometric transforms (translations, rotations, and anisotropic scaling) to that function. The advantage of this approach is that the features across multiple views can be related with a single composition of transforms. We then establish a connection between image components and scene geometry by defining the transforms that satisfy the multi-view geometry constraint, and obtain a new geometric multi-view correlation model.
We first address the construction of dictionaries for images acquired by omnidirectional cameras, which are particularly convenient for scene representation due to their wide field of view. Since most omnidirectional images can be uniquely mapped to spherical images, we form a dictionary by applying motions on the sphere, rotations, and anisotropic scaling to a function that lives on the sphere. We have used this dictionary and a sparse approximation algorithm, Matching Pursuit, for compression of omnidirectional images, and additionally for coding 3D objects represented as spherical signals. Both methods offer better rate-distortion performance than state-of-the-art schemes at low bit rates. The novel multi-view representation method and the dictionary on the sphere are then exploited for the design of a distributed coding method for multi-view omnidirectional images. In a distributed scenario, cameras compress acquired images without communicating with each other. Using a reliable model of correlation between views, distributed coding can achieve higher compression ratios than independent compression of each image. However, the lack of a proper model has been an obstacle for distributed coding in camera networks for many years. We propose to use our geometric correlation model for distributed multi-view image coding with side information. The encoder employs a coset coding strategy, developed by partitioning the dictionary based on atom shape similarity and multi-view geometry constraints. Our method results in significant rate savings compared to independent coding. An additional contribution of the proposed correlation model is that it gives information about the scene geometry, leading to a new camera pose estimation method using an extremely small amount of data from each camera. Finally, we develop a method for learning stereo visual dictionaries based on the new multi-view image model.
Although dictionary learning for still images has received a lot of attention recently, dictionary learning for stereo images has been investigated only rarely. Our method maximizes the likelihood that a set of natural stereo images is efficiently represented with the selected stereo dictionaries, where the multi-view geometry constraint is included in the probabilistic modeling. Experimental results demonstrate that including the geometric constraints in learning leads to stereo dictionaries with better distributed stereo matching and approximation properties than randomly selected dictionaries. We show that learning dictionaries for optimal scene representation based on the novel correlation model improves camera pose estimation and can be beneficial for distributed coding.
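
As an illustration of the sparse-approximation machinery this dissertation builds on, the following is a minimal Matching Pursuit sketch: a greedy loop that repeatedly picks the dictionary atom most correlated with the current residual. The random dictionary and toy signal are stand-ins for the geometric dictionaries described in the abstract.

```python
import numpy as np

def matching_pursuit(signal, D, n_iter):
    """Greedy sparse approximation: repeatedly pick the (unit-norm)
    dictionary atom most correlated with the current residual."""
    residual = signal.astype(float).copy()
    coeffs = np.zeros(D.shape[1])
    for _ in range(n_iter):
        corr = D.T @ residual
        k = int(np.argmax(np.abs(corr)))
        coeffs[k] += corr[k]
        residual -= corr[k] * D[:, k]
    return coeffs, residual

# Toy overcomplete dictionary: 8-dimensional signals, 32 random unit-norm atoms.
rng = np.random.default_rng(0)
D = rng.standard_normal((8, 32))
D /= np.linalg.norm(D, axis=0)
x = 2.0 * D[:, 3] - 1.5 * D[:, 17]            # signal that is 2-sparse in D
coeffs, residual = matching_pursuit(x, D, n_iter=10)
print(np.linalg.norm(residual))               # residual shrinks toward zero
```

Because the residual norm is non-increasing at every step, a few iterations suffice to capture a signal that is truly sparse in the dictionary.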

    Efficient compression of motion compensated residuals

    EThOS - Electronic Theses Online Service, United Kingdom

    Joint coding/decoding techniques and diversity techniques for video and HTML transmission over wireless point/multipoint: a survey

    I. Introduction. The concomitant development of the Internet, which offers its users ever richer and more sophisticated content (from HTML (HyperText Markup Language) files to multimedia applications), and of the wireless systems and handheld devices that integrate them, has progressively convinced a large share of the public of the value of being always connected. Still, constraints of heterogeneity, reliability, quality, and delay over the transmission channels are generally imposed to fulfill the requirements of these new needs and their corresponding economic goals. This poses a range of theoretical and practical challenges for today's digital communications community. This paper presents a survey of the existing techniques for HTML and video stream transmission over error-prone or lossy channels. In particular, existing techniques for joint source and channel coding and decoding for multimedia or HTML applications are surveyed, as well as the related problems of streaming and downloading files over a mobile IP link. Finally, various diversity techniques that can be considered for such links, from antenna diversity to coding diversity, are presented. Public enthusiasm for wireless multimedia applications has grown steadily since the development of the Internet. Constraints of transmission-channel heterogeneity, reliability, quality, and delay are generally imposed to satisfy these new application needs, which carry significant economic stakes. A number of practical and theoretical challenges remain open to researchers in the digital communications community, and it is in this context that the present survey is set.
This article presents, on the one hand, a state of the art of the main joint source-channel coding and decoding techniques developed in the literature for multimedia applications such as downloading and streaming content over a mobile IP link. It first recalls fundamental notions of digital communications, namely source coding, channel coding, Shannon's theorems, and their main limitations. The joint coding/decoding techniques presented here essentially concern source-coding schemes involving variable-length codes (VLC), notably Huffman codes, arithmetic codes, and universal entropy codes of the Lempel-Ziv (LZ) type. Addressing the problem of transmitting data (HyperText Markup Language (HTML) and video) over a wireless link, the article presents, on the other hand, an overview of diversity techniques of varying complexity, leading up to the new systems with multiple transmit and receive antennas.
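
One reason joint source/channel techniques matter for variable-length codes is that a single bit error desynchronizes the decoder, corrupting symbols long after the error itself. A toy illustration (the prefix code and message are invented for the example, not taken from the survey):

```python
# A tiny prefix-free (Huffman-style) code and the effect of one bit error.
code = {"a": "0", "b": "10", "c": "110", "d": "111"}
decode_table = {v: k for k, v in code.items()}

def encode(msg):
    return "".join(code[s] for s in msg)

def decode(bits):
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in decode_table:
            out.append(decode_table[buf])
            buf = ""
    return "".join(out)

msg = "abacad"
bits = encode(msg)
assert decode(bits) == msg                                 # clean channel: exact recovery
corrupted = bits[:2] + ("1" if bits[2] == "0" else "0") + bits[3:]  # flip one bit
print(decode(corrupted))    # symbols after the error are corrupted (decoder desynchronizes)
```

With fixed-length codes a flipped bit damages one symbol; with VLCs the parse boundaries shift, which is exactly the failure mode that resynchronizing and joint decoding techniques address.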

    Distributed multi-view image coding with learned dictionaries

    This paper addresses the problem of distributed image coding in camera networks. The correlation between multiple images of a scene captured from different viewpoints can be efficiently modeled by local geometric transforms of prominent image features. Such features can be efficiently represented by sparse approximation algorithms using geometric dictionaries of various waveforms, called atoms. When the dictionaries are built on geometrical transformations of some generating functions, the features in different images can be paired with simple local geometrical transforms, such as scaling, rotation, or translation. The construction of the dictionary, however, represents a trade-off between approximation performance, which generally improves with the size of the dictionary, and the cost of coding the atom indexes. We propose a learning algorithm for the construction of dictionaries adapted to stereo omnidirectional images. The algorithm is based on a maximum likelihood solution that results in atoms adapted to both image approximation and stereo matching. We then use the learned dictionary in a Wyner-Ziv multi-view image coder built on a geometrical correlation model. The experimental results show that the learned dictionary improves the rate-distortion performance of the Wyner-Ziv coder at low bit rates compared to a baseline parametric dictionary.
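
The Wyner-Ziv idea behind such coders can be sketched with toy coset coding: the encoder sends only a short coset index instead of the full atom index, and the decoder disambiguates using the correlated atom from the other view. The 16-atom index set, the modulo-4 partition, and the nearest-index rule below are illustrative stand-ins, not the paper's actual geometry-based dictionary partitioning.

```python
N_ATOMS, N_COSETS = 16, 4

def coset_of(atom):
    return atom % N_COSETS          # partition: "similar" atoms fall in different cosets

def encode(atom):
    return coset_of(atom)           # 2 bits (log2 4) instead of 4 bits (log2 16)

def decode(coset, side_info_atom):
    # Resolve the ambiguity using the correlated atom observed in the other view.
    members = [a for a in range(N_ATOMS) if coset_of(a) == coset]
    return min(members, key=lambda a: abs(a - side_info_atom))

atom_view1 = 9                      # atom selected when approximating view 1
atom_view2 = 10                     # correlated atom found in view 2 (side information)
recovered = decode(encode(atom_view1), atom_view2)
print(recovered)                    # 9
```

Decoding succeeds as long as the side information is closer to the true atom than to any other member of the same coset, which is why the partition should keep similar atoms in different cosets.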

    Rain Removal in Traffic Surveillance: Does it Matter?

    Varying weather conditions, including rainfall and snowfall, are generally regarded as a challenge for computer vision algorithms. One proposed solution to the challenges induced by rain and snowfall is to artificially remove the rain from images or video using rain removal algorithms. It is the promise of these algorithms that the rain-removed image frames will improve the performance of subsequent segmentation and tracking algorithms. However, rain removal algorithms are typically evaluated on their ability to remove synthetic rain on a small subset of images. Currently, their behavior is unknown on real-world videos when integrated with a typical computer vision pipeline. In this paper, we review the existing rain removal algorithms and propose a new dataset that consists of 22 traffic surveillance sequences under a broad variety of weather conditions that all include either rain or snowfall. We propose a new evaluation protocol that evaluates the rain removal algorithms on their ability to improve the performance of subsequent segmentation, instance segmentation, and feature tracking algorithms under rain and snow. If successful, the de-rained frames of a rain removal algorithm should improve segmentation performance and increase the number of accurately tracked features. The results show that a recent single-frame-based rain removal algorithm increases the segmentation performance by 19.7% on our proposed dataset, but it eventually decreases the feature tracking performance and shows mixed results with recent instance segmentation methods. However, the best video-based rain removal algorithm improves the feature tracking accuracy by 7.72%. Published in IEEE Transactions on Intelligent Transportation Systems.

    Application and Theory of Multimedia Signal Processing Using Machine Learning or Advanced Methods

    This Special Issue collects peer-reviewed articles on advanced technologies related to the applications and theory of signal processing for multimedia systems using machine learning or other advanced methods. The multimedia signals covered include image, video, and audio, together with character recognition and the optimization of network communication channels. The specific topics included in this book are data hiding, encryption, object detection, image classification, and character recognition. Researchers interested in these topics should find it a useful read.

    Development of Mean and Median Based Adaptive Search Algorithm for Motion Estimation in SNR Scalable Video Coding

    Nowadays, video quality in encoding is a challenge in many applications such as video conferencing, live streaming, and video surveillance. Technological progress has produced a wide variety of devices, network conditions, and more, making video coding increasingly challenging. Scalable video coding can answer all of these needs: a single bit stream contains more than one layer, known as the base and enhancement layers. Scalability comes in several types: spatial, SNR, and temporal. Among these, SNR scalability deals with the quality of the frames, i.e., the base layer carries the lowest-quality frames while the enhancement layer carries frames of better quality. Motion estimation is the most important aspect of video coding. Adjacent frames of a video are usually very similar to each other, so to increase coding efficiency, remove redundancy, and reduce computational complexity, motion should be estimated and compensated. Accordingly, in this work videos are encoded in SNR scalability mode and motion estimation is carried out by two proposed methods. The first approach eliminates unnecessary blocks that have not undergone motion by applying a specific threshold value for every search region; the aim is to reduce computation time and increase efficiency, but not at the cost of much quality. In the second method, the search is optimized using particle swarm optimization (PSO), a computational method that optimizes a problem with the help of a population of candidate solutions. In PSO-based block matching, a swarm of particles flies in random directions within the search window of the reference frame, indexed by the horizontal and vertical coordinates of the center pixel of the candidate block.
These algorithms mainly reduce computation time by checking a limited number of random positions in the search window to find the best match. The PSO algorithm estimates motion with very low complexity. Both methods have been analyzed and their performance compared on various video sequences. The proposed techniques outperform existing techniques in terms of computational complexity and video quality.
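
A minimal sketch of PSO-based block matching, with SAD (sum of absolute differences) as the fitness over a square search window. The inertia and acceleration constants and the zero-motion seed particle are common defaults, not necessarily the thesis's exact settings.

```python
import numpy as np

def sad(block, ref, x, y, bs):
    """SAD between `block` and the reference block with top-left corner (y, x)."""
    return float(np.abs(block - ref[y:y + bs, x:x + bs]).sum())

def pso_block_match(block, ref, y0, x0, search=7, bs=8,
                    n_particles=10, iters=15, seed=0):
    """Toy PSO motion search: each particle is a candidate (y, x) block
    position in the search window; fitness is the SAD of that block."""
    rng = np.random.default_rng(seed)
    lo = np.array([max(0, y0 - search), max(0, x0 - search)], dtype=float)
    hi = np.array([min(ref.shape[0] - bs, y0 + search),
                   min(ref.shape[1] - bs, x0 + search)], dtype=float)
    pos = rng.uniform(lo, hi, size=(n_particles, 2))
    pos[0] = [y0, x0]                       # seed one particle at zero displacement
    vel = np.zeros_like(pos)
    cost = np.array([sad(block, ref, int(round(x)), int(round(y)), bs)
                     for y, x in pos])
    pbest, pcost = pos.copy(), cost.copy()
    g = pbest[np.argmin(pcost)].copy()
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, 1))
        vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (g - pos)
        pos = np.clip(pos + vel, lo, hi)
        cost = np.array([sad(block, ref, int(round(x)), int(round(y)), bs)
                         for y, x in pos])
        improved = cost < pcost
        pbest[improved], pcost[improved] = pos[improved], cost[improved]
        g = pbest[np.argmin(pcost)].copy()
    gy, gx = int(round(g[0])), int(round(g[1]))
    return (gy - y0, gx - x0), sad(block, ref, gx, gy, bs)

# Synthetic check: the current block is the reference content shifted by (2, -3).
rng = np.random.default_rng(1)
ref = rng.integers(0, 256, (64, 64)).astype(float)
y0, x0 = 24, 24
cur_block = ref[y0 + 2:y0 + 10, x0 - 3:x0 + 5]
mv, best_sad = pso_block_match(cur_block, ref, y0, x0)
print(mv, best_sad)
```

Thanks to the seeded particle, the returned motion vector is never worse (in SAD) than the zero-motion candidate, and the swarm evaluates only a small fraction of the full search window.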

    Error Resilience in Heterogeneous Visual Communications

    A critical and challenging aspect of visual communication technologies is to immunize visual information against transmission errors. In order to effectively protect visual content against transmission errors, the various kinds of heterogeneity involved in multimedia delivery need to be considered, such as compressed-stream characteristics heterogeneity, channel condition heterogeneity, and multi-user and multi-hop heterogeneity. The main theme of this dissertation is to explore these heterogeneities involved in error-resilient visual communications to deliver different visual content over heterogeneous networks with good visual quality. Concurrently transmitting multiple video streams in an error-prone environment faces many challenges: video content characteristics are heterogeneous, transmission bandwidth is limited, and user device capabilities vary. These challenges prompt the need for an integrated approach to error protection and resource allocation. One motivation of this dissertation is to develop such an integrated approach for an emerging application of multi-stream video aggregation, i.e., multi-point video conferencing. We propose a distributed multi-point video conferencing system that employs packet division multiplexing access (PDMA)-based error protection and resource allocation, and explore multi-hop awareness to deliver good and fair visual quality of video streams to end users. When the transport layer mechanism, such as forward error correction (FEC), cannot provide sufficient error protection on the payload stream, the unrecovered transmission errors may lead to visual distortions at the decoder. In order to mitigate the visual distortions caused by the unrecovered errors, concealment techniques can be applied at the decoder to provide an approximation of the original content. Due to the heterogeneity of image characteristics, different concealment approaches are necessary to accommodate the different nature of the lost image content.
We address this heterogeneity issue and propose to apply a classification framework that adaptively selects the suitable error concealment technique for each damaged image area. The analysis and extensive experimental results in this dissertation demonstrate that the proposed integrated approach of FEC and resource allocation, as well as the new classification-based error concealment approach, can significantly outperform conventional error-resilient approaches.
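
The idea of choosing a concealment technique per damaged area can be sketched with a deliberately simple two-way classifier. The motion-activity feature, the threshold, and the two concealment modes below are illustrative assumptions, not the dissertation's actual classification framework.

```python
import numpy as np

def conceal_block(prev, cur, y, x, bs=8, motion_thresh=10.0):
    """Illustrative two-way concealment classifier: measure motion activity
    in the undamaged ring around the lost block; low activity -> temporal
    copy from the previous frame, high activity -> spatial interpolation
    between the rows bordering the block."""
    y0, y1 = max(0, y - bs), min(cur.shape[0], y + 2 * bs)
    x0, x1 = max(0, x - bs), min(cur.shape[1], x + 2 * bs)
    ring = np.abs(cur[y0:y1, x0:x1] - prev[y0:y1, x0:x1])
    ring[y - y0:y - y0 + bs, x - x0:x - x0 + bs] = 0   # ignore the lost block itself
    if ring.mean() < motion_thresh:
        return prev[y:y + bs, x:x + bs].copy(), "temporal"
    top, bot = cur[y - 1, x:x + bs], cur[y + bs, x:x + bs]
    w = (np.arange(1, bs + 1) / (bs + 1))[:, None]     # vertical blend weights
    return (1 - w) * top + w * bot, "spatial"

prev = np.full((32, 32), 100.0)
cur = prev.copy()                                      # static scene
patch, mode = conceal_block(prev, cur, 8, 8)
print(mode)   # "temporal": no motion detected, so copy from the previous frame
```

In a static scene the temporal copy reproduces the lost block exactly, while under heavy change the classifier falls back to spatial interpolation, which cannot use stale temporal data.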