
    Sparse stereo image coding with learned dictionaries

    This paper proposes a framework for stereo image coding with effective representation of the geometry in 3D scenes. We propose a joint sparse approximation framework for pairs of perspective images, which are represented as linear expansions of atoms selected from a dictionary of geometric functions learned on a database of stereo perspective images. We then present a coding solution where atoms are selected iteratively as a trade-off between distortion and consistency of the geometry information. Experimental results on stereo images from the Middlebury database show that the new coder achieves better rate-distortion performance than the MPEG-4 Part 10 scheme at all rates. Beyond good rate-distortion performance, our flexible framework makes it possible to build consistent image representations that capture the geometry of the scene. It thus represents a promising solution towards the design of multi-view coding algorithms whose compressed streams inherently carry rich information about 3D geometry.
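
    As a rough illustration of the iterative atom selection described above, the sketch below runs a toy joint matching pursuit over a stereo pair: each candidate atom is scored by the distortion it removes in both views, minus a penalty when the two views disagree on the atom. The dictionary format, the unit-norm-atom assumption, and the consistency proxy are illustrative placeholders, not the paper's actual selection criterion.

```python
import numpy as np

def select_atoms(left, right, dictionary, n_atoms, lam=0.1):
    """Toy joint greedy atom selection for a stereo pair.

    left, right: vectorized views; dictionary: (n_pixels, n_total)
    matrix of unit-norm atoms; lam weighs a hypothetical geometric-
    consistency penalty (disagreement between the two views).
    """
    res_l, res_r = left.astype(float).copy(), right.astype(float).copy()
    selected = []
    for _ in range(n_atoms):
        corr_l = dictionary.T @ res_l
        corr_r = dictionary.T @ res_r
        # distortion reduction in both views, minus an inconsistency proxy
        score = corr_l**2 + corr_r**2 - lam * (corr_l - corr_r)**2
        k = int(np.argmax(score))
        selected.append(k)
        atom = dictionary[:, k]
        res_l -= (atom @ res_l) * atom  # remove the atom's contribution
        res_r -= (atom @ res_r) * atom
    return selected, res_l, res_r
```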

    Learning sparse representations of depth

    This paper introduces a new method for learning and inferring sparse representations of depth (disparity) maps. The proposed algorithm relaxes the usual assumption of a stationary noise model in sparse coding. This enables learning from data corrupted with spatially varying noise or uncertainty, as typically produced by laser range scanners or structured-light depth cameras. Sparse representations are learned from the Middlebury database disparity maps and then exploited in a two-layer graphical model for inferring depth from stereo, by including a sparsity prior on the learned features. Since they capture higher-order dependencies in the depth structure, these priors can complement the smoothness priors commonly used in depth inference based on Markov Random Field (MRF) models. Inference on the proposed graph is achieved using an alternating iterative optimization technique, where the first layer is solved using an existing MRF-based stereo matching algorithm and then held fixed while the second layer is solved using the proposed non-stationary sparse coding algorithm. This leads to a general method for improving the solutions of state-of-the-art MRF-based depth estimation algorithms. Our experimental results first show that depth inference using learned representations leads to state-of-the-art denoising of depth maps obtained from laser range scanners and a time-of-flight camera. Furthermore, we show that adding sparse priors improves the results of two depth estimation methods: the classical graph-cut algorithm of Boykov et al. and the more recent algorithm of Woodford et al.
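
    The non-stationary noise model amounts to a weighted data-fidelity term in the sparse coding objective. The following sketch shows one plausible way to solve such a weighted problem with ISTA, where `w` carries per-sample inverse noise variances (e.g., a sensor confidence map); the function name and the solver choice are assumptions, not the paper's algorithm.

```python
import numpy as np

def weighted_ista(y, D, w, lam=0.1, n_iter=200):
    """Sparse coding with non-stationary (per-sample) noise weights.

    Solves min_x 0.5 * || sqrt(w) * (y - D x) ||^2 + lam * ||x||_1,
    where w holds e.g. inverse noise variances from a depth sensor's
    confidence map (the stationary model is the special case w = 1).
    """
    L = np.linalg.norm(np.sqrt(w)[:, None] * D, 2) ** 2  # Lipschitz constant
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (w * (D @ x - y))       # gradient of the weighted fit
        x -= grad / L
        x = np.sign(x) * np.maximum(np.abs(x) - lam / L, 0.0)  # soft threshold
    return x
```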

    Distributed multi-view image coding with learned dictionaries

    This paper addresses the problem of distributed image coding in camera networks. The correlation between multiple images of a scene captured from different viewpoints can be efficiently modeled by local geometric transforms of prominent image features. Such features can be efficiently represented by sparse approximation algorithms using geometric dictionaries of various waveforms, called atoms. When the dictionaries are built on geometric transformations of some generating functions, the features in different images can be paired by simple local geometric transforms, such as scaling, rotation, or translation. The construction of the dictionary, however, represents a trade-off between approximation performance, which generally improves with the size of the dictionary, and the cost of coding the atom indices. We propose a learning algorithm for the construction of dictionaries adapted to stereo omnidirectional images. The algorithm is based on a maximum likelihood solution that results in atoms adapted to both image approximation and stereo matching. We then use the learned dictionary in a Wyner-Ziv multi-view image coder built on a geometrical correlation model. The experimental results show that the learned dictionary improves the rate-distortion performance of the Wyner-Ziv coder at low bit rates compared to a baseline parametric dictionary.
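
    A dictionary built from geometric transformations of a generating function can be sketched as follows: every atom is a single mother function that is translated, anisotropically scaled, and rotated, so corresponding features in two views differ only in these parameters. The Gaussian generator below is a hypothetical stand-in for the learned generating functions in the paper.

```python
import numpy as np

def gaussian_atom(size, x0, y0, sx, sy, theta):
    """One atom: a 2-D Gaussian generating function translated to
    (x0, y0), anisotropically scaled by (sx, sy), rotated by theta.
    (Hypothetical generator; the paper learns its generating functions.)
    """
    ys, xs = np.mgrid[0:size, 0:size].astype(float)
    xr = (xs - x0) * np.cos(theta) + (ys - y0) * np.sin(theta)
    yr = -(xs - x0) * np.sin(theta) + (ys - y0) * np.cos(theta)
    g = np.exp(-((xr / sx) ** 2 + (yr / sy) ** 2))
    return (g / np.linalg.norm(g)).ravel()  # unit-norm, vectorized

def build_dictionary(size, positions, scales, angles):
    """Stack one atom per (translation, scale, rotation) combination, so
    that corresponding features in two views can be paired by a simple
    change of these parameters."""
    atoms = [gaussian_atom(size, x0, y0, sx, sy, th)
             for (x0, y0) in positions
             for (sx, sy) in scales
             for th in angles]
    return np.stack(atoms, axis=1)  # (size * size, n_atoms)
```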

    Fast Dictionary Learning for Sparse Representations of Speech Signals

    © 2011 IEEE. Published version: IEEE Journal of Selected Topics in Signal Processing 5(5): 1025-1031, Sep 2011. DOI: 10.1109/JSTSP.2011.2157892

    Audio Source Separation Using Sparse Representations

    This is the author's final version of the article, first published as: A. Nesbit, M. G. Jafari, E. Vincent and M. D. Plumbley. Audio Source Separation Using Sparse Representations. In W. Wang (Ed.), Machine Audition: Principles, Algorithms and Systems. Chapter 10, pp. 246-264. IGI Global, 2011. ISBN 978-1-61520-919-4. DOI: 10.4018/978-1-61520-919-4.ch010

    The authors address the problem of audio source separation, namely the recovery of audio signals from recordings of mixtures of those signals. The sparse component analysis framework is a powerful method for achieving this. Sparse orthogonal transforms, in which only a few transform coefficients differ significantly from zero, are developed; once the signal has been transformed, energy is apportioned from each transform coefficient to each estimated source, and, finally, the signal is reconstructed using the inverse transform. The overriding aim of this chapter is to demonstrate how this framework, as exemplified here by two different decomposition methods that adapt to the signal to represent it sparsely, can be used to solve different problems in different mixing scenarios. To address the instantaneous (neither delays nor echoes) and underdetermined (more sources than mixtures) mixing model, a lapped orthogonal transform is adapted to the signal by selecting a basis from a library of predetermined bases. This method is closely related to the windowing methods used in the MPEG audio coding framework. In the anechoic (delays but no echoes) and determined (equal number of sources and mixtures) mixing case, a greedy adaptive transform is used, based on orthogonal basis functions that are learned from the observed data instead of being selected from a predetermined library of bases. This is found to encode the signal characteristics by introducing a feedback system between the bases and the observed data. Experiments on mixtures of speech and music signals demonstrate that these methods give good signal approximations and separation performance, and indicate promising directions for future research.
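
    To make the energy-apportionment step concrete, here is a minimal binary-masking sketch for the instantaneous case, using a fixed DCT in place of the chapter's signal-adapted lapped orthogonal and learned transforms; the mixing matrix is assumed known, and all names are illustrative.

```python
import numpy as np
from scipy.fft import dct, idct

def separate_binary_mask(mixtures, A):
    """Toy transform-domain separation for instantaneous mixtures x = A s.

    mixtures: (2, n_samples) observed channels; A: (2, n_sources) mixing
    matrix, assumed known. Each DCT coefficient is assigned to the source
    whose mixing direction best explains the observed coefficient pair.
    """
    X = dct(mixtures, norm="ortho", axis=1)            # orthogonal transform
    dirs = A / np.linalg.norm(A, axis=0)               # unit mixing columns
    proj = np.abs(dirs.T @ X)                          # (n_sources, n)
    owner = np.argmax(proj, axis=0)                    # coefficient ownership
    S = np.zeros((A.shape[1], X.shape[1]))
    for j in range(A.shape[1]):
        mask = owner == j
        S[j, mask] = dirs[:, j] @ X[:, mask]           # apportion energy
    return idct(S, norm="ortho", axis=1)               # back to time domain
```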

    Sparse Coding on Stereo Video for Object Detection

    Deep Convolutional Neural Networks (DCNNs) require millions of labeled training examples for image classification and object detection tasks, which restricts these models to domains where such datasets are available. In this paper, we explore the use of unsupervised sparse coding applied to stereo-video data to help alleviate the need for large amounts of labeled data. We show that replacing a typical supervised convolutional layer with an unsupervised sparse-coding layer within a DCNN allows for better performance on a car detection task when only a limited number of labeled training examples is available. Furthermore, the network that incorporates sparse coding allows for more consistent performance over varying initializations and orderings of training examples when compared to a fully supervised DCNN. Finally, we compare activations between the unsupervised sparse-coding layer and the supervised convolutional layer, and show that the sparse representation exhibits an encoding that is depth selective, whereas encodings from the convolutional layer do not exhibit such selectivity. These results indicate promise for using unsupervised sparse-coding approaches in real-world computer vision tasks in domains with limited labeled training data.
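
    The substitution of a convolutional layer by sparse coding can be pictured as follows: instead of computing supervised convolutional activations, each input patch is encoded as a sparse coefficient vector over an unsupervised dictionary. The patch-based ISTA encoder below is a simplified stand-in; the paper itself works on stereo-video inputs with a convolutional sparse-coding formulation.

```python
import numpy as np

def sparse_code_layer(patches, D, lam=0.1, n_iter=50):
    """Stand-in for a first conv layer: encode each input patch as a
    sparse coefficient vector over an unsupervised dictionary D via ISTA.

    patches: (n_patches, patch_dim); D: (patch_dim, n_features).
    Returns the 'activations' A of shape (n_patches, n_features).
    """
    L = np.linalg.norm(D, 2) ** 2                 # Lipschitz constant
    A = np.zeros((patches.shape[0], D.shape[1]))
    for _ in range(n_iter):
        grad = (A @ D.T - patches) @ D            # reconstruction gradient
        A -= grad / L
        A = np.sign(A) * np.maximum(np.abs(A) - lam / L, 0.0)  # sparsify
    return A
```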

    Simultaneous Codeword Optimization (SimCO) for Dictionary Update and Learning

    We consider the data-driven dictionary learning problem. The goal is to seek an over-complete dictionary from which every training signal can be best approximated by a linear combination of only a few codewords. This task is often achieved by iteratively executing two operations: sparse coding and dictionary update. In the literature, there are two benchmark mechanisms for updating a dictionary. The first approach, exemplified by the MOD algorithm, searches for the optimal codewords while fixing the sparse coefficients. In the second approach, represented by the K-SVD method, one codeword and the related sparse coefficients are simultaneously updated while all other codewords and coefficients remain unchanged. We propose a novel framework that generalizes these two methods. The unique feature of our approach is that an arbitrary set of codewords and the corresponding sparse coefficients can be updated simultaneously: when the sparse coefficients are fixed, the underlying optimization problem is similar to that of the MOD algorithm; when only one codeword is selected for update, the proposed algorithm can be proved equivalent to the K-SVD method; and, most importantly, our method allows all codewords and all sparse coefficients to be updated simultaneously, hence the term simultaneous codeword optimization (SimCO). Under the proposed framework, we design two algorithms, namely primitive and regularized SimCO, and implement them with a simple gradient descent mechanism. Simulations demonstrate the performance of the proposed algorithms compared with the two baseline algorithms, MOD and K-SVD. Results show that regularized SimCO is particularly appealing in terms of both learning performance and running speed.
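
    A minimal sketch of the gradient-descent mechanism is given below: an arbitrary set of codewords and the corresponding coefficient rows are refined jointly by descent on the Frobenius-norm error, with updated columns renormalized to unit norm. The step size, the single-step structure, and the unrestricted coefficient update are simplifications; SimCO proper confines coefficient updates to the existing sparsity pattern and optimizes directly over unit-norm codewords.

```python
import numpy as np

def simco_step(Y, D, X, idx, step=0.1):
    """One sketch iteration: update the codeword set D[:, idx] and the
    corresponding coefficient rows X[idx] jointly by gradient descent
    on ||Y - D X||_F^2 (illustrative, not the paper's exact update).
    """
    R = Y - D @ X                                     # current residual
    D[:, idx] += step * (R @ X[idx].T)                # codeword gradient step
    D[:, idx] /= np.linalg.norm(D[:, idx], axis=0, keepdims=True)
    R = Y - D @ X                                     # refresh residual
    X[idx] += step * (D[:, idx].T @ R)                # coefficient step
    return D, X
```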