2,955 research outputs found

    Structured Sparsity Models for Multiparty Speech Recovery from Reverberant Recordings

    Get PDF
    We tackle the multi-party speech recovery problem through modeling the acoustic of the reverberant chambers. Our approach exploits structured sparsity models to perform room modeling and speech recovery. We propose a scheme for characterizing the room acoustic from the unknown competing speech sources relying on localization of the early images of the speakers by sparse approximation of the spatial spectra of the virtual sources in a free-space model. The images are then clustered exploiting the low-rank structure of the spectro-temporal components belonging to each source. This enables us to identify the early support of the room impulse response function and its unique map to the room geometry. To further tackle the ambiguity of the reflection ratios, we propose a novel formulation of the reverberation model and estimate the absorption coefficients through a convex optimization exploiting joint sparsity model formulated upon spatio-spectral sparsity of concurrent speech representation. The acoustic parameters are then incorporated for separating individual speech signals through either structured sparse recovery or inverse filtering the acoustic channels. The experiments conducted on real data recordings demonstrate the effectiveness of the proposed approach for multi-party speech recovery and recognition.Comment: 31 page

    An Efficient Algorithm for Video Super-Resolution Based On a Sequential Model

    Get PDF
    In this work, we propose a novel procedure for video super-resolution, that is the recovery of a sequence of high-resolution images from its low-resolution counterpart. Our approach is based on a "sequential" model (i.e., each high-resolution frame is supposed to be a displaced version of the preceding one) and considers the use of sparsity-enforcing priors. Both the recovery of the high-resolution images and the motion fields relating them is tackled. This leads to a large-dimensional, non-convex and non-smooth problem. We propose an algorithmic framework to address the latter. Our approach relies on fast gradient evaluation methods and modern optimization techniques for non-differentiable/non-convex problems. Unlike some other previous works, we show that there exists a provably-convergent method with a complexity linear in the problem dimensions. We assess the proposed optimization method on {several video benchmarks and emphasize its good performance with respect to the state of the art.}Comment: 37 pages, SIAM Journal on Imaging Sciences, 201

    A Joint Intensity and Depth Co-Sparse Analysis Model for Depth Map Super-Resolution

    Full text link
    High-resolution depth maps can be inferred from low-resolution depth measurements and an additional high-resolution intensity image of the same scene. To that end, we introduce a bimodal co-sparse analysis model, which is able to capture the interdependency of registered intensity and depth information. This model is based on the assumption that the co-supports of corresponding bimodal image structures are aligned when computed by a suitable pair of analysis operators. No analytic form of such operators exist and we propose a method for learning them from a set of registered training signals. This learning process is done offline and returns a bimodal analysis operator that is universally applicable to natural scenes. We use this to exploit the bimodal co-sparse analysis model as a prior for solving inverse problems, which leads to an efficient algorithm for depth map super-resolution.Comment: 13 pages, 4 figure

    Sparse-to-Continuous: Enhancing Monocular Depth Estimation using Occupancy Maps

    Full text link
    This paper addresses the problem of single image depth estimation (SIDE), focusing on improving the quality of deep neural network predictions. In a supervised learning scenario, the quality of predictions is intrinsically related to the training labels, which guide the optimization process. For indoor scenes, structured-light-based depth sensors (e.g. Kinect) are able to provide dense, albeit short-range, depth maps. On the other hand, for outdoor scenes, LiDARs are considered the standard sensor, which comparatively provides much sparser measurements, especially in areas further away. Rather than modifying the neural network architecture to deal with sparse depth maps, this article introduces a novel densification method for depth maps, using the Hilbert Maps framework. A continuous occupancy map is produced based on 3D points from LiDAR scans, and the resulting reconstructed surface is projected into a 2D depth map with arbitrary resolution. Experiments conducted with various subsets of the KITTI dataset show a significant improvement produced by the proposed Sparse-to-Continuous technique, without the introduction of extra information into the training stage.Comment: Accepted. (c) 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other work

    Sparse Modeling for Image and Vision Processing

    Get PDF
    In recent years, a large amount of multi-disciplinary research has been conducted on sparse models and their applications. In statistics and machine learning, the sparsity principle is used to perform model selection---that is, automatically selecting a simple model among a large collection of them. In signal processing, sparse coding consists of representing data with linear combinations of a few dictionary elements. Subsequently, the corresponding tools have been widely adopted by several scientific communities such as neuroscience, bioinformatics, or computer vision. The goal of this monograph is to offer a self-contained view of sparse modeling for visual recognition and image processing. More specifically, we focus on applications where the dictionary is learned and adapted to data, yielding a compact representation that has been successful in various contexts.Comment: 205 pages, to appear in Foundations and Trends in Computer Graphics and Visio
    • 

    corecore