Structured Sparsity Models for Multiparty Speech Recovery from Reverberant Recordings
We tackle the multi-party speech recovery problem by modeling the
acoustics of reverberant chambers. Our approach exploits structured sparsity
models to perform room modeling and speech recovery. We propose a scheme for
characterizing the room acoustics from unknown competing speech sources
relying on localization of the early images of the speakers by sparse
approximation of the spatial spectra of the virtual sources in a free-space
model. The images are then clustered exploiting the low-rank structure of the
spectro-temporal components belonging to each source. This enables us to
identify the early support of the room impulse response function and its unique
map to the room geometry. To further tackle the ambiguity of the reflection
ratios, we propose a novel formulation of the reverberation model and estimate
the absorption coefficients through convex optimization, exploiting a joint
sparsity model formulated on the spatio-spectral sparsity of concurrent speech
representations. The acoustic parameters are then used to separate
individual speech signals through either structured sparse recovery or inverse
filtering of the acoustic channels. Experiments conducted on real
recordings demonstrate the effectiveness of the proposed approach for
multi-party speech recovery and recognition.
Comment: 31 pages
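The joint sparsity idea behind the absorption-coefficient step can be illustrated with a generic multiple-measurement-vector problem. This is a minimal sketch, not the authors' reverberation model: it assumes a random measurement matrix `A` and a synthetic row-sparse matrix `X_true` whose columns play the role of frequency bins sharing one support, and it solves the convex group-lasso problem with proximal gradient (ISTA). The names `joint_sparse_recovery`, `group_soft_threshold` and the weight `lam` are hypothetical.

```python
import numpy as np

def group_soft_threshold(X, tau):
    # Prox of tau * sum_i ||X[i, :]||_2: shrink each row toward zero by tau in l2 norm.
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    scale = np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)
    return scale * X

def joint_sparse_recovery(A, Y, lam, iters=500):
    # ISTA for 0.5 * ||A X - Y||_F^2 + lam * sum_i ||X[i, :]||_2 (convex group lasso).
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1 / Lipschitz constant of the gradient
    X = np.zeros((A.shape[1], Y.shape[1]))
    for _ in range(iters):
        grad = A.T @ (A @ X - Y)
        X = group_soft_threshold(X - step * grad, step * lam)
    return X

# Synthetic problem (assumed): k active rows shared by all "frequency bins" (columns).
rng = np.random.default_rng(0)
m, n, k, snapshots = 30, 60, 3, 8
A = rng.standard_normal((m, n)) / np.sqrt(m)
support = rng.choice(n, size=k, replace=False)
X_true = np.zeros((n, snapshots))
X_true[support] = rng.standard_normal((k, snapshots))
Y = A @ X_true

lam = 0.05 * np.max(np.linalg.norm(A.T @ Y, axis=1))
X_hat = joint_sparse_recovery(A, Y, lam)
row_energy = np.linalg.norm(X_hat, axis=1)
print(sorted(support.tolist()), sorted(np.argsort(row_energy)[-k:].tolist()))
```

Penalizing the l2 norm of each row, rather than each entry, is what couples the columns: a row is either active in every bin or driven to zero in all of them.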
An Efficient Algorithm for Video Super-Resolution Based On a Sequential Model
In this work, we propose a novel procedure for video super-resolution, that
is, the recovery of a sequence of high-resolution images from its low-resolution
counterpart. Our approach is based on a "sequential" model (i.e., each
high-resolution frame is supposed to be a displaced version of the preceding
one) and considers the use of sparsity-enforcing priors. Both the high-resolution
images and the motion fields relating them are recovered. This
leads to a high-dimensional, non-convex and non-smooth problem. We propose an
algorithmic framework to address it. Our approach relies on fast
gradient evaluation methods and modern optimization techniques for
non-differentiable/non-convex problems. Unlike some previous works, we
show that there exists a provably convergent method with a complexity linear in
the problem dimensions. We assess the proposed optimization method on several
video benchmarks and show that it performs well compared with the state
of the art.
Comment: 37 pages, SIAM Journal on Imaging Sciences, 201
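The benefit of the "sequential" model can be seen in a deliberately simplified 1-D toy: if every high-resolution frame is a known shift of the previous one, several decimated frames jointly pin down the first high-resolution frame. This sketch assumes integer, circular, known displacements, a decimation-only sensor without blur, and no sparsity prior; the paper instead estimates the motion fields jointly and solves a non-convex, non-smooth problem. Helper names such as `sequential_super_resolution` are hypothetical.

```python
import numpy as np

def shift_operator(n, k):
    # Circular shift by k samples: (S @ x)[i] == x[(i - k) % n] (assumed boundary handling).
    return np.roll(np.eye(n), k, axis=0)

def decimation_operator(n, factor):
    # Keep every `factor`-th sample: a crude low-resolution sensor with no blur.
    return np.eye(n)[::factor]

def sequential_super_resolution(low_res_frames, n, factor, shifts):
    # Frame t is modelled as x_t = S^{d_t} x_0, so y_t = D S^{d_t} x_0.
    # Stack all frames into one linear system and solve for x_0 by least squares.
    D = decimation_operator(n, factor)
    A = np.vstack([D @ shift_operator(n, d) for d in shifts])
    y = np.concatenate(low_res_frames)
    x0, *_ = np.linalg.lstsq(A, y, rcond=None)
    return x0

rng = np.random.default_rng(1)
n, factor = 16, 2
x0_true = rng.standard_normal(n)
shifts = [0, 1]  # cumulative displacement of each frame w.r.t. frame 0 (known here)
D = decimation_operator(n, factor)
frames = [D @ shift_operator(n, d) @ x0_true for d in shifts]
x0_hat = sequential_super_resolution(frames, n, factor, shifts)
```

With `factor=2`, one frame alone gives only `n // 2` independent equations; the second, displaced frame supplies the missing odd-indexed samples, so the stacked system becomes invertible.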
A Joint Intensity and Depth Co-Sparse Analysis Model for Depth Map Super-Resolution
High-resolution depth maps can be inferred from low-resolution depth
measurements and an additional high-resolution intensity image of the same
scene. To that end, we introduce a bimodal co-sparse analysis model, which is
able to capture the interdependency of registered intensity and depth
information. This model is based on the assumption that the co-supports of
corresponding bimodal image structures are aligned when computed by a suitable
pair of analysis operators. No analytic form of such operators exists, and we
propose a method for learning them from a set of registered training signals.
This learning process is done offline and returns a bimodal analysis operator
that is universally applicable to natural scenes. We use this operator to apply the
bimodal co-sparse analysis model as a prior for solving inverse problems, which
leads to an efficient algorithm for depth map super-resolution.
Comment: 13 pages, 4 figures
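A small 1-D example helps with the notion of aligned co-supports. The co-support of a signal `x` under an analysis operator `Omega` is the set of rows `j` for which `(Omega x)_j = 0`. Here a fixed finite-difference operator stands in for the learned bimodal operator pair (an assumption; the paper learns its operators offline from registered training data), and the intensity co-support is used in a simple quadratic prior to upsample depth. `guided_depth_upsampling` and the weight `mu` are hypothetical.

```python
import numpy as np

def finite_difference_operator(n):
    # Hypothetical fixed analysis operator: row i computes x[i + 1] - x[i].
    return np.eye(n, k=1)[:-1] - np.eye(n)[:-1]

def cosupport(omega, x, tol=1e-8):
    # Co-support: indices j where the analysis coefficient (omega @ x)[j] vanishes.
    return set(np.flatnonzero(np.abs(omega @ x) <= tol).tolist())

def guided_depth_upsampling(omega, intensity, sample_idx, samples, mu=10.0):
    # Quadratic prior: penalize omega rows only on the intensity co-support, so depth
    # is smoothed inside uniform-intensity regions but may jump at intensity edges.
    n = omega.shape[1]
    rows = sorted(cosupport(omega, intensity))
    M = np.eye(n)[sample_idx]                  # selects the low-resolution depth samples
    A = np.vstack([M, np.sqrt(mu) * omega[rows]])
    b = np.concatenate([samples, np.zeros(len(rows))])
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x

n = 12
omega = finite_difference_operator(n)
intensity = np.where(np.arange(n) < 5, 0.2, 0.9)  # intensity edge between pixels 4 and 5
depth = np.where(np.arange(n) < 5, 3.0, 1.5)      # depth discontinuity at the same place
cs_int, cs_dep = cosupport(omega, intensity), cosupport(omega, depth)
print(cs_int == cs_dep)  # True: the two co-supports are aligned

sample_idx = np.arange(0, n, 3)                   # low-resolution depth: every third pixel
depth_hat = guided_depth_upsampling(omega, intensity, sample_idx, depth[sample_idx])
```

Because no penalty is placed on the row where the intensity jumps, the depth estimate is free to be discontinuous there, while pixels inside each uniform region are tied together.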
Sparse-to-Continuous: Enhancing Monocular Depth Estimation using Occupancy Maps
This paper addresses the problem of single image depth estimation (SIDE),
focusing on improving the quality of deep neural network predictions. In a
supervised learning scenario, the quality of predictions is intrinsically
related to the training labels, which guide the optimization process. For
indoor scenes, structured-light-based depth sensors (e.g., Kinect) are able to
provide dense, albeit short-range, depth maps. For outdoor scenes, on the other
hand, LiDAR is considered the standard sensor, and it provides much sparser
measurements by comparison, especially at longer ranges. Rather than
modifying the neural network architecture to deal with sparse depth maps, this
article introduces a novel densification method for depth maps, using the
Hilbert Maps framework. A continuous occupancy map is produced based on 3D
points from LiDAR scans, and the resulting reconstructed surface is projected
into a 2D depth map with arbitrary resolution. Experiments conducted with
various subsets of the KITTI dataset show a significant improvement produced by
the proposed Sparse-to-Continuous technique, without the introduction of extra
information into the training stage.
Comment: Accepted. (c) 2019 IEEE.
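A toy 2-D version of the densification pipeline may clarify the idea. A Hilbert map fits a logistic-regression classifier on a feature map of the input points, so occupancy becomes a continuous function that can be queried anywhere. The sketch below assumes Gaussian RBF features on a fixed grid, a flat wall at `x = 4`, free-space samples taken along each beam before its hit, class-weighted gradient descent, and ray marching to the first point with occupancy above 0.5; it renders one depth scanline from 49 query rays although only nine beams were measured. It is not the authors' implementation, and all function names are hypothetical.

```python
import numpy as np

def rbf_features(points, centers, length_scale=0.5):
    # Gaussian RBF features of each 2-D point w.r.t. fixed centres, plus a bias column.
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    phi = np.exp(-d2 / (2.0 * length_scale ** 2))
    return np.hstack([phi, np.ones((len(points), 1))])

def fit_hilbert_map(points, labels, centers, lr=1.0, iters=5000):
    # Class-weighted logistic regression on RBF features, trained by gradient descent.
    X = rbf_features(points, centers)
    n_occ = labels.sum()
    weights = np.where(labels == 1, (len(labels) - n_occ) / n_occ, 1.0)
    weights /= weights.sum()                      # occupied and free classes weigh equally
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (weights * (p - labels))  # gradient of the weighted log-loss
    return w

def occupancy(points, w, centers):
    # Continuous occupancy probability at arbitrary query points.
    return 1.0 / (1.0 + np.exp(-rbf_features(points, centers) @ w))

def render_depth(angles, w, centers, max_range=7.0, step=0.05, threshold=0.5):
    # March along each query ray; depth = first range where occupancy exceeds threshold.
    ranges = np.arange(step, max_range, step)
    depths = np.full(len(angles), np.nan)
    for i, a in enumerate(angles):
        pts = np.stack([ranges * np.cos(a), ranges * np.sin(a)], axis=1)
        above = np.flatnonzero(occupancy(pts, w, centers) > threshold)
        if above.size:
            depths[i] = ranges[above[0]]
    return depths

# Toy scene (assumed): sensor at the origin, flat wall at x = 4, nine sparse beams.
beam_angles = np.linspace(-0.6, 0.6, 9)
beam_ranges = 4.0 / np.cos(beam_angles)
dirs = np.stack([np.cos(beam_angles), np.sin(beam_angles)], axis=1)
hits = dirs * beam_ranges[:, None]                          # occupied samples
fractions = np.linspace(0.1, 0.9, 9)
free = (dirs[:, None, :] * beam_ranges[:, None, None]
        * fractions[None, :, None]).reshape(-1, 2)          # free space along the beams
points = np.vstack([hits, free])
labels = np.concatenate([np.ones(len(hits)), np.zeros(len(free))])

gx, gy = np.meshgrid(np.linspace(0.0, 6.0, 13), np.linspace(-4.0, 4.0, 17))
centers = np.stack([gx.ravel(), gy.ravel()], axis=1)

w = fit_hilbert_map(points, labels, centers)
dense_angles = np.linspace(-0.6, 0.6, 49)  # 49 query rays from 9 measured beams
depth_scan = render_depth(dense_angles, w, centers)
```

The rendering step is decoupled from the sensor's sampling pattern: `dense_angles` can be chosen at any resolution, which is what lets a dense training label be produced from a sparse scan.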
Sparse Modeling for Image and Vision Processing
In recent years, a large amount of multi-disciplinary research has been
conducted on sparse models and their applications. In statistics and machine
learning, the sparsity principle is used to perform model selection, that is,
automatically selecting a simple model among a large collection of them. In
signal processing, sparse coding consists of representing data with linear
combinations of a few dictionary elements. Subsequently, the corresponding
tools have been widely adopted by several scientific communities such as
neuroscience, bioinformatics, or computer vision. The goal of this monograph is
to offer a self-contained view of sparse modeling for visual recognition and
image processing. More specifically, we focus on applications where the
dictionary is learned and adapted to data, yielding a compact representation
that has been successful in various contexts.
Comment: 205 pages, to appear in Foundations and Trends in Computer Graphics
and Vision
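To make "linear combinations of a few dictionary elements" concrete, here is a minimal sparse-coding sketch using Orthogonal Matching Pursuit, one standard greedy solver. It assumes a fixed random dictionary `D` with unit-norm columns; the monograph's focus, learning `D` from data, is not shown. The function name `omp` and all sizes are illustrative.

```python
import numpy as np

def omp(D, y, n_nonzero):
    # Orthogonal Matching Pursuit: greedily add the atom most correlated with the
    # residual, then refit all selected coefficients by least squares.
    support, coef = [], np.zeros(0)
    residual = y.copy()
    for _ in range(n_nonzero):
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j not in support:
            support.append(j)
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    alpha = np.zeros(D.shape[1])
    alpha[support] = coef
    return alpha

rng = np.random.default_rng(0)
m, n_atoms, k = 64, 128, 3
D = rng.standard_normal((m, n_atoms))
D /= np.linalg.norm(D, axis=0)              # unit-norm atoms (columns)
true_support = rng.choice(n_atoms, size=k, replace=False)
alpha_true = np.zeros(n_atoms)
alpha_true[true_support] = rng.choice([-1.0, 1.0], size=k) * (1.0 + rng.random(k))
y = D @ alpha_true                           # signal built from k dictionary atoms
alpha_hat = omp(D, y, k)
```

Replacing the greedy selection with an l1 penalty gives the convex (Lasso) route to the same kind of sparse code, which corresponds to the model-selection use of sparsity described above.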