Introducing SPAIN (SParse Audio INpainter)
A novel sparsity-based algorithm for audio inpainting is proposed. It is an
adaptation of the SPADE algorithm by Kitić et al., originally developed for
audio declipping, to the task of audio inpainting. The new SPAIN (SParse Audio
INpainter) comes in synthesis and analysis variants. Experiments show that both
A-SPAIN and S-SPAIN outperform other sparsity-based inpainting algorithms.
Moreover, A-SPAIN performs on a par with the state-of-the-art method based on
linear prediction in terms of the SNR, and, for larger gaps, SPAIN is even
slightly better in terms of the PEMO-Q psychoacoustic criterion.
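The SNR comparison mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the common definition SNR = 10 log10(||x||^2 / ||x - x_hat||^2), evaluated on the samples inside the gap; the function name `snr_db` and the toy signals are hypothetical and not taken from the paper.

```python
import numpy as np

def snr_db(reference, estimate):
    """Signal-to-noise ratio in dB between a reference and its reconstruction.

    Assumed definition: 10 * log10(||x||^2 / ||x - x_hat||^2).
    """
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    noise_energy = np.sum((reference - estimate) ** 2)
    if noise_energy == 0:
        # Perfect reconstruction: the ratio is unbounded.
        return float("inf")
    return 10.0 * np.log10(np.sum(reference ** 2) / noise_energy)

# Toy gap content and a reconstruction with a 10 % amplitude error.
gap_reference = np.array([1.0, 0.0, -1.0, 0.0])
gap_estimate = 0.9 * gap_reference
snr_value = snr_db(gap_reference, gap_estimate)  # about 20 dB
```

Restricting the computation to the gap, rather than the whole signal, keeps the reliable samples from inflating the score.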
Inpainting of long audio segments with similarity graphs
We present a novel method for the compensation of long duration data loss in
audio signals, in particular music. The concealment of such signal defects is
based on a graph that encodes signal structure in terms of time-persistent
spectral similarity. A suitable candidate segment for the substitution of the
lost content is proposed by an intuitive optimization scheme and smoothly
inserted into the gap, i.e. the lost or distorted signal region. Extensive
listening tests show that the proposed algorithm provides highly promising
results when applied to a variety of real-world music signals.
Multiple Hankel matrix rank minimization for audio inpainting
Sasaki et al. (2018) presented an efficient audio declipping algorithm, based
on the properties of Hankel-structured matrices constructed from time-domain
signal blocks. We adapt their approach to solve the audio inpainting problem,
where samples are missing in the signal. We analyze the algorithm and provide
modifications, some of them leading to an improved performance. Overall, it
turns out that the new algorithms perform reasonably well for speech signals
but they are not competitive in the case of music signals.
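The property that Hankel-based methods rely on can be sketched as follows: a Hankel matrix built from a block containing a few sinusoids has low rank. This is only an illustration of that structure, not the algorithm of Sasaki et al. or its adaptation; the helper `hankel_from_block`, the block length and the number of rows are assumptions chosen for the example.

```python
import numpy as np

def hankel_from_block(block, n_rows):
    """Return the Hankel matrix H with H[i, j] = block[i + j]."""
    block = np.asarray(block, dtype=float)
    n_cols = block.size - n_rows + 1
    if n_rows < 1 or n_cols < 1:
        raise ValueError("n_rows must be between 1 and len(block)")
    # Row index plus column index selects the sample, so anti-diagonals are constant.
    idx = np.arange(n_rows)[:, None] + np.arange(n_cols)[None, :]
    return block[idx]

# One real sinusoid is a sum of two complex exponentials.
n = np.arange(64)
sinusoid = np.cos(2 * np.pi * 0.05 * n + 0.3)

H = hankel_from_block(sinusoid, n_rows=16)
singular_values = np.linalg.svd(H, compute_uv=False)
# Count singular values that are not numerically zero.
numerical_rank = int(np.sum(singular_values > 1e-8 * singular_values[0]))
```

Here the numerical rank is 2 although the matrix has 16 rows. Missing samples generally break this structure, which is why rank minimization can serve as a criterion for filling them in.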
Diffusion-Based Audio Inpainting
Audio inpainting aims to reconstruct missing segments in corrupted
recordings. Previous methods produce plausible reconstructions when the gap
length is shorter than about 100 ms, but the quality decreases for longer
gaps. This paper explores recent advancements in deep learning and,
particularly, diffusion models, for the task of audio inpainting. The proposed
method uses an unconditionally trained generative model, which can be
conditioned in a zero-shot fashion for audio inpainting, offering high
flexibility to regenerate gaps of arbitrary length. An improved deep neural
network architecture based on the constant-Q transform, which allows the model
to exploit pitch-equivariant symmetries in audio, is also presented. The
performance of the proposed algorithm is evaluated through objective and
subjective metrics for the task of reconstructing short to mid-sized gaps. The
results of a formal listening test show that the proposed method performs
comparably to the state of the art for short gaps, while retaining
good audio quality and outperforming the baselines for the longest gap
lengths tested, 150 ms and 200 ms. This work helps improve the restoration of
sound recordings having fairly long local disturbances or dropouts, which must
be reconstructed.
Comment: Submitted for publication to the Journal of Audio Engineering Society
on January 30th, 202
Sparse Modeling for Image and Vision Processing
In recent years, a large amount of multi-disciplinary research has been
conducted on sparse models and their applications. In statistics and machine
learning, the sparsity principle is used to perform model selection, that is,
automatically selecting a simple model among a large collection of them. In
signal processing, sparse coding consists of representing data with linear
combinations of a few dictionary elements. Subsequently, the corresponding
tools have been widely adopted by several scientific communities such as
neuroscience, bioinformatics, or computer vision. The goal of this monograph is
to offer a self-contained view of sparse modeling for visual recognition and
image processing. More specifically, we focus on applications where the
dictionary is learned and adapted to data, yielding a compact representation
that has been successful in various contexts.
Comment: 205 pages, to appear in Foundations and Trends in Computer Graphics
and Visio
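Sparse coding, as described in the abstract above, represents a signal as a linear combination of a few dictionary elements. A minimal sketch using orthogonal matching pursuit (OMP), one common greedy solver, is given here. The monograph covers many formulations, and the choice of OMP, the function name `omp`, the random dictionary and the sparsity level are all assumptions made for illustration.

```python
import numpy as np

def omp(D, y, k):
    """Greedy sparse coding: approximate y with at most k columns (atoms) of D."""
    residual = y.copy()
    support = []
    sol = np.zeros(0)
    for _ in range(k):
        correlations = np.abs(D.T @ residual)
        if support:
            # Do not pick an atom that is already in the support.
            correlations[support] = 0.0
        support.append(int(np.argmax(correlations)))
        # Refit all selected atoms jointly by least squares.
        sol, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ sol
    coef = np.zeros(D.shape[1])
    coef[support] = sol
    return coef

# Hypothetical overcomplete dictionary: 50 unit-norm atoms in dimension 20.
rng = np.random.default_rng(0)
D = rng.standard_normal((20, 50))
D /= np.linalg.norm(D, axis=0)

# Signal synthesized from three atoms.
true_coef = np.zeros(50)
true_coef[[4, 17, 33]] = [1.5, -2.0, 0.8]
y = D @ true_coef

coef = omp(D, y, k=3)
residual_norm = np.linalg.norm(y - D @ coef)
```

In this sketch the dictionary is fixed and random. The setting the monograph focuses on learns the dictionary from data, which typically alternates a coding step like this one with an update of the atoms.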