32 research outputs found
Audio Declipping with Social Sparsity
We consider the audio declipping problem using iterative thresholding algorithms and the principle of social sparsity. This recently introduced approach features thresholding/shrinkage operators that model dependencies between neighboring coefficients in expansions with time-frequency dictionaries. A new unconstrained convex formulation of the audio declipping problem is introduced. The chosen structured thresholding operators are the so-called windowed group-Lasso and the persistent empirical Wiener. These operators significantly improve the quality of the reconstruction compared to simple soft-thresholding. The resulting algorithm is fast, simple to implement, and outperforms the state of the art in terms of signal-to-noise ratio.
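The contrast between plain soft-thresholding and a structured, neighborhood-aware operator can be sketched on a toy coefficient array (a minimal sketch on real-valued coefficients; the neighborhood shape and the paper's exact operator definitions may differ):

```python
import numpy as np

def soft_threshold(C, lam):
    # plain soft-thresholding: each coefficient is shrunk independently
    return np.sign(C) * np.maximum(np.abs(C) - lam, 0.0)

def windowed_group_lasso(C, lam, radius=1):
    # structured shrinkage in the spirit of the windowed group-Lasso:
    # the shrinkage factor depends on the l2 norm over a sliding time
    # neighborhood, so a small coefficient survives if its neighbors
    # are strong (illustrative form, not the authors' code)
    E = C ** 2
    kernel = np.ones(2 * radius + 1)
    neigh = np.apply_along_axis(
        lambda r: np.convolve(r, kernel, mode="same"), 1, E)
    norms = np.sqrt(neigh)
    factor = np.maximum(1.0 - lam / np.maximum(norms, 1e-12), 0.0)
    return C * factor
```

With `lam = 1.0`, a coefficient of magnitude 0.1 is zeroed by `soft_threshold`, but kept (shrunk) by `windowed_group_lasso` when it sits next to a strong neighbor.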
Sparsity and cosparsity for audio declipping: a flexible non-convex approach
This work investigates the empirical performance of sparse synthesis versus sparse analysis regularization for the ill-posed inverse problem of audio declipping. We develop a versatile non-convex heuristic that can be readily used with both data models. Based on this algorithm, we report that, in most cases, the two models perform similarly in terms of signal enhancement. However, the analysis version is shown to be amenable to real-time audio processing when certain analysis operators are considered. Both versions outperform state-of-the-art methods in the field, especially for severely saturated signals.
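The clipping degradation model underlying these formulations can be illustrated as follows (a hedged sketch; `hard_clip`, `consistency_sets`, and `is_consistent` are illustrative names, not the authors' code):

```python
import numpy as np

def hard_clip(x, theta):
    # memoryless hard clipping at level theta
    return np.clip(x, -theta, theta)

def consistency_sets(y, theta):
    # index sets used by declipping formulations: reliable samples
    # must be preserved, clipped samples must be restored beyond
    # the clipping level
    reliable = np.abs(y) < theta
    clipped_pos = y >= theta
    clipped_neg = y <= -theta
    return reliable, clipped_pos, clipped_neg

def is_consistent(x_hat, y, theta, tol=1e-9):
    # a "consistent" estimate matches y on reliable samples and
    # exceeds the clipping level (in magnitude) on clipped ones
    r, cp, cn = consistency_sets(y, theta)
    return (np.allclose(x_hat[r], y[r], atol=tol)
            and bool(np.all(x_hat[cp] >= theta - tol))
            and bool(np.all(x_hat[cn] <= -theta + tol)))
```

The original signal is always consistent with its own clipped observation; an estimate that stays below the clipping level on a clipped sample is not.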
A Proper version of Synthesis-based Sparse Audio Declipper
Methods based on sparse representation have found great use in the recovery of audio signals degraded by clipping. The state of the art in declipping has been achieved by the SPADE algorithm of Kitić et al. (LVA/ICA 2015). Our recent study (LVA/ICA 2018) has shown that although the original S-SPADE can be improved such that it converges significantly faster than A-SPADE, the restoration quality is significantly worse. In the present paper, we propose a new version of S-SPADE. Experiments show that the novel version outperforms the old one in terms of restoration quality, is comparable with A-SPADE, and is even slightly faster than A-SPADE.
Introducing SPAIN (SParse Audio INpainter)
A novel sparsity-based algorithm for audio inpainting is proposed. It is an adaptation of the SPADE algorithm by Kitić et al., originally developed for audio declipping, to the task of audio inpainting. The new SPAIN (SParse Audio INpainter) comes in synthesis and analysis variants. Experiments show that both A-SPAIN and S-SPAIN outperform other sparsity-based inpainting algorithms. Moreover, A-SPAIN performs on a par with the state-of-the-art method based on linear prediction in terms of the SNR, and, for larger gaps, SPAIN is even slightly better in terms of the PEMO-Q psychoacoustic criterion.
Revisiting Synthesis Model of Sparse Audio Declipper
The state of the art in audio declipping has been achieved by the SPADE (SParse Audio DEclipper) algorithm of Kitić et al. Until now, the synthesis/sparse variant, S-SPADE, has been considered significantly slower than its analysis/cosparse counterpart, A-SPADE. It turns out that the opposite is true: by exploiting a recent projection lemma, individual iterations of both algorithms can be made equally computationally expensive, while S-SPADE tends to require considerably fewer iterations to converge. In this paper, the two algorithms are compared across a range of parameters such as the window length, window overlap, and redundancy of the transform. The experiments show that although S-SPADE typically converges faster, its average restoration quality is not superior to that of A-SPADE.
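The sparsification step shared by the SPADE variants, keeping only the k largest-magnitude coefficients and gradually relaxing k across iterations, can be sketched as follows (illustrative only; the full algorithm alternates this step with a projection onto the clipping-consistent set, omitted here):

```python
import numpy as np

def hard_threshold_topk(z, k):
    # keep the k largest-magnitude entries of z, zero the rest;
    # SPADE-style relaxation increases k every few iterations
    out = np.zeros_like(z)
    if k > 0:
        idx = np.argpartition(np.abs(z), -k)[-k:]
        out[idx] = z[idx]
    return out
```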
Audio declipping performance enhancement via crossfading
Some audio declipping methods produce waveforms that do not fully respect the actual process of clipping and allow a deviation on the reliable samples. This article reports the perceptual effect of pushing the output of such “inconsistent” methods towards “consistent” solutions by postprocessing. We first propose a simple sample-replacement method, then identify its main weaknesses and propose an improved variant. The experiments show that the vast majority of inconsistent declipping methods benefit significantly from the proposed approach in terms of objective perceptual metrics. In particular, we show that the SS PEW method, based on social sparsity, combined with the proposed postprocessing performs comparably to the top methods from the consistent class, at a computational cost one order of magnitude lower.
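The sample-replacement idea can be sketched as follows (a hypothetical implementation; the paper's improved variant differs in its details, and `fade` is an assumed parameter):

```python
import numpy as np

def declip_postprocess(x_hat, y, theta):
    # simplest "sample replacement": keep the estimate on clipped
    # samples, copy the reliable samples verbatim from the input
    reliable = np.abs(y) < theta
    return np.where(reliable, y, x_hat)

def declip_postprocess_xfade(x_hat, y, theta, fade=8):
    # same replacement, but blended with a short linear crossfade
    # around each reliable/clipped boundary so the waveform stays
    # smooth (illustrative scheme, not the paper's exact method)
    reliable = np.abs(y) < theta
    out = np.where(reliable, y, x_hat).astype(float)
    edges = np.flatnonzero(np.diff(reliable.astype(int)))
    for e in edges:
        lo, hi = max(e - fade + 1, 0), min(e + fade + 1, len(y))
        w = np.linspace(0.0, 1.0, hi - lo)
        a, b = (y, x_hat) if reliable[e] else (x_hat, y)
        out[lo:hi] = (1.0 - w) * a[lo:hi] + w * b[lo:hi]
    return out
```

The plain replacement restores consistency on reliable samples at the cost of possible discontinuities at run boundaries, which is exactly the weakness the crossfaded variant targets.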
Solving Audio Inverse Problems with a Diffusion Model
This paper presents CQT-Diff, a data-driven generative audio model that can, once trained, be used for solving various audio inverse problems in a problem-agnostic setting. CQT-Diff is a neural diffusion model with an architecture carefully constructed to exploit pitch-equivariant symmetries in music. This is achieved by preconditioning the model with an invertible Constant-Q Transform (CQT), whose logarithmically spaced frequency axis represents pitch equivariance as translation equivariance. The proposed method is evaluated with objective and subjective metrics on three varied tasks: audio bandwidth extension, inpainting, and declipping. The results show that CQT-Diff outperforms the compared baselines and ablations in audio bandwidth extension and, without retraining, delivers competitive performance against modern baselines in audio inpainting and declipping. This work represents the first diffusion-based general framework for solving inverse problems in audio processing.
Comment: Submitted to ICASSP 202
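The pitch-equivariance argument rests on a simple property of a logarithmically spaced frequency axis, which can be checked numerically (`B` and `f_min` are assumed example values, not the paper's settings):

```python
import numpy as np

B = 12                      # bins per octave (assumed example value)
f_min = 32.70               # lowest analysis frequency in Hz (assumed)
bins = np.arange(48)
freqs = f_min * 2.0 ** (bins / B)   # CQT-style log-frequency axis

# a pitch shift of k bins multiplies every frequency by 2**(k/B),
# which on this axis is exactly a translation by k indices
k = 7
assert np.allclose(freqs * 2.0 ** (k / B),
                   f_min * 2.0 ** ((bins + k) / B))
```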