
    Audio Declipping with Social Sparsity

    We consider the audio declipping problem using iterative thresholding algorithms and the principle of social sparsity. This recently introduced approach features thresholding/shrinkage operators that model dependencies between neighboring coefficients in expansions over time-frequency dictionaries. A new unconstrained convex formulation of the audio declipping problem is introduced. The chosen structured thresholding operators are the so-called windowed group-Lasso and the persistent empirical Wiener. These operators significantly improve the quality of the reconstruction compared to simple soft-thresholding. The resulting algorithm is fast, simple to implement, and outperforms the state of the art in terms of signal-to-noise ratio.
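    The two structured shrinkage operators named above can be sketched as follows. This is an illustrative implementation under my own assumptions (coefficient array of shape frequency × time, neighborhoods taken along the time axis, a rectangular neighborhood window), not the authors' code: each coefficient is scaled by a gain computed from the energy of its local neighborhood, so isolated coefficients are suppressed while persistent time-frequency structures survive.

```python
import numpy as np

def _neighborhood_energy(coeffs, radius):
    """Moving sum of |c|^2 over a (2*radius+1)-frame window along time."""
    energy = np.abs(coeffs) ** 2
    kernel = np.ones(2 * radius + 1)
    return np.apply_along_axis(
        lambda row: np.convolve(row, kernel, mode="same"), 1, energy)

def windowed_group_lasso(coeffs, lam, radius=1):
    """Windowed group-Lasso shrinkage: gain (1 - lam/||neighborhood||)_+."""
    norms = np.sqrt(_neighborhood_energy(coeffs, radius))
    gain = np.maximum(0.0, 1.0 - lam / np.maximum(norms, 1e-12))
    return coeffs * gain

def persistent_empirical_wiener(coeffs, lam, radius=1):
    """Persistent empirical Wiener: quadratic gain (1 - lam^2/||.||^2)_+,
    which shrinks large coefficients less than the group-Lasso (lower bias)."""
    neigh = _neighborhood_energy(coeffs, radius)
    gain = np.maximum(0.0, 1.0 - lam ** 2 / np.maximum(neigh, 1e-12))
    return coeffs * gain
```

    With `radius=0` both operators reduce to their unstructured counterparts (soft-thresholding and empirical Wiener, respectively), which is the baseline the abstract compares against.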

    Sparsity and cosparsity for audio declipping: a flexible non-convex approach

    This work investigates the empirical performance of sparse synthesis versus sparse analysis regularization for the ill-posed inverse problem of audio declipping. We develop a versatile non-convex heuristic that can be readily used with both data models. Based on this algorithm, we report that in most cases the two models perform similarly in terms of signal enhancement. However, the analysis version is amenable to real-time audio processing when certain analysis operators are used. Both versions outperform state-of-the-art methods in the field, especially for severely saturated signals.
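    As a sketch of the two data models compared above (notation mine, not taken from the paper): with a synthesis dictionary $D$, an analysis operator $A$, and $\Gamma(y)$ the set of signals consistent with the clipped observation $y$, the two sparse regularizations can be written as

```latex
% synthesis model: the signal is a sparse combination of dictionary atoms
\min_{z} \;\|z\|_0 \quad \text{s.t.} \quad Dz \in \Gamma(y)

% analysis (cosparse) model: the analysis coefficients of the signal are sparse
\min_{x} \;\|Ax\|_0 \quad \text{s.t.} \quad x \in \Gamma(y)
```

    In the synthesis model the unknown is the coefficient vector $z$; in the analysis model it is the signal $x$ itself, which is what makes the analysis version attractive for real-time processing with suitable operators $A$.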

    A Proper version of Synthesis-based Sparse Audio Declipper

    Methods based on sparse representation have found great use in the recovery of audio signals degraded by clipping. The state of the art in declipping has been achieved by the SPADE algorithm of Kiti\'c et al. (LVA/ICA 2015). Our recent study (LVA/ICA 2018) showed that although the original S-SPADE can be improved so that it converges significantly faster than A-SPADE, its restoration quality is significantly worse. In the present paper, we propose a new version of S-SPADE. Experiments show that the new version outperforms the old one in terms of restoration quality, and that it is comparable with A-SPADE while being even slightly faster.
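    The clipping-consistency constraint that SPADE-type declippers enforce can be sketched as a projection (a minimal illustration under my own naming, not the authors' code): reliable samples are restored exactly from the observation, while samples that were clipped at the level $\pm\theta$ must stay at or beyond that level.

```python
import numpy as np

def project_clip_consistent(x, y, theta):
    """Project candidate signal x onto the set of signals consistent with
    the clipped observation y (clipping level theta): reliable samples are
    copied from y; clipped samples must lie at or beyond +/-theta."""
    z = x.copy()
    reliable = np.abs(y) < theta
    z[reliable] = y[reliable]
    hi = y >= theta          # samples clipped at +theta
    lo = y <= -theta         # samples clipped at -theta
    z[hi] = np.maximum(z[hi], theta)
    z[lo] = np.minimum(z[lo], -theta)
    return z
```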

    Introducing SPAIN (SParse Audio INpainter)

    A novel sparsity-based algorithm for audio inpainting is proposed. It adapts the SPADE algorithm by Kiti\'c et al., originally developed for audio declipping, to the task of audio inpainting. The new SPAIN (SParse Audio INpainter) comes in synthesis and analysis variants. Experiments show that both A-SPAIN and S-SPAIN outperform other sparsity-based inpainting algorithms. Moreover, A-SPAIN performs on a par with the state-of-the-art method based on linear prediction in terms of SNR, and, for larger gaps, SPAIN is even slightly better in terms of the PEMO-Q psychoacoustic criterion.
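    The core of such an adaptation is the consistency set: for inpainting, the constraint is simpler than for declipping, since samples inside a gap are entirely unobserved and carry no magnitude constraint. A minimal sketch under my own naming (not the authors' code):

```python
import numpy as np

def project_inpaint_consistent(x, y, reliable):
    """Consistency projection for inpainting: reliable samples are restored
    exactly from the observation y, while samples inside the gaps are left
    untouched (no constraint), unlike the clipped samples in declipping."""
    z = x.copy()
    z[reliable] = y[reliable]
    return z
```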

    Revisiting Synthesis Model of Sparse Audio Declipper

    The state of the art in audio declipping has currently been achieved by the SPADE (SParse Audio DEclipper) algorithm of Kiti\'c et al. Until now, the synthesis/sparse variant, S-SPADE, has been considered significantly slower than its analysis/cosparse counterpart, A-SPADE. It turns out that the opposite is true: by exploiting a recent projection lemma, individual iterations of both algorithms can be made equally computationally expensive, while S-SPADE tends to require considerably fewer iterations to converge. In this paper, the two algorithms are compared across a range of parameters such as window length, window overlap, and redundancy of the transform. The experiments show that although S-SPADE typically converges faster, its average restoration quality is not superior to that of A-SPADE.
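    The sparsifying step shared by both SPADE variants is a non-convex hard thresholding that keeps only the k largest-magnitude coefficients, with the sparsity budget k gradually relaxed (increased) over the iterations until a consistency residual falls below a tolerance. A sketch of that step, under my own naming:

```python
import numpy as np

def hard_threshold_topk(z, k):
    """Keep the k largest-magnitude coefficients of z, zero the rest.
    In SPADE-type algorithms, k is increased every few iterations,
    progressively relaxing the sparsity constraint."""
    if k >= z.size:
        return z.copy()
    out = np.zeros_like(z)
    keep = np.argpartition(np.abs(z), -k)[-k:]  # indices of k largest |z|
    out[keep] = z[keep]
    return out
```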

    Audio declipping performance enhancement via crossfading

    Some audio declipping methods produce waveforms that do not fully respect the actual process of clipping and allow a deviation on the reliable samples. This article reports the perceptual effect of pushing the output of such “inconsistent” methods towards “consistent” solutions by postprocessing. We first propose a simple sample-replacement method, then identify its main weaknesses and propose an improved variant. The experiments show that the vast majority of inconsistent declipping methods benefit significantly from the proposed approach in terms of objective perceptual metrics. In particular, we show that the SS PEW method, based on social sparsity and combined with the proposed method, performs comparably to the top methods from the consistent class at a computational cost one order of magnitude lower.
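    The idea of sample replacement with crossfading can be sketched as follows. This is a hypothetical illustration (the function name, the linear fade shape, and the parameters are my assumptions, not the paper's method): reliable samples from the clipped observation replace the declipper output, with a short linear crossfade at each boundary between reliable and clipped regions to avoid discontinuities.

```python
import numpy as np

def replace_reliable_with_crossfade(declipped, observed, theta, fade=16):
    """Postprocess an 'inconsistent' declipper output: restore reliable
    samples from the observation, crossfading over `fade` samples near
    each clipped region. theta is the clipping level."""
    reliable = np.abs(observed) < theta
    n = len(observed)
    # distance (in samples) from each position to the nearest clipped sample
    dist = np.where(reliable, np.inf, 0.0)
    for i in range(1, n):
        dist[i] = min(dist[i], dist[i - 1] + 1)
    for i in range(n - 2, -1, -1):
        dist[i] = min(dist[i], dist[i + 1] + 1)
    # weight on the observation: 0 inside clipped regions, ramping
    # linearly to 1 over `fade` samples into each reliable region
    w = np.clip(dist / fade, 0.0, 1.0)
    return w * observed + (1.0 - w) * declipped
```

    Hard replacement (the "simple" variant the abstract mentions first) corresponds to `fade=1`; larger values trade exact consistency near the boundaries for smoothness.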

    Solving Audio Inverse Problems with a Diffusion Model

    This paper presents CQT-Diff, a data-driven generative audio model that, once trained, can be used to solve various audio inverse problems in a problem-agnostic setting. CQT-Diff is a neural diffusion model with an architecture carefully constructed to exploit pitch-equivariant symmetries in music. This is achieved by preconditioning the model with an invertible Constant-Q Transform (CQT), whose logarithmically spaced frequency axis represents pitch equivariance as translation equivariance. The proposed method is evaluated with objective and subjective metrics on three varied tasks: audio bandwidth extension, inpainting, and declipping. The results show that CQT-Diff outperforms the compared baselines and ablations in audio bandwidth extension and, without retraining, delivers competitive performance against modern baselines in audio inpainting and declipping. This work represents the first diffusion-based general framework for solving inverse problems in audio processing.
    Comment: Submitted to ICASSP 202
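    The pitch-equivariance argument behind the CQT preconditioning can be checked in a few lines (the minimum frequency and bin count below are my own illustrative choices, not the paper's configuration): on a logarithmically spaced frequency axis, a pitch shift by s semitones multiplies every frequency by 2^(s/12), which is exactly a translation by s bins.

```python
import numpy as np

fmin = 32.70            # assumed lowest bin (C1)
bins_per_octave = 12
freqs = fmin * 2.0 ** (np.arange(48) / bins_per_octave)

s = 7                   # shift by a perfect fifth (7 semitones)
shifted = freqs * 2.0 ** (s / bins_per_octave)
# the shifted frequencies coincide with the original axis moved by s bins
assert np.allclose(shifted[: len(freqs) - s], freqs[s:])
```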