Sparsity and cosparsity for audio declipping: a flexible non-convex approach
This work investigates the empirical performance of sparse synthesis
versus sparse analysis regularization for the ill-posed inverse problem of
audio declipping. We develop a versatile non-convex heuristic that can be
readily used with both data models. Based on this algorithm, we report that, in
most cases, the two models perform similarly in terms of signal
enhancement. However, the analysis version is shown to be amenable to real-time
audio processing when certain analysis operators are considered. Both
versions outperform state-of-the-art methods in the field, especially for
severely saturated signals.
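To make the setting concrete, below is a minimal sketch of the kind of non-convex sparse-declipping heuristic the abstract describes. It is illustrative only: the function name is hypothetical, an orthonormal DCT stands in for a redundant time-frequency transform (with an orthonormal transform the synthesis and analysis models coincide; only redundant frames distinguish them), and the paper's actual algorithm is not reproduced here.

```python
import numpy as np
from scipy.fft import dct, idct

def declip_synthesis(y, clip_level, k, n_iter=100):
    """Toy sparse-synthesis declipper (illustrative, not the paper's method).

    Alternates two steps: hard-threshold the DCT coefficients to the k
    largest magnitudes (the non-convex sparsity prior), then project onto
    the clipping-consistent set: reliable samples are kept exactly, and
    clipped samples must lie beyond the clipping level.
    """
    reliable = np.abs(y) < clip_level          # mask of unclipped samples
    x = y.copy()
    for _ in range(n_iter):
        # sparsity step: keep only the k largest DCT coefficients
        c = dct(x, norm='ortho')
        c[np.argsort(np.abs(c))[:-k]] = 0.0
        x = idct(c, norm='ortho')
        # consistency step: enforce what we know about the clipped signal
        x[reliable] = y[reliable]
        pos = (~reliable) & (y > 0)
        neg = (~reliable) & (y < 0)
        x[pos] = np.maximum(x[pos], clip_level)
        x[neg] = np.minimum(x[neg], -clip_level)
    return x
```

On a signal that is exactly sparse in the DCT basis (e.g. a single cosine atom), this loop pushes the clipped peaks back above the clipping level and typically reduces the reconstruction error relative to the clipped input.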
Audio Inpainting
© 2012 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. Published version: IEEE Transactions on Audio, Speech and Language Processing 20(3): 922-932, Mar 2012. DOI: 10.1109/TASL.2011.2168211
Revisiting Synthesis Model of Sparse Audio Declipper
The current state of the art in audio declipping is the SPADE
(SParse Audio DEclipper) algorithm by Kitić et al. Until now, the
synthesis/sparse variant, S-SPADE, has been considered significantly slower
than its analysis/cosparse counterpart, A-SPADE. It turns out that the opposite
is true: by exploiting a recent projection lemma, individual iterations of both
algorithms can be made equally computationally expensive, while S-SPADE tends
to require considerably fewer iterations to converge. In this paper, the two
algorithms are compared across a range of parameters such as the window length,
window overlap, and redundancy of the transform. The experiments show that
although S-SPADE typically converges faster, its average performance in terms
of restoration quality is not superior to that of A-SPADE.
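A distinctive feature of the SPADE family, beyond the plain thresholding loop, is its ADMM-style structure with a gradually relaxed sparsity level. The sketch below is SPADE-flavoured rather than faithful: the function name is hypothetical, an orthonormal DCT replaces the redundant frame actually used (which is where the S-/A-SPADE distinction and the projection lemma matter), and the whole signal is processed at once instead of frame by frame.

```python
import numpy as np
from scipy.fft import dct, idct

def spade_like_declip(y, clip_level, r=1, s=1, eps=1e-2, max_iter=200):
    """SPADE-flavoured declipping sketch (not the published algorithm).

    ADMM-style loop: hard-threshold the transform coefficients to k terms,
    project back onto the clipping-consistent set, update the dual variable,
    and relax the sparsity level k by s every r iterations until the
    residual drops below eps.
    """
    reliable = np.abs(y) < clip_level
    x, u, k = y.copy(), np.zeros_like(y), s
    for i in range(1, max_iter + 1):
        # sparse coding step: keep the k largest coefficients
        c = dct(x, norm='ortho') + u
        z = np.zeros_like(c)
        top = np.argsort(np.abs(c))[-k:]
        z[top] = c[top]
        # data step: project onto the clipping-consistent set
        x = idct(z - u, norm='ortho')
        x[reliable] = y[reliable]
        pos = (~reliable) & (y > 0)
        neg = (~reliable) & (y < 0)
        x[pos] = np.maximum(x[pos], clip_level)
        x[neg] = np.minimum(x[neg], -clip_level)
        # dual update, termination check, and sparsity relaxation
        res = dct(x, norm='ortho') - z
        if np.linalg.norm(res) <= eps:
            break
        u = u + res
        if i % r == 0:
            k += s
    return x
```

The k-relaxation schedule is what lets the method start from a very strict sparsity model and loosen it only as far as the clipping constraints require.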
Proceedings of the second "international Traveling Workshop on Interactions between Sparse models and Technology" (iTWIST'14)
The implicit objective of the biennial "international Traveling Workshop on
Interactions between Sparse models and Technology" (iTWIST) is to foster
collaboration between international scientific teams by disseminating ideas
through both specific oral/poster presentations and free discussions. For its
second edition, the iTWIST workshop took place in the medieval and picturesque
town of Namur in Belgium, from Wednesday, August 27th, to Friday, August 29th,
2014. The workshop was conveniently located in "The Arsenal" building, within
walking distance of both hotels and the town center. iTWIST'14 gathered about
70 international participants and featured 9 invited talks, 10 oral
presentations, and 14 posters on the following themes, all related to the
theory, application, and generalization of the "sparsity paradigm":
Sparsity-driven data sensing and processing; Union of low-dimensional
subspaces; Beyond linear and convex inverse problems; Matrix/manifold/graph
sensing/processing; Blind inverse problems and dictionary learning; Sparsity
and computational neuroscience; Information theory, geometry and randomness;
Complexity/accuracy tradeoffs in numerical methods; Sparsity? What's next?;
Sparse machine learning and inference.
Comment: 69 pages, 24 extended abstracts. iTWIST'14 website:
http://sites.google.com/site/itwist1
Restoration of signals with limited instantaneous value for the multichannel audio signal
This master's thesis deals with the restoration of clipped multichannel audio signals using methods based on sparse signal representations. First, the theory of clipping in audio signals and the theory of sparse signal representations are described, together with a short overview of existing restoration algorithms. Subsequently, two declipping algorithms are introduced, both implemented in the Matlab environment as part of the thesis. The first one, SPADE, is considered a state-of-the-art method for declipping mono audio signals; the second one, CASCADE, derived from SPADE, is designed for the restoration of multichannel signals. In the last part of the thesis, both algorithms are tested and the results are compared using the objective measures SDR and PEAQ, as well as the subjective listening test MUSHRA.
Designing Gabor windows using convex optimization
Redundant Gabor frames admit an infinite number of dual frames, yet only the
canonical dual Gabor system, constructed from the minimal l2-norm dual window,
is widely used. This window function, however, might lack desirable properties,
e.g. good time-frequency concentration, small support, or smoothness. We employ
convex optimization methods to design dual windows satisfying the Wexler-Raz
equations and optimizing various constraints. Numerical experiments suggest
that alternate dual windows with considerably improved features can be found.
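The key fact exploited above is that the duality conditions are linear in the dual window, so any quadratic design criterion yields a closed-form convex program. The finite, discrete sketch below (signal length N=12, lattice steps a=3, b=2, redundancy 2) illustrates this: the minimal l2-norm solution of the linear duality system is the canonical dual, and swapping in a weighted l2 objective, a simple stand-in for the concentration criteria optimized in the paper, produces an alternate dual window. All parameter values are illustrative assumptions.

```python
import numpy as np

N, a, b = 12, 3, 2                    # signal length, time step, frequency step
t = np.arange(N)
lattice = [(n, m) for n in range(N // a) for m in range(N // b)]

def gabor_atoms(w):
    """All time-frequency shifts of window w on the (a, b) lattice."""
    return np.array([np.roll(w, n * a) * np.exp(2j * np.pi * m * b * t / N)
                     for n, m in lattice])

g = np.exp(-0.5 * ((t - N / 2) / 2.0) ** 2)   # Gaussian analysis window
G = gabor_atoms(g)

# Duality written out as linear constraints A h = vec(I):
# sum_k h_k[p] * conj(g_k[q]) = delta_{pq}, with h_k the atoms of h.
A = np.zeros((N * N, N), dtype=complex)
for k, (n, m) in enumerate(lattice):
    for p in range(N):
        j = (p - n * a) % N                  # which entry of h appears here
        phase = np.exp(2j * np.pi * m * b * p / N)
        for q in range(N):
            A[p * N + q, j] += phase * np.conj(G[k, q])
e = np.eye(N).ravel()

# canonical dual: minimal l2-norm solution of the (consistent) linear system
h_can = np.linalg.lstsq(A, e, rcond=None)[0]

# alternate dual: weighted l2 objective penalising mass away from the window
# centre; solved by a change of variables h = W^{-1/2} u
wgt = 1.0 + ((t - N / 2) / 2.0) ** 4
u = np.linalg.lstsq(A / np.sqrt(wgt)[None, :], e, rcond=None)[0]
h_alt = u / np.sqrt(wgt)
```

Both windows reconstruct perfectly (analysis with g, synthesis with the dual), while the weighted objective selects a different member of the affine family of duals.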
Addressing Variability in Speech when Recognizing Emotion and Mood In-the-Wild
Bipolar disorder is a chronic mental illness, affecting 4% of Americans, that is characterized by periodic mood changes ranging from severe depression to extreme compulsive highs. Both mania and depression profoundly impact the behavior of affected individuals, resulting in potentially devastating personal and social consequences. Bipolar disorder is managed clinically with regular interactions with care providers, who assess mood, energy levels, and the form and content of speech. Recent work has proposed smartphones for automatically monitoring mood using speech.
Much of the early work in speech-centered mood detection has been done in the laboratory or clinic and is not reflective of the variability found in real-world conversations and conditions. Outside of these settings, automatic mood detection is hard, as the recordings include environmental noise, differences in recording devices, and variations in subject speaking patterns. Without addressing these issues, it is difficult to move towards a passive mobile health system. My research works to address this variability present in speech so that such a system can be created, allowing for interventions to mitigate the life-changing effects of mood transitions.
However, detecting mood directly from speech is difficult, as mood varies over the course of days or weeks, while speech fluctuates rapidly. To address this, my thesis explores how an intermediate step can be used to aid in this prediction. For example, one of the major symptoms of bipolar disorder is emotion dysregulation: changes in the way emotions are perceived and a lack of inhibition in their expression. My work has supported the relationship between automatically extracted emotion estimates and mood. Because of this, my thesis explores how to mitigate the variability found when detecting emotion from speech. The remainder of my thesis is focused on employing these emotion-based features, as well as features based on language content, in real-world applications. This dissertation is divided into the following parts:
Part I: I address the direct classification of mood from speech. This is accomplished by addressing variability due to recording device using preprocessing and multi-task learning. I then show how both subject-specific and population-general information can be combined to significantly improve mood detection.
Part II: I explore the automatic detection of emotion from speech and how to control for the other factors of variability present in the speech signal. I use progressive networks as a method to augment emotion with other paralinguistic data including gender and speaker, as well as other datasets. Additionally, I introduce a novel domain generalization method for cross-corpus detection.
Part III: I demonstrate real-world applications of speech mood monitoring using everyday conversations. I show how the previously introduced generalized model can predict emotion from the speech of individuals with suicidal ideation, demonstrating its effectiveness across domains. Furthermore, I use these predictions to distinguish individuals with suicidal thoughts from healthy controls. Lastly, I introduce a novel framework for intervention detection in individuals with bipolar disorder. I then create a natural speech mood monitoring system based on features derived from measures of emotion and automatic speech recognition (ASR) transcripts and show effective intervention detection.
I conclude this dissertation with the following future directions: (1) Extending my emotion generalization system to include multiple modalities and factors of variability; (2) Expanding natural speech mood monitoring by including more devices, exploring other data besides speech, and investigating mood rating causality.
PhD dissertation, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/153461/1/gideonjn_1.pd
Phase retrieval and audio signal reconstruction with non-quadratic cost functions (Reconstruction de phase et de signaux audio avec des fonctions de coût non-quadratiques)
Audio signal reconstruction consists in recovering sound signals from incomplete or degraded representations. This problem can be cast as an inverse problem. Such problems are frequently tackled with the help of optimization or machine learning strategies. In this thesis, we propose to change the cost function in inverse problems related to audio signal reconstruction. We mainly address the phase retrieval problem, which is common when manipulating audio spectrograms. A first line of work tackles the optimization of non-quadratic cost functions for phase retrieval. We study this problem in two contexts: audio signal reconstruction from a single spectrogram, and source separation. We introduce a novel formulation of the problem with Bregman divergences, as well as algorithms for its resolution. A second line of work proposes to learn the cost function from a given dataset. This is done within the framework of unfolded neural networks, which are derived from iterative algorithms. We introduce a neural network based on the unfolding of the Alternating Direction Method of Multipliers that includes learnable activation functions. We expose the relation between the learning of its parameters and the learning of the cost function for phase retrieval. We conduct numerical experiments for each of the proposed methods to evaluate their performance and their potential for audio signal reconstruction.
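For context, the classical baseline that these non-quadratic and learned cost functions generalize is Griffin-Lim phase retrieval, which alternates between imposing the target spectrogram magnitude and projecting onto the set of consistent STFTs under a quadratic cost. A minimal sketch (hypothetical function name, using scipy's STFT with default Hann window and 50% overlap), not the thesis's Bregman or unfolded-ADMM methods:

```python
import numpy as np
from scipy.signal import stft, istft

def griffin_lim(mag, nperseg=256, n_iter=50, seed=0):
    """Classical Griffin-Lim phase retrieval (quadratic-cost baseline).

    Starting from a random phase, alternate between (i) going to the time
    domain and back, which projects onto the set of consistent spectrograms,
    and (ii) restoring the target magnitude while keeping the new phase.
    """
    rng = np.random.default_rng(seed)
    S = mag * np.exp(2j * np.pi * rng.random(mag.shape))
    for _ in range(n_iter):
        _, x = istft(S, nperseg=nperseg)        # back to the time domain
        _, _, C = stft(x, nperseg=nperseg)      # consistent spectrogram
        S = mag * np.exp(1j * np.angle(C))      # restore target magnitude
    return istft(S, nperseg=nperseg)[1]
```

Each iteration can only decrease the mismatch between the target magnitude and the magnitude of a consistent spectrogram; replacing that quadratic mismatch with a Bregman divergence, or learning it via unfolding, is precisely the direction the thesis pursues.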