15 research outputs found

    Proceedings of the second "international Traveling Workshop on Interactions between Sparse models and Technology" (iTWIST'14)

    The implicit objective of the biennial "international Traveling Workshop on Interactions between Sparse models and Technology" (iTWIST) is to foster collaboration between international scientific teams by disseminating ideas through both specific oral/poster presentations and free discussions. For its second edition, the iTWIST workshop took place in the medieval and picturesque town of Namur in Belgium, from Wednesday, August 27th to Friday, August 29th, 2014. The workshop was conveniently located in "The Arsenal" building, within walking distance of both hotels and the town center. iTWIST'14 gathered about 70 international participants and featured 9 invited talks, 10 oral presentations, and 14 posters on the following themes, all related to the theory, application and generalization of the "sparsity paradigm": Sparsity-driven data sensing and processing; Union of low dimensional subspaces; Beyond linear and convex inverse problem; Matrix/manifold/graph sensing/processing; Blind inverse problems and dictionary learning; Sparsity and computational neuroscience; Information theory, geometry and randomness; Complexity/accuracy tradeoffs in numerical methods; Sparsity? What's next?; Sparse machine learning and inference. Comment: 69 pages, 24 extended abstracts, iTWIST'14 website: http://sites.google.com/site/itwist1

    A new generalized projection and its application to acceleration of audio declipping

    In convex optimization, one often has to work with projectors onto convex sets composed with a linear operator. The need arises in both theory and applications, with signal processing being a prominent and broad field where convex optimization has recently been used. In this article, a novel projector is presented. It generalizes previous results in that it works with a broader family of linear transforms than the state of the art; on the other hand, it is limited to box-type convex sets in the transformed domain. The new projector is described by an explicit formula, which makes it simple to implement and computationally cheap. The projector is interpreted within the framework of the so-called proximal splitting theory. Its convenience is demonstrated on an example from signal processing, where it sped up the convergence of a signal declipping algorithm by a factor of more than two.
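For orientation, a projection onto a box-type set composed with a linear operator can be sketched for one simple, well-known case. The snippet below is an illustration only and is not the projector introduced in the article: it assumes an operator A with orthonormal rows (A Aᵀ = I), for which the projection onto {u : lo ≤ Au ≤ hi} has a standard closed form. The function names and the random test operator are hypothetical.

```python
import numpy as np


def project_box(z, lo, hi):
    """Elementwise projection of z onto the box [lo, hi]."""
    return np.minimum(np.maximum(z, lo), hi)


def project_box_composed(x, A, lo, hi):
    """Project x onto {u : lo <= A @ u <= hi}.

    Assumes A has orthonormal rows, i.e. A @ A.T equals the identity.
    """
    Ax = A @ x
    return x + A.T @ (project_box(Ax, lo, hi) - Ax)


# Hypothetical example: 4 orthonormal rows taken from a random orthogonal matrix.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((6, 6)))
A = Q[:4, :]
x = 3.0 * rng.standard_normal(6)
p = project_box_composed(x, A, -1.0, 1.0)
```

Under the stated assumption, `A @ p` equals the clipped version of `A @ x`, so the result is feasible, and applying the function again leaves it unchanged.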

    Restoration of signals with limited instantaneous value for the multichannel audio signal

    This master's thesis deals with the restoration of clipped multichannel audio signals based on sparse representations. First, the general theory of clipping and the theory of sparse representations of audio signals are described. This part also includes a short overview of existing restoration methods. Subsequently, two declipping algorithms are introduced, both of which were implemented in the Matlab environment as part of the thesis. The first one, SPADE, is considered a state-of-the-art method for declipping mono audio signals; the second one, CASCADE, is derived from SPADE and is designed for the restoration of multichannel signals. In the last part of the thesis, both algorithms are tested and the results are compared using the objective measures SDR and PEAQ, and also using the subjective listening test MUSHRA.
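To make the clipping model and the SDR measure concrete, here is a minimal sketch. It assumes symmetric hard clipping at a threshold `theta` and the commonly used definition SDR = 10 log10(‖x‖² / ‖x − x̂‖²); the abstract does not state which SDR variant the thesis uses, and the function names and test signal are hypothetical. The thesis itself used Matlab; Python is used here only for illustration.

```python
import numpy as np


def hard_clip(x, theta):
    """Saturate every sample of x to the range [-theta, theta]."""
    return np.clip(x, -theta, theta)


def sdr_db(reference, estimate):
    """Signal-to-distortion ratio in dB: 10*log10(||ref||^2 / ||ref - est||^2)."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    distortion = reference - estimate
    return 10.0 * np.log10(np.sum(reference**2) / np.sum(distortion**2))


# Hypothetical two-channel test signal (channels in columns), clipped at 0.5.
t = np.linspace(0.0, 1.0, 1000, endpoint=False)
clean = np.stack(
    [np.sin(2 * np.pi * 5 * t), 0.8 * np.cos(2 * np.pi * 7 * t)], axis=1
)
clipped = hard_clip(clean, 0.5)
print(sdr_db(clean, clipped))
```

A declipping algorithm would be judged by how much the SDR of its output improves over the SDR of the clipped input computed this way.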

    Inpainting of Missing Audio Signal Samples

    Recently, sparse representations of signals have become very popular in the field of signal processing. A sparse representation means that the signal is represented exactly, or approximated very well, by a linear combination of only a few vectors from a chosen representation system.
This thesis deals with the use of sparse representations of signals for the restoration of damaged audio recordings, whether historical or recent. Old audio recordings in particular suffer from defects such as crackles or noise. Until now, short gaps in audio signals have been repaired by interpolation techniques, especially autoregressive modeling. A few years ago, an algorithm termed Audio Inpainting was introduced. It fills in missing audio samples using sparse representations, relying on greedy algorithms for sparse approximation. This thesis aims to compare the state-of-the-art interpolation methods with Audio Inpainting. In addition, l1-relaxation methods are used for sparse approximation, in both the analysis and the synthesis model. The algorithms used are mainly proximal algorithms. They treat the coefficients either separately or in relation to their neighbourhood (structured sparsity). Structured sparsity is further used for audio denoising. In the experimental part of the thesis, the parameter settings of each algorithm are evaluated in terms of the trade-off between restoration quality and processing time. All of the algorithms described in the thesis are compared on practical examples using the objective measures signal-to-noise ratio (SNR) and PEMO-Q. Finally, the success of the restoration of the damaged audio signals is evaluated.
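The basic building block of proximal algorithms for l1-relaxed problems is soft thresholding, the proximal operator of the l1 norm, which treats each coefficient separately. The sketch below illustrates that operator only; it is not the thesis's implementation, and the function name and sample values are hypothetical.

```python
import numpy as np


def soft_threshold(x, t):
    """Proximal operator of t * ||x||_1: shrink each entry toward zero by t."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)


# Hypothetical coefficients: small entries are set to zero, large ones shrink.
coeffs = np.array([3.0, -0.5, -2.0])
shrunk = soft_threshold(coeffs, 1.0)
```

Structured-sparsity variants replace the per-coefficient magnitude with an energy computed over a neighbourhood of each coefficient, which is the second mode of operation the abstract mentions.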

    Addressing Variability in Speech when Recognizing Emotion and Mood In-the-Wild

    Bipolar disorder is a chronic mental illness, affecting 4% of Americans, that is characterized by periodic mood changes ranging from severe depression to extreme compulsive highs. Both mania and depression profoundly impact the behavior of affected individuals, resulting in potentially devastating personal and social consequences. Bipolar disorder is managed clinically with regular interactions with care providers, who assess mood, energy levels, and the form and content of speech. Recent work has proposed smartphones for automatically monitoring mood using speech. Much of the early work in speech-centered mood detection has been done in the laboratory or clinic and is not reflective of the variability found in real-world conversations and conditions. Outside of these settings, automatic mood detection is hard, as the recordings include environmental noise, differences in recording devices, and variations in subject speaking patterns. Without addressing these issues, it is difficult to move towards a passive mobile health system. My research works to address this variability present in speech so that such a system can be created, allowing for interventions to mitigate the life-changing effects of mood transitions. However, detecting mood directly from speech is difficult, as mood varies over the course of days or weeks, while speech fluctuates rapidly. To address this, my thesis explores how an intermediate step can be used to aid in this prediction. For example, one of the major symptoms of bipolar disorder is emotion dysregulation, that is, changes in the way emotions are perceived and a lack of inhibition in their expression. My work has supported the relationship between automatically extracted emotion estimates and mood. Because of this, my thesis explores how to mitigate the variability found when detecting emotion from speech.
The remainder of my thesis is focused on applying these emotion-based features, as well as features based on language content, to real-world applications. This dissertation is divided into the following parts: Part I: I address the direct classification of mood from speech. This is accomplished by addressing variability due to recording device using preprocessing and multi-task learning. I then show how both subject-specific and population-general information can be combined to significantly improve mood detection. Part II: I explore the automatic detection of emotion from speech and how to control for the other factors of variability present in the speech signal. I use progressive networks as a method to augment emotion with other paralinguistic data including gender and speaker, as well as other datasets. Additionally, I introduce a novel domain generalization method for cross-corpus detection. Part III: I demonstrate real-world applications of speech mood monitoring using everyday conversations. I show how the previously introduced generalized model can predict emotion from the speech of individuals with suicidal ideation, demonstrating its effectiveness across domains. Furthermore, I use these predictions to distinguish individuals with suicidal thoughts from healthy controls. Lastly, I introduce a novel framework for intervention detection in individuals with bipolar disorder. I then create a natural speech mood monitoring system based on features derived from measures of emotion and automatic speech recognition (ASR) transcripts and show effective intervention detection. I conclude this dissertation with the following future directions: (1) Extending my emotion generalization system to include multiple modalities and factors of variability; (2) Expanding natural speech mood monitoring by including more devices, exploring other data besides speech, and investigating mood rating causality. PhD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/153461/1/gideonjn_1.pd

    Designing Gabor windows using convex optimization

    Redundant Gabor frames admit an infinite number of dual frames, yet only the canonical dual Gabor system, constructed from the minimal l2-norm dual window, is widely used. This window function, however, might lack desirable properties such as good time-frequency concentration, small support or smoothness. We employ convex optimization methods to design dual windows that satisfy the Wexler-Raz equations while optimizing various criteria. Numerical experiments suggest that alternate dual windows with considerably improved features can be found.

    Automatic removal of music tracks from tv programmes

    This work belongs to the research area of sound source separation. It deals with the problem of automatically removing musical segments from TV programmes. The dissertation proposes the use of a pre-existing music recording, easily obtainable from officially published CDs related to the audiovisual piece, as a reference for the undesired signal. The method is able to automatically detect small segments of the specific music track spread throughout the audio signal of the programme, even if they appear with time-variable gain, or after having suffered linear distortions, such as processing by equalization filters, or non-linear distortions, such as dynamic range compression. The project developed a quick-search algorithm using audio fingerprint techniques and hash-token data types to lower the algorithm complexity. The work also proposes the use of a Wiener filtering technique to estimate the coefficients of a potential equalization filter, and uses a template matching algorithm to estimate time-variable gains, so that the musical segments are scaled to the amplitude at which they appear in the mixture. The key components of the separation system are presented, and a detailed description of all the algorithms involved is reported. Simulations with artificial and real TV programme soundtracks are analysed and considerations about future work are made. Furthermore, given the unique nature of this project, the dissertation can be said to be a pioneer in the subject, becoming an ideal source of reference for other researchers who want to work in the area.