
    Objective Assessment of Machine Learning Algorithms for Speech Enhancement in Hearing Aids

    Speech enhancement in assistive hearing devices has been an area of research for many decades. Noise reduction is particularly challenging because of the wide variety of noise sources and the non-stationarity of speech and noise. Digital signal processing (DSP) algorithms deployed in modern hearing aids for noise reduction rely on certain assumptions about the statistical properties of undesired signals. These assumptions can fail for particular noise types, leading to inaccurate noise estimates and, consequently, suboptimal noise reduction. In this research, a relatively unexplored technique based on deep learning, a recurrent neural network (RNN), is used to perform noise reduction and dereverberation for assisting hearing-impaired listeners. For noise reduction, the performance of the deep learning model was evaluated objectively and compared with that of open Master Hearing Aid (openMHA), a conventional signal-processing-based framework, and a deep neural network (DNN) based model. It was found that the RNN model can suppress noise and improve speech understanding better than both the conventional hearing aid noise reduction algorithm and the DNN model. With proper training, the same RNN model was also shown to reduce reverberation components. A real-time implementation of the deep learning model is also discussed
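RNN-based noise reduction of this kind is typically framed as time-frequency masking. The sketch below is not the paper's model; the tiny GRU, the layer sizes, and the random weights are illustrative assumptions. It shows the overall shape: a recurrent layer consumes noisy log-magnitude STFT frames one at a time and emits a per-bin mask in [0, 1] that scales the noisy magnitudes.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class TinyGRUMasker:
    """Single-layer GRU mapping noisy log-magnitude frames to a
    time-frequency mask. Weights are random here; a real model would
    be trained on (noisy, clean) spectrogram pairs."""

    def __init__(self, n_freq, n_hidden):
        s = 0.1
        self.Wz = rng.normal(0, s, (n_hidden, n_freq + n_hidden))
        self.Wr = rng.normal(0, s, (n_hidden, n_freq + n_hidden))
        self.Wh = rng.normal(0, s, (n_hidden, n_freq + n_hidden))
        self.Wo = rng.normal(0, s, (n_freq, n_hidden))

    def __call__(self, frames):
        h = np.zeros(self.Wz.shape[0])
        masks = []
        for x in frames:                       # one STFT frame at a time
            xh = np.concatenate([x, h])
            z = sigmoid(self.Wz @ xh)          # update gate
            r = sigmoid(self.Wr @ xh)          # reset gate
            hc = np.tanh(self.Wh @ np.concatenate([x, r * h]))
            h = (1 - z) * h + z * hc           # recurrent state carries context
            masks.append(sigmoid(self.Wo @ h)) # per-bin mask in (0, 1)
        return np.array(masks)

n_frames, n_freq = 50, 129
noisy_mag = np.abs(rng.normal(size=(n_frames, n_freq)))
mask = TinyGRUMasker(n_freq, 32)(np.log1p(noisy_mag))
enhanced = mask * noisy_mag                    # masked magnitude spectrogram
```

In practice the weights would be learned from paired noisy and clean spectrograms, and the masked magnitudes recombined with the noisy phase for resynthesis.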

    REPET for Background/Foreground Separation in Audio

    Repetition is a fundamental element in generating and perceiving structure. In audio, mixtures are often composed of structures where a repeating background signal is superimposed with a varying foreground signal (e.g., a singer overlaying varying vocals on a repeating accompaniment or a varying speech signal mixed up with a repeating background noise). On this basis, we present the REpeating Pattern Extraction Technique (REPET), a simple approach for separating the repeating background from the non-repeating foreground in an audio mixture. The basic idea is to find the repeating elements in the mixture, derive the underlying repeating models, and extract the repeating background by comparing the models to the mixture. Unlike other separation approaches, REPET does not depend on special parametrizations, does not rely on complex frameworks, and does not require external information. Because it is only based on repetition, it has the advantage of being simple, fast, blind, and therefore completely and easily automatable
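The core of REPET can be stated in a few lines. The sketch below assumes the repeating period is already known (estimating it from the beat spectrum is omitted) and builds the repeating model by an element-wise median over period-length segments of the magnitude spectrogram:

```python
import numpy as np

def repet_mask(mag, period):
    """Given a magnitude spectrogram `mag` (frames x freq bins) and a
    repeating period in frames, build the REPET repeating model as the
    element-wise median across period-length segments, then derive a
    soft mask for the repeating background."""
    n_frames, n_freq = mag.shape
    n_seg = n_frames // period
    segs = mag[:n_seg * period].reshape(n_seg, period, n_freq)
    model = np.median(segs, axis=0)            # repeating model: median
    model = np.tile(model, (n_seg, 1))         # rejects non-repeating foreground
    trimmed = mag[:n_seg * period]
    # The background cannot exceed the mixture energy in any bin.
    background = np.minimum(model, trimmed)
    mask = background / np.maximum(trimmed, 1e-12)
    return mask                                # values in [0, 1]

rng = np.random.default_rng(0)
base = rng.random((8, 16))                     # one period of the background
mag = np.tile(base, (5, 1))                    # a perfectly repeating mixture
mask = repet_mask(mag, period=8)
```

On a purely repeating mixture, as in the usage above, the mask is 1 everywhere; a non-repeating foreground added on top would be rejected by the median and receive mask values near 0.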

    Applying source separation to music

    Separation of existing audio into remixable elements is very useful for repurposing music audio. Applications include upmixing video soundtracks to surround sound (e.g. home theater 5.1 systems), facilitating music transcription, allowing better mashups and remixes for disk jockeys, and rebalancing sound levels on multiple instruments or voices recorded simultaneously to a single track. In this chapter, we provide an overview of the algorithms and approaches designed to address the challenges and opportunities in music. Where applicable, we also introduce commonalities and links to source separation for video soundtracks, since many musical scenarios involve video soundtracks (e.g. YouTube recordings of live concerts, movie soundtracks). While space prohibits describing every method in detail, we include detail on representative music-specific algorithms and approaches not covered in other chapters. The intent is to give the reader a high-level understanding of the workings of key exemplars of the source separation approaches applied in this domain

    The listening talker: A review of human and algorithmic context-induced modifications of speech

    Speech output technology is finding widespread application, including in scenarios where intelligibility might be compromised - at least for some listeners - by adverse conditions. Unlike most current algorithms, talkers continually adapt their speech patterns as a response to the immediate context of spoken communication, where the type of interlocutor and the environment are the dominant situational factors influencing speech production. Observations of talker behaviour can motivate the design of more robust speech output algorithms. Starting with a listener-oriented categorisation of possible goals for speech modification, this review article summarises the extensive set of behavioural findings related to human speech modification, identifies which factors appear to be beneficial, and goes on to examine previous computational attempts to improve intelligibility in noise. The review concludes by tabulating 46 speech modifications, many of which have yet to be perceptually or algorithmically evaluated. Consequently, the review provides a roadmap for future work in improving the robustness of speech output

    Brain Responses Track Patterns in Sound

    This thesis uses specifically structured sound sequences, with electroencephalography (EEG) recording and behavioural tasks, to understand how the brain forms and updates a model of the auditory world. Experimental chapters 3-7 address different effects arising from statistical predictability, stimulus repetition and surprise. Stimuli comprised tone sequences, with frequencies varying in regular or random patterns. In Chapter 3, EEG data demonstrate fast recognition of predictable patterns, shown by an increase in responses to regular relative to random sequences. Behavioural experiments investigate attentional capture by stimulus structure, suggesting that regular sequences are easier to ignore. Responses to repetitive stimulation generally exhibit suppression, thought to form a building block of regularity learning. However, the patterns used in this thesis show the opposite effect, where predictable patterns show a strongly enhanced brain response, compared to frequency-matched random sequences. Chapter 4 presents a study which reconciles auditory sequence predictability and repetition in a single paradigm. Results indicate a system for automatic predictability monitoring which is distinct from, but concurrent with, repetition suppression. The brain’s internal model can be investigated via the response to rule violations. Chapters 5 and 6 present behavioural and EEG experiments where violations are inserted in the sequences. Outlier tones within regular sequences evoked a larger response than matched outliers in random sequences. However, this effect was not present when the violation comprised a silent gap. Chapter 7 concerns the ability of the brain to update an existing model. Regular patterns transitioned to a different rule, keeping the frequency content constant. Responses show a period of adjustment to the rule change, followed by a return to tracking the predictability of the sequence. 
These findings are consistent with the notion that the brain continually maintains a detailed representation of ongoing sensory input and that this representation shapes the processing of incoming information

    Automatic removal of music tracks from TV programmes

    This work belongs to the research area of sound source separation. It deals with the problem of automatically removing musical segments from TV programmes. The dissertation proposes using a pre-existing music recording, easily obtainable from officially published CDs related to the audiovisual piece, as a reference for the undesired signal. The method can automatically detect small segments of the specific music track spread throughout the programme's audio signal, even if they appear with time-variable gain, or after having suffered linear distortions, such as processing by equalization filters, or non-linear distortions, such as dynamic range compression. The project developed a quick-search algorithm using audio fingerprint techniques and hash-token data types to lower the algorithm's complexity. The work also proposes using a Wiener filtering technique to estimate the coefficients of a potential equalization filter, and uses a template matching algorithm to estimate time-variable gains, so that the musical segments can be properly scaled to the amplitude at which they appear in the mixture. The key components of the separation system are presented, and a detailed description of all the algorithms involved is reported. Simulations with artificial and real TV programme soundtracks are analysed, and directions for future work are discussed. Furthermore, given the unique nature of this project, the dissertation is a pioneer on the subject, making it an ideal source of reference for other researchers who want to work in the area
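A hash-token quick search of this kind can be sketched as follows. The token scheme here (the three strongest spectral bins per frame) is a simplified stand-in for a real audio fingerprint; the inverted index and offset voting illustrate the general idea of locating a known reference recording inside a longer programme signal:

```python
import numpy as np

def frame_tokens(mag, n_peaks=3):
    """Hash each spectrogram frame into a compact token by keeping
    only its strongest bins (a toy stand-in for a real audio
    fingerprint such as spectral-peak landmarks)."""
    peaks = np.argsort(mag, axis=1)[:, -n_peaks:]
    return [hash(tuple(sorted(p))) for p in peaks]

def find_occurrence(reference_tokens, programme_tokens):
    """Build an inverted index from token -> positions in the
    reference, then vote over candidate time offsets: many consistent
    votes at one offset indicate the reference music appearing there."""
    index = {}
    for i, t in enumerate(reference_tokens):
        index.setdefault(t, []).append(i)
    votes = {}
    for j, t in enumerate(programme_tokens):
        for i in index.get(t, []):
            off = j - i                        # candidate start frame
            votes[off] = votes.get(off, 0) + 1
    return max(votes, key=votes.get) if votes else None

rng = np.random.default_rng(1)
ref = rng.random((40, 64))                     # reference music spectrogram
prog = np.vstack([rng.random((10, 64)),        # programme: noise, then the
                  ref,                         # reference starting at frame 10,
                  rng.random((10, 64))])       # then more noise
start = find_occurrence(frame_tokens(ref), frame_tokens(prog))
```

Because only the matching offset accumulates one vote per reference frame, while accidental token collisions scatter across offsets, the vote histogram peaks sharply at the true position even with distorted copies, which is what keeps the search fast.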

    Audio source separation for music in low-latency and high-latency scenarios

    This thesis proposes methods to address the limitations of current music source separation techniques in low-latency and high-latency scenarios. First, we focus on methods with low computational cost and low latency. We propose the use of Tikhonov regularization as a method for spectrum decomposition in the low-latency context. We compare it to existing techniques in pitch estimation and tracking tasks, which are crucial steps in many separation methods. We then use the proposed spectrum decomposition method in low-latency separation tasks targeting singing voice, bass and drums. Second, we propose several high-latency methods that improve the separation of singing voice by modeling components that are often not accounted for, such as breathiness and consonants. Finally, we explore the use of temporal correlations and human annotations to enhance the separation of drums and complex polyphonic music signals
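What makes Tikhonov regularization attractive at low latency is that the spectrum decomposition gets a closed-form solution, so each incoming frame can be processed independently without iterative updates. A minimal sketch (the basis matrix, its size, and the lambda value are illustrative assumptions, not the thesis's configuration):

```python
import numpy as np

def tikhonov_decompose(x, B, lam=0.1):
    """Decompose one spectrum frame x (n_freq,) onto a fixed basis of
    source templates B (n_freq x n_components) by ridge regression:

        g* = argmin_g ||x - B g||^2 + lam ||g||^2
           = (B^T B + lam I)^{-1} B^T x

    The closed form costs one small linear solve per frame, which is
    what enables frame-by-frame, low-latency use, unlike iterative
    NMF-style multiplicative updates."""
    n_comp = B.shape[1]
    return np.linalg.solve(B.T @ B + lam * np.eye(n_comp), B.T @ x)

rng = np.random.default_rng(2)
B = rng.random((64, 8))                        # 8 spectral templates, 64 bins
g_true = rng.random(8)                         # true activations
x = B @ g_true                                 # synthetic mixture frame
g = tikhonov_decompose(x, B, lam=1e-8)         # recovered activations
```

With a negligible lambda and a well-conditioned basis, the recovered activations match the true ones; larger lambda values trade fidelity for robustness to noise and ill-conditioning.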

    Hearing VS. Listening: Attention Changes the Neural Representations of Auditory Percepts

    Making sense of acoustic environments is a challenging task. At any moment, the signals from distinct auditory sources arrive in the ear simultaneously, forming an acoustic mixture. The brain must represent distinct auditory objects in this complex scene and prioritize processing of relevant stimuli while maintaining the capability to react quickly to unexpected events. The present studies explore neural representations of temporal modulations and the effects of attention on these representations. Temporal modulation plays a significant role in speech perception and auditory scene analysis. To uncover how temporal modulations are processed and represented is potentially of great importance for our general understanding of the auditory system. Neural representations of compound modulations were investigated by magnetoencephalography (MEG). Interaction components are generated by near rather than distant modulation rhythms, suggesting band-limited modulation filter banks operating in the central stage of the auditory system. Furthermore, the slowest detectable neural oscillation in the auditory cortex corresponds to the perceived oscillation of the auditory percept. Interactions between stimulus-evoked and goal-related neural responses were investigated in simultaneous behavioral-neurophysiological studies, in which we manipulate subjects' attention to different components of an auditory scene. Our experimental results reveal that attention to the target correlates with a sustained increase in the neural target representation, beyond well-known transient effects. The enhancement of power and phase coherence presumably reflects increased local and global synchronizations in the brain. Furthermore, the target's perceptual detectability improves over time (several seconds), correlating strongly with the target representation's neural buildup. The change in cortical representations can be reversed in a short time-scale (several minutes) by various behavioral goals. 
These results demonstrate that the neural representation of the percept is encoded using the feature-driven mechanisms of sensory cortex, but shaped in a sustained manner via attention-driven projections from higher-level areas. These adaptive neural representations arise on multiple time scales (seconds vs. minutes) and multiple spatial scales (local vs. global synchronization). Such multiple resolutions of adaptation may underlie general mechanisms of scene organization in any sensory modality and may contribute to our highly adaptive behaviors