8 research outputs found

    Independent Component Analysis and Time-Frequency Masking for Speech Recognition in Multitalker Conditions

    When a number of speakers are simultaneously active, for example in meetings or noisy public places, the sources of interest need to be separated from interfering speakers and from each other in order to be robustly recognized. Independent component analysis (ICA) has proven a valuable tool for this purpose. However, ICA outputs can still contain strong residual components of the interfering speakers whenever noise or reverberation is high. In such cases, nonlinear postprocessing can be applied to the ICA outputs to reduce the remaining interference. To improve robustness to the artefacts and loss of information caused by this processing, recognition can be greatly enhanced by treating the processed speech feature vector as a random variable with time-varying uncertainty rather than as deterministic. The aim of this paper is to show the potential of nonlinear postprocessing combined with uncertainty-based decoding techniques to improve recognition of multiple overlapping speech signals.
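    As a rough illustration of the two-stage idea in this abstract, the sketch below separates a two-microphone recording with FastICA and then applies a nonlinear time-frequency mask to suppress residual interference. It is not the paper's exact pipeline: the placeholder signals, the 512-sample STFT and the energy-ratio mask are assumptions made only for illustration.

```python
# Minimal sketch (not the paper's exact pipeline): FastICA separation of a
# two-microphone recording followed by a nonlinear time-frequency mask that
# suppresses residual interference in each ICA output.  Signal names, the
# random placeholder mixture and the energy-ratio mask are assumptions.
import numpy as np
from scipy.signal import stft, istft
from sklearn.decomposition import FastICA

fs = 16000
x = np.random.randn(fs * 3, 2)        # placeholder two-microphone mixture

# 1) Linear separation with ICA (instantaneous mixing assumption).
y = FastICA(n_components=2).fit_transform(x)

# 2) Nonlinear postprocessing: a soft time-frequency mask computed from the
#    relative energy of the ICA outputs, applied in the STFT domain.
Y = [stft(y[:, i], fs=fs, nperseg=512)[2] for i in range(2)]   # complex STFTs
P = np.abs(np.stack(Y)) ** 2                                   # power spectra
mask = P / (P.sum(axis=0, keepdims=True) + 1e-12)

enhanced = [istft(mask[i] * Y[i], fs=fs, nperseg=512)[1] for i in range(2)]

# 3) For uncertainty decoding, a per-bin uncertainty (e.g. proportional to
#    1 - mask) would accompany the enhanced features instead of treating
#    them as deterministic.
```

    In the uncertainty-based decoding setting discussed in the paper, the mask values would additionally supply the time-varying uncertainty that is propagated to the recognizer, rather than the enhanced features being treated as exact.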

    Informed algorithms for sound source separation in enclosed reverberant environments

    While humans can separate a sound of interest amidst a cacophony of contending sounds in an echoic environment, machine-based methods lag behind in solving this task. This thesis therefore aims at improving the performance of audio separation algorithms when they are informed, i.e. when they have access to source location information. These locations are assumed to be known a priori in this work, for example from video processing. Initially, a multi-microphone array based method combined with binary time-frequency masking is proposed. A robust least-squares frequency-invariant data-independent beamformer, designed with the location information, is used to estimate the sources. To further enhance the estimated sources, binary time-frequency masking based post-processing is applied, with cepstral-domain smoothing to mitigate musical noise. To tackle the under-determined case and further improve separation performance at higher reverberation times, a two-microphone method inspired by human auditory processing, which generates soft time-frequency masks, is described. In this approach the interaural level difference, interaural phase difference and mixing vectors are probabilistically modeled in the time-frequency domain, and the model parameters are learned through the expectation-maximization (EM) algorithm. A direction vector is estimated for each source from the location information and used as the mean parameter of the mixing-vector model. Soft time-frequency masks are then used to reconstruct the sources. A spatial covariance model is next integrated into the probabilistic framework; it encodes the spatial characteristics of the enclosure and further improves separation performance in challenging scenarios, i.e. when sources are in close proximity and when the level of reverberation is high. Finally, a new dereverberation pre-processing scheme is proposed, a cascade of three dereverberation stages, each of which enhances the two-microphone reverberant mixture. Each stage is based on amplitude spectral subtraction, in which the late reverberation is estimated and suppressed. The combination of this dereverberation pre-processing and soft-mask separation yields the best separation performance. All methods are evaluated with real and synthetic mixtures, formed for example from speech signals from the TIMIT database and measured room impulse responses.
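    A heavily simplified sketch of the location-informed soft masking described above is given below: it models only the interaural phase difference (IPD) of a two-microphone mixture with one Gaussian per source, initialized from known per-source delays, and refines the source priors and variances with a few EM iterations, using the resulting posteriors as soft time-frequency masks. The ILD and mixing-vector terms and the spatial covariance model are omitted, and all concrete values (sampling rate, STFT length, delays, placeholder signals) are assumptions, so this is only a stand-in for the thesis's full model.

```python
# Simplified, IPD-only stand-in for the two-microphone EM soft-masking model.
# All parameter values and signal names are illustrative assumptions.
import numpy as np
from scipy.signal import stft, istft

fs, nperseg = 16000, 512
x1 = np.random.randn(fs * 3)          # placeholder left-microphone signal
x2 = np.random.randn(fs * 3)          # placeholder right-microphone signal
tau = np.array([0.0, 2.5e-4])         # assumed per-source inter-mic delays (s),
                                      # derived from known source locations

f, t, X1 = stft(x1, fs=fs, nperseg=nperseg)
_, _, X2 = stft(x2, fs=fs, nperseg=nperseg)
ipd = np.angle(X1 * np.conj(X2))                          # observed IPD per bin

# Expected IPD for each source, from its known direction / delay.
mu = 2 * np.pi * f[None, :, None] * tau[:, None, None]    # (J, F, 1)
# Wrapped phase residual between observation and each source model.
err = np.angle(np.exp(1j * (ipd[None] - mu)))             # (J, F, T)

J = len(tau)
pi_j = np.full(J, 1.0 / J)            # source priors
var = np.full(J, 1.0)                 # per-source residual variances

for _ in range(10):                   # EM iterations
    # E-step: posterior (soft mask) of each source at every TF bin.
    lik = (pi_j[:, None, None]
           * np.exp(-0.5 * err**2 / var[:, None, None])
           / np.sqrt(2 * np.pi * var[:, None, None]))
    gamma = lik / (lik.sum(axis=0, keepdims=True) + 1e-12)
    # M-step: update priors and variances from the soft assignments,
    # flooring the variance to keep the masks well behaved.
    w = gamma.sum(axis=(1, 2))
    var = np.maximum((gamma * err**2).sum(axis=(1, 2)) / (w + 1e-12), 1e-3)
    pi_j = w / w.sum()

# Reconstruct each source by applying its soft mask to one channel.
sources = [istft(gamma[j] * X1, fs=fs, nperseg=nperseg)[1] for j in range(J)]
```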
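    The dereverberation pre-processing can likewise be sketched as a single amplitude-spectral-subtraction stage: the late-reverberation power is estimated as a delayed, exponentially attenuated copy of the reverberant power spectrum (a Lebart-style statistical model) and subtracted with a gain floor. The assumed T60, the 50 ms late-reverberation boundary and the gain floor are illustrative choices rather than the thesis's settings, and the thesis cascades three such stages instead of the single one shown.

```python
# Minimal sketch of one amplitude-spectral-subtraction dereverberation stage.
# T60, the late-reverberation delay and the gain floor are assumed values.
import numpy as np
from scipy.signal import stft, istft

fs, nperseg, hop = 16000, 512, 256
x = np.random.randn(fs * 3)                 # placeholder reverberant channel

t60 = 0.5                                   # assumed reverberation time (s)
delay_s = 0.05                              # late reverberation starts ~50 ms
delta = 3.0 * np.log(10) / t60              # room decay constant
n_d = int(round(delay_s * fs / hop))        # delay expressed in STFT frames

f, t, X = stft(x, fs=fs, nperseg=nperseg, noverlap=nperseg - hop)
P = np.abs(X) ** 2

# Late-reverberation PSD estimate: a delayed, exponentially attenuated copy
# of the reverberant PSD.
P_late = np.zeros_like(P)
P_late[:, n_d:] = np.exp(-2.0 * delta * delay_s) * P[:, :-n_d]

# Spectral-subtraction gain with a floor to limit musical noise.
gain = np.sqrt(np.maximum(1.0 - P_late / (P + 1e-12), 0.01))
_, x_derev = istft(gain * X, fs=fs, nperseg=nperseg, noverlap=nperseg - hop)
```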