10 research outputs found
Blind suppression of nonstationary diffuse noise based on spatial covariance matrix decomposition
We propose methods for blind suppression of nonstationary diffuse noise based on decomposition of the observed spatial covariance matrix into signal and noise parts. In modeling noise to regularize the ill-posed decomposition problem, we exploit spatial invariance (isotropy) instead of temporal invariance (stationarity). The isotropy assumption is that the spatial cross-spectrum of noise depends on the distance between microphones and is independent of the direction between them. We propose methods for spatial covariance matrix decomposition based on least squares and maximum likelihood estimation. The methods are validated on real-world recordings.
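The isotropy assumption and the least-squares split can be sketched numerically. The following is an illustrative toy, not the paper's exact estimator: the diffuse coherence uses the standard spherically isotropic sinc model, and the residual signal part is left rank-unconstrained (function names are our own).

```python
import numpy as np

def diffuse_coherence(mic_pos, freq, c=343.0):
    """Spatial coherence of a spherically isotropic (diffuse) noise field:
    the cross-spectrum depends only on the inter-microphone distance d,
    via sin(2*pi*f*d/c) / (2*pi*f*d/c), independent of direction."""
    d = np.linalg.norm(mic_pos[:, None, :] - mic_pos[None, :, :], axis=-1)
    return np.sinc(2.0 * freq * d / c)  # np.sinc(x) = sin(pi*x)/(pi*x)

def ls_decompose(R_obs, gamma):
    """Least-squares split of an observed spatial covariance matrix into an
    isotropic noise part sigma2 * gamma plus a residual signal part.
    sigma2 minimizes ||R_obs - sigma2 * gamma||_F over the noise scale."""
    sigma2 = max(np.real(np.vdot(gamma, R_obs)) / np.real(np.vdot(gamma, gamma)), 0.0)
    R_sig = R_obs - sigma2 * gamma
    return sigma2, R_sig
```

With a purely diffuse observation the noise scale is recovered exactly; with a signal present, the residual carries the (ideally low-rank) signal covariance.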
Blind dereverberation of speech from moving and stationary speakers using sequential Monte Carlo methods
Speech signals radiated in confined spaces are subject to reverberation due to reflections
off surrounding walls and obstacles. Reverberation leads to severe degradation
of speech intelligibility and can be prohibitive for applications where speech is digitally
recorded, such as audio conferencing or hearing aids. Dereverberation of speech
is therefore an important field in speech enhancement.
Driven by consumer demand, blind speech dereverberation has become a popular
field in the research community and has led to many interesting approaches in the literature.
However, most existing methods are dictated by their underlying models and
hence suffer from assumptions that constrain the approaches to specific subproblems
of blind speech dereverberation. For example, many approaches limit the dereverberation
to voiced speech sounds, leading to poor results for unvoiced speech. Few
approaches tackle single-sensor blind speech dereverberation, and only a very limited
subset allows for dereverberation of speech from moving speakers.
Therefore, the aim of this dissertation is the development of a flexible and extendible
framework for blind speech dereverberation accommodating different speech
sound types, single or multiple sensors, as well as stationary and moving speakers.
Bayesian methods benefit from – rather than being dictated by – appropriate model
choices. Therefore, the problem of blind speech dereverberation is considered from
a Bayesian perspective in this thesis. A generic sequential Monte Carlo approach
accommodating a multitude of models for the speech production mechanism and
room transfer function is consequently derived. In this approach both the anechoic
source signal and reverberant channel are estimated using their optimal estimators by
means of Rao-Blackwellisation of the state-space of unknown variables. The remaining
model parameters are estimated using sequential importance resampling.
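A bootstrap sequential importance resampling filter, the sampling backbone of the approach described above, can be sketched on a toy scalar model. This is a generic illustration (model, parameters, and function name are ours, not the thesis's); in the Rao-Blackwellised scheme each particle would additionally run an exact Kalman filter over the conditionally linear-Gaussian states (source and channel), with only the remaining parameters sampled.

```python
import numpy as np

rng = np.random.default_rng(0)

def sir_particle_filter(y, n_particles=500, q=0.1, r=0.5):
    """Sequential importance resampling (bootstrap) particle filter for a toy
    random-walk model: x_t = x_{t-1} + q*w_t, observed as y_t = x_t + r*v_t.
    Returns the posterior-mean state estimate at each time step."""
    x = rng.normal(0.0, 1.0, n_particles)                 # initial particle cloud
    est = np.empty(len(y))
    for t, yt in enumerate(y):
        x = x + q * rng.standard_normal(n_particles)      # propagate (prior proposal)
        logw = -0.5 * ((yt - x) / r) ** 2                 # Gaussian likelihood weights
        w = np.exp(logw - logw.max())
        w /= w.sum()
        est[t] = w @ x                                    # weighted posterior mean
        idx = rng.choice(n_particles, n_particles, p=w)   # multinomial resampling
        x = x[idx]
    return est
```

Resampling at every step keeps the particle cloud concentrated where the likelihood is high, at the cost of some sample diversity; adaptive resampling is a common refinement.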
The proposed approach is implemented for two different speech production models
for stationary speakers, demonstrating substantial reduction in reverberation for
both unvoiced and voiced speech sounds. Furthermore, the channel model is extended
to facilitate blind dereverberation of speech from moving speakers. Due to the
structure of the measurement model, both single- and multi-microphone processing is supported,
accommodating physically constrained scenarios where only a single sensor
can be used as well as allowing for the exploitation of spatial diversity in scenarios
where the physical size of microphone arrays is of no concern.
This dissertation is concluded with a survey of possible directions for future research,
including the use of switching Markov source models, joint target tracking
and enhancement, as well as an extension to subband processing for improved computational
efficiency.
Signal-Adaptive and Perceptually Optimized Sound Zones with Variable Span Trade-Off Filters
Creating sound zones has been an active research field since the idea was
first proposed. So far, most sound zone control methods rely on either an
optimization of physical metrics such as acoustic contrast and signal
distortion or a mode decomposition of the desired sound field. By using these
types of methods, approximately 15 dB of acoustic contrast between the
reproduced sound field in the target zone and its leakage to other zone(s) has
been reported in practical set-ups, but this is typically not high enough to
satisfy the people inside the zones. In this paper, we propose a sound zone
control method shaping the leakage errors so that they are as inaudible as
possible for a given acoustic contrast. The shaping of the leakage errors is
performed by taking the time-varying input signal characteristics and the human
auditory system into account when the loudspeaker control filters are
calculated. We show how this shaping can be performed using variable span
trade-off filters, and we show theoretically how these filters can be used for
trading signal distortion in the target zone for acoustic contrast. The
proposed method is evaluated based on physical metrics such as acoustic
contrast and perceptual metrics such as STOI. The computational complexity and
processing time of the proposed method for different system set-ups are also
investigated. Lastly, the results of a MUSHRA listening test are reported. The
test results show that the proposed method provides more than 20% perceptual
improvement compared to existing sound zone control methods.
Comment: Accepted for publication in IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING.
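The variable span trade-off idea can be sketched numerically: jointly diagonalize the bright- and dark-zone loudspeaker correlation matrices, then retain a variable number of eigen-directions (the "span"). This is an illustrative sketch of the general VAST structure under our own naming and synthetic matrices, not the paper's signal-adaptive, perceptually weighted design.

```python
import numpy as np
from scipy.linalg import eigh

def vast_filter(R_b, R_d, r_b, rank, mu=1.0):
    """Variable span trade-off (VAST)-style control filter sketch.
    R_b: loudspeaker correlation matrix for the bright (target) zone,
    R_d: same for the dark zone, r_b: correlation with the desired signal.
    A small `rank` favors acoustic contrast (rank 1 maximizes it); full rank
    recovers a regularized pressure-matching solution; mu weights leakage."""
    lam, U = eigh(R_b, R_d)            # U.T @ R_b @ U = diag(lam), U.T @ R_d @ U = I
    order = np.argsort(lam)[::-1]      # strongest contrast directions first
    lam, U = lam[order], U[:, order]
    Ur, lr = U[:, :rank], lam[:rank]
    return Ur @ ((Ur.T @ r_b) / (lr + mu))
```

At full span the filter satisfies (R_b + mu * R_d) w = r_b, i.e. the usual regularized least-squares trade-off; shrinking the span moves it toward the pure contrast-maximizing solution.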
Distant Speech Recognition of Natural Spontaneous Multi-party Conversations
Distant speech recognition (DSR) has gained wide interest recently. While deep networks keep improving ASR overall, a performance gap remains between close-talking and distant recordings. Therefore, the work in this thesis aims at providing insights for further improvement of DSR performance.
The investigation starts with collecting the first multi-microphone and multi-media corpus of natural spontaneous multi-party conversations in native English with speaker locations tracked, i.e. the Sheffield Wargame Corpus (SWC). State-of-the-art recognition systems, with acoustic models both trained standalone and adapted, show word error rates (WERs) above 40% on headset recordings and above 70% on distant recordings. A comparison between SWC and the AMI corpus suggests a few properties unique to real natural spontaneous conversations, e.g. very short utterances and emotional speech. Further experimental analysis based on simulated and real data quantifies the impact of such influence factors on DSR performance, and illustrates the complex interaction among multiple factors, which makes the treatment of each individual factor much more difficult.
The reverberation factor is studied further. It is shown that the reverberation effect on speech features can be accurately modelled as a temporal convolution in the complex spectrogram domain. Based on that, a polynomial reverberation score is proposed to measure the distortion level of short utterances. Compared to existing reverberation metrics such as C50, it avoids a rigid early/late reverberation partition without compromising performance in ranking the reverberation level of recording environments and channels. Furthermore, existing reverberation measurements are signal-independent and thus unable to accurately estimate the reverberation distortion level in short recordings. Inspired by a phonetic analysis of reverberation distortion via self-masking and overlap-masking, a novel partition of the reverberation distortion into intra-phone smearing and inter-phone smearing is proposed, so that the distortion level is first estimated on each part and then combined.
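The temporal-convolution model of reverberation in the complex spectrogram domain can be sketched as a per-frequency convolution of clean STFT frames with a short filter, often called the convolutive transfer function (CTF) approximation. This is a minimal sketch under our own naming; the thesis's feature-domain details may differ.

```python
import numpy as np

def ctf_reverb(S, H):
    """Convolutive transfer function (CTF) sketch: reverberant STFT frames
    are modelled, per frequency bin, as a temporal convolution of the clean
    STFT frames S (shape: freq x time) with a short complex filter H
    (shape: freq x taps) approximating the room impulse response."""
    F, T = S.shape
    _, L = H.shape
    X = np.zeros((F, T), dtype=complex)
    for tau in range(L):                       # accumulate each filter tap,
        X[:, tau:] += H[:, tau, None] * S[:, : T - tau]  # delayed by tau frames
    return X
```

With a single unit tap at delay zero the model is transparent; later taps smear each frame's energy forward in time, which is exactly the inter-frame smearing the proposed score tries to quantify.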