57 research outputs found

Glottal-synchronous speech processing

Author: Thomas Mark R P
Publication venue: Electrical and Electronic Engineering, Imperial College London
Publication date: 01/01/2010

Glottal-synchronous speech processing is a field of speech science where the pseudoperiodicity of voiced speech is exploited. Traditionally, speech processing involves segmenting and processing short speech frames of predefined length; this may fail to exploit the inherent periodic structure of voiced speech which glottal-synchronous speech frames have the potential to harness. Glottal-synchronous frames are often derived from the glottal closure instants (GCIs) and glottal opening instants (GOIs). The SIGMA algorithm was developed for the detection of GCIs and GOIs from the Electroglottograph signal with a measured accuracy of up to 99.59%. For GCI and GOI detection from speech signals, the YAGA algorithm provides a measured accuracy of up to 99.84%. Multichannel speech-based approaches are shown to be more robust to reverberation than single-channel algorithms. The GCIs are applied to real-world applications including speech dereverberation, where SNR is improved by up to 5 dB, and to prosodic manipulation where the importance of voicing detection in glottal-synchronous algorithms is demonstrated by subjective testing. The GCIs are further exploited in a new area of data-driven speech modelling, providing new insights into speech production and a set of tools to aid deployment into real-world applications. The technique is shown to be applicable in areas of speech coding, identification and artificial bandwidth extension of telephone speech.

Spiral - Imperial College Digital Repository

OpenGrey Repository

Single-Channel Online Enhancement of Speech Corrupted by Reverberation and Noise

Author: Betts Dave
Brookes Mike
Dmour Mohammad A.
Doire Clement Samuel Joseph
Hicks Christopher M.
Jensen Soren Holdt
Naylor Patrick A.
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/03/2017

Crossref

VBN

Convolutive Blind Source Separation Methods

Author: Kjems Ulrik
Larsen Jan
Parra Lucas C.
Pedersen Michael Syskind
Publication venue: Springer Fachmedien Wiesbaden GmbH
Publication date: 01/01/2008

In this chapter, we provide an overview of existing algorithms for blind source separation of convolutive audio mixtures. We provide a taxonomy in which many of the existing algorithms can be organized, and we present published results from those algorithms that have been applied to real-world audio separation tasks.

CiteSeerX

Online Research Database In Technology

Nonstationary Signal Processing with Application to Reverberation Cancellation in Acoustic Environments

Author: Hopgood James
Publication date: 01/04/2001

Edinburgh Research Explorer

Model-based speech enhancement for hearing aids

Author: Kavalekalam Mathew Shaji
Publication venue: Aalborg Universitetsforlag
Publication date: 01/01/2018

VBN

Reverberation: models, estimation and application

Author: Wen Jimi
Publication venue: Department of Electrical and Electronic Engineering, Imperial College London
Publication date: 01/08/2009

Reverberation models are required in many applications such as acoustic measurements, speech dereverberation and robust automatic speech recognition. The aim of this thesis is to investigate different models and propose a perceptually-relevant reverberation model with suitable parameter estimation techniques for different applications. Reverberation can be modelled in both the time and frequency domain. The model parameters give direct information about both physical and perceptual characteristics. These characteristics create a multidimensional parameter space of reverberation, which can be captured to a large extent by a time-frequency domain model. In this thesis, the relationship between physical and perceptual model parameters will be discussed. In the first application, an intrusive technique is proposed to measure the reverberance (the perception of reverberation) and the colouration. The room decay rate parameter is of particular interest. In practical applications, a blind estimate of the decay rate of acoustic energy in a room is required. A statistical model for the distribution of the decay rate of the reverberant signal, named the eagleMax distribution, is proposed. The eagleMax distribution describes the reverberant speech decay rates as a random variable that is the maximum of the room decay rates and anechoic speech decay rates. Three methods were developed to estimate the mean room decay rate from the eagleMax distributions alone. The estimated room decay rates form a reverberation model that will be discussed in the context of room acoustic measurements, speech dereverberation and robust automatic speech recognition individually.
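
The "maximum of two decay rates" idea in the abstract above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the thesis's method: it assumes both the room and the anechoic speech decay rates are Gaussian, treats decay rates as negative log-energy slopes (so the maximum is the slower decay), and uses made-up means and standard deviations. The function name `simulate_reverberant_decay_rates` is hypothetical.

```python
import random
import statistics


def simulate_reverberant_decay_rates(room_mean, room_std,
                                     speech_mean, speech_std,
                                     n, seed=0):
    """Draw n reverberant decay rates as max(room rate, speech rate).

    Assumption (not from the source): both rates are Gaussian and
    independent. Decay rates are negative slopes, so the maximum is
    the slower of the two decays.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        room = rng.gauss(room_mean, room_std)
        speech = rng.gauss(speech_mean, speech_std)
        samples.append(max(room, speech))
    return samples


# Hypothetical parameters: a tightly clustered room decay rate and a
# widely spread, typically faster, anechoic speech decay rate.
samples = simulate_reverberant_decay_rates(-8.0, 1.0, -30.0, 12.0, 10000)
print("mean reverberant decay rate:", statistics.mean(samples))
```

Under these assumptions the observed decay rates cluster around the room decay rate and are never faster than it in any frame, which is the property a blind estimator of the mean room decay rate would rely on.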

Spiral - Imperial College Digital Repository

Speech dereverberation for enhancement and recognition using dynamic features constrained deep neural networks and feature adaptation

Publication venue: Springer
Publication date: 13/01/2016

Springer - Publisher Connector

Parametric modelling for single-channel blind dereverberation of speech from a moving speaker

Author: Evers C.
Hopgood James
Publication venue: Institution of Engineering and Technology (IET)
Publication date: 01/06/2008

Single-channel blind dereverberation for the enhancement of speech acquired in acoustic environments is essential in applications where microphone arrays prove impractical. In many scenarios, the source-sensor geometry is not varying rapidly, but in most applications the geometry is subject to change, for example when a user wishes to move around a room. A previous model-based approach to blind dereverberation, which represents the channel as a linear time-varying all-pole filter, is extended: the parameters of the filter are modelled as a linear combination of known basis functions with unknown weightings. Moreover, an improved block-based time-varying autoregressive model is proposed for the speech signal, which aims to reflect the underlying signal statistics more accurately on both a local and global level. Given these parametric models, their coefficients are estimated using Bayesian inference, so that the channel estimate can then be used for dereverberation. An in-depth discussion is also presented about the applicability of these models to real speech and a real acoustic environment. Results are presented to demonstrate the performance of the Bayesian inference algorithms.
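
A time-varying all-pole channel whose coefficients are a weighted sum of known basis functions, as described in the abstract above, can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: the weights are taken as known (the paper estimates them by Bayesian inference), the sign convention y(n) = x(n) - sum_k a_k(n) y(n-k) is assumed, and the basis functions, weights, and function names are hypothetical.

```python
def tv_coefficients(weights, basis, n):
    """Coefficients a_k(n) = sum_m weights[k][m] * basis[m](n)."""
    return [sum(w * f(n) for w, f in zip(w_k, basis)) for w_k in weights]


def tv_all_pole_filter(x, weights, basis):
    """Channel model: y(n) = x(n) - sum_k a_k(n) * y(n - k)."""
    y = []
    for n, x_n in enumerate(x):
        acc = x_n
        for k, a_k in enumerate(tv_coefficients(weights, basis, n), start=1):
            if n - k >= 0:
                acc -= a_k * y[n - k]
        y.append(acc)
    return y


def tv_inverse_filter(y, weights, basis):
    """Dereverberation given the channel: x(n) = y(n) + sum_k a_k(n) * y(n - k)."""
    x = []
    for n, y_n in enumerate(y):
        acc = y_n
        for k, a_k in enumerate(tv_coefficients(weights, basis, n), start=1):
            if n - k >= 0:
                acc += a_k * y[n - k]
        x.append(acc)
    return x


# Hypothetical setup: a constant and a linear-ramp basis function over
# 100 samples, with weights chosen so the filter stays stable.
N = 100
basis = [lambda n: 1.0, lambda n: n / N]
weights = [[-0.5, 0.2], [0.1, 0.0]]  # one row of basis weights per filter tap

source = [1.0 if n % 10 == 0 else 0.0 for n in range(N)]
observed = tv_all_pole_filter(source, weights, basis)
recovered = tv_inverse_filter(observed, weights, basis)
print("max reconstruction error:",
      max(abs(a - b) for a, b in zip(source, recovered)))
```

Expressing each coefficient through a few basis weights means the number of unknowns does not grow with the signal length, which is what makes estimating a time-varying channel tractable.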

Southampton (e-Prints Soton)

Edinburgh Research Explorer

Source Separation for Hearing Aid Applications

Author: Pedersen Michael Syskind
Publication venue: Technical University of Denmark
Publication date: 01/11/2006

Online Research Database In Technology

Speech dereverberation for enhancement and recognition using dynamic features constrained deep neural networks and feature adaptation

Author: A Rix
AL Maas
B Li
BDV Veen
C-P Chen
CH Knapp
E Habets
F Weninger
GE Hinton
H Hermansky
H Kuttruff
J Allen
J Li
JL Gauvain
K Lebart
M Delcroix
MJF Gales
O Cappe
OLF III
R Chen
S Fischer
S Furui
S Gannot
S Subramaniam
T Toda
T Yoshioka
TH Falk
TH Li
X Xiao
Y Hu
Y Xu
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2016

This paper investigates deep neural network (DNN) based nonlinear feature mapping and statistical linear feature adaptation approaches for reducing reverberation in speech signals. In the nonlinear feature mapping approach, a DNN is trained on a parallel clean/distorted speech corpus to map reverberant and noisy speech coefficients (such as the log magnitude spectrum) to the underlying clean speech coefficients. The constraint imposed by dynamic features (i.e., the time derivatives of the speech coefficients) is used to enhance the smoothness of predicted coefficient trajectories in two ways. One is to obtain the enhanced speech coefficients with a least squares estimation from the coefficients and dynamic features predicted by the DNN. The other is to incorporate the constraint of dynamic features directly into the DNN training process using a sequential cost function. In the linear feature adaptation approach, a sparse linear transform, called the cross transform, is used to transform multiple frames of speech coefficients to a new feature space. The transform is estimated to maximize the likelihood of the transformed coefficients given a model of clean speech coefficients. Unlike the DNN approach, no parallel corpus is used and no assumption on distortion types is made. The two approaches are evaluated on the REVERB Challenge 2014 tasks. Both speech enhancement and automatic speech recognition (ASR) results show that the DNN-based mappings significantly reduce the reverberation in speech and improve both speech quality and ASR performance. For the speech enhancement task, the proposed dynamic feature constraint helps to improve the cepstral distance, frequency-weighted segmental signal-to-noise ratio (SNR), and log likelihood ratio metrics while moderately degrading the speech-to-reverberation modulation energy ratio. In addition, the cross transform feature adaptation improves the ASR performance significantly for clean-condition trained acoustic models.

Crossref

Springer - Publisher Connector

DR-NTU (Digital Repository of NTU)

ScholarBank@NUS