1,178 research outputs found

Glottal Spectral Separation for Speech Synthesis

Author: Cabral João P
Renals Steve
Richmond Korin
Yamagishi Junichi
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/04/2014

Reconstructing intelligible audio speech from visual speech features

Author: Le Cornu Thomas
Milner Ben
Publication date: 01/01/2015

This work describes an investigation into the feasibility of producing intelligible audio speech from only visual speech features. The proposed method aims to estimate a spectral envelope from visual features, which is then combined with an artificial excitation signal and used within a model of speech production to reconstruct an audio signal. Different combinations of audio and visual features are considered, along with both a statistical method of estimation and a deep neural network. The intelligibility of the reconstructed audio speech is measured by human listeners, and then compared with the intelligibility of the video signal alone and of the video signal combined with the reconstructed audio.
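The reconstruction step described in this abstract (a spectral envelope combined with an artificial excitation inside a speech production model) can be illustrated with a generic source-filter sketch. This is not the authors' implementation: the abstract does not specify the production model, so the all-pole filter, the single two-pole resonance standing in for the envelope, the impulse-train excitation, and every name and parameter value below are assumptions made for illustration.

```python
import math

def impulse_train(n_samples, period):
    """Artificial voiced excitation: a unit impulse every `period` samples."""
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]

def resonator_coeffs(freq_hz, bandwidth_hz, fs):
    """Denominator coefficients [a1, a2] of a two-pole resonance (assumed envelope)."""
    r = math.exp(-math.pi * bandwidth_hz / fs)   # pole radius, < 1 so the filter is stable
    theta = 2.0 * math.pi * freq_hz / fs         # pole angle
    return [-2.0 * r * math.cos(theta), r * r]

def all_pole_filter(excitation, a):
    """Filter the excitation with 1 / (1 + a1 z^-1 + a2 z^-2 + ...)."""
    out = []
    for n, x in enumerate(excitation):
        y = x
        for k, coeff in enumerate(a, start=1):
            if n - k >= 0:
                y -= coeff * out[n - k]
        out.append(y)
    return out

# Hypothetical values: 8 kHz sampling, 100 Hz pitch, one resonance at 700 Hz.
fs = 8000
excitation = impulse_train(400, period=80)
envelope = resonator_coeffs(700.0, 100.0, fs)
audio = all_pole_filter(excitation, envelope)
```

In a system of the kind the abstract describes, the envelope would be estimated from visual features for each frame rather than fixed by hand as it is here.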

University of East Anglia digital repository

Estimating acoustic speech features in low signal-to-noise ratios using a statistical framework

Author: Ben Milner
Philip Harding
Publication venue: Elsevier BV
Publication date: 17/08/2016

Accurate estimation of acoustic speech features from noisy speech and from different speakers is an ongoing problem in speech processing. Many methods have been proposed to estimate acoustic features, but errors increase as signal-to-noise ratios fall. This work proposes a robust statistical framework to estimate an acoustic speech vector (comprising voicing, fundamental frequency and spectral envelope) from an intermediate feature that is extracted from a noisy time-domain speech signal. The initial approach is accurate in clean conditions but deteriorates in noise and when the speaker changes. Adaptation methods are then developed to adjust the acoustic models to the noise conditions and speaker. Evaluations are carried out in stationary and nonstationary noises and at SNRs from -5 dB to clean conditions. Comparison with conventional methods of estimating fundamental frequency, voicing and spectral envelope shows that the proposed framework has the lowest errors in all conditions tested.

Crossref

University of East Anglia digital repository

Non-Gaussian, Non-stationary and Nonlinear Signal Processing Methods - with Applications to Speech Processing and Channel Estimation

Author: Li Chunjian
Publication venue: Institut for Elektroniske Systemer, Aalborg Universitet
Publication date: 01/01/2007

VBN

Model-Based Speech Enhancement

Author: Harding Philip
Publication date: 01/07/2013

A method of speech enhancement is developed that reconstructs clean speech from a set of acoustic features using a harmonic plus noise model of speech. This is a significant departure from traditional filtering-based methods of speech enhancement. A major challenge with this approach is to estimate accurately the acoustic features (voicing, fundamental frequency, spectral envelope and phase) from noisy speech. This is achieved using maximum a posteriori (MAP) estimation methods that operate on the noisy speech. In each case a prior model of the relationship between the noisy speech features and the estimated acoustic feature is required. These models are approximated using speaker-independent GMMs of the clean speech features that are adapted to speaker-dependent models using MAP adaptation and for noise using the Unscented Transform. Objective results are presented to optimise the proposed system, and a set of subjective tests compares the approach with traditional enhancement methods. Three-way listening tests examining signal quality, background noise intrusiveness and overall quality show the proposed system to be highly robust to noise, performing significantly better than conventional methods of enhancement in terms of background noise intrusiveness. However, the proposed method is shown to reduce signal quality, with overall quality measured to be roughly equivalent to that of the Wiener filter.
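The harmonic plus noise model named in this abstract can be sketched for a single frame as a sum of sinusoids at multiples of the fundamental frequency plus a noise term. This is a generic textbook-style illustration, not the thesis's system: the function name `hnm_frame`, its parameters, the use of unshaped white Gaussian noise, and the example values are all assumptions.

```python
import math
import random

def hnm_frame(f0_hz, amplitudes, phases, noise_gain, n_samples, fs, seed=0):
    """Synthesise one frame: harmonics of f0_hz plus white Gaussian noise.

    amplitudes[k] and phases[k] describe harmonic k+1. An empty list gives
    a noise-only (unvoiced) frame. The noise is unshaped here (assumption).
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    frame = []
    for n in range(n_samples):
        t = n / fs
        harmonic = sum(
            amp * math.cos(2.0 * math.pi * (k + 1) * f0_hz * t + phase)
            for k, (amp, phase) in enumerate(zip(amplitudes, phases))
        )
        noise = noise_gain * rng.gauss(0.0, 1.0)
        frame.append(harmonic + noise)
    return frame

# Hypothetical voiced frame: 120 Hz fundamental, three harmonics, no noise.
voiced = hnm_frame(120.0, [1.0, 0.5, 0.25], [0.0, 0.0, 0.0], 0.0, 160, 8000)
# Hypothetical unvoiced frame: no harmonics, noise only.
unvoiced = hnm_frame(120.0, [], [], 0.1, 160, 8000)
```

In an enhancement setting such as the one described, the voicing decision, fundamental frequency, harmonic amplitudes (from the spectral envelope) and phases would be the quantities estimated from the noisy speech.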

University of East Anglia digital repository