2 research outputs found
Fundamental Frequency and Voicing Prediction from MFCCs for Speech Reconstruction from Unconstrained Speech
This work proposes a method to predict the fundamental frequency and voicing of a frame of speech from its MFCC representation. This has particular use in distributed speech recognition systems where the ability to predict fundamental frequency and voicing allows a time-domain speech signal to be reconstructed solely from the MFCC vectors. Prediction is achieved by modeling the joint density of MFCCs and fundamental frequency with a combined hidden Markov model-Gaussian mixture model (HMM-GMM) framework. Prediction results are presented on unconstrained speech using both a speaker-dependent database and a speaker-independent database. Spectrogram comparisons of the reconstructed and original speech are also made. The results show for the speaker-dependent task a percentage fundamental frequency prediction error of 3.1% is made while for the speaker-independent task this rises to 8.3%
Model-Based Speech Enhancement
Abstract
A method of speech enhancement is developed that reconstructs clean speech from
a set of acoustic features using a harmonic plus noise model of speech. This is a significant
departure from traditional filtering-based methods of speech enhancement.
A major challenge with this approach is to estimate accurately the acoustic features
(voicing, fundamental frequency, spectral envelope and phase) from noisy speech.
This is achieved using maximum a-posteriori (MAP) estimation methods that operate
on the noisy speech. In each case a prior model of the relationship between the
noisy speech features and the estimated acoustic feature is required. These models
are approximated using speaker-independent GMMs of the clean speech features
that are adapted to speaker-dependent models using MAP adaptation and for noise
using the Unscented Transform.
Objective results are presented to optimise the proposed system and a set of subjective
tests compare the approach with traditional enhancement methods. Threeway
listening tests examining signal quality, background noise intrusiveness and
overall quality show the proposed system to be highly robust to noise, performing
significantly better than conventional methods of enhancement in terms of background
noise intrusiveness. However, the proposed method is shown to reduce signal
quality, with overall quality measured to be roughly equivalent to that of the Wiener
filter