Subspace-based Fundamental Frequency Estimation
Publication in the conference proceedings of EUSIPCO, Vienna, Austria, 200
Glottal-synchronous speech processing
Glottal-synchronous speech processing is a field of speech science where the pseudoperiodicity
of voiced speech is exploited. Traditionally, speech processing involves segmenting
and processing short speech frames of predefined length; this may fail to exploit the inherent
periodic structure of voiced speech which glottal-synchronous speech frames have
the potential to harness. Glottal-synchronous frames are often derived from the glottal
closure instants (GCIs) and glottal opening instants (GOIs).
The SIGMA algorithm was developed for the detection of GCIs and GOIs from
the Electroglottograph signal with a measured accuracy of up to 99.59%. For GCI and
GOI detection from speech signals, the YAGA algorithm provides a measured accuracy
of up to 99.84%. Multichannel speech-based approaches are shown to be more robust to
reverberation than single-channel algorithms.
The GCIs are applied to real-world applications including speech dereverberation,
where SNR is improved by up to 5 dB, and to prosodic manipulation where the importance
of voicing detection in glottal-synchronous algorithms is demonstrated by subjective
testing. The GCIs are further exploited in a new area of data-driven speech modelling,
providing new insights into speech production and a set of tools to aid deployment into
real-world applications. The technique is shown to be applicable in areas of speech coding,
identification and artificial bandwidth extension of telephone speech.
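The framing idea described in the abstract above can be sketched as follows. This is only an illustration, not the SIGMA or YAGA algorithm: it assumes the GCIs have already been detected and are supplied as sample indices, and the function name `glottal_synchronous_frames` and the parameter `periods_per_frame` are hypothetical.

```python
import numpy as np

def glottal_synchronous_frames(signal, gci_samples, periods_per_frame=2):
    """Cut `signal` into frames whose boundaries are glottal closure instants.

    gci_samples: increasing sample indices of GCIs (assumed already detected).
    Each frame spans `periods_per_frame` consecutive glottal cycles, so the
    frame length follows the local pitch period instead of being fixed.
    """
    gci = np.asarray(gci_samples, dtype=int)
    frames = []
    for i in range(len(gci) - periods_per_frame):
        start, stop = gci[i], gci[i + periods_per_frame]
        frames.append(signal[start:stop])
    return frames

# Synthetic stand-in for voiced speech: a 100 Hz tone sampled at 8 kHz.
fs = 8000
t = np.arange(800) / fs
x = np.sin(2 * np.pi * 100 * t)

# Hypothetical GCI positions with slightly irregular spacing (pseudoperiodic).
gcis = [0, 78, 160, 239, 322, 400]
frames = glottal_synchronous_frames(x, gcis)
frame_lengths = [len(f) for f in frames]
```

Unlike fixed-length framing, the frame lengths here vary with the spacing of the GCIs, which is the property the abstract describes as exploiting the periodic structure of voiced speech.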
Model-Based Speech Enhancement
A method of speech enhancement is developed that reconstructs clean speech from
a set of acoustic features using a harmonic plus noise model of speech. This is a significant
departure from traditional filtering-based methods of speech enhancement.
A major challenge with this approach is to estimate accurately the acoustic features
(voicing, fundamental frequency, spectral envelope and phase) from noisy speech.
This is achieved using maximum a-posteriori (MAP) estimation methods that operate
on the noisy speech. In each case a prior model of the relationship between the
noisy speech features and the estimated acoustic feature is required. These models
are approximated using speaker-independent GMMs of the clean speech features
that are adapted to speaker-dependent models using MAP adaptation and for noise
using the Unscented Transform.
Objective results are presented to optimise the proposed system and a set of subjective
tests compare the approach with traditional enhancement methods. Three-way
listening tests examining signal quality, background noise intrusiveness and
overall quality show the proposed system to be highly robust to noise, performing
significantly better than conventional methods of enhancement in terms of background
noise intrusiveness. However, the proposed method is shown to reduce signal
quality, with overall quality measured to be roughly equivalent to that of the Wiener
filter.
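A harmonic plus noise reconstruction of a single frame, as named in the abstract above, might look like the sketch below. It is a simplified illustration under stated assumptions: the acoustic features (fundamental frequency, harmonic amplitudes and phases) are taken as given rather than estimated by MAP, the noise part is plain white noise rather than noise shaped by a spectral envelope, and the function name `synthesise_hnm_frame` and all parameter names are hypothetical.

```python
import numpy as np

def synthesise_hnm_frame(f0, harmonic_amps, harmonic_phases,
                         noise_std, n_samples, fs, rng=None):
    """Synthesise one frame as a sum of harmonics of f0 plus white noise."""
    rng = np.random.default_rng(0) if rng is None else rng
    amps = np.asarray(harmonic_amps, dtype=float)
    phases = np.asarray(harmonic_phases, dtype=float)
    t = np.arange(n_samples) / fs
    k = np.arange(1, len(amps) + 1)  # harmonic numbers 1..K
    # Harmonic part: sum_k a_k * cos(2*pi*k*f0*t + phi_k)
    harmonic = (amps[:, None]
                * np.cos(2 * np.pi * f0 * k[:, None] * t[None, :]
                         + phases[:, None])).sum(axis=0)
    # Noise part: white Gaussian noise (a simplification, see lead-in).
    noise = noise_std * rng.standard_normal(n_samples)
    return harmonic + noise

fs = 8000
frame = synthesise_hnm_frame(f0=100.0,
                             harmonic_amps=[1.0, 0.5, 0.25],
                             harmonic_phases=[0.0, 0.0, 0.0],
                             noise_std=0.05, n_samples=160, fs=fs)
```

The point of the sketch is that the output is generated entirely from features, not by filtering the noisy input, which is the departure from filtering-based enhancement that the abstract highlights.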
Cognitive Information Processing
Contains reports on six research projects. National Institutes of Health (Grant 5 PO1 GM14940-04); National Institutes of Health (Grant 5 PO1 GM15006-03); Joint Services Electronics Programs (U.S. Army, U.S. Navy, and U.S. Air Force) under Contract DA 28-043-AMC-02536(E
An investigation into glottal waveform based speech coding
Coding of voiced speech by extraction of the glottal waveform has shown promise in improving the efficiency of speech coding systems. This thesis describes an investigation into the performance of such a system.
The effect of reverberation on the radiation impedance at the lips is shown to be negligible under normal conditions. Also, the accuracy of the Image Method for adding artificial reverberation to anechoic speech recordings is established.
A new algorithm, Pre-emphasised Maximum Likelihood Epoch Detection (PMLED), for Glottal Closure Instant detection is proposed. The algorithm is tested on natural speech and is shown to be both accurate and robust.
Two techniques for glottal waveform estimation, Closed Phase Inverse Filtering (CPIF) and Iterative Adaptive Inverse Filtering (IAIF), are compared. In tandem with an LF model fitting procedure, both techniques display a high degree of accuracy. However, IAIF is found to be slightly more robust.
Based on these results, a Glottal Excited Linear Predictive (GELP) coding system for voiced speech is proposed and tested. Using a differential LF parameter quantisation scheme, the system achieves speech quality similar to that of U.S. Federal Standard 1016 CELP at a lower mean bit rate while incurring no extra delay.
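The inverse filtering step that underlies both techniques named above can be illustrated with ordinary linear prediction. The sketch below is neither CPIF (which restricts the analysis to the closed phase) nor IAIF (which iterates the estimate); it is plain autocorrelation-method LPC followed by inverse filtering, tested on a synthetic autoregressive signal. The function names are hypothetical.

```python
import numpy as np

def lpc_autocorrelation(x, order):
    """Return a[1..p] such that x[n] is approximated by sum_k a[k] * x[n-k]."""
    x = np.asarray(x, dtype=float)
    # Autocorrelation at lags 0..order (zero lag sits at index len(x) - 1).
    r = np.correlate(x, x, mode="full")[len(x) - 1: len(x) + order]
    # Toeplitz normal equations R a = r[1:].
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1: order + 1])

def inverse_filter(x, a):
    """Prediction residual e[n] = x[n] - sum_k a[k] * x[n-k]."""
    fir = np.concatenate(([1.0], -np.asarray(a, dtype=float)))
    return np.convolve(x, fir)[: len(x)]

# Synthetic test signal: a stable AR(2) process driven by white noise.
rng = np.random.default_rng(0)
e = rng.standard_normal(20000)
x = np.zeros_like(e)
for n in range(2, len(e)):
    x[n] = 1.3 * x[n - 1] - 0.6 * x[n - 2] + e[n]

a_hat = lpc_autocorrelation(x, order=2)  # should be close to [1.3, -0.6]
residual = inverse_filter(x, a_hat)      # approximates the excitation e
```

In a glottal-waveform setting the residual plays the role of a crude excitation estimate; the methods in the abstract refine this by choosing the analysis region or iterating the filter estimate.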
Pitch and spectral analysis of speech based on an auditory synchrony model
Also issued as Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1985. Includes bibliographical references (p. 228-235). Supported in part by the National Institutes of Health. 5 T32 NS07040. Stephanie Seneff
Defining Fundamental Frequency for Almost Harmonic Signals
In this work, we consider the modeling of signals that are almost, but not
quite, harmonic, i.e., composed of sinusoids whose frequencies are close to
being integer multiples of a common frequency. Typically, in applications, such
signals are treated as perfectly harmonic, allowing for the estimation of their
fundamental frequency, despite the signals not actually being periodic. Herein,
we provide three different definitions of a concept of fundamental frequency
for such inharmonic signals and study the implications of the different choices
for modeling and estimation. We show that one of the definitions corresponds to
a misspecified modeling scenario, and provides a theoretical benchmark for
analyzing the behavior of estimators derived under a perfectly harmonic
assumption. The second definition stems from optimal mass transport theory and
yields a robust and easily interpretable concept of fundamental frequency based
on the signals' spectral properties. The third definition interprets the
inharmonic signal as an observation of a randomly perturbed harmonic signal.
This allows for computing a hybrid information theoretical bound on estimation
performance, as well as for finding an estimator attaining the bound. The
theoretical findings are illustrated using numerical examples. Comment: Accepted for publication in IEEE Transactions on Signal Processing
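To make the notion of an "almost harmonic" signal concrete, the sketch below fits a single common frequency to a set of slightly inharmonic partials by least squares. It is an illustration only and is not claimed to coincide with any of the three definitions in the abstract above. It assumes the partial frequencies are already known and ordered by harmonic index; the function name `ls_fundamental` and the stiff-string inharmonicity used to generate test data are assumptions made for the example.

```python
import numpy as np

def ls_fundamental(partial_freqs, weights=None):
    """Least-squares f0 minimising sum_k w_k * (f_k - k * f0)**2.

    Closed form: f0 = sum(w_k * k * f_k) / sum(w_k * k**2).
    """
    f = np.asarray(partial_freqs, dtype=float)
    k = np.arange(1, len(f) + 1)
    w = np.ones_like(f) if weights is None else np.asarray(weights, dtype=float)
    return np.sum(w * k * f) / np.sum(w * k ** 2)

# Perfectly harmonic partials: the fit recovers the common frequency.
f0_harmonic = ls_fundamental([100.0, 200.0, 300.0])

# Slightly inharmonic partials, f_k = k * f0 * sqrt(1 + B * k**2)
# (a stiff-string style perturbation, chosen only as test data).
k = np.arange(1, 6)
inharmonic = k * 100.0 * np.sqrt(1.0 + 1e-3 * k ** 2)
f0_inharmonic = ls_fundamental(inharmonic)
```

For the inharmonic case the fitted value lies between the smallest and largest of the ratios f_k / k, which shows why "the" fundamental frequency is not uniquely defined once the partials stop being exact integer multiples.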