42 research outputs found
Reverberation: models, estimation and application
The use of reverberation models is required in many applications such as acoustic measurements,
speech dereverberation and robust automatic speech recognition. The aim of this thesis is to
investigate different models and propose a perceptually-relevant reverberation model with suitable
parameter estimation techniques for different applications.
Reverberation can be modelled in both the time and frequency domain. The model parameters
give direct information of both physical and perceptual characteristics. These characteristics
create a multidimensional parameter space of reverberation, which can be to a large extent captured
by a time-frequency domain model. In this thesis, the relationship between physical and perceptual
model parameters will be discussed. In the first application, an intrusive technique is proposed to
measure the reverberation or reverberance, perception of reverberation and the colouration. The
room decay rate parameter is of particular interest.
In practical applications, a blind estimate of the decay rate of acoustic energy in a room
is required. A statistical model for the distribution of the decay rate of the reverberant signal
named the eagleMax distribution is proposed. The eagleMax distribution describes the reverberant
speech decay rates as a random variable that is the maximum of the room decay rates and anechoic
speech decay rates. Three methods were developed to estimate the mean room decay rate from
the eagleMax distributions alone. The estimated room decay rates form a reverberation model that
will be discussed in the context of room acoustic measurements, speech dereverberation and robust
automatic speech recognition individually
Glottal-synchronous speech processing
Glottal-synchronous speech processing is a field of speech science where the pseudoperiodicity
of voiced speech is exploited. Traditionally, speech processing involves segmenting
and processing short speech frames of predefined length; this may fail to exploit the inherent
periodic structure of voiced speech which glottal-synchronous speech frames have
the potential to harness. Glottal-synchronous frames are often derived from the glottal
closure instants (GCIs) and glottal opening instants (GOIs).
The SIGMA algorithm was developed for the detection of GCIs and GOIs from
the Electroglottograph signal with a measured accuracy of up to 99.59%. For GCI and
GOI detection from speech signals, the YAGA algorithm provides a measured accuracy
of up to 99.84%. Multichannel speech-based approaches are shown to be more robust to
reverberation than single-channel algorithms.
The GCIs are applied to real-world applications including speech dereverberation,
where SNR is improved by up to 5 dB, and to prosodic manipulation where the importance
of voicing detection in glottal-synchronous algorithms is demonstrated by subjective
testing. The GCIs are further exploited in a new area of data-driven speech modelling,
providing new insights into speech production and a set of tools to aid deployment into
real-world applications. The technique is shown to be applicable in areas of speech coding,
identification and artificial bandwidth extension of telephone speec