Glottal-synchronous speech processing
Glottal-synchronous speech processing is a field of speech science where the pseudoperiodicity
of voiced speech is exploited. Traditionally, speech processing involves segmenting
and processing short speech frames of predefined length; this may fail to exploit the inherent
periodic structure of voiced speech, which glottal-synchronous speech frames have
the potential to harness. Glottal-synchronous frames are often derived from the glottal
closure instants (GCIs) and glottal opening instants (GOIs).
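As a rough illustration of what a glottal-synchronous frame is, the sketch below cuts one frame per GCI, spanning from the previous GCI to the next one (about two pitch periods). The function name, the two-period framing choice and the GCI positions are assumptions made for this example; they are not taken from the thesis, and the GCIs would in practice come from a detector such as those described next.

```python
import numpy as np

def glottal_synchronous_frames(signal, gci_indices):
    """Return one frame per interior GCI, from the previous GCI to the next.

    Hypothetical helper: the two-period span is an illustrative choice,
    not the framing used in the thesis.
    """
    gci = np.asarray(gci_indices, dtype=int)
    # Pair each GCI with the one two positions later, so every frame
    # covers two consecutive glottal cycles.
    return [signal[start:stop] for start, stop in zip(gci[:-2], gci[2:])]

# Placeholder signal and made-up GCI sample indices with slightly varying
# spacing, mimicking the pseudoperiodicity of voiced speech.
signal = np.arange(400, dtype=float)
gci_indices = [0, 80, 162, 240, 325]

frames = glottal_synchronous_frames(signal, gci_indices)
frame_lengths = [len(frame) for frame in frames]
print(frame_lengths)
```

Unlike fixed-length framing, the frame lengths here follow the local pitch period, so each frame stays aligned with the glottal cycle.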
The SIGMA algorithm was developed for the detection of GCIs and GOIs from
the Electroglottograph signal with a measured accuracy of up to 99.59%. For GCI and
GOI detection from speech signals, the YAGA algorithm provides a measured accuracy
of up to 99.84%. Multichannel speech-based approaches are shown to be more robust to
reverberation than single-channel algorithms.
The GCIs are then used in real-world applications including speech dereverberation,
where SNR is improved by up to 5 dB, and to prosodic manipulation where the importance
of voicing detection in glottal-synchronous algorithms is demonstrated by subjective
testing. The GCIs are further exploited in a new area of data-driven speech modelling,
providing new insights into speech production and a set of tools to aid deployment into
real-world applications. The technique is shown to be applicable in areas of speech coding,
identification and artificial bandwidth extension of telephone speech.
A MODEL FOR PREDICTING THE PERFORMANCE OF IP VIDEOCONFERENCING
With free desktop videoconferencing (DVC) software now included on the
majority of the world's PCs, there has inevitably been considerable
interest in recent years in using DVC over the Internet. The growing popularity of DVC
increases the need for multimedia quality assessment. However, the task of predicting
the perceived multimedia quality over Internet Protocol (IP) networks is
complicated by the fact that the audio and video streams are susceptible to unique
impairments due to the unpredictable nature of IP networks, different types of task
scenarios, different levels of complexity, and other related factors. To date, no
standard consensus definition of IP media Quality of Service (QoS) has been established.
The thesis addresses this problem by investigating a new approach to
assessing the perceived quality of audio, video, and the overall audiovisual
experience in low cost DVC systems.
The main aim of the thesis is to investigate current methods used to assess the perceived
IP media quality, and then propose a model which will predict the quality of
audiovisual experience from prevailing network parameters.
This thesis investigates the effects of various traffic conditions, such as packet loss,
jitter, and delay, and of other factors that may influence end-user acceptance when low
cost DVC is used over the Internet. It also investigates the interaction effects between
the audio and video media, and the issues involving lip synchronisation
error. The thesis provides empirical evidence that the subjective mean opinion
score (MOS) of the perceived multimedia quality is unaffected by lip synchronisation
error in low cost DVC systems.
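For readers unfamiliar with the metric, a MOS is the arithmetic mean of the ratings given by a panel of viewers, usually reported with a confidence interval. The sketch below assumes a five-point rating scale (1 = bad, 5 = excellent) and a normal-approximation 95% interval; the ratings are made-up placeholder values, and neither the scale nor the interval method is stated in the abstract.

```python
import math
import statistics

def mean_opinion_score(ratings):
    """Return (MOS, half-width of an approximate 95% confidence interval).

    Hypothetical helper for illustration; uses the normal approximation
    (z = 1.96), which is an assumption of this sketch.
    """
    mos = statistics.fmean(ratings)
    if len(ratings) < 2:
        return mos, 0.0  # no spread can be estimated from a single rating
    half_width = 1.96 * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, half_width

# Placeholder ratings from eight imaginary subjects for one test condition.
ratings = [4, 4, 3, 5, 4, 3, 4, 5]
mos, half_width = mean_opinion_score(ratings)
print(f"MOS = {mos:.2f} +/- {half_width:.2f}")
```

Comparing such intervals across conditions, for example with and without lip synchronisation error, is one simple way to judge whether a factor shifts the MOS.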
The data-gathering approach advocated in this thesis involves both field and
laboratory trials, so that results from classroom-based experiments and real-world
environments can be compared and the bench tests confirmed under actual real-world
conditions. The subjective test method was employed because it has proven to be
more robust and better suited to this research than objective testing
techniques.
The MOS results, and the number of observations obtained, have enabled a set of
criteria to be established that can be used to determine the acceptable QoS for given
network conditions and task scenarios. Based upon these comprehensive findings,
the final contribution of the thesis is the proposal of a new adaptive architecture
method intended to predict the performance of a particular IP-based DVC
session for a given network condition.