A quantitative assessment of group delay methods for identifying glottal closures in voiced speech
Abstract: Measures based on the group delay of the LPC residual have been used by a number of authors to identify the time instants of glottal closure in voiced speech. In this paper, we discuss the theoretical properties of three such measures and we also present a new measure having useful properties. We give a quantitative assessment of each measure's ability to detect glottal closure instants, evaluated using a speech database that includes a direct measurement of glottal activity from a Laryngograph/EGG signal. We find that when using a fixed-length analysis window, the best measures can detect the instant of glottal closure in 97% of larynx cycles with a standard deviation of 0.6 ms, and that in 9% of these cycles an additional excitation instant is found that normally corresponds to glottal opening. We show that some improvement in detection rate may be obtained if the analysis window length is adapted to the speech pitch. If the measures are applied to the preemphasized speech instead of to the LPC residual, we find that the timing accuracy worsens but the detection rate improves slightly. We assess the computational cost of evaluating the measures and we present new recursive algorithms that give a substantial reduction in computation in all cases.
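To make the idea of a group-delay measure concrete, the following is a minimal sketch of one common formulation, the energy-weighted variant, which computes the centre of energy of each analysis window relative to the window midpoint. It is an illustration under stated assumptions, not the paper's implementation: the function names are hypothetical, the input is assumed to be an already-computed residual, and the direct per-window evaluation shown here is not the recursive algorithm the abstract refers to. A negative-going zero crossing of the measure indicates that an impulsive excitation sits at the centre of the window.

```python
def energy_weighted_group_delay(x, win_len):
    """Centre of energy of each sliding window, relative to the window midpoint.

    x is a list of residual samples; win_len is the (fixed) window length.
    Returns one value per window start position. Windows with zero energy
    are assigned 0.0 (an assumption made here to avoid division by zero).
    """
    midpoint = (win_len - 1) / 2
    measure = []
    for start in range(len(x) - win_len + 1):
        segment = x[start:start + win_len]
        energy = sum(v * v for v in segment)
        if energy == 0.0:
            measure.append(0.0)
            continue
        centroid = sum(m * v * v for m, v in enumerate(segment)) / energy
        measure.append(centroid - midpoint)
    return measure


def negative_going_zero_crossings(measure, win_len):
    """Sample indices (window centres) where the measure falls from > 0 to <= 0."""
    half = (win_len - 1) // 2
    return [
        start + half
        for start in range(1, len(measure))
        if measure[start - 1] > 0 >= measure[start]
    ]


# Toy input: an idealised residual with unit impulses at samples 20, 60 and 100.
residual = [0.0] * 130
for instant in (20, 60, 100):
    residual[instant] = 1.0

delay = energy_weighted_group_delay(residual, 21)
candidates = negative_going_zero_crossings(delay, 21)
```

On this noise-free impulse train the candidates coincide with the impulse positions; real residuals are noisy, so the detection rates quoted in the abstract depend on choices (window length, candidate selection) that this sketch does not cover.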
Glottal-synchronous speech processing
Glottal-synchronous speech processing is a field of speech science where the pseudoperiodicity
of voiced speech is exploited. Traditionally, speech processing involves segmenting
and processing short speech frames of predefined length; this may fail to exploit the inherent
periodic structure of voiced speech which glottal-synchronous speech frames have
the potential to harness. Glottal-synchronous frames are often derived from the glottal
closure instants (GCIs) and glottal opening instants (GOIs).
The SIGMA algorithm was developed for the detection of GCIs and GOIs from
the Electroglottograph signal with a measured accuracy of up to 99.59%. For GCI and
GOI detection from speech signals, the YAGA algorithm provides a measured accuracy
of up to 99.84%. Multichannel speech-based approaches are shown to be more robust to
reverberation than single-channel algorithms.
The GCIs are applied to real-world applications including speech dereverberation,
where SNR is improved by up to 5 dB, and to prosodic manipulation where the importance
of voicing detection in glottal-synchronous algorithms is demonstrated by subjective
testing. The GCIs are further exploited in a new area of data-driven speech modelling,
providing new insights into speech production and a set of tools to aid deployment into
real-world applications. The technique is shown to be applicable in areas of speech coding,
identification and artificial bandwidth extension of telephone speech.
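As a rough illustration of the contrast between fixed-length and glottal-synchronous framing described in this abstract, the sketch below cuts frames that span two larynx cycles, from the GCI before a given closure to the GCI after it. The function name and the two-cycle frame span are assumptions chosen for the example; the abstract does not specify how frames are constructed, and GCI positions are taken as given rather than detected.

```python
def glottal_synchronous_frames(signal, gcis):
    """Return frames spanning two consecutive larynx cycles.

    signal is a list of samples; gcis is a sorted list of glottal closure
    instants given as sample indices. Each frame runs from the GCI before
    a closure up to (not including) the GCI after it, so frame length
    follows the local pitch period instead of a predefined constant.
    """
    frames = []
    for previous, _current, following in zip(gcis, gcis[1:], gcis[2:]):
        frames.append(signal[previous:following])
    return frames


# Toy usage: sample values equal their own index so frame boundaries are visible.
samples = list(range(100))
frames = glottal_synchronous_frames(samples, [10, 30, 55, 75])
```

With four GCIs this yields two frames, covering samples 10 to 54 and 30 to 74, whose boundaries track the closure instants rather than a fixed hop size.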
Experimental phonetic study of the timing of voicing in English obstruents
The treatment given to the timing of voicing in three areas of phonetic
research -- phonetic taxonomy, speech production modelling, and speech
synthesis -- is considered in the light of an acoustic study of the timing of
voicing in British English obstruents. In each case, it is found to be deficient.
The underlying cause is the difficulty in applying a rigid segmental approach to
an aspect of speech production characterised by important inter-articulator
asynchronies, coupled to the limited quantitative data available concerning the
systematic properties of the timing of voicing in languages.
It is argued that the categories and labels used to describe the timing of
voicing in obstruents are inadequate for fulfilling the descriptive goals of
phonetic theory. One possible alternative descriptive strategy is proposed,
based on incorporating aspects of the parametric organisation of speech into
the descriptive framework. Within the domain of speech production modelling,
no satisfactory account has been given of fine-grained variability of the timing
of voicing not capable of explanation in terms of general properties of motor
programming and utterance execution. The experimental results support claims
in the literature that the phonetic control of an utterance may be somewhat
less abstract than has been suggested in some previous reports. A schematic
outline is given of one way in which the timing of voicing could be controlled
in speech production. The success of a speech synthesis-by-rule system
depends to a great extent on a comprehensive encoding of the systematic
phonetic characteristics of the target language. Only limited success has been
achieved in the past thirty years. A set of rules is proposed for generating
more naturalistic patterns of voicing in obstruents, reflecting those observed in
the experimental component of this study. Consideration is given to strategies
for evaluating the effect of fine-grained phonetic rules in speech synthesis.
Hilbert phase methods for glottal activity detection
The 2π discontinuities found in the wrapped Hilbert phase of the bandpass-filtered analytic DEGG signal provide accurate candidate locations of glottal closure instants (GCIs). Pruning these GCI candidates with an automatically determined amplitude threshold, found by iteratively removing from the full signal the inlier samples that lie within a fraction of its standard deviation until convergence, yields a 99.6% accurate detection system with a false alarm rate of 0.17%. This simpler algorithm, named Glottal Activity Detector For Laryngeal Input (GADFLI), outperforms the state-of-the-art SIGMA algorithm for GCI detection, which has a 94.2% detection rate and a 5.46% false alarm rate. Performance metrics were computed over the entire APLAWD database, using an extensive, hand-verified markings database of 10,944 waveforms. A related proposed algorithm, QuickGCI, also makes use of Hilbert phase discontinuities and does not require a thresholding post-processing step for GCI selection. Its performance is nearly as good as that of GADFLI. Both proposed algorithms operate using the electroglottographic signal or the acoustic speech signal.
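The candidate-generation step described above, locating 2π jumps in the wrapped phase of an analytic signal, can be sketched as follows. This is a minimal standard-library illustration, not GADFLI or QuickGCI: the function names are hypothetical, the analytic signal is built with a direct O(N²) DFT that only suits short inputs, the input is a synthetic sinusoid standing in for a bandpass-filtered DEGG signal, and the amplitude-threshold pruning stage is omitted.

```python
import cmath
import math


def analytic_signal(x):
    """Analytic signal of a real sequence via a direct DFT (short inputs only)."""
    n = len(x)
    if n == 0:
        return []
    spectrum = [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    # Keep DC (and Nyquist for even n), double positive frequencies,
    # and zero the negative-frequency half of the spectrum.
    weights = [0.0] * n
    weights[0] = 1.0
    for k in range(1, (n + 1) // 2):
        weights[k] = 2.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
    return [
        sum(weights[k] * spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
            for k in range(n)) / n
        for t in range(n)
    ]


def phase_wrap_indices(x):
    """Indices where the wrapped phase drops by more than pi between samples."""
    phase = [cmath.phase(z) for z in analytic_signal(x)]
    return [t for t in range(1, len(phase)) if phase[t] - phase[t - 1] < -math.pi]


# Toy input: four cycles of a cosine over 64 samples, with a small phase offset
# so that no sample lands exactly on the +pi/-pi boundary.
test_signal = [math.cos(2 * math.pi * 4 * t / 64 + 0.3) for t in range(64)]
wraps = phase_wrap_indices(test_signal)
```

For this sinusoid the phase wraps once per cycle, giving one candidate every 16 samples. On a real DEGG or speech signal the candidate set would also contain spurious wraps, which is where the pruning step in the abstract comes in.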