Glottal-synchronous speech processing
Glottal-synchronous speech processing is a field of speech science where the pseudoperiodicity
of voiced speech is exploited. Traditionally, speech processing involves segmenting
and processing short speech frames of predefined length; this may fail to exploit the inherent
periodic structure of voiced speech, which glottal-synchronous speech frames have
the potential to harness. Glottal-synchronous frames are often derived from the glottal
closure instants (GCIs) and glottal opening instants (GOIs).
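The following sketch contrasts conventional fixed-length framing with frames delimited by GCIs. It assumes the GCIs are already available as sample indices, for example from an EGG- or speech-based detector. Each frame runs from one GCI to the GCI two glottal cycles later. The function names and the two-period frame choice are illustrative assumptions and are not taken from the thesis.

```python
import numpy as np


def fixed_frames(x, frame_len, hop):
    """Conventional framing: frames of a predefined length at a fixed hop, ignoring periodicity."""
    starts = range(0, len(x) - frame_len + 1, hop)
    return [x[s:s + frame_len] for s in starts]


def glottal_synchronous_frames(x, gcis):
    """Two-period frames: frame k spans from gcis[k] up to (not including) gcis[k + 2].

    The frame length therefore follows the local pitch period instead of a fixed value.
    """
    gcis = np.asarray(gcis, dtype=int)
    return [x[gcis[k]:gcis[k + 2]] for k in range(len(gcis) - 2)]


x = np.arange(100, dtype=float)
frames = glottal_synchronous_frames(x, [10, 20, 35, 50, 60])
print([len(f) for f in frames])
```

With GCIs at samples 10, 20, 35, 50 and 60, the three glottal-synchronous frames have lengths 25, 30 and 25, so they track the local period. By contrast, `fixed_frames` always returns frames of `frame_len` samples.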
The SIGMA algorithm was developed for the detection of GCIs and GOIs from
the electroglottograph (EGG) signal, with a measured accuracy of up to 99.59%. For GCI and
GOI detection from speech signals, the YAGA algorithm provides a measured accuracy
of up to 99.84%. Multichannel speech-based approaches are shown to be more robust to
reverberation than single-channel algorithms.
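SIGMA itself is not reproduced here. For comparison, a much simpler and widely used baseline picks peaks in the differentiated EGG (dEGG). Sharp positive peaks mark glottal closures and negative peaks mark openings. This assumes the polarity convention in which increasing vocal-fold contact produces a rising EGG. The helper names, the 30% relative threshold and the 500 Hz upper bound on F0 are assumptions chosen for illustration.

```python
import numpy as np


def pick_peaks(d, threshold, min_dist):
    """Local maxima of d above threshold, kept greedily (strongest first) at least min_dist apart."""
    inner = d[1:-1]
    cand = np.where((inner > d[:-2]) & (inner >= d[2:]) & (inner > threshold))[0] + 1
    chosen = []
    for i in cand[np.argsort(d[cand])[::-1]]:
        if all(abs(int(i) - j) >= min_dist for j in chosen):
            chosen.append(int(i))
    return np.array(sorted(chosen), dtype=int)


def degg_instants(egg, fs, f0_max=500.0, rel_threshold=0.3):
    """Baseline GCI/GOI estimates (sample indices into egg) from dEGG peak picking.

    Assumes increasing vocal-fold contact gives a rising EGG, so closures produce
    sharp positive dEGG peaks and openings produce negative peaks.
    """
    degg = np.diff(np.asarray(egg, dtype=float))
    min_dist = int(fs / f0_max)  # weaker peaks closer than the shortest allowed period are suppressed
    gci = pick_peaks(degg, rel_threshold * degg.max(), min_dist)
    goi = pick_peaks(-degg, rel_threshold * (-degg).max(), min_dist)
    # degg[i] = egg[i + 1] - egg[i], so add 1 to index the sample after the change.
    return gci + 1, goi + 1


# Toy EGG: abrupt closure, 40-sample closed phase, gradual 10-sample opening, open phase.
fs, period = 8000, 80
egg = np.zeros(4 * period)
for s0 in range(0, 4 * period, period):
    egg[s0:s0 + 40] = 1.0
    egg[s0 + 40:s0 + 50] = np.linspace(1.0, 0.0, 10)
gci, goi = degg_instants(egg, fs)
```

On this toy signal the closures are found at samples 80, 160 and 240. The closure at sample 0 has no preceding rise, so it is not detected. One opening is found in each cycle, inside the gradual decline. Real EGG recordings are noisier and often have imprecise openings, which is why dedicated algorithms such as SIGMA exist.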
GCIs are applied in real-world applications, including speech dereverberation,
where SNR is improved by up to 5 dB, and prosodic manipulation, where the importance
of voicing detection in glottal-synchronous algorithms is demonstrated by subjective
testing. The GCIs are further exploited in a new area of data-driven speech modelling,
providing new insights into speech production and a set of tools to aid deployment in
real-world applications. The technique is shown to be applicable to speech coding,
identification and artificial bandwidth extension of telephone speech.
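The abstract does not say which SNR variant underlies the dereverberation result. The following is therefore only a sketch, and it assumes a plain global SNR measured against the clean (anechoic) reference. Under that assumption, "improved by 5 dB" means the processed signal's SNR exceeds that of the unprocessed reverberant input by 5 dB. Segmental SNR, averaged over short frames, is a common alternative measure. The function names are hypothetical.

```python
import numpy as np


def snr_db(reference, estimate):
    """Global SNR in dB: 10*log10(sum(ref^2) / sum((ref - est)^2))."""
    reference = np.asarray(reference, dtype=float)
    error = reference - np.asarray(estimate, dtype=float)
    return 10.0 * np.log10(np.sum(reference ** 2) / np.sum(error ** 2))


def snr_improvement_db(clean, reverberant, processed):
    """SNR gain (dB) of the processed output over the unprocessed reverberant input."""
    return snr_db(clean, processed) - snr_db(clean, reverberant)


clean = np.ones(100)
sign = np.where(np.arange(100) % 2 == 0, 1.0, -1.0)
reverberant = clean + 0.1 * sign          # error energy 1.0 -> 20 dB
processed = clean + 10 ** -1.25 * sign    # error energy 10**-0.5 -> 25 dB
```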
Modelling and extraction of fundamental frequency in speech signals
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. One of the most important parameters of speech is the fundamental frequency of vibration of voiced sounds. The auditory sensation of the fundamental frequency is known as pitch. Depending on whether a language is tonal or non-tonal, the fundamental frequency conveys intonation, pragmatics and meaning. In addition, the fundamental frequency and intonation carry information about the speaker's gender, age, identity, speaking style and emotional state. Accurate estimation of the fundamental frequency is critical to speech processing applications such as speech coding, speech recognition, speech synthesis and voice morphing. This thesis contributes to accurate pitch estimation in three distinct ways: (1) an investigation of the impact of window length on pitch estimation error, (2) an investigation of the use of higher-order moments and (3) an investigation of an analysis-synthesis method for selecting the best pitch value among N proposed candidates. Experimental evaluations show that the length of the speech window has a major impact on the accuracy of pitch estimation. Depending on the similarity criterion and the order of the statistical moment, a window length of 37 to 80 ms gives the least error. To avoid the excessive delay that a longer window would cause, a method is proposed in which the current short window is concatenated with previous frames to form a longer signal window for pitch extraction. Second-order and higher-order moments, and the magnitude difference function, were explored and compared as similarity criteria. A novel method of calculating moments is introduced in which the signal is split, i.e. rectified, into positive- and negative-valued samples. The moments of the positive and negative parts of the signal are computed separately and then combined. This new method of calculating moments, together with the higher-order criteria, provides competitive results. A challenging issue in pitch estimation is determining the best candidate from the N extrema of the similarity criterion. The analysis-synthesis method proposed in this thesis selects the pitch candidate that provides the best reproduction (synthesis) of the harmonic spectrum of the original speech. The synthesis method must be such that the distortion increases as the error in the fundamental-frequency estimate increases. To this end, a new method of spectral synthesis is proposed that uses an estimate of the spectral envelope and harmonically spaced asymmetric Gaussian pulses as excitation. The N-best method provides a consistent reduction in pitch estimation error. The methods described in this thesis significantly improve pitch accuracy and outperform the benchmark YIN method.
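Several of the ideas in this abstract can be sketched compactly:

- a buffer that concatenates the current short frame with previous frames to form a longer analysis window;
- a second-order moment (autocorrelation) criterion;
- an average magnitude difference function (AMDF);
- one possible reading of the positive/negative split of moments.

This is a hedged illustration, not the thesis's estimator. The normalisation, the way the split moments are combined (a simple sum), the F0 search range and all function and class names are assumptions. The higher-order moments and the N-best analysis-synthesis stage are omitted.

```python
import numpy as np


class ConcatenatingBuffer:
    """Hypothetical helper: keeps the most recent window_len samples so each short
    incoming frame is analysed inside a longer window built from previous frames."""

    def __init__(self, window_len):
        self.buf = np.zeros(window_len)

    def push(self, frame):
        joined = np.concatenate([self.buf, np.asarray(frame, dtype=float)])
        self.buf = joined[-len(self.buf):]  # drop the oldest samples
        return self.buf


def autocorr(x, lag):
    """Second-order moment criterion, normalised by the number of products (larger = more similar)."""
    return float(np.dot(x[:-lag], x[lag:]) / (len(x) - lag))


def split_autocorr(x, lag):
    """Assumed reading of the split: autocorrelate the positive and negative parts
    of the signal separately, then add the two results."""
    pos, neg = np.maximum(x, 0.0), np.minimum(x, 0.0)
    return autocorr(pos, lag) + autocorr(neg, lag)


def amdf(x, lag):
    """Average magnitude difference function (smaller = more similar)."""
    return float(np.mean(np.abs(x[:-lag] - x[lag:])))


def estimate_f0(x, fs, criterion, minimise=False, f0_min=60.0, f0_max=400.0):
    """Return fs / lag for the lag at which the criterion indicates the best self-similarity."""
    lags = np.arange(int(fs / f0_max), int(fs / f0_min) + 1)
    scores = np.array([criterion(x, lag) for lag in lags])
    best = lags[np.argmin(scores)] if minimise else lags[np.argmax(scores)]
    return fs / best


fs = 8000
sig = np.sin(2 * np.pi * 100.0 * np.arange(800) / fs)
buf = ConcatenatingBuffer(640)  # 80 ms window at 8 kHz
for k in range(5):
    window = buf.push(sig[160 * k:160 * (k + 1)])  # 20 ms frames
estimates = {
    "autocorr": estimate_f0(window, fs, autocorr),
    "split_autocorr": estimate_f0(window, fs, split_autocorr),
    "amdf": estimate_f0(window, fs, amdf, minimise=True),
}
```

The example feeds 20 ms frames of a 100 Hz sinusoid into an 80 ms buffer, so each estimate sees a longer window than a single frame would provide without waiting for future samples. On a pure tone all three criteria land close to 100 Hz. On real speech they behave differently, which is what the thesis compares.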
From heuristics-based to data-driven audio melody extraction
The identification of the melody in a music recording is a relatively easy task for humans but a very challenging one for computational systems. This task is known as "audio melody extraction", more formally defined as the automatic estimation of the pitch sequence of the melody directly from the audio signal of a polyphonic music recording. This thesis investigates the benefits of exploiting knowledge automatically derived from data for audio melody extraction, by combining digital signal processing and machine learning methods. We extend the scope of melody extraction research by working with a varied dataset and multiple definitions of melody. We first present an overview of the state of the art and perform an evaluation focused on a novel symphonic music dataset. We then propose melody extraction methods based on a source-filter model and pitch contour characterisation, and evaluate them on a wide range of music genres. Finally, we explore novel timbre, tonal and spatial features for contour characterisation, and propose a method for estimating multiple melodic lines. The combination of supervised and unsupervised approaches leads to advances in melody extraction and shows a promising path for future research and applications.
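Contour characterisation starts from pitch contours, i.e. continuous voiced stretches of a frame-wise pitch track. The sketch below shows one simple way to form contours and compute a few per-contour descriptors. It is not the thesis's method. The cents-based jump threshold, the minimum contour length, the 55 Hz cents reference and the feature set are all assumptions for illustration. The thesis's timbre, tonal and spatial features are not reproduced.

```python
import numpy as np


def pitch_contours(f0_hz, max_jump_cents=80.0, min_len=5):
    """Split a frame-wise pitch track (values <= 0 mean unvoiced) into contours.

    A contour ends at an unvoiced frame or where the pitch jumps by more than
    max_jump_cents between consecutive frames. Contours shorter than min_len
    frames are discarded. Returns (start, end) frame pairs with end exclusive.
    """
    segments, start = [], None
    for i, f in enumerate(f0_hz):
        # The previous frame is voiced whenever a contour is open, so the log is safe.
        if start is not None and (
            f <= 0 or abs(1200.0 * np.log2(f / f0_hz[i - 1])) > max_jump_cents
        ):
            segments.append((start, i))
            start = None
        if start is None and f > 0:
            start = i
    if start is not None:
        segments.append((start, len(f0_hz)))
    return [(s, e) for s, e in segments if e - s >= min_len]


def contour_features(f0_hz, segment, hop_s, ref_hz=55.0):
    """Duration (s), mean pitch and pitch standard deviation (cents relative to ref_hz)."""
    s, e = segment
    cents = 1200.0 * np.log2(np.asarray(f0_hz[s:e], dtype=float) / ref_hz)
    return {"duration_s": (e - s) * hop_s,
            "mean_cents": float(cents.mean()),
            "std_cents": float(cents.std())}


track = [0, 0] + [220.0] * 6 + [0] + [440.0] * 3 + [330.0] * 6
segs = pitch_contours(track)
feat = contour_features(track, segs[0], hop_s=0.01)
```

In this toy track the 440 Hz stretch is cut off by the octave-and-a-fifth jump to 330 Hz. It is then dropped for being shorter than five frames, leaving two contours.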
Digital Signal Processing
Contains an introduction and reports on twenty research projects.
Funding: National Science Foundation (Grant ECS 84-07285); U.S. Navy - Office of Naval Research (Contract N00014-81-K-0742); National Science Foundation Fellowship; Sanders Associates, Inc.; U.S. Air Force - Office of Scientific Research (Contract F19628-85-K-0028); Canada, Bell Northern Research Scholarship; Canada, Fonds pour la Formation de Chercheurs et l'Aide a la Recherche Postgraduate Fellowship; Canada, Natural Science and Engineering Research Council Postgraduate Fellowship; U.S. Navy - Office of Naval Research (Contract N00014-81-K-0472); Fanny and John Hertz Foundation Fellowship; Center for Advanced Television Studies; Amoco Foundation Fellowship
Unsupervised Voice Activity Detection by Modeling Source and System Information using Zero Frequency Filtering
Voice activity detection (VAD) is an important pre-processing step for speech
technology applications. The task consists of deriving segment boundaries of
audio signals which contain voicing information. In recent years, it has been
shown that voice source and vocal tract system information can be extracted
using zero-frequency filtering (ZFF) without making any explicit model
assumptions about the speech signal. This paper investigates the potential of
zero-frequency filtering for jointly modeling voice source and vocal tract
system information, and proposes two approaches for VAD. The first approach
demarcates voiced regions using a composite signal composed of different
zero-frequency filtered signals. The second approach feeds the composite signal
as input to the rVAD algorithm. These approaches are compared with other
supervised and unsupervised VAD methods in the literature, and are evaluated on
the Aurora-2 database, across a range of SNRs (20 to -5 dB). Our studies show
that the proposed ZFF-based methods perform comparably to state-of-the-art VAD
methods and are more robust to added degradation and different channel
characteristics.
Comment: Accepted at Interspeech 202
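Zero-frequency filtering itself follows a well-known recipe:

1. Difference the signal.
2. Pass it through a cascade of two zero-frequency resonators, y[n] = x[n] + 2y[n-1] - y[n-2]. This is equivalent to four cumulative sums.
3. Remove the resulting polynomial trend by repeatedly subtracting a local mean.

Positive-going zero crossings of the output mark epochs. The sketch below implements only this recipe. The abstract does not describe how the paper combines several ZFF signals into its composite VAD signal, so that step is not attempted. The 15 ms trend-removal window, the three passes and the function names are assumed defaults.

```python
import numpy as np


def zero_frequency_filter(s, fs, trend_win_s=0.015, passes=3):
    """Zero-frequency filtered (ZFF) signal.

    1. Difference the input to remove any DC offset.
    2. Apply two cascaded zero-frequency resonators; each equals two cumulative sums.
    3. Remove the polynomial trend by repeatedly subtracting a centred local mean
       over roughly one to two pitch periods (15 ms assumed here).
    """
    x = np.diff(np.asarray(s, dtype=float), prepend=0.0)
    y = x
    for _ in range(4):  # two resonators = four cumulative sums
        y = np.cumsum(y)
    w = int(round(trend_win_s * fs)) | 1  # force an odd length so the mean is centred
    kernel = np.ones(w) / w
    for _ in range(passes):
        y = y - np.convolve(y, kernel, mode="same")
    return y


def zff_epochs(z):
    """Epoch candidates: negative-to-positive zero crossings of the ZFF signal."""
    return np.where((z[:-1] < 0) & (z[1:] >= 0))[0] + 1


# Crude stand-in for voiced excitation: a negative impulse every 80 samples (100 Hz at 8 kHz).
fs = 8000
s = np.zeros(2000)
s[::80] = -1.0
z = zero_frequency_filter(s, fs)
ep = zff_epochs(z)
inner = ep[(ep > 400) & (ep < 1600)]  # ignore samples distorted by edge effects
```

Away from the signal edges, the detected epochs fall within a sample or two of the impulses and are spaced one period apart. The cumulative sums grow quickly, so in float64 this is best applied to short segments. Samples near the edges, especially the end, are distorted by the zero padding in the local-mean step.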
Speech Communication
Contains reports on three research projects.
Funding: U.S. Air Force Cambridge Research Laboratories under Contract F19628-72-C-0181; National Institutes of Health (Grant 5 R01 NS04332-09); Joint Services Electronics Programs (U.S. Army, U.S. Navy, and U.S. Air Force) under Contract DAAB07-71-C-0300; M.I.T. Lincoln Laboratory Purchase Order CC-57
Communications Biophysics
Contains reports on eight research projects split into four sections.
Funding: National Institutes of Health (Grant 5 P01 NS13126); National Institutes of Health (Grant 5 K04 NS00113); National Institutes of Health (Training Grant 5 T32 NS07047); National Science Foundation (Grant BNS80-06369); National Institutes of Health (Grant 5 R01 NS11153); National Institutes of Health (Fellowship 1 F32 NS06544); National Science Foundation (Grant BNS77-16861); National Institutes of Health (Grant 5 R01 NS10916); National Institutes of Health (Grant 5 R01 NS12846); National Science Foundation (Grant BNS77-21751); National Institutes of Health (Grant 1 R01 NS14092); National Institutes of Health (Grant 2 R01 NS11680); National Institutes of Health (Grant 5 R01 NS11080); National Institutes of Health (Training Grant 5 T32 GM07301)