171 research outputs found
Models and analysis of vocal emissions for biomedical applications
This book of Proceedings collects the papers presented at the 3rd International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications, MAVEBA 2003, held 10-12 December 2003, Firenze, Italy. The workshop is organised every two years, and aims to stimulate contacts between specialists active in research and industrial developments, in the area of voice analysis for biomedical applications. The scope of the Workshop includes all aspects of voice modelling and analysis, ranging from fundamental research to all kinds of biomedical applications and related established and advanced technologies
Subband Adaptive Modeling of Digital Hearing Aids
In this thesis, the application of a subband adaptive model to characterize compression behaviour of five digital hearing aids is investigated. Using a signal-to-error ratio metric, modeling performance is determined by varying the number of analysis bands in the subband structure as well as consideration of three adaptive algorithms. The normalized least mean-squares (NLMS), the affine projection algorithm (APA), and the recursive least-squares (RLS) algorithms are employed using a range of parameters to determine the impact on modeling performance. Using the subband adaptive model to estimate the time-varying frequency response of each hearing aid allows the Perceptual Evaluation of Speech Quality (PESQ) mean-opinion score (MOS) to be computed. The PESQ MOS facilitates an estimation of a subjective assessment of speech quality using an objective score. Initial results suggest the PESQ MOS score is able to differentiate speech processed by hearing aids allowing them to be ranked accordingly. Further work is required to obtain subjective assessments of the processed speech signals and determine if possible correlations exist
Improved compactly computable objective measures for predicting the acceptiability of speech communications systems
Issued as Monthly status reports [1-7], and Final report, Project no. E-21-61
Perceptual models in speech quality assessment and coding
The ever-increasing demand for good communications/toll
quality speech has created a renewed interest into the
perceptual impact of rate compression. Two general areas are
investigated in this work, namely speech quality assessment
and speech coding.
In the field of speech quality assessment, a model is
developed which simulates the processing stages of the
peripheral auditory system. At the output of the model a
"running" auditory spectrum is obtained. This represents
the auditory (spectral) equivalent of any acoustic sound such
as speech. Auditory spectra from coded speech segments serve
as inputs to a second model. This model simulates the
information centre in the brain which performs the speech
quality assessment. [Continues.
Recommended from our members
Modelling and extraction of fundamental frequency in speech signals
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University.One of the most important parameters of speech is the fundamental frequency of vibration of voiced sounds. The audio sensation of the fundamental frequency is known as the pitch. Depending on the tonal/non-tonal category of language, the fundamental frequency conveys intonation, pragmatics and meaning. In addition the fundamental frequency and intonation carry speaker gender, age, identity, speaking style and emotional state. Accurate estimation of the fundamental frequency is critically important for functioning of speech processing applications such as speech coding, speech recognition, speech synthesis and voice morphing. This thesis makes contributions to the development of accurate pitch estimation research in three distinct ways: (1) an investigation of the impact of the window length on pitch estimation error, (2) an investigation of the use of the higher order moments and (3) an investigation of an analysis-synthesis method for selection of the best pitch value among N proposed candidates. Experimental evaluations show that the length of the speech window has a major impact on the accuracy of pitch estimation. Depending on the similarity criteria and the order of the statistical moment a window length of 37 to 80 ms gives the least error. In order to avoid excessive delay as a consequence of using a longer window, a method is proposed
ii where the current short window is concatenated with the previous frames to form a longer signal window for pitch extraction. The use of second order and higher order moments, and the magnitude difference function, as the similarity criteria were explored and compared. A novel method of calculation of moments is introduced where the signal is split, i.e. rectified, into positive and negative valued samples. The moments for the positive and negative parts of the signal are computed separately and combined. The new method of calculation of moments from positive and negative parts and the higher order criteria provide competitive results. A challenging issue in pitch estimation is the determination of the best candidate from N extrema of the similarity criteria. The analysis-synthesis method proposed in this thesis selects the pitch candidate that provides the best reproduction (synthesis) of the harmonic spectrum of the original speech. The synthesis method must be such that the distortion increases with the increasing error in the estimate of the fundamental frequency. To this end a new method of spectral synthesis is proposed using an estimate of the spectral envelop and harmonically spaced asymmetric Gaussian pulses as excitation. The N-best method provides consistent reduction in pitch estimation error. The methods described in this thesis result in a significant improvement in the pitch accuracy and outperform the benchmark YIN method
Improving automatic detection of obstructive sleep apnea through nonlinear analysis of sustained speech
We present a novel approach for the detection of severe obstructive sleep apnea (OSA) based on patients' voices introducing nonlinear measures to describe sustained speech dynamics. Nonlinear features were combined with state-of-the-art speech recognition systems using statistical modeling techniques (Gaussian mixture models, GMMs) over cepstral parameterization (MFCC) for both continuous and sustained speech. Tests were performed on a database including speech records from both severe OSA and control speakers. A 10 % relative reduction in classification error was obtained for sustained speech when combining MFCC-GMM and nonlinear features, and 33 % when fusing nonlinear features with both sustained and continuous MFCC-GMM. Accuracy reached 88.5 % allowing the system to be used in OSA early detection. Tests showed that nonlinear features and MFCCs are lightly correlated on sustained speech, but uncorrelated on continuous speech. Results also suggest the existence of nonlinear effects in OSA patients' voices, which should be found in continuous speech
DESIGN AND EVALUATION OF HARMONIC SPEECH ENHANCEMENT AND BANDWIDTH EXTENSION
Improving the quality and intelligibility of speech signals continues to be an important topic in mobile communications and hearing aid applications. This thesis explored the possibilities of improving the quality of corrupted speech by cascading a log Minimum Mean Square Error (logMMSE) noise reduction system with a Harmonic Speech Enhancement (HSE) system. In HSE, an adaptive comb filter is deployed to harmonically filter the useful speech signal and suppress the noisy components to noise floor. A Bandwidth Extension (BWE) algorithm was applied to the enhanced speech for further improvements in speech quality. Performance of this algorithm combination was evaluated using objective speech quality metrics across a variety of noisy and reverberant environments. Results showed that the logMMSE and HSE combination enhanced the speech quality in any reverberant environment and in the presence of multi-talker babble. The objective improvements associated with the BWE were found to be minima
- …