Assessment of severe apnoea through voice analysis, automatic speech, and speaker recognition techniques
The electronic version of this article is the complete one and can be found online at: http://asp.eurasipjournals.com/content/2009/1/982531

This study is part of an ongoing collaborative effort between the medical and the signal processing communities to promote research on applying standard Automatic Speech Recognition (ASR) techniques for the automatic diagnosis of patients with severe obstructive sleep apnoea (OSA). Early detection of severe apnoea cases is important so that patients can receive early treatment. Effective ASR-based detection could dramatically cut medical testing time. Working with a carefully designed speech database of healthy and apnoea subjects, we describe an acoustic search for distinctive apnoea voice characteristics. We also study abnormal nasalization in OSA patients by modelling vowels in nasal and non-nasal phonetic contexts using Gaussian Mixture Model (GMM) pattern recognition on speech spectra. Finally, we present experimental findings regarding the discriminative power of GMMs applied to severe apnoea detection. We have achieved an 81% correct classification rate, which is very promising and underpins the interest in this line of inquiry.

The activities described in this paper were funded by the Spanish Ministry of Science and Technology as part of the TEC2006-13170-C02-02 project.
Severe apnoea detection using speaker recognition techniques
Proceedings of the International Conference on Bio-inspired Systems and Signal Processing (BIOSIGNALS 2009)

The aim of this paper is to study new possibilities of using Automatic Speaker Recognition (ASR) techniques for the detection of patients with severe obstructive sleep apnoea (OSA). Early detection of severe apnoea cases can be very useful for prioritizing their early treatment, optimizing the expensive and time-consuming tests of current diagnosis methods based on a full overnight sleep study in a hospital. This work is part of an ongoing collaborative project between the medical and signal processing communities to promote new research efforts on automatic OSA diagnosis through speech processing technologies, applied to a carefully designed speech database of healthy subjects and apnoea patients. In this contribution we present and discuss several approaches to applying generative Gaussian Mixture Models (GMMs), widely used in ASR systems, to model specific acoustic properties of continuous speech signals in different linguistic contexts that reflect discriminative physiological characteristics found in OSA patients. Finally, experimental results on the discriminative power of speaker recognition techniques adapted to severe apnoea detection are presented. These results achieve a correct classification rate of 81.25%, a promising outcome that underlines the interest of this research framework and opens further perspectives for improvement using more specific speech recognition technologies.

The activities described in this paper were funded by the Spanish Ministry of Science and Technology as part of the TEC2006-13170-C02-01 project.
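The class-conditional modelling behind this kind of detector can be sketched in a few lines. This is a minimal illustration, not the authors' system: it assumes pre-extracted spectral feature vectors, uses one diagonal-covariance Gaussian per class (a GMM with a single mixture component), and substitutes synthetic data for the speech database.

```python
import numpy as np

def fit_gaussian(X):
    """Fit a diagonal-covariance Gaussian to feature vectors X of shape (n, d)."""
    mu = X.mean(axis=0)
    var = X.var(axis=0) + 1e-6  # floor the variance for numerical stability
    return mu, var

def log_likelihood(X, mu, var):
    """Per-sample log-likelihood under a diagonal Gaussian."""
    return -0.5 * (np.log(2 * np.pi * var) + (X - mu) ** 2 / var).sum(axis=1)

def classify(X, model_healthy, model_apnoea):
    """Label 1 (apnoea) when the apnoea-class model scores higher."""
    ll_h = log_likelihood(X, *model_healthy)
    ll_a = log_likelihood(X, *model_apnoea)
    return (ll_a > ll_h).astype(int)

# Synthetic stand-in for spectral features of the two speaker groups.
rng = np.random.default_rng(0)
healthy = rng.normal(0.0, 1.0, size=(200, 8))
apnoea = rng.normal(1.5, 1.0, size=(200, 8))

model_h = fit_gaussian(healthy)
model_a = fit_gaussian(apnoea)
test_set = np.vstack([healthy[:50], apnoea[:50]])
truth = np.array([0] * 50 + [1] * 50)
accuracy = (classify(test_set, model_h, model_a) == truth).mean()
```

A full GMM would sum several such components per class; the maximum-likelihood decision rule stays the same.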
Model-based estimation of in-car-communication feedback applied to speech zone detection
Modern cars provide versatile tools to enhance speech communication. While an in-car communication (ICC) system aims at enhancing communication between the passengers by playing back desired speech via loudspeakers in the car, these loudspeaker signals may disturb a speech enhancement system required for hands-free telephony and automatic speech recognition. In this paper, we focus on speech zone detection, i.e. detecting which passenger in the car is speaking, which is a crucial component of the speech enhancement system. We propose a model-based feedback estimation method to improve the robustness of speech zone detection against ICC feedback. Specifically, since the zone detection system typically does not have access to the ICC loudspeaker signals, the proposed method estimates the feedback signal from the observed microphone signals based on a free-field propagation model between the loudspeakers and the microphones as well as the ICC gain. We propose an efficient recursive implementation in the short-time Fourier transform domain using convolutive transfer functions. A realistic simulation study indicates that the proposed method allows the ICC gain to be increased by about 6 dB while still achieving robust speech zone detection.

Comment: 5 pages, submitted to International Workshop on Acoustic Signal Enhancement (IWAENC), Bamberg, Germany, 202
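The free-field propagation model at the heart of this feedback estimate is simple to write down: per STFT bin, a loudspeaker-to-microphone path is a pure delay of d/c seconds plus 1/(4*pi*d) spherical attenuation, scaled by the ICC gain. The sketch below shows that single-frame, single-path building block only; the sampling rate, FFT size, distance and gain are illustrative assumptions, and the paper's recursive convolutive-transfer-function implementation is not reproduced here.

```python
import numpy as np

def free_field_ctf(freqs, distance, c=343.0):
    """Free-field transfer function per STFT bin: 1/(4*pi*d) spherical
    attenuation combined with a pure propagation delay of d/c seconds."""
    delay = distance / c
    return np.exp(-2j * np.pi * freqs * delay) / (4 * np.pi * distance)

def estimate_feedback(spectrum, freqs, distance, icc_gain):
    """Estimate the ICC feedback component at a microphone from a played-back
    spectrum, the free-field model, and the ICC gain."""
    return icc_gain * free_field_ctf(freqs, distance) * spectrum

# One STFT frame at 16 kHz with a 512-point FFT (assumed parameters).
freqs = np.fft.rfftfreq(512, d=1.0 / 16000)
frame = np.ones(len(freqs), dtype=complex)  # flat-spectrum stand-in
feedback = estimate_feedback(frame, freqs, distance=1.2, icc_gain=2.0)
```

Subtracting such an estimate from the microphone spectra is what lets the zone detector tolerate a higher ICC gain.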
ASR error management for improving spoken language understanding
This paper addresses the problem of automatic speech recognition (ASR) error detection and its use for improving spoken language understanding (SLU) systems. In this study, the SLU task consists in automatically extracting, from ASR transcriptions, semantic concepts and concept/value pairs in, for example, a tourist information system. An approach is proposed for enriching the set of semantic labels with error-specific labels and for using a recently proposed neural approach based on word embeddings to compute well-calibrated ASR confidence measures. Experimental results are reported showing that it is possible to significantly decrease the Concept/Value Error Rate with a state-of-the-art system, outperforming previously published results on the same experimental data. It is also shown that, by combining an SLU approach based on conditional random fields with a neural encoder/decoder attention-based architecture, it is possible to effectively identify confidence islands and uncertain semantic output segments useful for deciding appropriate error handling actions in the dialogue manager strategy.

Comment: Interspeech 2017, Aug 2017, Stockholm, Sweden. 201
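The "confidence island" notion above reduces to a simple grouping operation once per-word confidence scores exist: contiguous runs of words above a threshold form islands the dialogue manager can trust, and everything else is flagged as uncertain. This is a hedged sketch of the general idea only; the threshold and the example utterance are assumptions, not the paper's data.

```python
def confidence_islands(words, scores, threshold=0.8):
    """Group consecutive words whose ASR confidence meets the threshold into
    'confidence islands'; words below it break the current island."""
    islands, current = [], []
    for word, score in zip(words, scores):
        if score >= threshold:
            current.append(word)
        elif current:
            islands.append(current)
            current = []
    if current:
        islands.append(current)
    return islands

# Illustrative utterance with made-up per-word confidence scores.
words = ["book", "a", "room", "near", "the", "uh", "station"]
scores = [0.95, 0.91, 0.88, 0.60, 0.92, 0.30, 0.90]
islands = confidence_islands(words, scores)
```

Here `islands` is `[["book", "a", "room"], ["the"], ["station"]]`; the low-confidence words "near" and "uh" mark segments where an error-handling action (e.g. a clarification request) would be appropriate.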
Face Detection And Lip Localization
Integration of audio and video signals for automatic speech recognition has become an important field of study. Audio-Visual Speech Recognition (AVSR) systems are known to achieve higher accuracy than audio-only or visual-only systems. Research on the visual front end has centered on lip segmentation, and experiments on lip feature extraction have mainly been performed in constrained environments with controlled background noise. In this thesis we focus our attention on a database collected in the environment of a moving car, which hampered the quality of the imagery.
We first introduce the concept of illumination compensation, where we try to reduce the dependence on lighting in over- or under-exposed images. As a precursor to lip segmentation, we focus on a robust face detection technique, which reaches an accuracy of 95%. We have detailed and compared three different face detection techniques and found a successful way of concatenating them in order to increase the overall accuracy. One of the detection techniques used was the object detection algorithm proposed by Viola and Jones. We have experimented with different color spaces using the Viola-Jones algorithm and have reached interesting conclusions.
Following face detection, we implement a lip localization algorithm based on the vertical gradients of hybrid color equations. Despite the challenging background and image quality, a success rate of 88% was achieved for lip segmentation.
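The thesis does not reproduce its hybrid color equations here, so the following is only a sketch of the general idea, not the author's method: build a hybrid channel that emphasizes lip red over skin, and look for the row where its vertical gradient peaks. The 2R-G channel and the synthetic face patch are illustrative assumptions.

```python
import numpy as np

def lip_row(rgb):
    """Locate the image row with the strongest vertical gradient of a hybrid
    red-vs-green channel (lips are redder than the surrounding skin).
    The 2R-G channel is an illustrative choice, not the thesis's equation."""
    hybrid = 2.0 * rgb[:, :, 0] - rgb[:, :, 1]
    profile = hybrid.mean(axis=1)           # average hybrid value per row
    gradient = np.abs(np.diff(profile))     # vertical gradient magnitude
    return int(np.argmax(gradient)) + 1     # row where the jump occurs

# Synthetic face patch: skin-coloured rows with a redder "lip" band.
img = np.full((40, 30, 3), [0.8, 0.6, 0.5])
img[25:30] = [0.9, 0.3, 0.3]  # lip band starting at row 25
row = lip_row(img)
```

On this toy patch the gradient peaks at the skin-to-lip transition, so `lip_row` returns 25; real imagery would first need the illumination compensation and face detection steps described above.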
Multilingual statistical text analysis, Zipf's law and Hungarian speech generation
The practical challenge of creating a Hungarian e-mail reader initiated our work on statistical text analysis. The starting point was statistical analysis for automatic discrimination of the language of texts. Later it was extended to automatic regeneration of diacritic signs and more detailed language structure analysis. A parallel study of three different languages (Hungarian, German and English) using text corpora of similar size makes it possible to explore both similarities and differences. Corpora from publicly available Internet sources were used, with the same size (approximately 20 Mbytes, 2.5-3.5 million word forms) for all languages. Besides traditional corpus coverage, word length and occurrence statistics, some new features about prosodic boundaries (sentence-initial and sentence-final positions, preceding and following a comma) were also computed. Among other findings, the coverage of the corpora by the most frequent words follows a parallel logarithmic rule for all languages in the 40-85% coverage range, known as Zipf's law in linguistics. The functions are much closer for English and German than for Hungarian. Further conclusions are also drawn. The language detection and diacritic regeneration applications are discussed in detail, with implications for Hungarian speech generation. Diverse further application domains, such as predictive text input, word hyphenation, language modelling in speech recognition, corpus-based speech synthesis, etc., are also foreseen.
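The coverage statistic used above, i.e. what fraction of all running tokens the k most frequent word forms account for, can be computed in a few lines. The toy corpus below is an assumption for illustration; the study's 20-Mbyte corpora would simply be larger token lists.

```python
from collections import Counter

def coverage(tokens, k):
    """Fraction of all tokens covered by the k most frequent word forms."""
    counts = Counter(tokens)
    covered = sum(count for _, count in counts.most_common(k))
    return covered / len(tokens)

corpus = "the cat sat on the mat and the dog sat on the rug".split()
cov = coverage(corpus, 2)  # coverage by the two most frequent word forms
```

In this toy corpus "the" (4 occurrences) plus the next most frequent form (2 occurrences) cover 6 of 13 tokens, about 46%; plotting `coverage` against k on a log scale for each language is what reveals the parallel logarithmic (Zipf-like) behaviour reported above.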