514 research outputs found

    Silent-speech enhancement using body-conducted vocal-tract resonance signals

    The physical characteristics of weak body-conducted vocal-tract resonance signals called non-audible murmur (NAM) and the acoustic characteristics of three sensors developed for detecting these signals have been investigated. NAM signals attenuate 50 dB at 1 kHz; this attenuation consists of 30-dB full-range attenuation due to air-to-body transmission loss and 10 dB/octave spectral decay due to sound propagation loss within the body. These characteristics agree with the spectral characteristics of measured NAM signals. The sensors have a sensitivity between 41 and 58 dB [V/Pa] at 1 kHz, and the mean signal-to-noise ratio of the detected signals was 15 dB. On the basis of these investigations, three types of silent-speech enhancement systems were developed: (1) simple, direct amplification of weak vocal-tract resonance signals using a wired urethane-elastomer NAM microphone, (2) simple, direct amplification using a wireless urethane-elastomer-duplex NAM microphone, and (3) transformation of the weak vocal-tract resonance signals sensed by a soft-silicone NAM microphone into whispered speech using statistical conversion. Field testing of the systems showed that they enable voice-impaired people to communicate verbally using body-conducted vocal-tract resonance signals. Listening tests demonstrated that weak body-conducted vocal-tract resonance sounds can be transformed into intelligible whispered speech sounds. Using these systems, people with voice impairments can re-acquire speech communication with less effort. (C) 2009 Elsevier B.V. All rights reserved. Journal article, Speech Communication 52(4):301-313 (2010).

    Numerical simulation of transfer and attenuation characteristics of soft-tissue conducted sound originating from vocal tract

    A non-audible murmur (NAM), a very weak speech sound produced without vocal cord vibration, can be detected by a special NAM microphone attached to the neck, thereby providing a new speech communication tool for functional speech disorders as well as human-to-machine and human-to-human interfaces with inaudible voice input for use by unimpaired speakers. The NAM microphone is a condenser microphone covered with soft-silicone impression material that provides good impedance matching with the soft tissues of the neck. Because higher-frequency components are severely suppressed, however, the NAM detected with this device can be insufficiently clear. To improve NAM clarity, the mechanism of NAM production as well as the transfer characteristics of the NAM in soft neck tissues must be clarified. We have investigated sound propagation from the vocal tract to the neck surface, using a finite difference time domain method and a head model based on magnetic resonance imaging scans. Numerical results show that, compared to air-conducted sound detected in front of the mouth, soft-tissue-conducted sound attenuates 50 dB at 1 kHz, which consists of 30 dB full-range attenuation due to air-to-soft-tissue transmission loss and -10 dB/octave spectral decay due to propagation loss in soft tissues. The decay agrees well with the spectral characteristics of the measured NAM. (C) 2008 Elsevier Ltd. All rights reserved. Journal article, Applied Acoustics 70(3):469-472 (2009).
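The attenuation figures quoted in the abstract above (a 30 dB flat loss plus a 10 dB/octave decay, totalling 50 dB at 1 kHz) can be checked with a small calculation. The sketch below is only an illustration, not the authors' model: the 250 Hz reference frequency is an assumption chosen so that the two terms add up to 50 dB at 1 kHz, and clamping the decay to zero below that frequency is likewise an assumption. The function name `nam_attenuation_db` is hypothetical.

```python
import math


def nam_attenuation_db(freq_hz, flat_loss_db=30.0,
                       slope_db_per_octave=10.0, ref_hz=250.0):
    """Estimate total attenuation (dB) of soft-tissue-conducted sound.

    flat_loss_db and slope_db_per_octave come from the abstract.
    ref_hz is an ASSUMPTION: 1 kHz is two octaves above 250 Hz, so
    30 dB + 2 * 10 dB = 50 dB, matching the quoted total at 1 kHz.
    """
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    octaves_above_ref = math.log2(freq_hz / ref_hz)
    # ASSUMPTION: no spectral decay is applied below the reference frequency.
    return flat_loss_db + slope_db_per_octave * max(octaves_above_ref, 0.0)


if __name__ == "__main__":
    for f in (250, 500, 1000, 2000, 4000):
        print(f"{f:5d} Hz: {nam_attenuation_db(f):.1f} dB")
```

Under these assumptions each doubling of frequency costs a further 10 dB, which is consistent with the abstract's remark that higher-frequency components are severely suppressed.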

    Towards a Multimodal Silent Speech Interface for European Portuguese

    Automatic Speech Recognition (ASR) in the presence of environmental noise is still a hard problem to tackle in speech science (Ng et al., 2000). Another problem well described in the literature concerns elderly speech production. Studies (Helfrich, 1979) have shown evidence of a slower speech rate, more breaks, more speech errors and a lower speech volume when comparing elderly speech with teenagers' or adults' speech on an acoustic level. This makes elderly speech hard to recognize using currently available stochastic-based ASR technology. To tackle these two problems in the context of ASR for Human-Computer Interaction, a novel Silent Speech Interface (SSI) in European Portuguese (EP) is envisioned.

    Noise-robust whispered speech recognition using a non-audible-murmur microphone with VTS compensation


    Analysis of a Modern Voice Morphing Approach using Gaussian Mixture Models for Laryngectomees

    This paper proposes a voice morphing system for people who have undergone laryngectomy, the surgical removal of all or part of the larynx (voice box), performed particularly in cases of laryngeal cancer. A basic method of voice morphing extracts the source speaker's vocal coefficients and converts them into the target speaker's vocal parameters. In this paper, we deploy Gaussian Mixture Models (GMM) for mapping the coefficients from source to target. However, the conventional GMM-based mapping approach over-smooths the converted voice. We therefore propose a method for efficient voice morphing and conversion based on GMM, which overcomes the over-smoothing of the conventional method. It uses glottal waveform separation and prediction of excitations; the results show that over-smoothing is eliminated and that the transformed vocal tract parameters match the target. Moreover, the synthesized speech is found to be of sufficiently high quality. The proposed GMM-based voice morphing approach is critically evaluated using various subjective and objective evaluation parameters. Further, an application of voice morphing for laryngectomees that deploys this approach is recommended. Comment: 6 pages, 4 figures, 4 tables; International Journal of Computer Applications Volume 49, Number 21, July 201
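The conventional GMM-based mapping that the abstract above treats as its baseline is commonly written as a posterior-weighted sum of per-component linear regressions. The sketch below illustrates that baseline for scalar features only; it is not the paper's proposed method, and the function names, the dictionary keys and the one-dimensional simplification are all assumptions made for illustration.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a one-dimensional Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def gmm_convert(x, components):
    """Map a scalar source feature x to a target feature.

    Each component is a dict with the (hypothetical) keys
    weight, mean_x, mean_y, var_x and cov_yx, taken from a joint GMM
    fitted on paired source/target features.
    """
    likelihoods = [c["weight"] * gaussian_pdf(x, c["mean_x"], c["var_x"])
                   for c in components]
    total = sum(likelihoods)
    converted = 0.0
    for c, likelihood in zip(components, likelihoods):
        posterior = likelihood / total  # p(component | x)
        # Conditional mean of the target given x under this component.
        regression = c["mean_y"] + c["cov_yx"] / c["var_x"] * (x - c["mean_x"])
        converted += posterior * regression
    return converted
```

Because the output is an average of conditional means, detail in the target features tends to be flattened, which is one common explanation for the over-smoothing the abstract mentions.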

    Analisis de integrada wavelet de señales no audibles

    The analysis of non-audible signals has gained significant importance due to its many fields of application, among them speech synthesis for people with speech disabilities. This analysis can be used to acquire information from the vocal apparatus without the need to speak aloud in order to produce a phonetic expression. The analysis of a Wavelet transformation of Spanish words recorded through a non-audible murmur microphone is proposed, with the aim of achieving an embedded silent speech recognition system for the Spanish language. A non-audible murmur microphone is used as the sensor of non-vocal speech. Coding of the input data is done through a Wavelet transform using a fourth-order Daubechies function. The acquisition, processing and transmission system is implemented on an STM32F4-Discovery evaluation board. The vocabulary consists of command words aimed at controlling mobile robots or human-machine interfaces. The Wavelet transformation of four Spanish words, each with five independent samples, was accomplished. An analysis of the resulting data was performed, and features such as average, peaks and frequency were distinguished. The processing of the signals is performed successfully, and further work on speech activity detection and feature classifiers is proposed.
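The wavelet coding step described in the abstract above can be illustrated with one level of a Daubechies discrete wavelet transform. The sketch below is an assumption-laden illustration rather than the authors' embedded implementation: it uses the four-tap Daubechies filter (often written D4, and named `db2` in PyWavelets), whereas "fourth-order" in the abstract may instead refer to the eight-tap filter; it also assumes periodic boundary handling and an even-length input. The name `dwt_level` is hypothetical.

```python
import math

SQRT3 = math.sqrt(3.0)
NORM = 4.0 * math.sqrt(2.0)

# Four-tap Daubechies scaling (low-pass) filter coefficients.
LOWPASS = [(1 + SQRT3) / NORM, (3 + SQRT3) / NORM,
           (3 - SQRT3) / NORM, (1 - SQRT3) / NORM]
# Wavelet (high-pass) filter derived as the quadrature mirror of LOWPASS.
HIGHPASS = [LOWPASS[3], -LOWPASS[2], LOWPASS[1], -LOWPASS[0]]


def dwt_level(signal):
    """Return (approximation, detail) coefficients for one DWT level.

    ASSUMPTION: the signal is treated as periodic, so samples wrap around
    at the end; the length must be even and at least 4.
    """
    n = len(signal)
    if n < 4 or n % 2:
        raise ValueError("signal length must be even and at least 4")
    approx, detail = [], []
    for start in range(0, n, 2):
        window = [signal[(start + k) % n] for k in range(4)]
        approx.append(sum(h * x for h, x in zip(LOWPASS, window)))
        detail.append(sum(g * x for g, x in zip(HIGHPASS, window)))
    return approx, detail


if __name__ == "__main__":
    samples = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5]
    a, d = dwt_level(samples)
    print("approximation:", a)
    print("detail:", d)
```

Features such as the average or the peaks mentioned in the abstract could then be computed from the approximation and detail coefficients; applying `dwt_level` again to the approximation would give a coarser decomposition level.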