
    Towards a Multimodal Silent Speech Interface for European Portuguese

    Automatic Speech Recognition (ASR) in the presence of environmental noise remains a hard problem in speech science (Ng et al., 2000). Another problem well described in the literature concerns elderly speech production. Studies (Helfrich, 1979) have shown evidence of a slower speech rate, more pauses, more speech errors and a lower speech volume when comparing elderly speech with that of teenagers or adults at the acoustic level. This makes elderly speech hard to recognize using currently available stochastic-based ASR technology. To tackle these two problems in the context of ASR for Human-Computer Interaction, a novel Silent Speech Interface (SSI) for European Portuguese (EP) is envisioned.

    Noise-robust whispered speech recognition using a non-audible-murmur microphone with VTS compensation


    Embedded wavelet analysis of non-audible signals

    The analysis of non-audible signals has gained significant importance due to its many fields of application, among them speech synthesis for people with speech disabilities. This analysis can be used to acquire information from the vocal apparatus without the need to speak in order to produce a phonetic expression. The analysis of a wavelet transformation of Spanish words, recorded through a non-audible murmur microphone, is proposed in order to achieve an embedded silent speech recognition system for the Spanish language. A non-audible murmur microphone is used as the sensor for non-vocal speech. Coding of the input data is done through a wavelet transform using a fourth-order Daubechies function. The acquisition, processing and transmission system is implemented on an STM32F4-Discovery evaluation board. The vocabulary consists of command words aimed at controlling mobile robots or human-machine interfaces. The wavelet transformation of four Spanish words, each with five independent samples, was accomplished. An analysis of the resulting data was performed, and features such as average, peaks and frequency were distinguished. The processing of the signals is performed successfully, and further work on speech activity detection and feature classifiers is proposed.
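    To make the described pipeline concrete, below is a minimal Python sketch of the decomposition and feature-extraction steps, assuming the PyWavelets library. The per-band average and peak plus a dominant-frequency feature follow the features named in the abstract; the function name, decomposition level and sampling rate are illustrative, not the authors' implementation.

```python
import numpy as np
import pywt  # PyWavelets

def db4_features(signal, fs, level=4):
    """Illustrative feature extraction: multilevel DWT with a
    fourth-order Daubechies wavelet ('db4'), then the average and
    peak of each coefficient band, plus the dominant frequency."""
    coeffs = pywt.wavedec(signal, 'db4', level=level)
    feats = []
    for band in coeffs:
        feats += [np.mean(band), np.max(np.abs(band))]  # average, peak
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    feats.append(freqs[np.argmax(spectrum)])  # dominant frequency
    return np.array(feats)

# Example: one recorded command word (synthetic stand-in at 16 kHz).
word = np.random.default_rng(0).normal(size=16000)
print(db4_features(word, fs=16000))
```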

    Multimodal silent speech interfaces for European Portuguese based on articulation

    Joint MAPi Doctoral Programme in Informatics. The concept of silent speech, when applied to Human-Computer Interaction (HCI), describes a system which allows for speech communication in the absence of an acoustic signal. By analyzing data gathered during different parts of the human speech production process, Silent Speech Interfaces (SSI) allow users with speech impairments to communicate with a system. SSI can also be used in the presence of environmental noise, and in situations in which privacy, confidentiality, or non-disturbance are important. Nonetheless, despite recent advances, the performance and usability of silent speech systems still have much room for improvement. Better performance would enable their application in relevant areas, such as Ambient Assisted Living. It is therefore necessary to extend our understanding of the capabilities and limitations of silent speech modalities and to enhance their joint exploration. Thus, in this thesis, we have established several goals: (1) expand SSI language support to European Portuguese (EP); (2) overcome identified limitations of current SSI techniques in detecting EP nasality; (3) develop a multimodal HCI approach for SSI based on non-invasive modalities; and (4) explore more direct measures in the multimodal SSI for EP, acquired from more invasive/obtrusive modalities, to be used as ground truth for articulation processes, enhancing our comprehension of other modalities. In order to achieve these goals and to support our research in this area, we have created a multimodal SSI framework that fosters leveraging modalities and combining information, supporting research in multimodal SSI. The proposed framework goes beyond the data acquisition process itself, including methods for online and offline synchronization, multimodal data processing, feature extraction, feature selection, analysis, classification and prototyping. Examples of applicability are provided for each stage of the framework, including articulatory studies for HCI, the development of a multimodal SSI based on less invasive modalities, and the use of ground-truth information coming from more invasive/obtrusive modalities to overcome the limitations of other modalities. In the work presented here, we also apply existing SSI methods to EP for the first time, noting that nasal sounds may cause inferior performance in some modalities. In this context, we propose a non-invasive solution for the detection of nasality based on a single surface electromyography sensor, suitable for inclusion in a multimodal SSI.
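    As a rough illustration of the feature-level path through such a framework (synchronized modalities, feature extraction, fusion, classification), here is a hypothetical Python sketch. The modality names, feature dimensions and classifier are placeholders for exposition, not the thesis's actual pipeline.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def fuse(emg_feats, lip_feats):
    # Early (feature-level) fusion of two synchronized modalities:
    # concatenate the per-utterance feature vectors.
    return np.concatenate([emg_feats, lip_feats])

# Toy stand-in data: 30 utterances, two modalities per utterance
# (e.g. surface EMG features and video-based lip features).
rng = np.random.default_rng(1)
X = np.array([fuse(rng.normal(size=12), rng.normal(size=8)) for _ in range(30)])
y = rng.integers(0, 2, size=30)  # toy word labels

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
clf.fit(X, y)
```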

    A review of sub-vocal speech technologies and applications

    This paper presents a review of the main application-oriented and methodological approaches developed in recent years for sub-vocal speech, or silent language. Sub-vocal speech can be defined as the identification and characterization of the bioelectric signals that control the vocal tract when no audible sound is produced by the speaker. The first section provides an in-depth review of methods for detecting silent language. The second part evaluates the technologies implemented in recent years, followed by a review of the main applications of this type of speech and, finally, a broad comparison between work developed in industry and in academia.

    EMG-to-Speech: Direct Generation of Speech from Facial Electromyographic Signals

    The general objective of this work is the design, implementation, improvement and evaluation of a system that uses surface electromyographic (EMG) signals to directly synthesize an audible speech output: EMG-to-speech.

    Phone-based heart and lung function monitor

    Integrated Master's thesis. Informatics and Computing Engineering. Faculty of Engineering, University of Porto. 201

    Towards a silent speech interface for Portuguese: Surface electromyography and the nasality challenge

    A Silent Speech Interface (SSI) aims at performing Automatic Speech Recognition (ASR) in the absence of an intelligible acoustic signal. It can be used as a human-computer interaction modality in high-background-noise environments, such as living rooms, or to aid speech-impaired individuals, a group whose prevalence increases with ageing. If this interaction modality is made available for users' own native language with adequate performance then, since it does not rely on acoustic information, it will be less susceptible to problems related to environmental noise, privacy, information disclosure and the exclusion of speech-impaired persons. To contribute to the existence of this promising modality for Portuguese, for which no SSI implementation is known, we are exploring and evaluating the potential of state-of-the-art approaches. One of the major challenges we face in SSI for European Portuguese is the recognition of nasality, a core characteristic of the language's phonetics and phonology. In this paper, a silent speech recognition experiment based on surface electromyography is presented. Results confirmed recognition problems between minimal pairs of words that differ only in the nasality of one of the phones, causing 50% of the total error and evidencing accuracy degradation, which correlates well with existing knowledge.
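    As a hedged illustration of what such a surface-EMG word recognition experiment might look like, the Python sketch below frames a single EMG channel, averages standard time-domain features (mean absolute value, waveform length, zero crossings) over the frames, and trains a classifier on a two-word minimal pair. The feature set, frame sizes and classifier are generic EMG-processing choices, not the paper's reported setup.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def emg_features(emg, frame=256, hop=128):
    """Average standard time-domain EMG features over sliding frames:
    mean absolute value, waveform length and zero-crossing count."""
    feats = []
    for start in range(0, len(emg) - frame + 1, hop):
        w = emg[start:start + frame]
        feats.append([np.mean(np.abs(w)),            # mean absolute value
                      np.sum(np.abs(np.diff(w))),    # waveform length
                      np.sum(w[:-1] * w[1:] < 0)])   # zero crossings
    return np.mean(feats, axis=0)

# Synthetic stand-in recordings: 20 utterances per word of a
# nasal/oral minimal pair (labels 0 and 1).
rng = np.random.default_rng(2)
X = np.array([emg_features(rng.normal(size=2048)) for _ in range(40)])
y = np.repeat([0, 1], 20)

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
clf.fit(X, y)
```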