39 research outputs found

    Towards a Practical Silent Speech Interface Based on Vocal Tract Imaging

    Full conference proceedings available at: http://www.issp2011.uqam.ca/upload/files/proceedings.pdf
    The paper describes advances in the development of an ultrasound silent speech interface for use in silent communication applications or as a speaking aid for persons who have undergone a laryngectomy. It reports some first steps towards making such a device lightweight, portable, interactive, and practical to use. Simple experimental tests of an interactive silent speech interface for everyday applications are described. Possible future improvements, including extension to continuous speech and real-time operation, are discussed.

    Silent vs. vocalized articulation for "silent speech communication": implications of articulatory differences

    In this study, we describe word recognition tests on words produced both in silent and in vocalized speech, in the training and test phases. In a first step [6], a fixed acquisition system was compared with a prototype portable system, yielding very similar recognition scores; this validates the effectiveness of a portable system and makes it possible to envision the use of such systems by impaired users in the near future. The results of the hidden-Markov-model-based recognition system [6] highlighted a much larger contribution of the tongue, compared to the lips, to the final score. Nevertheless, Markov models cannot describe the lip and tongue movements actually made by the speakers. Here we show quantified differences in lip aperture, spreading, and rounding, as well as in tongue inflection, height, and frontness, for silent speech compared with vocalized speech. For vocalized speech, the results show strategies similar to the reinforcement of distinctive phoneme features observed in hyperarticulated speech, namely rounded phonemes being more rounded, open vowels more open, and so on. In [4], we emphasized the need to train the recognition system on silent productions. The present study helps explain why, and perhaps how to better address it. The final paper will also discuss the type of corpus to use and the methodology of silent speech communication interfaces.

    Silent versus modal multi-speaker speech recognition from ultrasound and video

    We investigate multi-speaker speech recognition from ultrasound images of the tongue and video images of the lips. We train our systems on imaging data from modal speech, and evaluate on matched test sets of two speaking modes: silent and modal speech. We observe that silent speech recognition from imaging data underperforms compared to modal speech recognition, likely due to a speaking-mode mismatch between training and testing. We improve silent speech recognition performance using techniques that address the domain mismatch, such as fMLLR and unsupervised model adaptation. We also analyse the properties of silent and modal speech in terms of utterance duration and the size of the articulatory space. To estimate the articulatory space, we compute the convex hull of tongue splines, extracted from ultrasound tongue images. Overall, we observe that the duration of silent speech is longer than that of modal speech, and that silent speech covers a smaller articulatory space than modal speech. Although these two properties are statistically significant across speaking modes, they do not directly correlate with word error rates from speech recognition.
    Comment: 5 pages, 5 figures, Submitted to Interspeech 202
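    The articulatory-space estimate described in this abstract, the area of the convex hull of pooled tongue-spline points, can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the toy data, and the use of SciPy are assumptions.

```python
import numpy as np
from scipy.spatial import ConvexHull

def articulatory_space_area(spline_points):
    """Area of the convex hull of pooled 2D tongue-spline points.

    spline_points: (N, 2) array of (x, y) coordinates pooled over
    all frames of an utterance (or over a speaking mode).
    """
    hull = ConvexHull(spline_points)
    return hull.volume  # for 2D input, ConvexHull.volume is the enclosed area

# toy example: points spanning a unit square (interior point is ignored)
pts = np.array([[0, 0], [1, 0], [1, 1], [0, 1], [0.5, 0.5]])
print(articulatory_space_area(pts))  # 1.0
```

    Comparing this area between silent and modal utterances of the same material gives a per-speaker measure of how much of the articulatory space each mode covers.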

    Towards a Multimodal Silent Speech Interface for European Portuguese

    Automatic Speech Recognition (ASR) in the presence of environmental noise is still a hard problem to tackle in speech science (Ng et al., 2000). Another problem well described in the literature concerns elderly speech production. Studies (Helfrich, 1979) have shown evidence of a slower speech rate, more breaks, more speech errors, and a reduced speech volume when comparing elderly speech with that of teenagers or adults, at the acoustic level. This makes elderly speech hard to recognize using currently available stochastic-based ASR technology. To tackle these two problems in the context of ASR for Human-Computer Interaction, a novel Silent Speech Interface (SSI) for European Portuguese (EP) is envisioned.

    A Review of Sub-vocal Speech Technologies and Applications

    This paper presents a review of the main applicative and methodological approaches developed in recent years for sub-vocal, or silent, speech. Sub-vocal speech can be defined as the identification and characterization of the bioelectric signals that control the vocal tract when no sound is produced by the speaker. The first section gives an in-depth review of methods for detecting silent language. The second part evaluates the technologies implemented in recent years, followed by a review of the main applications of this type of speech, and finally a broad comparison between work developed in industry and in academia.

    EMG-to-Speech: Direct Generation of Speech from Facial Electromyographic Signals

    The general objective of this work is the design, implementation, improvement, and evaluation of a system that uses surface electromyographic (EMG) signals and directly synthesizes an audible speech output: EMG-to-speech.

    Articulation-based multimodal silent speech interfaces for European Portuguese

    Joint MAPi Doctoral Programme in Informatics.
    The concept of silent speech, when applied to Human-Computer Interaction (HCI), describes a system which allows for speech communication in the absence of an acoustic signal. By analyzing data gathered during different parts of the human speech production process, Silent Speech Interfaces (SSI) allow users with speech impairments to communicate with a system. SSI can also be used in the presence of environmental noise, and in situations in which privacy, confidentiality, or non-disturbance are important. Nonetheless, despite recent advances, the performance and usability of silent speech systems still have much room for improvement. Better performance of such systems would enable their application in relevant areas, such as Ambient Assisted Living. Therefore, it is necessary to extend our understanding of the capabilities and limitations of silent speech modalities and to enhance their joint exploration. Thus, in this thesis, we have established several goals: (1) SSI language expansion to support European Portuguese (EP); (2) overcome identified limitations of current SSI techniques to detect EP nasality; (3) develop a multimodal HCI approach for SSI based on non-invasive modalities; and (4) explore more direct measures in the multimodal SSI for EP, acquired from more invasive/obtrusive modalities, to be used as ground truth in articulation processes, enhancing our comprehension of other modalities. In order to achieve these goals and to support our research in this area, we have created a multimodal SSI framework that fosters leveraging modalities and combining information, supporting research in multimodal SSI. The proposed framework goes beyond the data acquisition process itself, including methods for online and offline synchronization, multimodal data processing, feature extraction, feature selection, analysis, classification, and prototyping. Examples of applicability are provided for each stage of the framework. These include articulatory studies for HCI, the development of a multimodal SSI based on less invasive modalities, and the use of ground truth information coming from more invasive/obtrusive modalities to overcome the limitations of other modalities. In the work presented here, we also apply existing methods in the area of SSI to EP for the first time, noting that nasal sounds may cause inferior performance in some modalities. In this context, we propose a non-invasive solution for the detection of nasality based on a single surface electromyography sensor, suitable for inclusion in a multimodal SSI.

    Discrete Anatomical Coordinates for Speech Production and Synthesis

    The sounds of all languages are described by a finite set of symbols, which are extracted from the continuum of sounds produced by the vocal organ. How discrete phonemic identity is encoded in the continuous movements producing speech remains an open question in experimental phonology. In this work, this question is addressed by using Hall-effect transducers and magnets, mounted on the tongue, lips, and jaw, to track the kinematics of the oral tract during the vocalization of vowel-consonant-vowel structures. Using a threshold strategy, the time traces of the transducers were converted into discrete motor coordinates unambiguously associated with the vocalized phonemes. Furthermore, the signals of the transducers, combined with the discretization strategy, were used to drive a low-dimensional vocal model capable of synthesizing intelligible speech. The current work not only addresses a relevant inquiry of the biology of language, but also demonstrates the performance of the experimental technique in monitoring the displacement of the main articulators of the vocal tract while speaking. This novel electronic device represents an economical and portable alternative to the standard systems used to study vocal tract movements.
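    The threshold strategy mentioned above, converting a continuous transducer trace into a discrete open/closed motor coordinate, can be sketched roughly as below. The function name, the optional hysteresis band, and the toy trace are illustrative assumptions; the actual thresholds in such a setup are sensor- and speaker-specific.

```python
import numpy as np

def discretize(trace, threshold, hysteresis=0.0):
    """Convert a continuous transducer trace into a binary motor
    coordinate (0 = open, 1 = closed) by thresholding, with an
    optional hysteresis band to suppress chatter near the threshold."""
    states = np.zeros(len(trace), dtype=int)
    state = 0
    for i, v in enumerate(trace):
        if state == 0 and v > threshold + hysteresis:
            state = 1  # articulator crossed into the "closed" region
        elif state == 1 and v < threshold - hysteresis:
            state = 0  # articulator returned to the "open" region
        states[i] = state
    return states

trace = np.array([0.1, 0.2, 0.8, 0.9, 0.85, 0.3, 0.1])
print(discretize(trace, threshold=0.5).tolist())  # [0, 0, 1, 1, 1, 0, 0]
```

    Running one such discretizer per transducer yields, at each instant, a small vector of binary coordinates that can be matched against the vocalized phonemes.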

    Advancing Electromyographic Continuous Speech Recognition: Signal Preprocessing and Modeling

    Speech is the natural medium of human communication, but audible speech can be overheard by bystanders and excludes speech-disabled people. This work presents a speech recognizer based on surface electromyography, where electric potentials of the facial muscles are captured by surface electrodes, allowing speech to be processed nonacoustically. A system which was state-of-the-art at the beginning of this book is substantially improved in terms of accuracy, flexibility, and robustness.
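    As a rough illustration of the frame-based sEMG preprocessing such recognizers rely on, the sketch below frames one EMG channel and computes simple time-domain features per frame. The function name, frame sizes, and feature choice are assumptions made for illustration, not the specific system described in this book.

```python
import numpy as np

def emg_frame_features(signal, frame_len=160, frame_shift=80):
    """Frame a single sEMG channel and compute simple time-domain
    features per frame: mean absolute amplitude, power, and the
    number of zero crossings."""
    feats = []
    for start in range(0, len(signal) - frame_len + 1, frame_shift):
        frame = signal[start:start + frame_len]
        mean_amp = np.mean(np.abs(frame))          # mean rectified amplitude
        power = np.mean(frame ** 2)                # average signal power
        zc = np.sum(np.abs(np.diff(np.sign(frame))) > 0)  # zero crossings
        feats.append([mean_amp, power, zc])
    return np.array(feats)

x = np.sin(np.linspace(0, 8 * np.pi, 800))  # synthetic 800-sample signal
print(emg_frame_features(x).shape)  # (9, 3)
```

    Stacking such per-frame feature vectors across channels gives the observation sequence that the statistical recognizer is then trained on.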