31 research outputs found

    Overt speech decoding from cortical activity: a comparison of different linear methods

    Get PDF
    Introduction: Speech BCIs aim at reconstructing speech in real time from ongoing cortical activity. Ideal BCIs would need to reconstruct the speech audio signal frame by frame on a millisecond timescale. Such approaches require fast computation. In this respect, linear decoders are good candidates and have been widely used in motor BCIs. Yet they have very seldom been studied for speech reconstruction, and never for the reconstruction of articulatory movements from intracranial activity. Here, we compared vanilla linear regression, ridge-regularized linear regression, and partial least squares regression for offline decoding of overt speech from cortical activity. Methods: Two decoding paradigms were investigated: (1) direct decoding of acoustic vocoder features of speech, and (2) indirect decoding of vocoder features through an intermediate articulatory representation chained with a real-time-compatible DNN-based articulatory-to-acoustic synthesizer. The participant's articulatory trajectories were estimated from an electromagnetic-articulography dataset using dynamic time warping. The accuracy of the decoders was evaluated by computing correlations between original and reconstructed features. Results: All linear methods achieved similar performance, well above chance level, albeit without reaching intelligibility. Direct and indirect decoding achieved comparable performance, with an advantage for direct decoding. Discussion: Future work will address the development of an improved neural speech decoder compatible with fast, frame-by-frame speech reconstruction from ongoing activity at a millisecond timescale.
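
    As a rough illustration, the sketch below compares the three families of linear decoders mentioned in this abstract on stand-in data and scores them with the same correlation metric. The feature dimensions, the ridge regularization strength, and the number of PLS components are arbitrary assumptions for illustration, not values from the study.

```python
# Hypothetical sketch: comparing linear, ridge, and PLS decoders on stand-in data.
# X stands for time-lagged cortical features, Y for acoustic vocoder features.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 120))                 # e.g. band-power features x lags
W = rng.standard_normal((120, 25))
Y = X @ W + 0.5 * rng.standard_normal((2000, 25))    # e.g. 25 vocoder parameters

X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.2, shuffle=False)

decoders = {
    "linear": LinearRegression(),
    "ridge": Ridge(alpha=10.0),
    "pls": PLSRegression(n_components=20),
}
for name, model in decoders.items():
    model.fit(X_tr, Y_tr)
    Y_hat = model.predict(X_te)
    # Pearson correlation between original and reconstructed features, per feature
    r = [np.corrcoef(Y_te[:, k], Y_hat[:, k])[0, 1] for k in range(Y.shape[1])]
    print(f"{name}: mean r = {np.mean(r):.3f}")
```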

    Neurolinguistics Research Advancing Development of a Direct-Speech Brain-Computer Interface

    Get PDF
    A direct-speech brain-computer interface (DS-BCI) acquires neural signals corresponding to imagined speech, then processes and decodes these signals to produce a linguistic output in the form of phonemes, words, or sentences. Recent research has shown the potential of neurolinguistics to enhance decoding approaches to imagined speech with the inclusion of semantics and phonology in experimental procedures. As neurolinguistics research findings are beginning to be incorporated within the scope of DS-BCI research, it is our view that a thorough understanding of imagined speech, and its relationship with overt speech, must be considered an integral feature of research in this field. With a focus on imagined speech, we provide a review of the most important neurolinguistics research informing the field of DS-BCI and suggest how this research may be utilized to improve current experimental protocols and decoding techniques. Our review of the literature supports a cross-disciplinary approach to DS-BCI research, in which neurolinguistics concepts and methods are utilized to aid development of a naturalistic mode of communication. Subject Areas: Cognitive Neuroscience, Computer Science, Hardware Interface.

    Toward a brain-computer interface for speech restoration (Vers une interface cerveau-machine pour la restauration de la parole)

    No full text
    Restoring natural speech in paralyzed and aphasic people could be achieved using a brain-computer interface controlling a speech synthesizer in real time. The aim of this thesis was thus to develop three main steps toward such a proof of concept. First, a prerequisite was to develop a speech synthesizer producing intelligible speech in real time with a reasonable number of control parameters. Here we chose to synthesize speech from movements of the speech articulators, since recent studies suggested that neural activity from the speech motor cortex contains relevant information to decode speech, and especially articulatory features of speech. We thus developed a speech synthesizer that produced intelligible speech from articulatory data. This was achieved by first recording a large dataset of synchronous articulatory and acoustic data in a single speaker. Then, we used machine learning techniques, especially deep neural networks, to build a model able to convert articulatory data into speech. This synthesizer was built to run in real time. Finally, as a first step toward future brain control of this synthesizer, we verified that it could be controlled in real time by several speakers to produce intelligible speech from articulatory movements in a closed-loop paradigm. Second, we investigated the feasibility of decoding speech and articulatory features from neural activity recorded essentially in the speech motor cortex. We built a tool to localize active cortical speech areas online during awake brain surgery at the Grenoble Hospital and tested this system in two patients with brain cancer. Results show that the motor cortex exhibits specific activity during speech production in the beta and gamma bands, activity that is also present during speech imagination. The recorded data could be successfully analyzed to decode speech intention, voicing activity, and the trajectories of the main articulators of the vocal tract above chance level. Finally, we addressed ethical issues that arise with the development and use of brain-computer interfaces. We considered three levels of ethical questioning, dealing respectively with the animal, the human being, and the human species.
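
    As a rough illustration of the kind of band-specific features such decoding relies on, the sketch below computes windowed beta- and gamma-band power from a single simulated cortical channel. The sampling rate, band edges, and window length are assumptions for illustration, not the recording parameters used in the thesis.

```python
# Hypothetical sketch: beta/gamma band-power features from one cortical channel.
# Band edges, sampling rate, and window length are illustrative assumptions.
import numpy as np
from scipy.signal import butter, filtfilt

fs = 1000.0                                   # assumed sampling rate (Hz)
bands = {"beta": (13, 30), "gamma": (70, 150)}

def band_power(signal, low, high, fs, win=0.1):
    """Band-pass filter, then average power in non-overlapping windows."""
    b, a = butter(4, [low / (fs / 2), high / (fs / 2)], btype="band")
    filtered = filtfilt(b, a, signal)
    n = int(win * fs)
    n_win = len(filtered) // n
    return (filtered[: n_win * n] ** 2).reshape(n_win, n).mean(axis=1)

ecog = np.random.randn(10 * int(fs))          # stand-in for a recorded channel
features = np.column_stack(
    [band_power(ecog, lo, hi, fs) for lo, hi in bands.values()]
)
print(features.shape)   # (windows, 2): one beta and one gamma feature per window
```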

    Introduction

    No full text

    Real-Time Control of an Articulatory-Based Speech Synthesizer for Brain Computer Interfaces

    No full text
    Restoring natural speech in paralyzed and aphasic people could be achieved using a Brain-Computer Interface (BCI) controlling a speech synthesizer in real time. To reach this goal, a prerequisite is to develop a speech synthesizer producing intelligible speech in real time with a reasonable number of control parameters. We present here an articulatory-based speech synthesizer that can be controlled in real time for future BCI applications. This synthesizer converts movements of the main speech articulators (tongue, jaw, velum, and lips) into intelligible speech. The articulatory-to-acoustic mapping is performed using a deep neural network (DNN) trained on electromagnetic articulography (EMA) data recorded on a reference speaker synchronously with the produced speech signal. This DNN is then used in both offline and online modes to map the positions of sensors glued to different speech articulators into acoustic parameters, which are further converted into an audio signal using a vocoder. In offline mode, highly intelligible speech could be obtained, as assessed by a perceptual evaluation performed by 12 listeners. Then, to anticipate future BCI applications, we further assessed real-time control of the synthesizer by both the reference speaker and new speakers in a closed-loop paradigm using EMA data recorded in real time. A short calibration period was used to compensate for differences in sensor positions and articulatory differences between new speakers and the reference speaker. We found that real-time synthesis of vowels and consonants was possible with good intelligibility. In conclusion, these results pave the way for future speech BCI applications using such an articulatory-based speech synthesizer.
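
    A minimal sketch of an articulatory-to-acoustic mapping of this kind is given below: a small feed-forward network maps one frame of EMA sensor coordinates to a frame of vocoder parameters. The layer sizes and the numbers of articulatory (12) and acoustic (25) parameters are illustrative assumptions, not the architecture reported in the paper.

```python
# Hypothetical sketch of an articulatory-to-acoustic mapping: a small feed-forward
# network from EMA sensor coordinates to vocoder parameters, applied frame by frame.
# Dimensions and layer sizes are assumptions, not the paper's exact architecture.
import torch
import torch.nn as nn

class ArticulatoryToAcoustic(nn.Module):
    def __init__(self, n_ema=12, n_acoustic=25, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_ema, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, n_acoustic),
        )

    def forward(self, x):
        return self.net(x)

model = ArticulatoryToAcoustic()
ema_frame = torch.randn(1, 12)        # one frame of sensor coordinates
acoustic = model(ema_frame)           # vocoder parameters for that frame
print(acoustic.shape)                 # torch.Size([1, 25])
```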

    Tongue Tracking in Ultrasound Images using EigenTongue Decomposition and Artificial Neural Networks

    No full text
    This paper describes a machine learning approach for automatically extracting the tongue contour in ultrasound images. The method is developed in the context of visual articulatory biofeedback for speech therapy. The goal is to provide a speaker with an intuitive visualization of his or her tongue movement, in real time, and with minimum human intervention. Contrary to the most widely used techniques, which are based on active contours, the proposed method aims at exploiting the information of all image pixels to infer the tongue contour. For that purpose, a compact representation of each image is extracted using a PCA-based decomposition technique (named EigenTongue). Artificial neural networks are then used to convert the extracted visual features into control parameters of a PCA-based tongue contour model. The proposed method is evaluated on 9 speakers, using data recorded with the ultrasound probe held manually (as in the targeted application). Speaker-dependent experiments demonstrated the effectiveness of the proposed method (with an average error of ~1.3 mm when training from 80 manually annotated images), even when the tongue contour is poorly imaged. Performance was significantly lower in speaker-independent experiments (i.e., when estimating contours for an unknown speaker), likely due to anatomical differences across speakers.
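
    The sketch below illustrates an EigenTongue-style pipeline under stated assumptions: PCA compresses flattened ultrasound frames into a compact basis, and a small neural network maps those coefficients to the coefficients of a PCA-based contour model. The image resolution, component counts, and use of scikit-learn estimators are assumptions for illustration only.

```python
# Hypothetical sketch of an EigenTongue-style pipeline: PCA features of ultrasound
# frames -> neural network -> coefficients of a PCA-based tongue-contour model.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
images = rng.random((80, 64 * 64))      # 80 annotated frames, flattened 64x64 pixels
contours = rng.random((80, 4))          # 4 coefficients of the contour model

image_pca = PCA(n_components=30).fit(images)        # "EigenTongue" basis
features = image_pca.transform(images)

mlp = MLPRegressor(hidden_layer_sizes=(64,), max_iter=2000, random_state=0)
mlp.fit(features, contours)

new_frame = rng.random((1, 64 * 64))
contour_coeffs = mlp.predict(image_pca.transform(new_frame))
print(contour_coeffs.shape)             # (1, 4): predicted contour-model coefficients
```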

    Key considerations in designing a speech brain-computer interface

    No full text
    Restoring communication in case of aphasia is a key challenge for neurotechnologies. To this end, brain-computer strategies can be envisioned to allow artificial speech synthesis from the continuous decoding of neural signals underlying speech imagination. Such speech brain-computer interfaces do not exist yet, and their design involves three key choices: the choice of appropriate brain regions to record neural activity from, the choice of an appropriate recording technique, and the choice of a neural decoding scheme in association with an appropriate speech synthesis method. These key considerations are discussed here in light of (1) the current understanding of the functional neuroanatomy of cortical areas underlying overt and covert speech production, (2) the available literature making use of a variety of brain recording techniques to better characterize and address the challenge of decoding cortical speech signals, and (3) the different speech synthesis approaches that can be considered depending on the level of speech representation (phonetic, acoustic, or articulatory) envisioned to be decoded at the core of a speech BCI paradigm.

    Robust Articulatory Speech Synthesis using Deep Neural Networks for BCI Applications

    No full text
    Brain-Computer Interfaces (BCIs) usually rely on typing strategies to restore communication for paralyzed and aphasic people. A more natural way would be a speech BCI directly controlling a speech synthesizer. Toward this goal, a prerequisite is the development of a synthesizer that should (i) produce intelligible speech, (ii) run in real time, (iii) depend on as few parameters as possible, and (iv) be robust to error fluctuations in the control parameters. In this context, we describe here an articulatory-to-acoustic mapping approach based on a deep neural network (DNN) trained on electromagnetic articulography (EMA) data recorded synchronously with the produced speech sounds. On this corpus, the DNN-based model provided a speech synthesis quality (as assessed by automatic speech recognition and behavioral testing) comparable to a state-of-the-art Gaussian mixture model (GMM), yet showed higher robustness when noise was added to the EMA coordinates. Moreover, to envision BCI applications, this robustness was also assessed when the space covered by the 12 original articulatory parameters was reduced to 7 parameters using deep auto-encoders (DAE). Given that this method can be implemented in real time, DNN-based articulatory speech synthesis seems a good candidate for speech BCI applications.
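
    As an illustration of the dimensionality-reduction step, the sketch below defines a deep auto-encoder compressing 12 articulatory parameters into a 7-dimensional bottleneck. The layer sizes and activations are assumptions, since the abstract does not specify the DAE architecture.

```python
# Hypothetical sketch of the dimensionality-reduction step: a deep auto-encoder
# compressing 12 EMA articulatory parameters into a 7-dimensional bottleneck.
# Layer sizes and activations are illustrative assumptions.
import torch
import torch.nn as nn

class ArticulatoryAutoencoder(nn.Module):
    def __init__(self, n_in=12, n_bottleneck=7, hidden=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_in, hidden), nn.Tanh(),
            nn.Linear(hidden, n_bottleneck),
        )
        self.decoder = nn.Sequential(
            nn.Linear(n_bottleneck, hidden), nn.Tanh(),
            nn.Linear(hidden, n_in),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

ae = ArticulatoryAutoencoder()
ema = torch.randn(8, 12)                             # a batch of EMA frames
reconstruction = ae(ema)
loss = nn.functional.mse_loss(reconstruction, ema)   # training criterion
print(ae.encoder(ema).shape)                         # torch.Size([8, 7]) compact control space
```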