
    Characterization and Decoding of Speech Representations From the Electrocorticogram

    Millions of people worldwide suffer from neuromuscular disorders such as amyotrophic lateral sclerosis (ALS), brainstem stroke, muscular dystrophy, and cerebral palsy, which adversely affect the neural control of muscles or the muscles themselves. The most severely affected patients lose all voluntary muscle control and become completely locked-in, i.e., unable to communicate with the outside world in any manner. Toward developing neuro-rehabilitation techniques for these patients, several studies have used brain signals related to mental imagery and attention to control an external device, a technology known as a brain-computer interface (BCI). Some recent studies have also attempted to decode aspects of spoken, imagined, or perceived speech directly from brain signals. To extend research in this direction, this dissertation aims to characterize and decode various speech representations popularly used in speech recognition systems directly from brain activity, specifically the electrocorticogram (ECoG). The speech representations studied range from simple features, such as speech power and fundamental frequency (pitch), to complex representations, such as linear predictive coding and mel-frequency cepstral coefficients. These decoded speech representations may eventually be used to enhance existing speech recognition systems or to reconstruct intended or imagined speech directly from brain activity. This research will ultimately pave the way for an ECoG-based neural speech prosthesis, offering a more natural communication channel for individuals who have lost the ability to speak normally.
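
    As a concrete illustration, the speech representations named in this abstract can be computed from an acoustic recording with standard signal-processing tools. The following is a minimal sketch, assuming the librosa library and a hypothetical mono recording "speech.wav"; it is not the dissertation's actual feature pipeline.

    ```python
    # Sketch: computing the speech representations named above, as decoding
    # targets for ECoG models. Assumes librosa is installed; "speech.wav"
    # is a hypothetical file name.
    import librosa

    y, sr = librosa.load("speech.wav", sr=16000)       # mono waveform at 16 kHz

    # Simple features: speech power (RMS energy) and fundamental frequency (pitch)
    power = librosa.feature.rms(y=y)[0]                # frame-wise RMS energy
    f0 = librosa.yin(y, fmin=50, fmax=400, sr=sr)      # YIN pitch track

    # Complex representations: LPC and mel-frequency cepstral coefficients
    lpc = librosa.lpc(y, order=16)                     # global linear-prediction fit
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13) # 13 MFCCs per frame

    print(power.shape, f0.shape, lpc.shape, mfcc.shape)
    ```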

    Characterization of Language Cortex Activity During Speech Production and Perception

    Millions of people around the world suffer from severe neuromuscular disorders such as spinal cord injury, cerebral palsy, amyotrophic lateral sclerosis (ALS), and others. Many of these individuals cannot perform daily tasks without assistance and depend on caregivers, which adversely impacts their quality of life. A brain-computer interface (BCI) is a technology that aims to give these people the ability to interact with their environment and communicate with the outside world. Many recent studies have attempted to decode spoken and imagined speech directly from brain signals toward the development of a natural-speech BCI, but progress to date has not reached practical application. One approach to improving the performance of this technology is to better understand the underlying speech processes in the brain in order to further optimize existing models. To extend research in this direction, this thesis aims to characterize and decode auditory and articulatory features from the motor cortex using the electrocorticogram (ECoG). Consonants were chosen as auditory representations, and both place of articulation and manner of articulation were chosen as articulatory representations. These representations were decoded at different time lags relative to speech onset to determine optimal temporal decoding parameters. In addition, this work explores the role of the temporal lobe during speech production directly from ECoG signals: a novel decoding model using temporal lobe activity was developed to predict a spectral representation of the speech envelope during speech production. This new knowledge may be used to enhance existing speech-based BCI systems, offering a more natural communication modality, and contributes to the field of speech neurophysiology by providing a better understanding of speech processes in the brain.
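
    Lagged decoding of the kind described here can be sketched with generic tools. Below is a minimal illustration assuming scikit-learn and entirely synthetic high-gamma features; the array shapes, lag grid, and classifier are illustrative assumptions, not the thesis's actual setup.

    ```python
    # Sketch: decoding a consonant class from ECoG high-gamma power at a range
    # of time lags relative to speech onset, to find the best-decoding lag.
    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.model_selection import cross_val_score

    fs = 100                                   # feature frames per second
    hg = np.random.randn(200, 64, 300)         # trials x channels x time frames
    labels = np.random.randint(0, 8, size=200) # e.g., 8 consonant classes
    onset = 150                                # frame index of speech onset

    accuracy_by_lag = {}
    for lag_ms in range(-500, 501, 50):        # lags from -500 ms to +500 ms
        idx = onset + int(lag_ms * fs / 1000)
        X = hg[:, :, idx]                      # one frame per trial at this lag
        scores = cross_val_score(LinearDiscriminantAnalysis(), X, labels, cv=5)
        accuracy_by_lag[lag_ms] = scores.mean()

    best = max(accuracy_by_lag, key=accuracy_by_lag.get)
    print(f"best lag: {best} ms, accuracy: {accuracy_by_lag[best]:.2f}")
    ```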

    Speech Processes for Brain-Computer Interfaces

    Speech interfaces have become widely used and are integrated into many applications and devices. However, speech interfaces require the user to produce intelligible speech, which might be hindered by loud environments, concern about bothering bystanders, or a general inability to produce speech due to disabilities. Decoding a user's imagined speech instead of actual speech would solve this problem. Such a brain-computer interface (BCI) based on imagined speech would enable fast and natural communication without the need to speak out loud, and could provide a voice to otherwise mute people. This dissertation investigates BCIs based on speech processes using functional near-infrared spectroscopy (fNIRS) and electrocorticography (ECoG), two brain activity imaging modalities on opposing ends of the invasiveness scale. Brain activity data have a low signal-to-noise ratio and complex spatio-temporal and spectral coherence. To analyze these data, techniques from machine learning, neuroscience, and automatic speech recognition are combined in this dissertation to facilitate robust classification of detailed speech processes while simultaneously illustrating the underlying neural processes. fNIRS is an imaging modality based on cerebral blood flow. It requires only affordable hardware and can be set up within minutes in a day-to-day environment, making it ideally suited for convenient user interfaces. However, the hemodynamic processes measured by fNIRS are slow in nature, and the technology therefore offers poor temporal resolution. We investigate speech in fNIRS and demonstrate classification of speech processes for fNIRS-based BCIs. ECoG provides ideal signal properties by invasively measuring electrical potentials artifact-free directly on the brain surface. High spatial resolution and temporal resolution down to millisecond sampling provide localized information with timing accurate enough to capture the fast processes underlying speech production. This dissertation presents the Brain-to-Text system, which harnesses automatic speech recognition technology to decode a textual representation of continuous speech from ECoG. This could allow users to compose messages or issue commands through a BCI. While decoding a textual representation is unparalleled for device control and typing, direct communication is even more natural if the full expressive power of speech, including emphasis and prosody, can be conveyed. For this purpose, a second system is presented that directly synthesizes neural signals into audible speech, which could enable conversation with friends and family through a BCI. To date, both the Brain-to-Text system and the synthesis system operate on audibly produced speech. To bridge the gap to the final frontier of neural prostheses based on imagined speech processes, we investigate the differences between audibly produced and imagined speech and present first results toward BCIs based on imagined speech processes. This dissertation demonstrates the use of speech processes as a BCI paradigm for the first time. Speech processes offer a fast and natural interaction paradigm that will help patients and healthy users alike communicate with computers and with friends and family efficiently through BCIs.
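
    The Brain-to-Text idea of applying speech-recognition machinery to neural data can be caricatured in a few lines. The sketch below is a drastic simplification under stated assumptions (synthetic features, a toy phone inventory, a plain logistic-regression frame classifier, and greedy repeat-collapsing instead of full ASR decoding with language models); it is not the system described in the abstract.

    ```python
    # Sketch: a simplified ASR-style pipeline on ECoG-like features: classify
    # each frame into a phone, then collapse repeats and drop silence.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    phones = ["sil", "AH", "B", "K", "T"]         # toy phone inventory
    rng = np.random.default_rng(0)

    X_train = rng.standard_normal((1000, 64))     # frames x neural features
    y_train = rng.integers(0, len(phones), 1000)  # frame-level phone labels
    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

    X_test = rng.standard_normal((50, 64))        # one utterance, 50 frames
    frame_phones = clf.predict(X_test)

    decoded = []                                  # greedy repeat-collapsing
    for p in frame_phones:
        label = phones[p]
        if label != "sil" and (not decoded or decoded[-1] != label):
            decoded.append(label)
    print(" ".join(decoded))
    ```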

    A Blueprint for Real-Time Functional Mapping via Human Intracranial Recordings

    BACKGROUND: The surgical treatment of patients with intractable epilepsy is preceded by a pre-surgical evaluation period during which intracranial EEG recordings are performed to identify the epileptogenic network and provide a functional map of eloquent cerebral areas that need to be spared to minimize the risk of post-operative deficits. A growing body of research based on such invasive recordings indicates that cortical oscillations at various frequencies, especially in the gamma range (40 to 150 Hz), can provide efficient markers of task-related neural network activity. PRINCIPAL FINDINGS: Here we introduce a novel real-time investigation framework for mapping human brain functions based on online visualization of the spectral power of the ongoing intracranial activity. The results obtained with the first two implanted epilepsy patients who used the proposed online system illustrate its feasibility and utility both for clinical applications, as a complementary tool to electrical stimulation for presurgical mapping, and for basic research, as an exploratory tool for detecting correlations between behavior and oscillatory power modulations. Furthermore, our findings suggest a putative role for high gamma oscillations in the higher-order auditory processing involved in speech and music perception. CONCLUSION/SIGNIFICANCE: The proposed real-time setup is a promising tool for presurgical mapping, the investigation of functional brain dynamics, and possibly for neurofeedback training and brain-computer interfaces.
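
    The core signal-processing step behind such online mapping, estimating band-limited power from the ongoing signal as it streams in, can be sketched as follows. This is a minimal single-channel illustration assuming SciPy; the filter order, block size, and simulated data are assumptions, not the published system's parameters.

    ```python
    # Sketch: streaming estimation of gamma-range (40-150 Hz) power for one
    # intracranial channel, block by block, with persistent filter state.
    import numpy as np
    from scipy.signal import butter, sosfilt, sosfilt_zi

    fs = 1000                                          # sampling rate in Hz
    sos = butter(4, [40, 150], btype="bandpass", fs=fs, output="sos")
    zi = sosfilt_zi(sos)                               # filter state for streaming

    def gamma_power(block, zi):
        """Causally band-pass one incoming block and return its mean power."""
        filtered, zi = sosfilt(sos, block, zi=zi)
        return np.mean(filtered ** 2), zi

    # Simulated acquisition loop: 100 ms blocks of one ECoG channel
    for _ in range(10):
        block = np.random.randn(fs // 10)
        power, zi = gamma_power(block, zi)
        print(f"gamma-band power: {power:.3f}")
    ```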

    Electrocorticographic Representations of Segmental Features in Continuous Speech

    Acoustic speech output results from the coordinated articulation of dozens of muscles, bones, and cartilages of the vocal mechanism. While we commonly take the fluency and speed of our speech production for granted, the neural mechanisms facilitating the requisite muscular control are not completely understood. Previous neuroimaging and electrophysiology studies of speech sensorimotor control have typically concentrated on speech sounds (i.e., phonemes, syllables, and words) in isolation; sentence-length investigations have largely been used to inform coincident linguistic processing. In this study, we examined the neural representations of segmental features (place and manner of articulation, and voicing status) in the context of fluent, continuous speech production. We used recordings from the cortical surface (electrocorticography, ECoG) to simultaneously evaluate the spatial topography and temporal dynamics of the neural correlates of speech articulation that may mediate the generation of hypothesized gestural or articulatory scores. We found that the representation of place of articulation involved broad networks of brain regions during all phases of speech production: preparation, execution, and monitoring. In contrast, manner of articulation and voicing status were dominated by auditory cortical responses after speech had been initiated. These results provide new insight into the articulatory and auditory processes underlying speech production in terms of their motor requirements and acoustic correlates.
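
    For readers unfamiliar with segmental features, the labeling scheme studied here amounts to a lookup from phonemes to place, manner, and voicing categories. A minimal sketch with a small illustrative subset (not the study's actual label set):

    ```python
    # Sketch: segmental feature coding as a phoneme -> (place, manner, voiced)
    # lookup. The inventory below is a small illustrative subset.
    SEGMENTAL_FEATURES = {
        "p": ("bilabial", "stop",      False),
        "b": ("bilabial", "stop",      True),
        "t": ("alveolar", "stop",      False),
        "d": ("alveolar", "stop",      True),
        "s": ("alveolar", "fricative", False),
        "z": ("alveolar", "fricative", True),
        "m": ("bilabial", "nasal",     True),
        "n": ("alveolar", "nasal",     True),
    }

    def labels_for(phoneme_sequence):
        """Map a phoneme sequence to per-segment feature labels."""
        return [SEGMENTAL_FEATURES[p] for p in phoneme_sequence]

    print(labels_for(["b", "s", "n"]))
    ```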

    Multi-Modality Assessment Of Language Function

    The work presented in this dissertation represents a multi-modality study of language structure and function. The primary functional modality employed is task-related electrocorticography (ECoG), complemented by discussion and evaluation of previously published functional magnetic resonance imaging (fMRI) data. Language-related structure is explored using diffusion-weighted magnetic resonance imaging in conjunction with ECoG data. The scientific questions pursued are broad and include reevaluation of previously proposed theories. We start by taking the first steps in validating our naming-related ECoG approach, comparing results from a small cohort of patients to the clinical gold-standard technique of electrical brain stimulation. This evaluation begins to address a clinical problem involving the insensitivity of electrical brain stimulation in language mapping of young children; accordingly, our patients across all studies are a mixture of children, adolescents, and adults. Combining data presented within this thesis, data from other members of our team, and published data from teams at other institutions, the evidence suggests that language-related ECoG mapping is a powerful language mapping tool when employed with an appropriate task. The task employed here is a now well-studied auditory descriptive naming task. Language-related ECoG is then used to dissect language function mechanistically, employing contrast tasks alongside the descriptive naming task. Working memory and language functions of the frontal lobe are dissected, and conclusions are drawn to shed light on their degree of overlap and interaction during ongoing language processing. Evidence of secondary auditory processing and language comprehension gained from other modalities is reevaluated. In particular, reverse speech and signal-correlated noise are employed and evaluated as control tasks for non-language-specific auditory function. A discrepancy between language-related ECoG and language-related fMRI is discovered with regard to the use of reverse speech as such a control task; signal-correlated noise may be more reliable in identifying non-language auditory functions of the temporal lobe. Age-old questions of language-related connectivity are explored by combining diffusion-weighted magnetic resonance imaging tractography with language-related ECoG findings to evaluate terminations of the arcuate fasciculus. The results support recent evidence suggesting that the precentral gyrus is an important termination of this language-related white matter pathway. New models modifying century-old, entrenched models are evaluated in light of these findings, and proposals for follow-up work that may create further clarity are provided. Finally, the thesis rounds out with a study exploring the effects of focal interictal epileptiform activity on ongoing language processes, contributing beyond the neuroscience of language to the epilepsy literature, in honor of the patients providing the data for these studies. Our data demonstrate that such localized pathological activity can have clinically imperceptible effects upon language functions, suggesting one possible mechanism underlying the cognitive deficits frequently reported in such patients.