    Temporal evolution of gamma activity in human cortex during an overt and covert word repetition task

    Several scientists have proposed different models for cortical processing of speech. Classically, the regions participating in language were thought to be modular, with a linear sequence of activations. More recently, theoretical models have posited a more hierarchical and distributed interaction of anatomic areas for the various stages of speech processing. Traditional imaging techniques can define only the location or the time of cortical activation, which impedes further evaluation and refinement of these models. In this study, we take advantage of recordings from the surface of the brain [electrocorticography (ECoG)], which can accurately detect both the location and timing of cortical activations, to study the time course of ECoG high gamma (HG) modulations during an overt and covert word repetition task for different cortical areas. For overt word production, our results show substantial perisylvian cortical activations early in the perceptual phase of the task that were maintained through word articulation. However, this broad activation is attenuated during the expressive phase of covert word repetition. Across the different repetition tasks, the degree of activation at different cortical sites within the perisylvian region varied depending on which stimulus was provided (auditory or visual cue) and whether the word was to be spoken or imagined. Taken together, the data support current models of speech that have been previously described with functional imaging. Moreover, this study demonstrates that the broad perisylvian speech network activates early and maintains suprathreshold activation throughout the word repetition task, modulated by the demands of the different conditions.
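
    A minimal sketch of the high-gamma feature extraction this line of work relies on: band-pass filtering an ECoG channel and taking the Hilbert analytic amplitude. The sampling rate, band edges (70-170 Hz), and filter order here are illustrative assumptions, not parameters reported in the study.

        import numpy as np
        from scipy.signal import butter, filtfilt, hilbert

        def high_gamma_envelope(x, fs=1000.0, band=(70.0, 170.0), order=4):
            """Band-pass filter one ECoG trace and return its analytic amplitude."""
            nyq = fs / 2.0
            b, a = butter(order, [band[0] / nyq, band[1] / nyq], btype="band")
            filtered = filtfilt(b, a, x)      # zero-phase band-pass filtering
            return np.abs(hilbert(filtered))  # instantaneous high-gamma amplitude

        # Example on 2 s of synthetic data at 1 kHz (placeholder for real ECoG)
        env = high_gamma_envelope(np.random.randn(2000))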

    Characterization and Decoding of Speech Representations From the Electrocorticogram

    Millions of people worldwide suffer from various neuromuscular disorders, such as amyotrophic lateral sclerosis (ALS), brainstem stroke, muscular dystrophy, cerebral palsy, and others, which adversely affect the neural control of muscles or the muscles themselves. The most severely affected patients lose all voluntary muscle control and are completely locked-in, i.e., they are unable to communicate with the outside world in any manner. Toward developing neurorehabilitation techniques for these patients, several studies have used brain signals related to mental imagery and attention to control an external device, a technology known as a brain-computer interface (BCI). Some recent studies have also attempted to decode various aspects of spoken language, imagined language, or perceived speech directly from brain signals. To extend research in this direction, this dissertation aims to characterize and decode various speech representations popularly used in speech recognition systems directly from brain activity, specifically the electrocorticogram (ECoG). The speech representations studied in this dissertation range from simple features, such as speech power and the fundamental frequency (pitch), to complex representations, such as linear predictive coding (LPC) and mel-frequency cepstral coefficients (MFCCs). These decoded speech representations may eventually be used to enhance existing speech recognition systems or to reconstruct intended or imagined speech directly from brain activity. This research will ultimately pave the way for an ECoG-based neural speech prosthesis, which will offer a more natural communication channel for individuals who have lost the ability to speak normally.
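
    As a hedged illustration of the acoustic targets named above (speech power, pitch, LPC, MFCCs), the sketch below computes them from an audio waveform with librosa; the file name and all parameter values are assumptions for illustration, not those used in the dissertation.

        import librosa

        y, sr = librosa.load("utterance.wav", sr=16000)     # hypothetical recording

        power = librosa.feature.rms(y=y)[0] ** 2            # frame-wise speech power
        f0 = librosa.yin(y, fmin=60, fmax=400, sr=sr)       # fundamental frequency (pitch)
        lpc = librosa.lpc(y, order=12)                      # linear prediction coefficients
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # mel-frequency cepstral coefficients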

    Keyword Spotting Using Human Electrocorticographic Recordings

    Neural keyword spotting could form the basis of a speech brain-computer interface for menu navigation if it can be done with low latency and high specificity, comparable to the “wake-word” functionality of modern voice-activated AI assistant technologies. This study investigated neural keyword spotting using motor representations of speech via invasively recorded electrocorticographic signals as a proof of concept. Neural matched filters were created from monosyllabic consonant-vowel utterances: one keyword utterance and 11 similar non-keyword utterances. These filters were used in an analog of the acoustic keyword spotting problem, applied for the first time to neural data. The filter templates were cross-correlated with the neural signal, capturing the temporal dynamics of neural activation across cortical sites. Neural vocal activity detection (VAD) was used to identify utterance times, and a discriminative classifier was used to determine whether these utterances were the keyword or non-keyword speech. Model performance appeared to be highly related to electrode placement and spatial density. Vowel height (/a/ vs. /i/) was poorly discriminated in recordings from sensorimotor cortex, but was highly discriminable using neural features from the superior temporal gyrus during self-monitoring. The best-performing neural keyword detection (five keyword detections with two false positives across 60 utterances) and neural VAD (100% sensitivity, ~1 false detection per 10 utterances) came from high-density (2 mm electrode diameter and 5 mm pitch) recordings from ventral sensorimotor cortex, suggesting that the spatial fidelity and extent of high-density ECoG arrays may be sufficient for the purposes of speech brain-computer interfaces.
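
    A minimal sketch of the matched-filter step described above, assuming nothing beyond the abstract: a stored spatiotemporal template of neural activity is cross-correlated with the ongoing multi-channel recording and the per-channel scores are summed, so that peaks in the summed score mark keyword candidates. Array shapes and names are illustrative.

        import numpy as np
        from scipy.signal import correlate

        def matched_filter_score(signal, template):
            """signal: (channels, samples); template: (channels, template_samples).
            Returns a 1-D detection score over time, summed across channels."""
            scores = [correlate(signal[ch], template[ch], mode="valid")
                      for ch in range(signal.shape[0])]
            return np.sum(scores, axis=0)

        rng = np.random.default_rng(0)
        sig = rng.standard_normal((64, 5000))     # 64 electrodes, 5 s at 1 kHz
        tmpl = rng.standard_normal((64, 500))     # 0.5 s keyword template
        score = matched_filter_score(sig, tmpl)   # peaks suggest keyword candidates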

    Neurolinguistics Research Advancing Development of a Direct-Speech Brain-Computer Interface

    A direct-speech brain-computer interface (DS-BCI) acquires neural signals corresponding to imagined speech, then processes and decodes these signals to produce a linguistic output in the form of phonemes, words, or sentences. Recent research has shown the potential of neurolinguistics to enhance decoding approaches to imagined speech with the inclusion of semantics and phonology in experimental procedures. As neurolinguistics research findings are beginning to be incorporated within the scope of DS-BCI research, it is our view that a thorough understanding of imagined speech, and its relationship with overt speech, must be considered an integral feature of research in this field. With a focus on imagined speech, we provide a review of the most important neurolinguistics research informing the field of DS-BCI and suggest how this research may be utilized to improve current experimental protocols and decoding techniques. Our review of the literature supports a cross-disciplinary approach to DS-BCI research, in which neurolinguistics concepts and methods are utilized to aid development of a naturalistic mode of communication. Subject Areas: Cognitive Neuroscience, Computer Science, Hardware Interface.

    Brain-to-text: Decoding spoken phrases from phone representations in the brain

    It has long been speculated whether communication between humans and machines based on natural speech-related cortical activity is possible. Over the past decade, studies have suggested that it is feasible to recognize isolated aspects of speech from neural signals, such as auditory features, phones, or one of a few isolated words. However, until now it has remained an unsolved challenge to decode continuously spoken speech from the neural substrate associated with speech and language processing. Here, we show for the first time that continuously spoken speech can be decoded into the expressed words from intracranial electrocorticographic (ECoG) recordings. Specifically, we implemented a system, which we call Brain-To-Text, that models single phones, employs techniques from automatic speech recognition (ASR), and thereby transforms brain activity during speaking into the corresponding textual representation. Our results demonstrate that our system can achieve word error rates as low as 25% and phone error rates below 50%. Additionally, our approach contributes to the current understanding of the neural basis of continuous speech production by identifying those cortical regions that hold substantial information about individual phones. In conclusion, the Brain-To-Text system described in this paper represents an important step toward human-machine communication based on imagined speech.
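
    The abstract names phone-level modeling with ASR techniques but not the exact decoder; as one hedged illustration of that family of methods, the toy Viterbi search below finds the most likely phone sequence from per-frame phone log-likelihoods (as might be decoded from ECoG features) and a bigram phone transition model. All names and sizes are assumptions, not the paper's actual system.

        import numpy as np

        def viterbi(log_lik, log_trans, log_prior):
            """log_lik: (frames, phones); log_trans: (phones, phones), row = previous
            phone; log_prior: (phones,). Returns the most likely phone index sequence."""
            T, P = log_lik.shape
            delta = np.empty((T, P))
            psi = np.zeros((T, P), dtype=int)
            delta[0] = log_prior + log_lik[0]
            for t in range(1, T):
                scores = delta[t - 1][:, None] + log_trans   # (prev, next) pairs
                psi[t] = np.argmax(scores, axis=0)           # best predecessor
                delta[t] = scores[psi[t], np.arange(P)] + log_lik[t]
            path = [int(np.argmax(delta[-1]))]
            for t in range(T - 1, 0, -1):                    # backtrack
                path.append(int(psi[t, path[-1]]))
            return path[::-1]

        # Toy example: 50 frames, 8 phones, random placeholder models
        rng = np.random.default_rng(0)
        log_lik = np.log(rng.dirichlet(np.ones(8), size=50))
        log_trans = np.log(rng.dirichlet(np.ones(8), size=8))
        log_prior = np.log(np.full(8, 1 / 8))
        phone_path = viterbi(log_lik, log_trans, log_prior)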

    Intra-Cranial Recordings of Brain Activity During Language Production

    Recent findings in the neurophysiology of language production have provided a detailed description of the brain network underlying this behavior, as well as some indications about the timing of operations. Despite their invaluable utility, these data generally suffer from limitations either in terms of temporal resolution or in terms of spatial localization. In addition, studying the neural basis of speech is complicated by the presence of articulation artifacts, such as electromyographic activity, that interfere with the neural signal. These difficulties are virtually absent in a powerful albeit much less frequent methodology, namely the recording of intra-cranial brain activity (intra-cranial electroencephalography). Such recordings are only possible under very specific clinical circumstances requiring functional mapping before brain surgery, most notably in patients who suffer from pharmaco-resistant epilepsy. Here we review the research conducted with this methodology in the field of language production, with explicit consideration of its advantages and drawbacks. The available evidence is shown to be diverse, both in terms of the tasks and the cognitive processes tested and in terms of the brain localizations being studied. Still, the review provides valuable information for characterizing the dynamics of the neural events occurring in the language production network. Following modality-specific activities (in auditory or visual cortices), there is a convergence of activity in the superior temporal sulcus, a plausible neural correlate of phonological encoding processes. Later, between 500 and 800 ms, the inferior frontal gyrus (around Broca’s area) is involved. Peri-rolandic areas are recruited in both modalities relatively early (200–500 ms window), suggesting a very early involvement of (pre-)motor processes. We discuss how some of these findings may be at odds with conclusions drawn from available meta-analyses of language production studies.

    Characterization of Language Cortex Activity During Speech Production and Perception

    Millions of people around the world suffer from severe neuromuscular disorders such as spinal cord injury, cerebral palsy, amyotrophic lateral sclerosis (ALS), and others. Many of these individuals cannot perform daily tasks without assistance and depend on caregivers, which adversely impacts their quality of life. A brain-computer interface (BCI) is a technology that aims to give these people the ability to interact with their environment and communicate with the outside world. Many recent studies have attempted to decode spoken and imagined speech directly from brain signals toward the development of a natural-speech BCI. However, current progress has not reached practical application. One approach to improving the performance of this technology is to better understand the underlying speech processes in the brain for further optimization of existing models. To extend research in this direction, this thesis aims to characterize and decode auditory and articulatory features from the motor cortex using the electrocorticogram (ECoG). Consonants were chosen as the auditory representations, and both place of articulation and manner of articulation were chosen as the articulatory representations. The auditory and articulatory representations were decoded at different time lags with respect to speech onset to determine optimal temporal decoding parameters. In addition, this work explores the role of the temporal lobe during speech production directly from ECoG signals. A novel decoding model using temporal lobe activity was developed to predict a spectral representation of the speech envelope during speech production. This new knowledge may be used to enhance existing speech-based BCI systems, offering a more natural communication modality. In addition, the work contributes to the field of speech neurophysiology by providing a better understanding of speech processes in the brain.
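
    As a hedged sketch of the kind of decoding model described above (predicting a spectral representation of the speech envelope from temporal-lobe ECoG), the example below fits a lagged ridge regression from high-gamma features to the envelope. Shapes, lag range, and regularization are assumptions, not the thesis's actual model.

        import numpy as np
        from sklearn.linear_model import Ridge

        def lagged_design(X, lags):
            """Stack time-lagged copies of X: (samples, channels) ->
            (samples, channels * len(lags)). Edge wrap-around from np.roll
            is ignored for this sketch."""
            return np.concatenate([np.roll(X, lag, axis=0) for lag in lags], axis=1)

        rng = np.random.default_rng(1)
        hg = rng.standard_normal((10000, 64))          # high-gamma features (placeholder)
        envelope = rng.standard_normal(10000)          # target speech envelope (placeholder)

        X = lagged_design(hg, lags=range(0, 100, 10))  # 0-90 sample lags
        model = Ridge(alpha=1.0).fit(X[:8000], envelope[:8000])
        pred = model.predict(X[8000:])                 # held-out reconstruction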

    Understanding and Decoding Imagined Speech using Electrocorticographic Recordings in Humans

    Certain brain disorders, resulting from brainstem infarcts, traumatic brain injury, stroke, and amyotrophic lateral sclerosis, limit verbal communication despite the patient being fully aware. People who cannot communicate due to neurological disorders would benefit from a system that can infer internal speech directly from brain signals. Investigating how the human cortex encodes imagined speech remains a difficult challenge, due to the lack of behavioral and observable measures. As a consequence, the fine temporal properties of speech cannot be synchronized precisely with brain signals during internal subjective experiences such as imagined speech. This thesis aims at understanding and decoding the neural correlates of imagined speech (also called internal speech or covert speech), toward speech neuroprostheses. In this exploratory work, various imagined speech features, such as acoustic sound features, phonetic representations, and individual words, were investigated and decoded from electrocorticographic signals recorded in epileptic patients in three different studies. This recording technique provides high spatiotemporal resolution, via electrodes placed beneath the skull but without penetrating the cortex. In the first study, we reconstructed continuous spectrotemporal acoustic features from brain signals recorded during imagined speech using cross-condition linear regression. Using this technique, we showed that significant acoustic features of imagined speech could be reconstructed in seven patients. In the second study, we decoded continuous phoneme sequences from brain signals recorded during imagined speech using hidden Markov models. This technique allowed incorporating a language model that defined phoneme transition probabilities. In this preliminary study, decoding accuracy was significant across eight phonemes in one patient. In the third study, we classified individual words from brain signals recorded during an imagined speech word repetition task, using support vector machines. To account for temporal irregularities during speech production, we introduced a non-linear time alignment into the classification framework. Classification accuracy was significant across five patients. In order to compare speech representations across conditions and integrate imagined speech into the general speech network, we investigated imagined speech in parallel with overt speech production and/or speech perception. Results shared across the three studies showed partial overlap between imagined speech and speech perception/production in speech areas, such as the superior temporal lobe, anterior frontal gyrus, and sensorimotor cortex. In an attempt to understand higher-level cognitive processing of auditory processes, we also investigated the neural encoding of acoustic features during music imagery using linear regression. Although this study was not directly related to speech representations, it provided a unique opportunity to quantitatively study features of inner subjective experiences similar to speech imagery. These studies demonstrated the potential of using predictive models for basic decoding of speech features. Despite low performance, the results show the feasibility of direct decoding of natural speech. In this respect, we highlighted numerous challenges that were encountered and suggested new avenues to improve performance.
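
    The "non-linear time alignment" mentioned for the third study is not specified beyond the abstract; a common choice is dynamic time warping (DTW), sketched below as a distance between two trials' feature sequences. A DTW-derived similarity matrix can then be passed to a classifier such as scikit-learn's SVC(kernel="precomputed"). This is an assumed illustration, not necessarily the thesis's exact method.

        import numpy as np

        def dtw_distance(a, b):
            """a: (Ta, d); b: (Tb, d). Classic O(Ta*Tb) DTW with Euclidean cost."""
            Ta, Tb = len(a), len(b)
            D = np.full((Ta + 1, Tb + 1), np.inf)
            D[0, 0] = 0.0
            for i in range(1, Ta + 1):
                for j in range(1, Tb + 1):
                    cost = np.linalg.norm(a[i - 1] - b[j - 1])
                    D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
            return D[Ta, Tb]

        # Two trials of unequal length, 32 neural features per time step
        rng = np.random.default_rng(2)
        dist = dtw_distance(rng.standard_normal((80, 32)),
                            rng.standard_normal((95, 32)))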

    Direct Classification of All American English Phonemes Using Signals From Functional Speech Motor Cortex

    Although brain-computer interfaces (BCIs) can be used in several different ways to restore communication, communicative BCI has not approached the rate or efficiency of natural human speech. Electrocorticography (ECoG) has precise spatiotemporal resolution that enables recording of brain activity distributed over a wide area of cortex, such as during speech production. In this study, we investigated words that span the entire set of phonemes in the General American accent using ECoG in 4 subjects. We classified phonemes with up to 36% accuracy when classifying all phonemes, and up to 63% accuracy for a single phoneme. Further, misclassified phonemes followed the articulatory organization described in the phonology literature, aiding classification of whole words. Precise temporal alignment to phoneme onset was crucial for classification success. We identified specific spatiotemporal features that aid classification, which could guide future applications. Word identification was equivalent to information transfer rates as high as 3.0 bits/s (33.6 words/min), supporting pursuit of speech articulation for BCI control.
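
    The 3.0 bits/s figure suggests an information-transfer-rate calculation; the sketch below uses the standard Wolpaw formula for bits per selection, with the number of choices, accuracy, and selection rate as placeholder assumptions rather than the paper's exact operating point.

        import math

        def wolpaw_bits_per_selection(n_choices, accuracy):
            """Bits conveyed per selection for an N-way choice at accuracy P (Wolpaw)."""
            n, p = n_choices, accuracy
            if p >= 1.0:
                return math.log2(n)                              # perfect accuracy
            if p <= 0.0:
                return math.log2(n) + math.log2(1.0 / (n - 1))   # p*log2(p) -> 0
            return (math.log2(n) + p * math.log2(p)
                    + (1 - p) * math.log2((1 - p) / (n - 1)))

        bits = wolpaw_bits_per_selection(n_choices=10, accuracy=0.63)  # placeholders
        rate_bits_per_s = bits * 2.0   # times an assumed 2 selections per second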

    Leveraging Spatiotemporal Relationships of High-frequency Activation in Human Electrocorticographic Recordings for Speech Brain-Computer-Interface

    Speech production is one of the most intricate yet natural human behaviors, and is most keenly appreciated when it becomes difficult or impossible, as is the case for patients suffering from locked-in syndrome. Burgeoning understanding of the various cortical representations of language has brought into question the viability of a speech neuroprosthesis using implanted electrodes. The temporal resolution of intracranial electrophysiological recordings, frequently billed as a great asset of electrocorticography (ECoG), has actually been a hindrance, as speech decoders have struggled to take advantage of this timing information. There have been few demonstrations of how well a speech neuroprosthesis will realistically generalize across contexts when constructed using causal feature extraction and language models that can be applied and adapted in real time. The research detailed in this dissertation aims primarily to characterize the spatiotemporal relationships of high-frequency activity across ECoG arrays during word production. Once identified, these relationships are mapped to motor and semantic representations of speech through the use of algorithms and classifiers that rapidly quantify these relationships in single trials. The primary hypothesis put forward by this dissertation is that the onset, duration, and temporal profile of high-frequency activity in ECoG recordings are useful features for speech decoding. These features have rarely been used in state-of-the-art speech decoders, which tend to produce output from instantaneous high-frequency power across cortical sites, or rely upon precise behavioral time-locking to take advantage of high-frequency activity at several time points relative to behavioral onset times. This hypothesis was examined in three separate studies. First, software was created that rapidly characterizes spatiotemporal relationships of neural features. Second, semantic representations of speech were examined using these spatiotemporal features. Finally, utterances were discriminated in single trials with low latency and high accuracy using spatiotemporal matched filters in a neural keyword-spotting paradigm. Outcomes from this dissertation inform implant placement for a human speech prosthesis and provide the scientific and methodological basis to motivate further research into an implant specifically for speech-based brain-computer interfaces.
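
    As a minimal sketch of turning a high-frequency envelope into the onset and duration features this dissertation argues for, the function below z-scores and lightly smooths an envelope, then reads off the first supra-threshold burst. The threshold, smoothing width, and sampling rate are assumptions for illustration.

        import numpy as np

        def onset_duration(envelope, fs=1000.0, z_thresh=2.0, smooth=50):
            """Return (onset_s, duration_s) of the first supra-threshold burst."""
            z = (envelope - envelope.mean()) / envelope.std()
            z = np.convolve(z, np.ones(smooth) / smooth, mode="same")  # light smoothing
            above = np.flatnonzero(z > z_thresh)
            if above.size == 0:
                return None, None
            onset = above[0]
            run_end = onset
            while run_end + 1 < z.size and z[run_end + 1] > z_thresh:  # contiguous run
                run_end += 1
            return onset / fs, (run_end - onset + 1) / fs

        # Toy envelope with an injected burst between 1.0 s and 1.3 s
        env = np.abs(np.random.randn(3000))
        env[1000:1300] += 5.0
        onset_s, duration_s = onset_duration(env)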