
    Unsupervised decoding of long-term, naturalistic human neural recordings with automated video and audio annotations

    Fully automated decoding of human activities and intentions from direct neural recordings is a tantalizing challenge in brain-computer interfacing. Most ongoing efforts have focused on training decoders on specific, stereotyped tasks in laboratory settings. Implementing brain-computer interfaces (BCIs) in natural settings requires adaptive strategies and scalable algorithms that demand minimal supervision. Here we propose an unsupervised approach to decoding neural states from human brain recordings acquired in a naturalistic context. We demonstrate our approach on continuous long-term electrocorticographic (ECoG) data recorded over many days from the brain surface of subjects in a hospital room, with simultaneous audio and video recordings. We first discovered clusters in high-dimensional ECoG recordings and then annotated coherent clusters using speech and movement labels extracted automatically from the audio and video recordings. To our knowledge, this represents the first time techniques from computer vision and speech processing have been used for natural ECoG decoding. Our results show that our unsupervised approach can discover distinct behaviors from ECoG data, including moving, speaking and resting. We verify the accuracy of our approach by comparing against manual annotations. By projecting the discovered cluster centers back onto the brain, this technique opens the door to automated functional brain mapping in natural settings.
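
    A minimal sketch (not the authors' code) of the cluster-then-annotate pipeline described above: cluster high-dimensional ECoG features, then label each discovered cluster with the most frequent behavior tag derived automatically from audio and video. The feature matrix and label array below are hypothetical placeholders.

        import numpy as np
        from sklearn.cluster import KMeans

        rng = np.random.default_rng(0)
        ecog_features = rng.standard_normal((5000, 128))            # e.g. band power per electrode per time window
        auto_labels = rng.choice(["rest", "move", "speak"], 5000)   # tags from the video/audio pipelines

        kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(ecog_features)

        # Annotate each discovered cluster with its majority behavior label
        for k in range(kmeans.n_clusters):
            members = auto_labels[kmeans.labels_ == k]
            values, counts = np.unique(members, return_counts=True)
            print(f"cluster {k}: {values[counts.argmax()]} ({counts.max() / len(members):.0%} of windows)")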

    AJILE Movement Prediction: Multimodal Deep Learning for Natural Human Neural Recordings and Video

    Developing useful interfaces between brains and machines is a grand challenge of neuroengineering. An effective interface has the capacity not only to interpret neural signals, but to predict the intentions of the human to perform an action in the near future; prediction is made even more challenging outside well-controlled laboratory experiments. This paper describes our approach to detecting and predicting natural human arm movements before they occur, a key challenge in brain-computer interfacing that has never before been attempted. We introduce the novel Annotated Joints in Long-term ECoG (AJILE) dataset; AJILE includes automatically annotated poses of 7 upper body joints for four human subjects over 670 total hours (more than 72 million frames), along with the corresponding simultaneously acquired intracranial neural recordings. The size and scope of AJILE greatly exceed those of all previous datasets combining movements and electrocorticography (ECoG), making it possible to take a deep learning approach to movement prediction. We propose a multimodal model that combines deep convolutional neural networks (CNNs) with long short-term memory (LSTM) blocks, leveraging both ECoG and video modalities. We demonstrate that our models are able to detect movements and predict future movements up to 800 msec before movement initiation. Further, our multimodal movement prediction models exhibit resilience to simulated ablation of input neural signals. We believe a multimodal approach to natural neural decoding that takes context into account is critical for advancing bioelectronic technologies and human neuroscience.
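
    A minimal PyTorch sketch of the kind of multimodal CNN plus LSTM architecture described above (not the published AJILE model); input shapes, channel counts, and layer sizes are illustrative assumptions.

        import torch
        import torch.nn as nn

        class MultimodalMovementPredictor(nn.Module):
            def __init__(self, n_ecog_channels=64, video_feat_dim=128, hidden=64):
                super().__init__()
                # CNN over ECoG: (batch, channels, time) -> per-timestep features
                self.ecog_cnn = nn.Sequential(
                    nn.Conv1d(n_ecog_channels, 64, kernel_size=7, padding=3), nn.ReLU(),
                    nn.Conv1d(64, 64, kernel_size=7, padding=3), nn.ReLU(),
                )
                # LSTM fuses ECoG features with per-frame video features over time
                self.lstm = nn.LSTM(64 + video_feat_dim, hidden, batch_first=True)
                self.head = nn.Linear(hidden, 2)  # move vs. no-move in the upcoming window

            def forward(self, ecog, video):
                # ecog: (batch, n_ecog_channels, T); video: (batch, T, video_feat_dim)
                e = self.ecog_cnn(ecog).transpose(1, 2)           # (batch, T, 64)
                fused, _ = self.lstm(torch.cat([e, video], dim=-1))
                return self.head(fused[:, -1])                    # predict from the last timestep

        model = MultimodalMovementPredictor()
        logits = model(torch.randn(8, 64, 100), torch.randn(8, 100, 128))
        print(logits.shape)  # torch.Size([8, 2])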

    Investigating the Neural Basis of Audiovisual Speech Perception with Intracranial Recordings in Humans

    Speech is inherently multisensory, containing auditory information from the voice and visual information from the mouth movements of the talker. Hearing the voice is usually sufficient to understand speech; however, in noisy environments or when audition is impaired due to aging or disabilities, seeing mouth movements greatly improves speech perception. Although behavioral studies have firmly established this perceptual benefit, it is still not clear how the brain processes visual information from mouth movements to improve speech perception. To clarify this issue, I studied the neural activity recorded from the brain surfaces of human subjects using intracranial electrodes, a technique known as electrocorticography (ECoG). First, I studied responses to noisy speech in the auditory cortex, specifically in the superior temporal gyrus (STG). Previous studies identified the anterior parts of the STG as unisensory, responding only to auditory stimuli. On the other hand, posterior parts of the STG are known to be multisensory, responding to both auditory and visual stimuli, which makes them a key region for audiovisual speech perception. I examined how these different parts of the STG respond to clear versus noisy speech. I found that noisy speech decreased the amplitude and increased the across-trial variability of the response in the anterior STG. However, possibly due to its multisensory composition, the posterior STG was not as sensitive to auditory noise as the anterior STG and responded similarly to clear and noisy speech. I also found that these two response patterns in the STG were separated by a sharp boundary demarcated by the posterior-most portion of Heschl's gyrus. Second, I studied responses to silent speech in the visual cortex. Previous studies demonstrated that the visual cortex shows response enhancement when the auditory component of speech is noisy or absent; however, it was not clear which regions of the visual cortex specifically show this response enhancement and whether it is a result of top-down modulation from a higher region. To test this, I first mapped the receptive fields of different regions in the visual cortex and then measured their responses to visual (silent) and audiovisual speech stimuli. I found that visual regions with central receptive fields show greater response enhancement to visual speech, possibly because these regions receive more visual information from mouth movements. I found similar response enhancement to visual speech in the frontal cortex, specifically in the inferior frontal gyrus, premotor and dorsolateral prefrontal cortices, which have been implicated in speech reading in previous studies. I showed that these frontal regions display strong functional connectivity with visual regions that have central receptive fields during speech perception.
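
    A minimal numerical sketch (with hypothetical data, not the recorded ECoG) of the two response measures compared above: mean high-gamma response amplitude and across-trial variability at a single STG electrode, contrasted between clear and noisy speech trials.

        import numpy as np

        rng = np.random.default_rng(1)
        # (trials, time samples) of high-gamma power at one STG electrode, per condition
        clear = 1.0 + 0.2 * rng.standard_normal((60, 200))
        noisy = 0.7 + 0.4 * rng.standard_normal((60, 200))

        for name, trials in [("clear", clear), ("noisy", noisy)]:
            amplitude = trials.mean()                 # average response amplitude
            variability = trials.std(axis=0).mean()   # across-trial variability, averaged over time
            print(f"{name}: amplitude={amplitude:.2f}, across-trial variability={variability:.2f}")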

    Brain-Switches for Asynchronous Brain-Computer Interfaces: A Systematic Review

    A brain-computer interface (BCI) has been extensively studied to develop a novel communication system for disabled people using their brain activities. An asynchronous BCI system is more realistic and practical than a synchronous BCI system in that BCI commands can be generated whenever the user wants. However, the relatively low performance of an asynchronous BCI system is problematic because redundant BCI commands are required to correct false-positive operations. To significantly reduce the number of false-positive operations of an asynchronous BCI system, a two-step approach has been proposed in which a brain-switch first determines whether the user intends to use the system before the asynchronous BCI itself is operated. This study presents a systematic review of state-of-the-art brain-switch techniques and future research directions. To this end, we reviewed brain-switch research articles published from 2000 to 2019 in terms of their (a) neuroimaging modality, (b) paradigm, (c) operation algorithm, and (d) performance.
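
    A minimal sketch of the two-step scheme described above, with hypothetical feature windows and placeholder classifiers: a brain-switch first decides whether the user intends to engage the BCI, and the command decoder runs only when that switch is on.

        import numpy as np

        def brain_switch(features, threshold=0.8):
            """Return True if the 'intent to use the BCI' score exceeds a strict threshold."""
            score = float(np.clip(features.mean(), 0.0, 1.0))   # placeholder intent score
            return score > threshold

        def command_decoder(features):
            """Placeholder decoder mapping features to one of a few BCI commands."""
            return ["left", "right", "select"][int(abs(features.sum())) % 3]

        rng = np.random.default_rng(2)
        for window in rng.uniform(0, 1, size=(5, 16)):            # stream of feature windows
            if brain_switch(window):                              # step 1: gate on intent
                print("command:", command_decoder(window))        # step 2: decode only when gated on
            else:
                print("idle (switch off, no command issued)")

    Keeping the switch threshold strict is what trades a few missed activations for far fewer false-positive commands, which is the motivation given above.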

    Neural population coding: combining insights from microscopic and mass signals

    Behavior relies on the distributed and coordinated activity of neural populations. Population activity can be measured using multi-neuron recordings and neuroimaging. Neural recordings reveal how the heterogeneity, sparseness, timing, and correlation of population activity shape information processing in local networks, whereas neuroimaging shows how long-range coupling and brain states affect local activity and perception. To obtain an integrated perspective on neural information processing, we need to combine knowledge from both levels of investigation. We review recent progress in how neural recordings, neuroimaging, and computational approaches are beginning to elucidate how interactions between local neural population activity and large-scale dynamics shape the structure and coding capacity of local information representations, make them state-dependent, and control distributed populations that collectively shape behavior.

    Speech Processes for Brain-Computer Interfaces

    Speech interfaces have become widely used and are integrated in many applications and devices. However, speech interfaces require the user to produce intelligible speech, which might be hindered by loud environments, concern about bothering bystanders, or a general inability to produce speech due to disabilities. Decoding a user's imagined speech instead of actual speech would solve this problem. Such a Brain-Computer Interface (BCI) based on imagined speech would enable fast and natural communication without the need to actually speak out loud. These interfaces could provide a voice to otherwise mute people. This dissertation investigates BCIs based on speech processes using functional Near Infrared Spectroscopy (fNIRS) and Electrocorticography (ECoG), two brain activity imaging modalities on opposing ends of an invasiveness scale. Brain activity data have a low signal-to-noise ratio and complex spatio-temporal and spectral coherence. To analyze these data, techniques from the areas of machine learning, neuroscience and Automatic Speech Recognition are combined in this dissertation to facilitate robust classification of detailed speech processes while simultaneously illustrating the underlying neural processes. fNIRS is an imaging modality based on cerebral blood flow. It only requires affordable hardware and can be set up within minutes in a day-to-day environment. Therefore, it is ideally suited for convenient user interfaces. However, the hemodynamic processes measured by fNIRS are slow in nature and the technology therefore offers poor temporal resolution. We investigate speech in fNIRS and demonstrate classification of speech processes for BCIs based on fNIRS. ECoG provides ideal signal properties by invasively measuring electrical potentials artifact-free directly on the brain surface. High spatial resolution and temporal resolution down to millisecond sampling provide localized information with accurate enough timing to capture the fast processes underlying speech production. This dissertation presents the Brain-to-Text system, which harnesses automatic speech recognition technology to decode a textual representation of continuous speech from ECoG. This could allow users to compose messages or to issue commands through a BCI. While the decoding of a textual representation is unparalleled for device control and typing, direct communication is even more natural if the full expressive power of speech, including emphasis and prosody, could be provided. For this purpose, a second system is presented, which directly synthesizes neural signals into audible speech and could enable conversation with friends and family through a BCI. Up to now, both systems, Brain-to-Text and the synthesis system, operate on audibly produced speech. To bridge the gap to the final frontier of neural prostheses based on imagined speech processes, we investigate the differences between audibly produced and imagined speech and present first results towards BCIs based on imagined speech processes. This dissertation demonstrates the use of speech processes as a BCI paradigm for the first time. Speech processes offer a fast and natural interaction paradigm that will help patients and healthy users alike to communicate with computers and with friends and family efficiently through BCIs.
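
    A minimal sketch of the frame-wise decoding idea behind such a Brain-to-Text pipeline, under simplifying assumptions: a classifier maps short ECoG feature frames to phone-like units, and repeated frame labels are collapsed into a sequence. The real system relies on full ASR-style modelling; the features, labels, and classifier here are hypothetical placeholders.

        import numpy as np
        from sklearn.linear_model import LogisticRegression

        rng = np.random.default_rng(3)
        phones = np.array(["p", "a", "t", "sil"])
        X_train = rng.standard_normal((2000, 64))        # ECoG features per short analysis frame
        y_train = rng.integers(0, len(phones), 2000)     # frame-level phone labels

        clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

        frame_preds = phones[clf.predict(rng.standard_normal((40, 64)))]
        # Collapse runs of identical frame labels and drop silence to get a phone sequence
        sequence = [p for i, p in enumerate(frame_preds)
                    if p != "sil" and (i == 0 or p != frame_preds[i - 1])]
        print("decoded phone sequence:", sequence)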