32,732 research outputs found
Unsupervised decoding of long-term, naturalistic human neural recordings with automated video and audio annotations
Fully automated decoding of human activities and intentions from direct
neural recordings is a tantalizing challenge in brain-computer interfacing.
Most ongoing efforts have focused on training decoders on specific, stereotyped
tasks in laboratory settings. Implementing brain-computer interfaces (BCIs) in
natural settings requires adaptive strategies and scalable algorithms that
require minimal supervision. Here we propose an unsupervised approach to
decoding neural states from human brain recordings acquired in a naturalistic
context. We demonstrate our approach on continuous long-term
electrocorticographic (ECoG) data recorded over many days from the brain
surface of subjects in a hospital room, with simultaneous audio and video
recordings. We first discovered clusters in high-dimensional ECoG recordings
and then annotated coherent clusters using speech and movement labels extracted
automatically from audio and video recordings. To our knowledge, this
represents the first time techniques from computer vision and speech processing
have been used for natural ECoG decoding. Our results show that our
unsupervised approach can discover distinct behaviors from ECoG data, including
moving, speaking and resting. We verify the accuracy of our approach by
comparing to manual annotations. Projecting the discovered cluster centers back
onto the brain, this technique opens the door to automated functional brain
mapping in natural settings
The aceToolbox: low-level audiovisual feature extraction for retrieval and classification
In this paper we present an overview of a software platform
that has been developed within the aceMedia project,
termed the aceToolbox, that provides global and local lowlevel feature extraction from audio-visual content. The toolbox is based on the MPEG-7 eXperimental Model (XM),
with extensions to provide descriptor extraction from arbitrarily shaped image segments, thereby supporting local descriptors reflecting real image content. We describe the architecture of the toolbox as well as providing an overview of the descriptors supported to date. We also briefly describe the segmentation algorithm provided. We then demonstrate the usefulness of the toolbox in the context of two different content processing scenarios: similarity-based retrieval in large collections and scene-level classification of still images
Steady-State movement related potentials for brainâcomputer interfacing
An approach for brain-computer interfacing (BCI) by analysis of steady-state movement related potentials (ssMRPs) produced during rhythmic finger movements is proposed in this paper. The neurological background of ssMRPs is briefly reviewed. Averaged ssMRPs represent the development of a lateralized rhythmic potential, and the energy of the EEG signals at the finger tapping frequency can be used for single-trial ssMRP classification. The proposed ssMRP-based BCI approach is tested using the classic Fisher's linear discriminant classifier. Moreover, the influence of the current source density transform on the performance of BCI system is investigated. The averaged correct classification rates (CCRs) as well as averaged information transfer rates (ITRs) for different sliding time windows are reported. Reliable single-trial classification rates of 88%-100% accuracy are achievable at relatively high ITRs. Furthermore, we have been able to achieve CCRs of up to 93% in classification of the ssMRPs recorded during imagined rhythmic finger movements. The merit of this approach is in the application of rhythmic cues for BCI, the relatively simple recording setup, and straightforward computations that make the real-time implementations plausible
Deep Learning for Audio Signal Processing
Given the recent surge in developments of deep learning, this article
provides a review of the state-of-the-art deep learning techniques for audio
signal processing. Speech, music, and environmental sound processing are
considered side-by-side, in order to point out similarities and differences
between the domains, highlighting general methods, problems, key references,
and potential for cross-fertilization between areas. The dominant feature
representations (in particular, log-mel spectra and raw waveform) and deep
learning models are reviewed, including convolutional neural networks, variants
of the long short-term memory architecture, as well as more audio-specific
neural network models. Subsequently, prominent deep learning application areas
are covered, i.e. audio recognition (automatic speech recognition, music
information retrieval, environmental sound detection, localization and
tracking) and synthesis and transformation (source separation, audio
enhancement, generative models for speech, sound, and music synthesis).
Finally, key issues and future questions regarding deep learning applied to
audio signal processing are identified.Comment: 15 pages, 2 pdf figure
- âŚ