23,276 research outputs found
Classifying motor imagery in presence of speech
In the near future, brain-computer interface (BCI) applications for non-disabled users will require multimodal interaction and tolerance to dynamic environment. However, this conflicts with the highly sensitive recording techniques used for BCIs, such as electroencephalography (EEG). Advanced machine learning and signal processing techniques are required to decorrelate desired brain signals from the rest. This paper proposes a signal processing pipeline and two classification methods suitable for multiclass EEG analysis. The methods were tested in an experiment on separating left/right hand imagery in presence/absence of speech. The analyses showed that the presence of speech during motor imagery did not affect the classification accuracy significantly and regardless of the presence of speech, the proposed methods were able to separate left and right hand imagery with an accuracy of 60%. The best overall accuracy achieved for the 5-class separation of all the tasks was 47% and both proposed methods performed equally well. In addition, the analysis of event-related spectral power changes revealed characteristics related to motor imagery and speech
An Overview of Deep-Learning-Based Audio-Visual Speech Enhancement and Separation
Speech enhancement and speech separation are two related tasks, whose purpose
is to extract either one or more target speech signals, respectively, from a
mixture of sounds generated by several sources. Traditionally, these tasks have
been tackled using signal processing and machine learning techniques applied to
the available acoustic signals. Since the visual aspect of speech is
essentially unaffected by the acoustic environment, visual information from the
target speakers, such as lip movements and facial expressions, has also been
used for speech enhancement and speech separation systems. In order to
efficiently fuse acoustic and visual information, researchers have exploited
the flexibility of data-driven approaches, specifically deep learning,
achieving strong performance. The ceaseless proposal of a large number of
techniques to extract features and fuse multimodal information has highlighted
the need for an overview that comprehensively describes and discusses
audio-visual speech enhancement and separation based on deep learning. In this
paper, we provide a systematic survey of this research topic, focusing on the
main elements that characterise the systems in the literature: acoustic
features; visual features; deep learning methods; fusion techniques; training
targets and objective functions. In addition, we review deep-learning-based
methods for speech reconstruction from silent videos and audio-visual sound
source separation for non-speech signals, since these methods can be more or
less directly applied to audio-visual speech enhancement and separation.
Finally, we survey commonly employed audio-visual speech datasets, given their
central role in the development of data-driven approaches, and evaluation
methods, because they are generally used to compare different systems and
determine their performance
Multimodal music information processing and retrieval: survey and future challenges
Towards improving the performance in various music information processing
tasks, recent studies exploit different modalities able to capture diverse
aspects of music. Such modalities include audio recordings, symbolic music
scores, mid-level representations, motion, and gestural data, video recordings,
editorial or cultural tags, lyrics and album cover arts. This paper critically
reviews the various approaches adopted in Music Information Processing and
Retrieval and highlights how multimodal algorithms can help Music Computing
applications. First, we categorize the related literature based on the
application they address. Subsequently, we analyze existing information fusion
approaches, and we conclude with the set of challenges that Music Information
Retrieval and Sound and Music Computing research communities should focus in
the next years
- …