Search CORE

23,276 research outputs found

Classifying motor imagery in presence of speech

Author: Gürkök Hayrettin
Poel Mannes
Zwiers Job
Publication venue: IEEE
Publication date: 01/01/2010
Field of study

In the near future, brain-computer interface (BCI) applications for non-disabled users will require multimodal interaction and tolerance to dynamic environment. However, this conflicts with the highly sensitive recording techniques used for BCIs, such as electroencephalography (EEG). Advanced machine learning and signal processing techniques are required to decorrelate desired brain signals from the rest. This paper proposes a signal processing pipeline and two classification methods suitable for multiclass EEG analysis. The methods were tested in an experiment on separating left/right hand imagery in presence/absence of speech. The analyses showed that the presence of speech during motor imagery did not affect the classification accuracy significantly and regardless of the presence of speech, the proposed methods were able to separate left and right hand imagery with an accuracy of 60%. The best overall accuracy achieved for the 5-class separation of all the tasks was 47% and both proposed methods performed equally well. In addition, the analysis of event-related spectral power changes revealed characteristics related to motor imagery and speech

University of Twente Research Information

An Overview of Deep-Learning-Based Audio-Visual Speech Enhancement and Separation

Author: Jensen Jesper
Michelsanti Daniel
Tan Zheng-Hua
Xu Yong
Yu Dong
Yu Meng
Zhang Shi-Xiong
Publication venue
Publication date: 01/01/2021
Field of study

Speech enhancement and speech separation are two related tasks, whose purpose is to extract either one or more target speech signals, respectively, from a mixture of sounds generated by several sources. Traditionally, these tasks have been tackled using signal processing and machine learning techniques applied to the available acoustic signals. Since the visual aspect of speech is essentially unaffected by the acoustic environment, visual information from the target speakers, such as lip movements and facial expressions, has also been used for speech enhancement and speech separation systems. In order to efficiently fuse acoustic and visual information, researchers have exploited the flexibility of data-driven approaches, specifically deep learning, achieving strong performance. The ceaseless proposal of a large number of techniques to extract features and fuse multimodal information has highlighted the need for an overview that comprehensively describes and discusses audio-visual speech enhancement and separation based on deep learning. In this paper, we provide a systematic survey of this research topic, focusing on the main elements that characterise the systems in the literature: acoustic features; visual features; deep learning methods; fusion techniques; training targets and objective functions. In addition, we review deep-learning-based methods for speech reconstruction from silent videos and audio-visual sound source separation for non-speech signals, since these methods can be more or less directly applied to audio-visual speech enhancement and separation. Finally, we survey commonly employed audio-visual speech datasets, given their central role in the development of data-driven approaches, and evaluation methods, because they are generally used to compare different systems and determine their performance

arXiv.org e-Print Archive

VBN

Multimodal music information processing and retrieval: survey and future challenges

Author: Avanzini Federico
Ntalampiras Stavros
Simonetta Federico
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 14/02/2019
Field of study

Towards improving the performance in various music information processing tasks, recent studies exploit different modalities able to capture diverse aspects of music. Such modalities include audio recordings, symbolic music scores, mid-level representations, motion, and gestural data, video recordings, editorial or cultural tags, lyrics and album cover arts. This paper critically reviews the various approaches adopted in Music Information Processing and Retrieval and highlights how multimodal algorithms can help Music Computing applications. First, we categorize the related literature based on the application they address. Subsequently, we analyze existing information fusion approaches, and we conclude with the set of challenges that Music Information Retrieval and Sound and Music Computing research communities should focus in the next years

arXiv.org e-Print Archive

Crossref