Hierarchical Deep Feature Learning For Decoding Imagined Speech From EEG
We propose a mixed deep neural network strategy, incorporating a parallel
combination of Convolutional (CNN) and Recurrent Neural Networks (RNN)
cascaded with deep autoencoders and fully connected layers, for automatic
identification of imagined speech from EEG. Instead of utilizing raw EEG
channel data, we compute the joint variability of the channels in the form of a
covariance matrix that provides a spatio-temporal representation of the EEG.
The networks are trained hierarchically, and the extracted features are passed
on to the next network in the hierarchy until the final classification. Using a
publicly available EEG-based speech imagery database, we demonstrate a 23.45%
improvement in accuracy over the baseline method. Our approach demonstrates the
promise of mixed DNNs for complex spatio-temporal classification problems.
Comment: Accepted at AAAI 2019 under the Student Abstract and Poster Program
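The covariance-based input representation described above can be illustrated with a minimal numpy sketch. This is not the authors' pipeline; the channel count, window length, and random data are illustrative assumptions, and the sketch shows only the feature step (joint variability of channels as a covariance matrix), not the hierarchical CNN/RNN/autoencoder training.

```python
import numpy as np

def channel_covariance(eeg_window):
    """Covariance of EEG channels over a time window.

    eeg_window: array of shape (channels, samples).
    Returns a (channels, channels) covariance matrix capturing
    the joint variability of the channels.
    """
    centered = eeg_window - eeg_window.mean(axis=1, keepdims=True)
    return centered @ centered.T / (eeg_window.shape[1] - 1)

# Toy example: 8 channels, 256 samples of random stand-in "EEG".
rng = np.random.default_rng(0)
window = rng.standard_normal((8, 256))
cov = channel_covariance(window)
print(cov.shape)  # (8, 8)
```

Each covariance matrix would then serve as one input sample to the downstream networks, replacing the raw multichannel time series.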
Towards Automatic Speech Identification from Vocal Tract Shape Dynamics in Real-time MRI
Vocal tract configurations play a vital role in generating distinguishable
speech sounds by modulating the airflow and creating different resonant
cavities in speech production. They contain abundant information that can be
utilized to better understand the underlying speech production mechanism. As a
step towards automatic mapping of vocal tract shape geometry to acoustics, this
paper employs effective video action recognition techniques, such as Long-term
Recurrent Convolutional Network (LRCN) models, to identify different
vowel-consonant-vowel (VCV) sequences from dynamic shaping of the vocal tract.
Such a model combines a CNN-based deep hierarchical visual feature extractor
with recurrent networks, making the network spatio-temporally deep enough to
learn the sequential dynamics of a short video clip for video classification
tasks. We use a database consisting of 2D real-time MRI of vocal tract shaping
during VCV utterances by 17 speakers. The comparative performances of this
class of algorithms under various parameter settings and for various
classification tasks are discussed. Interestingly, the results show a marked
difference in model performance on speech classification compared with generic
sequence or video classification tasks.
Comment: To appear in the INTERSPEECH 2018 Proceedings
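The LRCN idea, a per-frame visual encoder followed by recurrent aggregation over time, can be sketched in plain numpy. This is a toy forward pass under stated assumptions: the `frame_features` projection stands in for the deep CNN encoder, a plain tanh RNN stands in for the LSTM, and all weights, shapes, and the 5-class setup are invented for illustration, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

def frame_features(frame, W):
    """Stand-in for the CNN: flatten the frame and project it.

    A real LRCN would use a deep convolutional encoder here.
    """
    return np.tanh(W @ frame.ravel())

def rnn_classify(video, W_feat, W_h, W_x, W_out):
    """Run a plain RNN over per-frame features, then classify the clip."""
    h = np.zeros(W_h.shape[0])
    for frame in video:                      # video: (T, H, W) sequence
        x = frame_features(frame, W_feat)
        h = np.tanh(W_h @ h + W_x @ x)       # recurrent state update
    logits = W_out @ h                       # classify from final state
    return int(np.argmax(logits))

# Toy setup: 10-frame 16x16 clips, 32-d features, 24-d state, 5 classes.
W_feat = rng.standard_normal((32, 16 * 16)) * 0.1
W_h = rng.standard_normal((24, 24)) * 0.1
W_x = rng.standard_normal((24, 32)) * 0.1
W_out = rng.standard_normal((5, 24)) * 0.1

clip = rng.standard_normal((10, 16, 16))
label = rnn_classify(clip, W_feat, W_h, W_x, W_out)
```

The key design point the sketch preserves is the factorization: spatial structure is summarized frame by frame, and only the resulting feature sequence is modeled temporally.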
Evaluation of Background Subtraction Algorithms with Post-processing
Processing a video stream to segment foreground objects from the background is a critical first step in many computer vision applications. Background subtraction (BGS) is a commonly used technique for achieving this segmentation. The popularity of BGS largely comes from its computational efficiency, which allows applications such as human-computer interaction, video surveillance, and traffic monitoring to meet their real-time goals. Numerous BGS algorithms, and a number of post-processing techniques that aim to improve their results, have been proposed. In this paper, we evaluate several popular, state-of-the-art BGS algorithms and examine how post-processing techniques affect their performance. Our experimental results demonstrate that post-processing techniques can significantly improve the foreground segmentation masks produced by a BGS algorithm. We provide recommendations for achieving robust foreground segmentation based on the lessons learned performing this comparative study.
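The BGS-plus-post-processing pipeline the paper evaluates can be sketched with a deliberately simple baseline. None of this is from the paper: the running-average background model, the 3x3 majority-vote filter, and all thresholds are illustrative assumptions standing in for the evaluated algorithms (which are typically far more sophisticated) and for morphological post-processing.

```python
import numpy as np

def running_average_bgs(frames, alpha=0.05, thresh=0.25):
    """Naive background subtraction via a running-average background model."""
    bg = frames[0].astype(float)
    masks = []
    for f in frames[1:]:
        mask = np.abs(f - bg) > thresh       # foreground where frame deviates
        bg = (1 - alpha) * bg + alpha * f    # slowly absorb scene changes
        masks.append(mask)
    return masks

def majority_filter(mask):
    """3x3 majority vote: a cheap post-processing step that removes
    isolated false-positive pixels and fills small holes in the mask."""
    padded = np.pad(mask.astype(int), 1)
    votes = sum(padded[dy:dy + mask.shape[0], dx:dx + mask.shape[1]]
                for dy in range(3) for dx in range(3))
    return votes >= 5

# Toy sequence: noisy static background plus one moving bright blob.
rng = np.random.default_rng(2)
frames = []
for t in range(5):
    f = 0.1 * rng.standard_normal((32, 32))
    f[10:14, 5 + 4 * t:9 + 4 * t] += 1.0     # moving foreground object
    frames.append(f)

raw = running_average_bgs(frames)[-1]        # speckled with noise pixels
clean = majority_filter(raw)                 # post-processed mask
```

Comparing `raw` and `clean` shows the effect the paper measures at scale: post-processing suppresses isolated false positives while keeping the coherent foreground region.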
An extended two-dimensional vocal tract model for fast acoustic simulation of single-axis symmetric three-dimensional tubes
The simulation of two-dimensional (2D) wave propagation is an affordable
computational task, and its use can potentially improve the time performance
of vocal tract acoustic analysis. Several models have been designed that rely on
2D wave solvers and include 2D representations of three-dimensional (3D) vocal
tract-like geometries. However, until now, only the acoustics of straight 3D
tubes with circular cross-sections have been successfully replicated with this
approach. Furthermore, the simulation of the resulting 2D shapes requires
extremely high spatio-temporal resolutions, dramatically reducing the speed
boost deriving from the usage of a 2D wave solver. In this paper, we introduce
a novel, in-progress vocal tract model that extends the 2D Finite-Difference
Time-Domain wave solver (2.5D FDTD) by adding tube depth, derived from the area
functions, to the acoustic solver. The model combines the speed of a lightweight
2D numerical scheme with the ability to natively simulate 3D tubes that are
symmetric in one dimension, hence relaxing previous resolution requirements. An
implementation of the 2.5D FDTD is presented, along with an evaluation of its
performance for static vowel modeling. The paper discusses the current
features and limits of the approach, and its potential impact on
computational acoustics applications.
Comment: 5 pages, 2 figures, Interspeech 2019 submission
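For context on what an FDTD wave solver computes, here is a minimal sketch of a standard leapfrog FDTD scheme for the 2D scalar wave equation. This is emphatically not the authors' 2.5D depth-extended solver; the grid size, rigid-wall boundaries, impulse source, and all parameters are illustrative assumptions showing only the baseline 2D update that such a model builds on.

```python
import numpy as np

def fdtd_2d(steps, nx=64, ny=64, c=343.0, dx=1e-3):
    """Leapfrog FDTD for the 2D scalar wave equation with
    rigid (zero-pressure) boundaries and a CFL-stable time step."""
    dt = dx / (c * np.sqrt(2)) * 0.99        # 2D CFL stability condition
    lam2 = (c * dt / dx) ** 2
    p = np.zeros((nx, ny))                   # pressure at time step n
    p_prev = np.zeros_like(p)                # pressure at time step n-1
    p[nx // 2, ny // 2] = 1.0                # impulse excitation at center
    for _ in range(steps):
        # 5-point discrete Laplacian (boundary rows are clamped below,
        # so the wrap-around of np.roll never contributes).
        lap = (np.roll(p, 1, 0) + np.roll(p, -1, 0)
               + np.roll(p, 1, 1) + np.roll(p, -1, 1) - 4 * p)
        p_next = 2 * p - p_prev + lam2 * lap # leapfrog time update
        p_next[0, :] = p_next[-1, :] = 0.0   # clamp domain boundaries
        p_next[:, 0] = p_next[:, -1] = 0.0
        p_prev, p = p, p_next
    return p

field = fdtd_2d(100)
```

The 2.5D idea described in the abstract would weight such an update with a per-cell depth derived from the vocal tract area function, so that a 2D grid reproduces the acoustics of a symmetric 3D tube.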