Multimodal Speech Processing Using Asynchronous Hidden Markov Models

Author: Bengio Samy
Publication venue: 'Elsevier BV'
Publication date: 10/03/2006

Infoscience - École polytechnique fédérale de Lausanne

Multimodal speech processing using asynchronous Hidden Markov Models

Author: Samy Bengio
Publication venue: 'Elsevier BV'

Crossref

Using multiple visual tandem streams in audio-visual speech recognition

Author: Erdoğan Hakan
Topkaya İbrahim Saygın
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2011

The "tandem approach" in speech recognition has been shown to increase performance by using classifier posterior probabilities as observations in a hidden Markov model. We study the effect of visual tandem features in audio-visual speech recognition with a novel setup that uses multiple classifiers to obtain multiple visual tandem features. We adopt the multi-stream hidden Markov model approach, in which visual tandem features from two different classifiers are treated as additional streams in the model. Our experiments show that using multiple visual tandem features improves recognition accuracy in various noise conditions. In addition, to handle asynchrony between audio and visual observations, we employ coupled hidden Markov models and obtain improved performance compared to the synchronous model.

CiteSeerX

Crossref

Sabanci University Research Database

Crossmodal Attentive Skill Learner

Author: How Jonathan P.
Kim Dong-Ki
Omidshafiei Shayegan
Pazis Jason
Publication date: 22/05/2018

This paper presents the Crossmodal Attentive Skill Learner (CASL), integrated with the recently introduced Asynchronous Advantage Option-Critic (A2OC) architecture [Harb et al., 2017] to enable hierarchical reinforcement learning across multiple sensory inputs. We provide concrete examples where the approach not only improves performance in a single task, but also accelerates transfer to new tasks. We demonstrate that the attention mechanism anticipates and identifies useful latent features, while filtering out irrelevant sensor modalities during execution. We modify the Arcade Learning Environment [Bellemare et al., 2013] to support audio queries, and conduct evaluations of crossmodal learning in the Atari 2600 game Amidar. Finally, building on the recent work of Babaeizadeh et al. [2017], we open-source a fast hybrid CPU-GPU implementation of CASL. Comment: International Conference on Autonomous Agents and Multiagent Systems (AAMAS) 2018, NIPS 2017 Deep Reinforcement Learning Symposium

arXiv.org e-Print Archive

DSpace@MIT

Adaptive Multimodal Fusion by Uncertainty Compensation With Application to Audiovisual Speech Recognition

Author: Athanassios Katsamanis
George Papandreou
Petros Maragos
Vassilis Pitsikalis
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'

Crossref

Ensemble-Based Human Communication Recognition; CU-CS-935-02

Author: Barthelmess Paulo
Publication venue: CU Scholar
Publication date: 01/05/2002

CU Scholar Institutional Repository