Articulatory features for robust visual speech recognition

Author: Saenko Ekaterina, 1976-
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/2004

Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2004. Includes bibliographical references (p. 99-105). This thesis explores a novel approach to visual speech modeling. Visual speech, or a sequence of images of the speaker's face, is traditionally viewed as a single stream of contiguous units, each corresponding to a phonetic segment. These units are defined heuristically by mapping several visually similar phonemes to one visual phoneme, sometimes referred to as a viseme. However, experimental evidence shows that phonetic models trained from visual data are not synchronous in time with acoustic phonetic models, indicating that visemes may not be the most natural building blocks of visual speech. Instead, we propose to model the visual signal in terms of the underlying articulatory features. This approach is a natural extension of feature-based modeling of acoustic speech, which has been shown to increase robustness of audio-based speech recognition systems. We start by exploring ways of defining visual articulatory features: first in a data-driven manner, using a large, multi-speaker visual speech corpus, and then in a knowledge-driven manner, using the rules of speech production. Based on these studies, we propose a set of articulatory features, and describe a computational framework for feature-based visual speech recognition. Multiple feature streams are detected in the input image sequence using Support Vector Machines, and then incorporated in a Dynamic Bayesian Network to obtain the final word hypothesis. Preliminary experiments show that our approach increases viseme classification rates in visually noisy conditions, and improves visual word recognition through feature-based context modeling. By Ekaterina Saenko. S.M.

DSpace@MIT

Using multiple visual tandem streams in audio-visual speech recognition

Author: Erdoğan Hakan
Topkaya İbrahim Saygın
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2011

The "tandem approach" in speech recognition has been shown to increase performance by using classifier posterior probabilities as observations in a hidden Markov model. We study the effect of using visual tandem features in audio-visual speech recognition with a novel setup in which multiple classifiers produce multiple visual tandem features. We adopt the approach of multi-stream hidden Markov models, where visual tandem features from two different classifiers are treated as additional streams in the model. Our experiments show that using multiple visual tandem features improves recognition accuracy in various noise conditions. In addition, to handle asynchrony between audio and visual observations, we employ coupled hidden Markov models and obtain improved performance compared to the synchronous model.
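As a rough illustration of the two ideas named in this abstract, the sketch below turns one frame of classifier posteriors into a tandem feature vector and combines per-stream log-likelihoods with stream weights, as multi-stream hidden Markov models commonly do. The function names, the log transform, the probability floor, and the example weights are all assumptions made for illustration; the abstract does not specify how the authors compute or weight their features.

```python
import math


def tandem_features(posteriors, floor=1e-10):
    """Log-transform one frame of classifier posteriors (assumed preprocessing).

    The floor avoids log(0) for classes with zero posterior probability.
    """
    return [math.log(max(p, floor)) for p in posteriors]


def multistream_log_score(stream_log_likelihoods, stream_weights):
    """Weighted sum of per-stream log-likelihoods for one state and one frame."""
    if len(stream_log_likelihoods) != len(stream_weights):
        raise ValueError("one weight is required per stream")
    return sum(w * ll for w, ll in zip(stream_weights, stream_log_likelihoods))


# Hypothetical frame: posteriors from one visual classifier over three classes.
frame_features = tandem_features([0.7, 0.2, 0.1])

# Hypothetical per-stream log-likelihoods (audio, visual tandem 1, visual tandem 2).
score = multistream_log_score([-4.0, -6.0, -5.0], [0.6, 0.2, 0.2])
```

In a setup like this the stream weights would typically be tuned per noise condition, giving the audio stream less weight as acoustic noise increases.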

CiteSeerX

Crossref

Sabanci University Research Database

Towards Emotion Recognition: A Persistent Entropy Application

Author: A Geron
A Ortony
A Zomorodian
AS Popova
B Schuller
B Yang
C Cortes
D Ververidis
DM Howard
E Globerson
G Bredon
H Edelsbrunner
J Russell
K Pearson
L Wasserman
M Rucco
N Cristianini
SR Livingstone
Publication venue: Springer Science and Business Media LLC
Publication date: 21/11/2018

Emotion recognition and classification is a very active area of research. In this paper, we present a first approach to emotion classification using persistent entropy and support vector machines. A topology-based model is applied to obtain a single real number from each raw signal. These numbers are used as input to a support vector machine that classifies signals into 8 different emotions (neutral, calm, happy, sad, angry, fearful, disgust and surprised).
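The "single real number" mentioned in this abstract is the persistent entropy of a persistence barcode, which is the Shannon entropy of the bar lengths after normalizing them to sum to one. A minimal sketch follows, assuming the barcode has already been computed from the signal and contains only finite intervals. The function name and the example barcode are hypothetical, and the construction of the barcode from a raw signal is not shown.

```python
import math


def persistent_entropy(barcode):
    """Shannon entropy of the normalized bar lengths of a persistence barcode.

    `barcode` is a list of finite (birth, death) pairs; zero-length bars are
    skipped because they contribute nothing to the sum.
    """
    lengths = [death - birth for birth, death in barcode if death > birth]
    total = sum(lengths)
    if total == 0:
        return 0.0
    return -sum((l / total) * math.log(l / total) for l in lengths)


# Hypothetical barcode with three bars (not taken from the paper's data).
example_barcode = [(0.0, 1.0), (0.0, 1.0), (0.5, 2.5)]
entropy = persistent_entropy(example_barcode)
```

The value is largest when all bars have equal length and is zero for a single bar, so it summarizes how evenly the topological features of a signal persist.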

arXiv.org e-Print Archive

Crossref

Towards Emotion Recognition: A Persistent Entropy Application

Author: González Díaz Rocío
Paluzo Hidalgo Eduardo
Quesada Moreno José Francisco
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2019

idUS. Depósito de Investigación Universidad de Sevilla