113 research outputs found

Multiple Classifier Systems for the Classification of Audio-Visual Emotional States

Author: B. Schuller
B. Schölkopf
D.W. Robinson
E. Rolls
F. Schwenker
F. Zheng
H. Hermansky
J. Mutch
L. Breiman
L. Devillers
L. Kuncheva
L.R. Rabiner
M. Riesenhuber
M. Schmidt
P. Bayerl
P. Oudeyer
R. Cowie
S. Davis
S. Walter
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2011

Abstract. Research activities in the field of human-computer interaction have increasingly addressed the integration of some type of emotional intelligence. Human emotions are expressed through different modalities such as speech, facial expressions, and hand or body gestures, and therefore the classification of human emotions should be considered a multimodal pattern recognition problem. The aim of our paper is to investigate multiple classifier systems utilizing audio and visual features to classify human emotional states. To this end, a variety of features have been derived. From the audio signal, the fundamental frequency, LPC and MFCC coefficients, and RASTA-PLP have been used. In addition, two types of visual features have been computed, namely form and motion features of intermediate complexity. The numerical evaluation has been performed on the four emotional labels Arousal, Expectancy, Power, and Valence as defined in the AVEC data set. Multiple classifier systems are applied as classifier architectures; these have been proven to be accurate and robust against missing and noisy data.

CiteSeerX

Crossref

Design, development and field evaluation of a Spanish into sign language translation system

Author: A. García
D. Sánchez
DI Fels
E Efthimiou
F Casacuberta
F. Fernández
H Hermansky
J Och
J Wong
J. M. Montero
JB Mariño
JL Gauvain
L. F. D’Haro
R San-Segundo
R. Córdoba
R. San-Segundo
S Möller
V. López-Ludeña
V. Sama
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2012

This paper describes the design, development and field evaluation of a machine translation system from Spanish to Spanish Sign Language (LSE: Lengua de Signos Española). The developed system focuses on helping Deaf people when they want to renew their Driver’s License. The system is made up of a speech recognizer (for decoding the spoken utterance into a word sequence), a natural language translator (for converting a word sequence into a sequence of signs belonging to the sign language), and a 3D avatar animation module (for playing back the signs). For the natural language translator, three technological approaches have been implemented and evaluated: an example-based strategy, a rule-based translation method and a statistical translator. For the final version, the implemented language translator combines all the alternatives into a hierarchical structure. This paper includes a detailed description of the field evaluation, which was carried out in the Local Traffic Office in Toledo and involved real government employees and Deaf people. The evaluation includes objective measurements from the system and subjective information from questionnaires. The paper details the main problems found and discusses how to solve them (some of them specific to LSE).

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Archivo Digital UPM

DWT and LPC based feature extraction methods for isolated word recognition

Author: AE Rosenberg
B Kotnik
DS Pallett
F Itakura
H Hermansky
J Xu
JN Gowdy
K Wang
KP Soman
L Rabiner
M Gupta
M Krishnan
MJF Gales
Navnath S Nehe
NS Nehe
O Farooq
Raghunath S Holambe
S Mallat
SB Davis
SF Boll
Y Hao
Z Tufekci
Publication venue: 'Springer Science and Business Media LLC'
Publication date

Crossref

Microdevices for extensional rheometry of low viscosity elastic liquids: a review

Author: A Bazilevsky
AG Balducci
AG Banpurkar
AM Ardekani
B Berge
C Pipe
CG Hermansky
CJS Petrie
CW Macosko
DR Link
E Bänsch
ESG Shaqfeh
F Mugele
F. J. Galindo-Rosales
FR Phelan Jr
FT Trouton
G Beni
GG Fuller
GH McKinley
GI Taylor
GM Whitesides
H Münsted
HA Barnes
HCH Bandalusena
HP Babcock
J Husny
J Meissner
J Remmelgas
J Soulages
J Wang
JA Odell
JA Pathak
JE Matta
JH Song
JM Maia
JP Rothstein
JS Lee
K Niedzwiedz
K Nijenhuis
L Campo-Deaño
LE Rodd
M Padmanabhan
M Roche
M Sentmanat
M Tanyeri
M. A. Alves
M. S. N. Oliveira
MA Alves
MG Pollack
MK Tan
MSN Oliveira
N Kojic
N Kumari
P Becherer
P Dontula
P Erni
P Guillot
PC Sousa
PE Arratia
PK Bhattacharjee
R Dylla-Spears
R Sattler
R Zheng
RB Bird
RI Tanner
RJ Poole
RR Lagnado
S Gaudet
S Ríos
SD Hudson
SJ Haward
SL Anna
SL Ng
SS Hsieh
T Cubaud
T Funami
T Schweizer
T Sridhar
TM Squires
TT Perkins
W Lee
WC Nelson
WW Schultz
YY Lin
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013

Extensional flows and the underlying stability/instability mechanisms are of extreme relevance to the efficient operation of inkjet printing, coating processes and drug delivery systems, as well as to the generation of micro droplets. The development of an extensional rheometer to characterize the extensional properties of low viscosity fluids has therefore attracted great interest among researchers, particularly in the last decade. Microfluidics has proven to be an extraordinary working platform, and different configurations of potential extensional microrheometers have been proposed. In this review, we present an overview of several successful designs, together with a critical assessment of their capabilities and limitations.

Crossref

University of Strathclyde Institutional Repository

Continuous Audio-Visual Speech Recognition

Author: A. Lanitis
A. Q. Summerfield
B. K. P. Horn
B. Moghaddam
B. P. Yuhas
C. Bregler
G. Chollet
H. Fletcher
H. Hermansky
J. B. Allen
J. Luettin
K. P. Green
L. Braida
M. I. Jordan
M. J. Tomlinson
M. S. Gray
N. P. Erber
P. L. Silsbee
R. Cole
T. Coianiz
T. F. Cootes
W. J. Hardcastle
Y. Gong
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 10/03/2006

We address the problem of robust lip tracking, visual speech feature extraction, and sensor integration for audio-visual speech recognition applications. An appearance-based model of the articulators, which represents linguistically important features, is learned from example images and is used to locate, track, and recover visual speech information. We tackle the problem of joint temporal modelling of the acoustic and visual speech signals by applying multi-stream hidden Markov models. This approach allows the use of different temporal topologies and levels of stream integration and hence enables temporal dependencies to be modelled more accurately. The system has been evaluated on a continuously spoken digit recognition task with 37 subjects.

Infoscience - École polytechnique fédérale de Lausanne

Crossref

Speech dereverberation for enhancement and recognition using dynamic features constrained deep neural networks and feature adaptation

Author: A Rix
AL Maas
B Li
BDV Veen
C-P Chen
CH Knapp
E Habets
F Weninger
GE Hinton
H Hermansky
H Kuttruff
J Allen
J Li
JL Gauvain
K Lebart
M Delcroix
MJF Gales
O Cappe
OLF III
R Chen
S Fischer
S Furui
S Gannot
S Subramaniam
T Toda
T Yoshioka
TH Falk
TH Li
X Xiao
Y Hu
Y Xu
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016

This paper investigates deep neural network (DNN) based nonlinear feature mapping and statistical linear feature adaptation approaches for reducing reverberation in speech signals. In the nonlinear feature mapping approach, a DNN is trained from a parallel clean/distorted speech corpus to map reverberant and noisy speech coefficients (such as the log magnitude spectrum) to the underlying clean speech coefficients. The constraint imposed by dynamic features (i.e., the time derivatives of the speech coefficients) is used to enhance the smoothness of predicted coefficient trajectories in two ways. One is to obtain the enhanced speech coefficients with a least-squares estimation from the coefficients and dynamic features predicted by the DNN. The other is to incorporate the constraint of dynamic features directly into the DNN training process using a sequential cost function. In the linear feature adaptation approach, a sparse linear transform, called the cross transform, is used to transform multiple frames of speech coefficients to a new feature space. The transform is estimated to maximize the likelihood of the transformed coefficients given a model of clean speech coefficients. Unlike the DNN approach, no parallel corpus is used and no assumption on distortion types is made. The two approaches are evaluated on the REVERB Challenge 2014 tasks. Both speech enhancement and automatic speech recognition (ASR) results show that the DNN-based mappings significantly reduce the reverberation in speech and improve both speech quality and ASR performance. For the speech enhancement task, the proposed dynamic feature constraint helps to improve the cepstral distance, frequency-weighted segmental signal-to-noise ratio (SNR), and log likelihood ratio metrics, while moderately degrading the speech-to-reverberation modulation energy ratio. In addition, the cross transform feature adaptation improves the ASR performance significantly for clean-condition trained acoustic models.

Crossref

Springer - Publisher Connector

DR-NTU (Digital Repository of NTU)

ScholarBank@NUS