
    Cue Integration in Categorical Tasks: Insights from Audio-Visual Speech Perception

    Previous cue integration studies have examined continuous perceptual dimensions (e.g., size) and have shown that human cue integration is well described by a normative model in which cues are weighted in proportion to their sensory reliability, as estimated from single-cue performance. However, this normative model may not be applicable to categorical perceptual dimensions (e.g., phonemes). In tasks defined over categorical perceptual dimensions, optimal cue weights should depend not only on the sensory variance affecting the perception of each cue but also on the environmental variance inherent in each task-relevant category. Here, we present a computational and experimental investigation of cue integration in a categorical audio-visual (articulatory) speech perception task. Our results show that human performance during audio-visual phonemic labeling is qualitatively consistent with the behavior of a Bayes-optimal observer. Specifically, we show that the participants in our task are sensitive, on a trial-by-trial basis, to the sensory uncertainty associated with the auditory and visual cues during phonemic categorization. In addition, we show that while sensory uncertainty is a significant factor in determining cue weights, it is not the only one: participants' performance is consistent with an optimal model in which environmental, within-category variability also plays a role in determining cue weights. Furthermore, we show that in our task, the sensory variability affecting the visual modality during cue combination is not well estimated from single-cue performance, but can be estimated from multi-cue performance. The findings and computational principles described here represent a principled first step towards characterizing the mechanisms underlying human cue integration in categorical tasks.
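    The normative model referenced in this abstract weights each cue by its reliability (inverse variance); for categorical tasks, the authors argue that within-category (environmental) variance should enter the weight as well. Below is a minimal Python sketch of that idea; all variances, cue values, and names are purely illustrative assumptions, not numbers or code from the paper.

```python
def cue_weight(sensory_var, category_var):
    """Reliability of one cue in a categorical task.

    The cue's effective variance is taken as the sum of its sensory variance
    and the environmental (within-category) variance, so a cue tied to a
    broad category receives a smaller weight even if it is sensed sharply.
    """
    return 1.0 / (sensory_var + category_var)


def combine_cues(mu_audio, w_audio, mu_visual, w_visual):
    """Reliability-weighted average of the auditory and visual estimates."""
    return (w_audio * mu_audio + w_visual * mu_visual) / (w_audio + w_visual)


# Illustrative numbers only: the auditory cue is sensed more precisely,
# but its category is broader, so the visual cue ends up dominating.
w_a = cue_weight(sensory_var=0.5, category_var=1.5)
w_v = cue_weight(sensory_var=1.0, category_var=0.2)
print(combine_cues(mu_audio=0.2, w_audio=w_a, mu_visual=0.8, w_visual=w_v))
```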

    Deep Learning for Environmentally Robust Speech Recognition: An Overview of Recent Developments

    Eliminating the negative effect of non-stationary environmental noise is a long-standing research topic for automatic speech recognition that still remains an important challenge. Data-driven supervised approaches, including ones based on deep neural networks, have recently emerged as potential alternatives to traditional unsupervised approaches and, with sufficient training, can alleviate the shortcomings of the unsupervised methods in various real-life acoustic environments. In this light, we review recently developed, representative deep learning approaches for tackling non-stationary additive and convolutional degradation of speech, with the aim of providing guidelines for those involved in the development of environmentally robust speech recognition systems. We separately discuss single- and multi-channel techniques developed for the front-end and back-end of speech recognition systems, as well as joint front-end and back-end training frameworks.
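    As one concrete illustration of the single-channel front-end techniques covered in this line of work, a common approach is mask-based enhancement, where a network predicts a time-frequency mask that is applied to the noisy magnitude spectrogram before recognition. The sketch below is a minimal, illustrative PyTorch module; the architecture, layer sizes, and names are assumptions for demonstration, not taken from the paper.

```python
import torch
import torch.nn as nn


class MaskEstimator(nn.Module):
    """Toy mask-based enhancement front-end for robust ASR.

    A recurrent network predicts a [0, 1] time-frequency mask from the
    noisy magnitude spectrogram; multiplying the mask with the input
    attenuates additive noise before features reach the recognizer.
    """

    def __init__(self, n_freq=257, hidden=256):
        super().__init__()
        self.rnn = nn.GRU(n_freq, hidden, batch_first=True)
        self.proj = nn.Linear(hidden, n_freq)

    def forward(self, noisy_mag):           # (batch, time, freq)
        h, _ = self.rnn(noisy_mag)
        mask = torch.sigmoid(self.proj(h))  # per-bin suppression gain
        return mask * noisy_mag             # enhanced magnitude


# Illustrative usage with random data standing in for a spectrogram.
net = MaskEstimator()
enhanced = net(torch.rand(2, 100, 257))
print(enhanced.shape)  # torch.Size([2, 100, 257])
```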

    Fusion for Audio-Visual Laughter Detection

    Laughter is a highly variable signal and can express a spectrum of emotions. This makes the automatic detection of laughter a challenging but interesting task. We perform automatic laughter detection using audio-visual data from the AMI Meeting Corpus. Audio-visual laughter detection is performed by combining (fusing) the results of separate audio and video classifiers at the decision level. The video classifier uses features based on the principal components of 20 tracked facial points; for audio we use the widely used PLP and RASTA-PLP features. Our results indicate that RASTA-PLP features outperform PLP features for laughter detection in audio. We compared hidden Markov model (HMM), Gaussian mixture model (GMM) and support vector machine (SVM) based classifiers, and found that RASTA-PLP combined with a GMM resulted in the best performance for the audio modality. The video features classified using an SVM resulted in the best single-modality performance. Fusion on the decision level resulted in laughter detection with significantly better performance than single-modality classification.
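    The fusion step described above happens at the decision level: each modality's classifier produces its own laughter score, and the scores are combined afterwards. Below is a minimal sketch of one such late-fusion rule, a weighted combination of posteriors; the fusion weight, threshold, and example scores are illustrative assumptions, not values reported in the paper.

```python
def fuse_decisions(p_audio, p_video, w_audio=0.5):
    """Decision-level (late) fusion of audio and video laughter classifiers.

    Each classifier outputs a posterior probability that the segment
    contains laughter; fusion is a convex combination of the two scores.
    """
    return w_audio * p_audio + (1.0 - w_audio) * p_video


# Illustrative posteriors: the audio GMM is confident, the video SVM less so.
p = fuse_decisions(p_audio=0.85, p_video=0.60, w_audio=0.6)
print("laughter" if p > 0.5 else "non-laughter", round(p, 3))
```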