2,626 research outputs found
Multimodal person recognition for human-vehicle interaction
Next-generation vehicles will undoubtedly feature biometric person recognition as part of an effort to improve the driving experience. Today's technology prevents such systems from operating satisfactorily under adverse conditions. A proposed framework for achieving person recognition successfully combines different biometric modalities, as borne out in two case studies.
Anti-social behavior detection in audio-visual surveillance systems
In this paper we propose a general-purpose framework for detection of unusual events. The proposed system is based on the unsupervised method for unusual scene detection in webcam images that was introduced in [1]. We extend their algorithm to accommodate data from different modalities and introduce the concept of time-space blocks. In addition, we evaluate early and late fusion techniques for our audio-visual data features. The experimental results on 192 hours of data show that fusion of audio and video outperforms using a single modality.
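The early/late fusion distinction the abstract evaluates can be sketched as follows. This is a minimal illustration with made-up feature dimensions and weights, not the paper's actual features or classifier:

```python
import numpy as np

def early_fusion(audio_feats, video_feats):
    """Early (feature-level) fusion: concatenate the per-frame feature
    vectors of both modalities before classification."""
    return np.concatenate([audio_feats, video_feats], axis=1)

def late_fusion(audio_scores, video_scores, w_audio=0.5):
    """Late (decision-level) fusion: weighted sum of per-class scores
    produced by separate audio and video classifiers."""
    return w_audio * audio_scores + (1.0 - w_audio) * video_scores

# toy example: 4 time-space blocks, 3-dim audio and 2-dim video features
audio = np.random.rand(4, 3)
video = np.random.rand(4, 2)
fused = early_fusion(audio, video)
print(fused.shape)  # (4, 5)
```

Early fusion lets a single classifier model cross-modal correlations; late fusion keeps the modalities' classifiers independent and only combines their decisions.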
Bimodal Emotion Recognition using Speech and Physiological Changes
With exponentially evolving technology it is no exaggeration to say that any interface fo
A Multimodal Sensor Fusion Architecture for Audio-Visual Speech Recognition
A key requirement for developing any innovative system in a computing environment is to provide an interface that is sufficiently friendly to the average end user. Accurate design of such a user-centered interface, however, means more than just the ergonomics of the panels and displays. It also requires that designers precisely define what information to use and how, where, and when to use it. Recent advances in user-centered design of computing systems have suggested that multimodal integration can provide different types and levels of intelligence to the user interface. This thesis aims at improving speech recognition-based interfaces by making use of the visual modality conveyed by the movements of the lips.
Designing a good visual front end is a major part of this framework. For this purpose, this work derives the optical flow fields for consecutive frames of people speaking. Independent Component Analysis (ICA) is then used to derive basis flow fields; the coefficients of these basis fields comprise the visual features of interest. It is shown that using ICA on optical flow fields yields better classification results than traditional approaches based on Principal Component Analysis (PCA). ICA can capture the higher-order statistics needed to understand the motion of the mouth: lip movement is complex in nature, involving large image velocities, self-occlusion (due to the appearance and disappearance of the teeth), and considerable non-rigidity.
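The visual front end described above can be sketched as follows, assuming flow fields are flattened to vectors and using scikit-learn's FastICA. The data below is synthetic and the dimensions are illustrative, not the thesis's actual preprocessing:

```python
import numpy as np
from sklearn.decomposition import FastICA

# synthetic stand-in for optical-flow fields of the mouth region:
# 200 frames, each a flattened 32x32 field of (u, v) velocities
rng = np.random.default_rng(0)
flows = rng.laplace(size=(200, 32 * 32 * 2))

# ICA yields basis flow fields; each frame's mixing coefficients in this
# basis form its visual feature vector
ica = FastICA(n_components=16, random_state=0, max_iter=500)
visual_feats = ica.fit_transform(flows)   # (200, 16) ICA coefficients
basis_fields = ica.components_            # (16, 2048) basis flow fields

print(visual_feats.shape, basis_fields.shape)
```

Swapping `FastICA` for `PCA` in the same pipeline gives the baseline the thesis compares against.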
Another issue of great interest to designers of audio-visual speech recognition systems is the integration (fusion) of the audio and visual information into an automatic speech recognizer. For this purpose, a reliability-driven sensor fusion scheme is developed, with a statistical approach to account for dynamic changes in reliability. This is done in two steps. The first step derives suitable statistical reliability measures for the individual information streams, based on the dispersion of the N-best hypotheses of the individual stream classifiers. The second step finds an optimal mapping between the reliability measures and the stream weights that maximizes the conditional likelihood; for this purpose, genetic algorithms are used.
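The two-step scheme can be illustrated as follows. The dispersion measure and the reliability-to-weight mapping below are simplified stand-ins with made-up scores; the thesis instead learns the mapping with genetic algorithms:

```python
import numpy as np

def nbest_dispersion(nbest_log_scores):
    """Reliability measure for one stream: dispersion of its N-best
    hypothesis scores. A large gap between the best hypothesis and the
    rest suggests a confident, reliable stream."""
    s = np.sort(np.asarray(nbest_log_scores))[::-1]
    return float(np.mean(s[0] - s[1:]))

def fused_score(audio_ll, video_ll, w_audio):
    """Multi-stream combination: weighted sum of the stream
    log-likelihoods (exponent weighting in the probability domain)."""
    return w_audio * audio_ll + (1.0 - w_audio) * video_ll

# hypothetical N-best log scores: audio is peaked (clean), video is flat
audio_nbest = [-10.0, -25.0, -27.0, -30.0]
video_nbest = [-12.0, -12.5, -13.0, -13.2]
r_a = nbest_dispersion(audio_nbest)
r_v = nbest_dispersion(video_nbest)

# naive reliability-to-weight mapping, for illustration only
w_audio = r_a / (r_a + r_v)
print(round(w_audio, 2))  # audio dominates, ~0.95
```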
These are challenging problems, and addressing them is essential for developing an audio-visual speech recognition framework that maximizes the information gathered about the uttered words and minimizes the impact of noise.
Audiovisual head orientation estimation with particle filtering in multisensor scenarios
This article presents a multimodal approach to head pose estimation of individuals in environments equipped with multiple cameras and microphones, such as SmartRooms or automatic video conferencing. Determining an individual's head orientation is the basis for many forms of more sophisticated interaction between humans and technical devices, and can also be used for automatic sensor selection (camera, microphone) in communications or video surveillance systems. The use of particle filters as a unified framework for estimating head orientation in both monomodal and multimodal cases is proposed. In video, we estimate head orientation from color information by exploiting spatial redundancy among cameras. Audio information is processed to estimate the direction of the voice produced by a speaker, making use of the directivity characteristics of the head radiation pattern. Furthermore, two particle filter multimodal information fusion schemes for combining the audio and video streams are analyzed in terms of accuracy and robustness. In the first, fusion is performed at the decision level by combining the monomodal head pose estimates, while the second uses a joint estimation system combining information at the data level. Experimental results over the CLEAR 2006 evaluation database are reported, and the comparison of the proposed multimodal head pose estimation algorithms with the reference monomodal approaches proves the effectiveness of the proposed approach.
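Decision-level fusion of two monomodal orientation estimates can be sketched with a confidence-weighted circular mean. This is illustrative only: the article combines full particle distributions, and the estimates and confidences below are made up:

```python
import numpy as np

def circular_mean(angles, weights):
    """Confidence-weighted mean of orientation angles (radians), computed
    on the unit circle so wrap-around at +/- pi is handled correctly."""
    s = np.sum(weights * np.sin(angles))
    c = np.sum(weights * np.cos(angles))
    return np.arctan2(s, c)

# decision-level fusion of two hypothetical monomodal estimates
theta_video, conf_video = np.deg2rad(30.0), 0.7
theta_audio, conf_audio = np.deg2rad(40.0), 0.3
theta_fused = circular_mean(np.array([theta_video, theta_audio]),
                            np.array([conf_video, conf_audio]))
print(round(float(np.rad2deg(theta_fused)), 1))  # ~33.0, pulled toward video
```

Averaging on the unit circle matters here: a naive arithmetic mean of 350 and 10 degrees gives 180 instead of 0.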
3D Audio-Visual Speaker Tracking with an Adaptive Particle Filter
We propose an audio-visual fusion algorithm for 3D speaker tracking from a localised multi-modal sensor platform composed of a camera and a small microphone array. After extracting audio-visual cues from the individual modalities, we fuse them adaptively, using their reliability, in a particle filter framework. The reliability of the audio signal is measured from the maximum Global Coherence Field (GCF) peak value at each frame. The visual reliability is based on colour-histogram matching of detection results against a reference image in RGB space. Experiments on the AV16.3 dataset show that the proposed adaptive audio-visual tracker outperforms both the individual modalities and a classical approach with fixed parameters in terms of tracking accuracy.
Qian, Xinyuan; Brutti, Alessio; Omologo, Maurizio; Cavallaro, Andrea
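The adaptive weighting idea can be sketched as follows, assuming normalized reliabilities drive a log-linear combination of modality likelihoods in the particle weight update. The exact measures and update rule in the paper may differ; all values below are toy data:

```python
import numpy as np

def adaptive_weights(gcf_peak, hist_match):
    """Per-frame modality reliabilities: audio from the GCF peak value,
    video from colour-histogram similarity (both assumed in [0, 1])."""
    r_a = float(np.clip(gcf_peak, 0.0, 1.0))
    r_v = float(np.clip(hist_match, 0.0, 1.0))
    s = r_a + r_v
    return (r_a / s, r_v / s) if s > 0 else (0.5, 0.5)

def update_particle_weights(w, lik_audio, lik_video, a_audio, a_video):
    """Log-linear fusion: each particle's weight is multiplied by the
    modality likelihoods raised to their reliability exponents, then
    renormalized."""
    w = w * (lik_audio ** a_audio) * (lik_video ** a_video)
    return w / w.sum()

# toy frame with 5 particles; audio is the more reliable modality
a_a, a_v = adaptive_weights(gcf_peak=0.8, hist_match=0.2)
w = update_particle_weights(np.full(5, 0.2),
                            np.array([0.9, 0.5, 0.1, 0.1, 0.1]),
                            np.full(5, 0.4),
                            a_a, a_v)
print(a_a, w.argmax())  # the particle favoured by audio dominates
```

When one modality degrades (e.g. the GCF peak collapses in silence), its exponent shrinks and the update is driven by the other stream, which is the advantage over a fixed-parameter fusion.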