Biometric system verification close to "real world" conditions
The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-642-04391-8_31 Proceedings of Joint COST 2101 and 2102 International Conference, BioID_MultiComm 2009, Madrid, Spain. In this paper we present an autonomous biometric device developed in the framework of a national project. This system is able to capture speech, hand-geometry, online signature and face, and can open a door when the user is positively verified. Nevertheless the main purpose is to acquire a database without supervision (normal databases are collected in the presence of a supervisor that tells you what to do in front of the device, which is an unrealistic situation). This system will permit us to explain the main differences between what we call "real conditions" as opposed to "laboratory conditions". This work has been supported by FEDER and MEC, TEC2006-13141-C03/TCM, and COST-2102.
Anchor model fusion for emotion recognition in speech
Proceedings of Joint COST 2101 and 2102 International Conference, BioID_MultiComm 2009, Madrid (Spain). The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-642-04391-8_7 In this work, a novel method for system fusion in emotion recognition for speech is presented. The proposed approach, namely Anchor Model Fusion (AMF), exploits the characteristic behaviour of the scores of a speech utterance among different emotion models, by a mapping to a back-end anchor-model feature space followed by an SVM classifier. Experiments are presented on three different databases: Ahumada III, with speech obtained from real forensic cases; SUSAS Actual; and SUSAS Simulated. Results comparing AMF with a simple sum-fusion scheme after normalization show a significant performance improvement of the proposed technique for two of the three experimental set-ups, without degrading performance in the third one. This work has been financed under project TEC2006-13170-C02-01.
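The two fusion schemes compared in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy scores, and the choice of z-normalization over the scores of a single utterance are assumptions made for illustration (in practice, normalization statistics would more likely come from development data), and the SVM back-end that AMF applies to the stacked vector is omitted.

```python
from statistics import mean, pstdev

def z_normalize(scores):
    """Scale one subsystem's scores to zero mean and unit variance (assumed normalization)."""
    mu, sigma = mean(scores), pstdev(scores)
    if sigma == 0:
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]

def sum_fusion(per_system_scores):
    """Baseline: normalize each subsystem's scores, then add them per emotion model."""
    normalized = [z_normalize(s) for s in per_system_scores]
    return [sum(column) for column in zip(*normalized)]

def anchor_vector(per_system_scores):
    """Stack every subsystem's scores against every emotion model into one feature vector.

    In an anchor-model scheme this vector would be the input to a back-end classifier.
    """
    return [s for system in per_system_scores for s in system]

# Hypothetical scores of one utterance against three emotion models from two subsystems.
utterance_scores = [[1.0, 2.0, 3.0], [10.0, 30.0, 20.0]]
fused = sum_fusion(utterance_scores)          # one fused score per emotion model
features = anchor_vector(utterance_scores)    # six-dimensional back-end feature vector
```

Sum fusion collapses the subsystems into one score per emotion model, whereas the stacked vector keeps the full pattern of scores across models, which is the information the abstract says AMF exploits.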
Comparative Analysis of Arabic Vowels using Formants and an Automatic Speech Recognition System
Arabic, the world's second most spoken language in terms of number of speakers, has not received much attention from the traditional speech processing research community. This study is specifically concerned with the analysis of vowels in Modern Standard Arabic. The first and second formant values of these vowels are investigated, and the differences and similarities between the vowels are explored using consonant-vowel-consonant (CVC) utterances. For this purpose, a Hidden Markov Model (HMM) based recognizer is built to classify the vowels, and its performance is analyzed to help understand the similarities and dissimilarities between the phonetic features of the vowels. The vowels are also analyzed in both the time and frequency domains, and the consistent findings of the analysis are expected to enable future Arabic speech processing tasks such as vowel and speech recognition and classification.
Towards An Intelligent Fuzzy Based Multimodal Two Stage Speech Enhancement System
This thesis presents a novel two-stage multimodal speech enhancement system that uses both visual and audio information to filter speech, and explores the extension of this system with fuzzy logic to demonstrate proof of concept for an envisaged autonomous, adaptive, and context-aware multimodal system. The design of the proposed cognitively inspired framework is scalable: the techniques used in individual parts of the system can be upgraded, and there is scope for the initial framework presented here to be expanded.
In the proposed system, the concept of single-modality two-stage filtering is extended to include the visual modality. Noisy speech received by a microphone array is first pre-processed by visually derived Wiener filtering, which employs the Gaussian Mixture Regression (GMR) technique in a novel way and makes use of associated visual speech information extracted with a state-of-the-art Semi Adaptive Appearance Models (SAAM) based lip tracking approach. This pre-processed speech is then enhanced further by audio-only beamforming using a state-of-the-art Transfer Function Generalised Sidelobe Canceller (TFGSC) approach. The resulting system is designed to function in challenging noisy speech environments and is evaluated using speech sentences from different speakers in the GRID corpus and a range of noise recordings. Both objective and subjective test results (employing the widely used Perceptual Evaluation of Speech Quality (PESQ) measure, a composite objective measure, and subjective listening tests) show that this initial system is capable of delivering very encouraging results with regard to filtering speech mixtures in difficult reverberant speech environments.
Some limitations of this initial framework are identified, and the extension of this multimodal system is explored through the development of a fuzzy logic based framework and the implementation of a proof-of-concept demonstration. Results show that the proposed autonomous, adaptive, and context-aware multimodal framework is capable of delivering very positive results in difficult noisy speech environments, with cognitively inspired use of audio and visual information depending on environmental conditions. Finally, some concluding remarks are made along with proposals for future work.
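As a rough illustration of the Wiener filtering stage described in this abstract, the sketch below computes a per-frequency-bin gain from an estimate of the clean speech power and the observed noisy power. It assumes the standard Wiener gain form (estimated clean power divided by noisy power, clipped to the range 0 to 1). In the thesis the clean-speech estimate is derived from visual features via GMR; here it is simply passed in as a list, and all names and numbers are hypothetical.

```python
def wiener_gains(clean_power_est, noisy_power, floor=1e-12):
    """Per-bin Wiener gain: estimated clean power over noisy power, clipped to [0, 1]."""
    gains = []
    for s, x in zip(clean_power_est, noisy_power):
        g = s / max(x, floor)            # floor guards against division by zero
        gains.append(min(1.0, max(0.0, g)))
    return gains

def apply_gains(noisy_magnitudes, gains):
    """Attenuate each frequency bin of the noisy spectrum by its gain."""
    return [m * g for m, g in zip(noisy_magnitudes, gains)]

# Hypothetical power values for three frequency bins of one frame.
clean_est = [4.0, 1.0, 0.0]    # would come from the visually derived estimate
noisy = [5.0, 4.0, 2.0]        # measured from the microphone signal
gains = wiener_gains(clean_est, noisy)
filtered = apply_gains([2.0, 2.0, 2.0], gains)
```

Bins where the estimated speech dominates are passed almost unchanged, while bins estimated to contain mostly noise are attenuated towards zero. The second stage (beamforming) is not sketched here.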
Application of CBIR techniques for the purpose of biometric identification based on human gait
The intensive development of information and communication technologies has opened the door to the application of biometric technologies in identity management. Human gait is a biometric modality with great potential for practical application because it is noninvasive and nonintrusive. These properties are especially well suited to surveillance settings, and they have drawn considerable academic interest to the modality in recent years. This interest has in turn resulted in the development of a large number of approaches to gait-based person recognition. Nevertheless, the practical application of gait-based biometric technology still trails well-established modalities such as fingerprint, face or voice. The main reason is the lack of an approach that would enable stable use in realistic conditions. The goal of this work is to propose a new approach to human gait recognition that would enable the development of a robust and affordable biometric system. Initially, a comprehensive review of the research area and of current research was carried out, and it served as the basis for the proposed approach. The approach is based on the idea that a human gait sequence can be represented as a single 2D still image. Using images opens the possibility of applying generic Content Based Image Retrieval (CBIR) techniques for the final recognition step. This shifts the problem from the spatio-temporal domain to the spatial domain, specifically the domain of 2D still images, which is well researched and offers a large number of proven solutions. For acquisition, the approach relies on a new human-computer interaction technology, Microsoft Kinect. As a proof of concept, a modular laboratory prototype was developed, together with an environment for testing and evaluation. The scientific soundness of the proposed approach was tested through a series of experiments, organized so as to examine the different factors that can influence final recognition performance when the approach is applied. Based on the obtained results, it can be concluded that the proposed approach is highly robust and achieves high recognition rates.
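The central idea above, collapsing a gait sequence into one still image and then recognizing by image retrieval, can be sketched as follows. The abstract does not say how the single image is built or which CBIR method is used, so this example makes two loudly stated assumptions: frames are averaged pixel by pixel (in the style of an averaged-silhouette image), and retrieval is a plain nearest-neighbour search under an L1 distance. All names and the toy 2x2 frames are hypothetical.

```python
def sequence_to_image(frames):
    """Average a sequence of equally sized 2D frames pixel by pixel (assumed representation)."""
    n = len(frames)
    rows, cols = len(frames[0]), len(frames[0][0])
    return [[sum(f[r][c] for f in frames) / n for c in range(cols)]
            for r in range(rows)]

def l1_distance(a, b):
    """Sum of absolute pixel differences between two images of the same size."""
    return sum(abs(x - y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))

def retrieve(query, gallery):
    """Return the gallery label whose stored image is closest to the query image."""
    return min(gallery, key=lambda label: l1_distance(query, gallery[label]))

# Toy binary silhouette frames (2x2 pixels) for two enrolled, hypothetical subjects.
gallery = {
    "subject_a": sequence_to_image([[[1, 0], [1, 0]], [[1, 0], [0, 1]]]),
    "subject_b": sequence_to_image([[[0, 1], [0, 1]], [[0, 1], [1, 0]]]),
}
query = sequence_to_image([[[1, 0], [1, 0]], [[1, 0], [1, 0]]])
match = retrieve(query, gallery)
```

Once every sequence is reduced to a still image, any image descriptor and similarity measure could replace the L1 distance used here, which is what makes generic retrieval techniques applicable.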