747 research outputs found

Automatic Emotion Recognition from Mandarin Speech

Author: Gu Yu
Publication venue: [s.n.]
Publication date: 01/01/2018

Tilburg University Repository

Evaluation and analysis of hybrid intelligent pattern recognition techniques for speaker identification

Author: Almaadeed Noor
Publication venue: Brunel University School of Engineering and Design PhD Theses
Publication date: 01/01/2014

This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. The rapid momentum of technological progress in recent years has led to a tremendous rise in the use of biometric authentication systems. The objective of this research is to investigate the problem of identifying a speaker from their voice regardless of the content (i.e. text-independent), and to design efficient methods of combining face and voice in producing a robust authentication system. A novel approach towards speaker identification is developed using wavelet analysis and multiple neural networks, including Probabilistic Neural Network (PNN), General Regressive Neural Network (GRNN) and Radial Basis Function Neural Network (RBF NN), with the AND voting scheme. This approach is tested on the GRID and VidTIMIT corpora, and comprehensive test results have been validated against state-of-the-art approaches. The system was found to be competitive: it improved the recognition rate by 15% compared to the classical Mel-frequency Cepstral Coefficients (MFCC), and reduced the recognition time by 40% compared to Back Propagation Neural Network (BPNN), Gaussian Mixture Models (GMM) and Principal Component Analysis (PCA). Another novel approach using vowel formant analysis is implemented using Linear Discriminant Analysis (LDA). Vowel formant based speaker identification is best suited to real-time implementation and requires only a few bytes of information to be stored for each speaker, making it both storage and time efficient. Tested on GRID and VidTIMIT, the proposed scheme was found to be 85.05% accurate when Linear Predictive Coding (LPC) is used to extract the vowel formants, which is much higher than the accuracy of BPNN and GMM. Since the proposed scheme does not require any training time other than creating a small database of vowel formants, it is faster as well. Furthermore, an increasing number of speakers makes it difficult for BPNN and GMM to sustain their accuracy, but the proposed score-based methodology stays almost linear. Finally, a novel audio-visual fusion based identification system is implemented using GMM and MFCC for speaker identification and PCA for face recognition. The results of speaker identification and face recognition are fused at different levels, namely the feature, score and decision levels. Both the score-level and decision-level (with OR voting) fusions were shown to outperform the feature-level fusion in terms of accuracy and error resilience. The result is in line with the distinct nature of the two modalities, which lose themselves when combined at the feature level. The GRID and VidTIMIT test results validate that the proposed scheme is one of the best candidates for the fusion of face and voice due to its low computational time and high recognition accuracy.

Brunel University Research Archive

A Structured Model of Video Reproduces Primary Visual Cortical Organisation

Author: A Hyvärinen
A Yuille
AJ Bell
AL Humphrey
AP Dempster
AS Kayser
BA Olshausen
BC Skottun
BT Vincent
BW Mel
BY Betsch
C Blakemore
C von der Malsburg
CKI Williams
CM Bishop
D Pollen
D Ringach
DA Ross
DB Chklovskii
DB Grimes
DH Hubel
DJ Field
DJC MacKay
DL Ringach
E Sudderth
EH Adelson
F Attneave
F Mechler
F Sengpiel
FS Chance
GC DeAngelis
GJ Goodhill
H Attias
H Barlow
HB Barlow
HVB Hirsch
I Biederman
J Cremieux
J Lücke
J Touryan
JB Tenenbaum
JH van Hateren
K Friston
KD Miller
Konrad Kording
KP Körding
L Wiskott
L Zhu
M Franzius
M Rehn
Maneesh Sahani
MJ Beal
MJ Wainwright
N Jojic
NC Rust
NV Swindale
O Schwartz
P Berkes
P Földiák
Pietro Berkes
PO Hoyer
PZ Marmarelis
R de Ruyter van Steveninck
R Linsker
R Turner
RE Turner
Richard E. Turner
RL De Valois
RM Neal
S Tanaka
SM Stringer
TS Lee
VB Mountcastle
W Einhäuser
X Chen
Y Karklin
Publication venue: Public Library of Science
Publication date: 01/01/2008

The visual system must learn to infer the presence of objects and features in the world from the images it encounters, and as such it must, either implicitly or explicitly, model the way these elements interact to create the image. Do the response properties of cells in the mammalian visual system reflect this constraint? To address this question, we constructed a probabilistic model in which the identity and attributes of simple visual elements were represented explicitly and learnt the parameters of this model from unparsed, natural video sequences. After learning, the behaviour and grouping of variables in the probabilistic model corresponded closely to functional and anatomical properties of simple and complex cells in the primary visual cortex (V1). In particular, feature identity variables were activated in a way that resembled the activity of complex cells, while feature attribute variables responded much like simple cells. Furthermore, the grouping of the attributes within the model closely parallelled the reported anatomical grouping of simple cells in cat V1. Thus, this generative model makes explicit an interpretation of complex and simple cells as elements in the segmentation of a visual scene into basic independent features, along with a parametrisation of their moment-by-moment appearances. We speculate that such a segmentation may form the initial stage of a hierarchical system that progressively separates the identity and appearance of more articulated visual elements, culminating in view-invariant object recognition

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

UCL Discovery

Communication and Automatic Interpretation of Affect from Facial Expressions

Author: Gevers T.
Salah A.A.
Sebe N.
Publication venue: Information Science Reference
Publication date: 01/01/2010

International Migration, Integration and Social Cohesion online publications

Combining timbric and rhythmic features for semantic music tagging

In this thesis we propose a novel approach to semantic music tagging. The project uses a modified Hidden Markov Model to semantically link two acoustic features. We make the assumption that acoustically similar songs have similar tags. We model our known collection as a graph where the states represent the songs and the model's probabilities are related to the timbric and rhythmic similarity. Tags are inferred from songs in acoustically meaningful paths, all starting from the query song.

Padua Thesis and Dissertation Archive

Novel Methods for Forensic Multimedia Data Analysis: Part I

Author: Perner Petra
Publication venue: IntechOpen
Publication date: 02/06/2020

The increased usage of digital media in daily life has resulted in the demand for novel multimedia data analysis techniques that can help to use these data for forensic purposes. Processing of such data for police investigation and as evidence in a court of law, such that data interpretation is reliable, trustworthy, and efficient in terms of human time and other resources required, will help greatly to speed up investigation and make investigation more effective. If such data are to be used as evidence in a court of law, techniques that can confirm origin and integrity are necessary. In this chapter, we are proposing a new concept for new multimedia processing techniques for varied multimedia sources. We describe the background and motivation for our work. The overall system architecture is explained. We present the data to be used. After a review of the state of the art of related work of the multimedia data we consider in this work, we describe the method and techniques we are developing that go beyond the state of the art. The work will be continued in a Chapter Part II of this topic

IntechOpen

Crossref

Map-Building and Position Estimation in Mobile Robots Using Self-Organizing Maps

Author: Palamas George
Publication date: 01/01/2015

University of South Wales Research Explorer