
    Speaker identification using multimodal neural networks and wavelet analysis

    © 2014 The Authors. Published by IET. This is an open access article available under a Creative Commons licence. The published version can be accessed on the publisher's website: https://doi.org/10.1049/iet-bmt.2014.0011
    The rapid pace of technological progress in recent years has led to a tremendous rise in the use of biometric authentication systems. The objective of this research is to investigate the problem of identifying a speaker from his or her voice regardless of the content. In this study, the authors designed and implemented a novel text-independent multimodal speaker identification system based on wavelet analysis and neural networks. Wavelet analysis comprises the discrete wavelet transform, the wavelet packet transform, wavelet sub-band coding and Mel-frequency cepstral coefficients (MFCCs). The learning module comprises general regression, probabilistic and radial basis function neural networks, forming decisions through a majority voting scheme. The system was found to be competitive: it improved the identification rate by 15% compared with a classical MFCC baseline, and it reduced the identification time by 40% compared with the back-propagation neural network, the Gaussian mixture model and principal component analysis. Performance tests conducted on the GRID corpus show that this approach achieves faster identification and greater accuracy than traditional approaches, and that it is applicable to real-time, text-independent speaker identification systems.
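    To make the decision stage concrete, the following is a minimal Python sketch of a majority-voting ensemble over a wavelet-based front end. The PyWavelets features, function names and classifier outputs are illustrative assumptions, not the paper's actual networks or features.

    import numpy as np
    import pywt  # PyWavelets: a stand-in for the wavelet front end
    from collections import Counter

    def dwt_subband_energies(signal, wavelet="db4", level=3):
        # Energy of each DWT sub-band as a compact utterance descriptor,
        # loosely in the spirit of wavelet sub-band coding (assumed setup).
        coeffs = pywt.wavedec(signal, wavelet, level=level)
        return np.array([float(np.sum(c ** 2)) for c in coeffs])

    def majority_vote(predictions):
        # Combine speaker labels from several classifiers; `predictions`
        # is a list of label arrays, one per classifier, shape (n_samples,).
        stacked = np.stack(predictions)
        voted = []
        for sample_labels in stacked.T:  # iterate over samples
            label, _ = Counter(sample_labels).most_common(1)[0]
            voted.append(label)
        return np.array(voted)

    # Hypothetical usage: three classifiers (standing in for the general
    # regression, probabilistic and RBF networks) each predict a speaker
    # ID for four utterances; the ensemble keeps the per-utterance majority.
    grnn_pred = np.array([3, 1, 2, 2])
    pnn_pred = np.array([3, 1, 1, 2])
    rbf_pred = np.array([3, 2, 2, 2])
    print(majority_vote([grnn_pred, pnn_pred, rbf_pred]))  # -> [3 1 2 2]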

    Multi-modal association learning using spike-timing dependent plasticity (STDP)

    We propose an associative learning model that can integrate facial images with speech signals to target a subject in a reinforcement learning (RL) paradigm. In this approach, the learning rules involve associating paired stimuli (stimulus–stimulus, i.e., face–speech), also known as predictor–choice pairs. Prior to a learning simulation, we extract the features of the biometrics used in the study. For facial features, we experiment with two approaches: principal component analysis (PCA)-based Eigenfaces and singular value decomposition (SVD). For speech features, we use wavelet packet decomposition (WPD). The experiments show that the PCA-based Eigenfaces feature extraction approach produces better results than SVD. We implement the proposed learning model using the spike-timing-dependent plasticity (STDP) algorithm, which depends on the timing and rate of pre- and post-synaptic spikes. The key contribution of our study is the implementation of learning rules via STDP and firing rate in spatiotemporal neural networks based on the Izhikevich spiking model. We implement response-group association following reward-modulated STDP in the RL setting, wherein the firing rate of the response groups determines the reward that is given. We perform a number of experiments using existing face samples from the Olivetti Research Laboratory (ORL) dataset and speech samples from TIDigits. After several experiments and simulations to recognize a subject, the results show that the proposed learning model can associate the predictor (face) with the choice (speech) at optimum performance rates of 77.26% and 82.66% for training and testing, respectively. We also perform learning on real data, that is, a sample of face–speech data collected in a manner similar to that of the initial data; the performance results are 79.11% and 77.33% for training and testing, respectively. Based on these results, the proposed learning model can produce high learning performance when combining heterogeneous data (face–speech). This finding opens possibilities for expanding RL in the field of biometric authentication.
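    To make the two mechanisms named above concrete, here is a minimal Python sketch of one Izhikevich membrane update and a pair-based STDP weight change. The parameter values are textbook defaults (a regular-spiking neuron, 20 ms STDP time constants), not values from the study, and the reward modulation is only noted in a comment.

    import numpy as np

    def izhikevich_step(v, u, i_in, a=0.02, b=0.2, c=-65.0, d=8.0, dt=1.0):
        # One Euler step of the Izhikevich model; a spike is registered
        # when the membrane potential v crosses 30 mV, then v and u reset.
        fired = v >= 30.0
        v = np.where(fired, c, v)
        u = np.where(fired, u + d, u)
        dv = 0.04 * v ** 2 + 5.0 * v + 140.0 - u + i_in
        du = a * (b * v - u)
        return v + dt * dv, u + dt * du, fired

    def stdp_dw(delta_t, a_plus=0.10, a_minus=0.12, tau_plus=20.0, tau_minus=20.0):
        # Pair-based STDP: delta_t = t_post - t_pre in ms. Pre-before-post
        # (delta_t > 0) potentiates the synapse; post-before-pre depresses
        # it. Under reward-modulated STDP, this change would additionally
        # be scaled by a reward signal derived from response-group firing.
        if delta_t > 0:
            return a_plus * np.exp(-delta_t / tau_plus)
        if delta_t < 0:
            return -a_minus * np.exp(delta_t / tau_minus)
        return 0.0

    print(stdp_dw(+5.0))  # potentiation for a causal pre -> post pairing
    print(stdp_dw(-5.0))  # depression for an anti-causal pairing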

    UR-FUNNY: A Multimodal Language Dataset for Understanding Humor

    Humor is a unique and creative communicative behavior displayed during social interactions. It is produced in a multimodal manner, through the use of words (text), gestures (vision) and prosodic cues (acoustic). Understanding humor from these three modalities falls within the boundaries of multimodal language, a recent research trend in natural language processing (NLP) that models natural language as it occurs in face-to-face communication. Although humor detection is an established research area in NLP, it remains understudied in a multimodal context. This paper presents a diverse multimodal dataset, called UR-FUNNY, to open the door to understanding the multimodal language used in expressing humor. The dataset and accompanying studies present a framework for multimodal humor detection for the natural language processing community. UR-FUNNY is publicly available for research.
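    As a rough illustration of what a multimodal humor model consumes, the sketch below fuses per-utterance text, vision and acoustic feature vectors with a simple linear probe. The feature dimensions and the random inputs are placeholders and do not reflect UR-FUNNY's actual released feature format.

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical per-utterance features for the three modalities; the
    # dimensions are illustrative, not the dataset's actual feature sizes.
    text_f = rng.normal(size=300)     # e.g. averaged word embeddings
    vision_f = rng.normal(size=35)    # e.g. facial gesture descriptors
    acoustic_f = rng.normal(size=74)  # e.g. prosodic / acoustic features

    # Early (feature-level) fusion: concatenate, then apply a linear probe.
    fused = np.concatenate([text_f, vision_f, acoustic_f])
    w = rng.normal(size=fused.shape[0]) * 0.01  # untrained placeholder weights
    b = 0.0
    p_humor = 1.0 / (1.0 + np.exp(-(w @ fused + b)))  # sigmoid -> P(humorous)
    print(f"P(humorous) = {p_humor:.3f}")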

    Multimodal Biometrics Enhancement Recognition System based on Fusion of Fingerprint and PalmPrint: A Review

    This article is an overview of current multimodal biometrics research based on fingerprint and palm-print. It reviews previous studies of each modality separately and of its fusion with other biometric modalities. The basic biometric system consists of four stages: firstly, the sensor, which is used for enrolment…
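    The abstract is truncated after the first stage; the four-stage pipeline it refers to is conventionally acquisition, feature extraction, matching and decision, and that assumption is what the Python sketch below lays out. Every function body is a hypothetical stand-in, not an algorithm from the article.

    import numpy as np

    def acquire(sensor):
        # Stage 1: the sensor captures a raw sample, both at enrolment
        # and at later presentations.
        return sensor()

    def extract(sample):
        # Stage 2: reduce the raw sample to a feature vector; here just
        # a placeholder normalisation.
        return (sample - sample.mean()) / (sample.std() + 1e-8)

    def match(probe, template):
        # Stage 3: similarity between probe features and the stored
        # enrolment template (cosine similarity as a stand-in).
        return float(probe @ template /
                     (np.linalg.norm(probe) * np.linalg.norm(template)))

    def decide(score, threshold=0.8):
        # Stage 4: accept or reject based on the match score.
        return score >= threshold

    rng = np.random.default_rng(1)
    sensor = lambda: rng.normal(size=64)              # hypothetical fingerprint sensor
    raw = acquire(sensor)
    template = extract(raw)                           # enrolment
    probe = extract(raw + 0.1 * rng.normal(size=64))  # same finger, sensor noise
    print(decide(match(probe, template)))             # -> True for a genuine match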

    Person Identification Using Multimodal Biometrics under Different Challenges

    The main aims of this chapter are to show the importance and role of human identification and recognition in the field of human–robot interaction; to discuss the methods of person identification, namely traditional and biometric systems; and to compare the most commonly used biometric traits in recognition systems, such as face, ear, palmprint, iris, and speech. By comparing the requirements, advantages, disadvantages, recognition algorithms, challenges, and experimental results for each trait, the most suitable and efficient biometric trait for human–robot interaction is identified. The chapter also discusses the cases of human–robot interaction that require a unimodal biometric system, and why a multimodal biometric system is also required. Finally, two fusion methods for the multimodal biometric system are presented and compared, as sketched below.
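    The excerpt does not name which two fusion methods the chapter compares, so the sketch below simply illustrates the two most common options, feature-level and score-level fusion, with made-up face and speech inputs.

    import numpy as np

    def feature_level_fusion(face_f, speech_f):
        # Fusion before matching: normalise each modality's feature
        # vector and concatenate into one joint representation.
        norm = lambda x: x / (np.linalg.norm(x) + 1e-8)
        return np.concatenate([norm(face_f), norm(speech_f)])

    def score_level_fusion(face_score, speech_score, w_face=0.6):
        # Fusion after matching: weighted sum of per-modality match
        # scores, both assumed in [0, 1]; the weight is a tunable assumption.
        return w_face * face_score + (1.0 - w_face) * speech_score

    rng = np.random.default_rng(2)
    joint = feature_level_fusion(rng.normal(size=128), rng.normal(size=40))
    print(joint.shape)                     # -> (168,)
    print(score_level_fusion(0.82, 0.64))  # -> ~0.748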