76,229 research outputs found

Towards a Multimodal Time-Based Empathy Prediction System

Author: Barbieri F.
del Prado Martin F. M.
Guizzo E.
Lucchesi F.
Maffei G.
Weyde T.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2019

We describe our system for empathic emotion recognition. It is based on deep learning on multiple modalities in a late fusion architecture. We describe the modules of our system and discuss the evaluation results. Our code is also available to the research community.

City Research Online

Crossref

VOICE RECOGNITION SECURITY SYSTEM USING MEL-FREQUENCY CEPSTRUM COEFFICIENTS

Author: A Sharmila
M Muruganandam
Mahalakshmi P
Publication venue: 'Innovare Academic Sciences Pvt Ltd'
Publication date: 01/12/2016

Objective: Voice recognition is a fascinating field spanning several areas of computer science and mathematics. Reliable speaker recognition is a hard problem, requiring a combination of many techniques; however, modern methods have been able to achieve an impressive degree of accuracy. The objective of this work is to examine various speech and speaker recognition techniques and to apply them to build a simple voice recognition system. Method: The project is implemented in software using techniques such as Mel Frequency Cepstrum Coefficients (MFCC) and Vector Quantization (VQ), both implemented in MATLAB. Results: MFCC is used to extract the characteristics from the input speech signal with respect to a particular word uttered by a particular speaker. A VQ codebook is generated by clustering the training feature vectors of each speaker and is then stored in the speaker database. Conclusion: Verification of the speaker is carried out using Euclidean distance. For voice recognition we implement the MFCC approach using the software platform MATLAB R2013b. Keywords: Mel-frequency cepstrum coefficient, Vector quantization, Voice recognition, Hidden Markov model, Euclidean distance
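
The codebook matching step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' MATLAB implementation: the function names, the toy 2-D vectors standing in for MFCC frames, the pre-built codebooks, and the acceptance threshold are all assumptions made for the example.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def average_distortion(features, codebook):
    """Mean distance from each feature vector to its nearest codeword."""
    nearest = [min(euclidean(f, c) for c in codebook) for f in features]
    return sum(nearest) / len(nearest)

def identify_speaker(features, codebooks):
    """Return the enrolled speaker whose codebook gives the lowest distortion."""
    scores = {name: average_distortion(features, cb)
              for name, cb in codebooks.items()}
    return min(scores, key=scores.get), scores

def verify(features, codebook, threshold):
    """Accept the claimed identity if the distortion is below a threshold."""
    return average_distortion(features, codebook) < threshold

# Hypothetical codebooks; in practice they would come from clustering
# each speaker's training feature vectors.
codebooks = {
    "speaker_a": [(0.0, 0.0), (1.0, 1.0)],
    "speaker_b": [(5.0, 5.0), (6.0, 6.0)],
}
test_features = [(0.1, -0.1), (0.9, 1.2)]  # toy stand-ins for MFCC frames

best, scores = identify_speaker(test_features, codebooks)
print(best, scores)
```

Averaging the nearest-codeword distance over all frames makes the score independent of utterance length, which is why a single threshold can be used for verification in this sketch.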

Innovare Academic Sciences: E-Journals

Fusion of Audio and Visual Information for Implementing Improved Speech Recognition System

Author: Acharya Vikrant Satish
Publication venue: ScholarWorks@GVSU
Publication date: 01/04/2018

Speech recognition is a useful technology because it enables applications suited to a wide range of user needs. This research is an attempt to enhance the performance of a speech recognition system by combining visual features (lip movement) with audio features. The results were calculated using utterances of numerals collected from both male and female participants. Discrete Cosine Transform (DCT) coefficients were used for computing visual features and Mel Frequency Cepstral Coefficients (MFCC) were used for computing audio features. The classification was then carried out using a Support Vector Machine (SVM). The results obtained from the combined/fused system were compared with the recognition rates of two standalone systems (audio only and visual only).

Scholarworks@GVSU

Optimal Representation of Anuran Call Spectrum in Environmental Monitoring Systems Using Wireless Sensor Networks

Author: Aguayo-González Francisco (Coordinator)
Barbancho Concejero Julio
Carrasco Muñoz Alejandro
Gómez-Bellido Jesús
León de Mora Carlos (Coordinator)
Luque Sendra Amalia
Publication venue: 'MDPI AG'
Publication date: 01/01/2018

The analysis and classification of the sounds produced by certain animal species, notably anurans, have revealed these amphibians to be a potentially strong indicator of temperature fluctuations and therefore of the existence of climate change. Environmental monitoring systems using Wireless Sensor Networks are therefore of interest to obtain indicators of global warming. For the automatic classification of the sounds recorded on such systems, the proper representation of the sound spectrum is essential since it contains the information required for cataloguing anuran calls. The present paper focuses on this process of feature extraction by exploring three alternatives: the standardized MPEG-7, the Filter Bank Energy (FBE), and the Mel Frequency Cepstral Coefficients (MFCC). Moreover, various values for every option in the extraction of spectrum features have been considered. Throughout the paper, it is shown that representing the frame spectrum with pure FBE offers slightly worse results than using the MPEG-7 features. This performance can easily be increased, however, by rescaling the FBE in two dimensions: vertically, by taking the logarithm of the energies; and horizontally, by applying mel scaling in the filter banks. On the other hand, representing the spectrum in the cepstral domain, as in MFCC, has shown additional marginal improvements in classification performance. University of Seville: Telefónica Chair "Intelligence Networks
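
The two rescalings mentioned in this abstract (logarithm of the energies, mel spacing of the filter bank) and the final move to the cepstral domain can be sketched as below. This is a hedged illustration only: the mel formula used (2595 log10(1 + f/700)) is one common convention and may differ from the paper's, and the band count, frequency range, and energy values are hypothetical.

```python
import math

def hz_to_mel(f_hz):
    """One widely used mel-scale formula (an assumed convention)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_band_edges(n_bands, f_min, f_max):
    """n_bands + 2 edge frequencies in Hz, equally spaced on the mel scale."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_bands + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_bands + 2)]

def log_energies(energies, floor=1e-10):
    """'Vertical' rescaling: natural log of each band energy, floored."""
    return [math.log(max(e, floor)) for e in energies]

def dct2(x):
    """Unnormalised DCT-II, mapping log energies to cepstral coefficients."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi * k * (i + 0.5) / n) for i in range(n))
            for k in range(n)]

# Hypothetical setup: 4 bands between 0 and 4000 Hz, flat band energies.
edges = mel_band_edges(n_bands=4, f_min=0.0, f_max=4000.0)
band_energies = [2.0, 2.0, 2.0, 2.0]
cepstrum = dct2(log_energies(band_energies))
print(edges)
print(cepstrum)
```

With flat band energies, only the zeroth cepstral coefficient is non-zero, which illustrates how the DCT separates overall level from spectral shape.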

idUS. Depósito de Investigación Universidad de Sevilla

Designing Gabor windows using convex optimization

Author: Balazs Peter
Holighaus Nicki
Perraudin Nathanaël
Søndergaard Peter L.
Publication date: 11/04/2018

Redundant Gabor frames admit an infinite number of dual frames, yet only the canonical dual Gabor system, constructed from the minimal l2-norm dual window, is widely used. This window function, however, might lack desirable properties, e.g. good time-frequency concentration, small support or smoothness. We employ convex optimization methods to design dual windows satisfying the Wexler-Raz equations and optimizing various constraints. Numerical experiments suggest that alternate dual windows with considerably improved features can be found.

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

Spoken Word Recognition Using Hidden Markov Model

Author: Ramesh P
Publication date: 01/01/2013

The main aim of this project is to develop an isolated spoken word recognition system using a Hidden Markov Model (HMM) with good accuracy across the full frequency range of the human voice. Ten different words are recorded by different speakers, both male and female, and results are compared across different feature extraction methods. Earlier work includes recognition of seven small utterances using HMM with only one feature extraction method. The spoken word recognition system is divided into two major blocks. The first covers recording the database and extracting features from the recorded signals. Here we use Mel frequency cepstral coefficients, linear frequency cepstral coefficients and fundamental frequency as features. To obtain Mel frequency cepstral coefficients, the signal goes through the following steps: pre-emphasis, framing, windowing, Fast Fourier transform, filter bank and then discrete cosine transform, whereas linear frequency cepstral coefficients do not use the Mel frequency scale. The second block describes the HMM used for modeling and recognizing the spoken words. All the training samples are clustered using the K-means algorithm. The modeling parameters are those of a Gaussian mixture: mean, variance and weight. The Baum-Welch algorithm is used to train on the samples and re-estimate the parameters. Finally, the Viterbi algorithm finds the best state sequence for the spoken utterance to be recognized. All simulations are done using MATLAB on the Microsoft Windows 7 operating system.
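
The Viterbi decoding step named in this abstract can be sketched as below. This is a simplified illustration under stated assumptions: it uses discrete observation symbols and plain probabilities, whereas the thesis models emissions with Gaussian mixtures, and the state names, symbols, and probability tables are invented toy values.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable state path and its probability."""
    # best[t][s] = (probability of the best path ending in s at time t,
    #               state at time t-1 on that path)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None)
             for s in states}]
    for obs in observations[1:]:
        prev = best[-1]
        column = {}
        for s in states:
            p, arg = max((prev[r][0] * trans_p[r][s], r) for r in states)
            column[s] = (p * emit_p[s][obs], arg)
        best.append(column)
    # Backtrack from the most probable final state.
    last = max(states, key=lambda s: best[-1][s][0])
    prob = best[-1][last][0]
    path = [last]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path, prob

# Toy two-state model with three discrete symbols (all values hypothetical).
states = ["S1", "S2"]
start_p = {"S1": 0.6, "S2": 0.4}
trans_p = {"S1": {"S1": 0.7, "S2": 0.3},
           "S2": {"S1": 0.4, "S2": 0.6}}
emit_p = {"S1": {"x": 0.5, "y": 0.4, "z": 0.1},
          "S2": {"x": 0.1, "y": 0.3, "z": 0.6}}

path, prob = viterbi(["x", "y", "z"], states, start_p, trans_p, emit_p)
print(path, prob)  # path is ['S1', 'S1', 'S2']
```

In an isolated-word recognizer, one such model would typically be trained per word and the word whose model scores the utterance highest would be chosen; for long utterances, log probabilities are usually preferred to avoid numerical underflow.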

ethesis@nitr

Speech recognition system based on Hidden Markov Model concerning the Moroccan dialect DARIJA

Author: C. DAOUI
Dr. A. EL GHAZI
Publication venue: Global Journals Inc. (US)
Publication date: 23/08/2011

In this work, we present a system for automatic speech recognition of the Moroccan dialect. We used the hidden Markov model to model the phonetic units corresponding to words taken from the training set. The results obtained are very encouraging given the size of the training set and the number of speakers recorded. To demonstrate the flexibility of the hidden Markov model, we compared its results with those obtained by dynamic programming.

Global Journal of Computer Science and Technology (GJCST)

Machine Analysis of Facial Expressions

Author: Bartlett M.S.
Pantic M.
Publication venue: I-Tech Education and Publishing
Publication date: 01/01/2007

No abstract

IntechOpen

CiteSeerX

Crossref

University of Twente Research Information

Speaker Gender Recognition Using Hidden Markov Model

Author: abdulkafor Abeer
Al-Irhayim Yusra
Publication venue: The International Institute for Science, Technology and Education (IISTE)
Publication date: 28/04/2016

Gender is an important demographic attribute of people. With the evolution of modern technologies in various fields of life and the entry of computer systems into all applications, human speech processing and speaker recognition technologies have advanced rapidly. In this research we build a system that distinguishes the gender of the speaker from the audio information obtained from the speech signal. The system passes through four phases: initial processing; feature extraction, for which we use the Mel Frequency Cepstral Coefficients (MFCC) technique; training, in which the Expectation-Maximization (EM) algorithm is used; and testing, in which hidden Markov models are applied. All algorithms and programs were written in Matlab. Keywords: Gender Recognition, Hidden Markov Model, Mel Frequency Cepstral Coefficients, Speech Recognition

International Institute for Science, Technology and Education (IISTE): E-Journals