1,513 research outputs found

    An investigation of supervector regression for forensic voice comparison on small data

    Get PDF
    International audienceThe present paper deals with an observer design for a nonlinear lateral vehicle model. The nonlinear model is represented by an exact Takagi-Sugeno (TS) model via the sector nonlinearity transformation. A proportional multiple integral observer (PMIO) based on the TS model is designed to estimate simultaneously the state vector and the unknown input (road curvature). The convergence conditions of the estimation error are expressed under LMI formulation using the Lyapunov theory which guaranties bounded error. Simulations are carried out and experimental results are provided to illustrate the proposed observer

    Fast speaker independent large vocabulary continuous speech recognition [online]

    Get PDF

    Hierachical methods for large population speaker identification using telephone speech

    Get PDF
    This study focuses on speaker identificat ion. Several problems such as acoustic noise, channel noise, speaker variability, large population of known group of speakers wi thin the system and many others limit good SiD performance. The SiD system extracts speaker specific features from digitised speech signa] for accurate identification. These feature sets are clustered to form the speaker template known as a speaker model. As the number of speakers enrolling into the system gets larger, more models accumulate and the interspeaker confusion results. This study proposes the hierarchical methods which aim to split the large population of enrolled speakers into smaller groups of model databases for minimising interspeaker confusion

    Data-driven multivariate and multiscale methods for brain computer interface

    Get PDF
    This thesis focuses on the development of data-driven multivariate and multiscale methods for brain computer interface (BCI) systems. The electroencephalogram (EEG), the most convenient means to measure neurophysiological activity due to its noninvasive nature, is mainly considered. The nonlinearity and nonstationarity inherent in EEG and its multichannel recording nature require a new set of data-driven multivariate techniques to estimate more accurately features for enhanced BCI operation. Also, a long term goal is to enable an alternative EEG recording strategy for achieving long-term and portable monitoring. Empirical mode decomposition (EMD) and local mean decomposition (LMD), fully data-driven adaptive tools, are considered to decompose the nonlinear and nonstationary EEG signal into a set of components which are highly localised in time and frequency. It is shown that the complex and multivariate extensions of EMD, which can exploit common oscillatory modes within multivariate (multichannel) data, can be used to accurately estimate and compare the amplitude and phase information among multiple sources, a key for the feature extraction of BCI system. A complex extension of local mean decomposition is also introduced and its operation is illustrated on two channel neuronal spike streams. Common spatial pattern (CSP), a standard feature extraction technique for BCI application, is also extended to complex domain using the augmented complex statistics. Depending on the circularity/noncircularity of a complex signal, one of the complex CSP algorithms can be chosen to produce the best classification performance between two different EEG classes. Using these complex and multivariate algorithms, two cognitive brain studies are investigated for more natural and intuitive design of advanced BCI systems. Firstly, a Yarbus-style auditory selective attention experiment is introduced to measure the user attention to a sound source among a mixture of sound stimuli, which is aimed at improving the usefulness of hearing instruments such as hearing aid. Secondly, emotion experiments elicited by taste and taste recall are examined to determine the pleasure and displeasure of a food for the implementation of affective computing. The separation between two emotional responses is examined using real and complex-valued common spatial pattern methods. Finally, we introduce a novel approach to brain monitoring based on EEG recordings from within the ear canal, embedded on a custom made hearing aid earplug. The new platform promises the possibility of both short- and long-term continuous use for standard brain monitoring and interfacing applications

    Speech and neural network dynamics

    Get PDF

    Modelling of a speech-to-text recognition system for air traffic control and NATO air command

    Get PDF
    Accent invariance in speech recognition is a challenging problem especially in the are of aviation. In this paper a speech recognition system is developed to transcribe accented speech between pilots and air traffic controllers. The system allows handling of accents in continuous speech by modelling phonemes using Hidden Markov Models (HMMs) with Gaussian mixture model (GMM) probability density functions for each state. These phonemes are used to build word models of the NATO phonetic alphabet as well as the numerals 0 to 9 with transcriptions obtained from the Carnegie Mellon University (CMU) pronouncing dictionary. Mel-Frequency Cepstral Co-efficients (MFCC) with delta and delta-delta coefficients are used for the feature extraction process. Amplitude normalisation and covariance scaling is implemented to improve recognition accuracy. A word error rate (WER) of 2% for seen speakers and 22% for unseen speakers is obtained.http://jit.ndhu.edu.twhj2023Electrical, Electronic and Computer Engineerin

    Identity Verification with Speech Recognition : A Study

    Get PDF
    Since verification and authentication based on speech recognition are important develop-ments in the field of information security, this study revolves around the topic of speech-based Identity verification. The goal was to create a speech recognition system that can be embedded to an existing user authentication system. For this purpose, the study relied on a free speech modelling tool and modelled a limited dictionary for speech recognition. Speech samples were collected for training and testing purposes. The training sample was collected from three respondents and the testing data from seven, including those from previous three respondents. A freely available tool, HTK Toolkit, was used to train the model with speech samples recorded from respondents. The results showed that the accuracy of the model was dependent on whether the training da-taset included the users’ speech or not. The study supports the significance of real use of the model. However, the scope of the study is not large enough for authentication and verification system
    • …
    corecore