
    An application of an auditory periphery model in speaker identification

    The number of applications of automatic speaker identification (SID) is growing as technologies for secure access and authentication in services and devices advance. In a 2016 study, the Cascade of Asymmetric Resonators with Fast-Acting Compression (CAR-FAC) cochlear model achieved the best fit to a set of human auditory physiological data among seven recent cochlear models. Motivated by this performance, I apply the CAR-FAC to an SID task for the first time, aiming for performance similar to that of the human auditory system. This thesis investigates the potential of the CAR-FAC model in text-dependent and text-independent SID tasks, and examines the contributions of different CAR-FAC parameters, nonlinearities, and stages to SID accuracy. The performance of the CAR-FAC is compared with that of another recent cochlear model, the Auditory Nerve (AN) model. In addition, three FFT-based auditory features are included for comparison with the cochlear features: Mel-Frequency Cepstral Coefficients (MFCC), Frequency Domain Linear Prediction (FDLP), and Gammatone Frequency Cepstral Coefficients (GFCC). This comparison allows me to investigate a better front-end for a noise-robust SID system. Three statistical classifiers are used to evaluate performance: a Gaussian Mixture Model with Universal Background Model (GMM-UBM), a Support Vector Machine (SVM), and an i-vector system. These classifiers make it possible to study the nonlinearities in the cochlear front-ends. Performance is evaluated under clean and noisy conditions across a wide range of noise levels. Techniques for improving the performance of a cochlear algorithm are also investigated; in particular, applying a cube-root compression and a DCT to the cochlear output is found to improve SID accuracy substantially.
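    The cube-root-plus-DCT post-processing mentioned at the end can be illustrated with a minimal Python sketch, assuming a generic matrix of non-negative cochlear filterbank energies (channels by frames) as input; this illustrates the idea only, not the thesis's CAR-FAC code, and the 13-coefficient truncation is an assumed choice.

        # Minimal sketch: cube-root compression followed by a DCT across
        # channels, applied to a stand-in cochleagram (not CAR-FAC output).
        import numpy as np
        from scipy.fft import dct

        def cochlear_features(cochleagram, n_coeffs=13):
            """cochleagram: (n_channels, n_frames) non-negative filterbank energies."""
            compressed = np.cbrt(cochleagram)                     # cube-root amplitude compression
            ceps = dct(compressed, type=2, axis=0, norm='ortho')  # decorrelate across channels
            return ceps[:n_coeffs, :]                             # keep the low-order coefficients

        # Usage with random stand-in data (real input would come from a cochlear model)
        energies = np.abs(np.random.default_rng(0).normal(size=(64, 200)))  # 64 channels, 200 frames
        features = cochlear_features(energies)
        print(features.shape)                                     # (13, 200)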

    Online signature verification systems on a low-cost FPGA

    This paper describes three different approaches for the implementation of an online signature verification system on a low-cost FPGA. The system is based on an algorithm that operates on real numbers using the double-precision floating-point IEEE 754 format. The double-precision computations are replaced by simpler formats, without affecting the biometric performance, in order to permit efficient implementations on low-cost FPGA families. The first approach is an embedded system based on MicroBlaze, a 32-bit soft-core microprocessor designed for Xilinx FPGAs, which can be configured to include a single-precision floating-point unit (FPU). The second implementation attaches a hardware accelerator to the embedded system to reduce the execution time on floating-point vectors. The last approach is a custom computing system, built from a large set of arithmetic circuits that replace the floating-point data with a more efficient representation based on a fixed-point format. The latter system provides a very high runtime acceleration factor at the expense of using a large number of FPGA resources, a complex development cycle, and no flexibility, since it cannot be adapted to other biometric algorithms. By contrast, the first system provides just the opposite features, while the second approach is a mixed solution between the two. The experimental results show that the hardware accelerator and the custom computing system reduce the execution time by factors of ×7.6 and ×201, respectively, but increase the FPGA logic resources by factors of ×2.3 and ×5.2, in comparison with the MicroBlaze embedded system. This research was funded by Spanish MCIN/AEI/10.13039/501100011033, grant number PID2019-107274RB-I00.
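    The replacement of double-precision arithmetic by a fixed-point representation can be sketched in a few lines; the Q16.16 word length, the squared-Euclidean frame distance, and the 32-element feature frames below are illustrative assumptions, not the paper's actual datapath.

        # Hedged sketch: quantize features to a Q16.16 fixed-point format and
        # compute a squared distance in integer arithmetic, then compare the
        # result against the double-precision reference.
        import numpy as np

        FRAC_BITS = 16                               # Q16.16: 16 integer + 16 fractional bits

        def to_fixed(x):
            return np.round(np.asarray(x) * (1 << FRAC_BITS)).astype(np.int64)

        def fixed_sq_distance(a_fx, b_fx):
            diff = a_fx - b_fx                       # still scaled by 2**FRAC_BITS
            # each product is scaled by 2**(2*FRAC_BITS); shift one factor back out
            return np.sum((diff * diff) >> FRAC_BITS)

        a = np.random.rand(32)                       # e.g. one frame of signature features
        b = np.random.rand(32)
        fx = fixed_sq_distance(to_fixed(a), to_fixed(b)) / (1 << FRAC_BITS)
        fp = np.sum((a - b) ** 2)
        print(abs(fx - fp))                          # small quantization error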

    A Fully-Pipelined Hardware Design for Gaussian Mixture Models

    Gaussian Mixture Models (GMMs) are widely used in many applications such as data mining, signal processing, and computer vision, for probability density modeling and soft clustering. However, the parameters of a GMM need to be estimated from data by, for example, the Expectation-Maximization algorithm for Gaussian Mixture Models (EM-GMM), which is computationally demanding. This paper presents a novel design for the EM-GMM algorithm targeting reconfigurable platforms, with five main contributions. First, a pipeline-friendly EM-GMM with diagonal covariance matrices that can easily be mapped to hardware architectures. Second, a function evaluation unit for the Gaussian probability density based on fixed-point arithmetic. Third, an extension of our approach to support a wide range of dimensions and/or components by fitting multiple pieces of smaller dimensions onto an FPGA chip. Fourth, a cost and performance model that estimates logic resources. Fifth, a dataflow design targeting the Maxeler MPCX2000 with a Stratix-5SGSD8 FPGA that runs over 200 times faster than a 6-core Xeon E5645 processor and over 39 times faster than a Pascal TITAN-X GPU. Our design provides a practical solution for training GMMs and exploring better parameters with hundreds of millions of high-dimensional input instances, for low-latency and high-performance applications.
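    A compact numpy reference of EM for a diagonal-covariance GMM, the algorithm the hardware design accelerates, is shown below; it is a plain software sketch of the standard update equations, not the pipelined architecture, and the initialization and iteration count are assumptions.

        # Software reference: EM for a Gaussian mixture with diagonal covariances.
        import numpy as np

        def em_gmm_diag(X, K, n_iter=50, eps=1e-6):
            N, D = X.shape
            rng = np.random.default_rng(0)
            mu = X[rng.choice(N, K, replace=False)]              # initialize means from data points
            var = np.tile(X.var(axis=0) + eps, (K, 1))
            pi = np.full(K, 1.0 / K)
            for _ in range(n_iter):
                # E-step: log N(x | mu_k, diag(var_k)) + log pi_k for every point/component
                log_p = (-0.5 * (np.log(2 * np.pi * var).sum(axis=1)
                                 + ((X[:, None, :] - mu) ** 2 / var).sum(axis=2))
                         + np.log(pi))
                log_p -= log_p.max(axis=1, keepdims=True)        # numerical stability
                resp = np.exp(log_p)
                resp /= resp.sum(axis=1, keepdims=True)          # responsibilities
                # M-step: re-estimate weights, means, and diagonal variances
                Nk = resp.sum(axis=0)
                pi = Nk / N
                mu = (resp.T @ X) / Nk[:, None]
                var = (resp.T @ (X ** 2)) / Nk[:, None] - mu ** 2 + eps
            return pi, mu, var

        X = np.random.default_rng(1).normal(size=(1000, 8))      # stand-in data
        pi, mu, var = em_gmm_diag(X, K=4)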

    Performance evaluation and implementations of MFCC, SVM and MLP algorithms in the FPGA board

    One of the most difficult speech recognition tasks is accurate recognition of human-to-human communication. Advances in deep learning over the last few years have produced major improvements in recognition on the representative Switchboard conversational corpus. Word error rates that just a few years ago were 14% have dropped to 8.0%, then 6.6%, and most recently 5.8%, and are now believed to be within striking range of human performance. This raises two issues: what is human performance, and how far down can we still drive speech recognition error rates? The main objective of this article is a comparative study of the performance of Automatic Speech Recognition (ASR) algorithms using a database made up of a set of signals produced by female and male speakers of different ages. We also develop techniques for the software and hardware implementation of these algorithms and test them on an embedded electronic board based on a reconfigurable circuit (Field Programmable Gate Array, FPGA). We present an analysis of the classification results for the best Support Vector Machine (SVM) and Multi-Layer Perceptron (MLP) artificial neural network architectures. Following our analysis, we created Nios II processors and tested their operation and characteristics. The characteristics of each processor (cost, size, speed, power consumption, and complexity) are specified in this article. At the end of this work, we physically implemented the architecture of the Mel Frequency Cepstral Coefficients (MFCC) extraction algorithm as well as the classification algorithm that provided the best results.
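    For reference, a plain software version of the MFCC front-end that is later moved to hardware might look like the following; the sampling rate, frame length, hop size, filter count, and coefficient count are assumed values, not the article's configuration.

        # Illustrative MFCC front-end: pre-emphasis, framing, Hamming window,
        # power spectrum, mel filterbank, log, and DCT.
        import numpy as np
        from scipy.fft import dct

        def mel_filterbank(n_filters, n_fft, sr):
            def hz_to_mel(f): return 2595 * np.log10(1 + f / 700)
            def mel_to_hz(m): return 700 * (10 ** (m / 2595) - 1)
            mels = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_filters + 2)
            bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
            fb = np.zeros((n_filters, n_fft // 2 + 1))
            for i in range(1, n_filters + 1):
                l, c, r = bins[i - 1], bins[i], bins[i + 1]
                fb[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)   # rising edge
                fb[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)   # falling edge
            return fb

        def mfcc(signal, sr=16000, frame_len=400, hop=160, n_fft=512, n_filters=26, n_ceps=13):
            signal = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])   # pre-emphasis
            n_frames = 1 + (len(signal) - frame_len) // hop
            idx = np.arange(frame_len) + hop * np.arange(n_frames)[:, None]
            frames = signal[idx] * np.hamming(frame_len)
            power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
            log_fbank = np.log(power @ mel_filterbank(n_filters, n_fft, sr).T + 1e-10)
            return dct(log_fbank, type=2, axis=1, norm='ortho')[:, :n_ceps]

        coeffs = mfcc(np.random.default_rng(0).normal(size=16000))   # one second of stand-in audio
        print(coeffs.shape)                                          # (frames, 13)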

    Influence of Noisy Environment on the Speech Recognition Rate Based on the Altera FPGA

    This paper introduces an approach to studying the effects of different levels of environmental noise on the recognition rate of speech recognition systems that do not use any type of filtering to deal with this issue. This is achieved by implementing an embedded SoPC (System on a Programmable Chip) design with an Altera Nios II processor for a real-time speech recognition system. The Mel Frequency Cepstral Coefficients (MFCC) technique is used for speech signal feature extraction (the observation vector). The observation vector of the voice information is modeled by a Gaussian Mixture Model (GMM), which is passed to a Hidden Markov Model (HMM) as the probabilistic model that processes the GMM statistics to make decisions on the recognition of uttered words, whether single or composite, with one or more syllables. The framework was implemented on an Altera Cyclone II EP2C70F896C6N FPGA chip on the Altera DE2-70 Development Board. Each word model (template) is stored as a transition matrix, diagonal covariance matrices, and mean vectors in the system memory and occupies only 4.45 Kbytes regardless of the spoken word length. The recognition rate for the words digit 0 to digit 10 reached 100% for an individual speaker. The test was conducted at different sound levels of the surrounding environment (53 dB to 73 dB), as measured by a Sound Level Meter (SLM) instrument.
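    The stored word template (transition matrix, diagonal covariance matrices, and mean vectors) can be scored against an MFCC frame sequence roughly as sketched below; a single diagonal Gaussian per state stands in for the GMM observation model, and all sizes and values are illustrative rather than taken from the paper.

        # Log-domain Viterbi scoring of an observation sequence against a word
        # template made of a transition matrix, per-state mean vectors, and
        # diagonal covariances (variances).
        import numpy as np

        def log_gauss_diag(x, means, variances):
            """Log density of frame x under N(mean_s, diag(var_s)) for every state s."""
            return -0.5 * (np.log(2 * np.pi * variances) + (x - means) ** 2 / variances).sum(axis=-1)

        def viterbi_score(obs, log_A, means, variances):
            """obs: (T, D) feature frames; log_A: (S, S) log transition matrix."""
            delta = log_gauss_diag(obs[0], means, variances)        # uniform start prior (up to a constant)
            for x in obs[1:]:
                delta = (delta[:, None] + log_A).max(axis=0) + log_gauss_diag(x, means, variances)
            return delta.max()                                      # best-path log score

        # Stand-in 5-state left-to-right word model and a 60-frame, 13-dimensional observation
        S, D, T = 5, 13, 60
        rng = np.random.default_rng(0)
        A = np.triu(rng.random((S, S))) + 1e-6                      # left-to-right transitions
        log_A = np.log(A / A.sum(axis=1, keepdims=True))
        means, variances = rng.normal(size=(S, D)), np.ones((S, D))
        print(viterbi_score(rng.normal(size=(T, D)), log_A, means, variances))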

    Characterization of speaker recognition in noisy channels

    Speaker recognition is a frequently overlooked form of biometric security. Text-independent speaker identification is used by financial services, forensic experts, and human-computer interaction developers to extract information that is transmitted along with a spoken message, such as the identity, gender, age, and emotional state of a speaker. Speech features are classified as either low-level or high-level characteristics. High-level speech features are associated with syntax, dialect, and the overall meaning of a spoken message. In contrast, low-level features such as pitch and phonemic spectra are associated much more with the physiology of the human vocal tract. These low-level features are also the easiest and least computationally intensive characteristics of speech to extract. Once extracted, modern speaker recognition systems attempt to fit these features to statistical classification models. One such widely used model is the Gaussian Mixture Model (GMM). Testing of speaker recognition systems is currently standardized by NIST in the regularly updated NIST Speaker Recognition Evaluation (NIST-SRE). The results of the tests outlined in the standard are ultimately presented as Detection Error Tradeoff (DET) curves and detection cost function scores. This thesis presents a new method of measuring the effects of channel impediments on the quality of identifications made by GMM-based speaker recognition systems. With the exception of the NIST-SRE, no standardized or extensive testing of speaker recognition systems in noisy channels has been conducted; thorough testing of speaker recognition systems is therefore conducted in channel model simulators. Additionally, the NIST-SRE error metric is evaluated against a newly proposed metric for gauging the performance and improvements of speaker recognition systems.
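    The two evaluation quantities mentioned above can be computed from lists of target and non-target trial scores roughly as follows; the cost weights and target prior are illustrative placeholders, not the official NIST-SRE parameters of any particular evaluation year.

        # Sweep a decision threshold to obtain DET-curve points (miss rate vs.
        # false-alarm rate) and report a minimum detection cost.
        import numpy as np

        def det_points(target_scores, nontarget_scores):
            thresholds = np.sort(np.concatenate([target_scores, nontarget_scores]))
            p_miss = np.array([(target_scores < t).mean() for t in thresholds])
            p_fa = np.array([(nontarget_scores >= t).mean() for t in thresholds])
            return p_miss, p_fa

        def min_dcf(p_miss, p_fa, c_miss=1.0, c_fa=1.0, p_target=0.01):
            dcf = c_miss * p_miss * p_target + c_fa * p_fa * (1 - p_target)
            return dcf.min()

        rng = np.random.default_rng(0)
        tgt = rng.normal(2.0, 1.0, 500)      # stand-in target trial scores
        non = rng.normal(0.0, 1.0, 5000)     # stand-in non-target trial scores
        p_miss, p_fa = det_points(tgt, non)
        print(min_dcf(p_miss, p_fa))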

    Noise-robust text-dependent speaker identification using cochlear models

    One challenging issue in speaker identification (SID) is to achieve noise-robust performance. Humans can accurately identify speakers, even in noisy environments. We can leverage our knowledge of the function and anatomy of the human auditory pathway to design SID systems that achieve better noise-robust performance than conventional approaches. We propose a text-dependent SID system based on a real-time cochlear model called cascade of asymmetric resonators with fast-acting compression (CARFAC). We investigate the SID performance of CARFAC on signals corrupted by noise of various types and levels. We compare its performance with conventional auditory feature generators including mel-frequency cepstrum coefficients, frequency domain linear predictions, as well as another biologically inspired model called the auditory nerve model. We show that CARFAC outperforms other approaches when signals are corrupted by noise. Our results are consistent across datasets, types and levels of noise, different speaking speeds, and back-end classifiers. We show that the noise-robust SID performance of CARFAC is largely due to its nonlinear processing of auditory input signals. Presumably, the human auditory system achieves noise-robust performance via inherent nonlinearities as well.
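    The noise corruption used in this kind of evaluation can be sketched as mixing a noise signal into clean speech at a chosen signal-to-noise ratio; the helper below uses the usual power-ratio definition of SNR and synthetic stand-in signals, and does not model the paper's specific noise types or datasets.

        # Mix noise into a clean signal at a target SNR (in dB).
        import numpy as np

        def add_noise_at_snr(clean, noise, snr_db):
            noise = np.resize(noise, clean.shape)                 # tile/trim noise to match length
            p_clean = np.mean(clean ** 2)
            p_noise = np.mean(noise ** 2)
            scale = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
            return clean + scale * noise

        rng = np.random.default_rng(0)
        clean = rng.normal(size=16000)       # stand-in clean utterance (1 s at 16 kHz)
        noise = rng.normal(size=8000)        # stand-in noise sample
        noisy = add_noise_at_snr(clean, noise, snr_db=5)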