4 research outputs found

    Automatic Identity Recognition Using Speech Biometric

    Biometric technology refers to the automatic identification of a person using his or her physical or behavioral traits. It is an excellent candidate for building intelligent systems such as speaker identification, facial recognition, and signature verification, and can be used to design automatic identity recognition systems, which are in high demand in banking, employee identification, immigration, and e-commerce. The first phase of this research focuses on developing an automatic identity recognizer using speech biometrics, based on Artificial Intelligence (AI) techniques provided in MATLAB. For phase one, speech data was collected from 20 participants (10 male, 10 female). The data consist of utterances of the English digits (0 to 9), with each participant recording each digit 3 times, for a total of 600 utterances. For phase two, speech data was collected from 100 participants (50 male, 50 female) and divided into text-dependent and text-independent parts: each participant recorded his or her full name 30 times, making up the text-independent data, while the text-dependent data is represented by a short Arabic story of 16 sentences, each recorded 5 times by every participant. The resulting corpus therefore contains 3,000 sound files (30 utterances * 100 speakers) of text-independent data (full names) and 8,000 sound files (16 sentences * 5 utterances * 100 speakers) of text-dependent data (the short story). For phase one, the 600 utterances underwent feature extraction and classification. Feature extraction uses the dominant technique in the field, Mel-Frequency Cepstral Coefficients (MFCC), and classification is based on the Vector Quantization (VQ) algorithm. The highest accuracy achieved in our experiments is 76%. These results show acceptable performance, which can be improved in phase two using a larger speech corpus and better-performing classification techniques such as the Hidden Markov Model (HMM).
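    The MFCC-plus-VQ pipeline described in this abstract is straightforward to prototype. Below is a minimal Python sketch of the same idea (the abstract's own system was built in MATLAB, not Python); librosa, the codebook size, and the file-list variables are illustrative assumptions, not the authors' implementation.

```python
# Minimal MFCC + Vector Quantization speaker-ID sketch (illustrative only).
import numpy as np
import librosa
from scipy.cluster.vq import kmeans, vq

def mfcc_frames(path, n_mfcc=13):
    """Load one utterance and return its frame-level MFCC matrix (T, n_mfcc)."""
    y, sr = librosa.load(path, sr=None)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T

def train_codebook(paths, codebook_size=32):
    """One VQ codebook per speaker: k-means centroids over training MFCCs."""
    feats = np.vstack([mfcc_frames(p) for p in paths])
    codebook, _ = kmeans(feats.astype(float), codebook_size)
    return codebook

def identify(test_path, codebooks):
    """Pick the speaker whose codebook yields the lowest quantization error."""
    feats = mfcc_frames(test_path)
    def distortion(cb):
        _, dists = vq(feats, cb)   # nearest-centroid distance per frame
        return dists.mean()
    return min(codebooks, key=lambda spk: distortion(codebooks[spk]))

# Hypothetical usage with per-speaker training file lists:
# codebooks = {spk: train_codebook(files) for spk, files in train_files.items()}
# predicted = identify("digit7_test.wav", codebooks)
```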

    Robust text independent closed set speaker identification systems and their evaluation

    PhD thesis. This thesis focuses on text-independent closed-set speaker identification. The contributions relate to evaluation studies in the presence of various types of noise and handset effects, with extensive evaluations performed on four databases. The first contribution concerns the use of the Gaussian Mixture Model-Universal Background Model (GMM-UBM) with original speech recordings from the TIMIT database only. Four main simulations of Speaker Identification Accuracy (SIA) are presented, covering different fusion strategies: late fusion (score based), early fusion (feature based), early-late fusion (a combination of feature- and score-based fusion), late fusion using concatenated static and dynamic features (features with temporal derivatives such as first-order delta and second-order delta-delta, i.e. acceleration, features), and finally fusion of statistically independent normalized scores. The second contribution is again based on the GMM-UBM approach: comprehensive evaluations of the effect of Additive White Gaussian Noise (AWGN) and Non-Stationary Noise (NSN), with and without a G.712-type handset, on identification performance. In particular, three NSN types were tested at varying Signal-to-Noise Ratios (SNRs): street traffic, a bus interior, and a crowded talking environment. The performance evaluation also considered late fusion techniques based on score fusion, namely mean, maximum, and linear weighted sum fusion. The databases employed were TIMIT, SITW, and NIST 2008; 120 speakers were selected from each database to yield 3,600 speech utterances. The third contribution is based on the use of I-vectors: four combinations of I-vectors with 100 and 200 dimensions were employed, and various fusion techniques (maximum, mean, weighted sum, and cumulative fusion) at the same I-vector dimension were used to improve the SIA. Similarly, both interleaved and concatenated I-vector fusion were exploited to produce 200- and 400-dimensional I-vectors. The system was evaluated on four databases using 120 speakers from each; TIMIT, SITW, and NIST 2008 were evaluated for the NSN types above (street-traffic, bus-interior, and crowd-talking NSN), and the G.712-type handset at 16 kHz was also applied. As recommendations from the study, in the GMM-UBM approach mean fusion yields the overall best SIA with noisy speech, whereas linear weighted sum fusion is best overall for original database recordings; in the I-vector approach, the best SIA was obtained from weighted sum and concatenated fusion. The author acknowledges the Ministry of Higher Education and Scientific Research (MoHESR), the Iraqi Cultural Attaché, and Al-Mustansiriya University College of Engineering, Iraq, for supporting this PhD scholarship.
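    As a rough illustration of the GMM-UBM scoring and late score fusion this abstract evaluates, the Python sketch below builds a diagonal-covariance UBM, MAP-adapts its means per speaker, scores test frames as a log-likelihood ratio, and fuses scores by mean, maximum, and weighted sum. The synthetic data, the relevance factor, and the fusion weights are assumptions for illustration, not the thesis's experimental setup.

```python
# GMM-UBM speaker ID with late score fusion: a minimal, self-contained sketch.
import numpy as np
from scipy.special import logsumexp
from sklearn.mixture import GaussianMixture

def diag_gmm_loglik(X, weights, means, variances):
    """Per-frame log-likelihood of a diagonal-covariance GMM."""
    log_det = np.sum(np.log(variances), axis=1)               # (K,)
    diff2 = ((X[:, None, :] - means[None]) ** 2) / variances  # (T, K, D)
    log_p = -0.5 * (diff2.sum(-1) + log_det
                    + means.shape[1] * np.log(2 * np.pi))     # (T, K)
    return logsumexp(log_p + np.log(weights), axis=1)         # (T,)

def map_adapt_means(ubm, X, relevance=16.0):
    """Classic MAP adaptation of the UBM means to one speaker's data."""
    resp = ubm.predict_proba(X)             # (T, K) component posteriors
    n_k = resp.sum(0) + 1e-10
    e_k = resp.T @ X / n_k[:, None]         # per-component data means
    alpha = (n_k / (n_k + relevance))[:, None]
    return alpha * e_k + (1 - alpha) * ubm.means_

# Synthetic MFCC-like frames standing in for real enrolment/background data.
rng = np.random.default_rng(0)
background_frames = rng.normal(size=(2000, 13))
enrolment = {f"spk{i}": rng.normal(i * 0.1, 1.0, size=(300, 13)) for i in range(3)}

ubm = GaussianMixture(32, covariance_type="diag", random_state=0).fit(background_frames)
models = {spk: map_adapt_means(ubm, frames) for spk, frames in enrolment.items()}

def score(test_X, spk_means):
    """Average log-likelihood ratio: adapted speaker model vs. the UBM."""
    llr = (diag_gmm_loglik(test_X, ubm.weights_, spk_means, ubm.covariances_)
           - diag_gmm_loglik(test_X, ubm.weights_, ubm.means_, ubm.covariances_))
    return llr.mean()

test = rng.normal(0.1, 1.0, size=(200, 13))
scores = {spk: score(test, m) for spk, m in models.items()}

# Late fusion of per-speaker scores from two feature streams (illustrative):
s1 = np.array([scores[s] for s in sorted(scores)])
s2 = s1 + rng.normal(0, 0.05, size=s1.shape)  # stand-in for a second stream
fused_mean = (s1 + s2) / 2
fused_max = np.maximum(s1, s2)
fused_wsum = 0.7 * s1 + 0.3 * s2              # weights tuned on held-out data
```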

    Speech Interaction Systems for Home Automation

    One of the open questions in home automation is the realization of human-machine interfaces that are not only effective for controlling the available functions but also easily accessible. Voice is the natural way to communicate requests and commands, so a speech interface offers considerable advantages over solutions such as touch screens and switches. This thesis aims to study and realize a speech interaction system for home automation, able not only to recognize individual commands conveyed by voice signals, but also to customize the requested services through speaker recognition and to interact by means of synthesized speech. For each speech interaction mechanism, solutions are proposed to overcome the limitations of the classical approaches in the literature. First, a distributed speech recognition (DSR) system for voice control of a lighting system is presented, implementing ad-hoc strategies and optimizations to operate non-invasively in the environment and to solve the typical problems of a real scenario. A speaker identification algorithm is integrated into the DSR system so that spoken commands can be customized to the recognized user's settings. In home automation, a speaker identification system must be able to classify the user from speech sequences shorter than 5 s. To this end, an algorithm based on the truncated Karhunen-Loève transform is proposed, which, on short speech sequences (< 3.5 s), yields better results than the conventional technique based on Mel-cepstral coefficients. Moreover, this work presents a novel Hidden Markov Model/unit-selection speech synthesis framework based on the Modified Discrete Cosine Transform, which guarantees perfect reconstruction of the speech signal and overcomes the main shortcomings of the Mel-cepstral technique. The algorithms and the proposed system are applied to signals acquired under realistic conditions in order to verify their adequacy.
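    Below is a minimal sketch of how a truncated Karhunen-Loève transform (an empirical KLT, computed here like PCA from the frame covariance) could serve short-utterance speaker identification as this abstract proposes. The nearest-subspace decision rule, the synthetic data, and all names are illustrative assumptions rather than the thesis implementation.

```python
# Truncated-KLT speaker models: keep each speaker's top-k covariance
# eigenvectors, then identify a short test sequence by reconstruction error.
import numpy as np

def fit_speaker(frames, k):
    """Per-speaker model: frame mean plus the k principal KLT directions."""
    mean = frames.mean(0)
    centered = frames - mean
    cov = centered.T @ centered / len(frames)
    _, eigvecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    return mean, eigvecs[:, ::-1][:, :k]  # keep the k leading directions

def reconstruction_error(frames, model):
    """Energy lost when projecting frames onto the truncated KL basis."""
    mean, basis = model
    centered = frames - mean
    proj = centered @ basis @ basis.T
    return np.mean(np.sum((centered - proj) ** 2, axis=1))

# Illustrative check with synthetic "frames" (one mixing matrix per speaker):
rng = np.random.default_rng(1)
mixers = {s: rng.normal(size=(13, 13)) for s in range(3)}
train = {s: rng.normal(size=(500, 13)) @ A for s, A in mixers.items()}
models = {s: fit_speaker(f, k=6) for s, f in train.items()}
test = rng.normal(size=(60, 13)) @ mixers[1]  # a short test utterance's frames
predicted = min(models, key=lambda s: reconstruction_error(test, models[s]))
```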

    GFM-Based Methods for Speaker Identification

    No full text