8 research outputs found

    Speaker Recognition Using Multiple Parametric Self-Organizing Maps

    Speaker Recognition is the process of automatically recognizing who is speaking on the basis of individual parameters contained in the speaker's voice. This technology allows systems to verify identity automatically in applications such as telephone banking or forensic science. A Speaker Recognition system has two main modules: Feature Extraction and Classification. For feature extraction, the most commonly used techniques are Mel-Frequency Cepstral Coefficients (MFCC) and Linear Predictive Coding (LPC). For classification and verification, technologies such as Vector Quantization (VQ), Hidden Markov Models (HMM) and Neural Networks have been used. The contribution of this thesis is a new methodology to achieve high-accuracy identification and impostor rejection. The proposed method, Multiple Parametric Self-Organizing Maps (M-PSOM), is a classification and verification technique. The method was successfully implemented and tested using the CSLU Speaker Recognition Corpora of the Oregon School of Engineering, with excellent results.

    Features and Measures for Speaker Recognition

    Electrical Engineering

    Multi-media personal identity verification

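    Discrimination parole/musique et étude de nouveaux paramètres et modèles pour un système d'identification du locuteur dans le contexte de conférences téléphoniques [Speech/music discrimination and a study of new parameters and models for a speaker identification system in the context of telephone conferences]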

    Building automatic speech understanding systems that can operate under real-world conditions requires reproducing certain human abilities. Beyond being able to understand speech even when it is corrupted by noise, we can hold a conversation involving several participants. This latter ability stems from the fact that we implicitly identify the speakers. This characterization of the speaker allows us, for example, to hold telephone conversations in conference mode. In addition to recognizing vocabulary or identifying the speaker, we are also able to distinguish music sequences (alternating with speech, in the background, etc.) that can occur when one of the parties is placed on hold. Starting from this context, we set out to develop a system capable, on the one hand, of discriminating between speech and music sequences and, on the other hand, of identifying the speaker under telephone conditions in conference mode with handset variability. In other words, this thesis addresses two topics in speech processing. The first concerns the search for new parameters to improve the performance of algorithms that identify speakers over the telephone. The second is devoted to proposing new approaches for discriminating between speech, music, and sung music. For speaker discrimination, we present a first study aimed at characterizing the speaker with glottal-synchronous AM-FM parameters extracted at the output of a cochlear filter bank. The goal is to find new parameters that are more robust to noise and to telephone handset variability. The proposed system and the reference system obtained nearly similar scores. The best performance was recorded when the system uses a parallel architecture composed of two recognizers based on MFCC and AM-FM parameters, respectively. In the same framework, we proposed a new modeling technique that takes into account the temporal dependence between the excitation source and the vocal tract. With short-duration tests, it gave better performance than the classical approach. However, as the test duration increases, all the proposed systems achieve nearly the same performance. For speech/music discrimination, we proposed two systems. The first uses three parametric models trained for speech, music, and sung music, respectively, without applying any normalization to the parameter vectors. For a test duration of 100 ms, it achieved an average recognition rate of 93.77%. The second system requires no training and relies simply on a threshold to perform the classification.

    A study of voice quality in a group of irradiated laryngeal cancer patients tumour stages T1 and T2.

    This is a longitudinal study of voice quality in a group of 35 patients irradiated for early vocal fold tumours, stages T1 and T2. Electrolaryngograph (ELG) based analyses were used to obtain objective measurements of speaking fundamental frequency parameters over a wide range of time intervals following radiotherapy. Lx waveforms were also analysed. Perceptual evaluation of voice quality and patients' self-assessments of their experience of vocal symptoms and limitations in vocal function after radiotherapy were carried out. The relationship between perceptual and self-assessment parameters and objective voice quality measurements was determined. A few patients underwent periods of voice therapy. Their voice measurements before and after therapy intervention are compared with those of a group of patients who did not receive voice therapy. The findings of this study show that, contrary to some early reports that the voice returns to normal in the majority of patients after radiotherapy, most patients show evidence of residual abnormal voice quality and symptoms, as measured and as rated by clinicians and by the patients themselves. The majority of patients do not consider these a major problem, however. Evidence is presented of the beneficial effect of voice therapy in helping patients compensate for the inevitable tissue damage caused by radiotherapy to the larynx. Electrolaryngograph-generated objective measures and Lx waveforms proved sensitive, reliable and clinically applicable for objective voice analysis.

    Evaluation of glottal characteristics for speaker identification.

    Based on the assumption that the physical characteristics of people's vocal apparatus cause their voices to have distinctive characteristics, this thesis reports on investigations into the use of the long-term average glottal response for speaker identification. The long-term average glottal response is a new feature that is obtained by overlaying successive vocal tract responses within an utterance. The way in which the long-term average glottal response varies with accent and gender is examined using a population of 352 American English speakers from eight different accent regions. Descriptors are defined that characterize the shape of the long-term average glottal response. Factor analysis of the descriptors of the long-term average glottal responses shows that the most important factor contains significant contributions from descriptors comprised of the coefficients of cubics fitted to the long-term average glottal response. Discriminant analysis demonstrates that the long-term average glottal response is potentially useful for classifying speakers according to their gender, but is not useful for distinguishing American accents. The identification accuracy of the long-term average glottal response is compared with that obtained from vocal tract features. Identification experiments are performed using a speaker database containing utterances from twenty speakers of the digits zero to nine. Vocal tract features, which consist of cepstral coefficients, partial correlation coefficients and linear prediction coefficients, are shown to be more accurate than the long-term average glottal response. Despite analysis of the training data indicating that the long-term average glottal response was uncorrelated with the vocal tract features, various feature combinations gave insignificant improvements in identification accuracy. The effect of noise and distortion on speaker identification is examined for each of the features. It is found that the identification performance of the long-term average glottal response is insensitive to noise compared with cepstral coefficients, partial correlation coefficients and the long-term average spectrum, but that it is highly sensitive to variations in the phase response of the speech transmission channel. Before reporting on the identification experiments, the thesis introduces speech production, speech models and background to the various features used in the experiments. Investigations into the long-term average glottal response demonstrate that it approximates the glottal pulse convolved with the long-term average impulse response, and this relationship is verified using synthetic speech. Furthermore, the spectrum of the long-term average glottal response extracted from pre-emphasized speech is shown to be similar to the long-term average spectrum of pre-emphasized speech, but computationally much simpler.

    Identificação de falantes : aspectos teoricos e metodologicos [Speaker identification: theoretical and methodological aspects]

    Advisor: Eleonora Albano. Thesis (doctorate) - Universidade Estadual de Campinas, Instituto de Estudos da Linguagem. Abstract: This work aims to examine the effectiveness of several acoustic parameters in speaker identification. The experiments analyzed a core set of 8 speakers, adult males aged between 22 and 45. In some cases two additional speakers, identical twins, were included in order to examine instrumentally the differences between voices that are perceptually very similar. The parameters studied were vowel formants, fundamental frequency, long-term spectrum, speech rate, nasal consonants, and VOT (Voice Onset Time). The effectiveness of visual inspection of spectrograms for speaker identification was also discussed, a topic especially relevant to the forensic setting and one that has provoked great controversy in recent decades. Degree: Doctorate (Doctor of Linguistics).