9 research outputs found

    Human and Machine Speaker Recognition Based on Short Trivial Events

    Trivial events are ubiquitous in human-to-human conversations, e.g., coughs, laughs and sniffs. Compared to regular speech, these trivial events are usually short and unclear, so they are generally regarded as not speaker discriminative and are largely ignored by present speaker recognition research. However, these trivial events are highly valuable in particular circumstances such as forensic examination: they are less subject to intentional change, so they can be used to discover the genuine speaker behind disguised speech. In this paper, we collect a trivial event speech database that involves 75 speakers and 6 types of events, and report preliminary speaker recognition results on this database, by both human listeners and machines. In particular, the deep feature learning technique recently proposed by our group is used to analyze and recognize the trivial events, which leads to acceptable equal error rates (EERs) despite the extremely short durations (0.2-0.5 seconds) of these events. Comparing different types of events, 'hmm' appears to be the most speaker discriminative. Comment: ICASSP 201
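
    The results above are reported as equal error rates (EERs). As a minimal sketch of how an EER can be computed from verification trial scores (the toy scores, labels and the use of scikit-learn are illustrative assumptions, not the paper's pipeline):

        # Minimal EER computation from trial scores; toy data, not from the paper.
        import numpy as np
        from sklearn.metrics import roc_curve

        def equal_error_rate(labels, scores):
            """EER: the point where false-acceptance and false-rejection rates are equal."""
            fpr, tpr, _ = roc_curve(labels, scores)  # false-positive and true-positive rates
            fnr = 1.0 - tpr                          # false-negative (miss) rate
            idx = np.nanargmin(np.abs(fnr - fpr))    # threshold where the two error rates cross
            return (fpr[idx] + fnr[idx]) / 2.0

        # Toy target (1) / non-target (0) trials; higher score = more likely same speaker.
        labels = np.array([1, 1, 1, 0, 0, 0, 1, 0])
        scores = np.array([0.9, 0.7, 0.4, 0.3, 0.2, 0.6, 0.8, 0.1])
        print(f"EER = {equal_error_rate(labels, scores):.3f}")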

    Query by Example of Speaker Audio Signals using Power Spectrum and MFCCs

    A search engine is the popular term for an information retrieval (IR) system. Typically, a search engine is based on full-text indexing. Moving from text data to multimedia data types makes the retrieval process more complex, e.g., retrieving images or sounds from large databases. This paper introduces the use of language- and text-independent speech as input queries over a large sound database, using a speaker identification algorithm. The method consists of two main processing steps: first, vocal and non-vocal segments are separated; the vocal segments are then used for speaker identification to answer audio queries by speaker voice. For the speaker identification and audio query process, we estimate the similarity between the example signal and the samples in the queried database by computing the Euclidean distance between Mel-frequency cepstral coefficient (MFCC) and power-spectrum acoustic features. Simulations show good performance at a sustainable computational cost, with an average accuracy above 90%.
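
    As a minimal sketch of the query-by-example idea described above (representing each recording by an MFCC-based vector and ranking the database by Euclidean distance to the query), where librosa, the per-file mean MFCC summary and the file names are illustrative assumptions rather than the paper's exact pipeline:

        # Query by example: rank database recordings by distance to the query's MFCC summary.
        import numpy as np
        import librosa

        def mfcc_embedding(path, sr=16000, n_mfcc=20):
            """Load audio and summarize it as the mean MFCC vector over all frames."""
            y, _ = librosa.load(path, sr=sr)
            mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # shape: (n_mfcc, n_frames)
            return mfcc.mean(axis=1)

        def rank_by_similarity(query_path, database_paths):
            """Return (path, distance) pairs sorted by Euclidean distance to the query."""
            q = mfcc_embedding(query_path)
            dists = [(p, float(np.linalg.norm(q - mfcc_embedding(p)))) for p in database_paths]
            return sorted(dists, key=lambda item: item[1])

        # Hypothetical usage with made-up file names:
        # ranking = rank_by_similarity("query.wav", ["spk1_a.wav", "spk2_b.wav", "spk3_c.wav"])
        # print(ranking[0])  # closest match to the query speaker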

    Study and Improvement of Speaker Verification Systems under Aphonic Voice Conditions

    This project analyzes the differences between speech modes, especially whispering, which is used as a means of communication when suffering from a condition such as aphonia, and how these differences affect automatic speaker verification systems. The aim of the project is to study the resulting performance loss and to improve the systems through the application of different techniques. The study starts from an analysis of the signals in the whispered-voice and neutral-voice domains, whose differences explain the degradation of the system. To quantify it, a high-performance reference system and a database containing audio in both normal and whispered speech conditions are chosen. The improvement techniques studied address the problem at different points of the complete system. These techniques are introduced theoretically in the second part of the work, and the third part presents the results obtained for each of them. To evaluate and compare them, free-software tools, visualization tools and statistical model training are used, with Python as the main programming language. The work shows the performance of alternatives to popular machine learning algorithms, which are needed when a significant amount of data for good results is not available.

    Glottal Excitation Extraction of Voiced Speech - Jointly Parametric and Nonparametric Approaches

    The goal of this dissertation is to develop methods to recover glottal flow pulses, which contain biometric information about the speaker. The excitation information estimated from an observed speech utterance is modeled as the source of an inverse problem. Windowed linear prediction analysis and inverse filtering are first used to deconvolve the speech signal and obtain a rough estimate of the glottal flow pulses. Linear prediction and its inverse filtering can largely eliminate the vocal-tract response, which is usually modeled as an infinite impulse response filter. Some remaining vocal-tract components that reside in the estimate after inverse filtering are then removed by maximum-phase and minimum-phase decomposition, implemented by applying the complex cepstrum to the initial estimate of the glottal pulses. The additive and residual errors from inverse filtering can be suppressed by higher-order statistics, the method used to calculate the cepstrum representations. Features provided directly by the glottal source's cepstrum representation, together with fitting parameters for the estimated pulses, form feature patterns that are applied to a minimum-distance classifier to realize a speaker identification system with a very limited number of subjects.
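
    As a minimal sketch of the first stage described above (linear prediction analysis of a windowed voiced frame followed by inverse filtering to obtain a rough excitation estimate), where the LP order, frame length and the use of librosa/scipy are illustrative assumptions:

        # LP inverse filtering: estimate the excitation (residual) of one voiced frame.
        import numpy as np
        import librosa
        from scipy.signal import lfilter

        def lp_residual(frame, order=16):
            """Inverse-filter a windowed frame with its own LP coefficients A(z)."""
            a = librosa.lpc(frame, order=order)  # a[0] == 1; vocal tract modeled as 1/A(z)
            return lfilter(a, [1.0], frame)      # e[n] = A(z)*s[n], rough glottal excitation estimate

        # Hypothetical usage on a single 30 ms frame of voiced speech:
        # y, sr = librosa.load("voiced.wav", sr=16000)
        # n = int(0.03 * sr)
        # excitation = lp_residual(y[:n] * np.hanning(n))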

    Vocal Tract Filter Modelling for Reconstruction of Dysphonic Voice

    Analysis and modelling of the spectral envelope of the nine oral vowels of standard European Portuguese in voiced and whispered speech modes. Development of compact, speaker-oriented models in the spectral and cepstral domains for the nine oral vowels of standard Portuguese. Development and evaluation of a prototype whispered-vowel identification algorithm oriented towards real-time operation.
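
    As a minimal sketch of a compact cepstral-domain description of a vowel frame, in the spirit of the envelope models mentioned above (the windowing, FFT size and number of retained coefficients are illustrative assumptions):

        # Truncated real cepstrum of a windowed frame: low quefrencies ~ spectral envelope.
        import numpy as np

        def cepstral_envelope(frame, n_keep=20, n_fft=1024):
            """Return the first n_keep real-cepstrum coefficients of a windowed frame."""
            spectrum = np.fft.rfft(frame * np.hanning(len(frame)), n=n_fft)
            log_mag = np.log(np.abs(spectrum) + 1e-10)   # log-magnitude spectrum
            cepstrum = np.fft.irfft(log_mag, n=n_fft)    # real cepstrum
            return cepstrum[:n_keep]

        # Hypothetical usage: compare a whispered vowel frame against stored vowel templates
        # by Euclidean distance between their truncated cepstra.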