6 research outputs found
Accuracy of MFCC-Based Speaker Recognition in Series 60 Device
A fixed-point implementation of speaker recognition based on MFCC signal processing is considered. We analyze the numerical error of the MFCC computation and its effect on recognition accuracy. Techniques to reduce the information loss in a converted fixed-point implementation are introduced. We increase the signal processing accuracy by adjusting the ratio of the representation accuracy of the operators to that of the signal. The signal processing error is found to be more important to the speaker recognition accuracy than the error in the classification algorithm. The results are verified by applying the alternative technique to speech data. We also discuss the specific programming requirements set by the Symbian and Series 60
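The core trade-off the abstract describes, choosing how many bits represent the fractional part of a signal in a fixed-point conversion, can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: the function names, the Q-format rounding scheme, and the test tone are all invented for demonstration.

```python
import math

def to_fixed(x, frac_bits):
    """Round a float to the nearest fixed-point integer with `frac_bits` fractional bits."""
    return round(x * (1 << frac_bits))

def from_fixed(q, frac_bits):
    """Convert a fixed-point integer back to a float."""
    return q / (1 << frac_bits)

def sqnr_db(signal, frac_bits):
    """Signal-to-quantization-noise ratio (dB) after a fixed-point round trip."""
    recon = [from_fixed(to_fixed(s, frac_bits), frac_bits) for s in signal]
    sig_pow = sum(s * s for s in signal)
    err_pow = sum((s - r) ** 2 for s, r in zip(signal, recon))
    return float("inf") if err_pow == 0 else 10.0 * math.log10(sig_pow / err_pow)

# A 440 Hz test tone sampled at 8 kHz: each extra fractional bit buys
# roughly 6 dB of SQNR, which is the accuracy/word-length trade-off
# a fixed-point MFCC front-end has to balance at every stage.
tone = [math.sin(2 * math.pi * 440 * n / 8000) for n in range(8000)]
print(sqnr_db(tone, 7), sqnr_db(tone, 15))
```

In a full MFCC pipeline this choice must be made per stage (pre-emphasis, FFT, filterbank, log, DCT), since intermediate dynamic ranges differ; the sketch only shows the measurement for a single quantization step.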
Análisis de compensación de variabilidad en reconocimiento de locutor aplicado a duraciones cortas
This project studies, implements, and evaluates automatic speaker recognition systems in the presence of short-duration utterances. To this end, several state-of-the-art speaker recognition techniques, as well as their adaptation to short utterances, have been used and compared.

As a starting point of the project, a study was carried out of the techniques that have successively defined the state of the art, highlighting those that achieved a notable improvement in the speaker recognition evaluations organised by the National Institute of Standards and Technology (NIST) over the last decade.

Once the state of the art was understood from a theoretical point of view, the next step was to define the task on which the different techniques would be evaluated. Historically, the core task in NIST evaluations consists of training the speaker model with one conversation of approximately 150 seconds and then verifying the user against an utterance of the same duration. In the task addressed in this project we work with utterances of much more limited duration, approximately 10 seconds, taken from NIST speaker recognition evaluations.

The experimental work was carried out in two phases. In the first phase the goal was to compare and analyse the differences between two state-of-the-art techniques based on Factor Analysis (FA): Total Variability (TV) and Probabilistic Linear Discriminant Analysis (PLDA), mainly evaluating their performance in our experimental setup, which follows the protocol of the NIST evaluations. In the second phase the parameters of these techniques are tuned to assess their impact in the presence of short durations and to improve system performance under data scarcity. To that end, we evaluate the system with two measures: the error rate and the cost function usually employed in that evaluation, both of which are detailed in the following chapters.

Finally, the conclusions drawn throughout this work are presented, together with lines of future work.

Part of the work carried out during this final-year project (Proyecto Final de Carrera) has been published at the international conference IberSpeech 2012 [1]:
Javier Gonzalez-Dominguez, Ruben Zazo, and Joaquin Gonzalez-Rodriguez. "On the use of total variability and probabilistic linear discriminant analysis for speaker verification on short utterances".
This project focuses on automatic speaker verification (SV) systems dealing with short duration utterances (~10 s). Despite the enormous advances in the field, the broad use of SV in real scenarios remains a challenge, mostly due to two factors. First, session variability; that is, the set of differences among utterances belonging to the same speaker. Second, the degradation of system performance when dealing with short duration utterances.
As a starting point of this project, an exhaustive study of state-of-the-art speaker verification techniques was conducted, with special focus on those methods which achieved outstanding results and opened the door to better SV systems. In that sense, we put particular emphasis on the recent methods based on Factor Analysis (FA), namely Total Variability (TV) and Probabilistic Linear Discriminant Analysis (PLDA). These methods have become the state of the art in the field due to their ability to mitigate the session variability problem.
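As a toy illustration of the simplest back-end used on top of TV, the sketch below scores two i-vectors by cosine similarity. The vectors are invented and tiny; real i-vectors are typically a few hundred dimensions and are extracted from a trained total variability matrix, which is omitted here.

```python
import math

def cosine_score(w1, w2):
    """Cosine similarity between two i-vectors -- the simplest TV back-end,
    often used as a baseline before a PLDA model is applied."""
    dot = sum(a * b for a, b in zip(w1, w2))
    n1 = math.sqrt(sum(a * a for a in w1))
    n2 = math.sqrt(sum(b * b for b in w2))
    return dot / (n1 * n2)

# Invented toy i-vectors: an enrolment vector, a test vector from the
# same speaker, and a test vector from an impostor.
enrol    = [0.9, 0.1, -0.3, 0.5]
same_spk = [0.8, 0.2, -0.2, 0.6]
impostor = [-0.5, 0.7, 0.4, -0.1]
print(cosine_score(enrol, same_spk), cosine_score(enrol, impostor))
```

A PLDA back-end replaces this geometric score with a likelihood ratio between the same-speaker and different-speaker hypotheses, which is what makes it better suited to compensating session variability.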
In order to assess the behaviour of those systems, we use the data and follow the protocol defined by the US National Institute of Standards and Technology (NIST) in its Speaker Recognition Evaluation (SRE) series. In particular, we follow the SRE2010 protocol, adapted to the short duration problem: instead of using the 150 s utterances defined in the core task of SRE2010, we experiment with 10 s utterances in both training and testing.
The experiments conducted can be divided into two phases. During the first phase we study, compare, and evaluate the use of TV and PLDA as effective methods to perform SV. The second phase is then devoted to adapting those methods to short duration scenarios. At this point we analyse the effect and importance of the multiple system parameters when facing limited data for both training and testing. Conclusions and future lines of this work are then presented.
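One of the two evaluation measures used in NIST-style protocols, the equal error rate (EER), can be computed from target and impostor score lists by sweeping the decision threshold. The sketch below is a minimal, illustrative version with invented scores; NIST evaluations additionally use a detection cost function with weighted error priors, not shown here.

```python
def equal_error_rate(target_scores, impostor_scores):
    """Approximate the EER: the operating point where the false rejection
    rate (FRR) on targets equals the false acceptance rate (FAR) on impostors."""
    best_gap, eer = 2.0, None
    for thr in sorted(target_scores + impostor_scores):
        frr = sum(s < thr for s in target_scores) / len(target_scores)
        far = sum(s >= thr for s in impostor_scores) / len(impostor_scores)
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer

# Invented scores: a working verifier gives targets higher scores than impostors.
targets = [2.1, 1.8, 0.9, 2.5, 1.2]
impostors = [-1.0, 0.3, -0.5, 1.0, -2.0]
print(equal_error_rate(targets, impostors))
```

Shorter utterances widen the overlap between the two score distributions, which is exactly why the EER degrades in the 10 s condition studied here.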
Part of this work has been published at the international conference IberSpeech 2012 [1]:
Javier Gonzalez-Dominguez, Ruben Zazo, and Joaquin Gonzalez-Rodriguez. "On the use of total variability and probabilistic linear discriminant analysis for speaker verification on short utterances".
Language design for distributed stream processing
Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2009. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Includes bibliographical references (p. 149-152).

Applications that combine live data streams with embedded, parallel, and distributed processing are becoming more commonplace. WaveScript is a domain-specific language that brings high-level, type-safe, garbage-collected programming to these domains. This is made possible by three primary implementation techniques, each of which leverages characteristics of the streaming domain. First, WaveScript employs an evaluation strategy that uses a combination of interpretation and reification to partially evaluate programs into stream dataflow graphs. Second, we use profile-driven compilation to enable many optimizations that are normally only available in the synchronous (rather than asynchronous) dataflow domain. Finally, an empirical, profile-driven approach also allows us to compute practical partitions of dataflow graphs, spreading them across embedded nodes and more powerful servers. We have used our language to build and deploy applications, including a sensor network for the acoustic localization of wild animals such as the yellow-bellied marmot. We evaluate WaveScript's performance on this application, showing that it yields good performance on both embedded and desktop-class machines. Our language allowed us to implement the application rapidly, while outperforming a previous C implementation by over 35%, using fewer than half the lines of code. We evaluate the contribution of our optimizations to this success. We also evaluate WaveScript's ability to extract parallelism from this and other applications.

by Ryan Rhodes Newton. Ph.D.
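The abstract's first technique, evaluating a stream program so that it reifies into a dataflow graph rather than processing data, can be illustrated with a toy sketch. This is not WaveScript itself: the `Node` class, the operator names, and the traversal are invented to show the idea that "running" the program produces a graph a compiler can later optimize and partition.

```python
class Node:
    """A reified stream operator: calls build graph nodes instead of
    processing data, so evaluating the program yields a dataflow graph."""
    def __init__(self, op, *inputs, **params):
        self.op, self.inputs, self.params = op, inputs, params

    def map(self, fn):
        return Node("map", self, fn=fn)

    def filter(self, pred):
        return Node("filter", self, pred=pred)

def source(name):
    """A graph leaf standing in for a live data stream (e.g. a microphone)."""
    return Node("source", name=name)

def topo(node, seen=None):
    """Enumerate the graph in dependency order -- what a compiler would walk
    when optimizing or partitioning it across embedded nodes and servers."""
    seen = seen if seen is not None else []
    for inp in node.inputs:
        topo(inp, seen)
    if node not in seen:
        seen.append(node)
    return seen

# "Running" this program yields a three-node graph, not a stream of values.
g = source("mic").map(lambda x: x * x).filter(lambda x: x > 0.5)
print([n.op for n in topo(g)])  # ['source', 'map', 'filter']
```

The profile-driven steps the abstract mentions would then annotate such a graph with measured operator costs before choosing a partition, which this sketch omits.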