8 research outputs found
Development and evaluation of tools for automatic audio-text alignment with automatic speech recognition systems
The goal of Automatic Speech Recognition (ASR) is, given a speech signal, to extract the sequence of words that were spoken. To carry out this task correctly, an ASR system requires knowledge that it acquires during a training phase. This learning relies on two models: the Acoustic Model, which characterizes the speech signal, and the Language Model, which concerns the vocabulary used in it. This Bachelor's thesis (Trabajo Fin de Grado) takes an ASR engine as its starting point to develop and test a system capable of aligning the script of a television program with its corresponding audio and obtaining a precise temporal location for each spoken word. Under this premise, different alignment strategies are considered. The main problem is the uncertainty in locating the text within the audio, since no information is available a priori. As a first strategy, the text is distributed uniformly over the program audio. A series of experiments is then carried out to characterize the alignment system and obtain a first reference for its performance. To reduce the ambiguity in locating the text within the audio, a new module is added to the alignment system that obtains partial time marks to serve as a guide. A new series of experiments shows that this strategy yields a relative improvement of close to 12% over the performance of the baseline system.
With the effectiveness of partial time marks demonstrated, and in an attempt to improve the alignment system further, a tool developed to mitigate the recognizer's limitations at word endings is used, yielding a relative improvement of around 20% over the baseline system, which rises to nearly 23% when information about each speaker's turns is included in the alignment system. Therefore, in view of the results obtained in this Bachelor's thesis, it is concluded that strategies that reduce the uncertainty in locating the text within the audio are well suited to this context, and the performance improvement they bring to the alignment system is demonstrated.
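As an illustration of the first strategy in the abstract above (distributing the text uniformly over the audio), the following is a minimal sketch. It is not the thesis's implementation: the helper name `uniform_alignment`, the choice of giving every word an equal time slot, and the sample words and duration are all assumptions made for the example.

```python
from typing import List, Tuple


def uniform_alignment(words: List[str], audio_duration: float) -> List[Tuple[str, float, float]]:
    """Hypothetical helper: give every word an equal-length slot across the audio.

    Returns (word, start_seconds, end_seconds) tuples. This only provides a
    rough starting point; it says nothing about where words are actually spoken.
    """
    if audio_duration < 0:
        raise ValueError("audio_duration must be non-negative")
    if not words:
        return []
    slot = audio_duration / len(words)  # seconds assigned to each word
    return [(word, i * slot, (i + 1) * slot) for i, word in enumerate(words)]


# Assumed sample input: four script words and 8 seconds of audio.
segments = uniform_alignment(["buenas", "noches", "a", "todos"], 8.0)
for word, start, end in segments:
    print(f"{word}: {start:.2f}s - {end:.2f}s")
```

A uniform split of this kind only gives an initial guess, which is consistent with the abstract's observation that partial time marks improve on it by narrowing where each piece of text can fall.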
Spoken term detection ALBAYZIN 2014 evaluation: overview, systems, results, and discussion
The electronic version of this article is the complete one and can be found online at: http://dx.doi.org/10.1186/s13636-015-0063-8
Spoken term detection (STD) aims at retrieving data from a speech repository given a textual representation of the search term. Nowadays, it is receiving much interest due to the large volume of multimedia information. STD differs from automatic speech recognition (ASR) in that ASR is interested in all the terms/words that appear in the speech data, whereas STD focuses on a selected list of search terms that must be detected within the speech data. This paper presents the systems submitted to the STD ALBAYZIN 2014 evaluation, held as a part of the ALBAYZIN 2014 evaluation campaign within the context of the IberSPEECH 2014 conference. This is the first STD evaluation that deals with the Spanish language. The evaluation consists of retrieving the speech files that contain the search terms, indicating their start and end times within the appropriate speech file, along with a score value that reflects the confidence given to the detection of the search term. The evaluation is conducted on a Spanish spontaneous speech database, which comprises a set of talks from workshops and amounts to about 7 h of speech. We present the database, the evaluation metrics, the systems submitted to the evaluation, the results, and a detailed discussion. Four different research groups took part in the evaluation. Evaluation results show reasonable performance for a moderate out-of-vocabulary term rate. This paper compares the systems submitted to the evaluation and provides an in-depth analysis based on some search term properties (term length, in-vocabulary/out-of-vocabulary terms, single-word/multi-word terms, and in-language/foreign terms).
This work has been partly supported by project CMC-V2
(TEC2012-37585-C02-01) from the Spanish Ministry of Economy and
Competitiveness. This research was also funded by the European Regional
Development Fund, the Galician Regional Government (GRC2014/024,
“Consolidation of Research Units: AtlantTIC Project” CN2012/160)
Combining Multiple Approaches to Predict the Degree of Nativeness
Automatic speaker nativeness assessment has multiple applications, such as second language learning and IVR systems. In this paper we view this as a regression problem, since the available labels are on a continuous scale. Multiple approaches were applied, such as phonotactic models, i-vectors, and goodness of pronunciation, covering both segmental and suprasegmental features. Different phonotactic models were adopted, either trained with the challenge data, or using additional multilingual data from other domains. The obtained values were later combined in multiple ways and fed to a support vector machine regressor. Results on the test set surpass the provided baseline and are in line with the results obtained on the remaining sets. This suggests that our models generalize well to other datasets.
ALBAYZIN 2016 spoken term detection evaluation: an international open competitive evaluation in Spanish
Within search-on-speech, Spoken Term Detection (STD) aims to retrieve data from a speech repository given a textual representation of a search term. This paper presents an international open evaluation for search-on-speech based on STD in Spanish and an analysis of the results. The evaluation has been designed carefully so that several analyses of the main results can be carried out. The evaluation consists of retrieving the speech files that contain the search terms, providing their start and end times, and a score value that reflects the confidence given to the detection. Two different Spanish speech databases have been employed in the evaluation: the MAVIR database, which comprises a set of talks from workshops, and the EPIC database, which comprises a set of European Parliament sessions in Spanish. We present the evaluation itself, both databases, the evaluation metric, the systems submitted to the evaluation, the results, and a detailed discussion. Five different research groups took part in the evaluation, and ten different systems were submitted in total. We compare the systems submitted to the evaluation and provide an in-depth analysis based on some search term properties (term length, within-vocabulary/out-of-vocabulary terms, single-word/multi-word terms, and native (Spanish)/foreign terms).
Xunta de Galicia | Ref. ED431G/01
Ministerio de Economía y Competitividad | Ref. TEC2015-67163-C2-1-R
Ministerio de Economía y Competitividad | Ref. TIN2014-54288-C4-1-R
Ministerio de Economía y Competitividad | Ref. TEC2015-68172-C2-1-
Spoken term detection ALBAYZIN 2014 evaluation: overview, systems, results, and discussion
Spoken term detection (STD) aims at retrieving data from a speech repository given a textual representation of the search term. Nowadays, it is receiving much interest due to the large volume of multimedia information. STD differs from automatic speech recognition (ASR) in that ASR is interested in all the terms/words that appear in the speech data, whereas STD focuses on a selected list of search terms that must be detected within the speech data. This paper presents the systems submitted to the STD ALBAYZIN 2014 evaluation, held as a part of the ALBAYZIN 2014 evaluation campaign within the context of the IberSPEECH 2014 conference. This is the first STD evaluation that deals with the Spanish language. The evaluation consists of retrieving the speech files that contain the search terms, indicating their start and end times within the appropriate speech file, along with a score value that reflects the confidence given to the detection of the search term. The evaluation is conducted on a Spanish spontaneous speech database, which comprises a set of talks from workshops and amounts to about 7 h of speech. We present the database, the evaluation metrics, the systems submitted to the evaluation, the results, and a detailed discussion. Four different research groups took part in the evaluation. Evaluation results show reasonable performance for a moderate out-of-vocabulary term rate. This paper compares the systems submitted to the evaluation and provides an in-depth analysis based on some search term properties (term length, in-vocabulary/out-of-vocabulary terms, single-word/multi-word terms, and in-language/foreign terms).
Ministerio de Economía y Competitividad | Ref. TEC2012-37585-C02-01
Xunta de Galicia | Ref. 2014/02
ALBAYZIN 2016 spoken term detection evaluation: an international open competitive evaluation in Spanish
Within search-on-speech, Spoken Term Detection (STD) aims to retrieve data from a speech repository given a textual representation of a search term. This paper presents an international open evaluation for search-on-speech based on STD in Spanish and an analysis of the results. The evaluation has been designed carefully so that several analyses of the main results can be carried out. The evaluation consists of retrieving the speech files that contain the search terms, providing their start and end times, and a score value that reflects the confidence given to the detection. Two different Spanish speech databases have been employed in the evaluation: the MAVIR database, which comprises a set of talks from workshops, and the EPIC database, which comprises a set of European Parliament sessions in Spanish. We present the evaluation itself, both databases, the evaluation metric, the systems submitted to the evaluation, the results, and a detailed discussion. Five different research groups took part in the evaluation, and ten different systems were submitted in total. We compare the systems submitted to the evaluation and provide an in-depth analysis based on some search term properties (term length, within-vocabulary/out-of-vocabulary terms, single-word/multi-word terms, and native (Spanish)/foreign terms).
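The STD abstracts above describe each system output as a speech file, start and end times, and a confidence score. A minimal sketch of such a record, with a simple confidence filter, might look as follows. The class name `Detection`, its field names, the helper `confident_detections`, the threshold, and the sample values are all assumptions for illustration; they are not the official ALBAYZIN submission format or scoring procedure.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Detection:
    """One hypothesized occurrence of a search term (field names are illustrative)."""
    term: str
    file_id: str
    start: float  # seconds from the beginning of the file
    end: float    # seconds from the beginning of the file
    score: float  # confidence; higher means more confident


def confident_detections(detections: List[Detection], threshold: float) -> List[Detection]:
    """Keep detections scoring at least `threshold`, most confident first."""
    kept = [d for d in detections if d.score >= threshold]
    return sorted(kept, key=lambda d: d.score, reverse=True)


# Assumed sample hypotheses, invented for the example.
hypotheses = [
    Detection("parlamento", "session_01", 12.40, 13.05, 0.91),
    Detection("parlamento", "session_02", 88.10, 88.70, 0.35),
    Detection("taller", "talk_03", 5.00, 5.45, 0.78),
]
for d in confident_detections(hypotheses, threshold=0.5):
    print(d.file_id, d.term, d.start, d.end, d.score)
```

Keeping the score on every record, rather than emitting a hard yes/no decision, lets a decision threshold be applied afterwards, which is why the filter is a separate function in this sketch.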