
    Applying feature reduction analysis to a PPRLM-multiple Gaussian language identification system

    This paper presents the application of a feature reduction technique, LDA (linear discriminant analysis), to a language identification (LID) system. The baseline system consists of a PPRLM module followed by a multiple-Gaussian classifier that uses acoustic scores and duration features of each input utterance. We apply a dimensionality reduction of the feature space in order to obtain a faster and more easily trainable system, imputing missing values of our vectors before projecting them onto the new space. Our experiments show only a small loss in performance due to the dimensionality reduction: using a single-dimension projection, we obtain error rates of about 8.73% when taking into account the 22 most significant features.
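    As a rough illustration of this pipeline, the sketch below imputes missing values and then applies an LDA projection with scikit-learn; the feature dimensions, data, and imputation strategy are assumptions for the example, not the paper's configuration.

```python
# Minimal sketch (not the authors' code): impute missing values in the
# PPRLM score/duration feature vectors, then project them with LDA before
# training a Gaussian classifier. Feature dimensions and data are invented.
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
n_utts, n_feats, n_langs = 200, 22, 4           # hypothetical sizes
X = rng.normal(size=(n_utts, n_feats))          # acoustic scores + durations
X[rng.random(X.shape) < 0.05] = np.nan          # simulate missing values
y = rng.integers(0, n_langs, size=n_utts)       # language labels

# Mean imputation is just one possible choice for the missing values.
X_filled = SimpleImputer(strategy="mean").fit_transform(X)

# LDA projects onto at most (n_langs - 1) dimensions; a single-dimension
# projection corresponds to n_components=1.
lda = LinearDiscriminantAnalysis(n_components=1)
X_proj = lda.fit_transform(X_filled, y)
print(X_proj.shape)                             # (200, 1)
```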

    A web-based application for the management and evaluation of tutoring requests in PBL-based massive laboratories

    One important step in a successful project-based learning (PBL) methodology is providing students with timely feedback that allows them to keep developing or improving their projects. However, this task is more difficult in massive courses, especially when the project deadline is close. Besides, the continuous evaluation methodology makes it necessary to find ways to measure students' performance objectively and continuously without excessively increasing instructors' workload. In order to alleviate these problems, we have developed a web service that allows students to request personal tutoring assistance during the laboratory sessions by specifying the kind of problem they have and the person who could help them solve it. This service provides tools for the staff to manage the laboratory, to perform continuous evaluation of all students and of the student collaborators, and to prioritize tutoring according to the progress of each student's project. Additionally, the application provides objective metrics which can be used at the end of the course to support some students' final scores. Usability statistics and the results of a subjective evaluation with more than 330 students confirm the success of the proposed application.
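    The paper does not publish its data model, but a minimal sketch of such a tutoring request queue might look as follows; all field names are hypothetical, and serving students with less project progress first is only an assumption about the prioritization rule.

```python
# Illustrative sketch only (not the authors' code). A tutoring request
# records the kind of problem and the preferred helper; the queue is ordered
# so that students with less project progress are attended first (assumed).
from dataclasses import dataclass, field
import heapq
import itertools

@dataclass(order=True)
class TutoringRequest:
    priority: float                      # lower = attended sooner
    order: int                           # tie-breaker: arrival order
    student: str = field(compare=False)
    problem_kind: str = field(compare=False)
    requested_helper: str = field(compare=False)

counter = itertools.count()
queue: list[TutoringRequest] = []

def submit(student, problem_kind, helper, project_progress):
    # Prioritize by how far the student's project is from completion.
    req = TutoringRequest(project_progress, next(counter),
                          student, problem_kind, helper)
    heapq.heappush(queue, req)

submit("alice", "compile error", "lab assistant", project_progress=0.3)
submit("bob", "design review", "instructor", project_progress=0.7)
print(heapq.heappop(queue).student)      # "alice": less progress, served first
```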

    On the use of phone-gram units in recurrent neural networks for language identification

    In this paper we present our results on using RNN-based language model (LM) scores trained on different phone-gram orders and with different phonetic ASR recognizers. In order to avoid data sparseness problems and to reduce the vocabulary of all possible n-gram combinations, a K-means clustering procedure was performed using phone-vector embeddings as a pre-processing step. Additional experiments to optimize the number of classes, the batch size, the number of hidden neurons, and the state unfolding are also presented. We have worked with the KALAKA-3 database for the plenty-closed condition [1]. Thanks to our clustering technique and the combination of high-level phone-grams, our phonotactic system performs ~13% better than the unigram-based RNNLM system. The obtained RNNLM scores are also calibrated and fused with scores from an acoustic i-vector system and a traditional PPRLM system. This fusion provides additional improvements, showing that the RNNLM scores contribute complementary information to the LID system.
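    A rough sketch of this pre-processing idea is shown below: phone embeddings are clustered with K-means so that phone-grams are built over a small number of classes instead of the full phone inventory. The phone set, embedding dimension, and number of classes are invented for the example.

```python
# Sketch of the clustering pre-processing step (not the authors' code).
# Embeddings here are random stand-ins; in the paper they are derived from
# the phonetic recognizers.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
phones = [f"ph{i}" for i in range(60)]            # hypothetical phone set
emb = rng.normal(size=(len(phones), 16))          # phone-vector embeddings

n_classes = 12                                    # tuned experimentally
km = KMeans(n_clusters=n_classes, n_init=10, random_state=0).fit(emb)
phone2class = {p: int(c) for p, c in zip(phones, km.labels_)}

def to_class_ngrams(phone_seq, order=3):
    """Map a decoded phone sequence to class-level phone-grams."""
    classes = [phone2class[p] for p in phone_seq]
    return [tuple(classes[i:i + order]) for i in range(len(classes) - order + 1)]

print(to_class_ngrams(["ph1", "ph5", "ph9", "ph3"], order=3))
```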

    n-gram Frequency Ranking with additional sources of information in a multiple-Gaussian classifier for Language Identification

    We present new results for our n-gram frequency ranking used for language identification. We use a parallel phone recognizer (as in PPRLM), but instead of a language model we create a ranking of the most frequent n-grams. We then compute the distance between the input sentence ranking and each language ranking, based on the difference in relative position of each n-gram. The objective of this ranking is to reliably model a longer span than PPRLM. This approach outperforms PPRLM (15% relative improvement) thanks to the inclusion of 4-grams and 5-grams in the classifier. We also show that combining this technique with other sources of information (feature vectors in our classifier) is advantageous over PPRLM, and we present a detailed analysis of the relevance of these sources together with a simple feature selection technique to cope with long feature vectors. The test database has been significantly enlarged using cross-fold validation, so comparisons are now more reliable.
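    A minimal sketch of this ranking distance is given below; details such as the ranking length and the penalty assigned to unseen n-grams are assumptions, not the paper's exact settings.

```python
# Sketch of the n-gram frequency ranking distance described above.
from collections import Counter

def build_ranking(ngram_sequences, top_k=500):
    """Rank the most frequent n-grams: {ngram: position}."""
    counts = Counter(ng for seq in ngram_sequences for ng in seq)
    return {ng: pos for pos, (ng, _) in enumerate(counts.most_common(top_k))}

def ranking_distance(utterance_ngrams, language_ranking, penalty=None):
    """Sum of differences in relative position for each n-gram."""
    if penalty is None:
        penalty = len(language_ranking)           # cost for unseen n-grams
    utt_ranking = build_ranking([utterance_ngrams])
    dist = 0
    for ng, pos in utt_ranking.items():
        dist += abs(pos - language_ranking.get(ng, penalty))
    return dist

def identify(utterance_ngrams, language_rankings):
    """Pick the language whose ranking is closest to the utterance ranking."""
    return min(language_rankings, key=lambda lang: ranking_distance(
        utterance_ngrams, language_rankings[lang]))
```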

    Low-resource language recognition using a fusion of phoneme posteriorgram counts, acoustic and glottal-based i-vectors

    This paper presents a description of our system for the Albayzin 2012 LRE competition. One of the main characteristics of this evaluation was the reduced number of files available for training the system, especially for the empty condition, where no training data set was provided, only a development set. In addition, the whole database was created from online videos, and around one third of the training data was labeled as noisy. Our primary system was the fusion of three different i-vector based systems: an acoustic system based on MFCCs, a phonotactic system using trigrams of phone-posteriorgram counts, and another acoustic system based on RPLPs that improved robustness against noise. A contrastive system that included new features based on the glottal source was also presented. Official and post-evaluation results for all the conditions, using both the metrics proposed for the evaluation and the Cavg metric, are presented in the paper.
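    The abstract does not name a fusion back-end; one common choice is multinomial logistic regression over the stacked per-language scores of the subsystems, sketched below with invented data sizes.

```python
# Sketch of score-level fusion (not necessarily the toolkit used in the paper):
# stack the per-language scores of the three subsystems and learn fusion
# weights with logistic regression on development data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_dev, n_langs = 300, 6                               # hypothetical sizes
mfcc_scores = rng.normal(size=(n_dev, n_langs))       # acoustic i-vector system
phono_scores = rng.normal(size=(n_dev, n_langs))      # posteriorgram-count system
rplp_scores = rng.normal(size=(n_dev, n_langs))       # noise-robust RPLP system
labels = rng.integers(0, n_langs, size=n_dev)

X_dev = np.hstack([mfcc_scores, phono_scores, rplp_scores])
fusion = LogisticRegression(max_iter=1000).fit(X_dev, labels)

# At test time the same stacking yields calibrated posteriors per language.
X_test = np.hstack([rng.normal(size=(1, n_langs)) for _ in range(3)])
print(fusion.predict_proba(X_test))
```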

    Extended phone log-likelihood ratio features and acoustic-based I-vectors for language recognition

    This paper presents new techniques, with relevant improvements, added to the primary system presented by our group to the Albayzin 2012 LRE competition, where the use of any additional corpora for training or optimizing the models was forbidden. In this work, we present the incorporation of an additional phonotactic subsystem based on phone log-likelihood ratio (PLLR) features extracted from different phonotactic recognizers, which improves the accuracy of the system by 21.4% in terms of Cavg (we also present results for the official metric used during the evaluation, Fact). We show how using these features at the phone-state level provides significant improvements when combined with dimensionality reduction techniques, especially PCA. We have also experimented with applying alternative SDC-like configurations to these PLLR features, obtaining additional improvements. In addition, we describe some modifications to the MFCC-based acoustic i-vector system that contributed further improvements. The final fused system outperformed the baseline by 27.4% in Cavg.
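    The sketch below follows the standard PLLR definition (frame-wise log-likelihood ratios computed from phone posteriors) followed by a PCA projection; the frame counts, phone inventory, and reduced dimensionality are assumptions for illustration.

```python
# Sketch of the PLLR + PCA front-end idea (not code from the paper).
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
n_frames, n_phones = 500, 40                       # hypothetical sizes
post = rng.dirichlet(np.ones(n_phones), size=n_frames)   # frame posteriors

# PLLR per phone: log(p / (1 - p)), with a small epsilon for stability.
eps = 1e-10
pllr = np.log(post + eps) - np.log(1.0 - post + eps)

pca = PCA(n_components=20)                         # reduced dimensionality (assumed)
pllr_reduced = pca.fit_transform(pllr)
print(pllr_reduced.shape)                          # (500, 20)
```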

    Incorporación de n-gramas discriminativos para mejorar un reconocedor de idioma fonotáctico basado en i-vectores

    This article describes a new technique that combines the information of two different phonotactic systems with the goal of improving the results of an automatic language recognition system. The first system is based on building posteriorgram counts used to generate i-vectors, and the second is a variant of the first that takes into account the most discriminative n-grams according to their occurrence in one language versus all the others. The proposed technique yields a relative improvement of 8.63% in Cavg on the evaluation data used for the ALBAYZIN 2012 LRE competition.
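    As an illustration of the idea, the sketch below scores each n-gram by how much more often it occurs in one language than in the pool of all other languages and keeps the top-scoring ones; the exact selection criterion used in the paper may differ.

```python
# Illustrative selection of discriminative n-grams (assumed criterion).
from collections import Counter

def discriminative_ngrams(counts_per_lang, top_k=100, smoothing=1.0):
    """counts_per_lang: {lang: Counter of n-gram counts}."""
    totals = {lang: sum(c.values()) for lang, c in counts_per_lang.items()}
    selected = {}
    for lang, counts in counts_per_lang.items():
        # Pool the counts of all the other languages.
        rest_counts = Counter()
        for other, c in counts_per_lang.items():
            if other != lang:
                rest_counts.update(c)
        rest_total = sum(rest_counts.values())
        scores = {}
        for ng, n in counts.items():
            p_lang = (n + smoothing) / (totals[lang] + smoothing)
            p_rest = (rest_counts[ng] + smoothing) / (rest_total + smoothing)
            scores[ng] = p_lang / p_rest           # occurrence ratio
        selected[lang] = sorted(scores, key=scores.get, reverse=True)[:top_k]
    return selected
```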

    Aplicación de métodos estadísticos para la traducción de voz a Lengua de Signos

    This article presents a set of experiments carried out to build a statistical speech-to-sign-language translation system for deaf people. The system contains a first module for speech recognition, a second module for the statistical translation of Spanish words into signs of Spanish Sign Language (Lengua de Signos Española), and a third module that renders the signs through an animated agent. The translation is performed using two technological alternatives: the first based on word-subsequence (phrase-based) models and the second based on finite-state transducers. Across all the experiments, the best results are obtained with the model that translates using finite-state transducers, with error rates of 26.06% for the reference sentences and 33.42% for the recognizer output.
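    A toy sketch of the word-subsequence (phrase-based) alternative is shown below: translation proceeds by greedily matching the longest known Spanish word subsequence and emitting its sign glosses. The phrase table and glosses are invented; the real system learns them from aligned Spanish/LSE data.

```python
# Toy phrase-table lookup translator (invented table, illustrative only).
phrase_table = {
    ("buenos", "días"): ["BUENOS-DIAS"],
    ("el", "tren"): ["TREN"],
    ("sale",): ["SALIR"],
    ("a", "las", "tres"): ["HORA-TRES"],
}

def translate(words, table, max_len=4):
    signs, i = [], 0
    while i < len(words):
        # Try the longest matching word subsequence first.
        for n in range(min(max_len, len(words) - i), 0, -1):
            phrase = tuple(words[i:i + n])
            if phrase in table:
                signs.extend(table[phrase])
                i += n
                break
        else:
            signs.append(words[i].upper())        # pass unknown words through
            i += 1
    return signs

print(translate("el tren sale a las tres".split(), phrase_table))
# ['TREN', 'SALIR', 'HORA-TRES']
```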

    Speech into Sign Language Statistical Translation System for Deaf People

    This paper presents a set of experiments used to develop a statistical system for translating speech into sign language for deaf people. The system is composed of an Automatic Speech Recognition (ASR) module, followed by a statistical translation module and an animated agent that represents the different signs. Two different approaches have been used to perform the translation: a phrase-based system and a finite-state transducer. For the evaluation, the following figures of merit have been considered: WER (Word Error Rate), BLEU, and NIST. The paper presents translation results for reference sentences and for sentences produced by the automatic speech recognizer. Three different configurations of the speech recognizer have also been evaluated. The best results were obtained with the finite-state transducer, with a word error rate of 28.21% for the reference text and 29.27% using the ASR output.
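    The WER figure quoted above is the standard word-level edit distance normalized by the reference length; a minimal implementation is sketched below.

```python
# Word Error Rate: (substitutions + insertions + deletions) / reference length.
def wer(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit distance between the two word sequences.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution / match
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("el tren sale a las tres", "el tren a las tres"))  # one deletion: ~0.167
```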

    Sistema de Traducción Estadística de Voz a Lengua de Signos para Personas Sordas

    This article presents a set of experiments carried out to build a statistical speech-to-sign-language translation system for deaf people. It contains a speech recognition module, a module for the statistical translation of Spanish words into signs of Spanish Sign Language (Lengua de Signos Española), and a third module that renders the signs through an avatar. The translation is performed using two technological alternatives: the first based on word-subsequence (phrase-based) models and the second based on finite-state transducers. Several metrics are used for the evaluation, such as WER (word error rate), BLEU, and NIST. The tests include experiments with the original Spanish and Sign Language sentences and with sentences obtained from the speech recognizer. Different speech recognizer configurations for obtaining the vocabulary and the language model are also evaluated. The best results are obtained with the finite-state-transducer translation, giving error rates of 28.21% for the reference sentences and 29.27% for the recognizer output.