Applying feature reduction analysis to a PPRLM-multiple Gaussian language identification system
This paper presents the application of a feature reduction technique, LDA, to a language identification (LID) system. The baseline system consists of a PPRLM module followed by a multiple-Gaussian classifier, which uses acoustic scores and duration features of each input utterance. We applied dimension reduction to the feature space in order to obtain a faster system that is easier to train. Missing values in our feature vectors were imputed before the vectors were projected onto the new space. Our experiments show only a very small performance loss due to the dimension reduction: using a single-dimension projection, we obtain error rates of about 8.73% when the 22 most significant features are taken into account.
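The impute-then-project pipeline can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the abstract does not say how missing values were imputed, so column-mean imputation is assumed here, and the function names `impute_column_means` and `lda_directions` are hypothetical.

```python
import numpy as np

def impute_column_means(X):
    """Replace NaN entries with the mean of the observed values in the same column
    (assumed imputation strategy)."""
    X = np.array(X, dtype=float)          # copy so the caller's array is untouched
    col_means = np.nanmean(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_means[cols]
    return X

def lda_directions(X, y, n_components=1):
    """Top Fisher LDA directions (as columns) maximizing between/within-class scatter."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    overall_mean = X.mean(axis=0)
    d = X.shape[1]
    s_within = np.zeros((d, d))
    s_between = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        s_within += (Xc - mc).T @ (Xc - mc)
        diff = (mc - overall_mean).reshape(-1, 1)
        s_between += len(Xc) * (diff @ diff.T)
    # Eigenvectors of pinv(Sw) Sb; pinv guards against a singular within-class scatter.
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(s_within) @ s_between)
    order = np.argsort(eigvals.real)[::-1]
    return eigvecs[:, order[:n_components]].real

# Toy usage: two classes of 4-dimensional vectors with one missing value.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (50, 4)), rng.normal(3.0, 1.0, (50, 4))])
y = np.array([0] * 50 + [1] * 50)
X[5, 2] = np.nan
X = impute_column_means(X)
W = lda_directions(X, y, n_components=1)
z = X @ W   # single-dimension projection, shape (100, 1)
```

Using the pseudo-inverse keeps the sketch working even when some features are nearly collinear, which is common with long feature vectors.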
Cerebral palsy and eye-tracking systems: blink click or dwell click?
The basic commands for controlling a computer through a graphical interface are reaching an object on the screen and selecting it. This document describes a functional comparative study of two alternative selection strategies, especially designed for people with cerebral palsy: dwell click and blink click, while the reaching task is performed through eye movements.
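The two selection strategies can be illustrated with a small event detector. This is a hedged sketch only: the study's thresholds are not given, so the dwell radius, dwell time and blink-duration window below are hypothetical values, and the input formats (timestamped gaze points, timestamped eye-open flags) are assumptions.

```python
import math

def detect_dwell_clicks(samples, radius=30.0, dwell_time=1.0):
    """samples: list of (t_seconds, x, y) gaze points.
    Emits a click when the gaze stays within `radius` pixels of where it settled
    for at least `dwell_time` seconds (hypothetical thresholds)."""
    clicks = []
    anchor = None  # (t0, x0, y0): where the current fixation started
    for t, x, y in samples:
        if anchor is None or math.hypot(x - anchor[1], y - anchor[2]) > radius:
            anchor = (t, x, y)  # gaze moved away (or no fixation yet): restart timer
            continue
        if t - anchor[0] >= dwell_time:
            clicks.append((t, anchor[1], anchor[2]))
            anchor = None  # gaze must settle again before the next click
    return clicks

def detect_blink_clicks(samples, min_closed=0.3, max_closed=1.5):
    """samples: list of (t_seconds, eyes_open). An eye closure lasting between
    min_closed and max_closed seconds counts as a click; shorter closures are
    treated as involuntary blinks (hypothetical thresholds)."""
    clicks = []
    closed_since = None
    for t, eyes_open in samples:
        if not eyes_open and closed_since is None:
            closed_since = t
        elif eyes_open and closed_since is not None:
            if min_closed <= t - closed_since <= max_closed:
                clicks.append(t)
            closed_since = None
    return clicks
```

The minimum closure time is what separates a deliberate blink click from natural blinking, while the dwell time plays the same role for fixations: both trade selection speed against accidental activations.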
SD-TEAM: Interactive Learning, Self-Evaluation and Multimodal Technologies for Multidomain Spoken Dialog Systems
Speech technology currently supports the development of dialogue systems that work in the limited domains for which they were trained and under the conditions for which they were designed, that is, specific acoustic conditions, speakers, etc. The international scientific community has made significant efforts to explore methods for adapting to different acoustic contexts, tasks and types of user. However, further work is needed to produce multimodal spoken dialogue systems capable of exploiting interactivity to learn online and thereby improve their performance. The goal is to produce flexible, dynamic, multimodal interactive systems based on spoken communication, capable of automatically detecting their operating conditions and, above all, of learning from user interactions and experience by evaluating their own performance. Such "living" systems will evolve continuously and without supervision until user satisfaction is achieved. Special attention will be paid to groups of users for whom adaptation and personalisation are essential: among others, people with disabilities that lead to communication difficulties (hearing loss, dysfluent speech, ...), people with mobility problems, and non-native users. In this context, the SD-TEAM Project aims to advance the development of technologies for interactive learning and evaluation. In addition, it will develop flexible distributed architectures that allow synergistic interaction between processing modules from a variety of dialogue systems designed for distinct tasks, user groups, acoustic conditions, etc. These technologies will be demonstrated through multimodal dialogue systems for accessing services from home and for accessing unstructured information, based on the multi-domain systems developed in the previous project TIN2005-08660-C04.
Low-resource language recognition using a fusion of phoneme posteriorgram counts, acoustic and glottal-based i-vectors
This paper describes our system for the Albayzin 2012 LRE competition. One of the main characteristics of this evaluation was the small number of files available for training, especially in the empty condition, where no training set was provided, only a development set. In addition, the whole database was created from online videos, and around one third of the training data was labeled as noisy. Our primary system was the fusion of three different i-vector based systems: an acoustic system based on MFCCs, a phonotactic system using trigrams of phone-posteriorgram counts, and another acoustic system based on RPLPs that improved robustness against noise. A contrastive system that included new features based on the glottal source was also presented. Official and post-evaluation results for all conditions are reported, using both the metrics proposed for the evaluation and the Cavg metric.
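For readers unfamiliar with Cavg, the sketch below computes the closed-set form used in the NIST LRE evaluations, averaging over target languages the miss cost plus the false-alarm costs spread evenly over the non-target languages, with C_miss = C_fa = 1 and P_target = 0.5 as defaults. It is an illustration under those assumptions: the Albayzin 2012 plan also defines open-set conditions with an out-of-set term that is not modelled here, and the function name `cavg` is hypothetical.

```python
import numpy as np

def cavg(decisions, labels, p_target=0.5, c_miss=1.0, c_fa=1.0):
    """Closed-set average detection cost.
    decisions: (n_segments, n_langs) bool array, True = segment accepted as that language.
    labels: (n_segments,) true language index of each segment.
    Every language is assumed to have at least one test segment."""
    decisions = np.asarray(decisions, dtype=bool)
    labels = np.asarray(labels)
    n_langs = decisions.shape[1]
    total = 0.0
    for t in range(n_langs):
        # Miss rate: target-language segments rejected by the detector for language t.
        p_miss = np.mean(~decisions[labels == t, t])
        fa_cost = 0.0
        for n in range(n_langs):
            if n == t:
                continue
            # False-alarm rate: language-n segments accepted by the detector for language t.
            p_fa = np.mean(decisions[labels == n, t])
            fa_cost += c_fa * (1.0 - p_target) / (n_langs - 1) * p_fa
        total += c_miss * p_target * p_miss + fa_cost
    return total / n_langs
```

A system that accepts or rejects everything scores 0.5 under these defaults, which is a useful sanity check when reading the reported figures.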
A web-based application for the management and evaluation of tutoring requests in PBL-based massive laboratories
One important step in a successful project-based learning (PBL) methodology is providing students with timely feedback that allows them to keep developing or improving their projects. However, this task is more difficult in massive courses, especially when the project deadline is close. Moreover, continuous evaluation makes it necessary to find ways to measure students' performance objectively and continuously without excessively increasing instructors' workload. To alleviate these problems, we have developed a web service that allows students to request personal tutoring assistance during laboratory sessions by specifying the kind of problem they have and the person who could help them solve it. The service provides tools for the staff to manage the laboratory, to perform continuous evaluation of all students and student collaborators, and to prioritize tutoring according to the progress of each student's project. Additionally, the application provides objective metrics that can be used at the end of the course, during the evaluation process, to support some students' final scores. Usability statistics and the results of a subjective evaluation with more than 330 students confirm the success of the proposed application.
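The progress-based prioritization of tutoring requests could be modelled with a priority queue. This is a hypothetical sketch, not the application's actual design: the abstract does not say how progress is measured or in which direction it is prioritized, so a `progress` value between 0 and 1 and a "least-advanced project first" policy are assumed, and all class and field names are illustrative.

```python
import heapq
import itertools
from dataclasses import dataclass

@dataclass
class TutoringRequest:
    student: str
    problem_kind: str    # kind of problem the student reports
    helper: str          # person the student thinks could help
    progress: float      # hypothetical project-progress score, 0.0 to 1.0

class TutoringQueue:
    """Serves requests from the least-advanced projects first (assumed policy);
    ties are broken by arrival order."""

    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()  # unique tie-breaker, so requests are never compared

    def submit(self, request):
        heapq.heappush(self._heap, (request.progress, next(self._arrival), request))

    def next_request(self):
        """Return the highest-priority pending request, or None if the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]
```

The arrival counter keeps the ordering fair among students with equal progress and avoids comparing `TutoringRequest` objects directly.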
n-gram Frequency Ranking with additional sources of information in a multiple-Gaussian classifier for Language Identification
We present new results for our n-gram frequency ranking approach to language identification. We use parallel phone recognizers (as in PPRLM), but instead of a language model, we build a ranking of the most frequent n-grams. We then compute the distance between the ranking of the input sentence and the ranking of each language, based on the difference in the relative positions of each n-gram. The objective of this ranking is to reliably model a longer span than PPRLM. This approach outperforms PPRLM (a 15% relative improvement) thanks to the inclusion of 4-grams and 5-grams in the classifier. We also show that combining this technique with other sources of information (feature vectors in our classifier) is advantageous over PPRLM, and we provide a detailed analysis of the relevance of these sources together with a simple feature selection technique for coping with long feature vectors. The test database has been significantly enlarged using cross-fold validation, so comparisons are now more reliable.
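The ranking-and-distance idea can be sketched as follows, assuming phone sequences (as produced by one of the parallel phone recognizers) are already available as lists of symbols. The distance is an "out-of-place" measure: the sum of absolute position differences, with a fixed maximum penalty for n-grams missing from the language ranking. The ranking length, the penalty and the function names are assumptions, since the abstract does not give them.

```python
from collections import Counter

def ngram_ranking(phones, n_max=5, top_k=300):
    """Rank the most frequent 1..n_max-grams of a phone sequence, most frequent first."""
    counts = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(phones) - n + 1):
            counts[tuple(phones[i:i + n])] += 1
    return [gram for gram, _ in counts.most_common(top_k)]

def rank_distance(input_rank, lang_rank):
    """Sum over the input n-grams of |position in input - position in language ranking|;
    n-grams absent from the language ranking get the maximum penalty len(lang_rank)."""
    position = {gram: i for i, gram in enumerate(lang_rank)}
    max_penalty = len(lang_rank)
    return sum(abs(i - position[g]) if g in position else max_penalty
               for i, g in enumerate(input_rank))

# The hypothesized language is the one whose ranking is closest to the input's.
```

Because 4-grams and 5-grams only enter through their rank, rare long n-grams do not need smoothed probability estimates, which is one way to read the claim that the ranking models a longer span reliably.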
Incorporating discriminative n-grams to improve an i-vector-based phonotactic language recognizer
This article describes a new technique for combining the information from two different phonotactic systems in order to improve the results of an automatic language recognition system. The first system is based on posteriorgram counts that are used to generate i-vectors, and the second is a variant of the first that takes into account the most discriminative n-grams according to their occurrence in one language versus all the others. The proposed technique achieves a relative improvement of 8.63% in Cavg on the evaluation data of the ALBAYZIN 2012 LRE competition.
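One way to pick n-grams that are discriminative for "one language versus all the others" is to compare each n-gram's relative frequency in the target language with its relative frequency in the pooled remaining languages. The log-ratio score and add-alpha smoothing below are a stand-in chosen for illustration; the selection criterion actually used in the article is not given in this abstract, and the function name is hypothetical.

```python
import math
from collections import Counter

def discriminative_ngrams(counts_per_lang, target, top_k=10, alpha=1.0):
    """counts_per_lang: dict mapping language -> Counter of n-gram counts.
    Scores each n-gram by the log ratio of its add-alpha smoothed relative frequency
    in `target` versus all other languages pooled (hypothetical criterion)."""
    target_counts = counts_per_lang[target]
    other_counts = Counter()
    for lang, counts in counts_per_lang.items():
        if lang != target:
            other_counts.update(counts)
    vocab = set(target_counts) | set(other_counts)
    t_total = sum(target_counts.values()) + alpha * len(vocab)
    o_total = sum(other_counts.values()) + alpha * len(vocab)

    def score(gram):
        p_target = (target_counts[gram] + alpha) / t_total
        p_other = (other_counts[gram] + alpha) / o_total
        return math.log(p_target) - math.log(p_other)

    # Sort by descending score; the n-gram itself breaks ties deterministically.
    return sorted(vocab, key=lambda g: (-score(g), g))[:top_k]
```

The selected n-grams would then define the reduced count vectors of the second phonotactic system before i-vector extraction.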
Automatic Tools for Software Quality Analysis in a Project-Based-Learning Course
Over the last decade, the “Anytime, anywhere” paradigm has gained momentum in higher education, leading many universities to innovate with pedagogical strategies based on Internet and Web access technologies. Remote access technologies have enabled teachers to achieve higher levels of efficiency, while students can access tools and resources without being constrained by time or location. Students can also submit their assignments, be evaluated and receive feedback remotely. In this context, faculty need automatic tools that ease and support the evaluation process while making it easier to give students feedback. Project-Based Learning (PBL) has emerged as a pedagogical strategy that can help measure software quality and thus evaluate students more accurately and comprehensively, by giving weight to a broad set of components rather than only to functional aspects. This paper analyzes how introducing innovative automatic diagnosis and feedback tools, based on quantitative methods, can contribute to a continuous improvement in the quality of students' software and to more efficient programming in PBL courses, without compromising functional aspects, as instructors give students practical guidelines in a timely manner.
Phonotactic language recognition using i-vectors and phoneme posteriogram counts
This paper describes a novel approach to phonotactic LID in which, instead of using soft counts based on phoneme lattices, we use phoneme posteriorgrams to obtain n-gram counts. The high-dimensional count vectors are reduced to low-dimensional vectors, for which we adopt the commonly used term i-vectors. The reduction is based on multinomial subspace modeling and is designed to work in the total-variability space. The proposed technique was tested on the NIST 2009 LRE set, giving better results than a system based on soft counts (Cavg on 30 s: 3.15% vs. 3.43%), and very good results when fused with an acoustic i-vector LID system (Cavg on 30 s: 2.4% for the acoustic system alone vs. 1.25% for the fusion). The proposed technique is also compared with another low-dimensional projection system based on PCA. Compared with the original soft counts, the proposed technique gives better results, reduces the problems caused by sparse counts, and avoids the need for pruning techniques when creating the lattices.
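A common way to obtain n-gram counts directly from a posteriorgram, without building lattices, is to take expected counts over consecutive frames, e.g. count[a, b] = Σ_t post[t, a] · post[t+1, b] for bigrams. The sketch below assumes frame-level posteriors that are independent across time; the paper's exact counting procedure may differ (for instance in how frames are grouped into phone segments), and the subspace-multinomial i-vector extraction that follows is not shown. The function name is hypothetical.

```python
import numpy as np

def posteriorgram_counts(post, order=2):
    """Expected n-gram counts from a (T, K) phone posteriorgram, assuming posteriors
    are independent across frames. Returns a K**order array of soft counts."""
    post = np.asarray(post, dtype=float)
    if order == 1:
        return post.sum(axis=0)
    if order == 2:
        # count[a, b] = sum_t post[t, a] * post[t + 1, b]
        return np.einsum("ta,tb->ab", post[:-1], post[1:])
    if order == 3:
        # count[a, b, c] = sum_t post[t, a] * post[t + 1, b] * post[t + 2, c]
        return np.einsum("ta,tb,tc->abc", post[:-2], post[1:-1], post[2:])
    raise ValueError("order must be 1, 2 or 3")

# Flattening, e.g. posteriorgram_counts(post, 3).ravel(), gives the high-dimensional
# count vector that would then be reduced to an i-vector.
```

Because every posterior contributes a fractional count, unseen n-grams rarely receive exactly zero mass, which is consistent with the reduced sparsity reported in the abstract.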
Speaker Diarization Features: The UPM Contribution to the RT09 Evaluation
Two new features, both of which outperform the baseline system, were proposed and used by the Universidad Politécnica de Madrid in the Rich Transcription Evaluation 2009 (RT09). The first is the intensity channel contribution, a feature related to the location of the speaker. The second is the logarithm of the interpolated fundamental frequency. This is the first time that either feature has been applied to the clustering stage of speaker diarization for meetings recorded with multiple distant microphones. Including both features improves on the baseline by 15.36% and 16.71% relative on the development set and the RT09 set, respectively. Considering speaker errors only, the relative improvement is 23% and 32.83% on the development set and the RT09 set, respectively.
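Both features can be approximated from standard frame-level measurements. In this sketch, unvoiced frames (F0 = 0) are filled by linear interpolation between neighbouring voiced frames before taking the logarithm, and the intensity channel contribution is taken as each microphone's share of the total frame energy. These definitions are assumptions made for illustration; the exact formulation used in the RT09 system may differ, and the function names are hypothetical.

```python
import numpy as np

def log_interpolated_f0(f0):
    """f0: per-frame fundamental frequency in Hz, 0 for unvoiced frames.
    Unvoiced gaps are filled by linear interpolation between voiced neighbours
    (leading/trailing gaps take the nearest voiced value); returns log(F0)."""
    f0 = np.asarray(f0, dtype=float)
    voiced = f0 > 0
    if not voiced.any():
        raise ValueError("no voiced frames to interpolate from")
    frames = np.arange(len(f0))
    filled = np.interp(frames, frames[voiced], f0[voiced])
    return np.log(filled)

def intensity_channel_contribution(frame_energies):
    """frame_energies: (T, M) energy per frame for each of M distant microphones.
    Returns each channel's fraction of the total energy in every frame
    (assumed definition of the feature)."""
    e = np.asarray(frame_energies, dtype=float)
    return e / e.sum(axis=1, keepdims=True)
```

Interpolation gives the clustering stage a continuous pitch track instead of undefined values in unvoiced frames, while the per-channel energy shares vary with where a speaker sits relative to the microphones.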