8 research outputs found

    Calidad de estudio, calidad de vida [Quality of Study, Quality of Life]

    Semisupervised Speech Data Extraction from Basque Parliament Sessions and Validation on Fully Bilingual Basque–Spanish ASR

    In this paper, a semisupervised speech data extraction method is presented and applied to create a new dataset designed for the development of fully bilingual Automatic Speech Recognition (ASR) systems for Basque and Spanish. The dataset is drawn from an extensive collection of Basque Parliament plenary sessions containing frequent code-switching. Since the session minutes are not exact transcriptions, only the most reliable speech segments are kept for training. To that end, phonetic similarity scores between nominal and recognized phone sequences are used. The process starts with baseline acoustic models trained on generic out-of-domain data, then iteratively updates the models with the extracted data and applies the updated models to refine the training dataset, until the improvement observed between two consecutive iterations becomes small enough. A development dataset, consisting of five plenary sessions not used for training, has been manually audited for tuning and evaluation purposes. Cross-validation experiments (with 20 random partitions) have been carried out on the development dataset, using the baseline and the iteratively updated models. On average, the Word Error Rate (WER) decreases from 16.57% (baseline) to 4.41% (first iteration) and further to 4.02% (second iteration), corresponding to relative WER reductions of 73.4% and 8.8%, respectively. When considering only Basque segments, WER decreases on average from 16.57% (baseline) to 5.51% (first iteration) and further to 5.13% (second iteration), corresponding to relative WER reductions of 66.7% and 6.9%, respectively.
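Both the segment selection step and the WER metric rest on edit distance between symbol sequences (phones in one case, words in the other). The following is a minimal Python sketch, not the authors' implementation: it assumes a similarity score of one minus the phone edit distance normalized by the nominal sequence length, and a hypothetical acceptance threshold of 0.9. The helper names (`phone_similarity`, `select_segments`, `relative_reduction`) are illustrative only. The last function reproduces the relative WER reductions quoted in the abstract.

```python
# Hypothetical sketch: edit-distance-based segment selection and relative WER reduction.


def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Levenshtein distance; substitutions, insertions and deletions each cost 1."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete ref[i-1]
                curr[j - 1] + 1,         # insert hyp[j-1]
                prev[j - 1] + (r != h),  # substitute (cost 1) or match (cost 0)
            )
        prev = curr
    return prev[-1]


def phone_similarity(nominal: list[str], recognized: list[str]) -> float:
    """Assumed score: 1 - edit distance / nominal length, floored at 0."""
    if not nominal:
        return 0.0
    return max(0.0, 1.0 - edit_distance(nominal, recognized) / len(nominal))


def select_segments(segments, threshold: float = 0.9) -> list[str]:
    """Keep IDs of (seg_id, nominal, recognized) triples scoring >= threshold.

    The 0.9 default is a hypothetical value, not one taken from the paper.
    """
    return [seg_id for seg_id, nominal, recognized in segments
            if phone_similarity(nominal, recognized) >= threshold]


def relative_reduction(before: float, after: float) -> float:
    """Relative error reduction, in percent, from `before` to `after`."""
    return 100.0 * (before - after) / before


if __name__ == "__main__":
    print(round(relative_reduction(16.57, 4.41), 1))  # 73.4 (baseline -> 1st iteration)
    print(round(relative_reduction(4.41, 4.02), 1))   # 8.8 (1st -> 2nd iteration)
```

In the iterative scheme described above, `select_segments` would simply be rerun with the recognized phone sequences produced by each updated acoustic model.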
As a result of this work, a new bilingual Basque–Spanish resource has been produced based on Basque Parliament sessions, including 998 h of training data (audio segments + transcriptions), a development set (17 h long) designed for tuning and evaluation under a cross-validation scheme, and a fully bilingual trigram language model.
This work was partially funded by the Spanish Ministry of Science and Innovation (OPEN-SPEECH project, PID2019-106424RB-I00) and by the Basque Government under the general support program to research groups (IT-1704-22).

    An Overview of the IberSpeech-RTVE 2022 Challenges on Speech Technologies

    Evaluation campaigns provide a common framework with which the progress of speech technologies can be effectively measured. The aim of this paper is to present a detailed overview of the IberSpeech-RTVE 2022 Challenges, which were organized as part of the IberSpeech 2022 conference under the ongoing series of Albayzin evaluation campaigns. In the 2022 edition, four challenges were launched: (1) speech-to-text transcription; (2) speaker diarization and identity assignment; (3) text and speech alignment; and (4) search on speech. Databases covering different domains (e.g., broadcast news, conference talks, parliament sessions) were released for these challenges. The submitted systems also cover a wide range of speech processing methods, including hidden Markov model-based approaches, end-to-end neural network-based methods and hybrid approaches, among others. This paper describes the databases, the tasks and the performance metrics used in the four challenges. It also summarizes the most relevant features of the submitted systems and briefly presents and discusses the results obtained. The relatively poor performance attained in some of the challenges, despite the use of state-of-the-art technology, reveals that there is still room for improvement.
This encourages us to carry on with the Albayzin evaluation campaigns in the coming years.
This work was partially supported by Radio Televisión Española through the RTVE Chair at the University of Zaragoza, and by the Red Temática en Tecnologías del Habla (RED2022-134270-T), funded by AEI (Ministerio de Ciencia e Innovación). It was also partially funded by the European Union’s Horizon 2020 research and innovation program under Marie Skłodowska-Curie Grant 101007666; in part by MCIN/AEI/10.13039/501100011033 and by the European Union “NextGenerationEU”/PRTR under Grants PDC2021-120846C41 and PID2021-126061OB-C44; in part by the Government of Aragon (Grant Group T3623R); by the Spanish Ministry of Science and Innovation (OPEN-SPEECH project, PID2019-106424RB-I00) and by the Basque Government under the general support program to research groups (IT-1704-22); and by projects RTI2018-098091-B-I00 and PID2021-125943OB-I00 (Spanish Ministry of Science and Innovation and ERDF).

    Reconocimiento de la Lengua en Albayzin 2010 LRE utilizando características PLLR [Language Recognition on Albayzin 2010 LRE Using PLLR Features]

    Phone Log-Likelihood Ratios (PLLR) have recently been proposed as alternative features to MFCC-SDC for iVector Spoken Language Recognition (SLR). In this paper, PLLR features are first described, and then further evidence of their usefulness for SLR tasks is provided through a new set of experiments on the Albayzin 2010 LRE dataset, which features wide-band, multi-speaker TV broadcast speech in six languages: Basque, Catalan, Galician, Spanish, Portuguese and English. iVector systems built using PLLR features, computed by means of three open-source phone decoders, achieved significant relative improvements with regard to the phonotactic and MFCC-SDC iVector systems in both clean and noisy speech conditions.
Fusions of PLLR systems with the phonotactic and/or the MFCC-SDC iVector systems led to further performance improvements, revealing that PLLR features provide complementary information in both cases.
This work has been supported by the University of the Basque Country under grant GIU10/18 and project US11/06 and by the Government of the Basque Country under program SAIOTEK (project S-PE12UN55). M. Diez is supported by a research fellowship from the Department of Education, Universities and Research of the Basque Country Government.
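As a sketch of how PLLR features can be derived from a phone decoder's frame-level phone posteriors, the Python code below assumes the commonly used definition in which each phone's posterior is compared against the average posterior of the remaining N - 1 phones, which reduces to log(p_i / (1 - p_i)) + log(N - 1) when the posteriors sum to 1. The input posteriors and the function names are assumptions for illustration; the decoders themselves and any further processing before iVector extraction are not shown.

```python
import math


def pllr(posteriors: list[float], eps: float = 1e-10) -> list[float]:
    """PLLR features for one frame of phone posteriors (assumed to sum to 1).

    Assumed definition: log(p_i / mean_{j != i} p_j), which equals
    log(p_i / (1 - p_i)) + log(N - 1) when the posteriors sum to 1.
    """
    n = len(posteriors)
    if n < 2:
        raise ValueError("need posteriors for at least two phones")
    features = []
    for p in posteriors:
        p = min(max(p, eps), 1.0 - eps)  # keep log() and the division finite
        features.append(math.log(p / (1.0 - p)) + math.log(n - 1))
    return features


def pllr_matrix(frames: list[list[float]]) -> list[list[float]]:
    """Apply pllr() to every frame of a T x N posterior matrix."""
    return [pllr(frame) for frame in frames]
```

With uniform posteriors every PLLR is 0; a phone's value becomes positive exactly when its posterior exceeds the average of its competitors, which is what makes these features directly usable as frame-level inputs in place of MFCC-SDC.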

    Verificación de la lengua en conversaciones telefónicas y en informativos de televisión (GLOSA) [Language Verification in Telephone Conversations and TV Broadcast News (GLOSA)]

    In this brief communication we present the GLOSA project, funded by the Government of the Basque Country for the period 2010-2011. The project has two main technological objectives: (1) creating a suitable infrastructure for the development and evaluation of language recognition technologies; and (2) preparing a competitive language recognition system for conversational telephone speech, to be submitted to the NIST 2011 Language Recognition Evaluation. From an academic point of view, the project aims to implement and improve state-of-the-art language recognition technology.
This project has been supported by the Government of the Basque Country under program SAIOTEK (project S-PE10UN87) and by the University of the Basque Country under grant GIU10/18.

    Sistema de recuperación de noticias de televisión en castellano y euskera [TV News Retrieval System for Spanish and Basque]

    This paper presents a spoken document retrieval system (Hearch) that looks like a conventional search tool but retrieves audio/video segments based on the automatic transcription of their speech content. The system consists of a back-end that captures, processes and indexes audio/video resources, and a front-end web interface that allows users to search content, configure the various modules and display performance statistics. An early version of this tool, which searches and retrieves segments from TV broadcast news repositories in Spanish and Basque, is available at http://gtts.ehu.es/Hearch/. To evaluate the performance of the system, six manually transcribed TV broadcast news programs in Spanish and seven in Basque have been used. Since the Automatic Speech Recognition module introduces a considerable number of errors, an approach based on extending the query with so-called friendly terms has been proposed and evaluated, aiming to broaden the results returned by the system and minimize the effect of recognition errors.
This approach led to slight performance improvements.
This work has been supported by the University of the Basque Country under grant GIU10/18, by the Government of the Basque Country under program SAIOTEK (project S-PE10UN87) and by the Spanish MICINN under Plan Nacional de I+D+i (project TIN2009-07446, partially financed by FEDER funds). M. Diez is supported by a research fellowship from the Department of Education, Universities and Research of the Basque Country Government.
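The abstract does not spell out how friendly terms are chosen, so the Python sketch below only illustrates the general idea of query expansion over ASR transcripts under stated assumptions: a plain inverted index from words to segment IDs, a hand-supplied, hypothetical table of friendly terms (for example, words the recognizer tends to output instead of the query word), and a hypothetical reduced weight of 0.5 for matches on expanded terms. It is not Hearch's actual indexing or ranking code.

```python
from collections import defaultdict


def build_index(transcripts: dict[str, str]) -> dict[str, set[str]]:
    """Inverted index: word -> IDs of segments whose ASR transcript contains it."""
    index = defaultdict(set)
    for seg_id, text in transcripts.items():
        for word in text.lower().split():
            index[word].add(seg_id)
    return index


def search(index, query, friendly_terms=None, friendly_weight=0.5):
    """Rank segment IDs by matched query words; friendly terms add a reduced score.

    `friendly_terms` is a hypothetical mapping word -> list of related words;
    `friendly_weight` is an assumed value, not one taken from the paper.
    """
    friendly_terms = friendly_terms or {}
    scores = defaultdict(float)
    for word in query.lower().split():
        for seg_id in index.get(word, ()):
            scores[seg_id] += 1.0
        for alt in friendly_terms.get(word, []):
            for seg_id in index.get(alt, ()):
                scores[seg_id] += friendly_weight
    # Highest score first; ties broken by segment ID for a stable order.
    return sorted(scores, key=lambda s: (-scores[s], s))
```

With transcripts `{"a": "gobierno vasco", "b": "gobierno basco"}`, the query `"vasco"` returns only segment `a`; supplying `{"vasco": ["basco"]}` as friendly terms also returns the misrecognized segment `b`, ranked below the exact match. Down-weighting expanded matches is one way to gain recall without letting them outrank exact hits.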

    Verificación de las cuatro lenguas oficiales españolas en grabaciones de programas de televisión [Verification of the Four Official Languages of Spain in TV Program Recordings]

    This paper presents language verification results for the four official languages of Spain: Spanish, Catalan, Basque and Galician. Results were obtained in closed and open tests (the latter including segments in French, Portuguese, German or English) on 30-second speech segments. A detailed per-target-language study is also included. Experiments were carried out on the KALAKA database, specifically recorded for the Albayzin 2008 Language Recognition Evaluation. The main verification system results from the fusion of an acoustic subsystem and 6 phonotactic subsystems. To model the target language, the acoustic subsystem takes information from the spectral characteristics of the audio signal, whereas the phonotactic subsystems use phone sequences produced by several acoustic-phonetic decoders.
The best fused system attained an EER of 3.58% and CLLR = 0.30 in closed tests, which means a 24.5% relative improvement with regard to the best result obtained in the Albayzin 2008 LRE.
This work has been supported by the Government of the Basque Country, under program SAIOTEK (project S-PE09UN47), and by the Spanish MICINN, under Plan Nacional de I+D+i (project TIN2009-07446, partially financed by FEDER funds).
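The two metrics quoted above can be computed from verification scores as sketched below in Python. This is an illustrative implementation, not the evaluation's official scoring tool: the EER is approximated by sweeping the decision threshold over the observed scores, and CLLR assumes the scores are natural-log likelihood ratios. The `eer` function returns a fraction, so an EER of 3.58% corresponds to 0.0358.

```python
import math


def eer(target_scores, nontarget_scores):
    """Approximate equal error rate (as a fraction) by a threshold sweep.

    A trial is accepted when its score >= threshold. The result is the mean
    of the miss and false-alarm rates at the threshold where they are closest.
    """
    best_gap, best_eer = float("inf"), 1.0
    for t in sorted(set(target_scores) | set(nontarget_scores)):
        miss = sum(s < t for s in target_scores) / len(target_scores)
        fa = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        if abs(miss - fa) < best_gap:
            best_gap, best_eer = abs(miss - fa), (miss + fa) / 2
    return best_eer


def _log2_1p_exp(x: float) -> float:
    """Numerically stable log2(1 + e^x)."""
    return (max(x, 0.0) + math.log1p(math.exp(-abs(x)))) / math.log(2)


def cllr(target_llrs, nontarget_llrs):
    """Log-likelihood-ratio cost, in bits, for natural-log LLR scores."""
    c_tar = sum(_log2_1p_exp(-s) for s in target_llrs) / len(target_llrs)
    c_non = sum(_log2_1p_exp(s) for s in nontarget_llrs) / len(nontarget_llrs)
    return 0.5 * (c_tar + c_non)
```

A system that always outputs a log-likelihood ratio of 0 obtains CLLR = 1, the cost of an uninformative system; lower is better. Unlike EER, CLLR also penalizes poorly calibrated scores.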

    Búsqueda y acceso a la información contenida en el habla de recursos multimedia [Search and Access to Information Contained in the Speech of Multimedia Resources]

    The main goal of this project is to make scientific contributions and technological improvements to the spoken document retrieval system (Hearch) developed by the Working Group on Software Technologies of the University of the Basque Country (UPV/EHU). Hearch looks like a conventional search tool (such as Google or Bing) but is designed to retrieve audio/video segments based on the automatic transcription of their speech content. The system consists of a back-end that captures, processes and indexes audio/video resources, and a front-end web interface that allows users to search content, configure the various modules and display performance statistics. An early version of this tool, which searches and retrieves segments from broadcast news repositories in Spanish and Basque, is available at http://gtts.ehu.es/Hearch/; it is also prepared to handle resources in English.
This project has been supported by the Spanish MICINN, under Plan Nacional de I+D+i (project TIN2009-07446, partially financed by FEDER funds).
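Returning video segments rather than whole programs requires time information from the transcription. Assuming the ASR output provides word-level start and end times (an assumption, since the abstract does not describe Hearch's internal format), a minimal Python sketch could cut a window around each occurrence of a search term and merge overlapping windows. The 5-second context and the function name `hit_segments` are hypothetical.

```python
def hit_segments(words, term, context=5.0):
    """Merged (start, end) playback windows, in seconds, around each hit of `term`.

    `words` is a list of (word, start_s, end_s) tuples from the ASR output;
    `context` seconds are added on each side of a hit (hypothetical default).
    """
    term = term.lower()
    windows = sorted(
        (max(0.0, start - context), end + context)
        for word, start, end in words
        if word.lower() == term
    )
    merged = []
    for start, end in windows:
        if merged and start <= merged[-1][1]:
            # Overlapping windows are fused into a single segment.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

Merging keeps several nearby mentions of the same term from being returned as separate, largely overlapping clips.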