9 research outputs found

    Desgarros: aproximación al TDAH a través de un proyecto artístico de fotografía y collage

    The project Desgarros centres on ADHD (attention-deficit/hyperactivity disorder) and analyses the emotional experience of people who live with the condition and how it affects their lives. Drawing on the experiences of a 16-year-old girl, the project aims to portray this condition through a series of photographs combined with collage and laid out as a photobook, with the goal of giving visibility to, conveying and telling a story: hers.

    Pastor Silvestre, J. (2017). Desgarros: aproximación al TDAH a través de un proyecto artístico de fotografía y collage. http://hdl.handle.net/10251/97705

    Live Streaming Speech Recognition Using Deep Bidirectional LSTM Acoustic Models and Interpolated Language Models

    [EN] Although Long Short-Term Memory (LSTM) networks and deep Transformers are now extensively used in offline ASR, it is unclear how such offline systems can best be adapted to work under the streaming setup. After gaining considerable experience in this regard in recent years, in this paper we show how to build an optimized, low-latency streaming decoder in which bidirectional LSTM acoustic models, together with general interpolated language models, can be integrated with minimal performance degradation. In brief, our streaming decoder consists of a one-pass, real-time search engine relying on a limited-duration window sliding over time and a number of ad hoc acoustic and language model pruning techniques. Extensive empirical assessment is provided on truly streaming tasks derived from the well-known LibriSpeech and TED talks datasets, as well as from TV shows on a major Spanish broadcasting station.

    This work was supported in part by the European Union's Horizon 2020 Research and Innovation Programme under Grants 761758 (X5gon) and 952215 (TAILOR), and the Erasmus+ Education Programme under Grant Agreement 20-226-093604-SCH; in part by MCIN/AEI/10.13039/501100011033 and "ERDF A way of making Europe" under Grant RTI2018-094879-B-I00; and in part by the Generalitat Valenciana's research project Classroom Activity Recognition under Grant PROMETEO/2019/111. Funding for open access charge: CRUE-Universitat Politècnica de València. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Lei Xie.

    Jorge-Cano, J.; Giménez Pastor, A.; Silvestre Cerdà, JA.; Civera Saiz, J.; Sanchis Navarro, JA.; Juan, A. (2022). Live Streaming Speech Recognition Using Deep Bidirectional LSTM Acoustic Models and Interpolated Language Models. IEEE/ACM Transactions on Audio, Speech, and Language Processing. 30:148-161. https://doi.org/10.1109/TASLP.2021.3133216
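    As a rough illustration of the "interpolated language models" mentioned above, a linearly interpolated LM combines the probability estimates of two component models. The minimal Python sketch below assumes this simple linear scheme; the paper's actual interpolation (component models, weights, normalization) is not detailed here, so all names and values are illustrative.

        # Minimal sketch of linear LM interpolation (illustrative only):
        # p(w | h) = lam * p_A(w | h) + (1 - lam) * p_B(w | h)
        def interpolate_lm(p_a: float, p_b: float, lam: float = 0.5) -> float:
            """Combine two language-model probabilities for the same word/history."""
            assert 0.0 <= lam <= 1.0
            return lam * p_a + (1.0 - lam) * p_b

        # Hypothetical estimates from, e.g., an n-gram LM and a neural LM:
        print(f"{interpolate_lm(0.012, 0.020, lam=0.5):.3f}")  # -> 0.016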

    Albayzin Evaluation: The PRHLT-UPV Audio Segmentation System

    This paper describes the audio segmentation system developed by the PRHLT research group at the UPV for the Albayzin Audio Segmentation Evaluation 2012. The PRHLT-UPV audio segmentation system is based on a conventional GMM-HMM speech recognition approach in which the vocabulary set is defined by the power set of segment classes. MFCC features were extracted to represent the acoustic signal, and the AK toolkit was used both for training acoustic models and for performing audio segmentation. Experimental results reveal that our system provides excellent performance on speech detection, so it could be successfully employed to provide speech segments to a diarization or speech recognition system.

    The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 287755. Funding was also provided by the Spanish Government (iTrans2 project, TIN2009-14511; FPU scholarship AP2010-4349).

    Silvestre Cerdà, JA.; Giménez Pastor, A.; Andrés Ferrer, J.; Civera Saiz, J.; Juan Císcar, A. (2012). Albayzin Evaluation: The PRHLT-UPV Audio Segmentation System. In: Ramos Castro, D. (ed.), Proceedings of IberSPEECH 2012 (VII Jornadas en Tecnología del Habla and III Iberian SLTech Workshop). 596-600. http://hdl.handle.net/10251/53699
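    To make the "power set of segment classes" idea concrete, the sketch below enumerates all class combinations such a recognizer's "vocabulary" would contain, so that overlapping classes become single recognition units. The class names are hypothetical examples, not necessarily the evaluation's exact label set.

        # Hedged sketch: the segmenter's "vocabulary" as the power set of
        # segment classes, so overlapping classes become single units.
        from itertools import chain, combinations

        classes = ["speech", "music", "noise"]  # hypothetical class set

        def power_set(items):
            """All subsets of items, from the empty set to the full set."""
            return list(chain.from_iterable(
                combinations(items, r) for r in range(len(items) + 1)))

        vocabulary = power_set(classes)
        print(vocabulary)  # () could model silence; ("speech", "music") an overlap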

    MLLP-VRAIN Spanish ASR Systems for the Albayzín-RTVE 2020 Speech-to-Text Challenge: Extension

    [EN] This paper describes the automatic speech recognition (ASR) systems built by the MLLP-VRAIN research group of Universitat Politècnica de València for the Albayzín-RTVE 2020 Speech-to-Text Challenge, and includes an extension of the work consisting of building and evaluating equivalent systems under the closed data conditions of the 2018 challenge. The primary system (p-streaming_1500ms_nlt) was a hybrid ASR system using streaming one-pass decoding with a context window of 1.5 seconds. This system achieved 16.0% WER on the test-2020 set. We also submitted three contrastive systems. Of these, we highlight the system c2-streaming_600ms_t which, following a configuration similar to the primary system but with a smaller context window of 0.6 s, scored 16.9% WER on the same test set, with a measured empirical latency of 0.81 ± 0.09 s (mean ± stdev). That is, we obtained state-of-the-art latencies for high-quality automatic live captioning with a small WER degradation of 6% relative. As an extension, the equivalent closed-condition systems obtained 23.3% and 23.5% WER, respectively. When evaluated with an unconstrained language model, we obtained 19.9% and 20.4% WER; i.e., not far behind the top-performing systems, with only 5% of the full acoustic data and with the extra ability of being streaming-capable. Indeed, all of these streaming systems could be put into production environments for automatic captioning of live media streams.

    The research leading to these results has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreements no. 761758 (X5Gon) and 952215 (TAILOR), and the Erasmus+ Education programme under grant agreement no. 20-226-093604-SCH (EXPERT); the Government of Spain's grant RTI2018-094879-B-I00 (Multisub) funded by MCIN/AEI/10.13039/501100011033 and "ERDF A way of making Europe", and FPU scholarships FPU14/03981 and FPU18/04135; the Generalitat Valenciana's research project Classroom Activity Recognition (ref. PROMETEO/2019/111) and predoctoral research scholarship ACIF/2017/055; and the Universitat Politècnica de València's PAID-01-17 R&D support programme.

    Baquero-Arnal, P.; Jorge-Cano, J.; Giménez Pastor, A.; Iranzo-Sánchez, J.; Pérez-González De Martos, AM.; Garcés Díaz-Munío, G.; Silvestre Cerdà, JA.... (2022). MLLP-VRAIN Spanish ASR Systems for the Albayzín-RTVE 2020 Speech-to-Text Challenge: Extension. Applied Sciences. 12(2):1-14. https://doi.org/10.3390/app12020804
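    The "6% relative" degradation quoted above follows directly from the two reported WER figures, as this quick check shows:

        # Relative WER degradation between the two streaming systems:
        wer_primary = 16.0      # p-streaming_1500ms_nlt on test-2020
        wer_contrastive = 16.9  # c2-streaming_600ms_t on test-2020
        rel = (wer_contrastive - wer_primary) / wer_primary
        print(f"{rel:.1%}")     # -> 5.6%, i.e. roughly 6% relative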

    Doblaje automático de vídeo-charlas educativas en UPV[Media]

    [EN] More and more universities are banking on the production of digital content to support online or blended learning in higher education. Over the last years, the MLLP research group has been working closely with the UPV's ASIC media services in order to enrich educational multimedia resources, particularly their accessibility and the range of languages offered, through the application of natural language processing technologies including automatic speech recognition, machine translation and text-to-speech. In this work we present the steps being taken towards the comprehensive translation of these materials, specifically through (semi-)automatic dubbing using state-of-the-art speaker-adaptive text-to-speech technologies.

    This work has received funding from the Government of Spain through grant RTI2018-094879-B-I00 (Multisub), funded by MCIN/AEI/10.13039/501100011033 and by "ERDF A way of making Europe"; from the Erasmus+ Education programme through grant agreement 20-226-093604-SCH (EXPERT); and from the European Union's Horizon 2020 research and innovation programme under grant agreement no. 761758 (X5gon).

    Pérez González De Martos, AM.; Giménez Pastor, A.; Jorge Cano, J.; Iranzo Sánchez, J.; Silvestre Cerdà, JA.; Garcés Díaz-Munío, GV.; Baquero Arnal, P.... (2023). Doblaje automático de vídeo-charlas educativas en UPV[Media]. In: In-Red 2022 - VIII Congreso Nacional de Innovación Educativa y Docencia en Red. Editorial Universitat Politècnica de València. https://doi.org/10.4995/INRED2022.2022.1584
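    Conceptually, the dubbing workflow described above chains three stages: recognition, translation and synthesis. The sketch below shows only that composition; every function name is a hypothetical placeholder, not the MLLP/ASIC production API.

        # Hedged sketch of the (semi-)automatic dubbing pipeline:
        # ASR -> machine translation -> speaker-adaptive TTS.
        def transcribe(audio_path: str) -> str:
            """Step 1 (ASR): source-language audio to text."""
            raise NotImplementedError  # placeholder

        def translate(text: str, src: str, tgt: str) -> str:
            """Step 2 (MT): translate the (possibly reviewed) transcript."""
            raise NotImplementedError  # placeholder

        def synthesize(text: str, speaker: str, lang: str) -> bytes:
            """Step 3 (TTS): target-language text to dubbed audio,
            adapted to the original lecturer's voice."""
            raise NotImplementedError  # placeholder

        def dub(audio_path: str, speaker: str, src: str, tgt: str) -> bytes:
            return synthesize(translate(transcribe(audio_path), src, tgt),
                              speaker, tgt)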

    TransLectures

    transLectures (Transcription and Translation of Video Lectures) is an EU STREP project in which advanced automatic speech recognition and machine translation techniques are being tested on large video lecture repositories. The project began in November 2011 and will run for three years. This paper outlines the project's main motivation and objectives, and gives a brief description of the two main repositories being considered: VideoLectures.NET and poliMedia. The first results obtained by the UPV group for the poliMedia repository are also provided.

    The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 287755. Funding was also provided by the Spanish Government (iTrans2 project, TIN2009-14511; FPI scholarship BES-2010-033005; FPU scholarship AP2010-4349).

    Silvestre Cerdà, JA.; Del Agua Teba, MA.; Garcés Díaz-Munío, GV.; Gascó Mora, G.; Giménez Pastor, A.; Martínez-Villaronga, AA.; Pérez González De Martos, AM.... (2012). TransLectures. IberSPEECH 2012. 345-351. http://hdl.handle.net/10251/37290

    Towards cross-lingual voice cloning in higher education

    [EN] The rapid progress of modern AI tools for automatic speech recognition and machine translation is leading to a progressive cost reduction in producing publishable subtitles for educational videos in multiple languages. Similarly, text-to-speech technology is experiencing large improvements in quality, flexibility and capabilities. In particular, state-of-the-art systems are now capable of seamlessly dealing with multiple languages and speakers in an integrated manner, thus enabling the cloning of a lecturer's voice in languages he or she might not even speak. This work reports the experience gained in using such systems at the Universitat Politècnica de València (UPV), mainly as guidance for other educational organizations willing to conduct similar studies. It builds on previous work on the UPV's main repository of educational videos, MediaUPV, to produce multilingual subtitles at scale and low cost. Here, a detailed account is given of how this work has been extended to also allow for massive machine dubbing of MediaUPV. This includes collecting 59 h of clean speech data from UPV's academic staff, and extending our subtitle production pipeline with a state-of-the-art multilingual and multi-speaker text-to-speech system trained on the collected data. Our main result comes from an extensive, subjective evaluation of this system by the lecturers who contributed to data collection. In brief, it is shown that text-to-speech technology is not only mature enough for application to MediaUPV, but is also needed as soon as possible by students to improve the repository's accessibility and bridge language barriers.

    We wish first to thank all UPV lecturers who made this study possible. We are also very grateful for the funding support received from the European Union's Horizon 2020 research and innovation programme under grant agreement no. 761758 (X5gon), the Spanish government under grant RTI2018-094879-B-I00 (Multisub, MCIU/AEI/FEDER), and the Universitat Politècnica de València's PAID-01-17 R&D support programme. Funding for open access charge: CRUE-Universitat Politècnica de València.

    Pérez-González De Martos, AM.; Garcés Díaz-Munío, G.; Giménez Pastor, A.; Silvestre Cerdà, JA.; Sanchis Navarro, JA.; Civera Saiz, J.; Jiménez, M.... (2021). Towards cross-lingual voice cloning in higher education. Engineering Applications of Artificial Intelligence. 105:1-9. https://doi.org/10.1016/j.engappai.2021.104413

    Europarl-ASR: A Large Corpus of Parliamentary Debates for Streaming ASR Benchmarking and Speech Data Filtering/Verbatimization

    [EN] We introduce Europarl-ASR, a large speech and text corpus of parliamentary debates including 1300 hours of transcribed speeches and 70 million tokens of text in English extracted from European Parliament sessions. The training set is labelled with the Parliament's non-fully-verbatim official transcripts, time-aligned. As verbatimness is critical for acoustic model training, we also provide automatically noise-filtered and automatically verbatimized transcripts of all speeches, based on speech data filtering and verbatimization techniques. Additionally, 18 hours of transcribed speeches were manually verbatimized to build reliable speaker-dependent and speaker-independent development/test sets for streaming ASR benchmarking. The availability of manual non-verbatim and verbatim transcripts for the dev/test speeches makes this corpus useful for assessing automatic filtering and verbatimization techniques. This paper describes the corpus and its creation, and provides offline and streaming ASR baselines for both the speaker-dependent and speaker-independent tasks using the three training transcription sets. The corpus is publicly released under an open licence.

    This work has received funding from the EU's H2020 research and innovation programme under grant agreements 761758 (X5gon) and 952215 (TAILOR); the Government of Spain's research project Multisub (RTI2018-094879-B-I00, MCIU/AEI/FEDER, EU) and FPU scholarships FPU14/03981 and FPU18/04135; the Generalitat Valenciana's research project Classroom Activity Recognition (PROMETEO/2019/111) and predoctoral research scholarship ACIF/2017/055; and the Universitat Politècnica de València's PAID-01-17 R&D support programme.

    Garcés Díaz-Munío, G.; Silvestre Cerdà, JA.; Jorge-Cano, J.; Giménez Pastor, A.; Iranzo-Sánchez, J.; Baquero-Arnal, P.; Roselló, N.... (2021). Europarl-ASR: A Large Corpus of Parliamentary Debates for Streaming ASR Benchmarking and Speech Data Filtering/Verbatimization. International Speech Communication Association (ISCA). 3695-3699. https://doi.org/10.21437/Interspeech.2021-1905
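    One plausible reading of the speech-data filtering idea above is to discard training utterances whose official (non-verbatim) transcript disagrees too strongly with an automatic transcript. The sketch below illustrates only that general idea with a simple word-error-rate threshold; the corpus's actual filtering and verbatimization techniques are more elaborate and are not reproduced here.

        # Hedged sketch of WER-based filtering of training transcripts.
        def edit_distance(ref: list, hyp: list) -> int:
            """Word-level Levenshtein distance (dynamic programming)."""
            d = list(range(len(hyp) + 1))
            for i, r in enumerate(ref, 1):
                prev, d[0] = d[0], i
                for j, h in enumerate(hyp, 1):
                    prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1,
                                           prev + (r != h))
            return d[-1]

        def keep_utterance(official: str, automatic: str,
                           max_wer: float = 0.25) -> bool:
            """Keep the utterance only if the transcripts roughly agree."""
            ref, hyp = official.split(), automatic.split()
            return edit_distance(ref, hyp) / max(len(ref), 1) <= max_wer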