30 research outputs found

    Integrating a State-of-the-Art ASR System into the Opencast Matterhorn Platform

    [EN] In this paper we present the integration of a state-of-the-art ASR system into the Opencast Matterhorn platform, a free, open-source platform to support the management of educational audio and video content. The ASR system was trained on a novel large speech corpus, known as poliMedia, that was manually transcribed for the European project transLectures. This novel corpus contains more than 115 hours of transcribed speech and will be made available to the research community. Initial results on the poliMedia corpus are also reported, comparing the performance of different ASR systems based on the linear interpolation of language models. To this purpose, the in-domain poliMedia corpus was linearly interpolated with an external large-vocabulary dataset, the well-known Google N-Gram corpus. The reported WER figures show a notable improvement over the baseline performance as a result of incorporating the vast amount of data represented by the Google N-Gram corpus. The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 287755. Also supported by the Spanish Government (MIPRCV "Consolider Ingenio 2010" and iTrans2 TIN2009-14511) and the Generalitat Valenciana (Prometeo/2009/014). Valor Miró, JD.; Pérez González De Martos, AM.; Civera Saiz, J.; Juan Císcar, A. (2012). Integrating a State-of-the-Art ASR System into the Opencast Matterhorn Platform. Communications in Computer and Information Science. 328:237-246. https://doi.org/10.1007/978-3-642-35292-8_25
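The linear interpolation of language models mentioned in the abstract above can be sketched in a few lines. This is a minimal illustration only; the function name and the weight value are ours, not the paper's, and in practice the weight is tuned to minimise perplexity on a held-out development set.

```python
def interpolate(p_indomain: float, p_outdomain: float, lam: float = 0.7) -> float:
    """Linear LM interpolation: P(w|h) = lam * P_in(w|h) + (1 - lam) * P_out(w|h).

    p_indomain:  probability from the in-domain LM (e.g. trained on poliMedia)
    p_outdomain: probability from the out-of-domain LM (e.g. Google N-Gram)
    lam:         interpolation weight (hypothetical value here)
    """
    return lam * p_indomain + (1.0 - lam) * p_outdomain

# A word that is frequent in-domain but rare out-of-domain keeps most of its
# in-domain probability mass while still benefiting from the larger corpus.
p = interpolate(0.02, 0.001, lam=0.7)
```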

    A System Architecture to Support Cost-Effective Transcription and Translation of Large Video Lecture Repositories

    [EN] Online video lecture repositories are rapidly growing and becoming established as fundamental knowledge assets. However, most lectures are neither transcribed nor translated because of the lack of cost-effective solutions that can give accurate enough results. In this paper, we describe a system architecture that supports the cost-effective transcription and translation of large video lecture repositories. This architecture has been adopted in the EU project transLectures and is now being tested on a repository of more than 9000 video lectures at the Universitat Politècnica de València. Following a brief description of this repository and of the transLectures project, we describe the proposed system architecture in detail. We also report empirical results on the quality of the transcriptions and translations currently being maintained and steadily improved. The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 287755. Funding was also provided by the Spanish Government with the FPU scholarship AP2010-4349. Silvestre Cerdà, JA.; Pérez González De Martos, AM.; Jiménez López, M.; Turró Ribalta, C.; Juan Císcar, A.; Civera Saiz, J. (2013). A System Architecture to Support Cost-Effective Transcription and Translation of Large Video Lecture Repositories. IEEE International Conference on Systems, Man, and Cybernetics. Conference proceedings. 3994-3999. https://doi.org/10.1109/SMC.2013.682

    Using automatic speech transcriptions in lecture recommendation systems

    One problem created by the success of video lecture repositories is the difficulty faced by individual users when choosing the most suitable video for their learning needs from among the vast numbers available on a given site. Recommender systems have become extremely common in recent years and are used in many areas. In the particular case of video lectures, automatic speech transcriptions can be used to zoom in on user interests at a semantic level, thereby improving the quality of the recommendations made. In this paper, we describe a video lecture recommender system that uses automatic speech transcriptions, alongside other relevant text resources, to generate semantic lecture and user models. In addition, we present a real-life implementation of this system for the VideoLectures.NET repository. The research leading to these results has received funding from the PASCAL2 Network of Excellence under the PASCAL Harvest Project La Vie, the EU 7th Framework Programme (FP7/2007-2013) under grant agreement no. 287755 (transLectures), the ICT Policy Support Programme (ICT PSP/2007-2013) as part of the Competitiveness and Innovation Framework Programme (CIP) under grant agreement no. 621030 (EMMA), the Spanish MINECO Active2Trans (TIN2012-31723) research project, and by the Spanish Government with the FPU scholarship AP2010-4349. Pérez González De Martos, AM.; Silvestre Cerdà, JA.; Rihtar, M.; Juan Císcar, A.; Civera Saiz, J. (2014). Using automatic speech transcriptions in lecture recommendation systems. In: Conference Proceedings iberSPEECH 2014: VIII Jornadas en Tecnologías del Habla and IV SLTech Workshop. Universidad de Las Palmas de Gran Canaria. 149-158. http://hdl.handle.net/10251/54395
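The idea of matching user interests against lecture transcriptions can be illustrated with a toy bag-of-words cosine-similarity ranker. This is only a sketch under simplified assumptions (the actual system builds richer semantic models; all names and the sample data here are ours):

```python
from collections import Counter
from math import sqrt

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a)  # Counter returns 0 for missing terms
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def recommend(user_profile: str, lectures: dict) -> list:
    """Rank lecture ids by similarity of their transcription to the user profile."""
    u = Counter(user_profile.lower().split())
    scored = [(cosine(u, Counter(text.lower().split())), lid)
              for lid, text in lectures.items()]
    return [lid for score, lid in sorted(scored, reverse=True) if score > 0]

lectures = {
    "lec1": "hidden markov models for speech recognition",
    "lec2": "introduction to organic chemistry",
}
print(recommend("speech recognition models", lectures))  # ['lec1']
```

A production system would replace raw term frequencies with TF-IDF or embedding-based representations, but the ranking principle is the same.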

    MLLP Transcription and Translation Platform

    This paper briefly presents the main features of the MLLP's Transcription and Translation Platform, which uses state-of-the-art automatic speech recognition and machine translation systems to generate multilingual subtitles for educational audiovisual and textual content. It has been shown to reduce user effort to as little as one third of the time needed to generate transcriptions and translations from scratch. Pérez González De Martos, AM.; Silvestre Cerdà, JA.; Valor Miró, JD.; Civera Saiz, J.; Juan Císcar, A. (2015). MLLP Transcription and Translation Platform. Springer. http://hdl.handle.net/10251/65747

    MLLP-VRAIN Spanish ASR Systems for the Albayzín-RTVE 2020 Speech-to-Text Challenge: Extension

    [EN] This paper describes the automatic speech recognition (ASR) systems built by the MLLP-VRAIN research group of Universitat Politècnica de València for the Albayzín-RTVE 2020 Speech-to-Text Challenge, and includes an extension of the work consisting of building and evaluating equivalent systems under the closed data conditions from the 2018 challenge. The primary system (p-streaming_1500ms_nlt) was a hybrid ASR system using streaming one-pass decoding with a context window of 1.5 seconds. This system achieved 16.0% WER on the test-2020 set. We also submitted three contrastive systems. From these, we highlight the system c2-streaming_600ms_t which, following a configuration similar to the primary system but with a smaller context window of 0.6 s, scored 16.9% WER on the same test set, with a measured empirical latency of 0.81 ± 0.09 s (mean ± stdev). That is, we obtained state-of-the-art latencies for high-quality automatic live captioning with a small WER degradation of 6% relative. As an extension, the equivalent closed-condition systems obtained 23.3% WER and 23.5% WER, respectively. When evaluated with an unconstrained language model, we obtained 19.9% WER and 20.4% WER; i.e., not far behind the top-performing systems with only 5% of the full acoustic data and with the extra ability of being streaming-capable. Indeed, all of these streaming systems could be put into production environments for automatic captioning of live media streams. The research leading to these results has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreements no. 761758 (X5Gon) and 952215 (TAILOR), and the Erasmus+ Education programme under grant agreement no. 20-226-093604-SCH (EXPERT); the Government of Spain's grant RTI2018-094879-B-I00 (Multisub) funded by MCIN/AEI/10.13039/501100011033 and "ERDF A way of making Europe", and FPU scholarships FPU14/03981 and FPU18/04135; the Generalitat Valenciana's research project Classroom Activity Recognition (ref. PROMETEO/2019/111), and predoctoral research scholarship ACIF/2017/055; and the Universitat Politècnica de València's PAID-01-17 R&D support programme. Baquero-Arnal, P.; Jorge-Cano, J.; Giménez Pastor, A.; Iranzo-Sánchez, J.; Pérez-González De Martos, AM.; Garcés Díaz-Munío, G.; Silvestre Cerdà, JA.... (2022). MLLP-VRAIN Spanish ASR Systems for the Albayzín-RTVE 2020 Speech-to-Text Challenge: Extension. Applied Sciences. 12(2):1-14. https://doi.org/10.3390/app12020804
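The "6% relative" degradation quoted above follows directly from the two WER figures; as a quick arithmetic check (the function name is ours):

```python
def relative_wer_change(wer_base: float, wer_new: float) -> float:
    """Relative WER change in percent between two systems."""
    return 100.0 * (wer_new - wer_base) / wer_base

# Going from 16.0% WER (1.5 s context window) to 16.9% WER (0.6 s context
# window) is a ~5.6% relative degradation, reported rounded as ~6%.
print(round(relative_wer_change(16.0, 16.9), 1))  # 5.6
```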

    Evaluación del proceso de revisión de transcripciones automáticas para vídeos Polimedia

    [EN] Video lectures are a tool of proven value and wide acceptance in universities that are giving rise to platforms like poliMedia. transLectures is a European project that generates high-quality automatic transcriptions and translations for the poliMedia platform, and improves them by using massive adaptation and intelligent interaction techniques. In this paper we present the evaluation with lecturers carried out under the Docència en Xarxa 2012-2013 call, with the aim of studying the process of supervising transcriptions, compared with transcribing from scratch. [ES] Los vídeos docentes son una herramienta de demostrada utilidad y gran aceptación en el mundo universitario que están dando lugar a plataformas como poliMedia. transLectures es un proyecto europeo que genera transcripciones y traducciones automáticas de alta calidad para la plataforma poliMedia, mediante técnicas de adaptación masiva e interacción inteligente. En este artículo presentamos la evaluación con profesores que se realizó en el marco de Docència en Xarxa 2012-2013, con el objetivo de estudiar el proceso de supervisión de transcripciones, comparándolo con la obtención de la transcripción sin disponer de una transcripción automática previa. The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 287755. Valor Miró, JD.; Nadine Spencer, R.; Pérez González De Martos, AM.; Garcés Díaz-Munío, GV.; Turró Ribalta, C.; Civera Saiz, J.; Juan, A. (2014). Evaluación del proceso de revisión de transcripciones automáticas para vídeos Polimedia. In: Jornadas de Innovación Educativa y Docencia en Red de la Universitat Politècnica de València. Editorial Universitat Politècnica de València. 272-278. http://hdl.handle.net/10251/54397

    Doblaje automático de vídeo-charlas educativas en UPV[Media]

    [EN] More and more universities are banking on the production of digital contents to support online or blended learning in higher education. Over the last years, the MLLP research group has been working closely with the UPV's ASIC media services in order to enrich educational multimedia resources through the application of natural language processing technologies, including automatic speech recognition, machine translation and text-to-speech. In this work we present the steps that are being followed for the comprehensive translation of these materials, specifically through (semi-)automatic dubbing by making use of state-of-the-art speaker-adaptive text-to-speech technologies. [ES] Cada vez son más las universidades que apuestan por la producción de contenidos digitales como apoyo al aprendizaje en línea o combinado en la enseñanza superior. El grupo de investigación MLLP lleva años trabajando junto al ASIC de la UPV para enriquecer estos materiales, y particularmente su accesibilidad y oferta lingüística, haciendo uso de tecnologías del lenguaje como el reconocimiento automático del habla, la traducción automática y la síntesis de voz. En este trabajo presentamos los pasos que se están dando hacia la traducción integral de estos materiales, concretamente a través del doblaje (semi-)automático mediante sistemas de síntesis de voz adaptables al locutor. This work has received funding from the Government of Spain through grant RTI2018-094879-B-I00 (Multisub), funded by MCIN/AEI/10.13039/501100011033 and by "ERDF A way of making Europe"; from the Erasmus+ Education programme under grant agreement 20-226-093604-SCH (EXPERT); and from the European Union's Horizon 2020 research and innovation programme under grant agreement no. 761758 (X5gon). Pérez González De Martos, AM.; Giménez Pastor, A.; Jorge Cano, J.; Iranzo Sánchez, J.; Silvestre Cerdà, JA.; Garcés Díaz-Munío, GV.; Baquero Arnal, P.... (2023). Doblaje automático de vídeo-charlas educativas en UPV[Media]. In: In-Red 2022 - VIII Congreso Nacional de Innovación Educativa y Docencia en Red. Editorial Universitat Politècnica de València. https://doi.org/10.4995/INRED2022.2022.1584

    TransLectures

    transLectures (Transcription and Translation of Video Lectures) is an EU STREP project in which advanced automatic speech recognition and machine translation techniques are being tested on large video lecture repositories. The project began in November 2011 and will run for three years. This paper outlines the project's main motivation and objectives, and gives a brief description of the two main repositories being considered: VideoLectures.NET and poliMedia. The first results obtained by the UPV group for the poliMedia repository are also provided. The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 287755. Funding was also provided by the Spanish Government (iTrans2 project, TIN2009-14511; FPI scholarship BES-2010-033005; FPU scholarship AP2010-4349). Silvestre Cerdà, JA.; Del Agua Teba, MA.; Garcés Díaz-Munío, GV.; Gascó Mora, G.; Giménez Pastor, A.; Martínez-Villaronga, AA.; Pérez González De Martos, AM.... (2012). TransLectures. IberSPEECH 2012. 345-351. http://hdl.handle.net/10251/37290

    Abstracts from the Food Allergy and Anaphylaxis Meeting 2016
