8 research outputs found

    Language model adaptation for video lectures transcription

    Full text link
    Video lectures are currently being digitised all over the world for their enormous value as a reference resource. Many of these lectures are accompanied by slides, which offer a great opportunity for improving ASR system performance. We propose a simple yet powerful extension to the linear interpolation of language models for adapting language models with slide information. Two types of slides are considered: correct slides, and slides automatically extracted from the videos with OCR. Furthermore, we compare both time-aligned and unaligned slides. Results report an improvement of up to 3.8 absolute WER points when using correct slides. Surprisingly, when using automatic slides obtained with poor OCR quality, the ASR system still improves by up to 2.2 absolute WER points.
    The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 287755 (transLectures). Also supported by the Spanish Government (Plan E, iTrans2 TIN2009-14511).
    Martínez-Villaronga, A.; Del Agua Teba, M. A.; Andrés Ferrer, J.; Juan Císcar, A. (2013). Language model adaptation for video lectures transcription. In Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on. Institute of Electrical and Electronics Engineers (IEEE). 8450-8454. https://doi.org/10.1109/ICASSP.2013.6639314
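    The linear interpolation this abstract extends can be sketched as a weighted mixture of a general language model and a slide-derived one. A minimal sketch (the function name and toy probabilities are illustrative, not from the paper):

```python
def interpolate_lm(p_general, p_slides, lam):
    """Linearly interpolate two language models.

    p_general, p_slides: dicts mapping a word (for some fixed history)
    to its probability under each model.
    lam: interpolation weight of the slide-adapted model (0 <= lam <= 1).
    """
    vocab = set(p_general) | set(p_slides)
    return {w: (1 - lam) * p_general.get(w, 0.0) + lam * p_slides.get(w, 0.0)
            for w in vocab}

# The slide LM boosts in-domain terms such as "eigenvalue".
p_gen = {"the": 0.5, "eigenvalue": 0.01, "cat": 0.49}
p_sli = {"the": 0.4, "eigenvalue": 0.55, "matrix": 0.05}
p_mix = interpolate_lm(p_gen, p_sli, lam=0.3)
```

    Since both inputs are proper distributions, any convex combination of them remains a proper distribution, so no renormalisation is needed.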

    Language model adaptation for video lecture transcription

    Full text link
    [EN] In this work we propose a method to adapt language models to specific lectures in the context of automatic transcription of video lectures. We explore different variations of the adaptation technique, obtaining significant WER reductions for the Spanish repository poliMedia.
    Martínez Villaronga, AA. (2013). Language model adaptation for video lecture transcription. http://hdl.handle.net/10251/37114

    Speaker-adapted confidence measures for speech recognition of video lectures

    Full text link
    [EN] Automatic speech recognition applications can benefit from a confidence measure (CM) to predict the reliability of the output. Previous works showed that a word-dependent naive Bayes (NB) classifier outperforms the conventional word posterior probability as a CM. However, a discriminative formulation usually renders improved performance due to the available training techniques. Taking this into account, we propose a logistic regression (LR) classifier defined with simple input functions to approximate the NB behaviour. Additionally, as a main contribution, we propose to adapt the CM to the speaker in cases in which it is possible to identify the speakers, such as online lecture repositories. The experiments have shown that speaker-adapted models outperform their non-adapted counterparts on two difficult tasks from English (videoLectures.net) and Spanish (poliMedia) educational lectures. They have also shown that the NB model is clearly superseded by the proposed LR classifier.
    The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 287755. Also supported by the Spanish MINECO (iTrans2 TIN2009-14511 and Active2Trans TIN2012-31723) research projects and the FPI Scholarship BES-2010-033005.
    Sanchez-Cortina, I.; Andrés Ferrer, J.; Sanchis Navarro, JA.; Juan Císcar, A. (2016). Speaker-adapted confidence measures for speech recognition of video lectures. Computer Speech and Language. 37:11-23. https://doi.org/10.1016/j.csl.2015.10.003
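    The core of such a logistic regression CM is a sigmoid over simple per-word input features. A minimal sketch, assuming a single feature (the log word posterior) and hypothetical weights; in the paper the weights would be trained discriminatively, and speaker adaptation would re-estimate them per speaker:

```python
import math

def lr_confidence(features, weights, bias):
    """Logistic-regression confidence measure: sigmoid(w . x + b).

    features: per-word inputs, e.g. the log word posterior probability.
    weights, bias: parameters learned on labelled ASR output.
    """
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Illustrative values only: a word with a higher posterior
# receives a higher confidence score.
global_weights, global_bias = [1.5], -0.2
conf = lr_confidence([math.log(0.9)], global_weights, global_bias)
```

    With a positive weight on the log posterior, the LR score is monotone in the posterior, which is how the simple input functions let it approximate the NB behaviour while keeping the discriminative training advantage.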

    Evaluating intelligent interfaces for post-editing automatic transcriptions of online video lectures

    Full text link
    Video lectures are fast becoming an everyday educational resource in higher education. They are being incorporated into existing university curricula around the world, while also emerging as a key component of the open education movement. In 2007, the Universitat Politècnica de València (UPV) implemented its poliMedia lecture capture system for the creation and publication of quality educational video content and now has a collection of over 10,000 video objects. In 2011, it embarked on the EU-subsidised transLectures project to add automatic subtitles to these videos in both Spanish and other languages. By doing so, it allows access to their educational content by non-native speakers and the deaf and hard-of-hearing, as well as enabling advanced repository management functions. In this paper, following a short introduction to poliMedia, transLectures and Docència en Xarxa (Teaching Online), the UPV's action plan to boost the use of digital resources at the university, we will discuss the three-stage evaluation process carried out with the collaboration of UPV lecturers to find the best interaction protocol for the task of post-editing automatic subtitles.
    Valor Miró, JD.; Spencer, RN.; Pérez González De Martos, AM.; Garcés Díaz-Munío, GV.; Turró Ribalta, C.; Civera Saiz, J.; Juan Císcar, A. (2014). Evaluating intelligent interfaces for post-editing automatic transcriptions of online video lectures. Open Learning: The Journal of Open and Distance Learning. 29(1):72-85. doi:10.1080/02680513.2014.909722

    Evaluation of innovative computer-assisted transcription and translation strategies for video lecture repositories

    Full text link
    Nowadays, the technology-enhanced learning area has experienced strong growth, with many new learning approaches like blended learning, flip teaching, massive open online courses, and open educational resources to complement face-to-face lectures. Specifically, video lectures are fast becoming an everyday educational resource in higher education for all of these new learning approaches, and they are being incorporated into existing university curricula around the world. Transcriptions and translations can improve the utility of these audiovisual assets, but are rarely present due to a lack of cost-effective solutions to produce them. Lecture searchability, accessibility for people with impairments, translatability for foreign students, plagiarism detection, content recommendation, note-taking, and discovery of content-related videos are examples of the advantages brought by transcriptions. For this reason, the aim of this thesis is to test, in real-life case studies, ways to obtain multilingual captions for video lectures in a cost-effective way by using state-of-the-art automatic speech recognition and machine translation techniques. We also explore interaction protocols to review these automatic transcriptions and translations, because unfortunately automatic subtitles are not error-free. In addition, we take a step further into multilingualism by extending our findings and evaluation to several languages. Finally, the outcomes of this thesis have been applied to thousands of video lectures in European universities and institutions.
    Valor Miró, JD. (2017). Evaluation of innovative computer-assisted transcription and translation strategies for video lecture repositories [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/90496

    Multilingual videos for MOOCs and OER

    Full text link
    [EN] Massive Open Online Courses (MOOCs) and Open Educational Resources (OER) are rapidly growing, but are not usually offered in multiple languages due to the lack of cost-effective solutions for translating the different objects comprising them, particularly videos. However, current state-of-the-art automatic speech recognition (ASR) and machine translation (MT) techniques have reached a level of maturity which opens the possibility of producing multilingual video subtitles of publishable quality at low cost. This work summarizes the authors' experience in exploring this possibility in two real-life case studies: a MOOC platform and a large video lecture repository. Apart from describing the systems, tools and integration components employed for this purpose, a comprehensive evaluation of the results achieved is provided in terms of quality and efficiency. More precisely, it is shown that draft multilingual subtitles produced by domain-adapted ASR/MT systems reach a level of accuracy that makes them worth post-editing, instead of generating them ex novo, saving approximately 25%-75% of the time. Finally, the results reported on user multilingual data consumption reflect that multilingual subtitles have had a very positive impact in our case studies, boosting student enrolment in the case of the MOOC platform by 70% relative.
    The research leading to these results has received funding from the European Union's Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 287755 (transLectures) and from the EU's ICT Policy Support Programme as part of the Competitiveness and Innovation Framework Programme under grant agreement no. 621030 (EMMA). Additionally, it is supported by the Spanish research project TIN2015-68326-R (MINECO/FEDER).
    Valor Miró, JD.; Baquero-Arnal, P.; Civera Saiz, J.; Turró Ribalta, C.; Juan, A. (2018). Multilingual videos for MOOCs and OER. Educational Technology & Society. 21(2):1-12. http://hdl.handle.net/10251/122577

    Application of deep neural networks to machine translation for open educational resources

    Full text link
    Open Education has become a revolutionary approach for the future of education, enabling worldwide access to a large volume of Open Educational Resources (OER). An emblematic example of OER are the OpenCourseWare (OCW) courses produced by universities and published free of charge on the internet. Although OCW courses have had a great impact on Open Education, the so-called massive open online courses (MOOCs) are increasing this impact even further. Unlike OCW courses, MOOCs offer content in a more structured way, enable discussion forums and offer academic certificates. In Spain, many institutions related to education have increased their production of OER and take part in, or have even founded, MOOC-related initiatives such as Miríada X, UNED COMA, UniMOOC and UPV[X]. The rapid growth of OER and MOOCs has not gone unnoticed by governments and international organisations related to education, as shown by the 2012 Paris OER Declaration adopted at the World OER Congress held at UNESCO. The Declaration highlighted the importance of OER for universal access to education and listed 10 recommendations to states concerning international collaboration and accessibility. Following the Paris Declaration, the European Commission launched the "Opening up Education" agenda in September 2013 in order to stimulate high quality and innovation in learning and teaching through new technologies and digital content. It acknowledges a lack of quality educational content in multiple languages. Although there is a clear need for multilingual services in Open Education, OER providers and, particularly, MOOCs do not offer multilingual communication and only very occasionally offer multilingual content.
    Based on the above evidence, this Master's thesis aims to contribute to fostering Open Education through multilingual access to OER and by enabling multilingual communication on MOOC platforms. It should be noted that this project falls, both in its objectives and its methodology, within the following projects of the MLLP group:
    - MORE: Multilingual Open Resources for Education. Period: 1/1/2016 - 31/12/2018. Research project funded by MINECO (TIN2015-68326-R). Principal investigators: Albert Sanchis and Alfons Juan.
    - X5gon: Cross Modal, Cross Cultural, Cross Lingual, Cross Domain, and Cross Site Global OER Network. Period: 1/9/2017 - 31/8/2020. Research project funded by the EC (H2020-ICT-2016-2). Principal investigator: Alfons Juan.
    More specifically, this thesis will focus on:
    - Improving language and translation models, through deep networks and domain adaptation, on translation tasks relevant to current MLLP case studies (poliMedia, VideoLectures, UC3M, etc.).
    - Integrating results into MLLP case studies and, in particular, into the UPV's poliMedia repository.
    Baquero Arnal, P. (2017). Aplicación de redes neuronales profundas en traducción automática para recursos educativos abiertos. http://hdl.handle.net/10251/93250

    Language model adaptation for lecture transcription by document retrieval

    Full text link
    The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-319-13623-3_14
    With the spread of MOOCs and video lecture repositories, it is more important than ever to have accurate methods for automatically transcribing video lectures. In this work, we propose a simple yet effective language model adaptation technique based on document retrieval from the web. This technique is combined with slide adaptation, and compared against a strong baseline language model and a stronger slide-adapted baseline. These adaptation techniques are compared within two different acoustic models: a standard HMM model and the CD-DNN-HMM model. The proposed method obtains improvements in WER of up to 14% relative with respect to a competitive baseline, as well as outperforming slide adaptation.
    The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 287755 (transLectures) and the ICT Policy Support Programme (ICT PSP/2007-2013) as part of the Competitiveness and Innovation Framework Programme (CIP) under grant agreement no. 621030 (EMMA), the Spanish MINECO Active2Trans (TIN2012-31723) research project, and the Spanish Government with the FPU scholarships FPU13/06241 and AP2010-4349.
    Martínez-Villaronga, A.; Del Agua Teba, M. A.; Silvestre Cerdà, J. A.; Andrés Ferrer, J.; Juan, A. (2014). Language model adaptation for lecture transcription by document retrieval. In Advances in Speech and Language Technologies for Iberian Languages. Springer Verlag (Germany). 129-137.
    https://doi.org/10.1007/978-3-319-13623-3_14
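    Several of the outputs above report gains in WER, both absolute (e.g. 3.8 points) and relative (e.g. 14%). WER is the word-level Levenshtein distance between reference and hypothesis divided by the reference length; a minimal sketch, with hypothetical baseline/adapted figures to illustrate the relative-improvement arithmetic:

```python
def wer(reference, hypothesis):
    """Word error rate via Levenshtein distance over word sequences."""
    r, h = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between r[:i] and h[:j]
    dp = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        dp[i][0] = i
    for j in range(len(h) + 1):
        dp[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = dp[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(r)][len(h)] / len(r)

# Hypothetical numbers: a drop from 20.0% to 17.2% WER is a
# 2.8-point absolute, 14% relative improvement.
baseline, adapted = 0.200, 0.172
relative_gain = (baseline - adapted) / baseline
```

    Note that "absolute WER points" subtracts the two rates directly, while "relative" divides the drop by the baseline, so the same experiment can honestly be reported with two quite different-looking numbers.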