Integrating a State-of-the-Art ASR System into the Opencast Matterhorn Platform

Abstract

[EN] In this paper we present the integration of a state-of-the-art ASR system into the Opencast Matterhorn platform, a free, open-source platform to support the management of educational audio and video content. The ASR system was trained on a novel large speech corpus, known as poliMedia, that was manually transcribed for the European project transLectures. This novel corpus contains more than 115 hours of transcribed speech that will be available for the research community. Initial results on the poliMedia corpus are also reported to compare the performance of different ASR systems based on the linear interpolation of language models. To this purpose, the in-domain poliMedia corpus was linearly interpolated with an external large-vocabulary dataset, the well-known Google N-Gram corpus. WER figures reported denote the notable improvement over the baseline performance as a result of incorporating the vast amount of data represented by the Google N-Gram corpus.The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement no 287755. Also supported by the Spanish Government (MIPRCV ”Consolider Ingenio 2010” and iTrans2 TIN2009-14511) and the Generalitat Valenciana (Prometeo/2009/014).Valor Miró, JD.; Pérez González De Martos, AM.; Civera Saiz, J.; Juan Císcar, A. (2012). Integrating a State-of-the-Art ASR System into the Opencast Matterhorn Platform. Communications in Computer and Information Science. 328:237-246. https://doi.org/10.1007/978-3-642-35292-8_25S237246328UPVLC, XEROX, JSI-K4A, RWTH, EML, DDS: transLectures: Transcription and Translation of Video Lectures. In: Proc. of EAMT, p. 204 (2012)Zhan, P., Ries, K., Gavalda, M., Gates, D., Lavie, A., Waibel, A.: JANUS-II: towards spontaneous Spanish speech recognition 4, 2285–2288 (1996)Nogueiras, A., Fonollosa, J.A.R., Bonafonte, A., Mariño, J.B.: RAMSES: El sistema de reconocimiento del habla continua y gran vocabulario desarrollado por la UPC. In: VIII Jornadas de I+D en Telecomunicaciones, pp. 399–408 (1998)Huang, X., Alleva, F., Hon, H.W., Hwang, M.Y., Rosenfeld, R.: The SPHINX-II Speech Recognition System: An Overview. Computer, Speech and Language 7, 137–148 (1992)Speech and Language Technology Group. Sumat: An online service for subtitling by machine translation (May 2012), http://www.sumat-project.euBroman, S., Kurimo, M.: Methods for combining language models in speech recognition. In: Proc. of Interspeech, pp. 1317–1320 (2005)Liu, X., Gales, M., Hieronymous, J., Woodland, P.: Use of contexts in language model interpolation and adaptation. In: Proc. of Interspeech (2009)Liu, X., Gales, M., Hieronymous, J., Woodland, P.: Language model combination and adaptation using weighted finite state transducers (2010)Goodman, J.T.: Putting it all together: Language model combination. In: Proc. of ICASSP, pp. 1647–1650 (2000)Lööf, J., Gollan, C., Hahn, S., Heigold, G., Hoffmeister, B., Plahl, C., Rybach, D., Schlüter, R., Ney, H.: The rwth 2007 tc-star evaluation system for european english and spanish. In: Proc. of Interspeech, pp. 2145–2148 (2007)Rybach, D., Gollan, C., Heigold, G., Hoffmeister, B., Lööf, J., Schlüter, R., Ney, H.: The rwth aachen university open source speech recognition system. In: Proc. of Interspeech, pp. 2111–2114 (2009)Stolcke, A.: SRILM - An Extensible Language Modeling Toolkit. In: Proc. of ICSLP (2002)Michel, J.B., et al.: Quantitative analysis of culture using millions of digitized books. Science 331(6014), 176–182Turro, C., Cañero, A., Busquets, J.: Video learning objects creation with polimedia. In: 2010 IEEE International Symposium on Multimedia (ISM), December 13-15, pp. 371–376 (2010)Barras, C., Geoffrois, E., Wu, Z., Liberman, M.: Transcriber: development and use of a tool for assisting speech corpora production. Speech Communication Special Issue on Speech Annotation and Corpus Tools 33(1-2) (2000)Apache. Apache felix (May 2012), http://felix.apache.org/site/index.htmlOsgi alliance. osgi r4 service platform (May 2012), http://www.osgi.org/Main/HomePageSahidullah, M., Saha, G.: Design, analysis and experimental evaluation of block based transformation in MFCC computation for speaker recognition 54(4), 543–565 (2012)Gascó, G., Rocha, M.-A., Sanchis-Trilles, G., Andrés-Ferrer, J., Casacuberta, F.: Does more data always yield better translations? In: Proc. of EACL, pp. 152–161 (2012)Sánchez-Cortina, I., Serrano, N., Sanchis, A., Juan, A.: A prototype for interactive speech transcription balancing error and supervision effort. In: Proc. of IUI, pp. 325–326 (2012

    Similar works

    Full text

    thumbnail-image

    Available Versions