24 research outputs found

    The RWTH Aachen German and English LVCSR systems for IWSLT-2013

    Get PDF
    Abstract In this paper, German and English large vocabulary continuous speech recognition (LVCSR) systems developed by the RWTH Aachen University for the IWSLT-2013 evaluation campaign are presented. Good improvements are obtained with state-of-the-art monolingual and multilingual bottleneck features. In addition, an open vocabulary approach using morphemic sub-lexical units is investigated along with the language model adaptation for the German LVCSR. For both the languages, competitive WERs are achieved using system combination

    Modularity and Neural Integration in Large-Vocabulary Continuous Speech Recognition

    Get PDF
    This Thesis tackles the problems of modularity in Large-Vocabulary Continuous Speech Recognition with use of Neural Network

    Speaker-adapted confidence measures for speech recognition of video lectures

    Full text link
    [EN] Automatic speech recognition applications can benefit from a confidence measure (CM) to predict the reliability of the output. Previous works showed that a word-dependent native Bayes (NB) classifier outperforms the conventional word posterior probability as a CM. However, a discriminative formulation usually renders improved performance due to the available training techniques. Taking this into account, we propose a logistic regression (LR) classifier defined with simple input functions to approximate to the NB behaviour. Additionally, as a main contribution, we propose to adapt the CM to the speaker in cases in which it is possible to identify the speakers, such as online lecture repositories. The experiments have shown that speaker-adapted models outperform their non-adapted counterparts on two difficult tasks from English (videoLectures.net) and Spanish (poliMedia) educational lectures. They have also shown that the NB model is clearly superseded by the proposed LR classifier.The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement no 287755. Also supported by the Spanish MINECO (iTrans2 TIN2009-14511 and Active2Trans TIN2012-31723) research projects and the FPI Scholarship BES-2010-033005.Sanchez-Cortina, I.; Andrés Ferrer, J.; Sanchis Navarro, JA.; Juan Císcar, A. (2016). Speaker-adapted confidence measures for speech recognition of video lectures. Computer Speech and Language. 37:11-23. https://doi.org/10.1016/j.csl.2015.10.003S11233

    Contributions to Adaptation on Automatic Speech Recognition and Multilingual Handwritten Text Recognition

    Full text link
    [ES En este trabajo se han estudiado métodos para adaptación en la transcripción automática de documentos manuscritos multilíngües y además se han propuesto diversas técnicas para identificación del idioma. Por otra parte, en cuanto a la adaptación en el reconocimiento del habla, se aplica una técnica conocida de adaptación al locutor (MLLR). Por último, se presenta una aplicación en el marco del proyecto transLectures, cuyo objetivo es integrar en un reproductor multimedia un sistema de transcripción automática interactivo.[EN] In this work we have studied several methods for adaptation on automatic handwriting recognition of multilingual documents, so as it has been propossed different language identification techniques. Regarding adaptation in speech recognition, it has been applied a well-known technique for speaker adaptation (MLLR). Finally, we present an application under the transLectures project, which is a real application example of an interactive automatic transcription system within a video player.Del Agua Teba, MA. (2012). Contributions to Adaptation on Automatic Speech Recognition and Multilingual Handwritten Text Recognition. http://hdl.handle.net/10251/19115Archivo delegad

    Confidence Measures for Automatic and Interactive Speech Recognition

    Full text link
    [EN] This thesis work contributes to the field of the {Automatic Speech Recognition} (ASR). And particularly to the {Interactive Speech Transcription} and {Confidence Measures} (CM) for ASR. The main goals of this thesis work can be summarised as follows: 1. To design IST methods and tools to tackle the problem of improving automatically generated transcripts. 2. To assess the designed IST methods and tools on real-life tasks of transcription in large educational repositories of video lectures. 3. To improve the reliability of the IST by improving the underlying (CM). Abstracts: The {Automatic Speech Recognition} (ASR) is a crucial task in a broad range of important applications which could not accomplished by means of manual transcription. The ASR can provide cost-effective transcripts in scenarios of increasing social impact such as the {Massive Open Online Courses} (MOOC), for which the availability of accurate enough is crucial even if they are not flawless. The transcripts enable search-ability, summarisation, recommendation, translation; they make the contents accessible to non-native speakers and users with impairments, etc. The usefulness is such that students improve their academic performance when learning from subtitled video lectures even when transcript is not perfect. Unfortunately, the current ASR technology is still far from the necessary accuracy. The imperfect transcripts resulting from ASR can be manually supervised and corrected, but the effort can be even higher than manual transcription. For the purpose of alleviating this issue, a novel {Interactive Transcription of Speech} (IST) system is presented in this thesis. This IST succeeded in reducing the effort if a small quantity of errors can be allowed; and also in improving the underlying ASR models in a cost-effective way. In other to adequate the proposed framework into real-life MOOCs, another intelligent interaction methods involving limited user effort were investigated. And also, it was introduced a new method which benefit from the user interactions to improve automatically the unsupervised parts ({Constrained Search} for ASR). The conducted research was deployed into a web-based IST platform with which it was possible to produce a massive number of semi-supervised lectures from two different well-known repositories, videoLectures.net and poliMedia. Finally, the performance of the IST and ASR systems can be easily increased by improving the computation of the {Confidence Measure} (CM) of transcribed words. As so, two contributions were developed: a new particular {Logistic Regresion} (LR) model; and the speaker adaption of the CM for cases in which it is possible, such with MOOCs.[ES] Este trabajo contribuye en el campo del {reconocimiento automático del habla} (RAH). Y en especial, en el de la {transcripción interactiva del habla} (TIH) y el de las {medidas de confianza} (MC) para RAH. Los objetivos principales son los siguientes: 1. Diseño de métodos y herramientas TIH para mejorar las transcripciones automáticas. 2. Evaluar los métodos y herramientas TIH empleando tareas de transcripción realistas extraídas de grandes repositorios de vídeos educacionales. 3. Mejorar la fiabilidad del TIH mediante la mejora de las MC. Resumen: El {reconocimiento automático del habla} (RAH) es una tarea crucial en una amplia gama de aplicaciones importantes que no podrían realizarse mediante transcripción manual. El RAH puede proporcionar transcripciones rentables en escenarios de creciente impacto social como el de los {cursos abiertos en linea masivos} (MOOC), para el que la disponibilidad de transcripciones es crucial, incluso cuando no son completamente perfectas. Las transcripciones permiten la automatización de procesos como buscar, resumir, recomendar, traducir; hacen que los contenidos sean más accesibles para hablantes no nativos y usuarios con discapacidades, etc. Incluso se ha comprobado que mejora el rendimiento de los estudiantes que aprenden de videos con subtítulos incluso cuando estos no son completamente perfectos. Desafortunadamente, la tecnología RAH actual aún está lejos de la precisión necesaria. Las transcripciones imperfectas resultantes del RAH pueden ser supervisadas y corregidas manualmente, pero el esfuerzo puede ser incluso superior al de la transcripción manual. Con el fin de aliviar este problema, esta tesis presenta un novedoso sistema de {transcripción interactiva del habla} (TIH). Este método TIH consigue reducir el esfuerzo de semi-supervisión siempre que sea aceptable una pequeña cantidad de errores; además mejora a la par los modelos RAH subyacentes. Con objeto de transportar el marco propuesto para MOOCs, también se investigaron otros métodos de interacción inteligentes que involucran esfuerzo limitado por parte del usuario. Además, se introdujo un nuevo método que aprovecha las interacciones para mejorar aún más las partes no supervisadas (ASR con {búsqueda restringida}). La investigación en TIH llevada a cabo se desplegó en una plataforma web con el que fue posible producir un número masivo de transcripciones de videos de dos conocidos repositorios, videoLectures.net y poliMedia. Por último, el rendimiento de la TIH y los sistemas de RAH se puede aumentar directamente mediante la mejora de la estimación de la {medida de confianza} (MC) de las palabras transcritas. Por este motivo se desarrollaron dos contribuciones: un nuevo modelo discriminativo {logístico} (LR); y la adaptación al locutor de la MC para los casos en que es posible, como por ejemplo en MOOCs.[CA] Aquest treball hi contribueix al camp del {reconeixment automàtic de la parla} (RAP). I en especial, al de la {transcripció interactiva de la parla} i el de {mesures de confiança} (MC) per a RAP. Els objectius principals són els següents: 1. Dissenyar mètodes i eines per a TIP per tal de millorar les transcripcions automàtiques. 2. Avaluar els mètodes i eines TIP per a tasques de transcripció realistes extretes de grans repositoris de vídeos educacionals. 3. Millorar la fiabilitat del TIP, mitjançant la millora de les MC. Resum: El {reconeixment automàtic de la parla} (RAP) és una tasca crucial per una àmplia gamma d'aplicacions importants que no es poden dur a terme per mitjà de la transcripció manual. El RAP pot proporcionar transcripcions en escenaris de creixent impacte social com els {cursos online oberts massius} (MOOC). Les transcripcions permeten automatitzar tasques com ara cercar, resumir, recomanar, traduir; a més a més, fa accessibles els continguts als parlants no nadius i els usuaris amb discapacitat, etc. Fins i tot, pot millorar el rendiment acadèmic de estudiants que aprenen de xerrades amb subtítols, encara que aquests subtítols no siguen perfectes. Malauradament, la tecnologia RAP actual encara està lluny de la precisió necessària. Les transcripcions imperfectes resultants de RAP poden ser supervisades i corregides manualment, però aquest l'esforç pot acabar sent superior a la transcripció manual. Per tal de resoldre aquest problema, en aquest treball es presenta un sistema nou per a {transcripció interactiva de la parla} (TIP). Aquest sistema TIP va ser reeixit en la reducció de l'esforç per quan es pot permetre una certa quantitat d'errors; així com també en en la millora dels models RAP subjacents. Per tal d'adequar el marc proposat per a MOOCs, també es van investigar altres mètodes d'interacció intel·ligents amb esforç d''usuari limitat. A més a més, es va introduir un nou mètode que aprofita les interaccions per tal de millorar encara més les parts no supervisades (RAP amb {cerca restringida}). La investigació en TIP duta a terme es va desplegar en una plataforma web amb la qual va ser possible produir un nombre massiu de transcripcions semi-supervisades de xerrades de repositoris ben coneguts, videoLectures.net i poliMedia. Finalment, el rendiment de la TIP i els sistemes de RAP es pot augmentar directament mitjançant la millora de l'estimació de la {Confiança Mesura} (MC) de les paraules transcrites. Per tant, es van desenvolupar dues contribucions: un nou model discriminatiu logístic (LR); i l'adaptació al locutor de la MC per casos en que és possible, per exemple amb MOOCs.Sánchez Cortina, I. (2016). Confidence Measures for Automatic and Interactive Speech Recognition [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/61473TESI

    CHORUS Deliverable 2.1: State of the Art on Multimedia Search Engines

    Get PDF
    Based on the information provided by European projects and national initiatives related to multimedia search as well as domains experts that participated in the CHORUS Think-thanks and workshops, this document reports on the state of the art related to multimedia content search from, a technical, and socio-economic perspective. The technical perspective includes an up to date view on content based indexing and retrieval technologies, multimedia search in the context of mobile devices and peer-to-peer networks, and an overview of current evaluation and benchmark inititiatives to measure the performance of multimedia search engines. From a socio-economic perspective we inventorize the impact and legal consequences of these technical advances and point out future directions of research

    Adaptation in Machine Translation

    Get PDF
    Statistical machine translation (SMT) has emerged as the currently most promising approach for machine translation. One limitation to date, however, is that the quality of SMT systems strongly depends on the similarity between the training data and its deployment. This thesis is devoted to adapting MT systems in the scenario of mismatching training data. We develop different approaches to increase performance even though all or some of the training data does not match the system\u27s application

    Getting Past the Language Gap: Innovations in Machine Translation

    Get PDF
    In this chapter, we will be reviewing state of the art machine translation systems, and will discuss innovative methods for machine translation, highlighting the most promising techniques and applications. Machine translation (MT) has benefited from a revitalization in the last 10 years or so, after a period of relatively slow activity. In 2005 the field received a jumpstart when a powerful complete experimental package for building MT systems from scratch became freely available as a result of the unified efforts of the MOSES international consortium. Around the same time, hierarchical methods had been introduced by Chinese researchers, which allowed the introduction and use of syntactic information in translation modeling. Furthermore, the advances in the related field of computational linguistics, making off-the-shelf taggers and parsers readily available, helped give MT an additional boost. Yet there is still more progress to be made. For example, MT will be enhanced greatly when both syntax and semantics are on board: this still presents a major challenge though many advanced research groups are currently pursuing ways to meet this challenge head-on. The next generation of MT will consist of a collection of hybrid systems. It also augurs well for the mobile environment, as we look forward to more advanced and improved technologies that enable the working of Speech-To-Speech machine translation on hand-held devices, i.e. speech recognition and speech synthesis. We review all of these developments and point out in the final section some of the most promising research avenues for the future of MT
    corecore