CONTRIBUTIONS TO EFFICIENT AUTOMATIC TRANSCRIPTION OF VIDEO LECTURES
Thesis by compendium of publications.
[ES/CA] In recent years, online multimedia repositories have become key
sources of knowledge thanks to the rise of the Internet, especially in the
area of education. Educational institutions around the world have devoted
many resources to the search for new teaching methods, both to improve the
assimilation of new knowledge and to reach a wider audience. As a result,
different repositories of recorded lectures are now available that serve as
complementary teaching tools, or may even lay a new foundation for distance
learning. However, they must meet a series of requirements for the
experience to be fully satisfactory, and this is where the transcription of
the materials plays a fundamental role. Transcription enables a precise
search for the materials the student is interested in and opens the door to
machine translation, recommendation functions, the generation of lecture
summaries and, in addition, making the content available to people with
hearing disabilities. Nevertheless, generating these transcriptions can be
very costly.
With all this in mind, this thesis aims to provide new tools and techniques
that facilitate the transcription of these repositories. In particular, we
address the development of a set of automatic speech recognition tools, with
emphasis on the deep learning techniques that contribute to providing
accurate transcriptions in real case studies. In addition, several
participations in international competitions are presented, which
demonstrate the competitiveness of the software compared with other
solutions. Furthermore, in order to improve the recognition systems, a new
technique for adapting these systems to the speaker, based on the use of
Confidence Measures, is proposed. This in turn motivated the development of
techniques to improve the estimation of this type of measure by means of
Recurrent Neural Networks.
All the contributions presented have been tested on different educational
repositories. In fact, the transLectures-UPV toolkit is part of a set of
tools used to generate lecture transcriptions in different Spanish and
European universities and institutions.
[EN] In recent years, online multimedia repositories have become key
knowledge assets thanks to the rise of the Internet, especially in the area of
education. Educational institutions around the world have devoted great effort
to exploring different teaching methods, both to improve the transmission of
knowledge and to reach a wider audience. As a result, online video lecture
repositories are now available and serve as complementary tools that can boost
the learning experience and help students assimilate new concepts. To guarantee
the success of these repositories, the transcription of each lecture plays a
very important role because it constitutes the first step towards the
availability of many other features. Transcription makes learning materials
searchable, enables translation into other languages, supports recommendation
functions, makes it possible to provide content summaries, and gives access to
people with hearing disabilities. However, the transcription of these videos is
expensive in terms of time and human effort.
To this end, this thesis aims to provide new tools and techniques that
ease the transcription of these repositories. In particular, we address the
development of a complete Automatic Speech Recognition toolkit with a special
focus on the Deep Learning techniques that contribute to providing accurate
transcriptions in real-world scenarios. This toolkit has been tested against
many others in different international competitions, showing comparable
transcription quality. Moreover, a new technique that uses Confidence Measures
to improve recognition accuracy has been proposed; it was the spark that
motivated the proposal of new Confidence Measure techniques that helped to
further improve the transcription quality. To this end, a new speaker-adapted
confidence measure approach was proposed for models based on Recurrent Neural
Networks.
The contributions proposed herein have been tested in real-life scenarios in
different educational repositories. In fact, the transLectures-UPV toolkit is
part of a set of tools for providing video lecture transcriptions in many
different Spanish and European universities and institutions.
Agua Teba, MÁD. (2019). CONTRIBUTIONS TO EFFICIENT AUTOMATIC TRANSCRIPTION OF VIDEO LECTURES [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/130198
Automatic Quality Estimation for ASR System Combination
Recognizer Output Voting Error Reduction (ROVER) has been widely used for
system combination in automatic speech recognition (ASR). In order to select
the most appropriate words to insert at each position in the output
transcriptions, some ROVER extensions rely on critical information such as
confidence scores and other ASR decoder features. This information, which is
not always available, highly depends on the decoding process and sometimes
tends to overestimate the real quality of the recognized words. In this paper
we propose a novel variant of ROVER that takes advantage of ASR quality
estimation (QE) for ranking the transcriptions at "segment level" instead of:
i) relying on confidence scores, or ii) feeding ROVER with randomly ordered
hypotheses. We first introduce an effective set of features to compensate for
the absence of ASR decoder information. Then, we apply QE techniques to perform
accurate hypothesis ranking at segment-level before starting the fusion
process. The evaluation is carried out on two different tasks, in which we
respectively combine hypotheses coming from independent ASR systems and
multi-microphone recordings. In both tasks, it is assumed that the ASR decoder
information is not available. The proposed approach significantly outperforms
standard ROVER and it is competitive with two strong oracles that exploit
prior knowledge about the real quality of the hypotheses to be combined.
Compared to standard ROVER, the absolute WER improvements in the two
evaluation scenarios range from 0.5% to 7.3%.
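The abstract above reports results as word error rate (WER) and ranks hypotheses at segment level before fusion. The following is a minimal sketch of those two ingredients only, not the paper's system: `word_error_rate` is the usual word-level edit distance divided by the reference length, and `rank_by_estimated_quality` orders hypotheses by a caller-supplied `estimate_wer` function, which is a hypothetical stand-in for a trained quality estimation model. The ROVER alignment and voting step itself is not reproduced here.

```python
from typing import Callable, List, Sequence


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def rank_by_estimated_quality(
    hypotheses: Sequence[str],
    estimate_wer: Callable[[str], float],  # hypothetical QE model: lower is better
) -> List[str]:
    """Order the hypotheses for one segment from best to worst predicted WER."""
    return sorted(hypotheses, key=estimate_wer)
```

In this sketch the ranked list would be handed to the combination step in that order, instead of an arbitrary or random order. An "absolute" WER improvement is the plain difference between two WER values in percentage points, so a hypothetical drop from 20.0% to 19.5% would be an absolute improvement of 0.5%.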
Building task-oriented machine translation systems
The main goal of this thesis is to develop interactive translation systems that exhibit greater
synergy with their potential users. To that end, the aim is to make state-of-the-art systems more
ergonomic, intuitive and efficient, so that the human expert feels more comfortable using them.
Accordingly, different techniques are presented that focus on improving the adaptability and response
time of the underlying machine translation systems, together with a
strategy aimed at improving human-machine interaction. All of this serves the ultimate purpose
of bridging the gap between the state of the art in machine translation and the tools that
human translators have at their disposal.
Regarding the response time of machine translation systems, this thesis
presents a technique for pruning the parameters of current translation models, whose intuition is
based on the concept of bilingual segmentation but which ends up evolving into a strategy for
re-estimating those parameters. With this strategy, experimental results
show that it is possible to prune the phrase table by up to 97% without degrading the quality
of the translations obtained. Moreover, these results are consistent across different language pairs,
which shows that the technique presented here is effective in a traditional machine translation
setting and could therefore be used directly in a post-editing scenario. However,
the experiments carried out in interactive translation are slightly less convincing, since they
imply the need to reach a trade-off between response time and the quality of the suffixes
produced.
In addition, two adaptation techniques are presented with the purpose of improving the adaptability
of machine translation systems. The first [...]
Sanchis Trilles, G. (2012). Building task-oriented machine translation systems [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/17174
A Deep Source-Context Feature for Lexical Selection in Statistical Machine Translation
This is the author's version of a work that was accepted for publication in Pattern Recognition Letters. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms, may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Pattern Recognition Letters 75 (2016) 24–29. DOI 10.1016/j.patrec.2016.02.014.
This paper presents a methodology to address lexical disambiguation in a standard phrase-based statistical
machine translation system. Similarity among source contexts is used to select appropriate translation
units. The information is introduced as a novel feature of the phrase-based model and it is used to select
the translation units extracted from the training sentences most similar to the sentence to be translated. The
similarity is computed through a deep autoencoder representation, which yields an effective low-dimensional
embedding of the data and statistically significant BLEU score improvements on two different
tasks (English-to-Spanish and English-to-Hindi).
© 2016 Elsevier B.V. All rights reserved.
The work of the first author has been supported by FPI UPV pre-doctoral grant (num. registro - 3505). The work of the second author has been supported by Spanish Ministerio de Economia y Competitividad, contract TEC2015-69266-P and the Seventh Framework Program of the European Commission through the International Outgoing Fellowship Marie Curie Action (IMTraP-2011-29951). The work of the third author has been supported by the Spanish Ministerio de Economia y Competitividad, SomEMBED TIN2015-71147-C2-1-P research project and by the Generalitat Valenciana under the grant ALMAPATER (PrometeoII/2014/030).
Gupta, PA.; Costa-Jussa, MR.; Rosso, P.; Banchs, R. (2016). A Deep Source-Context Feature for Lexical Selection in Statistical Machine Translation. Pattern Recognition Letters. 75:24-29. https://doi.org/10.1016/j.patrec.2016.02.014
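The selection step described in this abstract, picking the training sentences whose source context is most similar to the input sentence, can be illustrated with a small sketch. It assumes that each sentence has already been mapped to a low-dimensional vector by some encoder (the autoencoder itself is not implemented here) and that cosine similarity is the comparison function; the abstract does not name the similarity measure, so that choice and the function names are assumptions made for illustration.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar_training_sentence(
    input_vec: Sequence[float],
    training_vecs: Sequence[Sequence[float]],
) -> int:
    """Index of the training embedding closest to the input embedding."""
    if not training_vecs:
        raise ValueError("training_vecs must not be empty")
    return max(
        range(len(training_vecs)),
        key=lambda i: cosine_similarity(input_vec, training_vecs[i]),
    )
```

In a phrase-based system, a score of this kind would typically enter the model as one more feature attached to the translation units extracted from each training sentence.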
The ADAPT system description for the IWSLT 2018 Basque to English translation task
In this paper we present the ADAPT system built for the
Basque to English Low Resource MT Evaluation Campaign.
Basque is a low-resource, morphologically rich language.
This poses a challenge for Neural Machine Translation models which usually achieve better performance when trained
with large sets of data.
Accordingly, we used synthetic data to improve the translation quality produced by a model built using only authentic
data. Our proposal uses back-translated data to: (a) create
new sentences, so the system can be trained with more data;
and (b) translate sentences that are close to the test set, so the
model can be fine-tuned to the document to be translated.
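The two uses of back-translated data named in this abstract can be sketched in a few lines. This is an illustration under stated assumptions rather than the ADAPT pipeline: `target_to_source` is a hypothetical callable standing in for an English-to-Basque model, and "close to the test set" is approximated here by simple vocabulary overlap, since the abstract does not say how closeness is measured.

```python
from typing import Callable, Iterable, List, Sequence, Tuple


def back_translate(
    target_sentences: Iterable[str],
    target_to_source: Callable[[str], str],  # hypothetical reverse-direction MT model
) -> List[Tuple[str, str]]:
    """Pair each authentic target sentence with a machine-generated source side."""
    return [(target_to_source(t), t) for t in target_sentences]


def select_close_to_test(
    candidates: Sequence[str],
    test_sentences: Iterable[str],
    top_k: int,
) -> List[str]:
    """Keep the top_k candidates whose words overlap most with the test vocabulary.

    Word overlap is an assumed proxy for closeness, not the paper's criterion.
    """
    test_vocab = {w for s in test_sentences for w in s.lower().split()}

    def overlap(sentence: str) -> float:
        words = sentence.lower().split()
        if not words:
            return 0.0
        return sum(w in test_vocab for w in words) / len(words)

    # sorted() is stable, so ties keep their original order.
    return sorted(candidates, key=lambda s: -overlap(s))[:top_k]
```

The synthetic pairs from `back_translate` would be added to the authentic training data, while the sentences kept by `select_close_to_test` would be back-translated and used for fine-tuning.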
The UPV Handwriting Recognition and Translation System for OpenHaRT 2013
The NIST Open Handwriting Recognition and
Translation Evaluation 2013 (NIST OpenHaRT’13) is a performance
evaluation assessing technologies that transcribe and
translate text in document images. This evaluation is focused on
recognizing Arabic text images and translating them into English.
A Handwriting Recognition and Translation system typically consists
of a combination of two systems: a Text Recognition system
and a Machine Translation system. In this paper, we present the
UPV participation in the NIST OpenHaRT 2013 evaluation. For
the Text Recognition system we used the TL toolkit for training
and recognition. For the Machine Translation system we used the
Moses toolkit for training and decoding. Results in this evaluation
are challenging and they significantly outperform our previous
results in the OpenHaRT 2010 evaluation.
Alkhoury, I.; Giménez Pastor, A.; Andrés Ferrer, J.; Juan Císcar, A.; Sánchez Peiró, JA. (2013). The UPV Handwriting Recognition and Translation System for OpenHaRT 2013. US National Institute of Standards and Technology (NIST). http://hdl.handle.net/10251/5439
Explicit length modelling for statistical machine translation
[EN] Explicit length modelling has been previously explored in statistical pattern recognition with successful
results. In this paper, two length models along with two parameter estimation methods and two
alternative parametrisations for statistical machine translation (SMT) are presented. More precisely, we
incorporate explicit bilingual length modelling in a state-of-the-art log-linear SMT system as an
additional feature function in order to prove the contribution of length information. Finally, a
systematic evaluation on reference SMT tasks considering different language pairs proves the benefits
of explicit length modelling.
Work supported by the EC (FEDER/FSE) under the transLectures project (FP7-ICT-2011-7-287755) and the Spanish MEC/MICINN under the MIPRCV "Consolider Ingenio 2010" program (CSD2007-00018) and iTrans2 (TIN2009-14511) projects and FPU grant (AP2010-4349). Also supported by the Spanish MITyC under the erudito.com (TSI-020110-2009-439) project and by the Generalitat Valenciana under grants Prometeo/2009/014 and GV/2010/067, and by the UPV under the AdInTAO (20091027) project. The authors wish to thank the anonymous reviewers for their criticisms and suggestions.
Silvestre Cerdà, JA.; Andrés Ferrer, J.; Civera Saiz, J. (2012). Explicit length modelling for statistical machine translation. Pattern Recognition. 45(9):3183-3192. https://doi.org/10.1016/j.patcog.2012.01.006
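To make the idea of a length model as "an additional feature function" in a log-linear system concrete, here is a minimal sketch. A log-linear model scores a candidate translation as a weighted sum of feature values, and a length feature contributes the log-probability of the target length given the source length. The Poisson distribution used below is only an assumed stand-in; it is not claimed to be either of the two length models proposed in the paper, and the feature names and weights are hypothetical.

```python
import math
from typing import Dict


def poisson_log_prob(k: int, mean: float) -> float:
    """Log of the Poisson probability of observing k events given the mean."""
    return k * math.log(mean) - mean - math.lgamma(k + 1)


def length_feature(source: str, target: str, ratio: float = 1.0) -> float:
    """Log-probability of the target length given the source length.

    `ratio` is an assumed expected target/source length ratio.
    """
    source_len = len(source.split())
    if source_len == 0:
        raise ValueError("source must contain at least one word")
    return poisson_log_prob(len(target.split()), ratio * source_len)


def log_linear_score(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of feature values, as in a log-linear translation model."""
    return sum(weights[name] * value for name, value in features.items())
```

With a feature of this kind, a candidate whose length is implausible for the given source sentence receives a lower score, and the weight of the feature would be tuned together with the other model weights.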
Findings of the IWSLT 2022 Evaluation Campaign.
The evaluation campaign of the 19th International Conference on Spoken Language Translation featured eight shared tasks: (i) Simultaneous speech translation, (ii) Offline speech translation, (iii) Speech-to-speech translation, (iv) Low-resource speech translation, (v) Multilingual speech translation, (vi) Dialect speech translation, (vii) Formality control for speech translation, (viii) Isometric speech translation. A total of 27 teams participated in at least one of the shared tasks. This paper details, for each shared task, the purpose of the task, the data that were released, the evaluation metrics that were applied, the submissions that were received and the results that were achieved.