
    Learning languages from parallel corpora

    This work describes a blueprint for an application that generates language learning exercises from parallel corpora. Word alignment and parallel structures allow the automatic assessment of sentence pairs in the source and target languages, while users of the application continuously improve the quality of the data through their interactions, thus crowdsourcing parallel language learning material. Through triangulation, their assessments can be transferred to language pairs other than the original ones if multiparallel corpora are used as a source. Several challenges need to be addressed for such an application to work, and we discuss three of them here. First, the question of how adequate learning material can be identified in corpora has received some attention in the last decade, and we detail what the structure of parallel corpora implies for that selection. Second, we consider which types of exercises can be generated automatically from parallel corpora so that they foster learning and keep learners motivated. Third, we highlight the potential of employing users, that is, both teachers and learners, as crowdsourcers who help improve the material.
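The abstract does not specify how triangulation is performed. One common reading is to compose word alignments through a pivot language: if a source token is linked to a pivot token that is in turn linked to a target token, the source and target tokens are linked. The minimal sketch below assumes alignments are stored as sets of (source index, target index) pairs; the names `triangulate` and `gap_fill` and the toy German/English/French sentences are hypothetical, and the gap-fill (cloze) exercise is only one possible exercise type.

```python
from collections import defaultdict

# Hypothetical representation: an alignment is a set of (i, j) token-index pairs.
Alignment = set[tuple[int, int]]


def triangulate(src_pivot: Alignment, pivot_tgt: Alignment) -> Alignment:
    """Compose source->pivot and pivot->target links into source->target links."""
    # Index the pivot->target links by pivot token for quick lookup.
    by_pivot: dict[int, set[int]] = defaultdict(set)
    for p, t in pivot_tgt:
        by_pivot[p].add(t)
    # Link a source token to every target token reachable through a shared pivot token.
    return {(s, t) for s, p in src_pivot for t in by_pivot[p]}


def gap_fill(src_tokens: list[str], tgt_tokens: list[str],
             alignment: Alignment, src_index: int) -> tuple[str, str]:
    """Blank out the target tokens aligned to src_tokens[src_index]; return (prompt, answer)."""
    gaps = sorted(t for s, t in alignment if s == src_index)
    answer = " ".join(tgt_tokens[t] for t in gaps)
    cloze = " ".join("___" if i in gaps else tok for i, tok in enumerate(tgt_tokens))
    return f"{' '.join(src_tokens)} / {cloze}", answer


# Toy multiparallel example: German and French are connected only through English.
de = ["die", "Katze", "schläft"]
en = ["the", "cat", "is", "sleeping"]
fr = ["le", "chat", "dort"]
de_en = {(0, 0), (1, 1), (2, 2), (2, 3)}
en_fr = {(0, 0), (1, 1), (2, 2), (3, 2)}

de_fr = triangulate(de_en, en_fr)
print(sorted(de_fr))               # [(0, 0), (1, 1), (2, 2)]
print(gap_fill(de, fr, de_fr, 1))  # ('die Katze schläft / le ___ dort', 'chat')
```

Composition through a pivot also propagates any pivot alignment errors into the new language pair, which is where user corrections of the kind the abstract describes could be fed back into the data.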

    Nodalida 2005 - proceedings of the 15th NODALIDA conference


    24th Nordic Conference on Computational Linguistics (NoDaLiDa)


    Proceedings

    Proceedings of the Workshop CHAT 2011: Creation, Harmonization and Application of Terminology Resources. Editors: Tatiana Gornostay and Andrejs Vasiļjevs. NEALT Proceedings Series, Vol. 12 (2011). © 2011 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt . Electronically published at Tartu University Library (Estonia) http://hdl.handle.net/10062/16956

    Segmental Durations of Speech

    This dissertation considers the segmental durations of speech from the viewpoint of speech technology, especially speech synthesis. The idea is that better models of segmental durations lead to higher naturalness and better intelligibility, which are key factors for the usability and generality of synthesized speech. Even though the studies are based on Finnish speech, the approaches apply to other languages as well, possibly because most of the studies included in this dissertation concern universal effects that take place at utterance boundaries. The methods developed and used here are also suitable for studies of other languages. The study is based on two corpora: news reading speech and sentences read aloud. One corpus is read aloud by a 39-year-old male, whereas the other consists of several speakers in various situations. Using two corpora serves two purposes: it allows a comparison of the corpora and gives a broader view of the matters of interest. The dissertation begins with an overview of the phonemes and the quantity system of Finnish, covering in particular the intrinsic durations of phonemes and phoneme categories, as well as the duration difference between short and long phonemes. The phoneme categories are presented to address the problem of variability in speech segments. The dissertation then covers boundary-adjacent effects on segmental durations. In utterance-initial positions there appears to be initial shortening in Finnish, but the result depends on the level of detail and on the individual phoneme. At the phoneme level, the shortening or lengthening affects only the very first phonemes of an utterance; at the word level, however, the effect appears on average to shorten the whole first word. We establish the effect of final lengthening in Finnish. 
Its presence in Finnish has long been an open question, and Finnish has been the last missing piece in establishing final lengthening as a universal phenomenon. Final lengthening is studied from various angles, and it is shown that it is neither merely an effect of prominence nor an artefact of a speech corpus with high inter- and intra-speaker variation. The effect of final lengthening seems to extend from the final to the penultimate word. At the phoneme level it reaches a much wider area than the initial effect. We also present a normalization method suitable for corpus studies of segmental durations. The method uses utterance-level normalization to capture the pattern of segmental durations within each utterance, which prevents various problematic sources of variation within the corpora from distorting the results. The normalization is used in a study of final lengthening to show that the results on the effect are not caused by variation in the material. The dissertation also presents an implementation of speech synthesis on a mobile platform and evaluates its performance. The rule-based method of speech synthesis runs in real time as software, but the signal generation process slows the system down so that it no longer runs in real time. Future prospects for speech synthesis on limited platforms are discussed. Finally, the dissertation considers ethical issues in the development of speech technology. The main focus is on developing speech synthesis with high naturalness, but the problems and solutions apply to other speech technology approaches as well. Transferred from Doria.
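The abstract does not give the exact normalization formula. As a minimal sketch, assuming each segment's log duration is z-scored against the other segments of the same utterance (a common choice in duration studies, not necessarily the dissertation's), the code below normalizes one utterance and computes a simple final-lengthening score: the mean normalized duration of the final word's segments minus that of all earlier segments. The function names and example durations are hypothetical.

```python
import math
import statistics


def normalize_utterance(durations_ms: list[float]) -> list[float]:
    """Z-score the log durations of all segments within one utterance (assumed scheme)."""
    logs = [math.log(d) for d in durations_ms]
    mean = statistics.fmean(logs)
    sd = statistics.pstdev(logs)
    if sd == 0:
        # All segments are equally long: there is no within-utterance pattern to express.
        return [0.0] * len(logs)
    return [(x - mean) / sd for x in logs]


def final_lengthening(words: list[list[float]]) -> float:
    """Mean normalized duration of final-word segments minus that of all earlier segments.

    `words` holds one list of segment durations (ms) per word; at least two words
    are needed. A positive value indicates final lengthening in this utterance.
    """
    flat = [d for word in words for d in word]
    z = normalize_utterance(flat)
    n_final = len(words[-1])
    return statistics.fmean(z[-n_final:]) - statistics.fmean(z[:-n_final])


# Hypothetical three-word utterance, segment durations in milliseconds.
utterance = [[60, 85], [70, 90, 65], [80, 140]]
print(round(final_lengthening(utterance), 2))
```

Working in log durations means that a uniform change in speaking rate becomes a constant shift, which the per-utterance mean removes: an utterance spoken uniformly twice as slowly yields the same normalized values.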

    Text-to-speech vs. human-voiced audio descriptions: a reception study in films dubbed into Catalan

    This article presents an experiment that aims to determine whether blind and visually impaired people would accept the use of text-to-speech in the audio description of dubbed feature films in the Catalan context. A user study was conducted with 67 blind and partially sighted people, who assessed two synthetic voices applied to audio description in comparison with two natural voices. All of the voices had been selected in a preliminary test. The analysis of the data (both quantitative and qualitative) shows that most participants accept Catalan text-to-speech audio description as an alternative to the standard human-voiced audio description. However, natural voices obtain statistically significantly higher scores than synthetic voices and remain the preferred solution.
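The abstract does not say which statistical test was used. To illustrate what a paired comparison of natural and synthetic voices involves, here is a minimal sketch of an exact one-sided sign-flip (paired permutation) test on per-participant scores; the function name and the eight participants' 1-5 ratings are hypothetical and are not the study's data.

```python
from itertools import product


def paired_sign_flip_test(natural: list[int], synthetic: list[int]) -> float:
    """Exact one-sided p-value for H1: natural voices score higher than synthetic ones.

    Under H0 each participant's score difference is equally likely to have either
    sign, so all 2**n sign patterns are enumerated (feasible only for small n).
    """
    diffs = [a - b for a, b in zip(natural, synthetic)]
    observed = sum(diffs)
    # Count the sign patterns whose summed difference is at least as large as observed.
    extreme = sum(
        1
        for signs in product((1, -1), repeat=len(diffs))
        if sum(s * d for s, d in zip(signs, diffs)) >= observed
    )
    return extreme / 2 ** len(diffs)


# Hypothetical ratings from eight participants (not data from the study).
natural = [5, 4, 5, 4, 5, 3, 4, 5]
synthetic = [4, 4, 3, 3, 4, 3, 3, 4]
print(paired_sign_flip_test(natural, synthetic))  # 0.015625
```

Integer ratings keep the sums exact, so the observed pattern is always counted. With 67 participants, 2**67 patterns are far too many to enumerate; a random sample of sign patterns or a Wilcoxon signed-rank test would be the practical choice.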

    Learning multilingual and multimodal representations with language-specific encoders and decoders for machine translation

    This thesis studies different language-specific approaches to Multilingual Machine Translation without parameter sharing and compares their properties with the current state of the art, which is based on parameter sharing. We define Multilingual Machine Translation as the task of translating between several pairs of languages with a single system. It has been widely studied in recent years because it scales easily to more languages, even to pairs never seen together during training (zero-shot translation). Several architectures have been proposed to tackle this problem, with varying amounts of parameters shared between languages. Current state-of-the-art systems use a single sequence-to-sequence architecture in which all languages share the complete set of parameters, including the token representations. While this has proven convenient for transfer learning, it makes it difficult to incorporate new languages into a trained model, as all languages depend on the same parameters. What all proposed architectures have in common is that they enforce a shared representation space across languages. Specifically, in this work we take as this representation the final output of the encoders, which the decoders use to perform cross-attention. A shared space reduces noise, since sentences that are similar at the semantic level produce similar vector representations, which helps the decoders process representations from several languages. This semantic representation is particularly important for zero-shot translation, where similarity to the representations of the language pairs seen during training is key to reducing ambiguity between languages and obtaining good translation performance. The thesis is structured in three main blocks, each focused on a different scenario of this task. First, we propose a training method that enforces a common representation in bilingual training, together with a procedure to extend it efficiently to new languages. 
Second, we propose another training method that learns this representation directly from multilingual data and can likewise be extended to new languages. Third, we show that the proposed multilingual architecture is not limited to text. We extend our method to a new data modality by adding speech encoders, performing Spoken Language Translation, including zero-shot translation, for all supported languages. Our main results show that a common intermediate representation is achievable in this scenario, matching the performance of previous parameter-sharing systems while allowing new languages or data modalities to be added efficiently, without negative transfer to the existing languages and without retraining the system.

The aim of this thesis is to study different Multilingual Machine Translation architectures with language-specific parameters that are not shared, as opposed to the current state of the art based on parameter sharing. Multilingual Machine Translation can be defined as the task that studies methods for translating between several language pairs in a single system. It has been widely studied in recent years because it allows systems to scale easily to a large number of languages, even between language pairs that have never been trained together (zero-shot translation). Various architectures have been proposed with different levels of parameter sharing between languages. The current state of the art focuses on a single sequence-to-sequence model in which all parameters are shared by all languages, including the representation at the level of the linguistic unit. While this benefits knowledge transfer between languages, it can also become a limitation when adding new languages, since doing so modifies the parameters for all supported languages. 
The common element of all the proposed architectures is that they promote a common space in which all the languages in the system are represented. Specifically, in this work we refer to the final representation of the system's encoders as this space, since it is the representation the decoders use during cross-attention when generating translations. The aim of this common representation is to reduce noise, since similar sentences produce similar representations, which helps when the same decoder processes the vector representations of several languages. This is especially important for zero-shot translation, where the language pair has never been trained together, in order to reduce possible ambiguities and obtain good translation quality. The thesis is organized into three main blocks, focused on different scenarios of this task. First, we propose a method for training a common representation in bilingual systems, and a procedure for extending it efficiently to new languages. Second, we propose another training method for learning this representation directly from multilingual data and show how it can likewise be extended to new languages. Third, we show that this representation is not limited to textual data. To this end, we extend our method to another data modality, in this case speech, demonstrating that we can perform speech-to-text translation for all supported languages, including zero-shot translation. 
Our results show that a common representation can be learned without sharing parameters between languages, with translation quality similar to that of the current state of the art, and with the advantage of allowing new languages or data modalities to be added efficiently, without negative knowledge transfer to the already supported languages and without the need to retrain them.

    Postprint (published version)
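The abstract describes language-specific encoders and decoders that communicate only through a shared representation space. The sketch below is a deliberately tiny caricature of that idea, not the thesis's architecture: each hypothetical `LanguageModule` "encodes" tokens by looking up vectors in its own embedding table and "decodes" by picking the nearest word in its own vocabulary, whereas real systems use neural encoders and decoders with cross-attention. `shared_space_loss`, a squared distance between mean-pooled representations of a parallel sentence pair, is one possible way to pull languages into a common space; the abstract does not state the objective actually used. All embeddings are made up.

```python
import math

Vector = tuple[float, ...]


class LanguageModule:
    """Hypothetical per-language encoder/decoder pair; no parameters are shared across languages."""

    def __init__(self, embeddings: dict[str, Vector]):
        self.embeddings = embeddings

    def encode(self, tokens: list[str]) -> list[Vector]:
        # Toy encoder: one vector per token from this language's own table.
        return [self.embeddings[t] for t in tokens]

    def decode(self, states: list[Vector]) -> list[str]:
        # Toy decoder: nearest vocabulary entry per state (stands in for cross-attention decoding).
        return [min(self.embeddings, key=lambda w: math.dist(self.embeddings[w], s)) for s in states]


class ModularTranslator:
    def __init__(self) -> None:
        self.modules: dict[str, LanguageModule] = {}

    def add_language(self, lang: str, module: LanguageModule) -> None:
        # Adding a language registers a new module only; existing modules are left untouched.
        self.modules[lang] = module

    def translate(self, tokens: list[str], src: str, tgt: str) -> list[str]:
        # Any encoder can feed any decoder through the shared space, including pairs never trained together.
        return self.modules[tgt].decode(self.modules[src].encode(tokens))


def mean_pool(states: list[Vector]) -> Vector:
    return tuple(sum(c) / len(states) for c in zip(*states))


def shared_space_loss(states_a: list[Vector], states_b: list[Vector]) -> float:
    """Squared distance between mean-pooled representations of a parallel sentence pair."""
    return math.dist(mean_pool(states_a), mean_pool(states_b)) ** 2


mt = ModularTranslator()
mt.add_language("en", LanguageModule({"cat": (1.0, 0.0), "sleeps": (0.0, 1.0)}))
mt.add_language("de", LanguageModule({"Katze": (0.9, 0.1), "schläft": (0.1, 0.9)}))
mt.add_language("fr", LanguageModule({"chat": (1.1, 0.0), "dort": (0.0, 1.05)}))
print(mt.translate(["Katze", "schläft"], "de", "fr"))  # ['chat', 'dort']
```

Because translation only routes one language's encoder into another language's decoder, a new language adds one module placed in the existing space without modifying the others, which is the property the abstract emphasizes. The hard part, training encoders whose outputs actually coincide for parallel sentences, is exactly what this toy omits.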