2,615 research outputs found
Unsupervised vocabulary selection for simultaneous lecture translation
In this work, we propose a novel method for vocabulary selection which enables simultaneous speech recognition systems for lectures to automatically adapt to the diverse topics that occur in educational and scientific lectures. Utilizing materials that are available before the lecture begins, such as lecture slides, our proposed framework iteratively searches for related documents on the World Wide Web and generates a lecture-specific vocabulary and language model based on the resulting documents. In this paper, we introduce a novel method for vocabulary selection where we rank vocabulary that occurs in the collected documents based on a relevance score which is calculated using a combination of word features. Vocabulary selection is a critical component for topic adaptation that has typically been overlooked in prior works. On the interACT German-English simultaneous lecture translation system our proposed approach significantly improved vocabulary coverage, reducing the out-of-vocabulary rate on average by 57.0% and up to 84.9%, compared to a lecture-independent baseline. Furthermore, our approach reduced the word error rate by up to 25.3% (on average 13.2% across all lectures), compared to a lectureindependent baseline
A System for Simultaneous Translation of Lectures and Speeches
This thesis realizes the first existing automatic system for simultaneous speech-to-speech translation. The focus of this system is the automatic translation of (technical oriented) lectures and speeches from English to Spanish, but the different aspects described in this thesis will also be helpful for developing simultaneous translation systems for other domains or languages
Evaluation of Interactive User Corrections for Lecture Transcription
In this work, we present and evaluate the usage of an interactive web interface for browsing and correcting lecture transcripts. An experiment performed with potential users without transcription experience provides us with a set of example corrections. On German lecture data, user corrections greatly improve the comprehensibility of the transcripts, yet only reduce the WER to 22%. The precision of user edits is relatively low at 77% and errors in inflection, case and compounds were rarely corrected. Nevertheless, characteristic lecture data errors, such as highly specific terms, were typically corrected, providing valuable additional information
ELITR Non-Native Speech Translation at IWSLT 2020
This paper is an ELITR system submission for the non-native speech translation task at IWSLT 2020. We describe systems for offline ASR, real-time ASR, and our cascaded approach to offline SLT and real-time SLT. We select our primary candidates from a pool of pre-existing systems, develop a new end-toend general ASR system, and a hybrid ASR trained on non-native speech. The provided small validation set prevents us from carrying out a complex validation, but we submit all the unselected candidates for contrastive evaluation on the test set
Deep Learning Techniques for Music Generation -- A Survey
This paper is a survey and an analysis of different ways of using deep
learning (deep artificial neural networks) to generate musical content. We
propose a methodology based on five dimensions for our analysis:
Objective - What musical content is to be generated? Examples are: melody,
polyphony, accompaniment or counterpoint. - For what destination and for what
use? To be performed by a human(s) (in the case of a musical score), or by a
machine (in the case of an audio file).
Representation - What are the concepts to be manipulated? Examples are:
waveform, spectrogram, note, chord, meter and beat. - What format is to be
used? Examples are: MIDI, piano roll or text. - How will the representation be
encoded? Examples are: scalar, one-hot or many-hot.
Architecture - What type(s) of deep neural network is (are) to be used?
Examples are: feedforward network, recurrent network, autoencoder or generative
adversarial networks.
Challenge - What are the limitations and open challenges? Examples are:
variability, interactivity and creativity.
Strategy - How do we model and control the process of generation? Examples
are: single-step feedforward, iterative feedforward, sampling or input
manipulation.
For each dimension, we conduct a comparative analysis of various models and
techniques and we propose some tentative multidimensional typology. This
typology is bottom-up, based on the analysis of many existing deep-learning
based systems for music generation selected from the relevant literature. These
systems are described and are used to exemplify the various choices of
objective, representation, architecture, challenge and strategy. The last
section includes some discussion and some prospects.Comment: 209 pages. This paper is a simplified version of the book: J.-P.
Briot, G. Hadjeres and F.-D. Pachet, Deep Learning Techniques for Music
Generation, Computational Synthesis and Creative Systems, Springer, 201
Augmenting Translation Lexica by Learning Generalised Translation Patterns
Bilingual Lexicons do improve quality: of parallel corpora alignment, of newly extracted
translation pairs, of Machine Translation, of cross language information retrieval, among
other applications. In this regard, the first problem addressed in this thesis pertains to
the classification of automatically extracted translations from parallel corpora-collections
of sentence pairs that are translations of each other. The second problem is concerned
with machine learning of bilingual morphology with applications in the solution of first
problem and in the generation of Out-Of-Vocabulary translations.
With respect to the problem of translation classification, two separate classifiers for
handling multi-word and word-to-word translations are trained, using previously extracted
and manually classified translation pairs as correct or incorrect. Several insights
are useful for distinguishing the adequate multi-word candidates from those that are
inadequate such as, lack or presence of parallelism, spurious terms at translation ends
such as determiners, co-ordinated conjunctions, properties such as orthographic similarity
between translations, the occurrence and co-occurrence frequency of the translation
pairs. Morphological coverage reflecting stem and suffix agreements are explored as key
features in classifying word-to-word translations. Given that the evaluation of extracted
translation equivalents depends heavily on the human evaluator, incorporation of an
automated filter for appropriate and inappropriate translation pairs prior to human evaluation
contributes to tremendously reduce this work, thereby saving the time involved
and progressively improving alignment and extraction quality. It can also be applied
to filtering of translation tables used for training machine translation engines, and to
detect bad translation choices made by translation engines, thus enabling significative
productivity enhancements in the post-edition process of machine made translations.
An important attribute of the translation lexicon is the coverage it provides. Learning
suffixes and suffixation operations from the lexicon or corpus of a language is an extensively
researched task to tackle out-of-vocabulary terms. However, beyond mere words
or word forms are the translations and their variants, a powerful source of information
for automatic structural analysis, which is explored from the perspective of improving
word-to-word translation coverage and constitutes the second part of this thesis. In this
context, as a phase prior to the suggestion of out-of-vocabulary bilingual lexicon entries,
an approach to automatically induce segmentation and learn bilingual morph-like units by identifying and pairing word stems and suffixes is proposed, using the bilingual
corpus of translations automatically extracted from aligned parallel corpora, manually
validated or automatically classified. Minimally supervised technique is proposed to enable
bilingual morphology learning for language pairs whose bilingual lexicons are highly
defective in what concerns word-to-word translations representing inflection diversity.
Apart from the above mentioned applications in the classification of machine extracted
translations and in the generation of Out-Of-Vocabulary translations, learned bilingual
morph-units may also have a great impact on the establishment of correspondences of
sub-word constituents in the cases of word-to-multi-word and multi-word-to-multi-word
translations and in compression, full text indexing and retrieval applications
Modularity and Neural Integration in Large-Vocabulary Continuous Speech Recognition
This Thesis tackles the problems of modularity in Large-Vocabulary Continuous Speech Recognition with use of Neural Network
- …