    Automatic Quality Estimation for ASR System Combination

    Recognizer Output Voting Error Reduction (ROVER) has been widely used for system combination in automatic speech recognition (ASR). To select the most appropriate word to insert at each position in the output transcription, some ROVER extensions rely on critical information such as confidence scores and other ASR decoder features. This information is not always available, depends heavily on the decoding process, and sometimes tends to overestimate the real quality of the recognized words. In this paper we propose a novel variant of ROVER that takes advantage of ASR quality estimation (QE) to rank the transcriptions at "segment level" instead of: i) relying on confidence scores, or ii) feeding ROVER with randomly ordered hypotheses. We first introduce an effective set of features to compensate for the absence of ASR decoder information. Then, we apply QE techniques to perform accurate hypothesis ranking at segment level before starting the fusion process. The evaluation is carried out on two different tasks, in which we respectively combine hypotheses coming from independent ASR systems and from multi-microphone recordings. In both tasks, it is assumed that the ASR decoder information is not available. The proposed approach significantly outperforms standard ROVER and is competitive with two strong oracles that exploit prior knowledge about the real quality of the hypotheses to be combined. Compared to standard ROVER, the absolute WER improvements in the two evaluation scenarios range from 0.5% to 7.3%.
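The abstract above reports its gains as absolute word error rate (WER) differences and describes ordering hypotheses by estimated quality before fusion. As a rough illustration only, the sketch below computes WER as word-level edit distance divided by reference length and orders hypotheses by a predicted score. The function names and the idea of passing predicted WER values as a plain list are assumptions made for this example; they are not taken from the paper, and the sketch does not implement ROVER alignment or voting.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def rank_by_predicted_wer(hypotheses: List[str],
                          predicted_wer: List[float]) -> List[str]:
    """Order hypotheses from best to worst predicted quality.

    The predicted scores are assumed to come from some external
    quality-estimation model (hypothetical here).
    """
    if len(hypotheses) != len(predicted_wer):
        raise ValueError("one predicted score is needed per hypothesis")
    order = sorted(range(len(hypotheses)), key=lambda k: predicted_wer[k])
    return [hypotheses[k] for k in order]
```

An absolute WER improvement is then simply the difference between two such rates, for example a baseline WER of 0.30 against a combined-system WER of 0.25 is a 5% absolute improvement.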

    Improving the translation environment for professional translators

    When using computer-aided translation systems in a typical professional translation workflow, there are several stages at which there is room for improvement. The SCATE (Smart Computer-Aided Translation Environment) project investigated several of these aspects, both from a human-computer interaction point of view and from a purely technological one. This paper describes the SCATE research with respect to improved fuzzy matching, parallel treebanks, the integration of translation memories with machine translation, quality estimation, terminology extraction from comparable texts, the use of speech recognition in the translation process, and human-computer interaction and interface design for the professional translation environment. For each of these topics, we describe the experiments we performed and the conclusions drawn, providing an overview of the highlights of the entire SCATE project.

    Quality Assessment of Machine-Translated Speech: A Case Study of the ILA App (Procjena kvalitete strojnog prijevoda govora: studija slučaja aplikacije ILA)

    Machine translation (MT) is becoming qualitatively more successful and quantitatively more productive at an unprecedented pace. It is becoming a widespread solution to the challenges of a constantly rising demand for quick and affordable translations of both text and speech, causing disruption and adjustments of the translation practice and profession, but at the same time making multilingual communication easier than ever before. This paper focuses on the speech-to-speech (S2S) translation app Instant Language Assistant (ILA), which brings together state-of-the-art translation technologies (automatic speech recognition, machine translation, and text-to-speech synthesis) and allows for MT-mediated multilingual communication. The aim of the paper is to assess the quality of translations of conversational language produced by the S2S translation app ILA for the en-de and en-hr language pairs. The research includes several levels of translation quality analysis: human translation quality assessment by translation experts using the Fluency/Adequacy Metrics, light post-editing, and automated MT evaluation (BLEU). Moreover, the translation output is assessed with respect to language pair to gain insight into whether and how the pair affects MT output quality. The results show a relatively high quality of translations produced by the S2S translation app ILA across all assessment models and a correlation between human and automated assessment results.

    Spoken Language Translation Graphs Re-decoding using Automatic Quality Assessment

    This paper investigates how automatic quality assessment of spoken language translation (SLT) can help re-decode SLT output graphs and improve overall speech translation performance. Using robust word confidence measures (from both ASR and MT) to re-decode the SLT graph leads to a significant BLEU improvement (more than 2 points) compared to our SLT baseline (French-English task).

    Testing quality in interlingual respeaking and other methods of interlingual live subtitling

    Live subtitling (LS) finds its foundations in pre-recorded subtitling for the d/Deaf and hard of hearing (SDH) and produces real-time subtitles for live events and programs. LS implies the transfer from oral into written content (intersemiotic translation) and can be carried out from and to the same language (intralingual), or from one language to another (interlingual) to provide full accessibility for all, thereby combining SDH with the need to guarantee multilingual access as well. Interlingual Live Subtitling (from now on referred to as ILS) in real time is currently achieved using different methods: the focus here is placed on interlingual respeaking as one of the currently used methods of LS, also referred to in this work as speech-to-text interpreting (STTI), which has triggered growing interest in the Italian industry as well over the past years. This doctoral thesis intends to provide a wider picture of the literature and the research on intralingual and interlingual respeaking to date, emphasizing the current situation of this practice in Italy. 
The aim of the research was to explore different ILS methods through their strengths and weaknesses, in an attempt to inform the industry about the impact that the potential and the risks of the different techniques used to produce ILS can have on the final overall quality of the subtitles. To do so, a total of five ILS workflows requiring human and machine interaction to different extents were tested in terms of quality, not only from a linguistic accuracy point of view, but also considering another crucial factor: the delay in the broadcast of the subtitles. Two case studies were carried out with different language pairs. A first experiment (English to Italian) tested and assessed quality in interlingual respeaking on the one hand, and in simultaneous interpreting (SI) combined with intralingual respeaking, and SI combined with Automatic Speech Recognition (ASR), on the other. A second experiment (Spanish to Italian) evaluated and compared all five methods: the first three again, and two more machine-centered ones, namely intralingual respeaking combined with machine translation (MT), and ASR with MT. Two workshops in interlingual respeaking were offered in the master's degree in Translation and Interpreting at the University of Genova to prepare students for the experiments, aimed at testing different training modules on ILS and their effectiveness on students' learning outcomes. For the final experiments, students were assigned different roles for each tested method and performed the required tasks, producing ILS from the same source text: a video of a full original speech at a live event. The resulting outputs were analyzed using the NTR model (Romero-Fresco & Pöchhacker, 2017) and the delay was calculated for each method. 
Preliminary quantitative results deriving from the NTR analyses and the calculation of delay were compared with two other case studies conducted by the University of Vigo and the University of Surrey, showing that more automated and fully automated workflows are indeed faster than the others, while they still present several important issues in translation and punctuation. Albeit on a small scale, the research also shows how urgent and potentially easy it could be to educate translators and interpreters in respeaking during their training phase, given their keen interest in the subject matter. It is hoped that the results obtained can shed better light on the repercussions of using the different methods and induce further reflection on the importance of human interaction with automatic translation and speech recognition systems in providing high-quality accessibility at live events. It is also hoped that the interest of the students involved in this field, which was completely unknown to them prior to this research, can highlight the urgency of raising students' awareness of, and competence in, live subtitling through respeaking.

    Review of Research on Speech Technology: Main Contributions From Spanish Research Groups

    In the last two decades, there has been an important increase in research on speech technology in Spain, mainly due to a higher level of funding from European, Spanish and local institutions and also due to a growing interest in these technologies for developing new services and applications. This paper provides a review of the main areas of speech technology addressed by research groups in Spain, their main contributions in recent years, and their current focus of interest. The description is organized into five main areas: audio processing including speech, speaker characterization, speech and language processing, text-to-speech conversion, and spoken language applications. This paper also introduces the Spanish Network of Speech Technologies (RTTH, Red Temática en Tecnologías del Habla) as the research network that includes almost all the researchers working in this area, presenting some figures, its objectives, and the main activities it has carried out in recent years.