Developing Deployable Spoken Language Translation Systems given Limited Resources
Approaches are presented that support the deployment of spoken language translation systems. Newly developed methods allow low-cost portability to new language pairs. Proposed translation model pruning techniques achieve high translation performance even in low-memory situations. Translation model personalization tailors named entity and specialty vocabulary coverage to the individual user, particularly on small and mobile devices.
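The pruning idea above can be illustrated as a simple per-source-phrase filter over a phrase table. This is a minimal sketch under assumed inputs (a flat list of `(source, target, probability)` entries and a top-k cutoff), not the paper's actual pruning method.

```python
# Hypothetical sketch: prune a phrase table by keeping, for each source
# phrase, only the top-k translations by probability. The table layout
# and the value of k are illustrative assumptions.
from collections import defaultdict

def prune_phrase_table(entries, k=2):
    """entries: iterable of (source_phrase, target_phrase, probability)."""
    by_source = defaultdict(list)
    for src, tgt, prob in entries:
        by_source[src].append((prob, tgt))
    pruned = []
    for src, candidates in by_source.items():
        # Keep only the k most probable translations per source phrase.
        for prob, tgt in sorted(candidates, reverse=True)[:k]:
            pruned.append((src, tgt, prob))
    return pruned

table = [
    ("hello", "hallo", 0.6),
    ("hello", "guten tag", 0.3),
    ("hello", "servus", 0.1),
]
print(len(prune_phrase_table(table, k=2)))  # 2 of the 3 entries survive
```

Cutting low-probability entries like this shrinks the memory footprint, which is the point of pruning for small and mobile devices.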
Individual and Domain Adaptation in Sentence Planning for Dialogue
One of the biggest challenges in the development and deployment of spoken
dialogue systems is the design of the spoken language generation module. This
challenge arises from the need for the generator to adapt to many features of
the dialogue domain, user population, and dialogue context. A promising
approach is trainable generation, which uses general-purpose linguistic
knowledge that is automatically adapted to the features of interest, such as
the application domain, individual user, or user group. In this paper we
present and evaluate a trainable sentence planner for providing restaurant
information in the MATCH dialogue system. We show that trainable sentence
planning can produce complex information presentations whose quality is
comparable to the output of a template-based generator tuned to this domain. We
also show that our method easily supports adapting the sentence planner to
individuals, and that the individualized sentence planners generally perform
better than models trained and tested on a population of individuals. Previous
work has documented and utilized individual preferences for content selection,
but to our knowledge, these results provide the first demonstration of
individual preferences for sentence planning operations, affecting the content
order, discourse structure and sentence structure of system responses. Finally,
we evaluate the contribution of different feature sets, and show that, in our
application, n-gram features often do as well as features based on higher-level
linguistic representations.
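As a rough illustration of the n-gram features mentioned above, the sketch below extracts word n-gram counts from a candidate system response, in the form a sentence-plan ranker might consume. The feature naming is a hypothetical choice for illustration, not the MATCH system's actual feature set.

```python
# Hypothetical sketch: word n-gram count features for a candidate
# realization; a trainable ranker could score candidates from such
# feature vectors. Feature names ("ng:...") are an assumption.
from collections import Counter

def ngram_features(sentence, n_max=2):
    tokens = sentence.lower().split()
    feats = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            feats["ng:" + " ".join(tokens[i:i + n])] += 1
    return feats

feats = ngram_features("Chanpen Thai has good service")
print(feats["ng:good service"])  # the bigram occurs once
```

Features this shallow need no parser or deep linguistic analysis, which is why it is notable that they often match features built on higher-level representations.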
Speech Machine Translation Quality Assessment: A Case Study of the ILA App
Machine translation (MT) is becoming qualitatively more successful and quantitatively more productive at an unprecedented pace. It is becoming a widespread solution to the constantly rising demand for quick and affordable translations of both text and speech, disrupting and reshaping translation practice and the profession while at the same time making multilingual communication easier than ever before. This paper focuses on the speech-to-speech (S2S) translation app Instant Language Assistant (ILA), which brings together state-of-the-art translation technologies: automatic speech recognition, machine translation, and text-to-speech synthesis, and allows for MT-mediated multilingual communication. The aim of the paper is to assess the quality of translations of conversational language produced by the S2S translation app ILA for the en-de and en-hr language pairs. The research includes several levels of translation quality analysis: human translation quality assessment by translation experts using the Fluency/Adequacy Metrics, light post-editing, and automated MT evaluation (BLEU). Moreover, the translation output is assessed with respect to language pair to gain insight into whether, and how, the language pair affects MT output quality. The results show a relatively high quality of translations produced by the S2S translation app ILA across all assessment models, and a correlation between human and automated assessment results.
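BLEU, the automated metric referred to above, can be illustrated with a simplified sentence-level variant: clipped n-gram precisions combined with a brevity penalty. This sketch uses only 1- and 2-grams and ad-hoc smoothing; it is a teaching approximation, not the official corpus-level, multi-reference BLEU used in such studies.

```python
# Simplified sentence-level BLEU sketch: geometric mean of clipped
# 1- and 2-gram precisions times a brevity penalty. Real BLEU uses
# up to 4-grams over a whole corpus; this is an illustration only.
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def simple_bleu(candidate, reference, max_n=2):
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        precisions.append(max(overlap, 1e-9) / total)  # smooth zero counts
    # Brevity penalty discourages overly short candidates.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

print(round(simple_bleu("the cat sat on the mat", "the cat sat on the mat"), 2))
# identical sentences score 1.0
```

Correlating such automated scores with the human Fluency/Adequacy judgments is exactly the kind of comparison the study above reports.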
GEMv2 : Multilingual NLG benchmarking in a single line of code
Evaluation in machine learning is usually informed by past choices, for example which datasets or metrics to use. This standardization enables comparison on an equal footing using leaderboards, but the evaluation choices become sub-optimal as better alternatives arise. This problem is especially pertinent in natural language generation, which requires ever-improving suites of datasets, metrics, and human evaluation to make definitive claims. To make following best model evaluation practices easier, we introduce GEMv2. The new version of the Generation, Evaluation, and Metrics Benchmark introduces a modular infrastructure for dataset, model, and metric developers to benefit from each other's work. GEMv2 supports 40 documented datasets in 51 languages. Models for all datasets can be evaluated online, and our interactive data card creation and rendering tools make it easier to add new datasets to the living benchmark.
GEMv2: multilingual NLG benchmarking in a single line of code.
Evaluations in machine learning rarely use the latest metrics, datasets, or human evaluation in favor of remaining compatible with prior work. This compatibility, often facilitated through leaderboards, thus leads to outdated but standardized evaluation practices. We pose that the standardization is taking place in the wrong spot. Evaluation infrastructure should enable researchers to use the latest methods, and what should be standardized instead is how to incorporate these new evaluation advances. We introduce GEMv2, the new version of the Generation, Evaluation, and Metrics Benchmark, which uses a modular infrastructure for dataset, model, and metric developers to benefit from each other's work. GEMv2 supports 40 documented datasets in 51 languages and ongoing online evaluation for all datasets, and our interactive tools make it easier to add new datasets to the living benchmark.
An Achilles' Heel? Helping Interpreting Students Gain Greater Awareness of Literal and Idiomatic English
This research paper reports on a study involving the use of literal and non-literal or idiomatic language in a multilingual interpreter classroom. Previous research has shown that interpreters are not always able to identify and correctly interpret idiomatic language. This study first examined student interpreters' perceptions of the importance of idiomatic language, then assessed their ability to identify phrases that were literal, idiomatic, or both. Lastly, it looked at student interpreters' ability to correctly identify and explain idioms in short phrases and dialogues. Findings showed that, after this exercise, students' awareness of the difference between literal and non-literal language increased; however, their ability to correctly identify it did not. Furthermore, their previous focus on 'specialized terminology' led them to believe that language other than this was hardly worth learning. The article concludes with recommendations for incorporating the findings of this research into interpreter education.