4 research outputs found

GEMv2 : Multilingual NLG benchmarking in a single line of code

Author: Adewumi Tosin
Ammanamanch Pawan Sasanka
Bhagavatula Chandra
Bhattacharjee Abhik
Bohnet Bernd
Cahyawijaya Samuel
Cardenas Ronald
Chim Jenny
Clark Elizabeth
Clive Jordan
Creutz Mathias
Daheim Nico
Deutsch Daniel
Dhole Kaustubh
Durmus Esin
Dusek Ondrej
Garbacea Cristina
Gehrmann Sebastian
Ginter Filip
Gkatzia Dimitra
Hasan Tahmid
Hayashi Hiroaki
Hou Yufang
Jernite Yacine
Jin Di
Jolly Shailza
Juraska Juraj
Kamal Eddine Moussa
Kanerva Jenna
Kriz Reno
Ladhak Faisal
Liu Yixin
Madaan Aman
Mahamood Saad
Mahendiran Abinaya
Maynez Joshua
McMillan-Major Angelina
Mille Simon
Montella Sebastien
Nikolaev Vitaly
Novikova Jekaterina
Osei Salomey
Papangelis Alexandros
Perez-Beltrachini Laura
Pu Liang Paul
Puduppully Ratish
Pushkarna Mahima
Radev Dragomir
Raghavi Chandu Khyathi
Raheja Vipul
Raunak Vikas
Ribeiro Leonardo F. R.
Sang Yisi
Sanjay Kale Mihir
Sedoc João
Shahriyar Rifat
Shen Tianhao
Shvets Anna
Strobelt Hendrik
Subramani Nishant
Thomson Craig
Tsai Vivian
Tunstall Lewis
Upadhyay Ashish
Wang Alex
Wang Dakuo
White Michael
Wilie Bryan
Winata Genta Indra
Xiong Deyi
Xu Ying
Yao Bingsheng
You Chaobin
Zhang Li
Zhou Jiawei
Zhu Qi
Štajner Sanja
Publication venue: The Association for Computational Linguistics
Publication date: 01/01/2022

Evaluation in machine learning is usually informed by past choices, for example which datasets or metrics to use. This standardization enables the comparison on equal footing using leaderboards, but the evaluation choices become sub-optimal as better alternatives arise. This problem is especially pertinent in natural language generation which requires ever-improving suites of datasets, metrics, and human evaluation to make definitive claims. To make following best model evaluation practices easier, we introduce GEMv2. The new version of the Generation, Evaluation, and Metrics Benchmark introduces a modular infrastructure for dataset, model, and metric developers to benefit from each other's work. GEMv2 supports 40 documented datasets in 51 languages. Models for all datasets can be evaluated online and our interactive data card creation and rendering tools make it easier to add new datasets to the living benchmark. Peer reviewed.

Helsingin yliopiston digitaalinen arkisto

GEMv2 : Multilingual NLG benchmarking in a single line of code

Author: Adewumi Tosin
Ammanamanch Pawan Sasanka
Bhagavatula Chandra
Bhattacharjee Abhik
Bohnet Bernd
Cahyawijaya Samuel
Cardenas Ronald
Chim Jenny
Clark Elizabeth
Clive Jordan
Creutz Mathias
Daheim Nico
Deutsch Daniel
Dhole Kaustubh
Durmus Esin
Dusek Ondrej
Garbacea Cristina
Gehrmann Sebastian
Ginter Filip
Gkatzia Dimitra
Hasan Tahmid
Hayashi Hiroaki
Hou Yufang
Jernite Yacine
Jin Di
Jolly Shailza
Juraska Juraj
Kamal Eddine Moussa
Kanerva Jenna
Kriz Reno
Ladhak Faisal
Liu Yixin
Madaan Aman
Mahamood Saad
Mahendiran Abinaya
Maynez Joshua
McMillan-Major Angelina
Mille Simon
Montella Sebastien
Nikolaev Vitaly
Novikova Jekaterina
Osei Salomey
Papangelis Alexandros
Perez-Beltrachini Laura
Pu Liang Paul
Puduppully Ratish
Pushkarna Mahima
Radev Dragomir
Raghavi Chandu Khyathi
Raheja Vipul
Raunak Vikas
Ribeiro Leonardo F. R.
Sang Yisi
Sanjay Kale Mihir
Sedoc João
Shahriyar Rifat
Shen Tianhao
Shvets Anna
Strobelt Hendrik
Subramani Nishant
Thomson Craig
Tsai Vivian
Tunstall Lewis
Upadhyay Ashish
Wang Alex
Wang Dakuo
White Michael
Wilie Bryan
Winata Genta Indra
Xiong Deyi
Xu Ying
Yao Bingsheng
You Chaobin
Zhang Li
Zhou Jiawei
Zhu Qi
Štajner Sanja
Publication venue: The Association for Computational Linguistics
Publication date: 01/01/2022

Aberdeen University Research

Biblio at Institute of Formal and Applied Linguistics

Helsingin yliopiston digitaalinen arkisto

Producer-consumer Model Based Thread Pool Design

Author: Chaobin Wang
GulKSQ
Hua Peng
Jie Qiu
Liangzhou Wang
Minggang Liang
Richard W.
Shuang You
Ting Deng
Wenbin Cao
Xiangqian Lu
Yao Zhang
Publication venue: IOP Publishing

Crossref

X-RiSAWOZ: High-quality end-to-end multilingual dialogue datasets and few-shot agents

Author: Bali Kalika
Choudhury Monojit
de Chalendar Gaël
Goel Anmol
Kim Sungkyun
Kodali Prashant
Kumaraguru Ponnurangam
Lam Monica S.
Moradshahi Mehrad
Semmar Nasredine
Semnani Sina J.
Seo Jiwon
Seshadri Vivek
Shen Tianhao
Shrivastava Manish
Sun Michael
Xiong Deyi
Yadavalli Aditya
You Chaobin
Publication venue: HAL CCSD
Publication date: 09/07/2023

Task-oriented dialogue research has mainly focused on a few popular languages like English and Chinese, due to the high dataset creation cost for a new language. To reduce the cost, we apply manual editing to automatically translated data. We create a new multilingual benchmark, X-RiSAWOZ, by translating the Chinese RiSAWOZ to 4 languages: English, French, Hindi, Korean; and a code-mixed English-Hindi language. X-RiSAWOZ has more than 18,000 human-verified dialogue utterances for each language, and unlike most multilingual prior work, is an end-to-end dataset for building fully-functioning agents. The many difficulties we encountered in creating X-RiSAWOZ led us to develop a toolset to accelerate the post-editing of a new language dataset after translation. This toolset improves machine translation with a hybrid entity alignment technique that combines neural with dictionary-based methods, along with many automated and semi-automated validation checks. We establish strong baselines for X-RiSAWOZ by training dialogue agents in the zero- and few-shot settings where limited gold data is available in the target language. Our results suggest that our translation and post-editing methodology and toolset can be used to create new high-quality multilingual dialogue agents cost-effectively. Our dataset, code, and toolkit are released open-source.

HAL-CEA