Linguistic unit discovery from multi-modal inputs in unwritten languages: Summary of the "Speaking Rosetta" JSALT 2017 Workshop

Author: Arthur Philip
Besacier Laurent
Black Alan
Ciannella Francesco
Du Mingxing
Dupoux Emmanuel
Godard Pierre
Hasegawa-Johnson Mark
Larsen Elin
Merkx Danny
Metze Florian
Mueller Markus
Neubig Graham
Ondel Lucas
Palaskar Shruti
Riad Rachid
Scharenborg Odette
Stueker Sebastian
Wang Liming
Publication date: 14/02/2018

We summarize the accomplishments of a multi-disciplinary workshop exploring the computational and scientific issues surrounding the discovery of linguistic units (subwords and words) in a language without orthography. We study the replacement of orthographic transcriptions by images and/or translated text in a well-resourced language to help unsupervised discovery from raw speech. Comment: Accepted to ICASSP 201

arXiv.org e-Print Archive

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

GEMv2: Multilingual NLG benchmarking in a single line of code

Author: Adewumi Tosin
Ammanamanch Pawan Sasanka
Bhagavatula Chandra
Bhattacharjee Abhik
Bohnet Bernd
Cahyawijaya Samuel
Cardenas Ronald
Chim Jenny
Clark Elizabeth
Clive Jordan
Creutz Mathias
Daheim Nico
Deutsch Daniel
Dhole Kaustubh
Durmus Esin
Dusek Ondrej
Garbacea Cristina
Gehrmann Sebastian
Ginter Filip
Gkatzia Dimitra
Hasan Tahmid
Hayashi Hiroaki
Hou Yufang
Jernite Yacine
Jin Di
Jolly Shailza
Juraska Juraj
Kamal Eddine Moussa
Kanerva Jenna
Kriz Reno
Ladhak Faisal
Liu Yixin
Madaan Aman
Mahamood Saad
Mahendiran Abinaya
Maynez Joshua
McMillan-Major Angelina
Mille Simon
Montella Sebastien
Nikolaev Vitaly
Novikova Jekaterina
Osei Salomey
Papangelis Alexandros
Perez-Beltrachini Laura
Pu Liang Paul
Puduppully Ratish
Pushkarna Mahima
Radev Dragomir
Raghavi Chandu Khyathi
Raheja Vipul
Raunak Vikas
Ribeiro Leonardo F. R.
Sang Yisi
Sanjay Kale Mihir
Sedoc João
Shahriyar Rifat
Shen Tianhao
Shvets Anna
Strobelt Hendrik
Subramani Nishant
Thomson Craig
Tsai Vivian
Tunstall Lewis
Upadhyay Ashish
Wang Alex
Wang Dakuo
White Michael
Wilie Bryan
Winata Genta Indra
Xiong Deyi
Xu Ying
Yao Bingsheng
You Chaobin
Zhang Li
Zhou Jiawei
Zhu Qi
Štajner Sanja
Publication venue: The Association for Computational Linguistics
Publication date: 01/01/2022

Evaluation in machine learning is usually informed by past choices, for example which datasets or metrics to use. This standardization enables the comparison on equal footing using leaderboards, but the evaluation choices become sub-optimal as better alternatives arise. This problem is especially pertinent in natural language generation which requires ever-improving suites of datasets, metrics, and human evaluation to make definitive claims. To make following best model evaluation practices easier, we introduce GEMv2. The new version of the Generation, Evaluation, and Metrics Benchmark introduces a modular infrastructure for dataset, model, and metric developers to benefit from each other's work. GEMv2 supports 40 documented datasets in 51 languages. Models for all datasets can be evaluated online and our interactive data card creation and rendering tools make it easier to add new datasets to the living benchmark. Peer reviewed

Aberdeen University Research

Biblio at Institute of Formal and Applied Linguistics

Helsingin yliopiston digitaalinen arkisto

GEMv2: multilingual NLG benchmarking in a single line of code.

Author: Bhattacharjee Abhik
Gehrmann Sebastian
Mahendiran Abinaya
Upadhyay Ashish
Publication venue: ACL Association for Computational Linguistics
Publication date: 11/12/2022

Evaluations in machine learning rarely use the latest metrics, datasets, or human evaluation in favor of remaining compatible with prior work. The compatibility, often facilitated through leaderboards, thus leads to outdated but standardized evaluation practices. We pose that the standardization is taking place in the wrong spot. Evaluation infrastructure should enable researchers to use the latest methods and what should be standardized instead is how to incorporate these new evaluation advances. We introduce GEMv2, the new version of the Generation, Evaluation, and Metrics Benchmark which uses a modular infrastructure for dataset, model, and metric developers to benefit from each other's work. GEMv2 supports 40 documented datasets in 51 languages, ongoing online evaluation for all datasets, and our interactive tools make it easier to add new datasets to the living benchmark.

Open Access Institutional Repository at Robert Gordon University

From Word to Sense Embeddings: A Survey on Vector Representations of Meaning

Author: Camacho-Collados Jose
Pilehvar Mohammad Taher
Publication date: 26/10/2018

Over the past years, distributed semantic representations have proved to be effective and flexible keepers of prior knowledge to be integrated into downstream applications. This survey focuses on the representation of meaning. We start from the theoretical background behind word vector space models and highlight one of their major limitations: the meaning conflation deficiency, which arises from representing a word with all its possible meanings as a single vector. Then, we explain how this deficiency can be addressed through a transition from the word level to the more fine-grained level of word senses (in its broader acceptation) as a method for modelling unambiguous lexical meaning. We present a comprehensive overview of the wide range of techniques in the two main branches of sense representation, i.e., unsupervised and knowledge-based. Finally, this survey covers the main evaluation procedures and applications for this type of representation, and provides an analysis of four of its important aspects: interpretability, sense granularity, adaptability to different domains and compositionality. Comment: 46 pages, 8 figures. Published in Journal of Artificial Intelligence Research

arXiv.org e-Print Archive

Online Research @ Cardiff

European Values Study 2008: project and data management

Author: Brislinger Evelyn
Harzenetter Karoline
Hauser Kristina
Kampmann Jara
Kurti Dafina
Luijkx Ruud
Nijs Bik Emile de
Ortmanns Verena
Rokven Josja
Sieben Inge
Solanes Ros Ivet
Stam Kirsten
Vlimmeren Eva van
Weijer Steve van de
Zenk-Möltgen Wolfgang
Publication venue: Köln
Publication date: 20/03/2012

SSOAR - Social Science Open Access Repository

Linguistic linked open data and under-resourced languages: from collection to application

Author: Chiarcos Christian
Moran Steven
Publication venue: MIT Press - Journals
Publication date: 27/04/2023

OPUS Augsburg