Search CORE

41 research outputs found

Impact of Locus of Control on Depression of Elderly In Punjab, Pakistan: The Moderating Role of Religiosity; An empirical evidence

Author: Khurram Fatima
Mahamood Yahaya
Mat Saad Zarina
Publication date: 11/07/2017

The current research examined the impact of locus of control on depression among the elderly in Punjab, Pakistan, and the role of religiosity as a moderator. The sample consisted of 800 older individuals in Pakistan, divided equally into 400 males and 400 females. The participants were drawn from different areas of Punjab, Pakistan. Depression was measured with the short form of the Geriatric Depression Scale, and locus of control with the Levenson Multidimensional Locus of Control Scale. The findings indicate a positive relation between the level of locus of control and depression in the elderly. In contrast, when religiosity is introduced as a moderator, the positive relation becomes negative, which means that religiosity has a negative impact on depression: the more religious a person is, the lower his or her chance of developing depression in later age.

UUM Repository

Automatic Construction of Evaluation Suites for Natural Language Generation Datasets

Author: Dhole Kaustubh
Gangal Varun Prashant
Gehrmann Sebastian
Kale Mihir
Mahamood Saad
Mille Simon
Miltenburg Emiel van
Perez-Beltrachini Laura
Publication date: 01/01/2021

Tilburg University Repository

Automatic Construction of Evaluation Suites for Natural Language Generation Datasets

Author: Dhole Kaustubh D.
Gangal Varun
Gehrmann Sebastian
Kale Mihir
Mahamood Saad
Mille Simon
Perez-Beltrachini Laura
van Miltenburg Emiel
Publication date: 01/01/2021

Machine learning approaches applied to NLP are often evaluated by summarizing their performance in a single number, for example accuracy. Since most test sets are constructed as an i.i.d. sample from the overall data, this approach overly simplifies the complexity of language and encourages overfitting to the head of the data distribution. As such, rare language phenomena or text about underrepresented groups are not equally included in the evaluation. To encourage more in-depth model analyses, researchers have proposed the use of multiple test sets, also called challenge sets, that assess specific capabilities of a model. In this paper, we develop a framework based on this idea which is able to generate controlled perturbations and identify subsets in text-to-scalar, text-to-text, or data-to-text settings. By applying this framework to the GEM generation benchmark, we propose an evaluation suite made of 80 challenge sets, demonstrate the kinds of analyses that it enables and shed light onto the limits of current generation models

arXiv.org e-Print Archive

Tilburg University Repository

Barriers and enabling factors for error analysis in NLG research

Author: Clinciu Miruna
Dušek Ondřej
Gkatzia Dimitra
Inglis Stephanie
Leppänen Leo
Mahamood Saad
Schoch Stephanie
Thomson Craig
Van Miltenburg Emiel
Wen Luou
Publication venue: Linkoping University Electronic Press
Publication date: 21/02/2023

Peer reviewed

Publisher PD

Aberdeen University Research

Underreporting of errors in NLG output, and what to do about it

Author: Clinciu Miruna
Dušek Ondřej
Gkatzia Dimitra
Inglis Stephanie
Leppänen Leo
Mahamood Saad
Manning Emma
Schoch Stephanie
Thomson Craig
van Miltenburg Emiel
Wen Luou
Publication venue: The Association for Computational Linguistics
Publication date: 01/08/2021

We observe a severe under-reporting of the different kinds of errors that Natural Language Generation systems make. This is a problem, because mistakes are an important indicator of where systems should still be improved. If authors only report overall performance metrics, the research community is left in the dark about the specific weaknesses that are exhibited by 'state-of-the-art' research. Next to quantifying the extent of error under-reporting, this position paper provides recommendations for error identification, analysis and reporting.

Peer reviewed

Aberdeen University Research

Helsingin yliopiston digitaalinen arkisto

Tilburg University Repository

Missing Information, Unresponsive Authors, Experimental Flaws: The Impossibility of Assessing the Reproducibility of Previous Human Evaluations in NLP

Author: Abercrombie Gavin
Alonso-Moral Jose M.
Arvan Mohammad
Belz Anya
Braggaar Anouck
Cieliebak Mark
Clark Elizabeth
Dinkar Tanvi
Dušek Ondřej
Eger Steffen
Fang Qixiang
Gao Mingqi
Gatt Albert
Gkatzia Dimitra
González-Corbelle Javier
Hovy Dirk
Hürlimann Manuela
Ito Takumi
Kelleher John D.
Klubicka Filip
Krahmer Emiel
Lai Huiyuan
Li Yiru
Mahamood Saad
Mieskes Margot
Mosteiro Pablo
Nissim Malvina
Parde Natalie
Plátek Ondřej
Reiter Ehud
Rieser Verena
Ruan Jie
Tetreault Joel
Thomson Craig
Toral Antonio
van Deemter Kees
van der Lee Chris
van Miltenburg Emiel
Wan Xiaojun
Wanner Leo
Watson Lewis
Yang Diyi
Publication venue: Association for Computational Linguistics
Publication date: 01/01/2023

Publisher PD

Aberdeen University Research

Archivio istituzionale della Ricerca - Bocconi

Tilburg University Repository

GEMv2: Multilingual NLG benchmarking in a single line of code

Author: Adewumi Tosin
Ammanamanch Pawan Sasanka
Bhagavatula Chandra
Bhattacharjee Abhik
Bohnet Bernd
Cahyawijaya Samuel
Cardenas Ronald
Chim Jenny
Clark Elizabeth
Clive Jordan
Creutz Mathias
Daheim Nico
Deutsch Daniel
Dhole Kaustubh
Durmus Esin
Dusek Ondrej
Garbacea Cristina
Gehrmann Sebastian
Ginter Filip
Gkatzia Dimitra
Hasan Tahmid
Hayashi Hiroaki
Hou Yufang
Jernite Yacine
Jin Di
Jolly Shailza
Juraska Juraj
Kamal Eddine Moussa
Kanerva Jenna
Kriz Reno
Ladhak Faisal
Liu Yixin
Madaan Aman
Mahamood Saad
Mahendiran Abinaya
Maynez Joshua
McMillan-Major Angelina
Mille Simon
Montella Sebastien
Nikolaev Vitaly
Novikova Jekaterina
Osei Salomey
Papangelis Alexandros
Perez-Beltrachini Laura
Pu Liang Paul
Puduppully Ratish
Pushkarna Mahima
Radev Dragomir
Raghavi Chandu Khyathi
Raheja Vipul
Raunak Vikas
Ribeiro Leonardo F. R.
Sang Yisi
Sanjay Kale Mihir
Sedoc João
Shahriyar Rifat
Shen Tianhao
Shvets Anna
Strobelt Hendrik
Subramani Nishant
Thomson Craig
Tsai Vivian
Tunstall Lewis
Upadhyay Ashish
Wang Alex
Wang Dakuo
White Michael
Wilie Bryan
Winata Genta Indra
Xiong Deyi
Xu Ying
Yao Bingsheng
You Chaobin
Zhang Li
Zhou Jiawei
Zhu Qi
Štajner Sanja
Publication venue: The Association for Computational Linguistics
Publication date: 01/01/2022

Evaluation in machine learning is usually informed by past choices, for example which datasets or metrics to use. This standardization enables the comparison on equal footing using leaderboards, but the evaluation choices become sub-optimal as better alternatives arise. This problem is especially pertinent in natural language generation, which requires ever-improving suites of datasets, metrics, and human evaluation to make definitive claims. To make following best model evaluation practices easier, we introduce GEMv2. The new version of the Generation, Evaluation, and Metrics Benchmark introduces a modular infrastructure for dataset, model, and metric developers to benefit from each other's work. GEMv2 supports 40 documented datasets in 51 languages. Models for all datasets can be evaluated online and our interactive data card creation and rendering tools make it easier to add new datasets to the living benchmark.

Peer reviewed

Helsingin yliopiston digitaalinen arkisto

GEMv2: Multilingual NLG benchmarking in a single line of code

Author: Adewumi Tosin
Ammanamanch Pawan Sasanka
Bhagavatula Chandra
Bhattacharjee Abhik
Bohnet Bernd
Cahyawijaya Samuel
Cardenas Ronald
Chim Jenny
Clark Elizabeth
Clive Jordan
Creutz Mathias
Daheim Nico
Deutsch Daniel
Dhole Kaustubh
Durmus Esin
Dusek Ondrej
Garbacea Cristina
Gehrmann Sebastian
Ginter Filip
Gkatzia Dimitra
Hasan Tahmid
Hayashi Hiroaki
Hou Yufang
Jernite Yacine
Jin Di
Jolly Shailza
Juraska Juraj
Kamal Eddine Moussa
Kanerva Jenna
Kriz Reno
Ladhak Faisal
Liu Yixin
Madaan Aman
Mahamood Saad
Mahendiran Abinaya
Maynez Joshua
McMillan-Major Angelina
Mille Simon
Montella Sebastien
Nikolaev Vitaly
Novikova Jekaterina
Osei Salomey
Papangelis Alexandros
Perez-Beltrachini Laura
Pu Liang Paul
Puduppully Ratish
Pushkarna Mahima
Radev Dragomir
Raghavi Chandu Khyathi
Raheja Vipul
Raunak Vikas
Ribeiro Leonardo F. R.
Sang Yisi
Sanjay Kale Mihir
Sedoc João
Shahriyar Rifat
Shen Tianhao
Shvets Anna
Strobelt Hendrik
Subramani Nishant
Thomson Craig
Tsai Vivian
Tunstall Lewis
Upadhyay Ashish
Wang Alex
Wang Dakuo
White Michael
Wilie Bryan
Winata Genta Indra
Xiong Deyi
Xu Ying
Yao Bingsheng
You Chaobin
Zhang Li
Zhou Jiawei
Zhu Qi
Štajner Sanja
Publication venue: The Association for Computational Linguistics
Publication date: 01/01/2022

Aberdeen University Research

Biblio at Institute of Formal and Applied Linguistics

Helsingin yliopiston digitaalinen arkisto

Missing Information, Unresponsive Authors, Experimental Flaws: The Impossibility of Assessing the Reproducibility of Previous Human Evaluations in NLP

We report our efforts in identifying a set of previous human evaluations in NLP that would be suitable for a coordinated study examining what makes human evaluations in NLP more/less reproducible. We present our results and findings, which include that just 13% of papers had (i) sufficiently low barriers to reproduction, and (ii) enough obtainable information, to be considered for reproduction, and that all but one of the experiments we selected for reproduction was discovered to have flaws that made the meaningfulness of conducting a reproduction questionable. As a result, we had to change our coordinated study design from a reproduce approach to a standardise-then-reproduce-twice approach. Our overall (negative) finding that the great majority of human evaluations in NLP is not repeatable and/or not reproducible and/or too flawed to justify reproduction, paints a dire picture, but presents an opportunity for a rethink about how to design and report human evaluations in NLP

Utrecht University Repository