Impact of Locus of Control on Depression of the Elderly in Punjab, Pakistan: The Moderating Role of Religiosity; Empirical Evidence
The current research examined the impact of locus of control on depression among the elderly in Punjab, Pakistan, and the role of religiosity as a moderator. The sample consisted of 800 older individuals in Pakistan, divided equally into 400 males and 400 females, drawn from different areas of Punjab. The short form of the Geriatric Depression Scale was used to measure depression, and the Levenson Multidimensional Locus of Control Scale was used to measure locus of control. The findings show a positive relation between locus of control and depression in the elderly. In contrast, when religiosity is introduced as a moderator, the positive relation becomes negative, meaning that religiosity has a negative impact on depression: the more religious a person is, the lower the chance of developing depression in later age.
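The moderation effect described here is conventionally tested with an interaction term in a regression model. Below is a minimal sketch in Python, using simulated stand-in data and hypothetical variable names (loc_score, religiosity, depression) rather than the study's actual dataset:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    # Simulated stand-in data; variable names and effect sizes are
    # illustrative, not taken from the study.
    rng = np.random.default_rng(0)
    n = 800
    loc_score = rng.normal(size=n)    # locus of control score
    religiosity = rng.normal(size=n)  # religiosity score
    # Depression rises with locus of control, but the slope flips
    # as religiosity increases (the moderation pattern reported above).
    depression = 0.5 * loc_score - 0.7 * loc_score * religiosity + rng.normal(size=n)
    df = pd.DataFrame({"loc_score": loc_score,
                       "religiosity": religiosity,
                       "depression": depression})

    # Moderated regression: the loc_score:religiosity interaction term
    # captures how religiosity changes the loc_score -> depression slope.
    model = smf.ols("depression ~ loc_score * religiosity", data=df).fit()
    print(model.summary())

A significant negative coefficient on the interaction term is what would correspond to the reported reversal of the relation.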
Automatic Construction of Evaluation Suites for Natural Language Generation Datasets
Machine learning approaches applied to NLP are often evaluated by summarizing
their performance in a single number, for example accuracy. Since most test
sets are constructed as an i.i.d. sample from the overall data, this approach
overly simplifies the complexity of language and encourages overfitting to the
head of the data distribution. As such, rare language phenomena or text about
underrepresented groups are not equally included in the evaluation. To
encourage more in-depth model analyses, researchers have proposed the use of
multiple test sets, also called challenge sets, that assess specific
capabilities of a model. In this paper, we develop a framework based on this
idea which is able to generate controlled perturbations and identify subsets in
text-to-scalar, text-to-text, or data-to-text settings. By applying this
framework to the GEM generation benchmark, we propose an evaluation suite made
of 80 challenge sets, demonstrate the kinds of analyses that it enables, and shed light on the limits of current generation models.
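To make the two operations concrete, here is a minimal sketch of controlled perturbations and subset identification on a toy text-to-text example set. The specific transform and subset criterion (lowercasing inputs, filtering long inputs) are hypothetical stand-ins, not the paper's actual transformations:

    from typing import Callable

    # A toy text-to-text evaluation set: (input, reference) pairs.
    test_set = [
        ("The cat sat on the mat.", "A cat is sitting on a mat."),
        ("Rare dialectal phrasing occurs here.", "Rare phrasing appears."),
    ]

    def perturb(examples, transform: Callable[[str], str]):
        """Controlled perturbation: apply a transform to every input,
        keeping references fixed so metric drops are attributable to it."""
        return [(transform(src), ref) for src, ref in examples]

    def subset(examples, predicate: Callable[[str], bool]):
        """Subset identification: select examples sharing a property,
        so per-subset scores expose weaknesses a single score hides."""
        return [(src, ref) for src, ref in examples if predicate(src)]

    # Hypothetical challenge sets derived from the original data.
    lowercased = perturb(test_set, str.lower)                     # robustness probe
    long_inputs = subset(test_set, lambda s: len(s.split()) > 4)  # tail slice

Scoring a model separately on each derived set, rather than on the i.i.d. test set alone, is what surfaces the rare phenomena the abstract refers to.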
Barriers and enabling factors for error analysis in NLG research
Underreporting of errors in NLG output, and what to do about it
We observe a severe under-reporting of the different kinds of errors that Natural Language Generation systems make. This is a problem, because mistakes are an important indicator of where systems should still be improved. If authors only report overall performance metrics, the research community is left in the dark about the specific weaknesses that are exhibited by 'state-of-the-art' research. In addition to quantifying the extent of error under-reporting, this position paper provides recommendations for error identification, analysis, and reporting.
GEMv2: Multilingual NLG benchmarking in a single line of code
Evaluation in machine learning is usually informed by past choices, for example which datasets or metrics to use. This standardization enables comparison on an equal footing using leaderboards, but the evaluation choices become sub-optimal as better alternatives arise. This problem is especially pertinent in natural language generation, which requires ever-improving suites of datasets, metrics, and human evaluation to make definitive claims. To make following best model-evaluation practices easier, we introduce GEMv2. The new version of the Generation, Evaluation, and Metrics Benchmark introduces a modular infrastructure for dataset, model, and metric developers to benefit from each other's work. GEMv2 supports 40 documented datasets in 51 languages. Models for all datasets can be evaluated online, and our interactive data card creation and rendering tools make it easier to add new datasets to the living benchmark.
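The abstract does not show the "single line of code" interface itself. As an assumed entry point only, GEM datasets are published on the Hugging Face Hub, so a sketch of loading one might look as follows; the dataset identifier, config, and field access below are illustrative, not GEMv2's documented API:

    # A minimal sketch, assuming GEM datasets are hosted on the
    # Hugging Face Hub under the GEM namespace.
    from datasets import load_dataset

    # Dataset id and config are illustrative examples.
    dataset = load_dataset("GEM/web_nlg", "en", split="validation")

    # Each example pairs an input with one or more target references.
    example = dataset[0]
    print(example.keys())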
Missing Information, Unresponsive Authors, Experimental Flaws: The Impossibility of Assessing the Reproducibility of Previous Human Evaluations in NLP
We report our efforts to identify a set of previous human evaluations in NLP that would be suitable for a coordinated study examining what makes human evaluations in NLP more or less reproducible. We present our results and findings, which include that just 13% of papers had (i) sufficiently low barriers to reproduction and (ii) enough obtainable information to be considered for reproduction, and that all but one of the experiments we selected for reproduction were discovered to have flaws that made the meaningfulness of conducting a reproduction questionable. As a result, we had to change our coordinated study design from a reproduce approach to a standardise-then-reproduce-twice approach. Our overall (negative) finding, that the great majority of human evaluations in NLP are not repeatable and/or not reproducible and/or too flawed to justify reproduction, paints a dire picture, but presents an opportunity to rethink how human evaluations in NLP are designed and reported.