Rethinking the Agreement in Human Evaluation Tasks

Authors: Amidei Jacopo, Piwek Paul, Willis Alistair
Publication date: 01/01/2018

Human evaluations are broadly thought to be more valuable the higher the inter-annotator agreement. In this paper we examine this idea. We describe our experiments and analysis within the area of Automatic Question Generation. Our experiments show how annotators diverge in language annotation tasks due to a range of ineliminable factors. For this reason, we believe that annotation schemes for natural language generation tasks that are aimed at evaluating language quality need to be treated with great care. In particular, an unchecked focus on reducing disagreement among annotators runs the danger of creating generation goals that reward output that is more distant from, rather than closer to, natural human-like language. We conclude the paper by suggesting a new approach to the use of agreement metrics in natural language generation evaluation tasks.

Open Research Online (The Open University)

What does validation of cases in electronic record databases mean? The potential contribution of free text

Authors: Cassell Jackie A, Koeling Rob, Nicholson Amanda, Tate Anne Rosemary
Publication venue: Wiley
Publication date: 01/03/2011

Electronic health records are increasingly used for research. The definition of cases or endpoints often relies on the use of coded diagnostic data, using a pre-selected group of codes. Validation of these cases, as ‘true’ cases of the disease, is crucial. There are, however, ambiguities in what is meant by validation in the context of electronic records. Validation usually implies comparison of a definition against a gold standard of diagnosis and the ability to identify false negatives (‘true’ cases which were not detected) as well as false positives (detected cases which did not have the condition). We argue that two separate concepts of validation are often conflated in existing studies: firstly, whether the GP thought the patient was suffering from a particular condition (which we term confirmation or internal validation), and secondly, whether the patient really had the condition (external validation). Few studies have the ability to detect false negatives who have not received a diagnostic code. Natural language processing is likely to open up the use of free text within the electronic record, which will facilitate both the validation of the coded diagnosis and searching for false negatives.

Crossref

PubMed Central

Lancaster E-Prints

Sussex Research Online

Comparing automatically detected reflective texts with human judgements

Authors: Scott Peter, Ullmann Thomas Daniel, Wild Fridolin
Publication date: 01/01/2012

This paper reports on the descriptive results of an experiment comparing automatically detected reflective and not-reflective texts against human judgements. Based on the theory of reflective writing assessment and its operationalisation, five elements of reflection were defined. For each element of reflection a set of indicators was developed, which automatically annotate texts regarding reflection based on the parameterisation with authoritative texts. Using a large blog corpus, 149 texts were retrieved, which were annotated as either reflective or not-reflective. An online survey was then used to gather human judgements for these texts. These two data sets were used to compare the quality of the reflection detection algorithm with human judgements. The analysis indicates the expected difference between reflective and not-reflective texts.

OPUS - University of Technology Sydney

Open Research Online (The Open University)