
    RankME: Reliable Human Ratings for Natural Language Generation

    Human evaluation for natural language generation (NLG) often suffers from inconsistent user ratings. While previous research tends to attribute this problem to individual user preferences, we show that the quality of human judgements can also be improved by experimental design. We present a novel rank-based magnitude estimation method (RankME), which combines the use of continuous scales and relative assessments. We show that RankME significantly improves the reliability and consistency of human ratings compared to traditional evaluation methods. In addition, we show that it is possible to evaluate NLG systems according to multiple, distinct criteria, which is important for error analysis. Finally, we demonstrate that RankME, in combination with Bayesian estimation of system quality, is a cost-effective alternative for ranking multiple NLG systems. Comment: Accepted to NAACL 2018 (The 2018 Conference of the North American Chapter of the Association for Computational Linguistics).

    English language knowledge of first-year university students on performance-based tests

    As stated in the guidelines of the Italian Ministry of Education (MIUR), the aims and objectives of the fifth-year foreign language curriculum of lyceums correspond to the B2 level of the Common European Framework of Reference for Languages (CEFR). At this level, students are expected to demonstrate an acceptable level of fluency in writing and speaking. This paper addresses the ability of first-year university students to apply their English language knowledge to authentic tasks, such as writing an enquiry email. The test designed and administered to gather data evaluates students' knowledge at B2 level by means of two tasks and holistic and analytic rating scales based on Bachman and Palmer's framework of language competence. A student questionnaire was administered alongside the test. The results reveal that 23% of the students who completed the test meet the requirements of the Ministry of Education.

    Measurement with Persons: A European Network

    The European ‘Measuring the Impossible’ Network MINET promotes new research activities in measurement dependent on human perception and/or interpretation. This includes the perceived attributes of products and services, such as quality or desirability, and societal parameters such as security and well-being. Work has aimed at consensus on four ‘generic’ metrological issues: (1) Measurement Concepts & Terminology; (2) Measurement Techniques; (3) Measurement Uncertainty; and (4) Decision-making & Impact Assessment. It has also examined how these can be applied specifically to the ‘Measurement of Persons’ in terms of ‘Man as a Measurement Instrument’ and ‘Measuring Man.’ The main achievements of MINET include a research repository with a glossary, a training course, a book, a series of workshops, think tanks and study visits. These have brought together a unique constellation of researchers from physics, metrology, physiology, psychophysics, psychology and sociology. Metrology (quality-assured measurement) in this area is relatively underdeveloped, despite great potential for innovation. It extends beyond traditional physiological metrology in that it also deals with measurement with all human senses as well as mental and behavioral processes. This is particularly relevant in applications where humans are an important component of critical systems, for instance where health and safety are at stake.

    The Classroom Observation Schedule to Measure Intentional Communication (COSMIC): An observational measure of the intentional communication of children with autism in an unstructured classroom setting

    The Classroom Observation Schedule to Measure Intentional Communication (COSMIC) was devised to provide ecologically valid outcome measures for a communication-focused intervention trial. Ninety-one children with autism spectrum disorder, with a mean age of 6 years 10 months (SD 16 months), were videoed during their everyday snack, teaching and free-play activities. Inter-rater reliability was high. Relevant items also showed significant associations with comparable items from concurrent Autism Diagnostic Observation Schedule – Generic (Lord et al., 2000) assessments. In a subsample of 28 children, initial differences in rates of initiations, initiated speech/vocalisation and commenting predicted language and communication competence 15 months later. The results suggest that observational measures of intentional communication in natural settings are a valuable assessment strategy for research and clinical practice.