
    Detecting and ordering adjectival scalemates

    This paper presents a pattern-based method for inferring adjectival scales from a corpus. Specifically, the proposed method uses lexical patterns to automatically identify and order pairs of scalemates, followed by a filtering phase in which unrelated pairs are discarded. For the filtering phase, several similarity measures are implemented and compared. The model presented in this paper is evaluated using the current standard, along with a novel evaluation set, and shown to be at least as good as the current state of the art.
    Comment: Paper presented at MAPLEX 2015, February 9-10, Yamagata, Japan (http://lang.cs.tut.ac.jp/maplex2015/)
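A minimal sketch of the kind of pattern-based scalemate extraction the abstract describes. The lexical patterns and the (weaker, stronger) ordering convention below are illustrative assumptions; the paper's actual pattern inventory is not given in this abstract.

```python
import re
from collections import Counter

# Hypothetical lexical patterns; "warm, but not hot" suggests hot > warm,
# and "good, if not great" suggests great > good.
PATTERNS = [
    re.compile(r"\b(\w+), but not (\w+)\b"),
    re.compile(r"\b(\w+), if not (\w+)\b"),
]

def extract_scalemates(corpus):
    """Collect ordered (weaker, stronger) adjective pairs with frequencies."""
    pairs = Counter()
    for sentence in corpus:
        for pattern in PATTERNS:
            for weak, strong in pattern.findall(sentence):
                pairs[(weak, strong)] += 1
    return pairs

corpus = [
    "The soup was warm, but not hot.",
    "Her performance was good, if not great.",
    "The soup was warm, but not hot, when it arrived.",
]
print(extract_scalemates(corpus))
```

The frequency counts would then feed a filtering phase (e.g. discarding pairs whose distributional similarity falls below a threshold), as the abstract outlines.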

    On the use of human reference data for evaluating automatic image descriptions

    Automatic image description systems are commonly trained and evaluated using crowdsourced, human-generated image descriptions. The best-performing system is then determined using some measure of similarity to the reference data (BLEU, Meteor, CIDEr, etc.). Thus, both the quality of the systems and the quality of the evaluation depend on the quality of the descriptions. As Section 2 will show, the quality of current image description datasets is insufficient. I argue that there is a need for more detailed guidelines that take into account the needs of visually impaired users, but also the feasibility of generating suitable descriptions. With high-quality data, evaluation of image description systems could use reference descriptions, but we should also look for alternatives.
    Comment: Originally presented as a (non-archival) poster at the VizWiz 2020 workshop, co-located with CVPR 2020. See: https://vizwiz.org/workshops/2020-workshop
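As an illustration of similarity-based evaluation against reference data, here is a simplified clipped n-gram precision, the core ingredient of BLEU. This is a sketch only, not the full metric (no brevity penalty, no geometric mean over n-gram orders):

```python
from collections import Counter

def ngram_precision(candidate, references, n=2):
    """Clipped n-gram precision of a candidate against multiple references.

    Each candidate n-gram counts at most as often as it appears in the
    reference that uses it most ("clipping"), so repeating a word cannot
    inflate the score.
    """
    cand_ngrams = Counter(tuple(candidate[i:i + n])
                          for i in range(len(candidate) - n + 1))
    if not cand_ngrams:
        return 0.0
    max_ref = Counter()
    for ref in references:
        ref_ngrams = Counter(tuple(ref[i:i + n])
                             for i in range(len(ref) - n + 1))
        for gram, count in ref_ngrams.items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram])
                  for gram, count in cand_ngrams.items())
    return clipped / sum(cand_ngrams.values())

cand = "a dog runs in the park".split()
refs = ["a dog is running in the park".split(),
        "the dog runs through a park".split()]
print(ngram_precision(cand, refs, n=1))  # every unigram is covered
```

The abstract's point is that however the similarity measure is defined, its verdict is only as trustworthy as the reference descriptions it compares against.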

    Evaluating NLG systems: A brief introduction

    This year the International Conference on Natural Language Generation (INLG) will feature an award for the paper with the best evaluation. The purpose of this award is to provide an incentive for NLG researchers to pay more attention to the way they assess the output of their systems. This essay provides a short introduction to evaluation in NLG, explaining key terms and distinctions.
    Comment: To be published on the INLG 2023 conference website

    Exploring and visualizing distributional models


    Cross-linguistic differences and similarities in image descriptions

    Automatic image description systems are commonly trained and evaluated on large image description datasets. Recently, researchers have started to collect such datasets for languages other than English. An unexplored question is how different these datasets are from English and, if there are any differences, what causes them to differ. This paper provides a cross-linguistic comparison of Dutch, English, and German image descriptions. We find that these descriptions are similar in many respects, but the familiarity of crowd workers with the subjects of the images has a noticeable influence on description specificity.
    Comment: Accepted for INLG 2017, Santiago de Compostela, Spain, 4-7 September, 2017. Camera-ready version. See the ACL anthology for full bibliographic information
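One crude proxy for comparing description specificity across languages is mean description length in tokens. The data below are toy examples for illustration only, not the paper's corpora, and real comparisons would need measures that are robust to cross-linguistic differences in tokenisation:

```python
# Toy per-language description sets (hypothetical, not from the paper).
datasets = {
    "english": ["a man riding a horse", "two dogs play in the snow"],
    "german": ["ein Mann reitet auf einem Pferd", "zwei Hunde spielen im Schnee"],
    "dutch": ["een man rijdt op een paard", "twee honden spelen in de sneeuw"],
}

def mean_length(descriptions):
    """Average number of whitespace-separated tokens per description."""
    return sum(len(d.split()) for d in descriptions) / len(descriptions)

for lang, descs in sorted(datasets.items()):
    print(lang, round(mean_length(descs), 1))
```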

    101 things to do: unravelling and interpreting community policing

    There is a lively and long-running debate in the literature about what community policing is and how it works in everyday practice. We contribute to this expanding body of knowledge by minutely sifting and classifying the things neighbourhood coordinators (a type of community officer) do in Amsterdam, the Netherlands. Our endeavours have resulted in a list of 101 tasks they perform. A ranking of tasks was printed on small plasticized cards, enabling neighbourhood coordinators and their managers to identify core and peripheral tasks. Core tasks include keeping contact with citizens, local safety issues (supervising the neighbourhood, signalling small problems, handling accidents and incidents, and conflict mediation), administrative duties, and providing the police team with information. Peripheral tasks mostly take the shape of supportive (managerial) work. In addition, we interviewed neighbourhood coordinators and police ward managers to gain their views on community policing.

    Preregistering NLP Research

    Preregistration refers to the practice of specifying what you are going to do, and what you expect to find in your study, before carrying out the study. This practice is increasingly common in medicine and psychology, but is rarely discussed in NLP. This paper discusses preregistration in more detail, explores how NLP researchers could preregister their work, and presents several preregistration questions for different kinds of studies. Finally, we argue in favour of registered reports, which could provide firmer grounds for slow science in NLP research. The goal of this paper is to elicit a discussion in the NLP community, which we hope to synthesise into a general NLP preregistration form in future research.
    Comment: Accepted at NAACL 2021; pre-final draft, comments welcome

    Talking about other people: An endless range of possibilities

    Image description datasets, such as Flickr30K and MS COCO, show a high degree of variation in the ways that crowd workers talk about the world. Although this gives us a rich and diverse collection of data to work with, it also introduces uncertainty about how the world should be described. This paper shows the extent of this uncertainty in the PEOPLE-domain. We present a taxonomy of different ways to talk about other people. This taxonomy serves as a reference point to think about how other people should be described, and can be used to classify and compute statistics about labels applied to people.
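Classifying labels against such a taxonomy can be sketched as a simple lookup followed by counting. The category names and label-to-category mapping below are hypothetical placeholders, not the paper's actual taxonomy:

```python
from collections import Counter

# Hypothetical, heavily simplified taxonomy of ways to refer to people.
TAXONOMY = {
    "woman": "gender", "man": "gender",
    "doctor": "occupation", "skier": "activity",
    "child": "age", "elderly": "age",
}

def label_statistics(labels):
    """Count how often each taxonomy category is used for a set of labels."""
    return Counter(TAXONOMY.get(label, "other") for label in labels)

print(label_statistics(["woman", "doctor", "child", "surfer"]))
```

In practice, labels unseen in the mapping (here bucketed as "other") are exactly where the uncertainty the abstract describes shows up.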