Detecting and ordering adjectival scalemates
This paper presents a pattern-based method for inferring adjectival
scales, i.e. sets of adjectives ordered by intensity, from a corpus. Specifically,
the proposed method uses lexical patterns to automatically identify and order
pairs of scalemates, followed by a filtering phase in which unrelated pairs are
discarded. For the filtering phase, several different similarity measures are
implemented and compared. The model presented in this paper is evaluated using
the current standard, along with a novel evaluation set, and shown to be at
least as good as the current state-of-the-art.

Comment: Paper presented at MAPLEX 2015, February 9-10, Yamagata, Japan (http://lang.cs.tut.ac.jp/maplex2015/)
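To make the two-phase approach concrete, here is a minimal Python sketch. The specific lexical patterns, thresholds, and the pluggable similarity function are illustrative assumptions, not the paper's actual implementation.

    import re
    from collections import Counter

    # Hypothetical lexical patterns suggesting a scalar ordering between two
    # adjectives: in "good, but not great", "great" is the stronger term.
    PATTERNS = [
        re.compile(r"\b(\w+), but not (\w+)\b"),
        re.compile(r"\b(\w+), if not (\w+)\b"),
        re.compile(r"\bnot just (\w+), but (\w+)\b"),
    ]

    def extract_ordered_pairs(sentences):
        """Phase 1: collect candidate (weaker, stronger) pairs from patterns."""
        counts = Counter()
        for sentence in sentences:
            for pattern in PATTERNS:
                for weaker, stronger in pattern.findall(sentence.lower()):
                    counts[(weaker, stronger)] += 1
        return counts

    def filter_pairs(counts, similarity, threshold=0.3, min_count=2):
        """Phase 2: discard pairs that match a pattern but are not similar
        enough to plausibly lie on the same scale."""
        return {pair: n for pair, n in counts.items()
                if n >= min_count and similarity(*pair) >= threshold}

The similarity function is deliberately left as a parameter, since the paper implements and compares several measures; cosine similarity over distributional vectors would be one plausible instantiation.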
On the use of human reference data for evaluating automatic image descriptions
Automatic image description systems are commonly trained and evaluated using
crowdsourced, human-generated image descriptions. The best-performing system is
then determined using some measure of similarity to the reference data (BLEU,
Meteor, CIDEr, etc.). Thus, both the quality of the systems and the
quality of the evaluation depend on the quality of the descriptions. As
Section 2 will show, the quality of current image description datasets is
insufficient. I argue that there is a need for more detailed guidelines that
take into account not only the needs of visually impaired users but also the
feasibility of generating suitable descriptions. With high-quality data,
evaluation of image description systems could use reference descriptions, but
we should also look for alternatives.

Comment: Originally presented as a (non-archival) poster at the VizWiz 2020 workshop, co-located with CVPR 2020. See: https://vizwiz.org/workshops/2020-workshop
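As an illustration of reference-based scoring, the sketch below evaluates one candidate description against crowdsourced references with NLTK's BLEU implementation; the example sentences and smoothing choice are assumptions, and Meteor or CIDEr would slot into the same workflow.

    from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

    # Hypothetical example: one system-generated description scored against
    # two crowdsourced reference descriptions of the same image.
    references = [
        "a man riding a bicycle down a city street".split(),
        "a cyclist passes shops on a busy street".split(),
    ]
    candidate = "a man rides a bike down the street".split()

    # Smoothing avoids zero scores when higher-order n-grams never match,
    # which is common for short texts such as image descriptions.
    smooth = SmoothingFunction().method1
    score = sentence_bleu(references, candidate, smoothing_function=smooth)
    print(f"BLEU: {score:.3f}")

Note that such a score is only as trustworthy as the references it is computed against, which is the point the abstract argues.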
Evaluating NLG systems: A brief introduction
This year the International Conference on Natural Language Generation (INLG)
will feature an award for the paper with the best evaluation. The purpose of
this award is to provide an incentive for NLG researchers to pay more attention
to the way they assess the output of their systems. This essay provides a short
introduction to evaluation in NLG, explaining key terms and distinctions.

Comment: To be published on the INLG2023 conference website.
Cross-linguistic differences and similarities in image descriptions
Automatic image description systems are commonly trained and evaluated on
large image description datasets. Recently, researchers have started to collect
such datasets for languages other than English. An unexplored question is how
different these datasets are from English and, if there are any differences,
what causes them to differ. This paper provides a cross-linguistic comparison
of Dutch, English, and German image descriptions. We find that these
descriptions are similar in many respects, but the familiarity of crowd workers
with the subjects of the images has a noticeable influence on description
specificity.

Comment: Accepted for INLG 2017, Santiago de Compostela, Spain, 4-7 September 2017. Camera-ready version. See the ACL Anthology for full bibliographic information.
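The abstract does not say how description specificity is operationalised; as a rough illustration only, the sketch below computes two crude proxies (mean description length and type-token ratio) that could be compared across the Dutch, English, and German datasets.

    from statistics import mean

    def specificity_proxies(descriptions):
        """Two crude specificity proxies: mean token count and type-token
        ratio. (Assumed for illustration; the paper's measures may differ.)"""
        tokenized = [d.lower().split() for d in descriptions]
        tokens = [w for t in tokenized for w in t]
        return {
            "mean_length": mean(len(t) for t in tokenized),
            "type_token_ratio": len(set(tokens)) / len(tokens),
        }

    # Toy per-language comparison over a handful of descriptions.
    corpora = {
        "en": ["a dog runs across a field", "a brown dog is running"],
        "de": ["ein Hund rennt über ein Feld", "ein brauner Hund rennt"],
    }
    for lang, descriptions in corpora.items():
        print(lang, specificity_proxies(descriptions))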
101 things to do: unravelling and interpreting community policing
There is a lively and long-running debate in the literature about what community policing is and how it works in everyday practice. We contribute to this expanding body of knowledge by minutely sifting and classifying the things neighbourhood coordinators (a kind of community officer) do in Amsterdam, the Netherlands. Our endeavours have resulted in a list of 101 tasks they perform. A ranking of tasks was printed on small plasticized cards, enabling neighbourhood coordinators and their managers to identify core and peripheral tasks. Core tasks include keeping contact with citizens, handling local safety issues (supervising the neighbourhood, signalling small problems, handling accidents and incidents, and mediating conflicts), administrative duties, and providing the police team with information. Peripheral tasks mostly take the shape of supportive (managerial) work. In addition, we interviewed neighbourhood coordinators and police ward managers to gain their views on community policing.
Preregistering NLP Research
Preregistration refers to the practice of specifying what you are going to
do, and what you expect to find in your study, before carrying out the study.
This practice is increasingly common in medicine and psychology, but is rarely
discussed in NLP. This paper discusses preregistration in more detail, explores
how NLP researchers could preregister their work, and presents several
preregistration questions for different kinds of studies. Finally, we argue in
favour of registered reports, which could provide firmer grounds for slow
science in NLP research. The goal of this paper is to elicit a discussion in
the NLP community, which we hope to synthesise into a general NLP
preregistration form in future research.

Comment: Accepted at NAACL2021; pre-final draft, comments welcome.
Talking about other people: An endless range of possibilities
Image description datasets, such as Flickr30K and MS COCO, show a high degree of variation in the ways that crowd workers talk about the world. Although this gives us a rich and diverse collection of data to work with, it also introduces uncertainty about how the world should be described. This paper shows the extent of this uncertainty in the PEOPLE-domain. We present a taxonomy of different ways to talk about other people. This taxonomy serves as a reference point to think about how other people should be described, and can be used to classify and compute statistics about labels applied to people.
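As a sketch of how such a taxonomy could be used to classify labels and compute statistics over them, consider the toy example below; the categories and labels are invented for illustration and do not reproduce the paper's actual taxonomy.

    from collections import Counter

    # Illustrative taxonomy only: the paper's real categories are not given
    # in the abstract, so these groups are assumptions.
    TAXONOMY = {
        "age": {"boy", "girl", "child", "elderly woman"},
        "occupation": {"chef", "doctor", "police officer"},
        "activity": {"runner", "cyclist", "swimmer"},
    }

    def classify(label):
        """Map a people-label to its taxonomy category, or 'other'."""
        for category, labels in TAXONOMY.items():
            if label in labels:
                return category
        return "other"

    # Compute label statistics over a toy set of annotations.
    annotations = ["boy", "chef", "cyclist", "stranger", "girl"]
    print(Counter(classify(label) for label in annotations))
    # Counter({'age': 2, 'occupation': 1, 'activity': 1, 'other': 1})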