89 research outputs found
Beyond DICE : measuring the quality of a referring expression
This paper discusses ways in which the similarity between the
contents of two referring expressions can be measured. Similarity
metrics of this kind are essential when expressions generated
by an algorithm are compared against the ones produced
by human speakers, for example as part of an experiment in
which referring expressions are elicited. We discuss arguments
for and against different metrics, taking our departure
from the well-known Dice metric.
peer-reviewed
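For readers unfamiliar with the Dice metric named in the abstract above, the following is a minimal sketch of the standard Dice coefficient applied to two descriptions treated as sets of attributes. The attribute strings, the function name `dice`, and the convention that two empty descriptions score 1.0 are assumptions made for illustration; they are not taken from the paper, which may represent descriptions differently.

```python
def dice(a, b):
    """Standard Dice coefficient between two collections, treated as sets:
    2 * |A intersect B| / (|A| + |B|)."""
    a, b = set(a), set(b)
    if not a and not b:
        # Assumed convention: two empty descriptions count as identical.
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical attribute sets for a human-produced and a generated description.
human = {"colour:red", "type:chair", "size:large"}
system = {"colour:red", "type:chair"}

print(dice(human, system))  # 2 * 2 / (3 + 2) = 0.8
```

The score is 1.0 when both descriptions use exactly the same attributes and 0.0 when they share none.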
Semantic similarity and the generation of referring expressions : a first report
The past decade has witnessed renewed interest in the Generation of Referring Expressions
(GRE) [23, 24, 8, 9, 10, 12, 22]. Broadening the scope beyond earlier work [3, 4, 5], recent
proposals involve algorithms that refer to sets as well as individuals, using operations such as
set union (“the cat and the dogs”) and complementation (“the dog that is not black”). As a
consequence, it has become more difficult for a generator to choose among alternative expressions
that may be coextensive. This paper is part of a concerted effort to shed some empirical
light on the question of expressive choice. The focus is on reference to sets, where a referring
expression is built by unifying two or more singletons. Starting with descriptions of the form
“the N1 and (the) N2”, we investigate whether the semantic similarity of N1 and N2 is relevant
in determining the acceptability of the generated NP.
Suppose that, in a given domain, an entity e1 can be referred to as either “the postgraduate”
or “the psychologist”; similarly, e2 can be referred to as either “the undergraduate” or “the man
on the first floor”. Various alternatives exist for an expression referring to {e1, e2}, e.g.: (i) “the
postgraduate and the man on the first floor”, (ii) “the postgraduate and the undergraduate”, (iii)
“the psychologist and the undergraduate”. Here, (ii) is arguably better than (i) or (iii). Intuitively,
this is because the conjuncts in (ii) are more semantically similar or “related”. Moreover,
expression (iii) violates the Gricean maxims. The choice of two equally specific [2] but semantically
unrelated descriptors, “psychologist” for e1 versus “undergraduate” for e2, might give rise
to (false) implicatures, such as that the two entities have nothing in common, thus violating
the Gricean Cooperative Principle, and resulting in a description which is less coherent than it
might be. Suppose further that e1 and e2, as well as a third entity e3 referred to as “the book”,
were introduced in a discourse. Subsequent reference to a pair of these entities might be made
via a coordinate construction, or some other structure. Considerations of semantic similarity
may guide the choice between alternatives; in particular, referring to the set {e1, e2} using an
NP conjunction is more felicitous than a similar reference to {e1, e3} (“the psychologist and the
book”). In the latter case, it may be more felicitous to refer to these two entities using different
phrases. A third consideration has to do with a user's comprehension of a generated text. If a
description gave rise to false implicatures, or simply sounded odd as a result of an infelicitous
choice of descriptors, the quality of the text and its comprehensibility would be reduced. We
next describe a correlational study which investigated the relationship between semantic similarity and
the perceived acceptability of conjoined NPs. Our study is closely related in spirit to [13], which also
evinces a concern with semantic plausibility and its implications for NLG, albeit in a different
domain.
peer-reviewed
Intrinsic vs. extrinsic evaluation measures for referring expression generation
In this paper we present research in which we apply (i) the kind of intrinsic evaluation metrics that are characteristic of current comparative HLT evaluation, and (ii) extrinsic, human task-performance evaluations more in keeping with NLG traditions, to 15 systems implementing a language generation task. We analyse the evaluation results and find that there are no significant correlations between intrinsic and extrinsic evaluation measures for this task.
peer-reviewed
Reference and the facilitation of search in spatial domains
This is a pre-final version of the article, whose official publication is expected in the winter of 2013-14.
Peer reviewed
Preprint
Towards a balanced corpus of multimodal referring expressions in dialogue
This paper describes an experiment in which dialogues are elicited through an identification task. Currently we are transcribing the collected data. The primary purpose of the experiment is to test a number of hypotheses regarding both the production and perception of multimodal referring expressions. To achieve this, the experiment was designed such that a number of factors (prior reference, focus of attention, visual attributes and cardinality) were systematically manipulated. We anticipate that the results of the experiment will yield information that can inform the construction of algorithms for the automatic generation of natural and easy-to-understand referring expressions. Moreover, we hope that the balanced corpus of multimodal referring expressions that was collected will become a resource for answering further, as yet unanticipated, questions on the nature of multimodal referring expressions.
peer-reviewed