
    Beyond DICE : measuring the quality of a referring expression

    This paper discusses ways in which the similarity between the contents of two referring expressions can be measured. Similarity metrics of this kind are essential when expressions generated by an algorithm are compared against the ones produced by human speakers, for example as part of an experiment in which referring expressions are elicited. We discuss arguments for and against different metrics, taking our departure from the well-known Dice metric.
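The Dice metric mentioned in this abstract compares two sets, here the attribute sets of a generated and a human-produced referring expression. A minimal sketch (the function name and the example attribute sets are illustrative, not taken from the paper):

```python
def dice(a, b):
    """Dice coefficient between two sets: 2*|A ∩ B| / (|A| + |B|)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty descriptions count as identical
    return 2 * len(a & b) / (len(a) + len(b))

# Hypothetical example: a system's attributes vs. a human's attributes
# for the same referent.
system = {"dog", "black", "large"}
human = {"dog", "black", "small"}
print(dice(system, human))  # 2*2 / (3+3) = 0.666...
```

Scores range from 0 (no shared attributes) to 1 (identical attribute sets), which is what makes Dice a natural starting point for comparing system output against elicited human expressions.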

    Semantic similarity and the generation of referring expressions : a first report

    The past decade has witnessed renewed interest in the Generation of Referring Expressions (GRE) [23, 24, 8, 9, 10, 12, 22]. Broadening the scope beyond earlier work [3, 4, 5], recent proposals involve algorithms that refer to sets as well as individuals, using operations such as set union (‘the cat and the dogs’) and complementation (‘the dog that is not black’). As a consequence, it has become more difficult for a generator to choose among alternative expressions that may be coextensive. This paper is part of a concerted effort to shed some empirical light on the question of expressive choice. The focus is on reference to sets, where a referring expression is built by unifying two or more singletons. Starting with descriptions of the form ‘the N1 and (the) N2’, we investigate whether the semantic similarity of N1 and N2 is relevant in determining the acceptability of the generated NP. Suppose that, in a given domain, an entity e1 can be referred to as either ‘the postgraduate’ or ‘the psychologist’; similarly, e2 can be referred to as either ‘the undergraduate’ or ‘the man on the first floor’. Various alternatives exist for an expression referring to {e1, e2}, e.g.: (i) ‘the postgraduate and the man on the first floor’, (ii) ‘the postgraduate and the undergraduate’, (iii) ‘the psychologist and the undergraduate’. Here, (ii) is arguably better than (i) or (iii). Intuitively, this is because the conjuncts in (ii) are more semantically similar or ‘related’. Moreover, expression (iii) violates the Gricean maxims. The choice of two equally specific [2] but semantically unrelated descriptors, ‘psychologist’ for e1 versus ‘undergraduate’ for e2, might give rise to (false) implicatures, such as that the two entities have nothing in common, thus violating the Gricean Cooperative Principle, and resulting in a description which is less coherent than it might be.
Suppose further that e1 and e2, as well as a third entity e3 referred to as ‘the book’, were introduced in a discourse. Subsequent reference to a pair of these entities might be made via a coordinate construction, or some other structure. Considerations of semantic similarity may guide the choice between alternatives; in particular, referring to the set {e1, e2} using an NP conjunction is more felicitous than a similar reference to {e1, e3} (‘the psychologist and the book’). In the latter case, it may be more felicitous to refer to these two entities using different phrases. A third consideration has to do with a user’s comprehension of a generated text. If a description gave rise to false implicatures, or simply sounded odd as a result of an infelicitous choice of descriptors, the quality of the text and its comprehensibility would be reduced. We next describe a correlational study which investigated the relationship of semantic similarity and perceived acceptability of conjoined NPs. Our study is closely related in spirit to [13], which also evinces a concern with semantic plausibility and its implications for NLG, albeit in a different domain.

    Intrinsic vs. extrinsic evaluation measures for referring expression generation

    In this paper we present research in which we apply (i) the kind of intrinsic evaluation metrics that are characteristic of current comparative HLT evaluation, and (ii) extrinsic, human task-performance evaluations more in keeping with NLG traditions, to 15 systems implementing a language generation task. We analyse the evaluation results and find that there are no significant correlations between intrinsic and extrinsic evaluation measures for this task.

    Reference and the facilitation of search in spatial domains

    This is a pre-final version of the article, whose official publication is expected in the winter of 2013-14.

    Towards a balanced corpus of multimodal referring expressions in dialogue

    This paper describes an experiment in which dialogues are elicited through an identification task. Currently we are transcribing the collected data. The primary purpose of the experiment is to test a number of hypotheses regarding both the production and perception of multimodal referring expressions. To achieve this, the experiment was designed such that a number of factors (prior reference, focus of attention, visual attributes and cardinality) were systematically manipulated. We anticipate that the results of the experiment will yield information that can inform the construction of algorithms for the automatic generation of natural and easy-to-understand referring expressions. Moreover, the balanced corpus of multimodal referring expressions that was collected will hopefully become a resource for answering further, as yet unanticipated, questions on the nature of multimodal referring expressions.