4 research outputs found

    Precision and mathematical form in first and subsequent mentions of numerical facts and their relation to document structure

    In a corpus study we found that authors vary both mathematical form and precision when expressing numerical quantities. Indeed, within the same document, a quantity is often described vaguely in some places and more accurately in others. Vague descriptions tend to occur early in a document and to be expressed in simpler mathematical forms (e.g., fractions or ratios), whereas more accurate descriptions of the same proportions tend to occur later, often expressed in more complex forms (e.g., decimal percentages). Our results can be used in Natural Language Generation (1) to generate repeat descriptions within the same document, and (2) to generate descriptions of numerical quantities for different audiences according to mathematical ability.

    A fact-aligned corpus of numerical expressions

    We describe a corpus of numerical expressions, developed as part of the NUMGEN project. The corpus contains newspaper articles and scientific papers in which exactly the same numerical facts are presented many times (both within and across texts). Some annotations of numerical facts are original: for example, numbers are automatically classified as round or non-round by an algorithm derived from Jansen and Pollmann (2001); also, numerical hedges such as 'about' or 'a little under' are marked up and classified semantically using arithmetical relations. Through explicit alignment of phrases describing the same fact, the corpus can support research on the influence of various contextual factors (e.g., document position, intended readership) on the way in which numerical facts are expressed. As an example we present results from an investigation showing that when a fact is mentioned more than once in a text, there is a clear tendency for precision to increase from first to subsequent mentions, and for mathematical level either to remain constant or to increase.
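
The two annotation ideas in this abstract (round versus non-round classification, and hedges read as arithmetical relations) can be illustrated with a minimal sketch. This is not the corpus's actual algorithm, which the abstract does not specify. The roundness rule below (a number counts as round if it is 1, 2, 2.5 or 5 times a power of ten), the relation names, the 10% tolerance, and the function names `is_round` and `hedge_holds` are all assumptions made for illustration.

```python
from fractions import Fraction

# ASSUMPTION: illustrative roundness rule, not the corpus's real algorithm.
# A positive number is treated as round if it equals k * 10^n for some
# integer n and some k in this hypothetical set of multipliers.
ROUND_MULTIPLIERS = (Fraction(1), Fraction(2), Fraction(5, 2), Fraction(5))


def is_round(value: str) -> bool:
    """Classify a numeral (given as a string, e.g. "50" or "3.27")."""
    x = Fraction(value)  # exact arithmetic avoids floating-point error
    if x <= 0:
        return False
    for k in ROUND_MULTIPLIERS:
        q = x / k
        # Scale q into the interval [1, 10); it is a power of ten iff it lands on 1.
        while q >= 10:
            q /= 10
        while q < 1:
            q *= 10
        if q == 1:
            return True
    return False


# The two hedges come from the abstract; the relation labels are hypothetical.
HEDGE_RELATIONS = {
    "about": "approximately_equal",
    "a little under": "slightly_less_than",
}


def hedge_holds(hedge: str, stated: float, actual: float,
                tolerance: float = 0.1) -> bool:
    """Check whether the actual value is compatible with 'hedge + stated'."""
    relation = HEDGE_RELATIONS[hedge]
    # ASSUMPTION: "close" means within tolerance * |stated| of the stated value.
    near = abs(actual - stated) <= tolerance * abs(stated)
    if relation == "approximately_equal":
        return near
    if relation == "slightly_less_than":
        return near and actual < stated
    raise ValueError(f"unknown relation: {relation}")
```

Under these assumptions, `is_round("50")` and `is_round("0.25")` are true while `is_round("37")` is false, and an actual value of 48 is compatible with both "about 50" and "a little under 50". A different choice of multipliers or tolerance would change these classifications.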

    The preference for approximation

    A curious fact about the communication of numerical information is that speakers often choose to use approximate or rounded expressions, even when more precise information is available (for instance, reporting the time as ‘three thirty’ when one’s watch reads 3:27). It has been proposed that this tendency towards rounding is driven by a desire to reduce hearers’ processing costs, a specific claim being that rounded values produce the same cognitive effect at less cognitive effort than non-round values (Van der Henst, Carles and Sperber, 2002). To date, however, the posited processing advantage for roundness has not been experimentally substantiated. Focusing on the domain of temporal expressions, we report on two experiments that demonstrate that rounded clock times are easier to remember and manipulate than their non-round counterparts, a finding that provides evidence for the influence of processing considerations on numerical expression choice. We further find a role for domain-specific granularity of measurement.

    Proceedings of the 12th European Workshop on Natural Language Generation (ENLG 2009)
