
    Generating Weather Forecast Texts with Case Based Reasoning

    Several techniques have been used to generate weather forecast texts. In this paper, case-based reasoning (CBR) is proposed for weather forecast text generation because similar weather conditions recur over time and should have similar forecast texts. CBR-METEO, a system for generating weather forecast texts, was developed using a generic framework (jCOLIBRI) which provides modules for the standard components of the CBR architecture. The advantage of a CBR approach is that systems can be built in minimal time with far less human effort after initial consultation with experts. The approach depends heavily on the quality of the retrieval and revision components of the CBR process. We evaluated CBR-METEO with NIST, an automated metric which has been shown to correlate well with human judgements in this domain. The system shows comparable performance with other NLG systems that perform the same task.
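    As a rough illustration of the retrieval step described above, the sketch below ranks past cases by a weighted similarity over numeric weather attributes and reuses the most similar case's forecast text. The case attributes, weights and normalisation ranges are hypothetical assumptions; CBR-METEO itself is built on jCOLIBRI, and its actual case representation is not shown here.

```python
# Minimal sketch of the CBR "retrieve" step for forecast-text reuse.
# Case attributes, weights and ranges are illustrative, not CBR-METEO's actual schema.
from dataclasses import dataclass

@dataclass
class Case:
    temperature: float   # degrees C
    wind_speed: float    # knots
    cloud_cover: float   # fraction 0..1
    forecast_text: str   # text written for this past situation

def similarity(query: Case, case: Case, weights=(0.4, 0.3, 0.3)) -> float:
    """Weighted similarity over normalised attribute distances (hypothetical ranges)."""
    d_temp = abs(query.temperature - case.temperature) / 40.0
    d_wind = abs(query.wind_speed - case.wind_speed) / 60.0
    d_cloud = abs(query.cloud_cover - case.cloud_cover)
    distance = weights[0] * d_temp + weights[1] * d_wind + weights[2] * d_cloud
    return 1.0 - min(distance, 1.0)

def retrieve(query: Case, case_base: list[Case]) -> Case:
    """Return the most similar past case; its text would then be revised for the new situation."""
    return max(case_base, key=lambda c: similarity(query, c))

case_base = [
    Case(14.0, 10.0, 0.8, "Mostly cloudy with light winds."),
    Case(2.0, 25.0, 0.3, "Cold and windy with sunny spells."),
]
query = Case(13.0, 12.0, 0.9, forecast_text="")
print(retrieve(query, case_base).forecast_text)  # reuses the first case's text
```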

    Acquiring Correct Knowledge for Natural Language Generation

    Natural language generation (NLG) systems are computer software systems that produce texts in English and other human languages, often from non-linguistic input data. NLG systems, like most AI systems, need substantial amounts of knowledge. However, our experience in two NLG projects suggests that it is difficult to acquire correct knowledge for NLG systems; indeed, every knowledge acquisition (KA) technique we tried had significant problems. In general terms, these problems were due to the complexity, novelty, and poorly understood nature of the tasks our systems attempted, and were worsened by the fact that people write so differently. This meant in particular that corpus-based KA approaches suffered because it was impossible to assemble a sizable corpus of high-quality, consistent, manually written texts in our domains; and structured expert-oriented KA techniques suffered because experts disagreed and because we could not get enough information about special and unusual cases to build robust systems. We believe that such problems are likely to affect many other NLG systems as well. In the long term, we hope that new KA techniques may emerge to help NLG system builders. In the shorter term, we believe that understanding how individual KA techniques can fail, and using a mixture of different KA techniques with different strengths and weaknesses, can help developers acquire NLG knowledge that is mostly correct.

    Using spatial reference frames to generate grounded textual summaries of georeferenced data

    Summarising georeferenced data (data that can be identified according to its location) in natural language is challenging because it requires linking events describing its non-geographic attributes to their underlying geography. This mapping is not straightforward, as often the only explicit geographic information such data contains is latitude and longitude. In this paper we present an approach to generating textual summaries of georeferenced data based on spatial reference frames. This approach has been implemented in a data-to-text system we have deployed in the weather forecasting domain.
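    The abstract above describes grounding events to a spatial reference frame when the only explicit geography is latitude and longitude. The toy sketch below illustrates the general idea: assign points to named frame regions (here, simple bounding boxes) and verbalise the events per region. The regions, boxes and events are invented for the example and are not taken from the deployed system.

```python
# Toy illustration of grounding lat/lon points into a named spatial reference frame.
# Bounding boxes, region names and event data are invented for the example.

# A "frame" here is just a mapping from region name to a lat/lon bounding box.
FRAME = {
    "northern areas": (57.0, 61.0, -8.0, -1.0),   # (lat_min, lat_max, lon_min, lon_max)
    "southern areas": (50.0, 54.0, -6.0, 2.0),
}

def region_of(lat: float, lon: float) -> str:
    """Map a point to the first frame region whose box contains it."""
    for name, (lat0, lat1, lon0, lon1) in FRAME.items():
        if lat0 <= lat <= lat1 and lon0 <= lon <= lon1:
            return name
    return "elsewhere"

def summarise(events) -> str:
    """Group georeferenced events by frame region and verbalise each group."""
    by_region: dict[str, list[str]] = {}
    for lat, lon, description in events:
        by_region.setdefault(region_of(lat, lon), []).append(description)
    sentences = []
    for region, descs in by_region.items():
        unique = sorted(set(descs))
        sentences.append(f"{' and '.join(unique)} in {region}")
    return ". ".join(sentences) + "."

events = [(58.2, -4.5, "Heavy rain"), (51.5, -0.1, "Light drizzle"), (59.0, -3.0, "Heavy rain")]
print(summarise(events))  # e.g. "Heavy rain in northern areas. Light drizzle in southern areas."
```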

    The ins and outs of participation in a weather information system

    In this paper our aim is to show that even though access to technology, information or data holds the potential for improved participation, participation is wired into a larger network of actors, artefacts and information practices. We draw on a case study of a weather information system developed and implemented by a non-profit organisation both to describe the configuration of participation and to critically assess inclusion and exclusion. We present a set of four questions - a basic, practical toolkit - by which we, together with the organisation, made sense of and evaluated participation in the system.

    Application of fuzzy sets in data-to-text systems

    This PhD dissertation addresses the convergence of two distinct paradigms: fuzzy sets and natural language generation. The object of study is the integration of fuzzy set-derived techniques that model imprecision and uncertainty in human language into systems that generate textual information from numeric data, commonly known as data-to-text systems. The dissertation covers an extensive state-of-the-art review, potential convergence points, two real data-to-text applications that integrate fuzzy sets (in the meteorology and learning analytics domains), and a model that encompasses the most relevant elements of the linguistic description of data discipline and provides a framework for building and integrating fuzzy set-based approaches into natural language generation/data-to-text systems.
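    As a minimal sketch of how fuzzy set techniques can feed a data-to-text system, the example below uses trapezoidal membership functions to choose a linguistic label for a temperature reading. The labels and breakpoints are illustrative assumptions, not the partitions used in the dissertation's meteorology or learning analytics applications.

```python
# Minimal sketch: trapezoidal fuzzy membership functions used to choose a
# linguistic label for a numeric reading. Breakpoints and labels are illustrative.

def trapezoid(x: float, a: float, b: float, c: float, d: float) -> float:
    """Membership rises from a to b, stays 1 between b and c, falls from c to d."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if a < x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)

TEMPERATURE_LABELS = {
    "cold": lambda t: trapezoid(t, -20, -20, 5, 10),
    "mild": lambda t: trapezoid(t, 5, 10, 18, 22),
    "warm": lambda t: trapezoid(t, 18, 22, 45, 45),
}

def describe_temperature(t: float) -> str:
    """Return the label with the highest membership degree for the reading t."""
    label, degree = max(((name, f(t)) for name, f in TEMPERATURE_LABELS.items()),
                        key=lambda pair: pair[1])
    return f"{t:.1f} C reads as '{label}' (membership {degree:.2f})"

print(describe_temperature(8.0))  # 8.0 C is partly 'cold' (0.40) and partly 'mild' (0.60); 'mild' wins
```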

    Survey of the State of the Art in Natural Language Generation: Core tasks, applications and evaluation

    This paper surveys the current state of the art in Natural Language Generation (NLG), defined as the task of generating text or speech from non-linguistic input. A survey of NLG is timely in view of the changes that the field has undergone over the past decade or so, especially in relation to new (usually data-driven) methods, as well as new applications of NLG technology. This survey therefore aims to (a) give an up-to-date synthesis of research on the core tasks in NLG and the architectures in which such tasks are organised; (b) highlight a number of relatively recent research topics that have arisen partly as a result of growing synergies between NLG and other areas of artificial intelligence; (c) draw attention to the challenges in NLG evaluation, relating them to similar challenges faced in other areas of Natural Language Processing, with an emphasis on different evaluation methods and the relationships between them. Published in the Journal of Artificial Intelligence Research (JAIR), volume 61, pp. 75-170.
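    The core tasks and architectures such a survey covers are commonly organised around the classic pipeline of document planning, microplanning and surface realisation. The skeletal sketch below walks invented weather data through that pipeline; it is a didactic illustration only, not code or an architecture taken from the survey itself.

```python
# Skeletal illustration of the classic NLG pipeline:
# document planning -> microplanning -> surface realisation.
# The toy data and rules are invented for the example.

def document_planning(data: dict) -> list[dict]:
    """Content determination + structuring: decide which facts to mention, in what order."""
    messages = []
    if data["rain_mm"] > 0:
        messages.append({"type": "precipitation", "amount": data["rain_mm"]})
    messages.append({"type": "temperature", "max": data["t_max"]})
    return messages

def microplanning(messages: list[dict]) -> list[dict]:
    """Lexicalisation and aggregation: map messages to abstract sentence specifications."""
    specs = []
    for m in messages:
        if m["type"] == "precipitation":
            specs.append({"subject": "rain", "verb": "reach", "object": f"{m['amount']} mm"})
        else:
            specs.append({"subject": "temperatures", "verb": "peak at", "object": f"{m['max']} C"})
    return specs

def realisation(specs: list[dict]) -> str:
    """Surface realisation: turn sentence specifications into grammatical text."""
    return " ".join(f"{s['subject'].capitalize()} will {s['verb']} {s['object']}." for s in specs)

print(realisation(microplanning(document_planning({"rain_mm": 4, "t_max": 17}))))
# -> "Rain will reach 4 mm. Temperatures will peak at 17 C."
```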

    A fact-aligned corpus of numerical expressions

    We describe a corpus of numerical expressions, developed as part of the NUMGEN project. The corpus contains newspaper articles and scientific papers in which exactly the same numerical facts are presented many times (both within and across texts). Some annotations of numerical facts are original: for example, numbers are automatically classified as round or non-round by an algorithm derived from Jansen and Pollmann (2001); also, numerical hedges such as 'about' or 'a little under' are marked up and classified semantically using arithmetical relations. Through explicit alignment of phrases describing the same fact, the corpus can support research on the influence of various contextual factors (e.g., document position, intended readership) on the way in which numerical facts are expressed. As an example we present results from an investigation showing that when a fact is mentioned more than once in a text, there is a clear tendency for precision to increase from first to subsequent mentions, and for mathematical level either to remain constant or to increase.
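    The round/non-round classification mentioned above can be pictured with a simplified "favourite number" heuristic in the spirit of Jansen and Pollmann (2001): a number counts as round if it equals a favourite factor times a power of ten. The sketch below is an illustrative assumption, not the actual NUMGEN classification algorithm.

```python
# Simplified roundness heuristic in the spirit of favourite-number accounts such as
# Jansen and Pollmann (2001); NOT the actual NUMGEN classification algorithm.

FAVOURITE_FACTORS = (1.0, 2.0, 2.5, 5.0)  # commonly cited "favourite" multipliers

def is_round(n: float) -> bool:
    """Treat n as round if it is a favourite factor times a power of ten."""
    if n <= 0:
        return False
    for exponent in range(-2, 10):
        base = 10.0 ** exponent
        for factor in FAVOURITE_FACTORS:
            if abs(n - factor * base) < 1e-9 * max(1.0, n):
                return True
    return False

for value in (100, 250, 3.14, 2500, 7, 0.5):
    print(value, "round" if is_round(value) else "non-round")
# 100 round, 250 round, 3.14 non-round, 2500 round, 7 non-round, 0.5 round
```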

    A Bayesian framework for verification and recalibration of ensemble forecasts: How uncertain is NAO predictability?

    Predictability estimates of ensemble prediction systems are uncertain due to limited numbers of past forecasts and observations. To account for such uncertainty, this paper proposes a Bayesian inferential framework that provides a simple 6-parameter representation of ensemble forecasting systems and the corresponding observations. The framework is probabilistic, and thus allows for quantifying uncertainty in predictability measures such as correlation skill and signal-to-noise ratios. It also provides a natural way to produce recalibrated probabilistic predictions from uncalibrated ensemble forecasts. The framework is used to address important questions concerning the skill of winter hindcasts of the North Atlantic Oscillation for 1992-2011 issued by the Met Office GloSea5 climate prediction system. Although there is much uncertainty in the correlation between ensemble mean and observations, there is strong evidence of skill: the 95% credible interval of the correlation coefficient, [0.19, 0.68], does not overlap zero. There is also strong evidence that the forecasts are not exchangeable with the observations: with over 99% certainty, the signal-to-noise ratio of the forecasts is smaller than the signal-to-noise ratio of the observations, which suggests that raw forecasts should not be taken as representative scenarios of the observations. Forecast recalibration is thus required, which can be coherently addressed within the proposed framework.
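    For orientation, the sketch below computes from toy data the two sample diagnostics the abstract refers to: the correlation between the ensemble mean and observations, and a crude forecast signal-to-noise ratio. It is only a plug-in illustration on synthetic data; the paper's contribution is a 6-parameter Bayesian model that quantifies the uncertainty in such measures and supports recalibration, which is not reproduced here.

```python
# Minimal illustration of the sample diagnostics mentioned in the abstract:
# correlation between ensemble mean and observations, and a forecast signal-to-noise ratio.
# Toy synthetic data; this is NOT the paper's 6-parameter Bayesian model or its recalibration.
import numpy as np

rng = np.random.default_rng(0)
n_years, n_members = 20, 24                                   # sizes are illustrative
signal = rng.normal(0.0, 1.0, n_years)                        # shared predictable component (toy)
obs = signal + rng.normal(0.0, 1.0, n_years)                  # observations = signal + noise
ens = 0.4 * signal[:, None] + rng.normal(0.0, 1.0, (n_years, n_members))  # weak-signal ensemble

ens_mean = ens.mean(axis=1)
correlation_skill = np.corrcoef(ens_mean, obs)[0, 1]

# Signal-to-noise ratio: variance of the (estimated) predictable signal over noise variance.
forecast_signal_var = ens_mean.var(ddof=1)                    # crude plug-in estimate
forecast_noise_var = ens.var(axis=1, ddof=1).mean()           # spread about the ensemble mean
snr_forecast = forecast_signal_var / forecast_noise_var

print(f"correlation skill ~ {correlation_skill:.2f}")
print(f"forecast signal-to-noise ratio ~ {snr_forecast:.2f}")
```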