Generating Weather Forecast Texts with Case Based Reasoning
Several techniques have been used to generate weather forecast texts. In this
paper, case based reasoning (CBR) is proposed for weather forecast text
generation because similar weather conditions occur over time and should have
similar forecast texts. CBR-METEO, a system for generating weather forecast
texts, was developed using a generic framework (jCOLIBRI) which provides modules
for the standard components of the CBR architecture. The advantage of a CBR
approach is that systems can be built in minimal time with far less human
effort after initial consultation with experts. The approach depends heavily on
the quality of the retrieval and revision components of the CBR process. We
evaluated CBR-METEO with NIST, an automated metric which has been shown to
correlate well with human judgements for this domain. The system shows
comparable performance with other NLG systems that perform the same task.
Comment: 6 pages
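The retrieval step that the abstract singles out can be illustrated with a minimal sketch. This is not the CBR-METEO implementation and does not use jCOLIBRI: the `Case` fields, the distance weighting, and the forecast strings are all hypothetical, chosen only to show nearest-case retrieval followed by direct reuse of the stored text. The revision step is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    """A past weather state paired with the forecast text written for it."""
    wind_speed: float      # knots (hypothetical feature)
    wind_direction: float  # degrees, 0-360 (hypothetical feature)
    text: str              # forecast text (invented for illustration)


def angular_difference(a: float, b: float) -> float:
    """Smallest angle between two compass bearings, in degrees."""
    diff = abs(a - b) % 360.0
    return min(diff, 360.0 - diff)


def distance(query: Case, case: Case) -> float:
    """Toy dissimilarity: speed gap plus direction gap scaled down by 10."""
    speed_gap = abs(query.wind_speed - case.wind_speed)
    direction_gap = angular_difference(query.wind_direction, case.wind_direction) / 10.0
    return speed_gap + direction_gap


def retrieve(query: Case, case_base: list[Case]) -> Case:
    """Retrieval step: return the stored case closest to the query."""
    return min(case_base, key=lambda case: distance(query, case))


# Invented case base; a real one would be built from a corpus of past forecasts.
case_base = [
    Case(10.0, 180.0, "Southerly 8-12."),
    Case(25.0, 270.0, "Westerly 22-28."),
    Case(12.0, 350.0, "Northerly 10-14."),
]

query = Case(11.0, 5.0, "")
best = retrieve(query, case_base)
print(best.text)  # reuse step: the retrieved text is returned unrevised
```

The circular difference matters here: bearings of 350 and 5 degrees are 15 degrees apart, not 345, so the northerly case is the closest match to the query.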
Acquiring Correct Knowledge for Natural Language Generation
Natural language generation (NLG) systems are computer software systems that
produce texts in English and other human languages, often from non-linguistic
input data. NLG systems, like most AI systems, need substantial amounts of
knowledge. However, our experience in two NLG projects suggests that it is
difficult to acquire correct knowledge for NLG systems; indeed, every knowledge
acquisition (KA) technique we tried had significant problems. In general terms,
these problems were due to the complexity, novelty, and poorly understood
nature of the tasks our systems attempted, and were worsened by the fact that
people write so differently. This meant in particular that corpus-based KA
approaches suffered because it was impossible to assemble a sizable corpus of
high-quality consistent manually written texts in our domains; and structured
expert-oriented KA techniques suffered because experts disagreed and because we
could not get enough information about special and unusual cases to build
robust systems. We believe that such problems are likely to affect many other
NLG systems as well. In the long term, we hope that new KA techniques may
emerge to help NLG system builders. In the shorter term, we believe that
understanding how individual KA techniques can fail, and using a mixture of
different KA techniques with different strengths and weaknesses, can help
developers acquire NLG knowledge that is mostly correct.
Atlas.txt : Linking Geo-referenced Data to Text for NLG
Peer reviewed, Preprint
Using spatial reference frames to generate grounded textual summaries of georeferenced data
Summarising georeferenced data (data that can be identified by its location) in natural language is challenging because it requires linking events describing its non-geographic attributes to their underlying geography. This mapping is not straightforward, as often the only explicit geographic information such data contains is latitude and longitude. In this paper we present an approach to generating textual summaries of georeferenced data based on spatial reference frames. This approach has been implemented in a data-to-text system we have deployed in the weather forecasting domain.
The ins and outs of participation in a weather information system
In this paper we aim to show that, even though access to technology, information or data holds the potential for improved participation, participation is wired into a larger network of actors, artefacts and information practices. We draw on a case study of a weather information system developed and implemented by a non-profit organisation both to describe the configuration of participation and to critically assess inclusion and exclusion. We present a set of four questions - a basic, practical toolkit - by which we, together with the organisation, made sense of and evaluated participation in the system.
Application of fuzzy sets in data-to-text system
This PhD dissertation addresses the convergence of two distinct paradigms: fuzzy sets and natural language generation. The object of study is the integration of fuzzy set-derived techniques that model imprecision and uncertainty in human language into systems that generate textual information from numeric data, commonly known as data-to-text systems. This dissertation covers an extensive state-of-the-art review, potential convergence points, two real data-to-text applications that integrate fuzzy sets (in the meteorology and learning analytics domains), and a model that encompasses the most relevant elements in the linguistic description of data discipline and provides a framework for building and integrating fuzzy set-based approaches into natural language generation/data-to-text systems.
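As a rough illustration of how a fuzzy set can map a numeric value to an imprecise linguistic term, the sketch below uses trapezoidal membership functions over temperature. It is not taken from the dissertation: the labels and breakpoints are hypothetical, and picking the label with the highest membership is only one simple selection rule among many.

```python
def trapezoid(x: float, a: float, b: float, c: float, d: float) -> float:
    """Trapezoidal membership: 0 outside (a, d), 1 on [b, c], linear in between."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)  # rising edge
    return (d - x) / (d - c)      # falling edge


# Hypothetical linguistic terms for air temperature in degrees Celsius.
TERMS = {
    "cold": (-50.0, -40.0, 5.0, 12.0),
    "mild": (5.0, 12.0, 18.0, 24.0),
    "warm": (18.0, 24.0, 40.0, 50.0),
}


def memberships(temperature: float) -> dict[str, float]:
    """Degree to which the temperature belongs to each linguistic term."""
    return {label: trapezoid(temperature, *params) for label, params in TERMS.items()}


def describe(temperature: float) -> str:
    """Pick the term with the highest membership degree."""
    degrees = memberships(temperature)
    return max(degrees, key=degrees.get)


print(describe(22.0))
print(describe(10.0))
```

Because adjacent terms overlap, a value such as 22 degrees belongs partly to "mild" and partly to "warm"; that graded overlap is what lets a generator choose hedged wording instead of a hard threshold.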
Survey of the State of the Art in Natural Language Generation: Core tasks, applications and evaluation
This paper surveys the current state of the art in Natural Language
Generation (NLG), defined as the task of generating text or speech from
non-linguistic input. A survey of NLG is timely in view of the changes that the
field has undergone over the past decade or so, especially in relation to new
(usually data-driven) methods, as well as new applications of NLG technology.
This survey therefore aims to (a) give an up-to-date synthesis of research on
the core tasks in NLG and the architectures adopted in which such tasks are
organised; (b) highlight a number of relatively recent research topics that
have arisen partly as a result of growing synergies between NLG and other areas
of artificial intelligence; (c) draw attention to the challenges in NLG
evaluation, relating them to similar challenges faced in other areas of Natural
Language Processing, with an emphasis on different evaluation methods and the
relationships between them.
Comment: Published in Journal of AI Research (JAIR), volume 61, pp 75-170. 118
pages, 8 figures, 1 table
A fact-aligned corpus of numerical expressions
We describe a corpus of numerical expressions, developed as part of the NUMGEN project. The corpus contains newspaper articles and scientific papers in which exactly the same numerical facts are presented many times (both within and across texts). Some annotations of numerical facts are original: for example, numbers are automatically classified as round or non-round by an algorithm derived from Jansen and Pollmann (2001); also, numerical hedges such as 'about' or 'a little under' are marked up and classified semantically using arithmetical relations. Through explicit alignment of phrases describing the same fact, the corpus can support research on the influence of various contextual factors (e.g., document position, intended readership) on the way in which numerical facts are expressed. As an example we present results from an investigation showing that when a fact is mentioned more than once in a text, there is a clear tendency for precision to increase from first to subsequent mentions, and for mathematical level either to remain constant or to increase.
A Bayesian framework for verification and recalibration of ensemble forecasts: How uncertain is NAO predictability?
Predictability estimates of ensemble prediction systems are uncertain due to
limited numbers of past forecasts and observations. To account for such
uncertainty, this paper proposes a Bayesian inferential framework that provides
a simple 6-parameter representation of ensemble forecasting systems and the
corresponding observations. The framework is probabilistic, and thus allows for
quantifying uncertainty in predictability measures such as correlation skill
and signal-to-noise ratios. It also provides a natural way to produce
recalibrated probabilistic predictions from uncalibrated ensemble forecasts.
The framework is used to address important questions concerning the skill of
winter hindcasts of the North Atlantic Oscillation for 1992-2011 issued by the
Met Office GloSea5 climate prediction system. Although there is much
uncertainty in the correlation between ensemble mean and observations, there is
strong evidence of skill: the 95% credible interval of the correlation
coefficient of [0.19,0.68] does not overlap zero. There is also strong evidence
that the forecasts are not exchangeable with the observations: with over 99%
certainty, the signal-to-noise ratio of the forecasts is smaller than the
signal-to-noise ratio of the observations, which suggests that raw forecasts
should not be taken as representative scenarios of the observations. Forecast
recalibration is thus required, which can be coherently addressed within the
proposed framework.
Comment: 36 pages, 10 figures