
    Language choice models for microplanning and readability

    This paper describes the construction of language choice models for the microplanning of discourse relations in a Natural Language Generation system that attempts to generate appropriate texts for users with varying levels of literacy. The models consist of constraint satisfaction problem graphs derived from the results of a corpus analysis. The corpus on which the models are based was written for good readers. We adapted the models for poor readers by allowing certain constraints to be tightened, based on psycholinguistic evidence. We describe how the design of the microplanner is evolving. We discuss the compromises involved in generating more readable textual output and the implications of our design for NLG architectures. Finally, we describe plans for future work.

    Precision and mathematical form in first and subsequent mentions of numerical facts and their relation to document structure

    In a corpus study we found that authors vary both mathematical form and precision when expressing numerical quantities. Indeed, within the same document, a quantity is often described vaguely in some places and more accurately in others. Vague descriptions tend to occur early in a document and to be expressed in simpler mathematical forms (e.g., fractions or ratios), whereas more accurate descriptions of the same proportions tend to occur later, often expressed in more complex forms (e.g., decimal percentages). Our results can be used in Natural Language Generation (1) to generate repeat descriptions within the same document, and (2) to generate descriptions of numerical quantities for different audiences according to mathematical ability.
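    The first-versus-subsequent-mention pattern described above can be sketched in code. This is a minimal illustration, not the authors' generator: it assumes that a vague first mention can be approximated by the nearest simple fraction and that later mentions use a decimal percentage.

    ```python
    from fractions import Fraction

    def describe_proportion(value, first_mention=True):
        """Render a proportion in [0, 1] as a vague, simple form on
        first mention and as a more precise decimal percentage on
        subsequent mentions (a simplified sketch of the corpus finding)."""
        if first_mention:
            # Approximate with the nearest fraction whose denominator <= 4.
            approx = Fraction(value).limit_denominator(4)
            return f"about {approx.numerator} in {approx.denominator}"
        # Subsequent mentions: higher precision, more complex form.
        return f"{value * 100:.1f}%"

    print(describe_proportion(0.256))                       # "about 1 in 4"
    print(describe_proportion(0.256, first_mention=False))  # "25.6%"
    ```

    Here the same underlying fact (a proportion of 0.256) surfaces first as a simple ratio and later as a decimal percentage, mirroring the shift in precision and mathematical form observed in the corpus.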

    Deriving content selection rules from a corpus of non-naturally occurring documents for a novel NLG application

    We describe a methodology for deriving content selection rules for NLG applications that aim to replace oral communications from human experts with automatically generated written communications. We argue for greater involvement of users and for a strategy for handling sparse data.

    A corpus analysis of discourse relations for Natural Language Generation

    We are developing a Natural Language Generation (NLG) system that generates texts tailored to the reading ability of individual readers. As part of building the system, GIRL (Generator for Individual Reading Levels), we carried out an analysis of the RST Discourse Treebank Corpus to find out how human writers linguistically realise discourse relations. The goal of the analysis was (a) to create a model of the choices that need to be made when realising discourse relations, and (b) to understand how these choices are typically made for “normal” readers, across a variety of discourse relations. We present our results for the discourse relations concession, condition, elaboration-additional, evaluation, example, reason and restatement. We discuss the results and how they were used in GIRL.

    Three Approaches to Generating Texts in Different Styles

    Natural Language Generation (NLG) systems generate texts in English and other human languages from non-linguistic input data. Usually there is a large number of possible texts that can communicate the input data, and NLG systems must choose one of these. We argue that style can be used by NLG systems to choose between possible texts, and explore how this can be done by (1) explicit stylistic parameters, (2) imitating a genre style, and (3) imitating an individual’s style.

    SkillSum: basic skills screening with personalised, computer-generated feedback

    We report on our experiences in developing and evaluating a system that provided formative assessment of basic skills and automatically generated personalised feedback reports for 16- to 19-year-old users. Development of the system was informed by literacy and numeracy experts, and it was trialled in the field with users and basic-skills tutors. We experimented with two types of assessment and with feedback that evolved from long, detailed reports with graphics to shorter, more readable ones without graphics. We discuss the evaluation of our final solution and compare it with related systems.

    A fact-aligned corpus of numerical expressions

    We describe a corpus of numerical expressions, developed as part of the NUMGEN project. The corpus contains newspaper articles and scientific papers in which exactly the same numerical facts are presented many times (both within and across texts). Some annotations of numerical facts are novel: for example, numbers are automatically classified as round or non-round by an algorithm derived from Jansen and Pollmann (2001); also, numerical hedges such as 'about' or 'a little under' are marked up and classified semantically using arithmetical relations. Through explicit alignment of phrases describing the same fact, the corpus can support research on the influence of various contextual factors (e.g., document position, intended readership) on the way in which numerical facts are expressed. As an example, we present results from an investigation showing that when a fact is mentioned more than once in a text, there is a clear tendency for precision to increase from first to subsequent mentions, and for mathematical level either to remain constant or to increase.
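    A simple reading of the round/non-round distinction can be sketched as follows. Jansen and Pollmann's actual criteria are more elaborate; this sketch assumes, as a simplification, that a number counts as round when it is a "favourite number" (1, 2, 2.5 or 5) scaled by a power of ten.

    ```python
    def is_round(n, favourites=(1, 2, 2.5, 5)):
        """Classify a number as round if it equals a favourite number
        (1, 2, 2.5, 5 by default) times a power of ten -- a simplified
        reading of the roundness idea in Jansen and Pollmann (2001)."""
        if n <= 0:
            return False
        for exp in range(-3, 10):
            for fav in favourites:
                if abs(n - fav * 10 ** exp) < 1e-9:
                    return True
        return False

    print([x for x in (25, 30, 37, 250, 1000) if is_round(x)])  # [25, 250, 1000]
    ```

    Under this simplification 25 (2.5 × 10) and 250 (2.5 × 100) are round while 37 is not; the corpus algorithm refines such a rule to decide which mentions of a quantity have been rounded by the writer.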