Hyper-Document structure: maintaining discourse coherence in non-linear documents
The passage from linear text to hypertext poses the challenge of expressing discourse coherence in non-linear text, where linguistic discourse markers no longer work. While hypertext introduces new possibilities for discourse organisation, it also requires the use of new devices which can support the expression of coherence by exploiting the technical characteristics and expressive capabilities of the medium. In this paper we show how in hypertext the notion of abstract document structure encompasses animated graphics as a form of meta-language for discourse construction.
Structural variation in generated health reports
We present a natural language generator that produces a range of medical reports on the clinical histories of cancer patients, and discuss the problem of conceptual restatement in generating various textual views of the same conceptual content. We focus on two features of our system: the demand for 'loose paraphrases' between the various reports on a given patient, with a high degree of semantic overlap but some necessary amount of distinctive content; and the requirement for paraphrasing at primarily the discourse level.
Automatic generation of large-scale paraphrases
Research on paraphrase has mostly focussed on lexical or syntactic variation within individual sentences. Our concern is with larger-scale paraphrases, from multiple sentences or paragraphs to entire documents. In this paper we address the problem of generating paraphrases of large chunks of texts. We ground our discussion through a worked example of extending an existing NLG system to accept as input a source text, and to generate a range of fluent semantically-equivalent alternatives, varying not only at the lexical and syntactic levels, but also in document structure and layout.
Introduction to the Special Issue on Software Architecture for Language Engineering
Every building, and every computer program, has an architecture: structural and organisational principles that underpin its design and construction. The garden shed once built by one of the authors had an ad hoc architecture, extracted (somewhat painfully) from the imagination during a slow and non-deterministic process that, luckily, resulted in a structure which keeps the rain on the outside and the mower on the inside (at least for the time being). As well as being ad hoc (i.e. not informed by analysis of similar practice or relevant science or engineering), this architecture is implicit: no explicit design was made, and no records or documentation were kept of the construction process. The pyramid in the courtyard of the Louvre, by contrast, was constructed in a process involving explicit design performed by qualified engineers with a wealth of theoretical and practical knowledge of the properties of materials, the relative merits and strengths of different construction techniques, et cetera. So it is with software: sometimes it is thrown together by enthusiastic amateurs; sometimes it is architected, built to last, and intended to be 'not something you finish, but something you start' (to paraphrase Brand (1994)).

A number of researchers argued in the early and middle 1990s that the field of computational infrastructure or architecture for human language computation merited increased attention. The reasoning was that the increasingly large-scale and technologically significant nature of language processing science was placing growing engineering burdens on research and development workers seeking robust and practical methods (as was the increasingly collaborative nature of research in this field, which puts a large premium on software integration and interoperation). Over the intervening period a number of significant systems and practices have been developed in what we may call Software Architecture for Language Engineering (SALE).
This special issue represented an opportunity for practitioners in this area to report their work in a coordinated setting, and to present a snapshot of the state-of-the-art in infrastructural work, which may indicate where further development and further take-up of these systems can be of benefit.
Corpus annotation as a scientific task
Annotation studies in CL are generally unscientific: they are mostly not reproducible, make use of too few (and often non-independent) annotators, and use guidelines that are often something of a moving target. Additionally, the notion of 'expert annotators' invariably means only that the annotators have linguistic training. While this can be acceptable in some special contexts, it is often far from ideal. This is particularly the case when subtle judgements are required or when, as increasingly, one is making use of corpora originating from technical texts that have been produced by, and are intended to be consumed by, an audience of technical experts in the field. We outline a more rigorous approach to collecting human annotations, using as our example a study designed to capture judgements on the meaning of hedge words in medical records.
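The call for more (and independent) annotators implies measuring how well they actually agree. As a minimal illustration of the kind of check a rigorous study would report (our own sketch, not part of the study described above, with made-up labels), Cohen's kappa corrects raw agreement for agreement expected by chance:

```python
from collections import Counter

def cohen_kappa(a, b):
    """Chance-corrected agreement between two annotators' label sequences."""
    assert len(a) == len(b)
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    # Probability the two annotators agree by chance, from their label frequencies.
    expected = sum(ca[label] * cb[label] for label in set(a) | set(b)) / (n * n)
    return (observed - expected) / (1 - expected)

# Two annotators judging hedge words in clinical notes (labels illustrative).
ann1 = ["certain", "probable", "probable", "doubtful", "certain", "probable"]
ann2 = ["certain", "probable", "doubtful", "doubtful", "certain", "certain"]
print(round(cohen_kappa(ann1, ann2), 2))  # → 0.52
```

Raw agreement here is 4/6, but kappa discounts the agreement that frequent labels would produce by chance, which is why a study needs enough annotators and items for the statistic to be stable.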
Summarisation and visualisation of e-Health data repositories
At the centre of the Clinical e-Science Framework (CLEF) project is a repository of well organised, detailed clinical histories, encoded as data that will be available for use in clinical care and in-silico medical experiments. We describe a system that we have developed as part of the CLEF project to perform the task of generating a diverse range of textual and graphical summaries of a patient's clinical history from a data-encoded model, a chronicle, representing the record of the patient's medical history. Although the focus of our current work is on cancer patients, the approach we describe is generalisable to a wide range of medical areas.
Intuitive querying of e-Health data repositories
At the centre of the Clinical e-Science Framework (CLEF) project is a repository of well organised, detailed clinical histories, encoded as data that will be available for use in clinical care and in-silico medical experiments. An integral part of the CLEF workbench is a tool to allow biomedical researchers and clinicians to query – in an intuitive way – the repository of patient data. This paper describes the CLEF query editing interface, which makes use of natural language generation techniques in order to alleviate some of the problems generally faced by natural language and graphical query interfaces. The query interface also incorporates an answer renderer that dynamically generates responses in both natural language text and graphics.
Multilingual generation of controlled languages
We describe techniques based on natural language generation which allow a user to author a document in controlled language for multiple natural languages. The author is expected to be an expert in the application domain but not in the controlled language or in more than one of the supported natural languages. Because the system can produce multiple expressions of the same input in multiple languages, the author can choose among alternative expressions satisfying the constraints of the controlled language. Because the system offers only legitimate choices of wording, correction is unnecessary. Consequently, acceptance of error reports and corrections by trained authors are non-issues.
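One way to picture the "only legitimate choices of wording" idea is a mapping from domain messages to the realisations the controlled language permits in each output language, so the author selects rather than types and ill-formed input can never arise. The names and data below are purely illustrative, not the actual system's API or grammar:

```python
# Hypothetical fragment: message -> language -> wordings the controlled
# language allows. A real system would derive these from a generation grammar.
GRAMMAR = {
    "press(button)": {
        "en": ["Press the button.", "Push the button."],
        "fr": ["Appuyez sur le bouton."],
    },
}

def legitimate_choices(message, language):
    """Return every wording the controlled language permits for this message.
    The author picks one from the list, so no correction step is needed."""
    return GRAMMAR.get(message, {}).get(language, [])

print(legitimate_choices("press(button)", "en"))
# → ['Press the button.', 'Push the button.']
```

Because the same message key indexes realisations in every supported language, an author fluent in only one language still produces correct controlled-language output in all of them.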
The Influence of layout on the interpretation of referring expressions
From the introduction: The division of text into visual segments such as sentences, paragraphs and sections achieves many functions, such as easing navigation, achieving pragmatic effect, improving readability and reflecting the organisation of information (Wright, 1983; Schriver, 1997). In this paper, we report a small experiment that investigates the effect of different layout configurations on the interpretation of the antecedent of anaphoric referring expressions. Layout has so far played little role in Natural Language Generation (NLG) systems. The layout of output texts is generally very simple. At worst, it consists of only a single paragraph of a few sentences; at best it is predetermined by schemas (Coch, 1996; Porter and Lester, 1997) or discourse plans (Milosavljevic, 1999). However, recent work by Power (2000) and Bouayad et al. (2000) has integrated graphically signalled segments (e.g., by whitespace, punctuation, font and face alternation) such as paragraphs, lists, text-sentences and text-clauses in a hierarchical tree-like representation called the document structure. This work was carried out within the ICONOCLAST project (Integrating CONstraints On Layout and Style), which aims at automatically generating formatted texts in which the formatting decisions affect the wording and vice versa. If document structure affects the comprehensibility of referring expressions, this must be taken into account in any attempt to generate felicitous formatted texts. This would go a step further than current research in the automatic generation of referring expressions, where only the effect of discourse structure and grammatical function has been investigated (Dale and Reiter, 1995; Cristea et al., 1998; Walker et al., 1998; Kibble and Power, 1999).
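The document structure described here is a tree of graphically signalled segments. A minimal sketch of such a representation (our own illustration, assuming invented node names, not the ICONOCLAST implementation):

```python
from dataclasses import dataclass, field

@dataclass
class DocNode:
    """One graphically signalled segment: section, paragraph, list,
    text-sentence or text-clause, nested in a tree."""
    category: str
    text: str = ""
    children: list = field(default_factory=list)

    def render(self, depth=0):
        """Flatten the tree to indented lines, one segment per line."""
        lines = [("  " * depth) + (self.text or f"[{self.category}]")]
        for child in self.children:
            lines.extend(child.render(depth + 1))
        return lines

# A toy document: a section holding a paragraph and a bulleted list.
doc = DocNode("section", children=[
    DocNode("paragraph", children=[
        DocNode("text-sentence", "The patient was diagnosed in 1998."),
        DocNode("text-sentence", "She received chemotherapy."),
    ]),
    DocNode("list", children=[
        DocNode("text-clause", "no evidence of recurrence"),
        DocNode("text-clause", "follow-up in six months"),
    ]),
])
print("\n".join(doc.render()))
```

In such a representation, moving a text-clause from a paragraph into a list changes the layout without changing the propositional content, which is exactly the kind of configuration whose effect on anaphora the experiment manipulates.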
Visualising Discourse Coherence in Non-Linear Documents
To produce coherent linear documents, Natural Language Generation systems have traditionally exploited the structuring role of textual discourse markers such as relational and referential phrases. These coherence markers of the traditional notion of text, however, do not work in non-linear documents: a new set of graphical devices is needed, together with formation rules to govern their usage, supported by sound theoretical frameworks. If, in linear documents, graphical devices such as layout and formatting complement textual devices in the expression of discourse coherence, in non-linear documents they play a more important role. In this paper, we present our theoretical and empirical work in progress, which explores new possibilities for expressing coherence in the generation of hypertext documents.