11 research outputs found
Integrating Discourse Markers into a Pipelined Natural Language Generation Architecture
Pipelined Natural Language Generation (NLG) systems have grown increasingly complex as architectural modules were added to support language functionalities such as referring expressions, lexical choice, and revision. This has given rise to discussions about the relative placement of these new modules in the overall architecture.
Aggregation with Recombination Patterns
In this paper, we show the commonalities between aggregation processes in Natural Language Generation and recombination patterns, a framework introduced recently as a way of generating complex sentences in natural languages using very simple recombination (and therefore biological) rules. By showing similarities between these two mechanisms, we suggest the possibility of carrying out aggregation by means of recombination patterns. We also point to the possibility of using such a biologically motivated framework in the design of efficient and simple natural language generation devices.
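To make the aggregation side of the parallel concrete, here is a minimal sketch of one classic aggregation rule: propositions that share a subject are merged into a single coordinated sentence. The rule and data are invented for illustration and are not the paper's recombination formalism.

```python
# Toy subject-sharing aggregation, a common NLG aggregation pattern
# (invented example, not the paper's recombination-pattern mechanism).
def aggregate(propositions):
    """Merge (subject, predicate) pairs that share a subject into one
    coordinated sentence; leave singletons as simple sentences."""
    grouped = {}
    for subject, predicate in propositions:
        grouped.setdefault(subject, []).append(predicate)
    sentences = []
    for subject, preds in grouped.items():
        if len(preds) == 1:
            sentences.append(f"{subject} {preds[0]}.")
        else:
            body = ", ".join(preds[:-1]) + " and " + preds[-1]
            sentences.append(f"{subject} {body}.")
    return sentences

print(aggregate([("The patient", "is stable"),
                 ("The patient", "breathes unaided")]))
# One coordinated sentence replaces two simple ones.
```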
Prosody Modelling in Concept-to-Speech Generation: Methodological Issues
We explore three issues for the development of concept-to-speech (CTS) systems. We identify information available in a language-generation system that has the potential to impact prosody; investigate the role played by different corpora in CTS prosody modelling; and explore different methodologies for learning how linguistic features
impact prosody. Our major focus is on the comparison of two machine learning methodologies: generalized rule induction and memory-based learning. We describe this work in the context of multimedia abstract generation of intensive care (MAGIC) data, a system that produces multimedia briefings on the status of patients who have just undergone a bypass operation
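As a hedged illustration of the memory-based side of that comparison, here is a nearest-neighbour sketch: all training instances are stored, and a new case receives the label of its most similar stored instance. The features, labels, and overlap metric are invented for illustration and are not MAGIC's actual feature set.

```python
# Minimal memory-based learning (nearest-neighbour classification) sketch
# for prosody prediction; features and labels are invented placeholders.
def overlap(a, b):
    """Count matching feature values (a simple similarity metric)."""
    return sum(1 for x, y in zip(a, b) if x == y)

def classify(memory, features):
    """Return the label of the stored instance most similar to `features`."""
    best = max(memory, key=lambda inst: overlap(inst[0], features))
    return best[1]

# Each instance: ((part_of_speech, information_status, clause_position), label)
memory = [
    (("noun", "new", "final"), "accented"),
    (("noun", "given", "medial"), "deaccented"),
    (("verb", "given", "initial"), "deaccented"),
]
print(classify(memory, ("noun", "new", "final")))  # -> accented
```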
Making effective use of healthcare data using data-to-text technology
Healthcare organizations are in a continuous effort to improve health
outcomes, reduce costs and enhance patient experience of care. Data is
essential to measure and help achieve these improvements in healthcare
delivery. Consequently, a data influx from various clinical, financial and
operational sources is now overtaking healthcare organizations and their
patients. The effective use of this data, however, is a major challenge.
Clearly, text is an important medium to make data accessible. Financial reports
are produced to assess healthcare organizations on some key performance
indicators to steer their healthcare delivery. Similarly, at a clinical level,
data on patient status is conveyed by means of textual descriptions to
facilitate patient review, shift handover and care transitions. Likewise,
patients are informed about data on their health status and treatments via
text, in the form of reports or via ehealth platforms by their doctors.
Unfortunately, such text is the outcome of a highly labour-intensive process if
it is done by healthcare professionals. It is also prone to incompleteness,
subjectivity and hard to scale up to different domains, wider audiences and
varying communication purposes. Data-to-text is a recent breakthrough
technology in artificial intelligence which automatically generates natural
language in the form of text or speech from data. This chapter provides a
survey of data-to-text technology, with a focus on how it can be deployed in a
healthcare setting. It will (1) give an up-to-date synthesis of data-to-text
approaches, (2) give a categorized overview of use cases in healthcare, (3)
seek to make a strong case for evaluating and implementing data-to-text in a
healthcare setting, and (4) highlight recent research challenges.
Comment: 27 pages, 2 figures, book chapter
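As a hedged sketch of what the simplest end of data-to-text looks like in a clinical setting, here is a rule-plus-template generator. The field names and thresholds are invented for illustration; deployed systems surveyed in the chapter are far richer.

```python
# Toy rule-plus-template data-to-text generator for a clinical record
# (invented fields and thresholds, illustrative only).
def report(patient):
    """Turn a structured patient record into a short textual status report."""
    parts = [f"{patient['name']} underwent a bypass operation."]
    hr = patient["heart_rate"]
    if hr > 100:
        parts.append(f"Heart rate is elevated at {hr} bpm.")
    else:
        parts.append(f"Heart rate is normal at {hr} bpm.")
    if patient["on_ventilator"]:
        parts.append("The patient remains on ventilatory support.")
    return " ".join(parts)

print(report({"name": "Patient A", "heart_rate": 112, "on_ventilator": False}))
```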
Survey of the State of the Art in Natural Language Generation: Core tasks, applications and evaluation
This paper surveys the current state of the art in Natural Language
Generation (NLG), defined as the task of generating text or speech from
non-linguistic input. A survey of NLG is timely in view of the changes that the
field has undergone over the past decade or so, especially in relation to new
(usually data-driven) methods, as well as new applications of NLG technology.
This survey therefore aims to (a) give an up-to-date synthesis of research on
the core tasks in NLG and the architectures adopted in which such tasks are
organised; (b) highlight a number of relatively recent research topics that
have arisen partly as a result of growing synergies between NLG and other areas
of artificial intelligence; (c) draw attention to the challenges in NLG
evaluation, relating them to similar challenges faced in other areas of Natural
Language Processing, with an emphasis on different evaluation methods and the
relationships between them.
Comment: Published in Journal of AI Research (JAIR), volume 61, pp 75-170. 118 pages, 8 figures, 1 table
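The core tasks the survey organises are often described as a three-stage pipeline: document planning, microplanning, and surface realisation. A minimal sketch of that architecture follows, with data and rules invented for illustration.

```python
# Sketch of the classic three-stage NLG pipeline (document planning ->
# microplanning -> surface realisation); data and rules are invented.
def document_planning(data):
    """Content selection: keep only messages worth reporting."""
    return [m for m in data if m["important"]]

def microplanning(messages):
    """Lexicalisation: map each message to an abstract sentence spec."""
    return [{"subject": m["entity"], "verb": "rose", "amount": m["value"]}
            for m in messages]

def realisation(specs):
    """Surface realisation: linearise each spec into a sentence."""
    return " ".join(f"{s['subject']} {s['verb']} by {s['amount']}."
                    for s in specs)

data = [{"entity": "Temperature", "value": "2 degrees", "important": True},
        {"entity": "Humidity", "value": "1 percent", "important": False}]
print(realisation(microplanning(document_planning(data))))
# -> Temperature rose by 2 degrees.
```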
Microplanning with Communicative Intentions: The SPUD System
The process of microplanning in Natural Language Generation (NLG) encompasses a range of problems in which a generator must bridge underlying domain-specific representations and general linguistic representations. These problems include constructing linguistic referring expressions to identify domain objects, selecting lexical items to express domain concepts, and using complex linguistic constructions to concisely convey related domain facts. In this paper, we argue that such problems are best solved through a uniform, comprehensive, declarative process. In our approach, the generator directly explores a search space for utterances described by a linguistic grammar. At each stage of search, the generator uses a model of interpretation, which characterizes the potential links between the utterance and the domain and context, to assess its progress in conveying domain-specific representations. We further address the challenges for implementation and knowledge representation in this approach. We show how to implement this approach effectively by using the lexicalized tree-adjoining grammar formalism (LTAG) to connect structure to meaning and using modal logic programming to connect meaning to context. We articulate a detailed methodology for designing grammatical and conceptual resources.
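SPUD itself searches an LTAG grammar under a model of interpretation; as a far simpler, hedged illustration of one microplanning subproblem it tackles, referring expression generation, here is an incremental-style property selector that adds properties until only the target object remains. The domain objects and preference order are invented for illustration and this is not SPUD's algorithm.

```python
# Incremental-style referring expression selection (invented domain),
# a much-simplified stand-in for one subproblem SPUD solves via search.
def refer(target, distractors, preference_order):
    """Pick properties of `target` that rule out all distractors."""
    expression = {}
    remaining = list(distractors)
    for attr in preference_order:
        value = target[attr]
        ruled_out = [d for d in remaining if d.get(attr) != value]
        if ruled_out:
            expression[attr] = value  # property is discriminating: keep it
            remaining = [d for d in remaining if d.get(attr) == value]
        if not remaining:
            break  # the expression now identifies the target uniquely
    return expression

target = {"type": "cup", "colour": "red", "size": "small"}
distractors = [{"type": "cup", "colour": "blue", "size": "small"},
               {"type": "bowl", "colour": "red", "size": "large"}]
print(refer(target, distractors, ["type", "colour", "size"]))
# -> {'type': 'cup', 'colour': 'red'}  ("the red cup")
```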
Development and use of sports news generation software with sentiment for process improvement in journalism
Text generation ceased to be an exclusively human domain years ago. Currently, there
exist Natural Language Generation (NLG) systems that synthesize abstracts, generate code
for building applications, and write other texts, including news. Additionally, more
and more organizations are becoming aware of the progress that is being made in the
field and are including NLG technology into their structures to reduce expenses and help
employees save time.
This document presents a multidisciplinary thesis within the fellowship involving Universidad
Carlos III de Madrid and Corporación de Radio y Televisión Española S. A. The
presented work describes the development of an automatic sports news generator that can
tailor its text to the fan base for which the news item is intended. Furthermore, a guide is
provided on how to incorporate such a tool into RTVE following CMP, a continuous business
process improvement methodology.
Double Degree in Computer Science and Engineering and Business Administration
Semantic consistency in text generation
Automatic input-grounded text generation tasks process input texts and generate human-understandable natural language text conveying the processed information. The development
of neural sequence-to-sequence (seq2seq) models, which are usually trained in an end-to-end fashion, has rapidly pushed the frontier of performance on text generation tasks. However, these models often lack semantic consistency with respect to their
corresponding input texts. The models are not solely to blame: the corpora themselves often include examples whose output is semantically inconsistent with its input.
Any model that is agnostic to such data divergence issues will be prone to semantic inconsistency. Meanwhile, the most widely used overlap-based evaluation metrics,
which compare the generated texts to their corresponding references, do not evaluate
input-output semantic consistency explicitly, which makes this problem hard to detect.
In this thesis, we focus on studying semantic consistency in three automatic text
generation scenarios: Data-to-text Generation, Single Document Abstractive Summarization, and Chit-chat Dialogue Generation, seeking answers to the following
research questions: (1) how can input-output semantic consistency be defined in different
text generation tasks? (2) how can input-output semantic consistency be quantitatively
evaluated? (3) how can better semantic consistency be achieved in individual tasks?
We systematically define the semantic inconsistency phenomena in these three
tasks as omission, intrinsic hallucination, and extrinsic hallucination. For Data-to-text Generation, we jointly learn a sentence planner that tightly controls which part
of input source gets generated in what sequence, with a neural seq2seq text generator,
to decrease all three types of semantic inconsistency in model-generated texts. The
evaluation results confirm that the texts generated by our model contain far fewer
omissions while maintaining a low level of extrinsic hallucinations, without sacrificing
fluency compared to seq2seq models. For Single Document Abstractive Summarization, we reduce the level of extrinsic hallucinations in training data by automatically
introducing assisting articles to each document-summary instance to provide the supplemental world knowledge that is present in the summary but missing from the document. With the help of a novel metric, we show that seq2seq models trained with assisting articles demonstrate fewer extrinsic hallucinations than the ones trained without
them. For Chit-chat Dialogue Generation, by filtering the omitted and hallucinated
examples out of the training set using a newly introduced evaluation metric, and encoding
this metric into the neural seq2seq response generation models as a control factor, we diminish
the level of omissions and extrinsic hallucinations in the generated dialogue responses.
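A hedged sketch of the simplest possible input-output consistency check, in the spirit the thesis motivates: flag content words of the output that never appear in the input (potential extrinsic hallucination) and input content words missing from the output (potential omission). The metrics the thesis actually introduces are far more sophisticated; the tokenisation and stopword list here are invented for illustration.

```python
# Naive lexical input-output consistency check (illustrative only; real
# omission/hallucination metrics operate on semantics, not token overlap).
STOPWORDS = {"the", "a", "is", "was", "in", "on", "and", "to", "of"}

def content_words(text):
    """Lowercase whitespace tokens minus a tiny invented stopword list."""
    return {w for w in text.lower().split() if w not in STOPWORDS}

def consistency_check(source, generated):
    """Report output words unsupported by the input, and input words
    the output fails to cover."""
    src, gen = content_words(source), content_words(generated)
    return {"hallucinated": sorted(gen - src), "omitted": sorted(src - gen)}

print(consistency_check("John scored two goals in Paris",
                        "John scored two goals in London"))
# -> {'hallucinated': ['london'], 'omitted': ['paris']}
```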