Automatic generation of textual descriptions in data-to-text systems using a fuzzy temporal ontology: Application in air quality index data series
In this paper we present a model based on computational intelligence and natural language generation for the automatic generation of textual summaries from numerical data series, aiming to provide insights that help users understand the relevant information hidden in the data. Our model includes a fuzzy temporal ontology with temporal references, which addresses the problem of managing imprecise temporal knowledge, a common issue in data series. We fully describe a real use case of application in the environmental information systems field, providing linguistic descriptions of the air quality index (AQI), a well-known indicator provided by meteorological agencies worldwide. We consider two different sources of real AQI data provided by the official Galician (NW Spain) Meteorology Agency: (i) the AQI distribution across the stations of the meteorological observation network and (ii) time series describing the state and evolution of the AQI at each meteorological station. Both application models were evaluated following the current standards and good practices for manual expert evaluation in the Natural Language Generation field. Assessment results by two expert meteorologists were very satisfactory, which empirically confirms that the proposed textual descriptions fit this type of data and service in both content and layout. This research was funded by the Spanish Ministry for Science, Innovation and Universities (grants TIN2017-84796-C2-1-R, PID2020-112623GB-I00, and PDC2021-121072-C21) and the Galician Ministry of Education, University and Professional Training, Spain (grants ED431C2018/29 and ED431G2019/04). All grants were co-funded by the European Regional Development Fund (ERDF/FEDER program).
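The kind of fuzzy linguistic mapping the abstract describes can be sketched with trapezoidal membership functions over AQI values. The label names and breakpoints below are illustrative assumptions, not the paper's or the agency's actual partitions:

```python
# Hypothetical sketch of fuzzy linguistic labels for AQI values.
# Labels and breakpoints are invented for illustration only.

def trapezoid(x, a, b, c, d):
    """Trapezoidal membership: rises over [a, b], flat over [b, c], falls over [c, d]."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)

# Illustrative AQI partitions on a 0-100 scale (not the agency's official ones).
AQI_LABELS = {
    "good":     (-1, 0, 20, 35),
    "moderate": (20, 35, 55, 70),
    "poor":     (55, 70, 100, 101),
}

def describe_aqi(value):
    """Return the linguistic label with the highest membership degree."""
    degrees = {label: trapezoid(value, *p) for label, p in AQI_LABELS.items()}
    return max(degrees, key=degrees.get)
```

A text generator would then realise the winning label (and its membership degree, when it is useful to hedge) inside a sentence template.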
Generating readable texts for readers with low basic skills
Most NLG systems generate texts for readers with good reading ability, but SkillSum adapts its output for readers with poor literacy. Evaluation with low-skilled readers confirms that SkillSum's knowledge-based microplanning choices enhance readability. We also discuss future readability improvements.
Data-driven approaches to content selection for data-to-text generation
Data-to-text systems are powerful in generating reports from data automatically and
thus they simplify the presentation of complex data. Rather than presenting data using
visualisation techniques, data-to-text systems use human language, which is the most
common way for human-human communication. In addition, data-to-text systems can
adapt their output content to users' preferences, background or interests and therefore
they can be pleasant for users to interact with. Content selection is an important part
of every data-to-text system, because it is the module that decides which parts of the
available information should be conveyed to the user.
This thesis makes three important contributions. Firstly, it investigates data-driven
approaches to content selection with respect to users' preferences. It develops, compares
and evaluates two novel content selection methods. The first method treats content
selection as a Markov Decision Process (MDP), where the content selection decisions
are made sequentially, i.e. given the already chosen content, decide what to talk about
next. The MDP is solved using Reinforcement Learning (RL) and is optimised with
respect to a cumulative reward function. The second approach considers all content
selection decisions simultaneously by taking into account data relationships and treats
content selection as a multi-label classification task. The evaluation shows that the users
significantly prefer the output produced by the RL framework, whereas the multi-label
classification approach scores significantly higher than the RL method in automatic
metrics. The results also show that the end users' preferences should be taken into
account when developing Natural Language Generation (NLG) systems.
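The sequential framing above can be sketched as a toy content-selection MDP solved with tabular Q-learning: states are the topics mentioned so far, actions are the remaining topics, and rewards come from a user preference function. Topics, rewards, and hyperparameters are invented for illustration; the thesis's actual reward function and state representation differ.

```python
import random

TOPICS = ["grade", "attendance", "effort"]
PREFS = {"grade": 3.0, "attendance": 1.0, "effort": -1.0}  # toy user rewards

def step(state, action):
    """State is the tuple of topics mentioned so far; the reward is the
    (toy) user preference for the newly chosen topic."""
    return state + (action,), PREFS[action]

def q_learn(episodes=2000, alpha=0.1, gamma=0.9, eps=0.2):
    """Tabular Q-learning with epsilon-greedy exploration."""
    Q = {}
    random.seed(0)
    for _ in range(episodes):
        state = ()
        while True:
            actions = [t for t in TOPICS if t not in state]
            if not actions:
                break
            if random.random() < eps:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda t: Q.get((state, t), 0.0))
            nxt, r = step(state, a)
            nxt_actions = [t for t in TOPICS if t not in nxt]
            best_next = max((Q.get((nxt, t), 0.0) for t in nxt_actions), default=0.0)
            old = Q.get((state, a), 0.0)
            Q[(state, a)] = old + alpha * (r + gamma * best_next - old)
            state = nxt
    return Q

Q = q_learn()
# Greedy first choice under the learned policy.
first = max(TOPICS, key=lambda t: Q.get(((), t), 0.0))
```

Under these toy rewards the learned policy opens with the highest-valued topic; the multi-label alternative would instead score all topics jointly in one classification pass.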
NLG systems are developed with the assistance of domain experts; however, the end
users are normally non-experts. Consider for instance a student feedback generation
system, where the system imitates the teachers. The system will produce feedback based
on the lecturers' rather than the students' preferences, although students are the end
users. Therefore, the second contribution of this thesis is an approach that adapts the
content to 'speakers' and 'hearers' simultaneously. It considers initially two types of
known stakeholders: lecturers and students. It develops a novel approach that analyses
the preferences of the two groups using Principal Component Regression and uses the derived knowledge to hand-craft a reward function that is then optimised using RL.
The results show that the end users prefer the output generated by this system, rather
than the output that is generated by a system that mimics the experts. Therefore, it is
possible to model the middle ground of the preferences of different known stakeholders.
In most real-world applications, however, first-time users are generally unknown,
which is a common problem for NLG and interactive systems: the system cannot adapt
to user preferences without prior knowledge. This thesis contributes a novel framework
for addressing unknown stakeholders such as first-time users, using Multi-objective Optimisation
to minimise regret for multiple possible user types. In this framework, the
content preferences of potential users are modelled as objective functions, which are
simultaneously optimised using Multi-objective Optimisation. This approach outperforms
two meaningful baselines and minimises regret for unknown users.
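The regret-minimisation idea can be sketched as follows: model each possible user type as an objective function over content selections, then choose the selection whose worst-case regret across types is smallest. The user types, utilities, and brute-force search below are illustrative assumptions; the thesis uses Multi-objective Optimisation rather than enumeration.

```python
import itertools

TOPICS = ["grade", "attendance", "effort"]
# Hypothetical user types, each scoring topics differently.
USER_TYPES = {
    "results_focused":    {"grade": 3, "attendance": 0, "effort": 1},
    "engagement_focused": {"grade": 1, "attendance": 3, "effort": 2},
}

def utility(selection, prefs):
    return sum(prefs[t] for t in selection)

def minimax_regret(k=2):
    """Choose k topics minimising the worst-case regret over user types."""
    candidates = list(itertools.combinations(TOPICS, k))
    # Best achievable utility for each type, used as the regret baseline.
    best_per_type = {u: max(utility(c, p) for c in candidates)
                     for u, p in USER_TYPES.items()}
    def worst_regret(c):
        return max(best_per_type[u] - utility(c, USER_TYPES[u])
                   for u in USER_TYPES)
    return min(candidates, key=worst_regret)
```

The chosen selection is not optimal for either type alone, but no type is left far from its own optimum, which is the sense in which regret is minimised for an unknown first-time user.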
A knowledge-based method for generating summaries of spatial movement in geographic areas
In this article we describe a method for automatically generating text summaries of data corresponding to traces of spatial movement in geographical areas. The method can help humans to understand large data streams, such as the amounts of GPS data recorded by a variety of sensors in mobile phones, cars, etc. We describe the knowledge representations we designed for our method and the main components of our method for generating the summaries: a discourse planner, an abstraction module and a text generator. We also present evaluation results that show the ability of our method to generate certain types of geospatial and temporal descriptions.
Pragmatic enrichment in language processing and development
The goal of language comprehension for humans is not just to decode the semantic content of sentences, but rather to grasp what speakers intend to communicate. To infer speaker meaning, listeners must at minimum assess whether and how the literal meaning of an utterance addresses a question under discussion in the conversation. In cases of implicature, where the speaker intends to communicate more than just the literal meaning, listeners must access additional relevant information in order to understand the intended contribution of the utterance. I argue that the primary challenge for inferring speaker meaning is in identifying and accessing this relevant contextual information. In this dissertation, I integrate evidence from several different types of implicature to argue that both adults and children are able to execute complex pragmatic inferences relatively efficiently, but encounter some difficulty finding what is relevant in context. I argue that the variability observed in processing costs associated with adults' computation of scalar implicatures can be better understood by examining how the critical contextual information is presented in the discourse context. I show that children's oft-cited hyper-literal interpretation style is limited to scalar quantifiers. Even 3-year-olds are adept at understanding indirect requests and "parenthetical" readings of belief reports. Their ability to infer speaker meanings is limited only by their relative inexperience in conversation and lack of world knowledge.
Survey of the State of the Art in Natural Language Generation: Core tasks, applications and evaluation
This paper surveys the current state of the art in Natural Language
Generation (NLG), defined as the task of generating text or speech from
non-linguistic input. A survey of NLG is timely in view of the changes that the
field has undergone over the past decade or so, especially in relation to new
(usually data-driven) methods, as well as new applications of NLG technology.
This survey therefore aims to (a) give an up-to-date synthesis of research on
the core tasks in NLG and the architectures adopted in which such tasks are
organised; (b) highlight a number of relatively recent research topics that
have arisen partly as a result of growing synergies between NLG and other areas
of artificial intelligence; (c) draw attention to the challenges in NLG
evaluation, relating them to similar challenges faced in other areas of Natural
Language Processing, with an emphasis on different evaluation methods and the
relationships between them. Comment: Published in Journal of AI Research (JAIR), volume 61, pp 75-170. 118
pages, 8 figures, 1 table.
School-Based Data Teams Ask the Darnedest Questions About Statistics: Three Essays in the Epistemology of Statistical Consulting and Teaching
The essays in this thesis attempt to answer the most difficult questions that I have faced as a teacher and consultant for school-based data teams. When we report statistics to our fellow educators, what do we say and what do we leave unsaid? What do averages mean when no student is average? Why do we treat our population of students as infinite when we test for statistical significance? I treat these as important philosophical questions. In the first essay, I use Paul Grice's philosophical analysis of conversational logic to understand how data teams can accidentally mislead with true statistics, and I use Bernard Williams's philosophical analysis of truthfulness to understand the value, for data teams, of not misleading with statistics. In short, statistical reports can be misleading when they violate the Gricean maxims of conversation (e.g., "be relevant," "be orderly"). I argue that, for data teams, adhering to the Gricean maxims is an intrinsic value, alongside Williams's intrinsic values of Sincerity and Accuracy. I conclude with some recommendations for school-based data teams. In the second essay, I build on Nelson Goodman and Catherine Z. Elgin's analyses of exemplification to argue that averages (i.e., medians and means) are attenuated, moderate, and sometimes fictive exemplars. As such, medians and means lend themselves to scientific objectivity. In the third essay, I use Goodman's theory of counterfactuals and Carl Hempel's theory of explanation to articulate why data teams should make statistical inferences to infinite populations that include possible but not actual students. Data teams are generally concerned that their results are explainable by random chance. Random chance, as an explanation, implies lawlike generalizations, which in turn imply counterfactual claims about possible but not actual subjects.
By statistically inferring to an infinite population of students, data teams can evaluate those counterfactual claims in order to assess the plausibility of random chance as an explanation for their findings.
Application of fuzzy sets in data-to-text system
This PhD dissertation addresses the convergence of two distinct paradigms: fuzzy sets and natural language generation. The object of study is the integration of fuzzy set-derived techniques that model imprecision and uncertainty in human language into systems that generate textual information from numeric data, commonly known as data-to-text systems. This dissertation covers an extensive state-of-the-art review, potential convergence points, two real data-to-text applications that integrate fuzzy sets (in the meteorology and learning analytics domains), and a model that encompasses the most relevant elements in the linguistic description of data discipline and provides a framework for building and integrating fuzzy set-based approaches into natural language generation/data-to-text systems.
Making effective use of healthcare data using data-to-text technology
Healthcare organizations are in a continuous effort to improve health
outcomes, reduce costs and enhance patient experience of care. Data is
essential to measure and help achieve these improvements in healthcare
delivery. Consequently, a data influx from various clinical, financial and
operational sources is now inundating healthcare organizations and their
patients. The effective use of this data, however, is a major challenge.
Clearly, text is an important medium to make data accessible. Financial reports
are produced to assess healthcare organizations on some key performance
indicators to steer their healthcare delivery. Similarly, at a clinical level,
data on patient status is conveyed by means of textual descriptions to
facilitate patient review, shift handover and care transitions. Likewise,
patients are informed about data on their health status and treatments via
text, in the form of reports or via ehealth platforms by their doctors.
Unfortunately, such text is the outcome of a highly labour-intensive process if
it is done by healthcare professionals. It is also prone to incompleteness and
subjectivity, and hard to scale up to different domains, wider audiences and
varying communication purposes. Data-to-text is a recent breakthrough
technology in artificial intelligence which automatically generates natural
language in the form of text or speech from data. This chapter provides a
survey of data-to-text technology, with a focus on how it can be deployed in a
healthcare setting. It will (1) give an up-to-date synthesis of data-to-text
approaches, (2) give a categorized overview of use cases in healthcare, (3)
seek to make a strong case for evaluating and implementing data-to-text in a
healthcare setting, and (4) highlight recent research challenges. Comment: 27 pages, 2 figures, book chapter.
- …