59 research outputs found
An Ensemble method for Content Selection for Data-to-text Systems
We present a novel approach for automatic report generation from time-series
data, in the context of student feedback generation. Our proposed methodology
treats content selection as a multi-label classification (MLC) problem, which
takes as input time-series data (students' learning data) and outputs a summary
of these data (feedback). Unlike previous work, this method considers all data
simultaneously using ensembles of classifiers, and therefore, it achieves
higher accuracy and F-score compared to meaningful baselines. Comment: 3 pages, 2 figures, 1st International Workshop on Data-to-text
Generation
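The multi-label formulation above can be sketched with scikit-learn; the features, labels, and ensemble choice below are invented stand-ins for illustration, not the paper's actual setup:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.multioutput import MultiOutputClassifier

rng = np.random.default_rng(0)

# Toy stand-in for summarised time-series learning data: each row
# describes one student (e.g. mean mark, trend, attendance).
X = rng.normal(size=(200, 6))

# Each output label = "mention this content item in the feedback?".
# Labels are derived jointly from the features so they are learnable.
Y = (X @ rng.normal(size=(6, 4)) > 0).astype(int)

# An ensemble of trees wrapped as a multi-label classifier: all content
# selection decisions are made simultaneously from the same input.
clf = MultiOutputClassifier(
    RandomForestClassifier(n_estimators=50, random_state=0)
)
clf.fit(X[:150], Y[:150])

selected = clf.predict(X[150:])   # one 0/1 decision per content item
print(selected.shape)             # (50, 4)
```

In practice the features would be hand-engineered trends extracted from the raw time series, and the label set would be the inventory of possible feedback messages.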
Natural Language Generation enhances human decision-making with uncertain information
Decision-making is often dependent on uncertain data, e.g. data associated
with confidence scores or probabilities. We present a comparison of different
information presentations for uncertain data and, for the first time, measure
their effects on human decision-making. We show that the use of Natural
Language Generation (NLG) improves decision-making under uncertainty, compared
to state-of-the-art graphics-based representation methods. In a task-based
study with 442 adults, we found that presentations using NLG lead to 24% better
decision-making on average than the graphical presentations, and to 44% better
decision-making when NLG is combined with graphics. We also show that women
achieve significantly better results when presented with NLG output (an 87%
increase on average compared to graphical presentations). Comment: 54th annual meeting of the Association for Computational Linguistics
(ACL), Berlin 2016
Data-driven approaches to content selection for data-to-text generation
Data-to-text systems are powerful in generating reports from data automatically and
thus they simplify the presentation of complex data. Rather than presenting data using
visualisation techniques, data-to-text systems use human language, which is the most
common way for human-human communication. In addition, data-to-text systems can
adapt their output content to users’ preferences, background or interests and therefore
they can be pleasant for users to interact with. Content selection is an important part
of every data-to-text system, because it is the module that decides which of the
available information should be conveyed to the user.
This thesis makes three important contributions. Firstly, it investigates data-driven
approaches to content selection with respect to users’ preferences. It develops, compares
and evaluates two novel content selection methods. The first method treats content
selection as a Markov Decision Process (MDP), where the content selection decisions
are made sequentially, i.e. given the already chosen content, decide what to talk about
next. The MDP is solved using Reinforcement Learning (RL) and is optimised with
respect to a cumulative reward function. The second approach considers all content
selection decisions simultaneously by taking into account data relationships and treats
content selection as a multi-label classification task. The evaluation shows that the users
significantly prefer the output produced by the RL framework, whereas the multi-label
classification approach scores significantly higher than the RL method in automatic
metrics. The results also show that the end users’ preferences should be taken into
account when developing Natural Language Generation (NLG) systems.
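As an illustration of the MDP framing, here is a minimal tabular Q-learning sketch in which content selection decisions are made sequentially, each conditioned on the already chosen content; the content items and reward function are hypothetical, not those used in the thesis:

```python
import random

random.seed(0)

ITEMS = ["marks", "attendance", "deadlines"]   # invented content items
GAMMA, ALPHA, EPS = 0.9, 0.5, 0.1

def reward(chosen, item):
    # Hypothetical reward: each item is worth mentioning once,
    # "marks" slightly more than the rest; repeats are penalised.
    if item in chosen:
        return -1.0
    return 2.0 if item == "marks" else 1.0

Q = {}  # (frozenset of already-chosen items, candidate item) -> value

for _ in range(2000):                      # Q-learning episodes
    chosen = frozenset()
    for _ in range(len(ITEMS)):            # one episode = one summary
        if random.random() < EPS:          # epsilon-greedy exploration
            a = random.choice(ITEMS)
        else:
            a = max(ITEMS, key=lambda i: Q.get((chosen, i), 0.0))
        r = reward(chosen, a)
        nxt = chosen | {a}
        target = r + GAMMA * max(Q.get((nxt, i), 0.0) for i in ITEMS)
        old = Q.get((chosen, a), 0.0)
        Q[(chosen, a)] = old + ALPHA * (target - old)
        chosen = nxt

# Greedy rollout: the learned sequential content selection policy.
chosen, plan = frozenset(), []
for _ in range(len(ITEMS)):
    a = max(ITEMS, key=lambda i: Q.get((chosen, i), 0.0))
    plan.append(a)
    chosen = chosen | {a}
print(plan)
```

The state is the set of content already selected, the action is the next item to talk about, and the policy optimises the cumulative discounted reward, mirroring the "given the already chosen content, decide what to talk about next" formulation.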
NLG systems are developed with the assistance of domain experts; however, the end
users are normally non-experts. Consider, for instance, a student feedback generation
system, where the system imitates the teachers. The system will produce feedback based
on the lecturers’ rather than the students’ preferences, although students are the end
users. Therefore, the second contribution of this thesis is an approach that adapts the
content to “speakers” and “hearers” simultaneously. It considers initially two types of
known stakeholders: lecturers and students. It develops a novel approach that analyses
the preferences of the two groups using Principal Component Regression and uses the derived knowledge to hand-craft a reward function that is then optimised using RL.
The results show that the end users prefer the output generated by this system, rather
than the output that is generated by a system that mimics the experts. Therefore, it is
possible to model the middle ground of the preferences of different known stakeholders.
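A minimal sketch of the Principal Component Regression step, assuming invented summaries and ratings and a simple equal-weight combination of the two stakeholder groups (the thesis hand-crafts its reward function from the derived knowledge rather than averaging predictions directly):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(1)

# Invented data: each row encodes which content items a feedback
# summary contained (0/1); the ratings are synthetic stand-ins for
# how lecturers and students scored each summary.
summaries = rng.integers(0, 2, size=(100, 5)).astype(float)
lecturer_ratings = summaries @ np.array([2., 1., 0., 0., 1.]) \
    + rng.normal(0, 0.1, 100)
student_ratings = summaries @ np.array([0., 1., 2., 1., 0.]) \
    + rng.normal(0, 0.1, 100)

# Principal Component Regression: project the summaries onto their
# principal components, then regress each group's ratings on them.
pcr_lect = make_pipeline(PCA(n_components=3), LinearRegression())
pcr_lect.fit(summaries, lecturer_ratings)
pcr_stud = make_pipeline(PCA(n_components=3), LinearRegression())
pcr_stud.fit(summaries, student_ratings)

def reward(summary):
    """Reward balancing both stakeholder groups (equal weights here)."""
    s = np.asarray(summary, float).reshape(1, -1)
    return 0.5 * pcr_lect.predict(s)[0] + 0.5 * pcr_stud.predict(s)[0]

print(reward([1, 1, 1, 1, 1]) > reward([0, 0, 0, 0, 0]))
```

A reward of this shape can then be plugged into the RL content selection loop, so the learned policy sits in the middle ground between the two groups' preferences.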
In most real-world applications, however, first-time users are generally unknown,
which is a common problem for NLG and interactive systems: the system cannot adapt
to user preferences without prior knowledge. This thesis contributes a novel framework
for addressing unknown stakeholders, such as first-time users, using Multi-objective
Optimisation to minimise regret for multiple possible user types. In this framework, the
content preferences of potential users are modelled as objective functions, which are
simultaneously optimised using Multi-objective Optimisation. This approach outperforms
two meaningful baselines and minimises regret for unknown users.
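The regret-minimisation idea can be illustrated as follows; the content items, user types, and utilities are invented, and exhaustive enumeration stands in for a proper multi-objective optimiser:

```python
import itertools

ITEMS = ["marks", "attendance", "deadlines", "study_tips"]  # invented

# Hypothetical per-user-type utility of mentioning each content item.
UTILITIES = {
    "lecturer": {"marks": 3, "attendance": 2, "deadlines": 1, "study_tips": 0},
    "student":  {"marks": 1, "attendance": 0, "deadlines": 2, "study_tips": 3},
}

def value(plan, user):
    # Each user type's content preferences act as one objective function.
    return sum(UTILITIES[user][item] for item in plan)

# Candidate content plans: every selection of exactly two items.
plans = [set(c) for c in itertools.combinations(ITEMS, 2)]

# Regret of a plan for a user = best achievable value minus its value;
# a robust plan minimises the worst-case regret over unknown user types.
best = {u: max(value(p, u) for p in plans) for u in UTILITIES}

def max_regret(plan):
    return max(best[u] - value(plan, u) for u in UTILITIES)

robust = min(plans, key=max_regret)
print(sorted(robust), max_regret(robust))
```

Neither user type gets its favourite plan, but no possible user is left far from their optimum, which is exactly the guarantee wanted when the first-time user's type is unknown.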
The REAL corpus: A crowd-sourced Corpus of human generated and evaluated spatial references to real-world urban scenes
We present a newly crowd-sourced data set of natural language references to objects anchored in complex urban scenes (in short: the REAL corpus – Referring Expressions Anchored Language). The REAL corpus contains a collection of images of real-world urban scenes together with verbal descriptions of target objects generated by humans, paired with data on how successfully other people were able to identify the same objects based on these descriptions. In total, the corpus contains 32 images with, on average, 27 descriptions per image and 3 verifications per description. In addition, the corpus is annotated with a variety of linguistically motivated features. The paper highlights issues posed by collecting data via crowd-sourcing with an unrestricted input format, as well as by using real-world urban scenes. The corpus will be released via the ELRA repository as part of this submission.
Generating Unambiguous and Diverse Referring Expressions
Neural Referring Expression Generation (REG) models have shown promising results in generating expressions which uniquely describe visual objects. However, current REG models still lack the ability to produce diverse and unambiguous referring expressions (REs). To address the lack of diversity, we propose generating a set of diverse REs, rather than one-shot REs. To reduce the ambiguity of referring expressions, we directly optimise non-differentiable test metrics using reinforcement learning (RL), and we show that our approaches achieve better results under multiple different settings. Specifically, we first present a novel RL approach to REG training which, instead of drawing one sample per input, averages over multiple samples to normalise the reward during RL training. Secondly, we present an innovative REG model that utilises an object attention mechanism that explicitly incorporates information about the target object and is optimised using our proposed RL approach. Thirdly, we propose a novel transformer model optimised with RL that exploits different levels of visual information. Our human evaluation demonstrates the effectiveness of this model, which improves the state-of-the-art results on RefCOCO testA and testB in terms of task success, as well as on RefCOCO+ testA. Finally, we present a thorough comparison of diverse decoding strategies (sampling- and maximisation-based) and how they control the trade-off between quality and diversity.
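The multi-sample reward normalisation idea (the first contribution above) can be sketched as follows, with toy stand-ins for the REG sampler and the non-differentiable task metric:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_expressions(k):
    # Stand-in for drawing k referring expressions from a REG model;
    # here each "expression" is just a random integer id.
    return rng.integers(0, 10, size=k)

def task_reward(expr):
    # Stand-in for a non-differentiable test metric, e.g. whether a
    # listener resolves the expression to the correct object (0/1).
    return float(expr >= 7)

# Instead of one sample per input, draw k samples and use their mean
# reward as a baseline: each sample's advantage is its reward relative
# to its siblings, giving a zero-mean, lower-variance learning signal.
k = 8
exprs = sample_expressions(k)
rewards = np.array([task_reward(e) for e in exprs])
advantages = rewards - rewards.mean()   # sums to 0 by construction

print(advantages.sum())
```

In actual REG training, each advantage would weight the log-likelihood gradient of its sampled expression, so samples better than their siblings are reinforced and worse ones are suppressed.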
Data-to-Text Generation Improves Decision-Making Under Uncertainty
Decision-making is often dependent on uncertain data, e.g. data associated with confidence scores or probabilities. This article presents a comparison of different information presentations for uncertain data and, for the first time, measures their effects on human decision-making, in the domain of weather forecast generation. We use a game-based setup to evaluate the different systems. We show that the use of Natural Language Generation (NLG) enhances decision-making under uncertainty, compared to state-of-the-art graphics-based representation methods. In a task-based study with 442 adults, we found that presentations using NLG led to 24% better decision-making on average than the graphical presentations, and to 44% better decision-making when NLG is combined with graphics. We also show that women achieve significantly better results when presented with NLG output (an 87% increase on average compared to graphical presentations). Finally, we present a further analysis of demographic data and its impact on decision-making, and we discuss implications for future NLG systems.
Proceedings of the Workshop on NLG for Human–Robot Interaction
Foster ME, Buschmeier H, Gkatzia D, eds. Proceedings of the Workshop on NLG for Human–Robot Interaction. Stroudsburg, PA, USA: Association for Computational Linguistics; 2018.