
    Domain transfer for deep natural language generation from abstract meaning representations

    Stochastic natural language generation systems that are trained from labelled datasets are often domain-specific in their annotation and in their mapping from semantic input representations to lexical-syntactic outputs. As a result, learnt models fail to generalize across domains, heavily restricting their usability beyond single applications. In this article, we focus on the problem of domain adaptation for natural language generation. We show how linguistic knowledge from a source domain, for which labelled data is available, can be adapted to a target domain by reusing training data across domains. As the key to this, we propose to employ abstract meaning representations as a common semantic representation across domains. We model natural language generation as a long short-term memory recurrent neural network encoder-decoder, in which one recurrent neural network learns a latent representation of the semantic input, and a second recurrent neural network learns to decode it into a sequence of words. We show that the learnt representations can be transferred across domains and can be leveraged effectively to improve training on new, unseen domains. Experiments in three different domains and with six datasets demonstrate that the lexical-syntactic constructions learnt in one domain can be transferred to new domains, achieving up to 75-100% of the performance of in-domain training, as measured both by objective metrics such as BLEU and semantic error rate and by a subjective human rating study. Training a policy with prior knowledge from a different domain is consistently better than pure in-domain training, by up to 10%.
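    The article models generation as an LSTM encoder-decoder. As an illustration only, here is a minimal sketch of that pattern in PyTorch; the framework, dimensions, and vocabulary sizes are assumptions made for the sketch, not details taken from the article.

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Minimal LSTM encoder-decoder: one RNN encodes the linearised
    semantic input into a latent state, a second RNN decodes words."""
    def __init__(self, src_vocab, tgt_vocab, emb_dim=128, hid_dim=256):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb_dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.decoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        # Encode the semantic input; keep only the final (h, c) state.
        _, state = self.encoder(self.src_emb(src_ids))
        # Decode conditioned on that latent state (teacher forcing).
        dec_out, _ = self.decoder(self.tgt_emb(tgt_ids), state)
        return self.out(dec_out)  # logits over the word vocabulary

# Toy usage: batch of 2, source length 5, target length 7.
model = Seq2Seq(src_vocab=1000, tgt_vocab=1200)
logits = model(torch.randint(0, 1000, (2, 5)), torch.randint(0, 1200, (2, 7)))
print(logits.shape)  # torch.Size([2, 7, 1200])
```

    In this setting, domain transfer amounts to training such a model on source-domain data and continuing training on the target domain, so that the learnt latent representations are reused rather than learnt from scratch.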

    A Domain Agnostic Approach to Verbalizing n-ary Events without Parallel Corpora

    We present a method for automatically generating descriptions of biological events encoded in the KB Bio 101 knowledge base. We evaluate our approach on a corpus of 336 event descriptions, provide a qualitative and quantitative analysis of the results obtained, and discuss possible directions for further work.

    A Hybrid Approach To Generating from the KBGen Knowledge-Base

    This abstract describes a contribution to the 2013 KBGen Challenge from CNRS/LORIA and the University of Lorraine. Our contribution focuses on an attempt to automate the extraction of a Feature-Based Tree Adjoining Grammar equipped with a unification-based compositional semantics, which can be used to generate from the KBGen data.
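    The grammar formalism named here rests on feature unification. Purely as a hypothetical illustration of that operation (not the authors' implementation), the sketch below unifies two flat feature structures represented as Python dicts and fails on a feature clash.

```python
def unify(fs1: dict, fs2: dict):
    """Naive unification of two flat feature structures (dicts).
    Returns the merged structure, or None on a feature clash."""
    merged = dict(fs1)
    for feat, val in fs2.items():
        if feat in merged and merged[feat] != val:
            return None  # clash: incompatible values for the same feature
        merged[feat] = val
    return merged

# Hypothetical agreement check between two tree nodes.
print(unify({"cat": "NP", "num": "sg"}, {"cat": "NP", "case": "nom"}))
# -> {'cat': 'NP', 'num': 'sg', 'case': 'nom'}
print(unify({"num": "sg"}, {"num": "pl"}))  # -> None
```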

    A Study of Automatic Metrics for the Evaluation of Natural Language Explanations

    As transparency becomes key for robotics and AI, it will be necessary to evaluate the methods through which transparency is provided, including automatically generated natural language (NL) explanations. Here, we explore parallels between the generation of such explanations and the much-studied field of evaluation of Natural Language Generation (NLG). Specifically, we investigate which of the NLG evaluation measures map well to explanations. We present the ExBAN corpus: a crowd-sourced corpus of NL explanations for Bayesian Networks. We run correlations comparing human subjective ratings with automatic NLG measures. We find that embedding-based automatic NLG evaluation methods, such as BERTScore and BLEURT, have a higher correlation with human ratings than word-overlap metrics, such as BLEU and ROUGE. This work has implications for Explainable AI and transparent robotic and autonomous systems. (Accepted at EACL 2021.)
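    A minimal sketch of how such a metric-human correlation study can be run, assuming the sacrebleu, bert-score, and scipy packages; the explanations, references, and ratings below are invented for illustration and are not from the ExBAN corpus.

```python
# pip install sacrebleu bert-score scipy
from sacrebleu import sentence_bleu
from bert_score import score as bert_score
from scipy.stats import spearmanr

# Invented mini-corpus: generated explanations, references, human ratings.
explanations = [
    "The alarm is likely because a burglary raises its probability.",
    "Rain makes the grass wet.",
    "Smoking increases the chance of cancer in this network.",
]
references = [
    "The alarm probably sounds because a burglary makes it more likely.",
    "The grass is wet because it rained.",
    "In this network, smoking raises the probability of cancer.",
]
human_ratings = [4.5, 3.0, 4.0]  # e.g. mean subjective clarity scores

# Word-overlap metric: sentence-level BLEU.
bleu = [sentence_bleu(h, [r]).score for h, r in zip(explanations, references)]

# Embedding-based metric: BERTScore F1.
_, _, f1 = bert_score(explanations, references, lang="en")

# Correlate each metric with the human ratings (Spearman's rho).
rho_bleu, _ = spearmanr(bleu, human_ratings)
rho_bert, _ = spearmanr(f1.tolist(), human_ratings)
print(f"BLEU vs human: {rho_bleu:.2f}, BERTScore vs human: {rho_bert:.2f}")
```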

    The KBGen Challenge

    Given a preselected set of relations extracted from the AURA knowledge base on biology, the KBGen Task consisted in generating a sentence verbalising these relations. Three teams submitted the results of their systems. The systems were compared using both automatic metrics (BLEU, NIST) and subjective ratings by 12 human users along three dimensions, namely fluency, grammaticality, and meaning similarity. In this report, we summarise the KBGen Task, the evaluation methods, and the results obtained.
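    For reference, both automatic metrics used in the challenge are available in NLTK. A small sketch with an invented system output and reference (not actual KBGen data):

```python
# pip install nltk
from nltk.translate.bleu_score import sentence_bleu
from nltk.translate.nist_score import sentence_nist

# Invented system output and reference verbalisation of biology relations.
hypothesis = "the ribosome synthesises a protein during translation".split()
reference = "during translation , the ribosome synthesises a protein".split()

# Both metrics expect a list of tokenised references plus one hypothesis.
print("BLEU:", sentence_bleu([reference], hypothesis))
print("NIST:", sentence_nist([reference], hypothesis, n=4))
```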

    SimpleNLG-IT: adapting SimpleNLG to Italian


    Mind the Labels: Describing Relations in Knowledge Graphs With Pretrained Models

    Pretrained language models (PLMs) for data-to-text (D2T) generation can use human-readable data labels such as column headings, keys, or relation names to generalize to out-of-domain examples. However, these models are known to produce semantically inaccurate outputs if the labels are ambiguous or incomplete, which is often the case in D2T datasets. In this paper, we expose this issue on the task of describing a relation between two entities. For our experiments, we collect a novel dataset for verbalizing a diverse set of 1,522 unique relations from three large-scale knowledge graphs (Wikidata, DBpedia, YAGO). We find that although PLMs for D2T generation expectedly fail on unclear cases, models trained with a large variety of relation labels are surprisingly robust in verbalizing novel, unseen relations. We argue that using data with a diverse set of clear and meaningful labels is key to training D2T generation systems capable of generalizing to novel domains. (Long paper at EACL '23. Code and data: https://github.com/kasnerz/rel2tex)
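    A minimal sketch of the relation-verbalisation setup, using the Hugging Face transformers library; the model name, prompt format, and triple are illustrative assumptions, not the paper's exact configuration (which is in the linked repository).

```python
# pip install transformers sentencepiece torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Stand-in seq2seq PLM; the paper's actual models and data are in its repo.
tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-small")

# A triple with a clear, human-readable relation label, as the paper
# argues is key for generalising to unseen relations.
subj, rel, obj = "Marie Curie", "award received", "Nobel Prize in Physics"
prompt = f"Describe the relation: {subj} | {rel} | {obj}"

inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```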

    Grammars for generating isiXhosa and isiZulu weather bulletin verbs

    The Met Office has investigated the use of natural language generation (NLG) technologies to streamline the production of weather forecasts. Their approach would be of great benefit in South Africa, where there is no fast, large-scale producer, automated or otherwise, of textual weather summaries for Nguni languages. This is due to, among other things, the complexity of Nguni languages: their structure is very different from that of Indo-European languages, so existing technologies developed for the latter group cannot be reused. Traditional NLG techniques such as templates are not compatible with 'Bantu' languages, and existing works that document scaled-down 'Bantu' language grammars are also not sufficient to generate weather text. To generate weather text in isiXhosa and isiZulu, we restricted our text to verbs only in order to ensure a manageable scope. In particular, we developed a corpus of weather sentences in order to determine verb features. We then created context-free verbal grammar rules using an incremental approach; the quality of these rules was evaluated by two linguists. We then investigated the grammatical similarity of isiZulu verbs with their isiXhosa counterparts, and the extent to which a single merged set of grammar rules can produce correct verbs for both languages. The similarity analysis of the two languages was done through the developed rules' parse trees, and by applying binary similarity measures to the sets of verbs generated by the rules. The parse trees show that the differences between the verbs' components are minor, and the similarity measures indicate that the verb sets are at most 59.5% similar (Driver-Kroeber metric). We also examined the importance of the phonological conditioning process by developing functions that calculate the ratio of verbs requiring conditioning out of the total strings that can be generated. We found that phonological conditioning affects at least 45% of strings for isiXhosa and at least 67% of strings for isiZulu, depending on the type of verb root used. Overall, this work shows that the differences between isiXhosa and isiZulu verbs are minor. However, exploiting these similarities to create a unified rule set for both languages cannot be achieved without significant maintainability compromises, because dependencies between the verbs' 'modules' exist in one language but not the other. Furthermore, the phonological conditioning process should be implemented in order to improve the generated text, given the high ratio of verbs it affects.
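    For binary (set) data, the Driver-Kroeber metric cited above is equivalent to the Ochiai coefficient, |A ∩ B| / sqrt(|A| · |B|). A minimal sketch with invented verb sets (not the study's actual generated verbs):

```python
from math import sqrt

def driver_kroeber(a: set, b: set) -> float:
    """Driver-Kroeber (Ochiai) similarity: |A & B| / sqrt(|A| * |B|)."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))

# Invented verb surface forms generated by each language's grammar rules.
isixhosa_verbs = {"uyahamba", "bahamba", "sihamba", "uhambile"}
isizulu_verbs = {"uyahamba", "bahamba", "sihamba", "uhambe"}

print(f"{driver_kroeber(isixhosa_verbs, isizulu_verbs):.3f}")  # 0.750
```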