19 research outputs found

Creating Training Corpora for NLG Micro-Planning

Author: Gardent Claire
Narayan Shashi
Perez-Beltrachini Laura
Shimorina Anastasia
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2017

In this paper, we focus on how to create data-to-text corpora which can support the learning of wide-coverage micro-planners, i.e., generation systems that handle lexicalisation, aggregation, surface realisation, sentence segmentation and referring expression generation. We start by reviewing common practice in designing training benchmarks for Natural Language Generation. We then present a novel framework for semi-automatically creating linguistically challenging NLG corpora from existing Knowledge Bases. We apply our framework to DBpedia data and compare the resulting dataset with the dataset of Wen et al. (2016). We show that while the dataset of Wen et al. (2016) is more than twice as large as ours, it is less diverse both in terms of input and in terms of text. We thus propose our corpus generation framework as a novel method for creating challenging data sets from which NLG models capable of generating text from KB data can be learned.

Crossref

INRIA a CCSD electronic archive server

Edinburgh Research Explorer

Recommended from our members

Learning meaning representations for text generation with deep generative models

Author: Cao Kris
Publication venue: University of Cambridge
Publication date: 19/09/2019

This thesis explores conditioning a language generation model with auxiliary variables. By doing so, we hope to be able to better control the output of the language generator. We explore several kinds of auxiliary variables in this thesis, from unstructured continuous, to discrete, to structured discrete auxiliary variables, and evaluate their advantages and disadvantages. We consider three primary axes of variation: how interpretable the auxiliary variables are, how much control they provide over the generated text, and whether the variables can be induced from unlabelled data. The latter consideration is particularly interesting: if we can show that induced latent variables correspond to the semantics of the generated utterance, then by manipulating the variables, we have fine-grained control over the meaning of the generated utterance, thereby learning simple meaning representations for text generation. We investigate three language generation tasks: open domain conversational response generation, sentence generation from a semantic topic, and generating surface form realisations of meaning representations. We use a different type of auxiliary variable for each task, describe the reasons for choosing that type of variable, and critically discuss how much the task benefited from an auxiliary variable decomposition. All of the models that we use combine a high-level graphical model with a neural language model text generator. The graphical model lets us specify the structure of the text generating process, while the neural text generator can learn how to generate fluent text from a large corpus of examples. We aim to show the utility of such deep generative models of text for text generation in the following work.

Apollo (Cambridge)

Gidaje: The socio-cultural morphology of Hausa living spaces

Author: Muhammad-Oumar A.A.
Publication venue: 'Queen Mary University of London'
Publication date: 01/01/1997

Hausa architecture is an important part of African indigenous architecture. In many respects its construction techniques, its wall decoration and its structural forms have been recognised as unique. Most of the Hausa architecture studied has been in the form of palaces, mosques and a few houses of the affluent, merchants and administrators. However, the bulk of the Hausa built environment is, and for long has been, composed of ordinary domestic houses that accommodate the citizens of its cities and hamlets. This work deals with Hausa architecture as found in the older parts of a major Hausa urban centre; to wit, the walled city of Kano. The Kano built environment is composed of several forms of architecture, but the main concern here is specifically with the Hausa domestic architecture in the walled city of Kano. The study is informed by the theoretical proposition that a correlation exists between the spatial organisation of the domestic house and the social life of its inhabitants; consequently, changes in one result in changes in the other and vice versa. The study has four main objectives: to establish the basic characteristics of Hausa domestic architecture, i.e. its dominant spatial themes; to show how the resulting domestic environment is supportive of the Hausa-Islamic culture; to examine the cultural impact of colonialism on the concept of the dwelling unit and, by extension, on the culture of the Hausa; and to broaden the data base of an indigenous knowledge system in the field of architecture.
The principal findings of the work are: that Hausa domestic architecture as found in the walled city is conceptually of two broad types; that the design concept of these types is rooted in the Hausa socio-cultural paradigm; that the design concept is flexible enough to cater for the subcultural elements that are the hallmarks of any Hausa society; that the changes in the political, economic and social fabric of the Hausa society in its recent history have had very little effect on the spatial quality of Hausa domestic architecture.

UCL Discovery

Handbook of Lexical Functional Grammar

Publication date: 01/01/2023

Lexical Functional Grammar (LFG) is a nontransformational theory of linguistic structure, first developed in the 1970s by Joan Bresnan and Ronald M. Kaplan, which assumes that language is best described and modeled by parallel structures representing different facets of linguistic organization and information, related by means of functional correspondences. This volume has seven parts. Part I, Overview and Introduction, provides an introduction to core syntactic concepts and representations. Part II, Grammatical Phenomena, reviews LFG work on a range of grammatical phenomena or constructions. Part III, Grammatical modules and interfaces, provides an overview of LFG work on semantics, argument structure, prosody, information structure, and morphology. Part IV, Linguistic disciplines, reviews LFG work in the disciplines of historical linguistics, learnability, psycholinguistics, and second language learning. Part V, Formal and computational issues and applications, provides an overview of computational and formal properties of the theory, implementations, and computational work on parsing, translation, grammar induction, and treebanks. Part VI, Language families and regions, reviews LFG work on languages spoken in particular geographical areas or in particular language families. The final section, Comparing LFG with other linguistic theories, discusses LFG work in relation to other theoretical approaches.

Institutional Repository of the Freie Universität Berlin

Experimental phonetic study of the timing of voicing in English obstruents

Author: Docherty Gerard James
Publication venue: University of Edinburgh
Publication date: 01/01/1989

The treatment given to the timing of voicing in three areas of phonetic research (phonetic taxonomy, speech production modelling, and speech synthesis) is considered in the light of an acoustic study of the timing of voicing in British English obstruents. In each case, it is found to be deficient. The underlying cause is the difficulty in applying a rigid segmental approach to an aspect of speech production characterised by important inter-articulator asynchronies, coupled to the limited quantitative data available concerning the systematic properties of the timing of voicing in languages. It is argued that the categories and labels used to describe the timing of voicing in obstruents are inadequate for fulfilling the descriptive goals of phonetic theory. One possible alternative descriptive strategy is proposed, based on incorporating aspects of the parametric organisation of speech into the descriptive framework. Within the domain of speech production modelling, no satisfactory account has been given of fine-grained variability of the timing of voicing not capable of explanation in terms of general properties of motor programming and utterance execution. The experimental results support claims in the literature that the phonetic control of an utterance may be somewhat less abstract than has been suggested in some previous reports. A schematic outline is given of one way in which the timing of voicing could be controlled in speech production. The success of a speech synthesis-by-rule system depends to a great extent on a comprehensive encoding of the systematic phonetic characteristics of the target language. Only limited success has been achieved in the past thirty years. A set of rules is proposed for generating more naturalistic patterns of voicing in obstruents, reflecting those observed in the experimental component of this study. Consideration is given to strategies for evaluating the effect of fine-grained phonetic rules in speech synthesis.

Edinburgh Research Archive

OpenGrey Repository

The Semantic Prosody of Natural Phenomena in the Qur’an: A Corpus-Based Study

Author: Alshahrani Hala Jamal Ali
Publication venue: University of Leeds
Publication date: 01/01/2020

This thesis explores the Semantic Prosody (SP) of natural phenomena in the Qur’an and five of its prominent English translations [Pickthall (1930), Yusuf Ali (1939/ revised edition 1987), Arberry (1957), Saheeh International (1997), and Abdel Haleem (2004)]. SP, scarcely explored in Qur’anic research, is defined as ‘a form of meaning established through the proximity of a consistent series of collocates’ (Louw 2000, p.50). Theoretically, it is both an evaluative prosody (i.e., lexical items collocating with semantic word classes that are positive, negative, or neutral) and a discourse prosody (i.e., having a communicative purpose). Given the stylistic uniqueness of the Qur’an and considering that SP can be examined empirically via corpora, the present study explores the SP of 154 words associated with nature referenced throughout the Qur’an using Corpus Linguistics techniques. Firstly, the Python-based Natural Language Toolkit was used for the following: to define nature terms via WordNet; to disambiguate their variant forms with stemmers; and to compute their frequencies. Once frequencies were found, a quantitative analysis using Evert’s (2008) five-step statistical analysis was implemented on the 30 most frequent terms to investigate their collocations and SPs. Following this, a qualitative analysis was conducted as per the Extended Lexical Unit via concordance to analyse collocations and the Lexical-Functional Grammar to find the variation of meanings produced by lexico-grammatical patterns. Finally, the resulting datasets were aligned to evaluate their congruency with the Qur’an. Findings of this research confirm that words referring to nature in the Qur’an do have semantic prosody. For example, astronomical bodies are primed to occur in predominantly positive collocations referring to glorifying God, while weather phenomena occur in negative ones referring to Day of Judgment calamities.
In addition, results show that Abdel-Haleem’s translation can be considered the most congruent. This research develops an approach to explore themes (e.g., nature) via SP analysis in texts and their translations and provides several linguistic resources that can be used for future corpus-based studies on the language of the Qur’an.

White Rose E-theses Online

Empirical modelling of translation and interpreting


Empirical research is carried out in a cyclic way: approaching a research area bottom-up, data lead to interpretations and ideally to the abstraction of laws, on the basis of which a theory can be derived. Deductive research is based on a theory, on the basis of which hypotheses can be formulated and tested against the background of empirical data. Looking at the state of the art in translation studies, either theories and models are designed or empirical data are collected and interpreted. However, the final step is still lacking: so far, empirical data have not led to the formulation of theories or models, whereas existing theories and models have not yet been comprehensively tested with empirical methods. This publication addresses these issues from several perspectives: multi-method product- as well as process-based research may gain insights into translation as well as interpreting phenomena. These phenomena may include cognitive and organizational processes, procedures and strategies, competence and performance, translation properties and universals, etc. Empirical findings about the deeper structures of translation and interpreting will reduce the gap between translation and interpreting practice and model and theory building. Furthermore, the availability of more large-scale empirical testing triggers the development of models and theories concerning translation and interpreting phenomena and behavior based on quantifiable, replicable and transparent data.

OAPEN Library

Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020

Publication venue: 'OpenEdition'
Publication date: 01/07/2022

On behalf of the Program Committee, a very warm welcome to the Seventh Italian Conference on Computational Linguistics (CLiC-it 2020). This edition of the conference is held in Bologna and organised by the University of Bologna. The CLiC-it conference series is an initiative of the Italian Association for Computational Linguistics (AILC) which, after six years of activity, has clearly established itself as the premier national forum for research and development in the fields of Computational Linguistics and Natural Language Processing, where leading researchers and practitioners from academia and industry meet to share their research results, experiences, and challenges.

Directory of Open Access Books (DOAB)

Dynamic Analysis of Automatic Emotion Recognition Using Generalized Additive Mixed Models

Author: Bolster Andrew
Booth Adam
Dupre Damien
McKeown Gary
Morrison Gawain
Publication date: 18/04/2017

Queen's University Belfast Research Portal

Tagungsband der 12. Tagung Phonetik und Phonologie im deutschsprachigen Raum

Publication date: 01/10/2016