Creating Training Corpora for NLG Micro-Planning
In this paper, we focus on how to create data-to-text corpora which can support the learning of wide-coverage micro-planners, i.e., generation systems that handle lexicalisation, aggregation, surface realisation, sentence segmentation and referring expression generation. We start by reviewing common practice in designing training benchmarks for Natural Language Generation. We then present a novel framework for semi-automatically creating linguistically challenging NLG corpora from existing Knowledge Bases. We apply our framework to DBpedia data and compare the resulting dataset with that of Wen et al. (2016). We show that while Wen et al.'s (2016) dataset is more than twice as large as ours, it is less diverse both in terms of input and in terms of text. We thus propose our corpus generation framework as a novel method for creating challenging datasets from which NLG models capable of generating text from KB data can be learned.
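The abstract does not say how input and text diversity were measured. As a rough illustration only, the Python sketch below shows two simple proxies one might use: the number of distinct property combinations across inputs (each input is assumed to be a list of DBpedia-style (subject, property, object) triples) and the type-token ratio of the texts. The function names and both measures are assumptions, not the paper's metrics.

```python
def input_diversity(inputs):
    """Count distinct property combinations across data inputs.

    Each input is assumed to be a list of (subject, property, object)
    triples; two inputs share a pattern if they use the same multiset
    of properties, regardless of the entities involved.
    """
    patterns = {tuple(sorted(prop for _, prop, _ in triples)) for triples in inputs}
    return len(patterns)


def type_token_ratio(texts):
    """Distinct lower-cased whitespace tokens divided by all tokens."""
    tokens = [tok.lower() for text in texts for tok in text.split()]
    return len(set(tokens)) / len(tokens) if tokens else 0.0
```

Raw type-token ratio falls as a corpus grows, so comparing two datasets of clearly different sizes would call for a size-normalised variant, for example computing the ratio on equal-sized samples.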
Learning meaning representations for text generation with deep generative models
This thesis explores conditioning a language generation model with auxiliary variables. By doing so, we hope to be able to better control the output of the language generator. We explore several kinds of auxiliary variables in this thesis, from unstructured continuous, to discrete, to structured discrete auxiliary variables, and evaluate their advantages and disadvantages. We consider three primary axes of variation: how interpretable the auxiliary variables are, how much control they provide over the generated text, and whether the variables can be induced from unlabelled data. The latter consideration is particularly interesting: if we can show that induced latent variables correspond to the semantics of the generated utterance, then by manipulating the variables, we have fine-grained control over the meaning of the generated utterance, thereby learning simple meaning representations for text generation.
We investigate three language generation tasks: open domain conversational response generation, sentence generation from a semantic topic, and generating surface form realisations of meaning representations. We use a different type of auxiliary variable for each task, describe the reasons for choosing that type of variable, and critically discuss how much the task benefited from an auxiliary variable decomposition. All of the models that we use combine a high-level graphical model with a neural language model text generator. The graphical model lets us specify the structure of the text generating process, while the neural text generator can learn how to generate fluent text from a large corpus of examples. We aim to show the utility of such deep generative models of text for text generation in the following work.
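One standard way to pair a latent-variable graphical model with a neural text generator is the variational autoencoder set-up below, where z is the auxiliary variable, x = (x_1, …, x_T) is the generated utterance, and an autoregressive neural language model serves as the decoder. This is the generic formulation, offered as a sketch; the individual models in the thesis may use other priors, variable types or inference schemes.

```latex
p_\theta(x, z) = p(z) \prod_{t=1}^{T} p_\theta(x_t \mid x_{<t}, z)

\log p_\theta(x) \;\ge\; \mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right]
  - \mathrm{KL}\!\left(q_\phi(z \mid x) \,\|\, p(z)\right)
```

Here q_φ(z | x) is an approximate posterior (an inference network). Maximising this lower bound lets z be induced from unlabelled text, which is the third axis of variation the abstract mentions.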
Gidaje: The socio-cultural morphology of Hausa living spaces
Hausa architecture is an important part of African indigenous architecture. In many respects its
construction techniques, its wall decoration and its structural forms have been recognised as
unique. Most of the Hausa architecture studied so far has taken the form of palaces, mosques and
a few houses of the affluent, merchants and administrators. However, the bulk of the Hausa built
environment is, and has long been, composed of ordinary domestic houses that accommodate
the citizens of its cities and hamlets.
This work deals with Hausa architecture as found in the older parts of a major Hausa urban centre,
namely the walled city of Kano. The Kano built environment comprises several forms of
architecture, but the main concern here is specifically Hausa domestic architecture within the
walled city. The study is informed by the theoretical proposition that a correlation exists
between the spatial organisation of the domestic house and the social life of its inhabitants;
consequently, a change in either one brings about a change in the other.
The study has four main objectives: to establish the basic characteristics of Hausa domestic
architecture, i.e. its dominant spatial themes; to show how the resulting domestic environment is
supportive of the Hausa-Islamic culture; to examine the cultural impact of colonialism on the
concept of the dwelling unit and, by extension, on the culture of the Hausa; and to broaden the
database of an indigenous knowledge system in the field of architecture.
The principal findings of the work are: that Hausa domestic architecture as found in the walled
city is conceptually of two broad types; that the design concept of these types is rooted in the
Hausa socio-cultural paradigm; that the design concept is flexible enough to cater for the subcultural
elements that are the hallmarks of any Hausa society; and that the changes in the political,
economic and social fabric of Hausa society in its recent history have had very little effect on
the spatial quality of Hausa domestic architecture.
Handbook of Lexical Functional Grammar
Lexical Functional Grammar (LFG) is a nontransformational theory of
linguistic structure, first developed in the 1970s by Joan Bresnan and
Ronald M. Kaplan, which assumes that language is best described and
modeled by parallel structures representing different facets of
linguistic organization and information, related by means of
functional correspondences. This volume is organised into six parts and a final section. Part I,
Overview and Introduction, provides an introduction to core syntactic
concepts and representations. Part II, Grammatical Phenomena, reviews
LFG work on a range of grammatical phenomena or constructions. Part
III, Grammatical modules and interfaces, provides an overview of LFG
work on semantics, argument structure, prosody, information structure,
and morphology. Part IV, Linguistic disciplines, reviews LFG work in
the disciplines of historical linguistics, learnability,
psycholinguistics, and second language learning. Part V, Formal and
computational issues and applications, provides an overview of
computational and formal properties of the theory, implementations,
and computational work on parsing, translation, grammar induction, and
treebanks. Part VI, Language families and regions, reviews LFG work
on languages spoken in particular geographical areas or in particular
language families. The final section, Comparing LFG with other
linguistic theories, discusses LFG work in relation to other
theoretical approaches.
Experimental phonetic study of the timing of voicing in English obstruents
The treatment given to the timing of voicing in three areas of phonetic
research (phonetic taxonomy, speech production modelling, and speech
synthesis) is considered in the light of an acoustic study of the timing of
voicing in British English obstruents. In each case, it is found to be deficient.
The underlying cause is the difficulty of applying a rigid segmental approach to
an aspect of speech production characterised by important inter-articulator
asynchronies, coupled with the limited quantitative data available on the
systematic properties of the timing of voicing in languages.
It is argued that the categories and labels used to describe the timing of
voicing in obstruents are inadequate for fulfilling the descriptive goals of
phonetic theory. One possible alternative descriptive strategy is proposed,
based on incorporating aspects of the parametric organisation of speech into
the descriptive framework. Within the domain of speech production modelling,
no satisfactory account has been given of fine-grained variability in the timing
of voicing that cannot be explained in terms of general properties of motor
programming and utterance execution. The experimental results support claims
in the literature that the phonetic control of an utterance may be somewhat
less abstract than has been suggested in some previous reports. A schematic
outline is given of one way in which the timing of voicing could be controlled
in speech production. The success of a speech synthesis-by-rule system
depends to a great extent on a comprehensive encoding of the systematic
phonetic characteristics of the target language. Only limited success has been
achieved in the past thirty years. A set of rules is proposed for generating
more naturalistic patterns of voicing in obstruents, reflecting those observed in
the experimental component of this study. Consideration is given to strategies
for evaluating the effect of fine-grained phonetic rules in speech synthesis.
The Semantic Prosody of Natural Phenomena in the Qur’an: A Corpus-Based Study
This thesis explores the Semantic Prosody (SP) of natural phenomena in the Qur’an and five of its prominent English translations [Pickthall (1930), Yusuf Ali (1939; revised edition 1987), Arberry (1957), Saheeh International (1997), and Abdel Haleem (2004)]. SP, scarcely explored in Qur’anic research, is defined as ‘a form of meaning established through the proximity of a consistent series of collocates’ (Louw 2000, p.50). Theoretically, it is both an evaluative prosody (i.e., lexical items collocating with semantic word classes that are positive, negative, or neutral) and a discourse prosody (i.e., having a communicative purpose).
Given the stylistic uniqueness of the Qur’an, and since SP can be examined empirically via corpora, the present study uses Corpus Linguistics techniques to explore the SP of 154 words associated with nature referenced throughout the Qur’an. First, the Python-based Natural Language Toolkit was used to define nature terms via WordNet, to conflate their variant forms with stemmers, and to compute their frequencies. Once frequencies were obtained, a quantitative analysis following Evert’s (2008) five-step statistical procedure was carried out on the 30 most frequent terms to investigate their collocations and SPs. A qualitative analysis was then conducted, using the Extended Lexical Unit and concordances to analyse collocations, and Lexical Functional Grammar to identify the variations in meaning produced by lexico-grammatical patterns. Finally, the resulting datasets were aligned to evaluate the translations’ congruency with the Qur’an.
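The abstract names the tools but not the exact statistics, so what follows only illustrates the general idea of scoring collocates and summarising their evaluative polarity; it is not the thesis’s pipeline. It is a minimal Python sketch under these assumptions: pre-tokenised text, right-hand adjacent neighbours as collocates (a span of one word), Dunning’s log-likelihood (G²) as the association measure (swapped in for Evert’s full procedure), and a hypothetical hand-labelled polarity lexicon. It uses only the standard library rather than NLTK and omits the WordNet and stemming steps.

```python
import math
from collections import Counter


def log_likelihood(o11, f1, f2, n):
    """Dunning's log-likelihood ratio G^2 for a 2x2 contingency table.

    o11: frequency of the pair (node, collocate)
    f1:  frequency of the node as the first word of a bigram
    f2:  frequency of the collocate as the second word of a bigram
    n:   total number of bigrams in the sample
    """
    observed = [o11, f1 - o11, f2 - o11, n - f1 - f2 + o11]
    row, col = [f1, n - f1], [f2, n - f2]
    expected = [row[i] * col[j] / n for i in (0, 1) for j in (0, 1)]
    # Cells with O = 0 contribute nothing (x * log x -> 0 as x -> 0),
    # which also avoids dividing by a zero expected frequency.
    return 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)


def collocates(tokens, node, min_freq=1):
    """Rank right-hand neighbours of `node` by G^2, highest first."""
    pairs = Counter(zip(tokens, tokens[1:]))
    n = len(tokens) - 1
    first, second = Counter(tokens[:-1]), Counter(tokens[1:])
    scores = {
        right: log_likelihood(o11, first[left], second[right], n)
        for (left, right), o11 in pairs.items()
        # Keep only attracted pairs: observed frequency above expected.
        if left == node and o11 >= min_freq and o11 * n > first[left] * second[right]
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def prosody(ranked, polarity):
    """Share of positive, negative and neutral collocates in a ranked list.

    `polarity` is a hypothetical hand-made lexicon; unlisted words count
    as neutral.
    """
    tally = Counter(polarity.get(word, "neutral") for word, _ in ranked)
    total = sum(tally.values()) or 1
    return {label: tally[label] / total for label in ("positive", "negative", "neutral")}
```

For example, `collocates("the sun rises the sun shines".split(), "sun")` returns G² scores for `rises` and `shines`, and `prosody` then reports what proportion of those collocates the lexicon marks as positive, negative or neutral. G² on its own is high for both attraction and repulsion, which is why the sketch discards pairs that occur less often than expected.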
Findings of this research confirm that words referring to nature in the Qur’an do have semantic prosody. For example, astronomical bodies are primed to occur in predominantly positive collocations referring to the glorification of God, while weather phenomena occur in negative ones referring to the calamities of the Day of Judgment. In addition, the results show that Abdel Haleem’s translation can be considered the most congruent with the Qur’an.
This research develops an approach to exploring themes (e.g., nature) in texts and their translations via SP analysis, and provides several linguistic resources that can be used in future corpus-based studies of the language of the Qur’an.
Empirical modelling of translation and interpreting
Empirical research is carried out in a cyclic way: when a research area is approached bottom-up, data lead to interpretations and ideally to the abstraction of laws, on the basis of which a theory can be derived. Deductive research is based on a theory, from which hypotheses can be formulated and tested against empirical data. In the current state of the art in translation studies, either theories and models are designed or empirical data are collected and interpreted. However, the final step is still lacking: so far, empirical data have not led to the formulation of theories or models, and existing theories and models have not yet been comprehensively tested with empirical methods. This publication addresses these issues from several perspectives: multi-method product- and process-based research may yield insights into both translation and interpreting phenomena. These phenomena may include cognitive and organizational processes, procedures and strategies, competence and performance, translation properties and universals, etc. Empirical findings about the deeper structures of translation and interpreting will narrow the gap between translation and interpreting practice on the one hand and model and theory building on the other. Furthermore, the availability of more large-scale empirical testing triggers the development of models and theories of translation and interpreting phenomena and behavior based on quantifiable, replicable and transparent data.
Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020
On behalf of the Program Committee, a very warm welcome to the Seventh Italian Conference on Computational Linguistics (CLiC-it 2020). This edition of the conference is held in Bologna and organised by the University of Bologna. The CLiC-it conference series is an initiative of the Italian Association for Computational Linguistics (AILC) and, after six years of activity, has clearly established itself as the premier national forum for research and development in the fields of Computational Linguistics and Natural Language Processing, where leading researchers and practitioners from academia and industry meet to share their research results, experiences, and challenges.