27 research outputs found

    Survey of the State of the Art in Natural Language Generation: Core tasks, applications and evaluation

    This paper surveys the current state of the art in Natural Language Generation (NLG), defined as the task of generating text or speech from non-linguistic input. A survey of NLG is timely in view of the changes that the field has undergone over the past decade or so, especially in relation to new (usually data-driven) methods, as well as new applications of NLG technology. This survey therefore aims to (a) give an up-to-date synthesis of research on the core tasks in NLG and the architectures in which such tasks are organised; (b) highlight a number of relatively recent research topics that have arisen partly as a result of growing synergies between NLG and other areas of artificial intelligence; (c) draw attention to the challenges in NLG evaluation, relating them to similar challenges faced in other areas of Natural Language Processing, with an emphasis on different evaluation methods and the relationships between them. Comment: Published in Journal of AI Research (JAIR), volume 61, pp 75-170. 118 pages, 8 figures, 1 tabl

    Probabilistic Modelling of Morphologically Rich Languages

    This thesis investigates how the sub-structure of words can be accounted for in probabilistic models of language. Such models play an important role in natural language processing tasks such as translation or speech recognition, but often rely on the simplistic assumption that words are opaque symbols. This assumption does not fit morphologically complex languages well, where words can have rich internal structure and sub-word elements are shared across distinct word forms. Our approach is to encode basic notions of morphology into the assumptions of three different types of language models, with the intention that leveraging shared sub-word structure can improve model performance and help overcome the data sparsity that arises from morphological processes. In the context of n-gram language modelling, we formulate a new Bayesian model that relies on the decomposition of compound words to attain better smoothing, and we develop a new distributed language model that learns vector representations of morphemes and leverages them to link together morphologically related words. In both cases, we show that accounting for word sub-structure improves the models' intrinsic performance and provides benefits when applied to other tasks, including machine translation. We then shift the focus beyond the modelling of word sequences and consider models that automatically learn what the sub-word elements of a given language are, given an unannotated list of words. We formulate a novel model that can learn discontiguous morphemes in addition to the more conventional contiguous morphemes that most previous models are limited to. This approach is demonstrated on Semitic languages, and we find that modelling discontiguous sub-word structures leads to improvements in the task of segmenting words into their contiguous morphemes. Comment: DPhil thesis, University of Oxford, submitted and accepted 2014. http://ora.ox.ac.uk/objects/uuid:8df7324f-d3b8-47a1-8b0b-3a6feb5f45c

    Towards Artificial Language Learning in a Potts Attractor Network

    It remains a mystery how children acquire natural languages; languages far beyond the few symbols that a young chimp struggles to learn, and with complex rules that incomparably surpass the repetitive structure of bird songs. How should one explain the emergence of such a capacity from the basic elements of the nervous system, namely neuronal networks? To understand the brain mechanisms underlying the language phenomenon, specifically sentence construction, different approaches have been attempted to implement an artificial neural network that encodes words and constructs sentences (see, e.g., Hummel, J.E. and Holyoak, 1997; Huyck, 2009; Velde and de Kamps, 2006; Stewart and Eliasmith, 2009). These attempts differ on how the sentence constituents (parts) are represented, either individually and locally or in a distributed fashion, and on how these constituents are bound together. In LISA (Hummel, J.E. and Holyoak, 1997), each sentence constituent (either a word, a phrase, or even a proposition) is represented individually by a unit, intended to be a population of neurons (Hummel and Holyoak, 2003), and relevant constituents are activated synchronously in the construction of a sentence (or the inference of a proposition). Considering the productivity of language, that is, the ability of humans to create many possible sentences out of a limited vocabulary, this representation results in an exponential growth in the number of units needed for structure representation. In order to avoid this problem, Neural Blackboard Architectures (Velde and de Kamps, 2006) were proposed as systems endowed with dynamic bindings between assemblies of words, roles (e.g. theme or agent), and word categories (e.g. nouns or verbs). A neural blackboard architecture resembles a switchboard (a blackboard) that wires sentence constituents together via circuits, using highly complex and meticulously (unrealistically) organized connections.
As opposed to localized approaches, in a Vector Symbolic Architecture (Gayler, 2003; Plate, 1991), words are represented in a fully distributed fashion on a vector. The words are bound (and merged) together by algebraic operations in the vector space, e.g. tensor products (Smolensky, 1990) or circular convolution (Plate, 1991). In order to give a biological account, some steps have been attempted towards the neural implementation of such operations (Stewart and Eliasmith, 2009). Another distributed approach implemented a simple recurrent neural network that predicts the next word in a sentence (Elman, 1991). Apart from the limited language size that the network could deal with (Elman, 1993), this system lacked an explicit representation of syntactic constituents, thus resulting in a lack of grammatical knowledge in the network (Borensztajn, 2011; Velde and de Kamps, 2006). However, despite all these attempts, there is still no neural model that addresses the challenges of language size, semantic and syntactic distinction, word binding, and word implementation in a neurally plausible manner. We are exploring a novel approach to address these challenges, which involves first constructing an artificial language of intermediate complexity and then implementing a neural network, as a simplified cortical model of sentence production, which stores the vocabulary and the grammar of the artificial language in a neurally inspired manner on two components: one semantic and one syntactic. As the training language of the network, we have constructed BLISS (Pirmoradian and Treves, 2011), a scaled-down synthetic language of intermediate complexity, with about 150 words, 40 production rules, and a definition of semantics that is reduced to statistical dependence between words. In Chapter 2, we will explain the details of the implementation of BLISS.
As a sentence production model, we have implemented a Potts attractor neural network, whose units hypothetically represent patches of cortex. The choice of the Potts network for sentence production has been mainly motivated by the latching dynamics it exhibits (Kropff and Treves, 2006); that is, an ability to spontaneously hop, or latch, across memory patterns, which have been stored as dynamical attractors, thus producing a long or even infinite sequence of patterns, at least in some regimes (Russo and Treves, 2012). The goal is to train the Potts network with a corpus of sentences in BLISS. This involves setting first the structure of the network, then the generating algorithm for word representations, and finally the protocol to train the network with the specific transitions present in the BLISS corpus, using both auto- and hetero-associative learning rules. In Chapter 3, we will explain the details of the procedure we have adapted for word representation in the network. The last step involves utilizing the spontaneous latching dynamics exhibited by the Potts network, the word representation we have developed, and crucially hetero-associative weights favouring specific transitions, to generate, with a suitable associative training procedure, sentences "uttered" by the network. This last stage of spontaneous sentence production by the network is explained in Chapter 4.
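
    As an illustration of the Vector Symbolic Architecture binding mentioned in this abstract (circular convolution, Plate, 1991), the following is a minimal sketch and is not taken from the thesis. It assumes random Gaussian vectors and approximate unbinding by circular correlation; the names `circular_convolution`, `circular_correlation`, `random_vector`, `role` and `filler` are hypothetical and chosen only for this example.

```python
import math
import random


def circular_convolution(a, b):
    """Bind two equal-length vectors: c[k] = sum_j a[j] * b[(k - j) mod n]."""
    n = len(a)
    return [sum(a[j] * b[(k - j) % n] for j in range(n)) for k in range(n)]


def circular_correlation(a, c):
    """Approximately unbind: d[k] = sum_j a[j] * c[(k + j) mod n]."""
    n = len(a)
    return [sum(a[j] * c[(k + j) % n] for j in range(n)) for k in range(n)]


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def random_vector(n, rng):
    """Random vector with i.i.d. Gaussian entries of variance 1/n (an assumption)."""
    return [rng.gauss(0.0, 1.0 / math.sqrt(n)) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    n = 256
    role = random_vector(n, rng)      # hypothetical "role" vector, e.g. agent
    filler = random_vector(n, rng)    # hypothetical "filler" vector, e.g. a word
    other = random_vector(n, rng)     # unrelated vector for comparison

    bound = circular_convolution(role, filler)
    decoded = circular_correlation(role, bound)

    # The decoded vector is a noisy copy of the filler, so it should be
    # clearly more similar to the filler than to an unrelated vector.
    print("similarity to filler:", cosine(decoded, filler))
    print("similarity to other: ", cosine(decoded, other))
```

    The bound vector has the same dimension as its inputs, which is one reason this style of binding avoids the growth in units that the abstract attributes to localized representations. Decoding is only approximate, so a practical system would typically compare the decoded vector against a stored vocabulary.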

    Grounding language in events

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2008. Includes bibliographical references (p. 137-142). Broadcast video and virtual environments are just two of the growing number of domains in which language is embedded in multiple modalities of rich non-linguistic information. Applications for such multimodal domains are often based on traditional natural language processing techniques that ignore the connection between words and the non-linguistic context in which they are used. This thesis describes a methodology for representing these connections in models which ground the meaning of words in representations of events. Incorporating these grounded language models with text-based techniques significantly improves the performance of three multimodal applications: natural language understanding in videogames, sports video search and automatic speech recognition. Two approaches to representing the structure of events are presented and used to model the meaning of words. In the domain of virtual game worlds, a hand-designed hierarchical behavior grammar is used to explicitly represent all the various actions that an agent can take in a virtual world. This grammar is used to interpret events by parsing sequences of observed actions in order to generate hierarchical event structures. In the noisier and more open-ended domain of broadcast sports video, hierarchical temporal patterns are automatically mined from large corpora of unlabeled video data. The structure of events in video is represented by vectors of these hierarchical patterns. Grounded language models are encoded using Hierarchical Bayesian models to represent the probability of words given elements of these event structures. These grounded language models are used to incorporate non-linguistic information into text-based approaches to multimodal applications.
In the virtual game domain, this non-linguistic information improves natural language understanding for a virtual agent by nearly 10% and cuts in half the negative effects of noise caused by automatic speech recognition. For broadcast video of baseball and American football, video search systems that incorporate grounded language models are shown to perform up to 33% better than text-based systems. Further, systems for recognizing speech in baseball video that use grounded language models show 25% greater word accuracy than traditional systems. by Michael Ben Fleischman. Ph.D

    New resources and ideas for semantic parser induction

    In this thesis, we investigate the general topic of computational natural language understanding (NLU), which has as its goal the development of algorithms and other computational methods that support reasoning about natural language by the computer. Under the classical approach, NLU models work similarly to computer compilers (Aho et al., 1986), and include as a central component a semantic parser that translates natural language input (i.e., the compiler’s high-level language) to lower-level formal languages that facilitate program execution and exact reasoning. Given the difficulty of building natural language compilers by hand, recent work has centered on semantic parser induction, that is, on using machine learning to learn semantic parsers and semantic representations from parallel data consisting of example text-meaning pairs (Mooney, 2007a). One inherent difficulty in this data-driven approach is finding the parallel data needed to train the target semantic parsing models, given that such data does not occur naturally “in the wild” (Halevy et al., 2009). Even when data is available, the amount of domain- and language-specific data and the nature of the available annotations might be insufficient for robust machine learning and for capturing the full range of NLU phenomena. Given these underlying resource issues, the semantic parsing field is in constant need of new resources and datasets, as well as novel learning techniques and task evaluations that make models more robust and adaptable to the many applications that require reliable semantic parsing. To address the main resource problem of finding parallel data, we investigate the idea of using source code libraries, or collections of code and text documentation, as a parallel corpus for semantic parser development and introduce 45 new datasets in this domain and a new and challenging text-to-code translation task.
As a way of addressing the lack of domain- and language-specific parallel data, we then use these and other benchmark datasets to investigate training semantic parsers on multiple datasets, which helps semantic parsers to generalize across different domains and languages and solve new tasks such as polyglot decoding and zero-shot translation (i.e., translating over and between multiple natural and formal languages and unobserved language pairs). Finally, to address the issue of insufficient annotations, we introduce a new learning framework called learning from entailment that uses entailment information (i.e., high-level inferences about whether the meaning of one sentence follows from another) as a weak learning signal to train semantic parsers to reason about the holes in their analysis and learn improved semantic representations. Taken together, this thesis contributes a wide range of new techniques and technical solutions to help build semantic parsing models with minimal amounts of training supervision and manual engineering effort, hence avoiding the resource issues described at the outset. We also introduce a diverse set of new NLU tasks for evaluating semantic parsing models, which we believe help to extend the scope and real world applicability of semantic parsing and computational NLU

    Domain knowledge acquisition via language grounding

    Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2013. Cataloged from PDF version of thesis. Includes bibliographical references (p. 54-60). This thesis addresses the language grounding problem at the level of word relation extraction. We propose methods to acquire knowledge represented in the form of relations and utilize them in two domain applications, high-level planning in a complex virtual world and input parser generation from input format specifications. In the first application, we propose a reinforcement learning framework to jointly learn to predict precondition relations from text and to perform high-level planning guided by those relations. When applied to a complex virtual world and text describing that world, our relation extraction technique performs on par with a supervised baseline, and we show that a high-level planner utilizing these extracted relations significantly outperforms a strong, text-unaware baseline. In the second application, we use a sampling framework to predict relation trees and to generate input parser code from those trees. Our results show that our approach outperforms a state-of-the-art semantic parser on a dataset of input format specifications from the ACM International Collegiate Programming Contest, which were written in English for humans with no intention of providing support for automated processing. by Tao Lei. S.M