
    An Abstract Machine for Unification Grammars

    This work describes the design and implementation of an abstract machine, Amalia, for the linguistic formalism ALE, which is based on typed feature structures. This formalism is one of the most widely accepted in computational linguistics and has been used for designing grammars in various linguistic theories, most notably HPSG. Amalia is composed of data structures and a set of instructions, augmented by a compiler from the grammatical formalism to the abstract instructions and a (portable) interpreter of the abstract instructions. The effect of each instruction is defined using a low-level language that can be executed on ordinary hardware. The advantages of the abstract machine approach are twofold. From a theoretical point of view, the abstract machine gives a well-defined operational semantics to the grammatical formalism. This ensures that grammars specified using our system are endowed with well-defined meaning. It makes it possible, for example, to formally verify the correctness of a compiler for HPSG against an independent definition. From a practical point of view, Amalia is the first system that employs a direct compilation scheme for unification grammars based on typed feature structures. The use of Amalia results in much improved performance over existing systems. To test the machine on a realistic application, we have developed a small-scale, HPSG-based grammar for a fragment of the Hebrew language, using Amalia as the development platform. This is the first application of HPSG to a Semitic language.
    Comment: Doctoral Thesis, 96 pages, many PostScript figures; uses pstricks, pst-node, psfig, fullname and a macros file
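
    Amalia's core run-time operation is the unification of typed feature structures. As a point of reference only, here is a minimal untyped sketch of destructive unification with dereference pointers in Python; it illustrates the operation the abstract machine compiles into instructions, not Amalia's actual instruction set, and all names are illustrative. A real WAM-style machine would also record bindings on a trail so that failed unifications can be undone.

```python
class FS:
    """A node in a feature structure: an atomic value or a set of features."""
    def __init__(self, atom=None, features=None):
        self.atom = atom                      # atomic value, or None if complex
        self.features = dict(features or {})  # feature name -> FS node
        self.forward = None                   # dereference pointer set by unify

    def deref(self):
        node = self
        while node.forward is not None:
            node = node.forward
        return node

def unify(a, b):
    """Destructively unify two feature structures; None signals failure."""
    a, b = a.deref(), b.deref()
    if a is b:
        return a
    if a.atom is not None or b.atom is not None:
        if a.atom != b.atom:                  # atom/atom clash or atom/complex
            return None
        a.forward = b
        return b
    a.forward = b                             # redirect a into b, then merge
    for feat, val in list(a.features.items()):
        if feat in b.features:
            if unify(val, b.features[feat]) is None:
                return None                   # (no trail here, so no undo)
        else:
            b.features[feat] = val
    return b

# Agreement succeeds: [AGR [NUM sg]] unifies with [AGR [NUM sg, PER 3]]
fs1 = FS(features={"AGR": FS(features={"NUM": FS(atom="sg")})})
fs2 = FS(features={"AGR": FS(features={"NUM": FS(atom="sg"),
                                       "PER": FS(atom="3")})})
assert unify(fs1, fs2) is not None
# A number clash fails: sg vs. pl
assert unify(FS(atom="sg"), FS(atom="pl")) is None
```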

    A Note on the Complexity of Restricted Attribute-Value Grammars

    The recognition problem for attribute-value grammars (AVGs) was shown to be undecidable by Johnson in 1988; the general form of AVGs is therefore of no practical use. In this paper we study a severely restricted form of AVG, the R-AVG, for which the recognition problem is decidable (though still NP-complete). We show that the R-AVG formalism captures all of the context-free languages and more, and introduce a variation on the so-called `off-line parsability constraint', the `honest parsability constraint', which makes different types of R-AVG coincide precisely with well-known time complexity classes.
    Comment: 18 pages; also available by (1) anonymous ftp at ftp://ftp.fwi.uva.nl/pub/theory/illc/researchReports/CT-95-02.ps.gz ; (2) WWW from http://www.fwi.uva.nl/~mtrautwe
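
    To see how attributes lift a context-free backbone beyond the context-free languages, consider a toy grammar in which each category carries a single integer attribute. The sketch below is illustrative only (it is not the paper's construction): it recognizes { a^n b^n c^n }, which no plain CFG captures, and it bounds the attribute by the input length so that recognition terminates, a crude analogue of the role the paper's restrictions play in making R-AVG recognition decidable.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def derives(word, cat, n, i, j):
    """True iff category cat, carrying attribute N=n, derives word[i:j].
    Backbone rules:  S[n] -> A[n] B[n] C[n]
                     X[n] -> x X[n-1]  (n > 0),   X[0] -> epsilon
    for X in {A, B, C} anchored to terminals a, b, c."""
    if cat == "S":
        return any(derives(word, "A", n, i, k) and
                   derives(word, "B", n, k, m) and
                   derives(word, "C", n, m, j)
                   for k in range(i, j + 1) for m in range(k, j + 1))
    terminal = {"A": "a", "B": "b", "C": "c"}[cat]
    if n == 0:
        return i == j                          # X[0] derives the empty string
    return i < j and word[i] == terminal and derives(word, cat, n - 1, i + 1, j)

def recognize(word):
    # The attribute is bounded by the input length, so the search is finite.
    return any(derives(word, "S", n, 0, len(word))
               for n in range(len(word) // 3 + 1))

assert recognize("aabbcc")          # a^2 b^2 c^2 is in the language
assert not recognize("aabbc")       # counts disagree, so rejected
```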

    A Feature-Based Lexicalized Tree Adjoining Grammar for Korean

    This document describes an ongoing project of developing a grammar of Korean, the Korean XTAG grammar, written in the TAG formalism and implemented for use with the XTAG system enriched with a Korean morphological analyzer. The Korean XTAG grammar described in this report is based on the TAG formalism (Joshi et al. (1975)), which has been extended to include lexicalization (Schabes et al. (1988)) and unification-based feature structures (Vijay-Shanker and Joshi (1991)). The document first describes the modifications that we have made to the XTAG system (The XTAG-Group (1998)) to handle the rich inflectional morphology of Korean. Then the various syntactic phenomena that can currently be handled are described, including adverb modification, relative clauses, complex noun phrases, auxiliary verb constructions, gerunds and adjunct clauses. The work reported here is a first step towards the development of an implemented TAG grammar for Korean, which is continuously updated with the addition of new analyses and the modification of old ones.
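
    For readers unfamiliar with the formalism, the two TAG composition operations the grammar relies on, substitution and adjunction, can be sketched in a few lines of Python. This illustrates the formalism only, not the XTAG implementation; it elides the feature unification that XTAG performs at each node, and the Korean forms and all names are illustrative.

```python
class Node:
    def __init__(self, label, children=None, subst=False, foot=False):
        self.label = label
        self.children = children if children is not None else []
        self.subst = subst     # marked substitution site (a leaf)
        self.foot = foot       # foot node of an auxiliary tree (a leaf)

def substitute(tree, target, initial):
    """Replace the substitution node `target` with the root of `initial`."""
    for i, child in enumerate(tree.children):
        if child is target:
            assert target.subst and target.label == initial.label
            tree.children[i] = initial
        else:
            substitute(child, target, initial)

def adjoin(target, auxiliary):
    """Splice `auxiliary` in at internal node `target`: the material below
    `target` moves under the auxiliary tree's foot node."""
    assert auxiliary.label == target.label
    foot = find_foot(auxiliary)
    foot.children, foot.foot = target.children, False
    target.children = auxiliary.children

def find_foot(tree):
    if tree.foot:
        return tree
    for child in tree.children:
        found = find_foot(child)
        if found:
            return found
    return None

def show(n):
    return n.label if not n.children else \
        f"({n.label} {' '.join(show(c) for c in n.children)})"

# Substitute an NP subject, then adjoin an adverb auxiliary tree at VP:
np   = Node("NP", [Node("Mary-ga")])
subj = Node("NP", subst=True)
vp   = Node("VP", [Node("ttena-ss-ta")])          # 'left' (leave-PAST-DECL)
s    = Node("S", [subj, vp])
substitute(s, subj, np)
adv  = Node("VP", [Node("ADV", [Node("ppalli")]),  # 'quickly'
                   Node("VP", foot=True)])
adjoin(vp, adv)
print(show(s))   # (S (NP Mary-ga) (VP (ADV ppalli) (VP ttena-ss-ta)))
```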

    Baldwinian accounts of language evolution

    Since Hinton & Nowlan published their seminal paper (Hinton & Nowlan 1987), the neglected evolutionary process of the Baldwin effect has been widely acknowledged. Especially in the field of language evolution, the Baldwin effect (Baldwin 1896d, Simpson 1953) has been expected to salvage the long-lasting deadlocked situation of modern linguistics: i.e., it may shed light on the relationship between environment and innateness in the formation of language.However, as intense research of this evolutionary theory goes on, certain robust difficulties have become apparent. One example is genotype-phenotype correlation. By computer simulations, both Yamauchi (1999, 2001) and Mayley (19966) show that for the Baldwin effect to work legitimately, correlation between genotypes and phenotypes is the most essential underpinning. This is due to the fact that this type of the Baldwin effect adopts as its core mechanism Waddington's (1975) "genetic assimilation". In this mechanism, phenocopies have to be genetically closer to the innately predisposed genotype. Unfortunately this is an overly naiive assumption for the theory of language evolution. As a highly complex cognitive ability, the possibility that this type of genotype-phenotype correlation exists in the domain of linguistic ability is vanishingly small.In this thesis, we develop a new type of mechanism, called "Baldwinian Niche Construction (BNC), that has a rich explanatory power and can potentially over¬ come this bewildering problem of the Baldwin effect. BNC is based on the theory of niche construction that has been developed by Odling-Smee et al. (2003). The incorporation of the theory into the Baldwin effect was first suggested by Deacon (1997) and briefly introduced by Godfrey-Smith (2003). However, its formulation is yet incomplete.In the thesis, first, we review the studies of the Baldwin effect in both biology and the study of language evolution. Then the theory of BNC is more rigorously developed. Linguistic communication has an intrinsic property that is fundamentally described in the theory of niche construction. This naturally leads us to the theoretical necessity of BNC in language evolution. By creating a new linguistic niche, learning discloses a previously hidden genetic variance on which the Baldwin 'canalizing' effect can take place. It requires no genetic modification in a given genepool. There is even no need that genes responsible for learning occupy the same loci as genes for the innate linguistic knowledge. These and other aspects of BNC are presented with some results from computer simulations

    Structure and Intonation


    Supervised Training on Synthetic Languages: A Novel Framework for Unsupervised Parsing

    This thesis focuses on unsupervised dependency parsing: parsing sentences of a language into dependency trees without access to training data for that language. Unlike most prior work, which uses unsupervised learning to estimate the parsing parameters, we estimate the parameters by supervised training on synthetic languages. Our parsing framework has three major components: synthetic language generation gives a rich set of training languages by mix-and-match over the real languages; surface-form feature extraction maps an unparsed corpus of a language into a fixed-length vector that serves as the syntactic signature of that language; and, finally, language-agnostic parsing incorporates the syntactic signature during parsing so that the decision on each word token depends on the general syntax of the target language. The fundamental question we are trying to answer is whether some useful information about the syntax of a language can be inferred from its surface-form evidence (an unparsed corpus). This is the same question that has been implicitly asked by previous papers on unsupervised parsing, which assume only that an unparsed corpus is available for the target language. We show that, indeed, useful features of the target language can be extracted automatically from an unparsed corpus, which here consists only of gold part-of-speech (POS) sequences. Providing these features to our neural parser enables it to parse sequences like those in the corpus. Strikingly, our system has no supervision in the target language. Rather, it is a multilingual system that is trained end-to-end on a variety of other languages, so it learns a feature extractor that works well. This thesis contains several large-scale experiments requiring hundreds of thousands of CPU-hours; to our knowledge, this is the largest study of unsupervised parsing yet attempted. We show experimentally across multiple languages: (1) features computed from the unparsed corpus improve parsing accuracy; (2) including thousands of synthetic languages in the training yields further improvement; (3) despite being computed from unparsed corpora, our learned task-specific features beat previous works' interpretable typological features, which require parsed corpora or expert categorization of the language.
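
    As an illustration of the surface-form feature idea (not the thesis's learned extractor), a fixed-length syntactic signature can be computed from gold POS sequences alone, for example from tag and tag-bigram frequencies; the learned, task-specific extractor described above replaces such hand-built statistics with features trained end-to-end. All names below are illustrative.

```python
from collections import Counter
from itertools import product

POS = ["NOUN", "VERB", "ADJ", "ADP", "DET"]       # toy tag inventory

def surface_signature(corpus):
    """Map a corpus (a list of sentences, each a list of POS tags)
    to a fixed-length vector of tag and tag-bigram relative frequencies."""
    unigrams, bigrams, total = Counter(), Counter(), 0
    for sent in corpus:
        unigrams.update(sent)
        bigrams.update(zip(sent, sent[1:]))
        total += len(sent)
    vec = [unigrams[t] / total for t in POS]                 # tag frequencies
    pairs = sum(bigrams.values()) or 1
    vec += [bigrams[(a, b)] / pairs                          # word-order statistics
            for a, b in product(POS, POS)]
    return vec                                               # length 5 + 25 = 30

# A verb-final corpus and a verb-medial one get different signatures:
sov = [["NOUN", "NOUN", "VERB"], ["DET", "NOUN", "ADJ", "VERB"]]
svo = [["NOUN", "VERB", "NOUN"], ["DET", "NOUN", "VERB", "ADJ", "NOUN"]]
print(surface_signature(sov) != surface_signature(svo))      # True
```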