60 research outputs found

    Wide-coverage statistical parsing with minimalist grammars

    Syntactic parsing is the process of automatically assigning a structure to a string of words, and is arguably a necessary prerequisite for obtaining a detailed and precise representation of sentence meaning. For many NLP tasks, it is sufficient to use parsers based on simple context free grammars. However, for tasks in which precision on certain relatively rare but semantically crucial constructions (such as unbounded wh-movements for open domain question answering) is important, more expressive grammatical frameworks still have an important role to play. One grammatical framework which has been conspicuously absent from journals and conferences on Natural Language Processing (NLP), despite continuing to dominate much of theoretical syntax, is Minimalism, the latest incarnation of the Transformational Grammar (TG) approach to linguistic theory developed very extensively by Noam Chomsky and many others since the early 1950s. Until now, all parsers using genuine transformational movement operations have had only narrow coverage by modern standards, owing to the lack of any wide-coverage TG grammars or treebanks on which to train statistical models. The received wisdom within NLP is that TG is too complex and insufficiently formalised to be applied to realistic parsing tasks. This situation is unfortunate, as it is arguably the most extensively developed syntactic theory across the greatest number of languages, many of which are otherwise under-resourced, and yet the vast majority of its insights never find their way into NLP systems. Conversely, the process of constructing large grammar fragments can have a salutary impact on the theory itself, forcing choices between competing analyses of the same construction, and exposing incompatibilities between analyses of different constructions, along with areas of over- and undergeneration which may otherwise go unnoticed. 
This dissertation builds on research into computational Minimalism pioneered by Ed Stabler and others since the late 1990s to present the first ever wide-coverage Minimalist Grammar (MG) parser, along with some promising initial experimental results. A wide-coverage parser must of course be equipped with a wide-coverage grammar, and this dissertation will therefore also present the first ever wide-coverage MG, which has analyses with a high level of cross-linguistic descriptive adequacy for a great many English constructions, many of which are taken or adapted from proposals in the mainstream Minimalist literature. The grammar is very deep, in the sense that it describes many long-range dependencies which even most other expressive wide-coverage grammars ignore. At the same time, it has also been engineered to be highly constrained, with continuous computational testing being applied to minimize both under- and over-generation. Natural language is highly ambiguous, both locally and globally, and even with a very strong formal grammar, there may still be a great many possible structures for a given sentence and its substrings. The standard approach to resolving such ambiguity is to equip the parser with a probability model allowing it to disregard certain unlikely search paths, thereby increasing both its efficiency and accuracy. The most successful parsing models are those extracted in a supervised fashion from labelled data in the form of a corpus of syntactic trees, known as a treebank. Constructing such a treebank from scratch for a different formalism is extremely time-consuming and expensive, however, and so the standard approach is to map the trees in an existing treebank into trees of the target formalism. Minimalist trees are considerably more complex than those of other formalisms, however, containing many more null heads and movement operations, making this conversion process far from trivial. 
This dissertation will describe a method which has so far been used to convert 56% of the Penn Treebank trees into MG trees. Although still under development, the resulting MGbank corpus has already been used to train a statistical A* MG parser, described here, which has an expected asymptotic time complexity of O(n^3); this is much better than even the most optimistic worst case analysis for the formalism

    Deep into Pharo

    This is a book on Pharo, a programming language available at http://www.pharo.or

    Statistical Knowledge and Learning in Phonology

    This thesis deals with the theory of the phonetic component of grammar in a formal probabilistic inference framework: (1) it has been recognized since the beginning of generative phonology that some language-specific phonetic implementation is actually context-dependent, and thus it can be said that there are gradient "phonetic processes" in grammar in addition to categorical "phonological processes." However, no explicit theory has been developed to characterize these processes. Meanwhile, (2) it is understood that language acquisition and perception are both really informed guesswork: the result of both types of inference can be reasonably thought to be a less-than-perfect commitment, with multiple candidate grammars or parses considered and each associated with some degree of credence. Previous research has used probability theory to formalize these inferences in implemented computational models, especially in phonetics and phonology. In this role, computational models serve to demonstrate the existence of working learning/perception/parsing systems assuming a faithful implementation of one particular theory of human language, and are not intended to adjudicate whether that theory is correct. The current thesis (1) develops a theory of the phonetic component of grammar and how it relates to the greater phonological system and (2) uses a formal Bayesian treatment of learning to evaluate this theory of the phonological architecture and to make predictions about how the resulting grammars will be organized. The coarse description of the consequence for linguistic theory is that the processes we think of as "allophonic" are actually language-specific, gradient phonetic processes, assigned to the phonetic component of grammar; strict allophones have no representation in the output of the categorical phonological grammar

    Head-Driven Phrase Structure Grammar

    Head-Driven Phrase Structure Grammar (HPSG) is a constraint-based or declarative approach to linguistic knowledge, which analyses all descriptive levels (phonology, morphology, syntax, semantics, pragmatics) with feature value pairs, structure sharing, and relational constraints. In syntax it assumes that expressions have a single relatively simple constituent structure. This volume provides a state-of-the-art introduction to the framework. Various chapters discuss basic assumptions and formal foundations, describe the evolution of the framework, and go into the details of the main syntactic phenomena. Further chapters are devoted to non-syntactic levels of description. The book also considers related fields and research areas (gesture, sign languages, computational linguistics) and includes chapters comparing HPSG with other frameworks (Lexical Functional Grammar, Categorial Grammar, Construction Grammar, Dependency Grammar, and Minimalism)

    The nanosyntax of case

    This dissertation proposes a new approach to case. It unifies its syntax, morphology and semantics in a simple, fine-grained and restrictive picture. One of the assumptions frequently made in works on case is that cases such as nominative and accusative are not primitive entities, but they are each composed of various features. The central hypothesis of this dissertation is that these features are universal, and each of them is its own terminal node in the syntactic tree. Individual cases thus correspond to phrasal constituents built out of these terminals. The idea that syntactic trees are built by Merge from individual atomic features is one of the core principles of a cartographic approach to syntax pursued by M. Starke: Nanosyntax. Hence “The nanosyntax of case.” I motivate the approach on the material of case syncretism. I propose a hypothesis according to which case syncretism across various languages obeys a single restrictive template. The template corresponds to a cross-linguistically fixed sequence of cases, in which only adjacent cases show syncretism. In order to derive this, I argue that case features are syntactic heads, ordered in a universal functional sequence. If this is so, it follows that these sub-morphemic features interact with core syntactic processes, such as movement. The prediction is borne out: the interaction of (phrasal) movement and the fine-grained syntactic representation derives a typological generalization concerning cross-linguistic variation in the amount of case marking (Blake’s hierarchy). Additional facts fall out from the picture: the role of functional prepositions, prepositional syncretism, case compounding, and preposition stacking. I further investigate in detail the spell out of these highly articulate structures. I follow Starke (2005) and propose that individual morphemes spell out phrasal constituents of varying size, and that their insertion is governed by the Superset Principle. 
I argue that phrasal spell out is both empirically required, and theoretically beneficial: it simplifies the overall architecture of grammar. In particular, there is no part left to play for a separate morphological structure. With the proposal in place, I observe that there are generalizations which connect the proposed representation and the DP external syntax. To account for this, I adopt the Peeling theory of movement (Starke 2005). The theory says that arguments are base-generated with a number of case projections on top of them, and they strand these projections when they move up in the tree. The theory is shown to capture the initial observations, as well as additional generalizations: Burzio’s generalization among them. The resulting theory does not introduce any domain specific tools to account for case: its representation corresponds to a binary syntactic structure, its computation corresponds to syntactic movement

    User Interface Management Systems: A Survey and a Proposed Design

    The growth of interactive computing has resulted in increasingly more complex styles of interaction between user and computer. To facilitate the creation of highly interactive systems, the concept of the User Interface Management System (UIMS) has been developed. Following the definition of the term 'UIMS' and a consideration of the putative advantages of the UIMS approach, a number of User Interface Management Systems are examined. This examination focuses in turn on the run-time execution system, the specification notation and the design environment, with a view to establishing the features which an "ideal" UIMS should possess. On the basis of this examination, a proposal for the design of a new UIMS is presented, and progress reported towards the implementation of a prototype based on this design
