388 research outputs found
Simple K-star Categorial Dependency Grammars and their Inference
We propose a novel subclass in the family of Categorial Dependency Grammars (CDG), based on a syntactic criterion on the categorial types associated with words in the lexicon, and study its learnability. This proposal relies on a linguistic principle and relates to a former non-constructive condition on iterated dependencies. We show that the projective CDG in this subclass are incrementally learnable in the limit from dependency structures. In contrast to previous proposals, our criterion is syntactic and does not impose a (rigidity) bound on the number of categorial types associated with a word.
Parsing Strategies With 'Lexicalized' Grammars: Application to Tree Adjoining Grammars
In this paper, we present a parsing strategy that arose from the development of an Earley-type parsing algorithm for TAGs (Schabes and Joshi 1988) and from some recent linguistic work in TAGs (Abeillé: 1988a).
In our approach, each elementary structure is systematically associated with a lexical head. These structures specify extended domains of locality (as compared to a context-free grammar) over which constraints can be stated. These constraints either hold within the elementary structure itself or specify what other structures can be composed with a given elementary structure. The 'grammar' consists of a lexicon where each lexical item is associated with a finite number of structures for which that item is the head. There are no separate grammar rules. There are, of course, 'rules' which tell us how these structures are composed. A grammar of this form will be said to be 'lexicalized'.
We show that in general context-free grammars cannot be 'lexicalized'. We then show how a 'lexicalized' grammar naturally follows from the extended domain of locality of TAGs and examine briefly some of the linguistic implications of our approach.
A general parsing strategy for 'lexicalized' grammars is discussed. In the first stage, the parser selects a set of elementary structures associated with the lexical items in the input sentence, and in the second stage the sentence is parsed with respect to this set. The strategy is independent of the nature of the elementary structures in the underlying grammar. However, we focus our attention on TAGs. Since the set of trees selected at the end of the first stage is not infinite, the parser can in principle use any search strategy. Thus, in particular, a top-down strategy can be used since problems due to recursive structures are eliminated.
We then explain how the Earley-type parser for TAGs can be modified to take advantage of this approach.
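As a rough illustration of the first stage of the two-stage strategy described in this abstract, the sketch below selects the finite set of elementary structures anchored by the words of an input sentence. The lexicon contents, the structure names, and the function name `select_structures` are all hypothetical and are not taken from the paper; the second stage (parsing with respect to the selected set) is not shown.

```python
from typing import Dict, List, Set

# Hypothetical toy lexicon: each word is the head of a finite list of
# elementary structures, represented here only by made-up names.
LEXICON: Dict[str, List[str]] = {
    "john": ["alpha_NP_john"],
    "sleeps": ["alpha_S_intransitive_sleeps"],
    "often": ["beta_VP_adverb_often"],
}


def select_structures(sentence: List[str], lexicon: Dict[str, List[str]]) -> Set[str]:
    """Stage one: collect the structures headed by the words in the sentence.

    The result is finite because the sentence is finite and each word
    is associated with finitely many structures.
    """
    selected: Set[str] = set()
    for word in sentence:
        # Words missing from the lexicon contribute no structures.
        selected.update(lexicon.get(word.lower(), []))
    return selected
```

A second-stage parser would then consider only the returned set rather than the whole grammar, which is what allows any search strategy to be used in principle.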
Structure Unification Grammar: A Unifying Framework for Investigating Natural Language
This thesis presents Structure Unification Grammar and demonstrates its suitability as a framework for investigating natural language from a variety of perspectives. Structure Unification Grammar is a linguistic formalism which represents grammatical information as partial descriptions of phrase structure trees, and combines these descriptions by equating their phrase structure tree nodes. This process can be depicted by taking a set of transparencies which each contain a picture of a tree fragment, and overlaying them so they form a picture of a complete phrase structure tree. The nodes which overlap in the resulting picture are those which are equated. The flexibility with which information can be specified in the descriptions of trees and the generality of the combination operation allow a grammar writer or parser to specify exactly what is known where it is known. The specification of grammatical constraints is not restricted to any particular structural or informational domains. This property provides for a very perspicuous representation of grammatical information, and for the representations necessary for incremental parsing.
The perspicuity of SUG's representation is complemented by its high formal power. The formal power of SUG allows other linguistic formalisms to be expressed in it. By themselves these translations are not terribly interesting, but the perspicuity of SUG's representation often allows the central insights of the other investigations to be expressed perspicuously in SUG. Through this process it is possible to unify the insights from a diverse collection of investigations within a single framework, thus furthering our understanding of natural language as a whole. This thesis gives several examples of how insights from investigations into natural language can be captured in SUG. Since these investigations come from a variety of perspectives on natural language, these examples demonstrate that SUG can be used as a unifying framework for investigating natural language.
Inducing grammars from linguistic universals and realistic amounts of supervision
The best performing NLP models to date are learned from large volumes of manually-annotated data. For tasks like part-of-speech tagging and grammatical parsing, high performance can be achieved with plentiful supervised data. However, such resources are extremely costly to produce, making them an unlikely option for building NLP tools in under-resourced languages or domains. This dissertation is concerned with reducing the annotation required to learn NLP models, with the goal of opening up the range of domains and languages to which NLP technologies may be applied. In this work, we explore the possibility of learning from a degree of supervision that is at or close to the amount that could reasonably be collected from annotators for a particular domain or language that currently has none. We show that just a small amount of annotation input — even that which can be collected in just a few hours — can provide enormous advantages if we have learning algorithms that can appropriately exploit it. This work presents new algorithms, models, and approaches designed to learn grammatical information from weak supervision. In particular, we look at ways of intersecting a variety of different forms of supervision in complementary ways, thus lowering the overall annotation burden. Sources of information include tag dictionaries, morphological analyzers, constituent bracketings, and partial tree annotations, as well as unannotated corpora. For example, we present algorithms that are able to combine faster-to-obtain type-level annotation with unannotated text to remove the need for slower-to-obtain token-level annotation. Much of this dissertation describes work on Combinatory Categorial Grammar (CCG), a grammatical formalism notable for its use of structured, logic-backed categories that describe how each word and constituent fits into the overall syntax of the sentence. 
This work shows how linguistic universals intrinsic to the CCG formalism itself can be encoded as Bayesian priors to improve learning.
Computer Science
Category-Theoretic Quantitative Compositional Distributional Models of Natural Language Semantics
This thesis is about the problem of compositionality in distributional
semantics. Distributional semantics presupposes that the meanings of words are
a function of their occurrences in textual contexts. It models words as
distributions over these contexts and represents them as vectors in high
dimensional spaces. The problem of compositionality for such models concerns
itself with how to produce representations for larger units of text by
composing the representations of smaller units of text.
This thesis focuses on a particular approach to this compositionality
problem, namely using the categorical framework developed by Coecke, Sadrzadeh,
and Clark, which combines syntactic analysis formalisms with distributional
semantic representations of meaning to produce syntactically motivated
composition operations. This thesis shows how this approach can be
theoretically extended and practically implemented to produce concrete
compositional distributional models of natural language semantics. It
furthermore demonstrates that such models can perform on par with, or better
than, other competing approaches in the field of natural language processing.
There are three principal contributions to computational linguistics in this
thesis. The first is to extend the DisCoCat framework on the syntactic front
and semantic front, incorporating a number of syntactic analysis formalisms and
providing learning procedures allowing for the generation of concrete
compositional distributional models. The second contribution is to evaluate the
models developed from the procedures presented here, showing that they
outperform other compositional distributional models present in the literature.
The third contribution is to show how using category theory to solve linguistic
problems forms a sound basis for research, illustrated by examples of work on
this topic, that also suggest directions for future research.
Comment: DPhil Thesis, University of Oxford, Submitted and accepted in 201
Meaning versus Grammar
This volume investigates the complicated relationship between grammar, computation, and meaning in natural languages. It details conditions under which meaning-driven processing of natural language is feasible, discusses an operational and accessible implementation of the grammatical cycle for Dutch, and offers analyses of a number of further conjectures about constituency and entailment in natural language.