
    Evaluation of LTAG parsing with supertag compaction

    One of the biggest concerns that has been raised over the feasibility of using large-scale LTAGs in NLP is the amount of redundancy within a grammar's elementary tree set. This has led to various proposals on how best to represent grammars in a way that makes them compact and easily maintained (Vijay-Shanker and Schabes, 1992; Becker, 1993; Becker, 1994; Evans, Gazdar and Weir, 1995; Candito, 1996). Unfortunately, while this work can help to make the storage of grammars more efficient, it does nothing to prevent the problem from reappearing when the grammar is processed by a parser and the complete set of trees is reproduced. In this paper we are concerned with an approach that addresses this problem of computational redundancy in the trees, and we evaluate its effectiveness.
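    To make the redundancy concrete, the following minimal sketch hash-conses a toy set of elementary trees so that structurally identical subtrees are stored only once, then counts how many nodes survive. It is a generic illustration of sharing, not the compaction approach evaluated in the paper; the nested-tuple tree encoding, the `intern_trees` and `transitive` helpers, and the two toy trees are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding of an elementary tree: (label, (child, child, ...)).
# Leaves (e.g. substitution nodes or anchors) have an empty child tuple.
Tree = Tuple[str, tuple]


def intern_trees(trees: List[Tree]) -> Tuple[List[int], int, int]:
    """Hash-cons every subtree so structurally identical subtrees share one id.

    Returns (ids of the root nodes, number of distinct nodes, total node count).
    """
    table: Dict[tuple, int] = {}  # (label, child ids) -> shared node id
    total = 0

    def visit(tree: Tree) -> int:
        nonlocal total
        total += 1
        label, children = tree
        key = (label, tuple(visit(child) for child in children))
        if key not in table:
            table[key] = len(table)  # first time this exact subtree is seen
        return table[key]

    roots = [visit(t) for t in trees]
    return roots, len(table), total


NP: Tree = ("NP", ())


def transitive(verb: str) -> Tree:
    """Hypothetical transitive-verb tree: S -> NP VP, VP -> V NP, anchored by `verb`."""
    return ("S", (NP, ("VP", (("V", ((verb, ()),)), NP))))


roots, distinct, total = intern_trees([transitive("eats"), transitive("sees")])
print(distinct, total)  # 9 12
```

    Here the four NP leaves collapse into a single shared node (12 nodes, 9 distinct); the S, VP and V nodes stay separate because the different anchors beneath them make those subtrees distinct.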

    Extensible Dependency Grammar: a modular grammar formalism based on multigraph description

    This thesis develops Extensible Dependency Grammar (XDG), a new grammar formalism combining dependency grammar, model-theoretic syntax, and Jackendoff's parallel grammar architecture. The design of XDG is strongly geared towards modularity: grammars can be modularly extended by any linguistic aspect such as grammatical functions, word order, predicate-argument structure, scope, information structure and prosody, where each aspect is modeled largely independently on a separate dimension. The intersective demands of the dimensions make many complex linguistic phenomena, such as extraction in syntax, scope ambiguities in the semantics, and control and raising in the syntax-semantics interface, simply fall out as by-products without further stipulation. This thesis makes three main contributions: (1) the first formalization of XDG as a multigraph description language in higher order logic, together with investigations of its expressivity and computational complexity; (2) the first implementation of XDG, the XDG Development Kit (XDK), an extensive grammar development environment built around a constraint parser for XDG; and (3) the first application of XDG to natural language, modularly modeling a fragment of English.
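    The modular, intersective design described above can be pictured with a small checker: one set of nodes (the words) shared by every dimension, a separate edge set per dimension, and principles that must all hold at once. This is a hypothetical sketch only; the dimension names `syn` and `sem`, the edge labels, and the two principles are illustrative assumptions, and unlike the XDK's constraint parser it merely checks a given multigraph rather than searching for one.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple

Edge = Tuple[int, int, str]  # (head index, dependent index, edge label)


@dataclass
class Multigraph:
    """Nodes are word positions shared by all dimensions; each dimension has its own edges."""
    words: List[str]
    dims: Dict[str, Set[Edge]] = field(default_factory=dict)


Principle = Callable[[Multigraph], bool]


def is_tree(dim: str) -> Principle:
    """Hypothetical principle: the edges of `dim` form a tree over all words."""
    def check(g: Multigraph) -> bool:
        heads: Dict[int, int] = {}
        for h, d, _ in g.dims.get(dim, set()):
            if d in heads:
                return False  # a node with two heads
            heads[d] = h
        if sum(1 for n in range(len(g.words)) if n not in heads) != 1:
            return False  # exactly one root required
        for start in range(len(g.words)):
            n, seen = start, set()
            while n in heads:  # climb towards the root
                if n in seen:
                    return False  # cycle detected
                seen.add(n)
                n = heads[n]
        return True
    return check


def subj_links_to_agent(g: Multigraph) -> bool:
    """Hypothetical cross-dimension principle: every syn 'subj' edge has a sem 'agent' twin."""
    agents = {(h, d) for h, d, lab in g.dims.get("sem", set()) if lab == "agent"}
    return all((h, d) in agents for h, d, lab in g.dims.get("syn", set()) if lab == "subj")


def accepts(g: Multigraph, principles: List[Principle]) -> bool:
    """An analysis is licensed only if it satisfies every principle on every dimension."""
    return all(p(g) for p in principles)


PRINCIPLES = [is_tree("syn"), is_tree("sem"), subj_links_to_agent]
good = Multigraph(["Mary", "sleeps"], {"syn": {(1, 0, "subj")}, "sem": {(1, 0, "agent")}})
bad = Multigraph(["Mary", "sleeps"], {"syn": {(1, 0, "subj")}, "sem": {(1, 0, "patient")}})
print(accepts(good, PRINCIPLES), accepts(bad, PRINCIPLES))  # True False
```

    Each dimension is well formed on its own in both examples; `bad` is rejected only by the cross-dimension principle, which is the sense in which constraints on separate dimensions intersect.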

    The Automatic Acquisition of Knowledge about Discourse Connectives

    Institute for Communicating and Collaborative Systems
    This thesis considers the automatic acquisition of knowledge about discourse connectives. It focuses in particular on their semantic properties, and on the relationships that hold between them. There is a considerable body of theoretical and empirical work on discourse connectives. For example, Knott (1996) motivates a taxonomy of discourse connectives based on relationships between them, such as HYPONYMY and EXCLUSIVE, which are defined in terms of substitution tests. Such work requires either great theoretical insight or manual analysis of large quantities of data. As a result, to date no manual classification of English discourse connectives has achieved complete coverage. For example, Knott gives relationships between only about 18% of pairs obtained from a list of 350 discourse connectives. This thesis explores the possibility of classifying discourse connectives automatically, based on their distributions in texts. This thesis demonstrates that state-of-the-art techniques in lexical acquisition can successfully be applied to acquiring information about discourse connectives. Central to this thesis is the hypothesis that distributional similarity correlates positively with semantic similarity. Support for this hypothesis has previously been found for word classes such as nouns and verbs (Miller and Charles, 1991; Resnik and Diab, 2000, for example), but there has been little exploration of the degree to which it also holds for discourse connectives. We investigate the hypothesis through a number of machine learning experiments. These experiments all use unsupervised learning techniques, in the sense that they do not require any manually annotated data, although they do make use of an automatic parser. 
First, we show that a range of semantic properties of discourse connectives, such as polarity and veridicality (whether or not the semantics of a connective involves some underlying negation, and whether the connective implies the truth of its arguments, respectively), can be acquired automatically with a high degree of accuracy. Second, we consider the tasks of predicting the similarity and substitutability of pairs of discourse connectives. To assist in this, we introduce a novel information-theoretic function based on variance that, in combination with distributional similarity, is useful for learning such relationships. Third, we attempt to automatically construct taxonomies of discourse connectives capturing substitutability relationships. We introduce a probability model of taxonomies, and show that this can improve accuracy on learning substitutability relationships. Finally, we develop an algorithm for automatically constructing or extending such taxonomies which uses beam search to help find the optimal taxonomy.
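    The central hypothesis, that distributional similarity tracks semantic similarity, can be illustrated with cosine similarity over context counts. This is a minimal sketch under stated assumptions: the context features and counts are invented, cosine is just one common similarity measure and not necessarily one used in the thesis, and `most_similar` is a hypothetical helper.

```python
import math
from collections import Counter
from typing import Dict

# Hypothetical co-occurrence counts: connective -> context feature -> count.
# Real experiments would extract contexts from parsed text; these numbers are invented.
COUNTS: Dict[str, Counter] = {
    "although": Counter({"neg_in_clause": 4, "contrast_verb": 6, "sentence_initial": 5}),
    "though": Counter({"neg_in_clause": 3, "contrast_verb": 5, "sentence_initial": 4}),
    "because": Counter({"cause_verb": 7, "sentence_initial": 1, "after_main": 8}),
}


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def most_similar(target: str) -> str:
    """Return the connective whose context distribution is closest to the target's."""
    others = (c for c in COUNTS if c != target)
    return max(others, key=lambda c: cosine(COUNTS[target], COUNTS[c]))


print(most_similar("although"))  # though
```

    With these toy counts, "although" and "though" score about 0.999 while "although" and "because" score about 0.05; under the hypothesis, such scores would serve as evidence of semantic similarity or substitutability.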

    Interpretación tabular de autómatas para lenguajes de adjunción de árboles (Tabular interpretation of automata for tree adjoining languages)

    Get PDF
    [Summary] Tree adjoining grammars are an extension of context-free grammars that use trees instead of productions as elementary structures and that are suitable for describing most of the syntactic constructions found in natural language. The languages generated by this class of grammars are called tree adjoining languages and are equivalent to the languages generated by linear indexed grammars and other mildly context-sensitive formalisms. The first part of this dissertation presents the problem of parsing tree adjoining languages. To this end, it establishes a continuous evolutionary path along which are placed the parsing algorithms that incorporate the most important parsing strategies, both for tree adjoining grammars and for linear indexed grammars. The second part defines several models of automata that accept exactly the tree adjoining languages and proposes techniques for executing them efficiently. Using automata for parsing is interesting because it separates the problem of defining a parsing algorithm from the problem of executing it, while also simplifying correctness proofs. Specifically, we have studied the following models of automata: • Top-down and bottom-up embedded push-down automata, two extensions of push-down automata that use a stack of stacks as their storage structure. We have defined new versions of these automata in which the form of the transitions is simplified and the finite-state control is eliminated, while preserving expressive power. 
• Logical push-down automata restricted so as to recognize linear indexed grammars, yielding different types of automata specialized in various parsing strategies depending on the set of allowed transitions. • Linear indexed automata: right-oriented ones, suited to strategies in which adjunctions are recognized bottom-up; left-oriented ones, suited to parsing strategies in which adjunctions are handled top-down; and strongly-driven ones, able to incorporate parsing strategies in which adjunctions are handled bottom-up and/or top-down. • 2-stack automata, an extension of push-down automata that works with a master stack in charge of driving the parsing process and an auxiliary stack that restricts the transitions applicable at a given moment. We have described two different versions of this type of automata: strongly-driven 2-stack automata, suitable for describing arbitrary parsing strategies, and bottom-up 2-stack automata, suitable for describing parsing strategies in which adjunctions are processed bottom-up. We have defined compilation schemata for all of these automata models. These schemata yield the set of transitions that implements a given parsing strategy for a given grammar. All the automata models can be executed in polynomial time with respect to the length of the input string by applying tabular interpretation techniques. These techniques are based on manipulating collapsed representations of the automaton's configurations, called items, which are stored in a table for later reuse. This avoids redundant computations. 
Finally, we have analyzed the different automata models together; they can be divided into three major groups: the family of general automata, which includes strongly-driven linear indexed automata and strongly-driven 2-stack automata; the family of top-down automata, which includes embedded push-down automata and left-oriented linear indexed automata; and the family of bottom-up automata, which includes bottom-up embedded push-down automata, right-oriented linear indexed automata and bottom-up 2-stack automata.

    [Abstract] Tree adjoining grammars are an extension of context-free grammars that use trees instead of productions as their primary representational structure and that are considered adequate for describing most of the syntactic phenomena occurring in natural languages. These grammars generate the class of tree adjoining languages, which is equivalent to the class of languages generated by linear indexed grammars and other mildly context-sensitive formalisms. In the first part of this dissertation, we introduce the problem of parsing tree adjoining grammars and linear indexed grammars, creating, for both formalisms, a continuum from simple pure bottom-up algorithms to complex predictive algorithms and showing what transformations must be applied to each one in order to obtain the next one in the continuum. In the second part, we define several models of automata that accept the class of tree adjoining languages, proposing techniques for their efficient execution. The use of automata for parsing is interesting because it allows us to separate the problem of the definition of parsing algorithms from the problem of their execution. We have considered the following types of automata: • Top-down and bottom-up embedded push-down automata, two extensions of push-down automata working on nested stacks. 
A new definition is provided in which the finite-state control has been eliminated and several kinds of normalized transitions have been defined, preserving the equivalence with tree adjoining languages. • Logical push-down automata restricted to the case of tree adjoining languages. Depending on the set of allowed transitions, we obtain three different types of automata. • Linear indexed automata: left-oriented and right-oriented variants describe parsing strategies in which adjunctions are recognized top-down and bottom-up, respectively, and strongly-driven variants define parsing strategies recognizing adjunctions top-down and/or bottom-up. • 2-stack automata, an extension of push-down automata working on a pair of stacks: a master stack driving the parsing process and an auxiliary stack restricting the set of transitions that can be applied at a given moment. Strongly-driven 2-stack automata can be used to describe bottom-up, top-down or mixed parsing strategies for tree adjoining languages with respect to the recognition of the adjunctions. Bottom-up 2-stack automata are specifically designed for parsing strategies recognizing adjunctions bottom-up. Compilation schemata for these models of automata have been defined. A compilation schema allows us to obtain the set of transitions corresponding to the implementation of a parsing strategy for a given grammar. All the presented automata can be executed in polynomial time with respect to the length of the input string by applying tabulation techniques. A tabular technique makes it possible to interpret an automaton by manipulating collapsed representations of configurations (called items) instead of actual configurations. Items are stored in a table in order to be reused, avoiding redundant computations. Finally, we have studied the relations among the different classes of automata, the main difference being the storage structure used: embedded stacks, lists of indices or coupled stacks. 
According to the strategies that can be implemented, we can distinguish three kinds of automata: bottom-up automata, including bottom-up embedded push-down automata, bottom-up restricted logic push-down automata, right-oriented linear indexed automata and bottom-up 2-stack automata; top-down automata, including (top-down) embedded push-down automata, top-down restricted logic push-down automata and left-oriented linear indexed automata; and general automata, including strongly-driven linear indexed automata and strongly-driven 2-stack automata.
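    The tabulation idea in this abstract, storing collapsed configurations (items) in a table so that no computation is repeated, can be seen in miniature in a context-free setting. The sketch below is a CYK-style bottom-up recognizer, a deliberately simpler stand-in: its items (A, i, j) are far less rich than the automaton items of the thesis, and the toy grammar and the `recognize` function are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy grammar in Chomsky normal form (binary rules and lexical rules).
BINARY: List[Tuple[str, str, str]] = [("S", "NP", "VP"), ("VP", "V", "NP")]
LEXICAL: Dict[str, Set[str]] = {"John": {"NP"}, "Mary": {"NP"}, "sees": {"V"}}


def recognize(words: List[str], start: str = "S") -> bool:
    """Bottom-up tabular recognition: item (A, i, j) means A derives words[i:j].

    Each item is stored once in table[i][j] and reused by every larger span,
    so the work is cubic in len(words) rather than exponential.
    """
    n = len(words)
    table: List[List[Set[str]]] = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        table[i][i + 1] |= LEXICAL.get(w, set())  # items for single words
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):  # every split point of span (i, j)
                for lhs, left, right in BINARY:
                    if left in table[i][k] and right in table[k][j]:
                        table[i][j].add(lhs)
    return start in table[0][n]


print(recognize(["John", "sees", "Mary"]))  # True
print(recognize(["sees", "John", "Mary"]))  # False
```

    Every item is derived at most once per span and split point, which is what bounds the running time polynomially; the automata in the thesis apply the same principle to items describing stacks of stacks, index lists or coupled stacks.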