341 research outputs found

    An Incremental Algorithm for Transition-based CCG Parsing

    Incremental parsers have potential advantages for applications like language modeling for machine translation and speech recognition. We describe a new algorithm for incremental transition-based Combinatory Categorial Grammar parsing. As English CCGbank derivations are mostly right branching and non-incremental, we design our algorithm based on the dependencies resolved rather than the derivation. We introduce two new actions in the shift-reduce paradigm based on the idea of 'revealing' (Pareschi and Steedman, 1987) the required information during parsing. On the standard CCGbank test data, our algorithm achieved improvements of 0.88% in labeled and 2.0% in unlabeled F-score over a greedy non-incremental shift-reduce parser. 11 page(s)

    Transition-based combinatory categorial grammar parsing for English and Hindi

    Given a natural language sentence, parsing is the task of assigning it a grammatical structure according to the rules of a particular grammar formalism. Different grammar formalisms, such as Dependency Grammar, Phrase Structure Grammar, Combinatory Categorial Grammar, and Tree Adjoining Grammar, are explored in the literature for parsing. For example, given a sentence like “John ate an apple”, parsers based on the widely used dependency grammars find grammatical relations, such as that ‘John’ is the subject and ‘apple’ is the object of the action ‘ate’. In this thesis, we focus mainly on Combinatory Categorial Grammar (CCG) and present an incremental algorithm for parsing CCG for two diverse languages: English and Hindi. English is a fixed word order, SVO (Subject-Verb-Object), and morphologically simple language, whereas Hindi, though predominantly an SOV (Subject-Object-Verb) language, is a free word order and morphologically rich language. Developing an incremental parser for Hindi is challenging since the predicate needed to resolve dependencies comes at the end. As previously available shift-reduce CCG parsers use English CCGbank derivations, which are mostly right branching and non-incremental, we design our algorithm based on the dependencies resolved rather than the derivation. Our novel algorithm builds a dependency graph in parallel to the CCG derivation, which is used for revealing the unbuilt structure without backtracking. Though we use dependencies for meaning representation and CCG for parsing, our revealing technique can be applied to other meaning representations like lambda expressions and to non-CCG parsing like phrase structure parsing. Any statistical parser requires three major modules: data, parsing algorithm and learning algorithm. This thesis is broadly divided into three parts, each dealing with one major module of the statistical parser.
In Part I, we design a novel algorithm for converting a dependency treebank to a CCGbank. We create a Hindi CCGbank with a coverage of 96% using this algorithm. We also do a cross-formalism experiment where we show that CCG supertags can improve widely used dependency parsers. We experiment with two popular dependency parsers (Malt and MST) for two diverse languages: English and Hindi. For both languages, CCG categories improve the overall accuracy of both parsers by around 0.3-0.5% in all experiments. For both parsers, we see larger improvements specifically on dependencies at which they are known to be weak: long distance dependencies for Malt, and verbal arguments for MST. The result is particularly interesting in the case of the fast greedy parser (Malt), since improving its accuracy without significantly compromising speed is relevant for large scale applications such as parsing the web. In Part II, we present a novel algorithm for incremental transition-based CCG parsing for English and Hindi. Incremental parsers have potential advantages for applications like language modeling for machine translation and speech recognition. We introduce two new actions in the shift-reduce paradigm for revealing the required information during parsing. We also analyze the impact of a beam and look-ahead for parsing. In general, using a beam and/or look-ahead gives better results than not using them. We also show that the incremental CCG parser is more useful than a non-incremental version for predicting relative sentence complexity. Given a pair of sentences from Wikipedia and Simple Wikipedia, we build a classifier which predicts whether one sentence is simpler or more complex than the other. We show that features from a CCG parser in general, and an incremental CCG parser in particular, are more useful than those from a chart-based phrase structure parser, both in terms of speed and accuracy. In Part III, we develop the first neural network based training algorithm for parsing CCG.
We also study the impact of neural network based tagging models, and greedy versus beam-search parsing, by using a structured neural network model. In greedy settings, neural network models give significantly better results than the perceptron models and are also over three times faster. Using a narrow beam, the structured neural network model gives consistently better results than the basic neural network model. For English, the structured neural network gives similar performance to the structured perceptron parser. But for Hindi, the structured perceptron is still the winner.
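The remark in the abstract above that right-branching derivations are non-incremental can be illustrated with a small sketch. It is not the thesis's algorithm (it has no revealing actions, no statistical model, and no beam); it is a plain greedy shift-reduce loop using only forward and backward application. The toy lexicon for “John ate an apple”, the tuple encoding of categories, and the names LEXICON, apply_rule and parse are all assumptions made for illustration.

```python
# Minimal sketch (assumed toy setup, not the parser described in the thesis).
# A category is an atomic string or a tuple (result, slash, argument).
NP, N, S = "NP", "N", "S"

# Hypothetical lexicon for the example sentence.
LEXICON = {
    "John": NP,
    "ate": ((S, "\\", NP), "/", NP),  # (S\NP)/NP
    "an": (NP, "/", N),               # NP/N
    "apple": N,
}


def apply_rule(left, right):
    """Return the combined category, or None if no application rule fits."""
    # Forward application:  X/Y  Y  =>  X
    if isinstance(left, tuple) and left[1] == "/" and left[2] == right:
        return left[0]
    # Backward application:  Y  X\Y  =>  X
    if isinstance(right, tuple) and right[1] == "\\" and right[2] == left:
        return right[0]
    return None


def parse(words):
    """Greedy shift-reduce: shift a word, then reduce while a rule applies."""
    stack, actions, max_depth = [], [], 0
    for word in words:
        stack.append(LEXICON[word])
        actions.append("SHIFT")
        max_depth = max(max_depth, len(stack))
        while len(stack) >= 2:
            combined = apply_rule(stack[-2], stack[-1])
            if combined is None:
                break
            stack[-2:] = [combined]
            actions.append("REDUCE")
    return stack, actions, max_depth


if __name__ == "__main__":
    stack, actions, max_depth = parse("John ate an apple".split())
    print(stack, actions, max_depth)
```

With this lexicon, nothing can be reduced until the last word has been shifted, so all four words sit on the stack before any structure is built. That is the sense in which such a derivation is non-incremental, and it is the situation the dependency-based design in the abstract is meant to avoid.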

    A syntactified direct translation model with linear-time decoding

    Recent syntactic extensions of statistical translation models work with a synchronous context-free or tree-substitution grammar extracted from an automatically parsed parallel corpus. The decoders accompanying these extensions typically exceed quadratic time complexity. This paper extends the Direct Translation Model 2 (DTM2) with syntax while maintaining linear-time decoding. We employ a linear-time parsing algorithm based on an eager, incremental interpretation of Combinatory Categorial Grammar (CCG). As every input word is processed, the local parsing decisions resolve ambiguity eagerly, by selecting a single supertag–operator pair for extending the dependency parse incrementally. Alongside translation features extracted from the derived parse tree, we explore syntactic features extracted from the incremental derivation process. Our empirical experiments show that our model significantly outperforms the state-of-the-art DTM2 system.

    A Transition-Based Directed Acyclic Graph Parser for UCCA

    We present the first parser for UCCA, a cross-linguistically applicable framework for semantic representation, which builds on extensive typological work and supports rapid annotation. UCCA poses a challenge for existing parsing techniques, as it exhibits reentrancy (resulting in DAG structures), discontinuous structures and non-terminal nodes corresponding to complex semantic units. To our knowledge, the conjunction of these formal properties is not supported by any existing parser. Our transition-based parser, which uses a novel transition set and features based on bidirectional LSTMs, has value not just for UCCA parsing: its ability to handle more general graph structures can inform the development of parsers for other semantic DAG structures, and in languages that frequently use discontinuous structures. Comment: 16 pages; Accepted as long paper at ACL201