
    A Machine learning approach to POS tagging

    We have applied inductive learning of statistical decision trees and relaxation labelling to the Natural Language Processing (NLP) task of morphosyntactic disambiguation (Part-of-Speech tagging). The learning process is supervised and produces a language model oriented to resolving POS ambiguities. This model consists of a set of statistical decision trees expressing the distribution of tags and words in relevant contexts. The acquired language models are complete enough to be used directly as sets of POS disambiguation rules, and include more complex contextual information than the simple collections of n-grams usually used in statistical taggers. We have implemented a simple and fast tagger that has been tested and evaluated on the Wall Street Journal (WSJ) corpus with remarkable accuracy. However, better results can be obtained by translating the trees into rules to feed a flexible relaxation-labelling-based tagger. Along these lines, we describe a tagger that is able to use information of any kind (n-grams, automatically acquired constraints, linguistically motivated hand-written constraints, etc.), and in particular to incorporate the machine-learned decision trees. Simultaneously, we address the problem of tagging when only a small amount of training material is available, which is crucial in any process of constructing an annotated corpus from scratch. We show that quite high accuracy can be achieved with our system in this situation.
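    The abstract's core idea, decision-tree nodes that ask about local context and leaves that store tag distributions, can be illustrated with a toy sketch. This is not the paper's implementation: the ambiguity class, context question, and probabilities below are invented for illustration.

```python
# Toy sketch of a statistical decision tree for POS disambiguation.
# Each internal node asks about a context feature (here, the previous
# tag); each leaf holds a tag distribution estimated from a corpus.
# All numbers and the NN/VB ambiguity class are invented.

def disambiguate(word, prev_tag):
    """Return the most probable tag for a word ambiguous between NN and VB."""
    if prev_tag == "DT":            # after a determiner ("the run")
        leaf = {"NN": 0.97, "VB": 0.03}
    elif prev_tag == "TO":          # after "to" ("to run")
        leaf = {"NN": 0.10, "VB": 0.90}
    else:                           # default leaf
        leaf = {"NN": 0.60, "VB": 0.40}
    return max(leaf, key=leaf.get)

print(disambiguate("run", "DT"))    # -> NN
print(disambiguate("run", "TO"))    # -> VB
```

    A learned tree differs only in scale: its questions and leaf distributions are induced automatically from a tagged corpus rather than written by hand.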

    Complexity of Lexical Descriptions and its Relevance to Partial Parsing

    In this dissertation, we have proposed novel methods for robust parsing that integrate the flexibility of linguistically motivated lexical descriptions with the robustness of statistical techniques. Our thesis is that the computation of linguistic structure can be localized if lexical items are associated with rich descriptions (supertags) that impose complex constraints in a local context. However, increasing the complexity of descriptions makes the number of different descriptions for each lexical item much larger and hence increases the local ambiguity for a parser. This local ambiguity can be resolved by using supertag co-occurrence statistics collected from parsed corpora. We have explored these ideas in the context of the Lexicalized Tree-Adjoining Grammar (LTAG) framework, wherein supertag disambiguation provides a representation that is an almost parse. We have used the disambiguated supertag sequence in conjunction with a lightweight dependency analyzer to compute noun groups, verb groups, dependency linkages and even partial parses. We have shown that a trigram-based supertagger achieves an accuracy of 92.1% on Wall Street Journal (WSJ) texts. Furthermore, we have shown that lightweight dependency analysis on the output of the supertagger identifies 83% of the dependency links accurately. We have exploited the representation of supertags with Explanation-Based Learning to improve parsing efficiency. In this approach, parsing in limited domains can be modeled as a finite-state transduction. We have implemented such a system for the ATIS domain, which improves parsing efficiency by a factor of 15. We have used the supertagger in a variety of applications to provide lexical descriptions at an appropriate granularity. In an information retrieval application, we show that the supertag-based system performs at higher levels of precision compared to a system based on part-of-speech tags.
    In an information extraction task, supertags are used in specifying extraction patterns. For language modeling applications, we view supertags as syntactically motivated class labels in a class-based language model. The distinction between recursive and non-recursive supertags is exploited in a sentence simplification application.
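    Supertag disambiguation works like n-gram POS tagging, only over a much larger tag inventory: each word has several candidate supertags, and a Viterbi search picks the sequence that maximizes emission and transition scores. The sketch below simplifies the trigram model to bigrams for brevity; the supertag names and probabilities are invented, while a real model is trained on parsed WSJ data.

```python
# Toy Viterbi sketch of supertag disambiguation (bigram simplification
# of the trigram model). Supertag labels and all probabilities are
# invented for illustration.
import math

# Candidate supertags per word, with emission log-probabilities.
candidates = {
    "the":   {"B_Dnx": 0.0},
    "price": {"A_NXN": math.log(0.7), "B_nxN": math.log(0.3)},
    "rose":  {"A_nx0V": math.log(0.8), "A_NXN": math.log(0.2)},
}
# Transition log-probabilities between adjacent supertags.
trans = {
    ("B_Dnx", "A_NXN"): math.log(0.9), ("B_Dnx", "B_nxN"): math.log(0.1),
    ("A_NXN", "A_nx0V"): math.log(0.8), ("A_NXN", "A_NXN"): math.log(0.2),
    ("B_nxN", "A_nx0V"): math.log(0.5), ("B_nxN", "A_NXN"): math.log(0.5),
}

def supertag(sentence):
    """Return the best-scoring supertag sequence (one beam per state)."""
    beams = [(0.0, [])]                       # (log-prob, path so far)
    for word in sentence:
        new_beams = []
        for tag, emit in candidates[word].items():
            scored = []
            for lp, path in beams:
                t = trans.get((path[-1], tag), math.log(1e-6)) if path else 0.0
                scored.append((lp + t + emit, path + [tag]))
            new_beams.append(max(scored))     # keep best path ending in tag
        beams = new_beams
    return max(beams)[1]

print(supertag(["the", "price", "rose"]))
# -> ['B_Dnx', 'A_NXN', 'A_nx0V']
```

    The output sequence is the "almost parse": because each supertag encodes a full local tree, a lightweight analyzer can then link the supertags into dependency structure without full parsing.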

    Developing a knowledge base for preposition sense disambiguation: A view from Role and Reference Grammar and FunGramKB

    Prepositions represent a grammatical category of frequent use in many European languages. The combination of their semantics with other lexical categories usually makes them difficult to treat computationally. As far as natural language processing is concerned, some studies have contributed to progress on the treatment of prepositions. However, there still exists a need for a model that allows tackling the problems which result from the disambiguation of prepositional semantics. The goal of this paper is to describe a lexico-conceptual model which can store the knowledge required to disambiguate predicate prepositions, as well as how this model can be exploited by a parser to extract the semantic representation of a text. The theoretical foundation of this approach, which is grounded on the premises of Role and Reference Grammar and FunGramKB, is illustrated with temporal adjuncts expressed by prepositional phrases in English. Financial support for this research has been provided by the DGI, Spanish Ministry of Education and Science, grant FFI2011-29798-C02-01. Moreover, much of this work has resulted from the first author's ongoing PhD thesis "La desambiguacion semantica de los sintagmas prepositivos como adjuntos perifericos en el marco de la Gramatica del Papel y la Referencia: un enfoque desde la linguistica computacional y la ingenieria del conocimiento", to be presented at the Universidad Nacional de Educacion a Distancia (UNED). Hernández-Pastor, D.; Periñán Pascual, JC. (2016). Developing a knowledge base for preposition sense disambiguation: A view from Role and Reference Grammar and FunGramKB. Onomázein: Revista de Lingüística, Filología y Traducción. 33:251-288. https://doi.org/10.7764/onomazein.33.16
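    The basic mechanism of a knowledge-base-driven approach can be sketched as a lookup that pairs a preposition with the ontological type of its complement to select a sense. This is only a loose illustration of the idea, not FunGramKB's actual representation: the entries, type labels, and sense names below are all invented.

```python
# Toy sketch of sense selection for predicate prepositions heading
# temporal adjuncts. The knowledge-base entries, ontological types,
# and sense labels are invented for illustration.

KB = {
    # (preposition, ontological type of complement) -> sense label
    ("in",  "TIME_PERIOD"): "temporal_containment",   # "in May"
    ("at",  "TIME_POINT"):  "point_in_time",          # "at noon"
    ("for", "TIME_PERIOD"): "temporal_extent",        # "for two hours"
}

def preposition_sense(prep, complement_type):
    """Resolve a predicate preposition against the knowledge base."""
    return KB.get((prep, complement_type), "unknown")

print(preposition_sense("at", "TIME_POINT"))   # -> point_in_time
print(preposition_sense("at", "PLACE"))        # -> unknown (no entry)
```

    A parser using such a resource would first type the complement against the ontology, then use the selected sense to build the semantic representation of the adjunct.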