460 research outputs found

    Domain-Specific Knowledge Acquisition for Conceptual Sentence Analysis

    The availability of on-line corpora is rapidly changing the field of natural language processing (NLP) from one dominated by theoretical models of often very specific linguistic phenomena to one guided by computational models that simultaneously account for a wide variety of phenomena that occur in real-world text. Thus far, among the best-performing and most robust systems for reading and summarizing large amounts of real-world text are knowledge-based natural language systems. These systems rely heavily on domain-specific, handcrafted knowledge to handle the myriad syntactic, semantic, and pragmatic ambiguities that pervade virtually all aspects of sentence analysis. Not surprisingly, however, generating this knowledge for new domains is time-consuming, difficult, and error-prone, and requires the expertise of computational linguists familiar with the underlying NLP system. This thesis presents Kenmore, a general framework for domain-specific knowledge acquisition for conceptual sentence analysis. To ease the acquisition of knowledge in new domains, Kenmore exploits an on-line corpus using symbolic machine learning techniques and robust sentence analysis while requiring only minimal human intervention. Unlike most approaches to knowledge acquisition for natural language systems, the framework uniformly addresses a range of subproblems in sentence analysis, each of which has traditionally required a separate computational mechanism. The thesis presents the results of using Kenmore with corpora from two real-world domains: (1) to perform part-of-speech tagging, semantic feature tagging, and concept tagging of all open-class words in the corpus; (2) to acquire heuristics for part-of-speech disambiguation, semantic feature disambiguation, and concept activation; and (3) to find the antecedents of relative pronouns.
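The symbolic-learning idea described above can be pictured with a toy case-based tagger: store tagged contexts from the corpus, then label a new ambiguous word by a vote among the most similar stored cases. Kenmore's actual features, similarity metric, and learning algorithm are not reproduced here; the cases, tag names, and suffix feature below are invented for illustration only.

```python
from collections import Counter

# Invented training "cases": context features (previous POS tag, 3-letter
# word suffix) paired with the tag supplied during supervised training.
cases = [
    (("DET", "ing"), "NOUN"),   # e.g. "the meeting"
    (("PRON", "ing"), "VERB"),  # e.g. "she is meeting"
    (("DET", "ord"), "NOUN"),   # e.g. "the record"
    (("TO", "ord"), "VERB"),    # e.g. "to record"
]

def disambiguate(prev_tag, word):
    """Tag a word by majority vote among the most similar stored cases."""
    probe = (prev_tag, word[-3:])
    scored = [(sum(a == b for a, b in zip(feats, probe)), tag)
              for feats, tag in cases]
    best = max(score for score, _ in scored)
    votes = Counter(tag for score, tag in scored if score == best)
    return votes.most_common(1)[0][0]

disambiguate("TO", "record")  # the most similar stored case votes VERB
```

The point of the sketch is the uniformity the abstract emphasizes: nothing in the lookup is specific to part-of-speech tagging, so the same mechanism could store cases for semantic-feature or concept disambiguation.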

    SCREEN: Learning a Flat Syntactic and Semantic Spoken Language Analysis Using Artificial Neural Networks

    In this paper, we describe a so-called screening approach for learning robust processing of spontaneously spoken language. A screening approach is a flat analysis which uses shallow sequences of category representations for analyzing an utterance at various syntactic, semantic and dialog levels. Rather than using a deeply structured symbolic analysis, we use a flat connectionist analysis. This screening approach aims at supporting speech and language processing by using (1) data-driven learning and (2) robustness of connectionist networks. In order to test this approach, we have developed the SCREEN system which is based on this new robust, learned and flat analysis. In this paper, we focus on a detailed description of SCREEN's architecture, the flat syntactic and semantic analysis, the interaction with a speech recognizer, and a detailed evaluation of the robustness under the influence of noisy or incomplete input. The main result of this paper is that flat representations allow more robust processing of spontaneous spoken language than deeply structured representations. In particular, we show how the fault-tolerance and learning capability of connectionist networks can support a flat analysis for providing more robust spoken-language processing within an overall hybrid symbolic/connectionist framework.
    Comment: 51 pages, Postscript. To be published in Journal of Artificial Intelligence Research 6(1), 1997

    Lexical Functions For Ants Based Semantic Analysis.

    Semantic analysis (SA) is a central operation in natural language processing. We can consider it as the resolution of five problems: lexical ambiguity, references, prepositional attachments, interpretative paths, and the instantiation of lexical functions.

    Statistical Parsing by Machine Learning from a Classical Arabic Treebank

    Research into statistical parsing for English has enjoyed over a decade of successful results. However, adapting these models to other languages has met with difficulties. Previous comparative work has shown that Modern Arabic is one of the most difficult languages to parse due to rich morphology and free word order. Classical Arabic is the ancient form of Arabic, and is understudied in computational linguistics, relative to its worldwide reach as the language of the Quran. The thesis is based on seven publications that make significant contributions to knowledge relating to annotating and parsing Classical Arabic. Classical Arabic has been studied in depth by grammarians for over a thousand years using a traditional grammar known as i’rāb (إعراب). Using this grammar to develop a representation for parsing is challenging, as it describes syntax using a hybrid of phrase-structure and dependency relations. This work aims to advance the state-of-the-art for hybrid parsing by introducing a formal representation for annotation and a resource for machine learning. The main contributions are the first treebank for Classical Arabic and the first statistical dependency-based parser in any language for ellipsis, dropped pronouns and hybrid representations. A central argument of this thesis is that using a hybrid representation closely aligned to traditional grammar leads to improved parsing for Arabic. To test this hypothesis, two approaches are compared. As a reference, a pure dependency parser is adapted using graph transformations, resulting in an 87.47% F1-score. This is compared to an integrated parsing model with an F1-score of 89.03%, demonstrating that joint dependency-constituency parsing is better suited to Classical Arabic. The Quran was chosen for annotation as a large body of work exists providing detailed syntactic analysis. Volunteer crowdsourcing is used for annotation in combination with expert supervision.
A practical result of the annotation effort is the corpus website: http://corpus.quran.com, an educational resource with over two million users per year.

    Modelling, Detection And Exploitation Of Lexical Functions For Analysis.

    Lexical functions (LFs) model relations between terms in the lexicon. These relations can be knowledge about the world (Napoleon was an emperor) or knowledge about the language (‘destiny’ is a synonym of ‘fate’).
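A lexical-function lexicon can be pictured as a mapping from (function, keyword) pairs to values. The entries below echo the abstract's own examples; the function names Syn (synonymy) and Magn (intensification) follow the Meaning-Text tradition, while the "IsA" label for world knowledge is an invented placeholder, not part of the standard LF inventory.

```python
# Toy lexical-function table; entries echo the abstract's examples.
LEXICAL_FUNCTIONS = {
    ("Syn", "destiny"): "fate",       # language knowledge: synonymy
    ("IsA", "Napoleon"): "emperor",   # world knowledge (invented LF name)
    ("Magn", "rain"): "heavy",        # intensifier, as in "heavy rain"
}

def apply_lf(function, keyword):
    """Look up the value of a lexical function for a keyword, if modelled."""
    return LEXICAL_FUNCTIONS.get((function, keyword))

apply_lf("Syn", "destiny")  # "fate"
```

Detecting which entries hold for a given text, rather than listing them by hand, is the modelling and detection problem the paper addresses.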

    PP Attachment Ambiguity Resolution with Corpus-Based Pattern Distributions and Lexical Signatures

    Invited paper. In this paper, we propose a method combining unsupervised learning of lexical frequencies with semantic information, aiming at improving PP attachment ambiguity resolution. Using the output of a robust parser, i.e. the set of all possible attachments for a given sentence, we query the Web and obtain statistical information about the frequency distributions of the attachments, as well as lexical signatures of the terms in the patterns. All this information is used to weight the dependencies yielded by the parser.
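The weighting step can be pictured as follows. The counts below are invented stand-ins for the web-query frequencies the paper collects, and the plain argmax is a simplification of its actual combination of frequencies and lexical signatures.

```python
# Invented counts standing in for web-query frequencies of each
# (head, preposition, object) pattern proposed by the parser.
pattern_counts = {
    ("eat", "with", "fork"): 900,    # verb attachment: "eat with a fork"
    ("pizza", "with", "fork"): 40,   # noun attachment: "pizza with a fork"
}

def best_attachment(candidates):
    """Keep the candidate attachment whose pattern is most frequent."""
    return max(candidates, key=lambda c: pattern_counts.get(c, 0))

best_attachment([("eat", "with", "fork"), ("pizza", "with", "fork")])
# → ("eat", "with", "fork"), i.e. verb attachment
```

Because the parser already enumerates every possible attachment, the frequency information only has to rank them, not generate them.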

    The spy saw a cop with a telescope: Who has the telescope? An attempt to understand the basic building blocks of ambiguous PP-attachment sequences

    This paper explores the problem of ambiguous PP-attachment by extracting information from a PP-attachment corpus using Python. Cases of ambiguous PP-attachment involve sequences of the head words of the following type: verb > noun > preposition > noun. The head nouns of ambiguous PP-attachment sentences, as well as aspects beyond head words, are investigated by testing a number of hypotheses using a corpus of thousands of real-world examples. The hypotheses are partially based on theory and partially on empirical evidence. The results support some theoretical claims while discarding others. For instance, one finding that supports an existing claim is that of-PPs always attach to NPs whose heads are classifiers. This kind of knowledge can be put into practice when parsing natural language.
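A hypothesis like "of-PPs attach to the noun" can be tested over such a corpus in a few lines of Python. The rows below are invented examples in the common (verb, noun1, preposition, noun2, attachment) format used by PP-attachment datasets, with "N" marking noun attachment and "V" verb attachment; the paper's real corpus and feature set are not reproduced here.

```python
from collections import Counter

# Invented rows in (verb, noun1, preposition, noun2, attachment) format.
corpus = [
    ("bought", "shares", "of", "stock", "N"),
    ("saw", "cop", "with", "telescope", "V"),
    ("drank", "cup", "of", "coffee", "N"),
]

def attachment_distribution(prep):
    """Count attachment sites for one preposition, as in the of-PP test."""
    return Counter(att for _, _, p, _, att in corpus if p == prep)

attachment_distribution("of")  # Counter({'N': 2})
```

On a real corpus the same query over thousands of rows is what lets a claim like "of-PPs always attach to NPs" be confirmed or discarded empirically.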