101 research outputs found

    Advanced Semantics for Commonsense Knowledge Extraction

    Commonsense knowledge (CSK) about concepts and their properties is useful for AI applications such as robust chatbots. Prior works like ConceptNet, TupleKB and others compiled large CSK collections, but are restricted in their expressiveness to subject-predicate-object (SPO) triples with simple concepts for S and monolithic strings for P and O. Also, these projects have either prioritized precision or recall, but hardly reconcile these complementary goals. This paper presents a methodology, called Ascent, to automatically build a large-scale knowledge base (KB) of CSK assertions, with advanced expressiveness and both better precision and recall than prior works. Ascent goes beyond triples by capturing composite concepts with subgroups and aspects, and by refining assertions with semantic facets. The latter are important to express temporal and spatial validity of assertions and further qualifiers. Ascent combines open information extraction with judicious cleaning using language models. Intrinsic evaluation shows the superior size and quality of the Ascent KB, and an extrinsic evaluation for QA-support tasks underlines the benefits of Ascent. Comment: Web interface available at https://ascent.mpi-inf.mpg.d

    The spy saw a cop with a telescope: Who has the telescope? An attempt to understand the basic building blocks of ambiguous PP-attachment sequences

    This paper explores the problem of ambiguous PP-attachment by extracting information from a PP-attachment corpus using Python. Cases of ambiguous PP-attachment involve sequences of the head words of the following type: verb > noun > preposition > noun. The head nouns of ambiguous PP-attachment sentences, as well as aspects beyond head words, are investigated by testing a number of hypotheses using a corpus of thousands of real-world examples. The hypotheses are partially based on theory and partially on empirical evidence. The results support some theoretical claims while discarding others. For instance, one finding that supports an existing claim is that of-PPs always attach to NPs whose heads are classifiers. This kind of knowledge can be put into practice when parsing natural language.

    Information access and retrieval with semantic background knowledge

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Program in Media Arts & Sciences, 1995. Includes bibliographical references (p. 161-174). Anil Srinivasa Chakravarthy. Ph.D.

    ON THE USE OF NATURAL LANGUAGE PROCESSING FOR AUTOMATED CONCEPTUAL DATA MODELING

    This research involved the development of a natural language processing (NLP) architecture for the extraction of entity relation diagrams (ERDs) from natural language requirements specifications. Conceptual data modeling plays an important role in database and software design, and many approaches to automating and developing software tools for this process have been attempted. NLP approaches to this problem appear to be plausible because, compared to general free texts, natural language requirements documents are relatively formal and exhibit some special regularities which reduce the complexity of the problem. The approach taken here involves a loose integration of several linguistic components. Outputs from syntactic parsing are used by a set of heuristic rules developed for this particular domain to produce tuples representing the underlying meanings of the propositions in the documents, and semantic resources are used to distinguish between correct and incorrect tuples. Finally, the tuples are integrated into full ERD representations. The major challenge addressed in this research is how to bring the various resources to bear on the translation of the natural language documents into the formal language. This system is taken to be representative of a potential class of similar systems designed to translate documents in other restricted domains into corresponding formalisms. The system is incorporated into a tool that presents the final ERDs to users who can modify them in the attempt to produce an accurate ERD for the requirements document. An experiment demonstrated that users with limited experience in ERD specifications could produce better representations of requirements documents than they could without the system, and could do so in less time.

    Integrated conceptual parser

    Anaphora resolution for Arabic machine translation: a case study of nafs

    PhD Thesis. In the age of the internet, email, and social media there is an increasing need for processing online information, for example, to support education and business. This has led to the rapid development of natural language processing technologies such as computational linguistics, information retrieval, and data mining. As a branch of computational linguistics, anaphora resolution has attracted much interest. This is reflected in the large number of papers on the topic published in journals such as Computational Linguistics. Mitkov (2002) and Ji et al. (2005) have argued that the overall quality of anaphora resolution systems remains low, despite practical advances in the area, and that major challenges include dealing with real-world knowledge and accurate parsing. This thesis investigates the following research question: can an algorithm be found for the resolution of the anaphor nafs in Arabic text which is accurate to at least 90%, scales linearly with text size, and requires a minimum of knowledge resources? A resolution algorithm intended to satisfy these criteria is proposed. Testing on a corpus of contemporary Arabic shows that it does indeed satisfy the criteria. Egyptian Government