
    Combining linguistic and statistical analysis to extract relations from web documents

    Search engines, question answering systems and classification systems alike can greatly profit from formalized world knowledge. Unfortunately, manually compiled collections of world knowledge (such as WordNet or the Suggested Upper Merged Ontology, SUMO) often suffer from low coverage, high assembly costs and rapid aging. In contrast, the World Wide Web provides an endless source of knowledge, assembled by millions of people, updated constantly and available for free. In this paper, we propose a novel method for learning arbitrary binary relations from natural-language Web documents without human interaction. Our system, LEILA, combines linguistic analysis and machine learning techniques to find robust patterns in the text and to generalize them. For initialization, we require only a set of examples of the target relation and a set of counterexamples (e.g., from WordNet). The architecture consists of three stages: finding patterns in the corpus based on the given examples, assessing the patterns based on probabilistic confidence, and applying the generalized patterns to propose pairs for the target relation. We demonstrate the benefits and practical viability of our approach through extensive experiments, showing that LEILA achieves consistent improvements over existing comparable techniques (e.g., Snowball, TextToOnto).
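    The three-stage architecture described in this abstract can be illustrated with a minimal Python sketch. It is not LEILA's implementation: LEILA relies on linguistic analysis and machine learning, whereas this sketch assumes a hypothetical pre-processed corpus in which each sentence is already reduced to a (first entity, connecting text, second entity) triple, treats the raw connecting text as the pattern, and uses a simple frequency ratio as a stand-in for probabilistic confidence. The function names, the `threshold` parameter and the toy data are illustrative assumptions.

```python
from collections import defaultdict


def find_patterns(corpus, examples, counterexamples):
    """Stage 1: count how often each connecting text (the 'pattern') links an
    example pair (positive) or a counterexample pair (negative).
    Assumption: sentences are pre-reduced to (entity1, middle_text, entity2)."""
    pos = defaultdict(int)
    neg = defaultdict(int)
    for x, middle, y in corpus:
        if (x, y) in examples:
            pos[middle] += 1
        elif (x, y) in counterexamples:
            neg[middle] += 1
    return pos, neg


def assess_patterns(pos, neg):
    """Stage 2: confidence = share of a pattern's labeled matches that are
    positive (a simple stand-in for a probabilistic confidence estimate)."""
    patterns = set(pos) | set(neg)
    return {p: pos[p] / (pos[p] + neg[p]) for p in patterns}


def apply_patterns(corpus, confidence, known, threshold=0.8):
    """Stage 3: propose unseen pairs linked by a sufficiently confident pattern."""
    proposals = set()
    for x, middle, y in corpus:
        if confidence.get(middle, 0.0) >= threshold and (x, y) not in known:
            proposals.add((x, y))
    return proposals


# Hypothetical toy data for a "capital of" relation.
corpus = [
    ("Paris", "is the capital of", "France"),
    ("Berlin", "is the capital of", "Germany"),
    ("Rome", "is the capital of", "Italy"),
    ("Paris", "is larger than", "Lyon"),
    ("Lyon", "is a city in", "France"),
]
examples = {("Paris", "France"), ("Berlin", "Germany")}
counterexamples = {("Paris", "Lyon")}

pos, neg = find_patterns(corpus, examples, counterexamples)
confidence = assess_patterns(pos, neg)
proposals = apply_patterns(corpus, confidence, examples | counterexamples)
print(proposals)
```

    In this toy run, "is the capital of" co-occurs only with example pairs and gets confidence 1.0, so the unseen pair ("Rome", "Italy") is proposed; "is larger than" co-occurs only with a counterexample and is never applied.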

    Fusion of Knowledge-Based and Data-Driven Approaches to Grammar Induction

    Georgiladakis S, Unger C, Iosif E, et al. Fusion of Knowledge-Based and Data-Driven Approaches to Grammar Induction. In: Fifteenth Annual Conference of the International Speech Communication Association. 2014.
    Using different sources of information for grammar induction results in grammars that vary in coverage and precision. Fusing such grammars with a strategy that exploits their strengths while minimizing their weaknesses is expected to produce grammars with superior performance. We focus on the fusion of grammars produced by a knowledge-based approach using lexicalized ontologies and by a data-driven approach using semantic similarity clustering. We propose various algorithms for finding the mapping between the (non-terminal) rules generated by each grammar induction algorithm, followed by rule fusion. Three fusion approaches are investigated: early, mid and late fusion. Results show that late fusion provides the best performance, with a relative F-measure improvement of 20%.
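    The abstract does not specify the mapping or fusion algorithms, so the following Python sketch only illustrates the general idea of mapping non-terminals between two induced grammars and then fusing the mapped rules. It assumes, hypothetically, that each grammar is a dict from a non-terminal to its set of terminal fillers, maps rules by Jaccard overlap of those sets, and fuses mapped rules by set union. `map_rules`, `fuse`, `min_overlap` and the toy grammars are illustrative names, not the authors' method, and the sketch does not model the distinction between early, mid and late fusion.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def map_rules(grammar_a, grammar_b, min_overlap=0.3):
    """Hypothetical mapping: pair each non-terminal of grammar A with the
    non-terminal of grammar B whose filler set overlaps it most."""
    mapping = {}
    for nt_a, terms_a in grammar_a.items():
        best, best_score = None, 0.0
        for nt_b, terms_b in grammar_b.items():
            score = jaccard(terms_a, terms_b)
            if score > best_score:
                best, best_score = nt_b, score
        if best is not None and best_score >= min_overlap:
            mapping[nt_a] = best
    return mapping


def fuse(grammar_a, grammar_b, mapping):
    """Hypothetical fusion: union the fillers of mapped rules; carry unmapped
    B rules over under a 'b:' prefix so they cannot overwrite A's rules."""
    fused = {nt: set(terms) for nt, terms in grammar_a.items()}
    for nt_a, nt_b in mapping.items():
        fused[nt_a] |= grammar_b[nt_b]
    mapped_b = set(mapping.values())
    for nt_b, terms_b in grammar_b.items():
        if nt_b not in mapped_b:
            fused[f"b:{nt_b}"] = set(terms_b)
    return fused


# Toy grammars: knowledge-based rules vs. data-driven clusters.
grammar_kb = {"<CITY>": {"paris", "berlin", "rome"}, "<AIRLINE>": {"lufthansa"}}
grammar_dd = {"C1": {"paris", "berlin", "madrid"}, "C2": {"monday", "friday"}}

mapping = map_rules(grammar_kb, grammar_dd)
fused = fuse(grammar_kb, grammar_dd, mapping)
print(mapping)
print(fused)
```

    Here `<CITY>` and `C1` share two of four distinct fillers (Jaccard 0.5), so they are mapped and merged, while the unmatched data-driven cluster `C2` is kept as a separate rule.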

    A review of the state of the art in Machine Learning on the Semantic Web: Technical Report CSTR-05-003
