2 research outputs found

    Ripple-down rules based open information extraction for the web documents

    Full text link
    The World Wide Web contains a massive amount of information in unstructured natural language and obtaining valuable information from informally written Web documents is a major research challenge. One research focus is Open Information Extraction (OIE) aimed at developing relation-independent information extraction. Open Information Extraction systems seek to extract all potential relations from the text rather than extracting few pre-defined relations. Previous machine learning-based Open Information Extraction systems require large volumes of labelled training examples and have trouble handling NLP tools errors caused by Web s informality. These systems used self-supervised learning that generates a labelled training dataset automatically using NLP tools with some heuristic rules. As the number of NLP tool errors increase because of the Web s informality, the self-supervised learning-based labelling technique produces noisy label and critical extraction errors. This thesis presents Ripple-Down Rules based Open Information Extraction (RDROIE) an approach to Open Information Extraction that uses Ripple-Down Rules (RDR) incremental learning technique. The key advantages of this approach are that it does not require labelled training dataset and can handle the freer writing style that occurs in Web documents and can correct errors introduced by NLP tools. The RDROIE system, with minimal low-cost rule addition, outperformed previous OIE systems on informal Web documents
    corecore