
Proceedings of the 17th Seminar on Case-Based Reasoning, Paris (29-30 June)

Authors: Fuchs Béatrice; Napoli Amedeo
Publication venue: LORIA
Publication date: 01/01/2009

Proceedings of the RàPC 2009 seminar

INRIA a CCSD electronic archive server

Hal-Diderot

Ripple-down rules based open information extraction for the web documents

Author: Kim Myung Hee
Publication venue: UNSW, Sydney
Publication date: 01/01/2012

The World Wide Web contains a massive amount of information in unstructured natural language, and obtaining valuable information from informally written Web documents is a major research challenge. One research focus is Open Information Extraction (OIE), which aims to develop relation-independent information extraction. OIE systems seek to extract all potential relations from the text rather than a few pre-defined relations. Previous machine learning-based OIE systems require large volumes of labelled training examples and have trouble handling the NLP tool errors caused by the Web's informality. These systems use self-supervised learning, which generates a labelled training dataset automatically by applying NLP tools together with some heuristic rules. As the number of NLP tool errors increases because of the Web's informality, this self-supervised labelling technique produces noisy labels and critical extraction errors. This thesis presents Ripple-Down Rules based Open Information Extraction (RDROIE), an approach to OIE that uses the Ripple-Down Rules (RDR) incremental learning technique. The key advantages of this approach are that it does not require a labelled training dataset, it can handle the freer writing style found in Web documents, and it can correct errors introduced by NLP tools. With minimal, low-cost rule addition, the RDROIE system outperformed previous OIE systems on informal Web documents.

UNSWorks
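The RDROIE abstract relies on Ripple-Down Rules (RDR) as an incremental learning technique. In the common single-classification form of RDR, rules form a tree. When a case is misclassified, a new rule is attached where evaluation of that case stopped, typically as an exception under the rule that gave the wrong conclusion. Existing rules are never edited. The Python sketch below illustrates that mechanism only. It assumes each candidate relation is a dict of hypothetical features (`between_pos`, `between_words`). The class names, features, and rules are illustrative and are not RDROIE's actual representation or rule base.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical case representation: a dict of features for one candidate tuple.
Case = dict
Condition = Callable[[Case], bool]


@dataclass
class Rule:
    condition: Condition
    conclusion: str
    cornerstone: Optional[Case] = None      # case that motivated this rule
    except_branch: Optional["Rule"] = None  # tried only if this rule fires
    else_branch: Optional["Rule"] = None    # tried only if this rule does not fire


class RDRTree:
    """Minimal single-classification ripple-down rules (SCRDR) tree."""

    def __init__(self, default: str) -> None:
        # The root rule always fires and supplies the default conclusion.
        self.root = Rule(condition=lambda c: True, conclusion=default)

    def _trace(self, case: Case) -> tuple[Rule, Rule, bool]:
        """Walk the tree; return (last rule fired, last rule tested, whether it fired)."""
        node: Optional[Rule] = self.root
        last_fired = self.root
        last_tested = self.root
        fired = True
        while node is not None:
            last_tested = node
            fired = node.condition(case)
            if fired:
                last_fired = node
                node = node.except_branch
            else:
                node = node.else_branch
        return last_fired, last_tested, fired

    def classify(self, case: Case) -> str:
        # The conclusion of the last rule that fired wins.
        return self._trace(case)[0].conclusion

    def add_rule(self, case: Case, condition: Condition, conclusion: str) -> Rule:
        """Patch a misclassification by attaching a rule where evaluation stopped."""
        if not condition(case):
            raise ValueError("new rule must fire on the case it corrects")
        _, last_tested, fired = self._trace(case)
        rule = Rule(condition, conclusion, cornerstone=case)
        if fired:
            last_tested.except_branch = rule  # refine the rule that fired
        else:
            last_tested.else_branch = rule    # alternative to the rule that failed
        return rule


tree = RDRTree(default="NO_RELATION")

# Hypothetical rule 1: a past-tense verb between two entities signals a relation.
c1 = {"between_pos": ["VBD"], "between_words": ["visited"]}
tree.add_rule(c1, lambda c: "VBD" in c["between_pos"], "RELATION")

# An informal token mis-tagged as a verb makes rule 1 fire wrongly...
c2 = {"between_pos": ["VBD"], "between_words": ["lol"]}
print(tree.classify(c2))  # RELATION (wrong)

# ...so an exception rule is added under rule 1 without modifying it.
tree.add_rule(c2, lambda c: "lol" in c["between_words"], "NO_RELATION")

print(tree.classify(c1))  # RELATION
print(tree.classify(c2))  # NO_RELATION
```

The key design point in this sketch is that `add_rule` only appends a branch. Each correction therefore stays local to the cases that reach it. This is one plausible reading of the "minimal low-cost rule addition" the abstract mentions, not a description of how RDROIE itself is implemented.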