
    Collective Ontology-based Information Extraction using Probabilistic Graphical Models

    This doctoral research is supervised by Prof. Dr. Marko Bajec. Abstract. Information extraction (IE) is the process of extracting structured data from unstructured sources. It roughly consists of three subtasks: named entity recognition, relation extraction and coreference resolution. Researchers have primarily focused on a single subtask or on combining the subtasks in a pipeline. In this paper we introduce an intelligent collective IE system that combines all three subtasks by employing conditional random fields. Using the same learning model throughout makes it easy to pass information between iterations on the fly and to correct errors during iterative execution. In addition to the architecture, we introduce novel semantic and collective feature functions. The system’s output is labelled according to an ontology, and new instances are created automatically at runtime. The ontology, acting as a schema, encodes a set of constraints, defines optional manual rules or patterns, and through its instances provides semantic gazetteer lists. The proposed framework is being developed as part of ongoing PhD research. Its main contributions are the intelligent iterative interconnection of the selected subtasks, extensive use of context-specific features, and a parameterless system that can be guided by an ontology. Preliminary experiments combining just two of the subtasks already show promising results compared with traditional approaches.