Semi-Automatic Knowledge Augmentation: Methods and Tools

Abstract

Text mining techniques are being adopted in many different fields to address the problem of extracting meaningful information hidden in unstructured data. Hybrid (human-machine) knowledge extraction processes are usually the best solution for companies, both to achieve high-quality results and to guarantee the conformity of the knowledge extraction output. However, the state-of-the-art literature on Natural Language Processing (NLP) lacks process management studies. In particular, researchers have not yet studied the best way to integrate NLP outputs with human activities. To the best of our knowledge, the present thesis is a first step in this direction. This work aims to investigate the techniques used to develop Knowledge Bases for Text Mining applications, and to develop a semi-automatic procedure for Knowledge Augmentation. After an overview of the state of the art, different knowledge extraction techniques are applied to four case studies: 1. a completely human-based approach; 2. an automatic keyword extraction approach based on TF-IDF, followed by a manual review of the results; 3. POS-tagging-based keyword extraction, followed by a manual review of the results; 4. a hybrid approach that uses regular expressions and an advanced deep-learning method (word embeddings) to extract keywords from documents. Statistical filters are then used to select meaningful words. The amount of human intervention decreases from the first to the last case study.
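To illustrate the automatic step of the second case study, the following is a minimal sketch of TF-IDF keyword ranking that produces per-document candidate lists for a human reviewer. It is not the thesis's implementation: the tokenizer (lowercase alphabetic tokens only), the weighting (raw term frequency normalised by document length, times the natural-log inverse document frequency), and the names `tokenize` and `tfidf_keywords` are all illustrative assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase the text and keep alphabetic tokens only.
    return re.findall(r"[a-z]+", text.lower())


def tfidf_keywords(docs, top_k=3):
    """Return the top_k TF-IDF terms of each document as candidates for manual review."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents in which each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    results = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        # TF (count / document length) times IDF (log of n_docs / df).
        # Terms present in every document get IDF = log(1) = 0.
        scores = {
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        }
        # Rank by descending score; break ties alphabetically for determinism.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        results.append([term for term, _ in ranked[:top_k]])
    return results
```

For example, on the three short documents "the pump valve leaked oil", "the valve sensor failed" and "the pump motor overheated", the word "the" scores zero because it appears everywhere, while document-specific words such as "leaked" and "oil" rank highest. In a hybrid process these lists would be the input to the manual review, not the final knowledge base entries.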
