3 research outputs found

    Knowledge Discovery in Online Repositories: A Text Mining Approach

    Get PDF
    Before the advent of the Internet, the newspapers were the prominent instrument of mobilization for independence and political struggles. Since independence in Nigeria, the political class has adopted newspapers as a medium of Political Competition and Communication. Consequently, most political information exists in unstructured form and hence the need to tap into it using text mining algorithm. This paper implements a text mining algorithm on some unstructured data format in some newspapers. The algorithm involves the following natural language processing techniques: tokenization, text filtering and refinement. As a follow-up to the natural language techniques, association rule mining technique of data mining is used to extract knowledge using the Modified Generating Association Rules based on Weighting scheme (GARW). The main contributions of the technique are that it integrates information retrieval scheme (Term Frequency Inverse Document Frequency) (for keyword/feature selection that automatically selects the most discriminative keywords for use in association rules generation) with Data Mining technique for association rules discovery. The program is applied to Pre-Election information gotten from the website of the Nigerian Guardian newspaper. The extracted association rules contained important features and described the informative news included in the documents collection when related to the concluded 2007 presidential election. The system presented useful information that could help sanitize the polity as well as protect the nascent democracy

    Semantic Annotation Using Horizontal and Vertical Contexts

    No full text

    Semantic Annotation using Horizontal and Vertical Contexts

    No full text
    Abstract. This paper addresses the issue of semantic annotation using horizontal and vertical contexts. Semantic annotation is a task of annotating web pages with ontological information. Information on a web page is usually two-dimensionally laid out, previous semantic annotation methods that view a web page as an ’object ’ sequence has limitations. In this paper, to better incorporate the two-dimensional contexts, semantic annotation is formalized as a problem of block detection and text annotation. Block detection is aimed at detecting the text block by making use of context in one dimension and text annotation is aimed at detecting the ‘targeted instance ’ in the identified blocks using the other dimensional context. A two-stage method for semantic annotation using machine learning has been proposed. Experimental results indicate that the proposed method can significantly outperform the baseline method as well as the sequence-based method for semantic annotation. 1
    corecore