30,794 research outputs found

    Knowledge Discovery in Online Repositories: A Text Mining Approach

    Get PDF
    Before the advent of the Internet, the newspapers were the prominent instrument of mobilization for independence and political struggles. Since independence in Nigeria, the political class has adopted newspapers as a medium of Political Competition and Communication. Consequently, most political information exists in unstructured form and hence the need to tap into it using text mining algorithm. This paper implements a text mining algorithm on some unstructured data format in some newspapers. The algorithm involves the following natural language processing techniques: tokenization, text filtering and refinement. As a follow-up to the natural language techniques, association rule mining technique of data mining is used to extract knowledge using the Modified Generating Association Rules based on Weighting scheme (GARW). The main contributions of the technique are that it integrates information retrieval scheme (Term Frequency Inverse Document Frequency) (for keyword/feature selection that automatically selects the most discriminative keywords for use in association rules generation) with Data Mining technique for association rules discovery. The program is applied to Pre-Election information gotten from the website of the Nigerian Guardian newspaper. The extracted association rules contained important features and described the informative news included in the documents collection when related to the concluded 2007 presidential election. The system presented useful information that could help sanitize the polity as well as protect the nascent democracy

    Opening up Magpie via semantic services

    Get PDF
    Magpie is a suite of tools supporting a ‘zero-cost’ approach to semantic web browsing: it avoids the need for manual annotation by automatically associating an ontology-based semantic layer to web resources. An important aspect of Magpie, which differentiates it from superficially similar hypermedia systems, is that the association between items on a web page and semantic concepts is not merely a mechanism for dynamic linking, but it is the enabling condition for locating services and making them available to a user. These services can be manually activated by a user (pull services), or opportunistically triggered when the appropriate web entities are encountered during a browsing session (push services). In this paper we analyze Magpie from the perspective of building semantic web applications and we note that earlier implementations did not fulfill the criterion of “open as to services”, which is a key aspect of the emerging semantic web. For this reason, in the past twelve months we have carried out a radical redesign of Magpie, resulting in a novel architecture, which is open both with respect to ontologies and semantic web services. This new architecture goes beyond the idea of merely providing support for semantic web browsing and can be seen as a software framework for designing and implementing semantic web applications

    FrameNet CNL: a Knowledge Representation and Information Extraction Language

    Full text link
    The paper presents a FrameNet-based information extraction and knowledge representation framework, called FrameNet-CNL. The framework is used on natural language documents and represents the extracted knowledge in a tailor-made Frame-ontology from which unambiguous FrameNet-CNL paraphrase text can be generated automatically in multiple languages. This approach brings together the fields of information extraction and CNL, because a source text can be considered belonging to FrameNet-CNL, if information extraction parser produces the correct knowledge representation as a result. We describe a state-of-the-art information extraction parser used by a national news agency and speculate that FrameNet-CNL eventually could shape the natural language subset used for writing the newswire articles.Comment: CNL-2014 camera-ready version. The final publication is available at link.springer.co

    Semantic Technologies for Manuscript Descriptions — Concepts and Visions

    Get PDF
    The contribution at hand relates recent developments in the area of the World Wide Web to codicological research. In the last number of years, an informational extension of the internet has been discussed and extensively researched: the Semantic Web. It has already been applied in many areas, including digital information processing of cultural heritage data. The Semantic Web facilitates the organisation and linking of data across websites, according to a given semantic structure. Software can then process this structural and semantic information to extract further knowledge. In the area of codicological research, many institutions are making efforts to improve the online availability of handwritten codices. If these resources could also employ Semantic Web techniques, considerable research potential could be unleashed. However, data acquisition from less structured data sources will be problematic. In particular, data stemming from unstructured sources needs to be made accessible to SemanticWeb tools through information extraction techniques. In the area of museum research, the CIDOC Conceptual Reference Model (CRM) has been widely examined and is being adopted successfully. The CRM translates well to Semantic Web research, and its concentration on contextualization of objects could support approaches in codicological research. Further concepts for the creation and management of bibliographic coherences and structured vocabularies related to the CRM will be considered in this chapter. Finally, a user scenario showing all processing steps in their context will be elaborated on
    corecore