
    Knowledge Discovery in Online Repositories: A Text Mining Approach

    Before the advent of the Internet, newspapers were the most prominent instrument of mobilization for independence and political struggle. Since independence, the political class in Nigeria has adopted newspapers as a medium of political competition and communication. Consequently, most political information exists in unstructured form, hence the need to tap into it using text mining algorithms. This paper applies a text mining algorithm to unstructured text from selected newspapers. The algorithm uses the following natural language processing techniques: tokenization, text filtering, and refinement. Following these steps, the association rule mining technique of data mining is used to extract knowledge with the Modified Generating Association Rules based on Weighting scheme (GARW) algorithm. The main contribution of the technique is that it integrates an information retrieval scheme, Term Frequency-Inverse Document Frequency (TF-IDF), which automatically selects the most discriminative keywords for association rule generation, with a data mining technique for association rule discovery. The program is applied to pre-election information obtained from the website of the Nigerian Guardian newspaper. The extracted association rules contained important features and described the informative news in the document collection relating to the concluded 2007 presidential election. The system presented useful information that could help sanitize the polity and protect the nascent democracy.
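
    The pipeline described in this abstract (tokenization, filtering, TF-IDF keyword selection, then association rules over the selected keywords) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Modified GARW algorithm itself: the stopword list, the per-term score (highest TF-IDF in any document), and the weighted-support formula (pair support times the mean TF-IDF weight of its two keywords) are hypothetical choices, and only one-to-one rules are mined.

```python
import math
import re
from collections import Counter
from itertools import combinations

# Hypothetical, deliberately tiny stopword list used only for illustration.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "for", "on", "is", "was", "will"}


def tokenize(text):
    """Tokenization: lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def filter_tokens(tokens, min_len=3):
    """Text filtering: drop stopwords and tokens shorter than min_len."""
    return [t for t in tokens if t not in STOPWORDS and len(t) >= min_len]


def tfidf_keywords(docs, top_k):
    """Keyword selection: score each term by its highest TF-IDF in any document
    and keep the top_k terms (ties broken alphabetically)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    best = {}
    for doc in docs:
        if not doc:
            continue
        for term, count in Counter(doc).items():
            score = (count / len(doc)) * math.log(n / df[term])
            best[term] = max(best.get(term, 0.0), score)
    ranked = sorted(best, key=lambda t: (-best[t], t))
    return {t: best[t] for t in ranked[:top_k]}


def weighted_rules(docs, weights, min_wsupport, min_conf):
    """Mine one-to-one rules X -> Y over the selected keywords.

    Weighted support of {X, Y} = plain support * mean keyword weight
    (an illustrative weighting, not the published GARW definition).
    Returns sorted (X, Y, support, confidence) tuples.
    """
    n = len(docs)
    transactions = [set(doc) & set(weights) for doc in docs]
    single = Counter(k for items in transactions for k in items)
    pairs = Counter(p for items in transactions for p in combinations(sorted(items), 2))
    rules = []
    for (a, b), count in pairs.items():
        support = count / n
        if support * (weights[a] + weights[b]) / 2 < min_wsupport:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = count / single[x]
            if confidence >= min_conf:
                rules.append((x, y, round(support, 3), round(confidence, 3)))
    return sorted(rules)
```

    Typical use is `weighted_rules(docs, tfidf_keywords(docs, top_k=50), 0.05, 0.8)`, where `docs` is a list of filtered token lists; restricting transactions to TF-IDF keywords is what keeps the rule set small.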

    Trans-disciplinarity and digital humanity: lessons learned from developing text mining tools for textual analysis

    This peer-reviewed chapter advances social science research on text mining and data mining, which are key artificial intelligence technologies applied in the digital humanities. The chapter provides detailed documentation of an interdisciplinary project in which a team of social scientists, linguists and software engineers developed a set of bespoke text-mining tools for researchers in the humanities. By examining the user-participatory development process of these tools, the chapter aims to improve our understanding of digital humanities in the context of scholarly research and, from a pragmatist perspective, to highlight its trans-disciplinary potential. The chapter both analyses and produces an empirical account of interdisciplinary research practices across the social sciences and humanities. It concludes with a discussion of some methodological and socio-technical challenges of the 'digital humanity' emerging in this shift towards trans-disciplinarity, focusing particularly on 'interpretative flexibility'. The edited collection in which it appears, itself interdisciplinary, develops knowledge of how the application of new computational techniques and visualisation technologies in the arts and humanities is producing fresh approaches and methodologies for the study of new and traditional corpora. It includes articles by internationally significant scholars such as N. Katherine Hayles and Lev Manovich. This work benefited from discussion at the 2009 Media, Communication and Cultural Studies Association (MeCCSA) conference in Bradford, 14–16 January 2009, and at the Computational Turn Workshop in Swansea on 9 March 2010, where an earlier version of this paper was presented.

    Improving Customer Relationship Management through Integrated Mining of Heterogeneous Data

    The volume of information available on the Internet and corporate intranets continues to grow, along with the structured and unstructured data stored by many organizations. In customer relationship management, information is the raw material for decision making. For decisions to be effective, knowledge must be discovered from a seamless integration of structured and unstructured data, for completeness and comprehensiveness; this is the main focus of this paper. In the integration process, the structured component is selected based on the keywords resulting from preprocessing the unstructured text, and association rules are generated with the modified GARW (Generating Association Rules based on Weighting scheme) algorithm. The main contribution of this technique is that the unstructured component of the integration relies on an information retrieval technique based on the content similarity of XML (Extensible Markup Language) documents. This similarity combines syntactic and semantic relevance. Experiments revealed that the extracted association rules contain important features that form a worthy platform for making effective customer relationship management decisions. The performance of the integration approach is also compared with a similar approach that uses only syntactic relevance in its information extraction process, revealing a significant reduction in the number of large itemsets and in execution time. Owing to the semantic clustering of XML documents introduced into the improved integrated mining technique, the generated rules are reduced to a smaller set of more interesting ones.
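
    The idea of scoring XML documents by a mix of syntactic and semantic relevance can be illustrated with a small Python sketch. It assumes that syntactic relevance is the cosine similarity of raw term frequencies, that semantic relevance is the same cosine after mapping terms to shared concepts, and that the two are blended with a weight `alpha`; the `CONCEPTS` map and `alpha` are hypothetical stand-ins, not the paper's actual semantic model.

```python
import math
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical term-to-concept map standing in for a real lexical resource
# (e.g. a thesaurus); the paper's semantic model is not reproduced here.
CONCEPTS = {"client": "customer", "buyer": "customer", "buy": "order", "purchase": "order"}


def xml_terms(xml_string):
    """Collect lowercase word tokens from every text node of an XML document."""
    root = ET.fromstring(xml_string)
    return re.findall(r"[a-z]+", " ".join(root.itertext()).lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def combined_similarity(xml_a, xml_b, alpha=0.5):
    """alpha * syntactic cosine (raw terms) + (1 - alpha) * semantic cosine (concepts)."""
    terms_a, terms_b = xml_terms(xml_a), xml_terms(xml_b)
    syntactic = cosine(Counter(terms_a), Counter(terms_b))
    semantic = cosine(Counter(CONCEPTS.get(t, t) for t in terms_a),
                      Counter(CONCEPTS.get(t, t) for t in terms_b))
    return alpha * syntactic + (1 - alpha) * semantic
```

    Setting `alpha=1.0` reduces the score to purely syntactic relevance, which corresponds to the baseline the abstract compares against; documents that use different words for the same concept (for example "client" and "customer") only score as similar once the semantic term is included.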

    Text Mining e-Complaints Data From e-Auction Store With Implications For Internet Marketing Research

    This study seeks to analyze the effectiveness of the text mining process. Complaint forums on various consumer report websites will be analyzed using text and data mining software. Data from feedback forums will be compiled and analyzed using a text mining software program. The relationships and patterns among keywords and their associations will be cluster-analyzed to gain a deeper understanding of the data. A case study will also be conducted to assess the effectiveness of text mining, using data from the Internet complaint forum http://www.planetfeeback.com. An Internet complaint forum was chosen as the case subject because of its easy access and its reputation as a storage medium for large volumes of data. The main goal of this study is to gauge the effectiveness of text mining. The complaint forum will be text mined to find relationships, and the results will then be analyzed and interpreted to determine the effectiveness of the text and data mining process.
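
    The study does not name its text miner, so the following Python sketch only illustrates one common way to cluster keywords by their associations: each keyword is represented by the set of complaints that mention it, keywords are compared by Jaccard similarity, and single-linkage clustering is cut at a threshold (implemented with union-find). The keyword list, threshold, and function names are hypothetical.

```python
import re
from itertools import combinations


def keyword_doc_sets(complaints, keywords):
    """Map each keyword to the set of complaint indices that mention it."""
    sets = {k: set() for k in keywords}
    for i, text in enumerate(complaints):
        words = set(re.findall(r"[a-z]+", text.lower()))
        for k in keywords:
            if k in words:
                sets[k].add(i)
    return sets


def jaccard(a, b):
    """Share of complaints mentioning both keywords among those mentioning either."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_keywords(doc_sets, threshold):
    """Single-linkage clustering cut at `threshold`: two keywords end up in the
    same cluster if a chain of pairs with Jaccard >= threshold connects them."""
    parent = {k: k for k in doc_sets}

    def find(k):
        while parent[k] != k:
            parent[k] = parent[parent[k]]  # path halving
            k = parent[k]
        return k

    for a, b in combinations(doc_sets, 2):
        if jaccard(doc_sets[a], doc_sets[b]) >= threshold:
            parent[find(a)] = find(b)
    groups = {}
    for k in doc_sets:
        groups.setdefault(find(k), []).append(k)
    return sorted(sorted(g) for g in groups.values())
```

    Lowering the threshold merges loosely associated complaint themes into larger clusters; raising it keeps only keywords that almost always co-occur.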

    A support system for predicting eBay end prices

    In this report, a support system for predicting end prices on eBay is proposed. The end-price predictions are based on the item descriptions found in eBay item listings and on some numerical item features. The system uses text mining and boosting algorithms from the field of machine learning. Our system substantially outperforms the naive method of predicting the category mean price. Moreover, interpreting the model enables us to identify influential terms in the item descriptions and shows that the item description is more influential than the seller feedback rating, which earlier studies had shown to be influential.
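
    The report does not say which boosting variant it uses, so the Python sketch below assumes least-squares boosting with one-term decision stumps over binary bag-of-words features from the item description. The numerical item features (such as the seller feedback rating) are left out, and the vocabulary, `n_rounds`, and `learning_rate` are hypothetical. Starting from the mean price means that, when trained on a single category, the model begins at exactly the naive category-mean predictor and each round refines it.

```python
import re


def features(description, vocabulary):
    """Binary bag-of-words: 1 if a vocabulary term occurs in the description."""
    words = set(re.findall(r"[a-z0-9]+", description.lower()))
    return [1 if term in words else 0 for term in vocabulary]


def fit_boosted_stumps(X, y, n_rounds=50, learning_rate=0.1):
    """Least-squares boosting: start from the mean price, then repeatedly fit a
    one-term decision stump to the current residuals and add a shrunken copy."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        resid = [yi - pi for yi, pi in zip(y, pred)]
        best = None
        for j in range(len(X[0])):
            on = [r for x, r in zip(X, resid) if x[j]]
            off = [r for x, r in zip(X, resid) if not x[j]]
            if not on or not off:
                continue  # this term does not split the listings
            v_on, v_off = sum(on) / len(on), sum(off) / len(off)
            sse = sum((r - v_on) ** 2 for r in on) + sum((r - v_off) ** 2 for r in off)
            if best is None or sse < best[0]:
                best = (sse, j, v_on, v_off)
        if best is None:
            break
        _, j, v_on, v_off = best
        step_on, step_off = learning_rate * v_on, learning_rate * v_off
        stumps.append((j, step_on, step_off))
        pred = [p + (step_on if x[j] else step_off) for p, x in zip(pred, X)]
    return base, stumps


def predict(model, x):
    """Add every stump's contribution for feature vector x to the base price."""
    base, stumps = model
    return base + sum(step_on if x[j] else step_off for j, step_on, step_off in stumps)
```

    Because each stump tests a single term, the fitted stumps can be inspected directly, for example by summing `abs(step_on - step_off)` per term, which is one simple way to surface influential description terms.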