Knowledge Discovery in Online Repositories: A Text Mining Approach
Before the advent of the Internet, newspapers were the prominent instrument of
mobilization for independence and political struggles. Since Nigeria's independence, the
political class has adopted newspapers as a medium of political competition and
communication. Consequently, most political information exists in unstructured form,
hence the need to tap into it using a text mining algorithm.
This paper applies a text mining algorithm to unstructured data from newspapers. The algorithm involves the following natural language processing techniques: tokenization, text filtering and refinement. Following these steps, association rule mining, a data mining technique, is used to extract knowledge with the modified Generating Association Rules based on Weighting scheme (GARW) algorithm.
The main contribution of the technique is that it integrates an information retrieval scheme, Term Frequency-Inverse Document Frequency (TF-IDF), which automatically selects the most discriminative keywords for association rule generation, with a data mining technique for association rule discovery. The program is applied to pre-election information obtained from the website of the Nigerian Guardian newspaper. The extracted association rules contained important features and described the informative news in the document collection in relation to the concluded 2007 presidential election. The system presented useful information that could help sanitize the polity as well as protect the nascent democracy.
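The pipeline described in this abstract (tokenization, filtering, TF-IDF keyword selection, then association rule discovery) can be illustrated with a minimal sketch. This is not the modified GARW algorithm itself, whose details the abstract does not give; it is a simplified stand-in that keeps the top TF-IDF terms of each document and then derives single-antecedent rules by support and confidence. The stopword list, function names and thresholds are assumptions made for illustration.

```python
import math
import re
from itertools import combinations

# Illustrative stopword list only; a real filter would be far larger.
STOPWORDS = {"the", "a", "of", "in", "and", "to", "is", "for", "on"}

def tokenize(text):
    """Lowercase, split on non-letters, and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]

def top_keywords(docs, k):
    """Return, for each document, the set of its k highest TF-IDF terms."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = {}  # document frequency of each term
    for toks in tokenized:
        for term in set(toks):
            df[term] = df.get(term, 0) + 1
    result = []
    for toks in tokenized:
        scores = {}
        for term in set(toks):
            tf = toks.count(term) / len(toks)
            scores[term] = tf * math.log(n / df[term])
        # Sort by descending score, breaking ties alphabetically.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        result.append(set(ranked[:k]))
    return result

def association_rules(transactions, min_support, min_confidence):
    """Single-antecedent rules (a -> b) over keyword sets.

    Returns tuples (a, b, support, confidence).
    """
    n = len(transactions)

    def support(items):
        return sum(1 for t in transactions if items <= t) / n

    vocab = sorted(set().union(*transactions))
    rules = []
    for x, y in combinations(vocab, 2):
        s = support({x, y})
        if s < min_support:
            continue
        for a, b in ((x, y), (y, x)):
            conf = s / support({a})
            if conf >= min_confidence:
                rules.append((a, b, s, conf))
    return rules
```

In use, each document's keyword set from `top_keywords` would serve as one transaction passed to `association_rules`; restricting transactions to high TF-IDF terms is what keeps the candidate itemsets small.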
Trans-disciplinarity and digital humanity: lessons learned from developing text mining tools for textual analysis
This peer-reviewed chapter advances social science research on text mining and data mining, which are key artificial intelligence technologies applied in the digital humanities. The chapter provides a detailed documentation of an interdisciplinary project conducted by a team consisting of social scientists, linguists and software engineers to develop a set of bespoke text-mining tools for researchers in the humanities. Through looking at the user-participatory development processes of the text-mining tools, this chapter aims to improve our understanding of digital humanities in the context of scholarly research and, from a pragmatist perspective, to highlight its trans-disciplinary potential. The paper both analyses and produces an empirical account of interdisciplinary research practices across the social sciences and humanities. It concludes with a discussion of some methodological and socio-technical challenges of the 'digital humanity' emerging in this shift towards trans-disciplinarity, particularly focusing on the topic of 'interpretative flexibility'.
The edited collection, which is interdisciplinary in nature, develops knowledge of how the application of new computational techniques and visualisation technologies in the arts and humanities is resulting in fresh approaches and methodologies for the study of new and traditional corpora. It includes articles from internationally significant scholars such as N. Katherine Hayles and Lev Manovich.
The realisation of this piece has benefited from discussion at the 2009 Media, Communication and Cultural Studies Association (MeCCSA) conference at Bradford, 14–16 January 2009, and the Computational Turn Workshop at Swansea on 9 March 2010, where an earlier version of this paper was presented.
Improving Customer Relationship Management through Integrated Mining of Heterogeneous Data
The volume of information available on the
Internet and corporate intranets continues to increase along
with the corresponding increase in the data (structured and
unstructured) stored by many organizations. In customer
relationship management, information is the raw material for
decision making. For this to be effective, there is a need to
discover knowledge from the seamless integration of structured
and unstructured data for completeness and comprehensiveness,
which is the main focus of this paper.
In the integration process, the structured component is
selected based on the keywords resulting from preprocessing
the unstructured text, and association rules are generated
with the modified GARW (Generating Association Rules
Based on Weighting Scheme) algorithm. The main contribution
of this technique is that the unstructured component of the
integration relies on an information retrieval technique
based on the content similarity of XML (Extensible Markup
Language) documents. This similarity combines syntactic
and semantic relevance.
Experiments carried out revealed that the extracted
association rules contain important features which form a
worthy platform for making effective decisions as regards
customer relationship management. The performance of the
integration approach is also compared with a similar approach
that uses only syntactic relevance in its information extraction
process. The comparison reveals a significant reduction in the
number of large itemsets and in execution time. As a result, fewer
but more interesting rules are generated, owing to the semantic
clustering of XML documents introduced into the improved
integrated mining technique.
Text Mining e-Complaints Data From e-Auction Store With Implications For Internet Marketing Research
This study seeks to analyze the effectiveness of the text mining process. Complaint forums on various consumer report websites will be analyzed using text and data mining software. Data from feedback forums will be compiled and analyzed using a text mining software program. The relationships and patterns among keywords and their associations will be cluster-analyzed to gain a deeper understanding of the data. A case study will also be conducted to assess the effectiveness of text mining. Data from the Internet complaint forum http://www.planetfeeback.com will be text mined. An Internet complaint forum was chosen as the case subject because of its easy access and its reputation as a storage medium for large amounts of data. The main goal of this study is to gauge the effectiveness of text mining. The complaint forum will be text mined to find relationships, and the results will then be analyzed and interpreted to determine the effectiveness of the text and data mining process.
A support system for predicting eBay end prices.
In this report a support system for predicting end prices on eBay is
proposed. The end price predictions are based on the item descriptions found in
the item listings of eBay, and on some numerical item features.
The system uses text mining and boosting algorithms from the
field of machine learning.
Our system substantially outperforms the naive method of
predicting the category mean price. Moreover, interpretation of
the model enables us to identify influential terms in the item
descriptions and shows that the item description is more
influential than the seller feedback rating, which earlier
studies had found to be influential.
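The naive method mentioned in this abstract, predicting the category mean price, can be sketched as below. It covers only the baseline, not the report's text mining and boosting system. The function names, the use of mean absolute error as the metric, and the toy listings are assumptions made for illustration; the prices are invented and are not results from the report.

```python
from collections import defaultdict

def category_mean_baseline(train):
    """Map each category to the mean end price of its training listings."""
    totals = defaultdict(lambda: [0.0, 0])  # category -> [price sum, count]
    for category, price in train:
        totals[category][0] += price
        totals[category][1] += 1
    return {c: s / n for c, (s, n) in totals.items()}

def mean_absolute_error(means, test):
    """Average absolute gap between the category mean and the true end price."""
    errors = [abs(means[c] - price) for c, price in test]
    return sum(errors) / len(errors)

# Toy listings as (category, end price); values are invented for illustration.
train = [("camera", 100.0), ("camera", 140.0), ("phone", 200.0)]
test = [("camera", 130.0), ("phone", 190.0)]

means = category_mean_baseline(train)
baseline_mae = mean_absolute_error(means, test)
```

Any learned model, such as one built on terms from the item descriptions, would be scored on the same held-out listings and compared against `baseline_mae`.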