51 research outputs found

Knowledge Discovery in Online Repositories: A Text Mining Approach

Author: Afolabi I. T.
Ayo C. K.
Musa G. A.
Sofoluwe A. B.
Publication venue: EuroJournals Publishing
Publication date: 01/01/2008

Before the advent of the Internet, newspapers were the prominent instrument of mobilization for independence and political struggles. Since independence in Nigeria, the political class has adopted newspapers as a medium of political competition and communication. Consequently, most political information exists in unstructured form, hence the need to tap into it using a text mining algorithm. This paper implements a text mining algorithm on unstructured data from some newspapers. The algorithm involves the following natural language processing techniques: tokenization, text filtering and refinement. Following these steps, the association rule mining technique of data mining is used to extract knowledge using the modified Generating Association Rules based on Weighting scheme (GARW). The main contribution of the technique is that it integrates an information retrieval scheme, Term Frequency Inverse Document Frequency (TF-IDF), which automatically selects the most discriminative keywords for use in association rule generation, with a data mining technique for association rule discovery. The program is applied to pre-election information obtained from the website of the Nigerian Guardian newspaper. The extracted association rules contained important features and described the informative news included in the document collection when related to the concluded 2007 presidential election. The system presented useful information that could help sanitize the polity as well as protect the nascent democracy.
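The two stages named in this abstract, TF-IDF keyword selection followed by association rule discovery, can be illustrated with a minimal sketch. This is not the GARW algorithm itself: the function names (`tokenize`, `top_keywords`, `pair_rules`), the thresholds, and the restriction to one-keyword-to-one-keyword rules are all assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def tokenize(text):
    # Assumed preprocessing: lowercase, keep alphabetic tokens longer than two letters.
    cleaned = "".join(c if c.isalpha() else " " for c in text.lower())
    return [w for w in cleaned.split() if len(w) > 2]


def top_keywords(docs, k):
    """Return, per document, the set of its k highest-scoring TF-IDF terms."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(t for toks in tokenized for t in set(toks))
    result = []
    for toks in tokenized:
        tf = Counter(toks)
        scores = {t: (c / len(toks)) * math.log(n / df[t]) for t, c in tf.items()}
        # Sort by descending score, breaking ties alphabetically.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        result.append(set(ranked[:k]))
    return result


def pair_rules(transactions, min_support, min_confidence):
    """Return sorted (antecedent, consequent, support, confidence) tuples."""
    n = len(transactions)
    item_count = Counter(i for t in transactions for i in t)
    pair_count = Counter(p for t in transactions for p in combinations(sorted(t), 2))
    rules = []
    for (a, b), c in pair_count.items():
        support = c / n
        if support < min_support:
            continue
        # Test the rule in both directions: a -> b and b -> a.
        for x, y in ((a, b), (b, a)):
            confidence = c / item_count[x]
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return sorted(rules)
```

In use, the keyword sets returned by `top_keywords` would serve as the transactions passed to `pair_rules`, so that only the most discriminative terms of each article take part in rule generation.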

Covenant University Repository

Evolving rules for document classification

Author: A. Bergström
C. Apté
C.M. Tan
D. Montana
D.R. Tauritz
F. Sebastiani
G. Salton
H. Lodhi
J.R. Koza
K. Bennet
M. Damashek
T. Joachims
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2005

We describe a novel method for using Genetic Programming to create compact classification rules based on combinations of N-Grams (character strings). Genetic programs acquire fitness by producing rules that are effective classifiers in terms of precision and recall when evaluated against a set of training documents. We describe a set of functions and terminals and provide results from a classification task using the Reuters 21578 dataset. We also suggest that, because the induced rules are meaningful to a human analyst, they may have a number of other uses beyond classification and provide a basis for text mining applications.
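The idea of scoring an N-gram rule by precision and recall can be sketched as follows. The abstract does not list the actual function and terminal sets, so the boolean operators `AND`, `OR` and `NOT`, the tuple encoding of a rule, and the use of F1 to combine precision and recall are assumptions; the evolutionary search itself is not shown.

```python
def matches(rule, text):
    """Evaluate a rule tree against a document.

    A rule is either a character n-gram (a plain string, true when it occurs
    anywhere in the text) or a tuple ("AND" | "OR" | "NOT", sub-rules...).
    """
    if isinstance(rule, str):
        return rule in text
    op, *args = rule
    if op == "AND":
        return all(matches(a, text) for a in args)
    if op == "OR":
        return any(matches(a, text) for a in args)
    if op == "NOT":
        return not matches(args[0], text)
    raise ValueError(f"unknown operator: {op}")


def fitness(rule, docs):
    """Return (precision, recall, f1) of a rule over (text, in_category) pairs."""
    tp = sum(1 for text, label in docs if label and matches(rule, text))
    fp = sum(1 for text, label in docs if not label and matches(rule, text))
    fn = sum(1 for text, label in docs if label and not matches(rule, text))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical toy training set for a "grain" category.
training = [
    ("wheat prices rose", True),
    ("wheat harvest and grain exports", True),
    ("oil prices rose", False),
    ("grain of truth in oil talks", False),
]
good_rule = ("AND", ("OR", "whea", "grai"), ("NOT", "oil"))
```

Because terminals are character strings rather than words, a rule such as `"rice"` also fires on "prices", which is why a fitness measure that penalizes false positives matters.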

CiteSeerX

Crossref

Sheffield Hallam University Research Archive

UCL Discovery

Evolving text classification rules with genetic programming

Author: Anthony N.
Ebert D.
Hirsch L.
Joachims T.
Karanikas H.
Koza J. R.
Langdon W.B.
Laurence Hirsch
Lodhi H.
Masoud Saeedi
Montana D
Robin Hirsch
Salton G.
Van Rijsbergen C. J.
Publication venue: 'Informa UK Limited'
Publication date: 07/09/2005

We describe a novel method for using genetic programming to create compact classification rules using combinations of N-grams (character strings). Genetic programs acquire fitness by producing rules that are effective classifiers in terms of precision and recall when evaluated against a set of training documents. We describe a set of functions and terminals and provide results from a classification task using the Reuters 21578 dataset. We also suggest that the rules may have a number of other uses beyond classification and provide a basis for text mining applications.

Crossref

Sheffield Hallam University Research Archive

Knowledge discovery out of text data: a systematic review via text mining

Author: Pironti Marco
Publication date: 01/01/2018

Institutional Research Information System University of Turin

Natural Language Processing (NLP) – A Solution for Knowledge Extraction from Patent Unstructured Data

Author: Cavallucci Denis
Rousselot François
Souili Achille
Publication venue: The Authors. Published by Elsevier Ltd.
Publication date: 31/12/2015

Patents are a valuable source of knowledge and are extremely important for assisting engineers and decision makers through the inventive process. This paper describes a new approach to the automatic extraction of IDM (Inventive Design Method) related knowledge from patent documents. IDM derives from TRIZ, the theory of inventive problem solving, which is largely based on the observation of patents to theorize the act of inventing. Our method mainly consists in using natural language processing (NLP) techniques to match and extract knowledge relevant to the IDM ontology. The purpose of this paper is to investigate the contribution of NLP techniques to effective knowledge extraction from patent documents. We propose in this paper to first report on progress made so far in data mining before describing our approach.

Elsevier - Publisher Connector

A Literature Survey on Web Content Mining

Author: V. David Martin
Dr. T. N. Ravi
Publication venue: 'Auricle Technologies, Pvt., Ltd.'
Publication date: 31/10/2016

The web is an accumulation of interrelated documents on one or more web servers, while web mining means extracting important data from web databases. Web mining is one of the areas of data mining in which data mining methods are used to extract data from web servers. Web data includes site pages, web links, objects on the web and web logs. Web mining is used to understand client behavior and to assess a specific site on the basis of the data stored in web log documents. Web mining is carried out using data mining strategies, specifically Association Rules, Classification and Clustering. It has a number of useful application areas, for example, Electronic trade, E-learning, E-government, E-arrangements, E-majority rules system, Electronic business, security, crime examination and computerized library. Retrieving the required web page from the web efficiently and effectively becomes a challenging task, since the web is comprised of unstructured information, which carries a substantial amount of data and increases the complexity of managing data from various web service providers. It becomes difficult to locate, extract, filter or assess the data that is relevant to users. In this paper, we consider the essential ideas of web mining, classification, procedures and issues. In addition, the paper analyzes web mining research challenges.

International Journal on Recent and Innovation Trends in Computing and Communication

Automatic pattern-taxonomy extraction for web mining

Author: Chen Yi-Ping Phoebe
Li Yuefeng
Pham Binh
Wu Sheng-Tang
Xu Yue
Publication venue: IEEE Xplore
Publication date: 01/01/2004

In this paper, we propose a model for discovering frequent sequential patterns, phrases, which can be used as profile descriptors of documents. It is indubitable that we can obtain numerous phrases using data mining algorithms. However, it is difficult to use these phrases effectively for answering what users want. Therefore, we present a pattern taxonomy extraction model which performs the task of extracting descriptive frequent sequential patterns by pruning the meaningless ones. The model then is extended and tested by applying it to the information filtering system. The results of the experiment show that pattern-based methods outperform the keyword-based methods. The results also indicate that removal of meaningless patterns not only reduces the cost of computation but also improves the effectiveness of the system.
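The notion of mining frequent phrases and then pruning uninformative ones can be illustrated with a small sketch. The abstract does not define which patterns count as meaningless, so two simplifications are assumed here: patterns are contiguous word sequences, and a pattern is pruned when a longer frequent pattern contains it and has the same support (a closed-pattern criterion). The function names are hypothetical.

```python
from collections import Counter


def frequent_phrases(docs, min_support, max_len=3):
    """Count, per phrase, the number of documents containing it; keep frequent ones."""
    counts = Counter()
    for doc in docs:
        words = doc.lower().split()
        seen = set()  # each phrase is counted at most once per document
        for n in range(1, max_len + 1):
            for i in range(len(words) - n + 1):
                seen.add(tuple(words[i:i + n]))
        counts.update(seen)
    return {p: c for p, c in counts.items() if c >= min_support}


def is_subphrase(short, long):
    """True when `short` occurs as a contiguous run inside `long`."""
    n = len(short)
    return any(long[i:i + n] == short for i in range(len(long) - n + 1))


def prune_non_closed(patterns):
    """Drop any pattern subsumed by a longer pattern with identical support."""
    return {
        p: c
        for p, c in patterns.items()
        if not any(
            len(q) > len(p) and cq == c and is_subphrase(p, q)
            for q, cq in patterns.items()
        )
    }


# Hypothetical toy collection.
docs = ["data mining finds patterns", "text data mining", "data quality"]
patterns = frequent_phrases(docs, min_support=2)
closed = prune_non_closed(patterns)
```

Here the single word "mining" is dropped because it never appears without "data" in front of it, while "data" is kept because it also occurs in a document of its own.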

Deakin Research Online