9 research outputs found

Sensing the Web for Induction of Association Rules and their Composition through Ensemble Techniques

Authors: Agnese Augello, Filippo Vella, Giovanni Pilato, Ignazio Infantino
Publication date: 01/01/2020

Starting from geophysical data collected from heterogeneous sources, such as meteorological stations and information gathered from the web, we seek unknown connections between the sampled values through the extraction of association rules. These rules imply the co-occurrence of two or more symbols in the same representation, and the rule confidence may vary according to the collected data. Starting from traditional algorithms such as FP-Growth and Apriori, we propose the creation of complex association rules through boosting of simpler ones. The composition yields rules that are robust and allows a larger number of interesting rules to emerge.
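As a minimal illustration of the support and confidence measures that association rules rely on, the sketch below computes both for a rule over a toy set of observations. The symbol names (`rain`, `low_pressure`, and so on) and the helper functions are assumptions made for this example; they are not taken from the paper, and the sketch covers neither FP-Growth, Apriori, nor the boosting step.

```python
from typing import Iterable, List, Set


def support(transactions: List[Set[str]], itemset: Iterable[str]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    items = set(itemset)
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def confidence(
    transactions: List[Set[str]],
    antecedent: Iterable[str],
    consequent: Iterable[str],
) -> float:
    """Estimate P(consequent | antecedent) as support(A and C) / support(A)."""
    a = set(antecedent)
    c = set(consequent)
    support_a = support(transactions, a)
    if support_a == 0.0:
        return 0.0  # the rule never fires, so its confidence is taken as 0
    return support(transactions, a | c) / support_a


# Hypothetical observations: each set holds the symbols seen together.
transactions = [
    {"rain", "low_pressure", "high_humidity"},
    {"rain", "low_pressure"},
    {"sun", "high_pressure"},
    {"rain", "high_humidity"},
]

print(support(transactions, {"rain", "low_pressure"}))
print(confidence(transactions, {"low_pressure"}, {"rain"}))
print(confidence(transactions, {"rain"}, {"low_pressure"}))
```

Because confidence is a ratio of counts over the collected observations, adding or removing observations changes its value, which is the sense in which rule confidence varies with the data.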

Open Access Repository

Partial rule match for filtering rules in associative classification

Authors: Refai Mohamed Hayel, Yusof Yuhanis
Publication venue: Science Publications
Publication date: 01/01/2014

In this study, we propose a new method to enhance the accuracy of the Modified Multi-class Classification based on Association Rule (MMCAR) classifier. We introduce a Partial Rule Match Filtering (PRMF) method that allows a minimal match of the items in the rule's body in order for the rule to be added to a classifier. Experiments on the Reuters-21578 data sets are performed to evaluate the effectiveness of PRMF in MMCAR. Results show that the MMCAR classifier performs better than the chosen competitors.

UUM Repository

CiteSeerX

A modified multi-class association rule for text mining

Author: Al-Refai Mohammad Hayel Abdel Karim
Publication date: 01/01/2015

Classification and association rule mining are significant tasks in data mining. Integrating association rule discovery and classification yields an approach known as associative classification. One common shortcoming of existing associative classifiers is the huge number of rules they produce in order to obtain high classification accuracy. This study proposes a Modified Multi-class Association Rule Mining (mMCAR) method that consists of three procedures: rule discovery, rule pruning and group-based class assignment. The rule discovery and rule pruning procedures are designed to reduce the number of classification rules, while the group-based class assignment procedure contributes to improving the classification accuracy. Experiments on structured and unstructured text datasets obtained from the UCI and Reuters repositories are performed to evaluate the proposed classifier, which is benchmarked against traditional classifiers and existing associative classifiers. Experimental results indicate that mMCAR produces high accuracy with a smaller number of classification rules. For the structured dataset, mMCAR achieves an average accuracy of 84.24%, compared with 84.23% for MCAR. Even though the accuracy difference is small, mMCAR uses only 50 rules for the classification while its benchmark method uses 60. On the unstructured dataset, mMCAR is on par with MCAR: both classifiers reach 89% accuracy, but mMCAR uses fewer rules. This study contributes to the text mining domain, as automatic classification of huge and widely distributed textual data could facilitate text representation and retrieval.

Universiti Utara Malaysia: UUM eTheses

MapReduce network enabled algorithms for classification based on association rules

Author: Hammoud Suhel
Publication venue: Brunel University School of Engineering and Design PhD Theses
Publication date: 01/01/2011

This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. There is growing evidence that integrating classification and association rule mining can produce more efficient and accurate classifiers than traditional techniques. This thesis introduces a new MapReduce-based association rule miner for extracting strong rules from large datasets. This miner is later used to develop a new large-scale classifier. A new MapReduce simulator was also developed to evaluate the scalability of the proposed algorithms on MapReduce clusters. The developed association rule miner inherits MapReduce's scalability to huge datasets and to thousands of processing nodes. For finding frequent itemsets, it uses a hybrid approach between miners that use counting methods on horizontal datasets and miners that use set intersections on datasets in vertical format. The new miner generates the same rules that are usually generated by Apriori-like algorithms, because it uses the same definitions of the confidence and support thresholds. In the last few years, a number of associative classification algorithms have been proposed, e.g. CPAR, CMAR, MCAR, MMAC and others. This thesis also introduces a new MapReduce classifier based on MapReduce association rule mining. This algorithm employs different approaches to rule discovery, rule ranking, rule pruning, rule prediction and rule evaluation. The new classifier works on multi-class datasets and is able to produce multi-label predictions with probabilities for each predicted label. To evaluate the classifier, 20 different datasets from the UCI data collection were used. Results show that the proposed approach is an accurate and effective classification technique, highly competitive and scalable compared with other traditional and associative classification approaches. The MapReduce simulator was developed to measure the scalability of MapReduce-based applications easily and quickly, and to capture the behaviour of algorithms in cluster environments. It also allows the configuration of MapReduce clusters to be optimized for better execution times and hardware utilization.
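The two support-counting styles mentioned in this abstract can be sketched on a single machine as follows. This is only an illustration under simple assumptions: the function names and the toy transactions are hypothetical, and the sketch shows neither the thesis's hybrid strategy nor its MapReduce implementation.

```python
from collections import defaultdict
from typing import Dict, List, Set


def count_horizontal(transactions: List[Set[str]], itemset: Set[str]) -> int:
    """Horizontal format: scan every transaction and count those containing the itemset."""
    return sum(1 for t in transactions if itemset <= t)


def to_vertical(transactions: List[Set[str]]) -> Dict[str, Set[int]]:
    """Vertical format: map each item to the set of transaction ids that contain it."""
    tidlists: Dict[str, Set[int]] = defaultdict(set)
    for tid, t in enumerate(transactions):
        for item in t:
            tidlists[item].add(tid)
    return tidlists


def count_vertical(tidlists: Dict[str, Set[int]], itemset: Set[str]) -> int:
    """Vertical format: intersect the id sets of the items (itemset must be non-empty)."""
    if not itemset:
        raise ValueError("itemset must contain at least one item")
    id_sets = [tidlists.get(item, set()) for item in itemset]
    return len(set.intersection(*id_sets))


# Hypothetical toy data.
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]
tidlists = to_vertical(transactions)

for candidate in ({"a", "b"}, {"a", "b", "c"}, {"d"}):
    print(sorted(candidate),
          count_horizontal(transactions, candidate),
          count_vertical(tidlists, candidate))
```

Both functions return the same support count for any non-empty itemset; they differ in what they scan. The horizontal count reads every transaction, while the vertical count touches only the id sets of the items involved.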

Brunel University Research Archive

LC an effective classification based association rule mining algorithm

Author: Mahmood Qazafi

Classification using association rules is a research field in data mining that primarily applies association rule discovery techniques to classification benchmarks. Many research studies in the literature have confirmed that classification using association tends to generate more predictive classification systems than traditional classification techniques such as probabilistic, statistical and decision tree methods. In this thesis, we introduce a novel data mining algorithm based on classification using association, called “Looking at the Class” (LC), which can be used for mining a range of classification data sets. Known algorithms in this approach, such as the Classification based on Association rule (CBA) system and the Classification based on Predictive Association (CPAR) system, merge disjoint items in the rule learning step without considering class label similarity; the proposed algorithm merges only items with identical class labels. This avoids many unnecessary item combinations during the rule learning step, and consequently yields large savings in computational time and memory. Furthermore, the LC algorithm uses a novel prediction procedure that employs multiple rules, instead of a single rule, to make the prediction decision. The proposed algorithm has been evaluated thoroughly on real-world security data sets collected using an automated tool developed at Huddersfield University. The security application considered in this thesis is categorizing websites, based on their features, as legitimate or fake, which is a typical binary classification problem. Experiments have also been conducted on a number of UCI data sets, and the measures used for evaluation are classification accuracy, memory usage, and others. The results show that the LC algorithm outperformed traditional classification algorithms such as C4.5, PART and Naïve Bayes, as well as known classification-based association algorithms such as CBA, with respect to classification accuracy, memory usage, and execution time on most of the data sets we consider.

University of Huddersfield Repository

MapReduce network enabled algorithms for classification based on association rules

Authors: Hammoud Suhel, Li M
Publication date: 01/01/2011

There is growing evidence that integrating classification and association rule mining can produce more efficient and accurate classifiers than traditional techniques. This thesis introduces a new MapReduce-based association rule miner for extracting strong rules from large datasets. This miner is later used to develop a new large-scale classifier. A new MapReduce simulator was also developed to evaluate the scalability of the proposed algorithms on MapReduce clusters. The developed association rule miner inherits MapReduce's scalability to huge datasets and to thousands of processing nodes. For finding frequent itemsets, it uses a hybrid approach between miners that use counting methods on horizontal datasets and miners that use set intersections on datasets in vertical format. The new miner generates the same rules that are usually generated by Apriori-like algorithms, because it uses the same definitions of the confidence and support thresholds. In the last few years, a number of associative classification algorithms have been proposed, e.g. CPAR, CMAR, MCAR, MMAC and others. This thesis also introduces a new MapReduce classifier based on MapReduce association rule mining. This algorithm employs different approaches to rule discovery, rule ranking, rule pruning, rule prediction and rule evaluation. The new classifier works on multi-class datasets and is able to produce multi-label predictions with probabilities for each predicted label. To evaluate the classifier, 20 different datasets from the UCI data collection were used. Results show that the proposed approach is an accurate and effective classification technique, highly competitive and scalable compared with other traditional and associative classification approaches. The MapReduce simulator was developed to measure the scalability of MapReduce-based applications easily and quickly, and to capture the behaviour of algorithms in cluster environments. It also allows the configuration of MapReduce clusters to be optimized for better execution times and hardware utilization.
EThOS - Electronic Theses Online Service, GB, United Kingdom

OpenGrey Repository

Pertanika Journal of Science & Technology

Author: Universiti Putra Malaysia Press
Publication venue: Universiti Putra Malaysia Press
Publication date: 01/01/2013

Universiti Putra Malaysia Institutional Repository

Large-scale Text Categorization Based on Boosting Association Rules

Author: 윤용욱
Publication venue: 포항공과대학교 (Pohang University of Science and Technology)

Doctoral thesis. In associative classification, high-order rules can represent the association between a pattern and a class label more exactly. But mining high-order rules is very time consuming, because the number of generated rules grows exponentially. Thus, most associative classifiers in previous studies have decreased the number of generated rules by reducing the number of features or lowering the order of rules, and have selected a small number of high-confidence rules as the final hypotheses for test instances. We propose an alternative approach in which the training of the classifier starts with as many rules as possible, including those that have low confidence values but are better than random guessing. Controlled by our new boosting algorithm, a smaller number of rules is selected from that large set of generated rules to make up the final classifier. The resulting classifier shows greatly improved performance in both training error and generalization error. To maximize classifier performance, our approach mines a huge number of association rules by lowering the minimum support and minimum confidence thresholds, which helps to raise the coverage of test instances. We also propose two new algorithms to enhance computational efficiency during rule generation and boosting. In thorough experiments using well-known benchmark databases and large-scale text corpora, our method achieves outstanding classification accuracy as well as computational efficiency.

포항공과대학교