Search CORE

9,304 research outputs found

A review of associative classification mining

Author: Thabtah Fadi
Publication venue
Publication date: 01/01/2007
Field of study

Associative classification mining is a promising approach in data mining that utilizes the association rule discovery techniques to construct classification systems, also known as associative classifiers. In the last few years, a number of associative classification algorithms have been proposed, i.e. CPAR, CMAR, MCAR, MMAC and others. These algorithms employ several different rule discovery, rule ranking, rule pruning, rule prediction and rule evaluation methods. This paper focuses on surveying and comparing the state-of-the-art associative classification techniques with regards to the above criteria. Finally, future directions in associative classification, such as incremental learning and mining low-quality data sets, are also highlighted in this paper

CiteSeerX

University of Huddersfield Repository

QCBA: Postoptimization of Quantitative Attributes in Classifiers based on Association Rules

Author: Kliegr Tomas
Publication venue
Publication date: 18/10/2019
Field of study

The need to prediscretize numeric attributes before they can be used in association rule learning is a source of inefficiencies in the resulting classifier. This paper describes several new rule tuning steps aiming to recover information lost in the discretization of numeric (quantitative) attributes, and a new rule pruning strategy, which further reduces the size of the classification models. We demonstrate the effectiveness of the proposed methods on postoptimization of models generated by three state-of-the-art association rule classification algorithms: Classification based on Associations (Liu, 1998), Interpretable Decision Sets (Lakkaraju et al, 2016), and Scalable Bayesian Rule Lists (Yang, 2017). Benchmarks on 22 datasets from the UCI repository show that the postoptimized models are consistently smaller -- typically by about 50% -- and have better classification performance on most datasets

arXiv.org e-Print Archive

Evolving Ensemble Fuzzy Classifier

Author: Lughofer Edwin
Pedrycz Witold
Pratama Mahardhika
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2018
Field of study

The concept of ensemble learning offers a promising avenue in learning from data streams under complex environments because it addresses the bias and variance dilemma better than its single model counterpart and features a reconfigurable structure, which is well suited to the given context. While various extensions of ensemble learning for mining non-stationary data streams can be found in the literature, most of them are crafted under a static base classifier and revisits preceding samples in the sliding window for a retraining step. This feature causes computationally prohibitive complexity and is not flexible enough to cope with rapidly changing environments. Their complexities are often demanding because it involves a large collection of offline classifiers due to the absence of structural complexities reduction mechanisms and lack of an online feature selection mechanism. A novel evolving ensemble classifier, namely Parsimonious Ensemble pENsemble, is proposed in this paper. pENsemble differs from existing architectures in the fact that it is built upon an evolving classifier from data streams, termed Parsimonious Classifier pClass. pENsemble is equipped by an ensemble pruning mechanism, which estimates a localized generalization error of a base classifier. A dynamic online feature selection scenario is integrated into the pENsemble. This method allows for dynamic selection and deselection of input features on the fly. pENsemble adopts a dynamic ensemble structure to output a final classification decision where it features a novel drift detection scenario to grow the ensemble structure. The efficacy of the pENsemble has been numerically demonstrated through rigorous numerical studies with dynamic and evolving data streams where it delivers the most encouraging performance in attaining a tradeoff between accuracy and complexity.Comment: this paper has been published by IEEE Transactions on Fuzzy System

arXiv.org e-Print Archive

DR-NTU (Digital Repository of NTU)

Online Tool Condition Monitoring Based on Parsimonious Ensemble+

Author: Dimla Eric
Lughofer Edwin
Pedrycz Witold
Pratama Mahardhika
Tjahjowidowo Tegoeh
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 07/12/2019
Field of study

Accurate diagnosis of tool wear in metal turning process remains an open challenge for both scientists and industrial practitioners because of inhomogeneities in workpiece material, nonstationary machining settings to suit production requirements, and nonlinear relations between measured variables and tool wear. Common methodologies for tool condition monitoring still rely on batch approaches which cannot cope with a fast sampling rate of metal cutting process. Furthermore they require a retraining process to be completed from scratch when dealing with a new set of machining parameters. This paper presents an online tool condition monitoring approach based on Parsimonious Ensemble+, pENsemble+. The unique feature of pENsemble+ lies in its highly flexible principle where both ensemble structure and base-classifier structure can automatically grow and shrink on the fly based on the characteristics of data streams. Moreover, the online feature selection scenario is integrated to actively sample relevant input attributes. The paper presents advancement of a newly developed ensemble learning algorithm, pENsemble+, where online active learning scenario is incorporated to reduce operator labelling effort. The ensemble merging scenario is proposed which allows reduction of ensemble complexity while retaining its diversity. Experimental studies utilising real-world manufacturing data streams and comparisons with well known algorithms were carried out. Furthermore, the efficacy of pENsemble was examined using benchmark concept drift data streams. It has been found that pENsemble+ incurs low structural complexity and results in a significant reduction of operator labelling effort.Comment: this paper has been published by IEEE Transactions on Cybernetic

arXiv.org e-Print Archive

DR-NTU (Digital Repository of NTU)

Asymmetric Pruning for Learning Cascade Detectors

Author: Hengel Anton van den
Paisitkriangkrai Sakrapee
Shen Chunhua
Publication venue
Publication date: 01/01/2014
Field of study

Cascade classifiers are one of the most important contributions to real-time object detection. Nonetheless, there are many challenging problems arising in training cascade detectors. One common issue is that the node classifier is trained with a symmetric classifier. Having a low misclassification error rate does not guarantee an optimal node learning goal in cascade classifiers, i.e., an extremely high detection rate with a moderate false positive rate. In this work, we present a new approach to train an effective node classifier in a cascade detector. The algorithm is based on two key observations: 1) Redundant weak classifiers can be safely discarded; 2) The final detector should satisfy the asymmetric learning objective of the cascade architecture. To achieve this, we separate the classifier training into two steps: finding a pool of discriminative weak classifiers/features and training the final classifier by pruning weak classifiers which contribute little to the asymmetric learning criterion (asymmetric classifier construction). Our model reduction approach helps accelerate the learning time while achieving the pre-determined learning objective. Experimental results on both face and car data sets verify the effectiveness of the proposed algorithm. On the FDDB face data sets, our approach achieves the state-of-the-art performance, which demonstrates the advantage of our approach.Comment: 14 page

arXiv.org e-Print Archive

Adelaide Research & Scholarship

Fame for sale: efficient detection of fake Twitter followers

Author: Cresci Stefano
Di Pietro Roberto
Petrocchi Marinella
Spognardi Angelo
Tesconi Maurizio
Publication venue: 'Elsevier BV'
Publication date: 01/01/2015
Field of study

\textit{Fake followers}

are those Twitter accounts specifically created to inflate the number of followers of a target account. Fake followers are dangerous for the social platform and beyond, since they may alter concepts like popularity and influence in the Twittersphere - hence impacting on economy, politics, and society. In this paper, we contribute along different dimensions. First, we review some of the most relevant existing features and rules (proposed by Academia and Media) for anomalous Twitter accounts detection. Second, we create a baseline dataset of verified human and fake follower accounts. Such baseline dataset is publicly available to the scientific community. Then, we exploit the baseline dataset to train a set of machine-learning classifiers built over the reviewed rules and features. Our results show that most of the rules proposed by Media provide unsatisfactory performance in revealing fake followers, while features proposed in the past by Academia for spam detection provide good results. Building on the most promising features, we revise the classifiers both in terms of reduction of overfitting and cost for gathering the data needed to compute the features. The final result is a novel

\textit{Class A}

classifier, general enough to thwart overfitting, lightweight thanks to the usage of the less costly features, and still able to correctly classify more than 95% of the accounts of the original training set. We ultimately perform an information fusion-based sensitivity analysis, to assess the global sensitivity of each of the features employed by the classifier. The findings reported in this paper, other than being supported by a thorough experimental methodology and interesting on their own, also pave the way for further investigation on the novel issue of fake Twitter followers

arXiv.org e-Print Archive

PUblication MAnagement

Archivio della ricerca- Università di Roma La Sapienza

Online Research Database In Technology

Archivio istituzionale della ricerca - Università di Padova

Preceding rule induction with instance reduction methods

Author: A. Lukasz
D. Gamberger
D.L. Wilson
D.R. Wilsson
D.R. Wilsson
D.T. Pham
D.W. Aha
G.L. Ritter
G.W. Gates
I. Tomek
J. Fürnkranz
K. Grudzinski
K. Grudziński
K. Hindi El
K.P. Zhao
O. Othman
P. Clark
P. Clark
P.E. Hart
R. Kohavi
R. Schapire
S. Weiss
T.M. Mitchell
W. Cohen
W. Cohen
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013
Field of study

A new prepruning technique for rule induction is presented which applies instance reduction before rule induction. An empirical evaluation records the predictive accuracy and size of rule-sets generated from 24 datasets from the UCI Machine Learning Repository. Three instance reduction algorithms (Edited Nearest Neighbour, AllKnn and DROP5) are compared. Each one is used to reduce the size of the training set, prior to inducing a set of rules using Clark and Boswell's modification of CN2. A hybrid instance reduction algorithm (comprised of AllKnn and DROP5) is also tested. For most of the datasets, pruning the training set using ENN, AllKnn or the hybrid significantly reduces the number of rules generated by CN2, without adversely affecting the predictive performance. The hybrid achieves the highest average predictive accuracy

CiteSeerX

University of Salford Institutional Repository

Crossref