
661 research outputs found

QCBA: Postoptimization of Quantitative Attributes in Classifiers based on Association Rules

Author: Kliegr Tomas
Publication date: 18/10/2019

The need to prediscretize numeric attributes before they can be used in association rule learning is a source of inefficiencies in the resulting classifier. This paper describes several new rule tuning steps aiming to recover information lost in the discretization of numeric (quantitative) attributes, and a new rule pruning strategy, which further reduces the size of the classification models. We demonstrate the effectiveness of the proposed methods on postoptimization of models generated by three state-of-the-art association rule classification algorithms: Classification based on Associations (Liu, 1998), Interpretable Decision Sets (Lakkaraju et al., 2016), and Scalable Bayesian Rule Lists (Yang, 2017). Benchmarks on 22 datasets from the UCI repository show that the postoptimized models are consistently smaller, typically by about 50%, and have better classification performance on most datasets.

arXiv.org e-Print Archive

I-prune: Item selection for associative classification

Author: Baralis
Coenen
Coenen
Guyon
Hall
Li
Quinlan
Rak
Tan
Wang
Wang
Zaïane
Publication venue: John Wiley & Sons, Inc.
Publication date: 01/01/2012

Associative classification is characterized by accurate models and high model generation time. Most of the time is spent extracting and postprocessing a large set of irrelevant rules, which are eventually pruned. We propose I-prune, an item-pruning approach that selects uninteresting items by means of an interestingness measure and prunes them as soon as they are detected. Thus, the number of extracted rules is reduced and model generation time decreases correspondingly. A wide set of experiments on real and synthetic data sets has been performed to evaluate I-prune and select the appropriate interestingness measure. The experimental results show that I-prune allows a significant reduction in model generation time, while increasing (or at worst preserving) model accuracy. Experimental evaluation also points to the chi-square measure as the most effective interestingness measure for item pruning.

Crossref

Archivio istituzionale della ricerca - Politecnico di Milano

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino
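
The item-pruning idea in the I-prune abstract above can be illustrated with a minimal sketch. It assumes a plain 2x2 chi-square statistic between "item present" and "record belongs to a class", and a filtering step that runs before rule extraction. The function names, the toy data, and the threshold of 3.841 (the 5% critical value for one degree of freedom) are assumptions made for illustration; the abstract does not specify how I-prune computes or applies the measure.

```python
def chi_square_item_class(transactions, labels, item, target):
    """Chi-square statistic of the 2x2 table
    (item present / absent) x (label == target / other)."""
    a = b = c = d = 0
    for items, label in zip(transactions, labels):
        has_item = item in items
        is_target = label == target
        if has_item and is_target:
            a += 1
        elif has_item:
            b += 1
        elif is_target:
            c += 1
        else:
            d += 1
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # A degenerate table carries no evidence of association.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def select_interesting_items(transactions, labels, threshold=3.841):
    """Keep items whose best chi-square score over all classes reaches
    the threshold (hypothetical selection rule, not the paper's)."""
    all_items = set().union(*transactions)
    classes = set(labels)
    return {
        item
        for item in all_items
        if max(chi_square_item_class(transactions, labels, item, cls)
               for cls in classes) >= threshold
    }


# Toy data: "a" and "b" determine the class, "x" and "y" are noise.
transactions = [{"a", "x"}, {"a", "y"}, {"b", "x"}, {"b", "y"}]
labels = ["pos", "pos", "neg", "neg"]
kept = select_interesting_items(transactions, labels)
```

In this toy example only the items that correlate with the class survive, so a rule miner run afterwards would see fewer candidate items.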

Classifier PGN: Classification with High Confidence Rules

Author: Depaire Benoit
Ivanova Krassimira
Mitov Iliya
Vanhoof Koen
Publication venue: Institute of Mathematics and Informatics Bulgarian Academy of Sciences
Publication date: 01/01/2013

ACM Computing Classification System (1998): H.2.8, H.3.3. Associative classifiers use a set of class association rules, generated from a given training set, to classify new instances. Typically, these techniques set a minimal support to make a first selection of appropriate rules and subsequently discriminate between high- and low-quality rules by means of a quality measure such as confidence. As a result, the rules in the final set have a support equal to or greater than a predefined threshold, but many of them have confidence levels below 100%. PGN is a novel associative classifier which turns the traditional approach around and uses a confidence level of 100% as a first selection criterion, prior to maximizing the support. This article introduces PGN and evaluates the strengths and limitations of PGN empirically. The results are promising and show that PGN is competitive with other well-known classifiers.

Bulgarian Digital Mathematics Library at IMI-BAS
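
The "confidence first, support second" ordering described in the PGN abstract above can be sketched as follows. This is only an illustration of that selection order over a given list of candidate rules, not the PGN algorithm itself, whose rule generation is not described in the abstract. The names `rule_counts` and `confidence_first` and the toy data are hypothetical.

```python
def rule_counts(antecedent, target, transactions, labels):
    """Return (covered, hits): records matching the antecedent, and
    those among them that also carry the target class."""
    covered = hits = 0
    for items, label in zip(transactions, labels):
        if antecedent <= items:  # antecedent is a subset of the record
            covered += 1
            if label == target:
                hits += 1
    return covered, hits


def confidence_first(rules, transactions, labels):
    """Keep only rules with 100% confidence, then order them by
    decreasing support (fraction of all records they classify)."""
    n = len(transactions)
    selected = []
    for antecedent, target in rules:
        covered, hits = rule_counts(antecedent, target, transactions, labels)
        if covered > 0 and hits == covered:  # confidence is exactly 100%
            selected.append((antecedent, target, hits / n))
    # sorted() is stable, so ties keep their input order.
    return sorted(selected, key=lambda rule: rule[2], reverse=True)


transactions = [{"a", "x"}, {"a", "y"}, {"b", "x"}, {"b", "y"}, {"a", "x"}]
labels = ["pos", "pos", "neg", "neg", "pos"]
rules = [
    (frozenset({"x"}), "pos"),       # confidence 2/3, dropped
    (frozenset({"b"}), "neg"),       # confidence 1, support 0.4
    (frozenset({"a"}), "pos"),       # confidence 1, support 0.6
]
ranked = confidence_first(rules, transactions, labels)
```

A traditional support-first approach would instead discard rules below a minimum support and only then compare confidence, so the two orderings can retain different rule sets from the same candidates.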

Scaling associative classification for very large datasets

Author: Baralis Elena
Garza Paolo
Venturini Luca
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2017

Supervised learning algorithms are nowadays successfully scaling up to datasets that are very large in volume, leveraging the potential of in-memory cluster-computing Big Data frameworks. Still, massive datasets with a number of large-domain categorical features are a difficult challenge for any classifier. Most off-the-shelf solutions cannot cope with this problem. In this work we introduce DAC, a Distributed Associative Classifier. DAC exploits ensemble learning to distribute the training of an associative classifier among parallel workers and improve the final quality of the model. Furthermore, it adopts several novel techniques to reach high scalability without sacrificing quality, including a preventive pruning of classification rules in the extraction phase based on Gini impurity. We ran experiments on Apache Spark, on a real large-scale dataset with more than 4 billion records and 800 million distinct categories. The results showed that DAC improves on a state-of-the-art solution in both prediction quality and execution time. Since the generated model is human-readable, it can not only classify new records, but also allow understanding both the logic behind the prediction and the properties of the model, becoming a useful aid for decision makers.

arXiv.org e-Print Archive

Directory of Open Access Journals

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

PORTO Publications Open Repository TOrino
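
The DAC abstract above mentions pruning rules by Gini impurity without giving the exact criterion. As a rough sketch, the impurity of the class distribution among the records a rule covers can be computed as below. The pruning rule shown (drop a rule whose impurity exceeds a fixed threshold) and the names `gini_impurity` and `keep_rule` are assumptions for illustration, not DAC's documented behaviour.

```python
def gini_impurity(class_counts):
    """Gini impurity 1 - sum(p_k^2) of a class-count mapping.
    0.0 means all covered records share one class."""
    total = sum(class_counts.values())
    if total == 0:
        return 0.0
    return 1.0 - sum((count / total) ** 2 for count in class_counts.values())


def keep_rule(class_counts, max_impurity=0.2):
    """Hypothetical pruning test: keep a candidate rule only if the
    records it covers are sufficiently pure."""
    return gini_impurity(class_counts) <= max_impurity


pure = gini_impurity({"pos": 10})               # single class
mixed = gini_impurity({"pos": 5, "neg": 5})     # evenly split
skewed = gini_impurity({"pos": 9, "neg": 1})    # mostly one class
```

A lower impurity means the rule's antecedent separates one class more cleanly, which is why an impurity bound can serve as an early filter during extraction.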

MPGN – An Approach for Discovering Class Association Rules

Author: Mitov Iliya
Publication venue: Institute of Mathematics and Informatics Bulgarian Academy of Sciences
Publication date: 01/01/2011

This article presents some of the results of the Ph.D. thesis Class Association Rule Mining Using MultiDimensional Numbered Information Spaces by Iliya Mitov (Institute of Mathematics and Informatics, BAS), successfully defended at Hasselt University, Faculty of Science on 15 November 2011 in Belgium. The article briefly presents some results achieved within the PhD project R1876 Intelligent Systems’ Memory Structuring Using Multidimensional Numbered Information Spaces, successfully defended at Hasselt University. The main goal of this article is to show the possibilities of using multidimensional numbered information spaces in data mining processes, using as an example the implementation of one associative classifier, called MPGN (Multilayer Pyramidal Growing Networks).

Bulgarian Digital Mathematics Library at IMI-BAS

LODE: A distance-based classifier built on ensembles of positive and negative observations

Author: Bakar D.
Ienco Dino
Meo Rosa
Publication venue: Elsevier BV
Publication date: 01/01/2012

Institutional Research Information System University of Turin

A MapReduce solution for associative classification of big data

Author: BECHINI ALESSIO
MARCELLONI FRANCESCO
SEGATORI ARMANDO
Publication venue: Elsevier BV
Publication date: 01/01/2016

Associative classifiers have proven to be very effective in classification problems. Unfortunately, the algorithms used for learning these classifiers are not able to adequately manage big data because of time complexity and memory constraints. To overcome such drawbacks, we propose a distributed association rule-based classification scheme shaped according to the MapReduce programming model. The scheme mines classification association rules (CARs) using a properly enhanced, distributed version of the well-known FP-Growth algorithm. Once CARs have been mined, the proposed scheme performs a distributed rule pruning. The set of surviving CARs is used to classify unlabeled patterns. The memory usage and time complexity for each phase of the learning process are discussed, and the scheme is evaluated on seven real-world big datasets on the Hadoop framework, characterizing its scalability and achievable speedup on small computer clusters. The proposed solution for associative classifiers turns out to be suitable for practically addressing big datasets even with modest hardware support. Comparisons with two state-of-the-art distributed learning algorithms are also discussed in terms of accuracy, model complexity, and computation time.

Archivio della Ricerca - Università di Pisa
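
The MapReduce abstract above builds on FP-Growth, whose first pass counts item frequencies and discards infrequent items. A minimal single-process sketch of that counting pass in map/reduce style is given below. It runs in plain Python rather than on Hadoop, and the function names and partition layout are hypothetical; the paper's enhanced distributed FP-Growth and its rule pruning are not reproduced here.

```python
from collections import Counter
from functools import reduce


def map_count(partition):
    """Map step: count, within one data partition, how many
    transactions contain each item."""
    counts = Counter()
    for items in partition:
        counts.update(set(items))  # count each item once per transaction
    return counts


def reduce_counts(left, right):
    """Reduce step: merge two partial counts."""
    return left + right


def frequent_items(partitions, min_count):
    """Combine per-partition counts and keep items that appear in at
    least min_count transactions overall."""
    total = reduce(reduce_counts, map(map_count, partitions), Counter())
    return {item: n for item, n in total.items() if n >= min_count}


# Two partitions, as if the dataset were split across two workers.
partitions = [
    [["a", "b"], ["a"]],
    [["a", "c"], ["b"]],
]
frequent = frequent_items(partitions, min_count=2)
```

Because the reduce step is associative, the partial counts can be merged in any order, which is what makes this pass easy to distribute.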