QCBA: Postoptimization of Quantitative Attributes in Classifiers based on Association Rules
The need to prediscretize numeric attributes before they can be used in
association rule learning is a source of inefficiencies in the resulting
classifier. This paper describes several new rule tuning steps aiming to
recover information lost in the discretization of numeric (quantitative)
attributes, and a new rule pruning strategy, which further reduces the size of
the classification models. We demonstrate the effectiveness of the proposed
methods on postoptimization of models generated by three state-of-the-art
association rule classification algorithms: Classification based on
Associations (Liu, 1998), Interpretable Decision Sets (Lakkaraju et al, 2016),
and Scalable Bayesian Rule Lists (Yang, 2017). Benchmarks on 22 datasets from
the UCI repository show that the postoptimized models are consistently smaller
-- typically by about 50% -- and have better classification performance on most
datasets.
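A minimal sketch of the interval-refinement idea behind such postoptimization (an illustrative toy, not QCBA's actual tuning steps; the function and data are hypothetical): trim a discretized interval on a numeric attribute back to the tightest range that still covers the rule's correctly classified examples, recovering precision the prediscretization threw away.

```python
def trim_interval(lo, hi, values, labels, target):
    """Shrink a discretized interval [lo, hi] on a numeric attribute to the
    tightest sub-interval still covering every correctly classified example."""
    covered = [v for v, y in zip(values, labels)
               if lo <= v <= hi and y == target]
    if not covered:
        return lo, hi  # rule covers nothing correctly; leave it unchanged
    return min(covered), max(covered)

# Discretization produced the coarse interval [0, 5] for class "a";
# trimming recovers the tighter literal [1.0, 4.8] from the raw values.
values = [1.0, 2.5, 3.0, 4.8, 6.0]
labels = ["a", "a", "b", "a", "b"]
print(trim_interval(0.0, 5.0, values, labels, "a"))  # (1.0, 4.8)
```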
I-prune: Item selection for associative classification
Associative classification is characterized by accurate models but high model generation time. Most of that time is spent extracting and postprocessing a large set of irrelevant rules, which are eventually pruned. We propose I-prune, an item-pruning approach that identifies uninteresting items by means of an interestingness measure and prunes them as soon as they are detected. Thus, the number of extracted rules is reduced and model generation time decreases correspondingly. A wide set of experiments on real and synthetic data sets has been performed to evaluate I-prune and select the appropriate interestingness measure. The experimental results show that I-prune allows a significant reduction in model generation time while increasing (or at worst preserving) model accuracy. The experimental evaluation also points to the chi-square measure as the most effective interestingness measure for item pruning.
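The chi-square measure the evaluation favors can be sketched as follows (an illustrative sketch only, assuming a simple 2x2 item/class contingency table; the thresholding rule shown is a standard critical value, not necessarily the paper's exact pruning criterion):

```python
def chi_square_2x2(a, b, c, d):
    """Chi-square statistic for the 2x2 contingency table with counts
    (item & class, item & ~class, ~item & class, ~item & ~class)."""
    n = a + b + c + d
    num = n * (a * d - b * c) ** 2
    den = (a + b) * (c + d) * (a + c) * (b + d)
    return num / den if den else 0.0

# An item strongly associated with the class survives; an item carrying no
# class information falls below the 95% critical value (3.84 at 1 d.o.f.)
# and would be pruned before any rule containing it is ever generated.
print(chi_square_2x2(30, 10, 10, 50) > 3.84)  # True
print(chi_square_2x2(20, 20, 20, 20) > 3.84)  # False (statistic is 0)
```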
Classifier PGN: Classification with High Confidence Rules
ACM Computing Classification System (1998): H.2.8, H.3.3. Associative classifiers use a set of class association rules, generated from a given training set, to classify new instances. Typically, these techniques set a minimal support to make a first selection of appropriate rules and subsequently discriminate between high- and low-quality rules by means of a quality measure such as confidence. As a result, the final set of class association rules has support equal to or greater than a predefined threshold, but many of the rules have confidence levels below 100%. PGN is a novel associative classifier which turns the traditional approach around and uses a confidence level of 100% as the first selection criterion, prior to maximizing support. This article introduces PGN and empirically evaluates its strengths and limitations. The results are promising and show that PGN is competitive with other well-known classifiers.
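The confidence-first selection idea can be sketched as follows; this is a toy illustration of the inverted ordering (exact rules first, then ranked by support), not PGN's actual algorithm, and the rule and transaction representations are hypothetical:

```python
def confidence_first(rules, transactions):
    """Keep only rules that are 100% confident on the training data,
    then rank the survivors by support (descending)."""
    kept = []
    for items, cls in rules:
        covered = [y for x, y in transactions if items <= x]
        if covered and all(y == cls for y in covered):  # confidence == 100%
            kept.append(((items, cls), len(covered)))   # support = #covered
    kept.sort(key=lambda t: -t[1])
    return kept

transactions = [({"a", "b"}, "+"), ({"a"}, "+"), ({"a", "c"}, "-"), ({"b"}, "+")]
rules = [({"a"}, "+"), ({"b"}, "+")]
# {"a"} -> "+" misclassifies the third transaction, so only the exact rule
# {"b"} -> "+" survives, with support 2.
print(confidence_first(rules, transactions))
```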
Scaling associative classification for very large datasets
Supervised learning algorithms are nowadays successfully scaling up to
datasets that are very large in volume, leveraging the potential of in-memory
cluster-computing Big Data frameworks. Still, massive datasets with a number of
large-domain categorical features are a difficult challenge for any classifier.
Most off-the-shelf solutions cannot cope with this problem. In this work we
introduce DAC, a Distributed Associative Classifier. DAC exploits ensemble
learning to distribute the training of an associative classifier among parallel
workers and improve the final quality of the model. Furthermore, it adopts
several novel techniques to reach high scalability without sacrificing quality,
among which a preventive pruning of classification rules in the extraction
phase based on Gini impurity. We ran experiments on Apache Spark, on a real
large-scale dataset with more than 4 billion records and 800 million distinct
categories. The results showed that DAC improves on a state-of-the-art solution
in both prediction quality and execution time. Since the generated model is
human-readable, it can not only classify new records, but also allow
understanding both the logic behind the prediction and the properties of the
model, becoming a useful aid for decision makers.
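The Gini-based preventive pruning can be illustrated with a small sketch (assumptions: this shows only the impurity measure on a rule's covered class counts and a hypothetical threshold, not DAC's distributed extraction pipeline):

```python
def gini(counts):
    """Gini impurity of a class-count distribution."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts)

# A candidate rule whose covered records are nearly pure survives; one
# covering an even class mix is dropped during extraction, before it can
# bloat the rule set. Rule names and the 0.3 cutoff are illustrative.
candidates = {"rule_pure": [9, 1], "rule_mixed": [5, 5]}
kept = [r for r, counts in candidates.items() if gini(counts) < 0.3]
print(kept)  # ['rule_pure']
```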
MPGN – An Approach for Discovering Class Association Rules
This article presents some of the results of the Ph.D. thesis Class Association Rule Mining Using MultiDimensional Numbered Information Spaces by Iliya Mitov (Institute of Mathematics and Informatics, BAS), successfully defended at Hasselt University, Faculty of Science, on 15 November 2011 in Belgium. The article briefly presents some results achieved within the PhD project R1876 Intelligent Systems' Memory Structuring Using Multidimensional Numbered Information Spaces. The main goal of this article is to show the possibilities of using multidimensional numbered information spaces in data mining processes, through the example of the implementation of one associative classifier, called MPGN (Multilayer Pyramidal Growing Networks).
A MapReduce solution for associative classification of big data
Associative classifiers have proven to be very effective in classification problems. Unfortunately, the algorithms used for learning these classifiers are not able to adequately manage big data because of time complexity and memory constraints. To overcome these drawbacks, we propose a distributed association rule-based classification scheme shaped according to the MapReduce programming model. The scheme mines classification association rules (CARs) using a properly enhanced, distributed version of the well-known FP-Growth algorithm. Once the CARs have been mined, the proposed scheme performs a distributed rule pruning. The set of surviving CARs is used to classify unlabeled patterns. The memory usage and time complexity of each phase of the learning process are discussed, and the scheme is evaluated on seven real-world big datasets on the Hadoop framework, characterizing its scalability and achievable speedup on small computer clusters. The proposed solution turns out to be suitable for practically addressing big datasets even with modest hardware support. Comparisons with two state-of-the-art distributed learning algorithms are also discussed in terms of accuracy, model complexity, and computation time.
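The MapReduce flavor of such a scheme can be sketched in-process (a minimal toy, assuming hypothetical record and rule shapes; real deployments would run the phases on Hadoop rather than as plain generators): mappers emit a (rule, 1) pair for each record a candidate rule covers, the shuffle groups pairs by rule, and reducers sum the counts into global supports.

```python
from collections import defaultdict

def map_phase(records, rules):
    """Mapper: emit (rule, 1) for every record a candidate rule covers."""
    for rec_items, _cls in records:
        for rule in rules:
            if rule <= rec_items:           # antecedent covers the record
                yield frozenset(rule), 1

def reduce_phase(pairs):
    """Reducer: sum the per-record counts into global rule supports."""
    support = defaultdict(int)
    for key, count in pairs:
        support[key] += count
    return dict(support)

records = [({"a", "b"}, 0), ({"a"}, 1), ({"b"}, 0)]
rules = [{"a"}, {"a", "b"}]
print(reduce_phase(map_phase(records, rules)))
```

Low-support rules could then be pruned by a simple threshold on the reduced counts, mirroring the distributed pruning step described above.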