55,707 research outputs found
Improving the Interpretability of Classification Rules Discovered by an Ant Colony Algorithm: Extended Results
The vast majority of Ant Colony Optimization (ACO) algorithms for inducing classification rules use an ACO-based procedure to create a rule in an one-at-a-time fashion. An improved search strategy has been proposed in the cAnt-MinerPB algorithm, where an ACO-based procedure is used to create a complete list of rules (ordered rules)-i.e., the ACO search is guided by the quality of a list of rules, instead of an individual rule. In this paper we propose an extension of the cAnt-MinerPB algorithm to discover a set of rules (unordered rules). The main motivations for this work are to improve the interpretation of individual rules by discovering a set of rules and to evaluate the impact on the predictive accuracy of the algorithm. We also propose a new measure to evaluate the interpretability of the discovered rules to mitigate the fact that the commonly-used model size measure ignores how the rules are used to make a class prediction. Comparisons with state-of-the-art rule induction algorithms, support vector machines and the cAnt-MinerPB producing ordered rules are also presented
Recommended from our members
Improving music genre classification using automatically induced harmony rules
We present a new genre classification framework using both low-level signal-based features and high-level harmony features. A state-of-the-art statistical genre classifier based on timbral features is extended using a first-order random forest containing for each genre rules derived from harmony or chord sequences. This random forest has been automatically induced, using the first-order logic induction algorithm TILDE, from a dataset, in which for each chord the degree and chord category are identified, and covering classical, jazz and pop genre classes. The audio descriptor-based genre classifier contains 206 features, covering spectral, temporal, energy, and pitch characteristics of the audio signal. The fusion of the harmony-based classifier with the extracted feature vectors is tested on three-genre subsets of the GTZAN and ISMIR04 datasets, which contain 300 and 448 recordings, respectively. Machine learning classifiers were tested using 5 Ă— 5-fold cross-validation and feature selection. Results indicate that the proposed harmony-based rules combined with the timbral descriptor-based genre classification system lead to improved genre classification rates
Recommended from our members
Improving music genre classification using automatically induced harmony rules
We present a new genre classification framework using both low-level signal-based features and high-level harmony features. A state-of-the-art statistical genre classifier based on timbral features is extended using a first-order random forest containing for each genre rules derived from harmony or chord sequences. This random forest has been automatically induced, using the first-order logic induction algorithm TILDE, from a dataset, in which for each chord the degree and chord category are identified, and covering classical, jazz and pop genre classes. The audio descriptor-based genre classifier contains 206 features, covering spectral, temporal, energy, and pitch characteristics of the audio signal. The fusion of the harmony-based classifier with the extracted feature vectors is tested on three-genre subsets of the GTZAN and ISMIR04 datasets, which contain 300 and 448 recordings, respectively. Machine learning classifiers were tested using 5 Ă— 5-fold cross-validation and feature selection. Results indicate that the proposed harmony-based rules combined with the timbral descriptor-based genre classification system lead to improved genre classification rates
QCBA: Postoptimization of Quantitative Attributes in Classifiers based on Association Rules
The need to prediscretize numeric attributes before they can be used in
association rule learning is a source of inefficiencies in the resulting
classifier. This paper describes several new rule tuning steps aiming to
recover information lost in the discretization of numeric (quantitative)
attributes, and a new rule pruning strategy, which further reduces the size of
the classification models. We demonstrate the effectiveness of the proposed
methods on postoptimization of models generated by three state-of-the-art
association rule classification algorithms: Classification based on
Associations (Liu, 1998), Interpretable Decision Sets (Lakkaraju et al, 2016),
and Scalable Bayesian Rule Lists (Yang, 2017). Benchmarks on 22 datasets from
the UCI repository show that the postoptimized models are consistently smaller
-- typically by about 50% -- and have better classification performance on most
datasets
Highly Relevant Routing Recommendation Systems for Handling Few Data Using MDL Principle and Embedded Relevance Boosting Factors
A route recommendation system can provide better recommendation if it also
takes collected user reviews into account, e.g. places that generally get
positive reviews may be preferred. However, to classify sentiment, many
classification algorithms existing today suffer in handling small data items
such as short written reviews. In this paper we propose a model for a strongly
relevant route recommendation system that is based on an MDL-based (Minimum
Description Length) sentiment classification and show that such a system is
capable of handling small data items (short user reviews). Another highlight of
the model is the inclusion of a set of boosting factors in the relevance
calculation to improve the relevance in any recommendation system that
implements the model.Comment: ACM SIGIR 2018 Workshop on Learning from Limited or Noisy Data for
Information Retrieval (LND4IR'18), July 12, 2018, Ann Arbor, Michigan, USA, 8
pages, 9 figure
GENESIM : genetic extraction of a single, interpretable model
Models obtained by decision tree induction techniques excel in being
interpretable.However, they can be prone to overfitting, which results in a low
predictive performance. Ensemble techniques are able to achieve a higher
accuracy. However, this comes at a cost of losing interpretability of the
resulting model. This makes ensemble techniques impractical in applications
where decision support, instead of decision making, is crucial.
To bridge this gap, we present the GENESIM algorithm that transforms an
ensemble of decision trees to a single decision tree with an enhanced
predictive performance by using a genetic algorithm. We compared GENESIM to
prevalent decision tree induction and ensemble techniques using twelve publicly
available data sets. The results show that GENESIM achieves a better predictive
performance on most of these data sets than decision tree induction techniques
and a predictive performance in the same order of magnitude as the ensemble
techniques. Moreover, the resulting model of GENESIM has a very low complexity,
making it very interpretable, in contrast to ensemble techniques.Comment: Presented at NIPS 2016 Workshop on Interpretable Machine Learning in
Complex System
An Experimental Evaluation of Nearest Neighbour Time Series Classification
Data mining research into time series classification (TSC) has focussed on alternative distance measures for nearest neighbour classifiers. It is standard practice to use 1-NN with Euclidean or dynamic time warping (DTW) distance as a straw man for comparison. As part of a wider investigation into elastic distance measures for TSC~\cite{lines14elastic}, we perform a series of experiments to test whether this standard practice is valid. Specifically, we compare 1-NN classifiers with Euclidean and DTW distance to standard classifiers, examine whether the performance of 1-NN Euclidean approaches that of 1-NN DTW as the number of cases increases, assess whether there is any benefit of setting for -NN through cross validation whether it is worth setting the warping path for DTW through cross validation and finally is it better to use a window or weighting for DTW. Based on experiments on 77 problems, we conclude that 1-NN with Euclidean distance is fairly easy to beat but 1-NN with DTW is not, if window size is set through cross validation
- …