Efficient Discovery of Expressive Multi-label Rules using Relaxed Pruning
Being able to model correlations between labels is considered crucial in
multi-label classification. Rule-based models make it possible to expose such
dependencies, e.g., implications, subsumptions, or exclusions, in an
interpretable and human-comprehensible manner. Although the number of possible
label combinations increases exponentially with the number of available labels,
it has been shown that rules with multiple labels in their heads, which are a
natural form to model local label dependencies, can be induced efficiently by
exploiting certain properties of rule evaluation measures and pruning the label
search space accordingly. However, experiments have revealed that multi-label
heads are unlikely to be learned by existing methods due to the restrictiveness
of these methods. To overcome this limitation, we propose a plug-in approach
that relaxes the search space pruning used by existing methods in order to
introduce a bias towards larger multi-label heads, resulting in more expressive
rules. We further demonstrate the effectiveness of our approach empirically and
show that it comes without drawbacks in terms of training time or predictive
performance.
Comment: Preprint version. To appear in Proceedings of the 22nd International
Conference on Discovery Science, 201
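The abstract states that pruning-based head search rarely yields multi-label heads and that a bias towards larger heads counters this. The following minimal sketch illustrates that effect under stated assumptions. It is not the paper's algorithm. The evaluation measure (label-wise averaged precision over the covered examples) and the hypothetical linear `lift` factor with its `bonus` parameter are illustrative choices. An average over the k best labels can never exceed the best single label, so the unbiased search always returns one label. A positive bonus lets a larger head win.

```python
from typing import List, Sequence, Tuple


def label_precision(covered: Sequence[Sequence[int]], label: int) -> float:
    """Fraction of covered examples in which `label` is relevant (value 1)."""
    return sum(row[label] for row in covered) / len(covered)


def lift(head_size: int, bonus: float) -> float:
    """Hypothetical lifting factor that rewards each label beyond the first."""
    return 1.0 + bonus * (head_size - 1)


def best_head(covered: Sequence[Sequence[int]], num_labels: int,
              bonus: float = 0.0) -> Tuple[List[int], float]:
    """Return the head (label indices) with the highest lifted score.

    Assumes `covered` is non-empty. A head's score is the average precision
    of its labels (an assumed measure). For every head size k, that average
    is maximised by the k individually best labels, and the lift depends only
    on k. So only prefixes of the sorted label list need to be evaluated,
    instead of all 2**num_labels subsets.
    """
    scores = [label_precision(covered, j) for j in range(num_labels)]
    order = sorted(range(num_labels), key=lambda j: scores[j], reverse=True)

    best: List[int] = []
    best_value = float("-inf")
    running_sum = 0.0
    for k, label in enumerate(order, start=1):
        running_sum += scores[label]
        value = (running_sum / k) * lift(k, bonus)
        if value > best_value:  # strict: ties keep the smaller head
            best, best_value = order[:k], value
    return best, best_value
```

For instance, take the covered label vectors `[[1, 1, 0], [1, 1, 0], [1, 0, 1], [1, 1, 0]]`. Here `best_head(covered, 3)` returns the single-label head `[0]`, whereas `best_head(covered, 3, bonus=0.2)` returns the head `[0, 1]`.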
Fast rule-based bioactivity prediction using associative classification mining
Relating chemical features to bioactivities is critical in molecular design and
is used extensively in the lead discovery and optimization process. A variety of
techniques from statistics, data mining and machine learning have been applied
to this process. In this study, we utilize a collection of methods, called
associative classification mining (ACM), which are popular in the data mining
community but so far have not been applied widely in cheminformatics. More
specifically, classification based on predictive association rules (CPAR),
classification based on multiple association rules (CMAR) and classification
based on association rules (CBA) are employed on three datasets using various
descriptor sets. Experimental evaluations on anti-tuberculosis (antiTB),
mutagenicity and hERG (the human Ether-a-go-go-Related Gene) blocker datasets
show that these three methods are computationally scalable and appropriate for
high-speed mining. Additionally, they provide accuracy and efficiency comparable
to the commonly used Bayesian and support vector machine (SVM) methods, and
produce highly interpretable models.
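The general CBA-style idea behind associative classification can be sketched in a few lines. The procedure mines class association rules that meet minimum support and confidence thresholds, ranks them, and predicts with the first matching rule. This is a simplified illustration, not the CBA, CMAR or CPAR implementations used in the study. It uses brute-force itemset enumeration instead of Apriori and skips CBA's database-coverage pruning. It also breaks ties by antecedent length instead of rule generation order. The binary descriptor names in `TOY_DATA` are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import FrozenSet, List, Sequence, Tuple

# A rule is (antecedent items, predicted class, support, confidence).
Rule = Tuple[FrozenSet[str], str, float, float]

# Hypothetical toy molecules: sets of binary descriptors plus an activity class.
TOY_DATA = [
    (frozenset({"nitro", "aromatic"}), "active"),
    (frozenset({"nitro", "aromatic"}), "active"),
    (frozenset({"nitro"}), "active"),
    (frozenset({"aromatic"}), "inactive"),
    (frozenset({"hydroxyl"}), "inactive"),
    (frozenset({"hydroxyl", "aromatic"}), "inactive"),
]


def mine_rules(data: Sequence[Tuple[FrozenSet[str], str]],
               min_support: float, min_confidence: float,
               max_len: int = 2) -> List[Rule]:
    """Enumerate itemsets up to `max_len` and keep rules meeting both thresholds."""
    n = len(data)
    items = sorted(set().union(*(feats for feats, _ in data)))
    classes = sorted({label for _, label in data})
    rules: List[Rule] = []
    for size in range(1, max_len + 1):
        for combo in combinations(items, size):
            antecedent = frozenset(combo)
            matching = [label for feats, label in data if antecedent <= feats]
            if not matching:
                continue
            counts = Counter(matching)
            for cls in classes:
                support = counts[cls] / n
                confidence = counts[cls] / len(matching)
                if support >= min_support and confidence >= min_confidence:
                    rules.append((antecedent, cls, support, confidence))
    # CBA-like precedence: higher confidence, then higher support, then shorter antecedent.
    rules.sort(key=lambda r: (-r[3], -r[2], len(r[0]), sorted(r[0]), r[1]))
    return rules


def predict(rules: Sequence[Rule], default: str, features: FrozenSet[str]) -> str:
    """Return the class of the highest-ranked rule whose antecedent matches."""
    for antecedent, cls, _, _ in rules:
        if antecedent <= features:
            return cls
    return default
```

With `min_support=0.3` and `min_confidence=0.8` on `TOY_DATA`, the ranked rules are `{nitro} -> active`, `{hydroxyl} -> inactive` and `{aromatic, nitro} -> active`. A molecule described only by `aromatic` matches none of them and falls back to the default class. The ranked, first-match rule list is what keeps such models readable.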