Search CORE

1,653 research outputs found

Evolving artificial datasets to improve interpretable classifiers

Author: Mayo Michael
Sun Quan
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2014
Field of study

Differential Evolution can be used to construct effective and compact artificial training datasets for machine learning algorithms. In this paper, a series of comparative experiments are performed in which two simple interpretable supervised classifiers (specifically, Naive Bayes and linear Support Vector Machines) are trained (i) directly on “real” data, as would be the normal case, and (ii) indirectly, using special artificial datasets derived from real data via evolutionary optimization. The results across several challenging test problems show that supervised classifiers trained indirectly using our novel evolution-based approach produce models with superior predictive classification performance. Besides presenting the accuracy of the learned models, we also analyze the sensitivity of our artificial data optimization process to Differential Evolution's parameters, and then we examine the statistical characteristics of the artificial data that is evolved

On explaining machine learning models by evolving crucial and compact features

Author: Alderliesten T. (Tanja)
Bosman P.A.N. (Peter)
Virgolin M. (Marco)
Publication venue: 'Elsevier BV'
Publication date: 10/01/2020
Field of study

Feature construction can substantially improve the accuracy of Machine Learning (ML) algorithms. Genetic Programming (GP) has been proven to be effective at this task by evolving non-linear combinations of input features. GP additionally has the potential to improve ML explainability since explicit expressions are evolved. Yet, in most GP works the complexity of evolved features is not explicitly bound or minimized though this is arguably key for explainability. In this article, we assess to what extent GP still performs favorably at feature construction when constructing features that are (1) Of small-enough number, to enable visualization of the behavior of the ML model; (2) Of small-enough size, to enable interpretability of the features themselves; (3) Of sufficient informative power, to retain or even improve the performance of the ML algorithm. We consider a simple feature construction scheme using three different GP algorithms, as well as random search, to evolve features for five ML algorithms, including support vector machines and random forest. Our results on 21 datasets pertaining to classification and regression problems show that constructing only two compact features can be sufficient to rival the use of the entire original feature set. We further find that a modern GP algorithm, GP-GOMEA, performs best overall. These results, combined with examples that we provide of readable constructed features and of 2D visualizations of ML behavior, lead us to positively conclude that GP-based feature construction still works well when explicitly searching for compact features, making it extremely helpful to explain ML models

arXiv.org e-Print Archive

A survey on utilization of data mining approaches for dermatological (skin) diseases prediction

Author: Adibi N
Ahmadzadeh MR
Barati E
Mohammadi A
Saraee MH
Publication venue: Cyber Journals
Publication date: 01/03/2011
Field of study

Due to recent technology advances, large volumes of medical data is obtained. These data contain valuable information. Therefore data mining techniques can be used to extract useful patterns. This paper is intended to introduce data mining and its various techniques and a survey of the available literature on medical data mining. We emphasize mainly on the application of data mining on skin diseases. A categorization has been provided based on the different data mining techniques. The utility of the various data mining methodologies is highlighted. Generally association mining is suitable for extracting rules. It has been used especially in cancer diagnosis. Classification is a robust method in medical mining. In this paper, we have summarized the different uses of classification in dermatology. It is one of the most important methods for diagnosis of erythemato-squamous diseases. There are different methods like Neural Networks, Genetic Algorithms and fuzzy classifiaction in this topic. Clustering is a useful method in medical images mining. The purpose of clustering techniques is to find a structure for the given data by finding similarities between data according to data characteristics. Clustering has some applications in dermatology. Besides introducing different mining methods, we have investigated some challenges which exist in mining skin data

University of Salford Institutional Repository