Opening black box data mining models using sensitivity analysis
There are several supervised learning Data Mining (DM) methods, such as Neural Networks (NN), Support Vector Machines (SVM) and ensembles, that often attain high-quality predictions, although the obtained models are difficult for humans to interpret. In this paper, we open these black box DM models by using a novel visualization approach that is based on a Sensitivity Analysis (SA) method. In particular, we propose a Global SA (GSA), which extends the applicability of previous SA methods (e.g. to classification tasks), and several visualization techniques (e.g. the variable effect characteristic curve) for assessing input relevance and effects on the model's responses. We show the GSA capabilities by conducting several experiments, using an NN ensemble and an SVM model, on both synthetic and real-world datasets.
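The one-at-a-time core of such a sensitivity analysis can be sketched as follows (a simplified illustration, not the paper's full GSA: `vec_curve` and `importance` are hypothetical names, inputs are held at their means while one is varied, and the range of responses is used as one possible importance measure):

```python
import numpy as np

def vec_curve(model, X, feature, levels=7):
    """One-at-a-time sensitivity sketch: vary one input over `levels`
    equally spaced values while holding the others at their means,
    and record the model's response at each level (a VEC-style curve)."""
    baseline = X.mean(axis=0)
    grid = np.linspace(X[:, feature].min(), X[:, feature].max(), levels)
    responses = []
    for v in grid:
        x = baseline.copy()
        x[feature] = v
        responses.append(model(x.reshape(1, -1))[0])
    return grid, np.asarray(responses)

def importance(model, X, feature, levels=7):
    """Input relevance measured as the range of responses along the curve."""
    _, r = vec_curve(model, X, feature, levels)
    return float(r.max() - r.min())
```

Plotting the returned grid against the responses gives a curve of the input's effect on the model output; comparing `importance` values across inputs ranks their relevance.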
A survey of methods for explaining black box models
In recent years, many accurate decision support systems have been constructed as black boxes, that is, as systems that hide their internal logic from the user. This lack of explanation constitutes both a practical and an ethical issue. The literature reports many approaches aimed at overcoming this crucial weakness, sometimes at the cost of sacrificing accuracy for interpretability. The applications in which black box decision systems can be used are various, and each approach is typically developed to provide a solution for a specific problem; as a consequence, it explicitly or implicitly delineates its own definition of interpretability and explanation. The aim of this article is to provide a classification of the main problems addressed in the literature with respect to the notion of explanation and the type of black box system. Given a problem definition, a black box type, and a desired explanation, this survey should help researchers find the proposals most useful for their own work. The proposed classification of approaches to opening black box models should also be useful for putting the many open research questions in perspective.
Temporally-aware algorithms for the classification of anuran sounds
Several authors have shown that the sounds of anurans can be used as an indicator of
climate change. Hence, the recording, storage and further processing of a huge
number of anuran sounds, distributed over time and space, are required in order to
obtain this indicator. Furthermore, it is desirable to have algorithms and tools for
the automatic classification of the different classes of sounds. In this paper, six
classification methods are proposed, all based on the data-mining domain, which
strive to take advantage of the temporal character of the sounds. The definition and
comparison of these classification methods is undertaken using several approaches.
The main conclusions of this paper are that: (i) the sliding window method attained
the best results in the experiments presented, and even outperformed the hidden
Markov models usually employed in similar applications; (ii) noteworthy overall
classification performance has been obtained, which is an especially striking result
considering that the sounds analysed were affected by a highly noisy background;
(iii) the instance selection for the determination of the sounds in the training dataset
offers better results than cross-validation techniques; and (iv) the temporally-aware
classifiers can outperform their non-temporally-aware counterparts.
Consejería de Innovación, Ciencia y Empresa (Junta de Andalucía, Spain): excellence eSAPIENS number TIC 570
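The sliding-window idea can be sketched as follows (a simplified illustration under assumed interfaces, not the paper's exact methods: `frame_clf` stands in for any per-window classifier, and the whole sound is labeled by majority vote over its windows):

```python
import numpy as np
from collections import Counter

def sliding_window_classify(frames, frame_clf, width=2, step=1):
    """Sliding-window sketch: classify each window of `width` consecutive
    feature frames of a sound, then label the whole sound by majority
    vote over the per-window predictions."""
    votes = []
    for start in range(0, len(frames) - width + 1, step):
        window = np.concatenate(frames[start:start + width])
        votes.append(frame_clf(window))
    return Counter(votes).most_common(1)[0][0]
```

Voting over overlapping windows is one simple way to exploit the temporal character of the sounds: a few noisy frames are outvoted by the rest of the sequence.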
Inverse Classification for Comparison-based Interpretability in Machine Learning
In the context of post-hoc interpretability, this paper addresses the task of
explaining the prediction of a classifier, considering the case where no
information is available, neither on the classifier itself, nor on the
processed data (neither the training nor the test data). It proposes an
instance-based approach whose principle consists in determining the minimal
changes needed to alter a prediction: given a data point whose classification
must be explained, the proposed method consists in identifying a close
neighbour classified differently, where the closeness definition integrates a
sparsity constraint. This principle is implemented using observation generation
in the Growing Spheres algorithm. Experimental results on two datasets
illustrate the relevance of the proposed approach, which can be used to gain
knowledge about the classifier.
Comment: preprint
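A minimal sketch of the sphere-growing search (simplified: the published algorithm also includes a projection step that sparsifies the changes, omitted here; all names and parameters are illustrative):

```python
import numpy as np

def growing_spheres(x, predict, radius_step=0.1, n_samples=200,
                    max_radius=10.0, seed=0):
    """Growing Spheres sketch: sample candidate observations in spherical
    shells of increasing radius around x, and return the closest candidate
    that the black box classifier labels differently from x."""
    rng = np.random.default_rng(seed)
    target = predict(x)
    radius = radius_step
    while radius <= max_radius:
        # random directions scaled into the current shell
        d = rng.normal(size=(n_samples, x.size))
        d /= np.linalg.norm(d, axis=1, keepdims=True)
        candidates = x + d * rng.uniform(radius - radius_step, radius,
                                         (n_samples, 1))
        labels = np.array([predict(c) for c in candidates])
        enemies = candidates[labels != target]
        if len(enemies):
            dists = np.linalg.norm(enemies - x, axis=1)
            return enemies[np.argmin(dists)]
        radius += radius_step
    return None
```

Because only `predict` is called, the search needs no access to the classifier's internals or to its training data, matching the post-hoc setting described above.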
Using sensitivity analysis and visualization techniques to open black box data mining models
In this paper, we propose a new visualization approach based on a Sensitivity Analysis (SA) to extract human-understandable knowledge from supervised learning black box data mining models, such as Neural Networks (NN), Support Vector Machines (SVM) and ensembles, including Random Forests (RF). Five SA methods (three of which are purely new) and four measures of input importance (one novel) are presented. Also, the SA approach is adapted to handle discrete variables and to aggregate multiple sensitivity responses. Moreover, several visualizations for the SA results are introduced, such as the input pair importance color matrix and the variable effect characteristic surface. A wide range of experiments was performed in order to test the SA methods and measures by fitting four well-known models (NN, SVM, RF and decision trees) to synthetic datasets (five regression and five classification tasks). In addition, the visualization capabilities of the SA are demonstrated using four real-world datasets (e.g., bank direct marketing and white wine quality).
The work of P. Cortez was funded by FEDER, through the program COMPETE and the Portuguese Foundation for Science and Technology (FCT), within the project FCOMP-01-0124-FEDER-022674. Also, the authors wish to thank the anonymous reviewers for their helpful comments.
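For a discrete input, the sensitivity responses can be aggregated over its observed categories rather than a numeric grid; a minimal sketch (illustrative names; variance is used here as one possible sensitivity measure, not necessarily one of the paper's four):

```python
import numpy as np

def discrete_sensitivity(model, X, feature):
    """Sketch for a discrete input: hold the other inputs at their means,
    evaluate the model at each observed category of `feature`, and
    aggregate the responses into a single sensitivity score (variance)."""
    baseline = X.mean(axis=0)
    responses = []
    for level in np.unique(X[:, feature]):
        x = baseline.copy()
        x[feature] = level
        responses.append(model(x.reshape(1, -1))[0])
    return float(np.var(responses))
```

A score of zero means the model's output does not change across the categories of that input, i.e. the input is irrelevant under this measure.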