43 research outputs found

Comprehensible credit scoring models using rule extraction from support vector machines.

Author: Baesens Bart
Martens David
Van Gestel Tony
Vanthienen Jan

In recent years, Support Vector Machines (SVMs) were successfully applied to a wide range of applications. Their good performance is achieved by an implicit non-linear transformation of the original problem to a high-dimensional (possibly infinite) feature space in which a linear decision hyperplane is constructed that yields a nonlinear classifier in the input space. However, since the classifier is described as a complex mathematical function, it is rather incomprehensible for humans. This opacity property prevents them from being used in many real-life applications where both accuracy and comprehensibility are required, such as medical diagnosis and credit risk evaluation. To overcome this limitation, rules can be extracted from the trained SVM that are interpretable by humans and keep as much of the accuracy of the SVM as possible. In this paper, we will provide an overview of the recently proposed rule extraction techniques for SVMs and introduce two others taken from the artificial neural networks domain, being Trepan and G-REX. The described techniques are compared using publicly available datasets, such as Ripley's synthetic dataset and the multi-class iris dataset. We will also look at medical diagnosis and credit scoring where comprehensibility is a key requirement and even a regulatory recommendation. Our experiments show that the SVM rule extraction techniques lose only a small percentage in performance compared to SVMs and therefore rank at the top of comprehensible classification techniques.

Keywords: Credit; Credit scoring; Models; Model; Applications; Performance; Space; Decision; Yield; Real life; Risk; Evaluation; Rules; Neural networks; Networks; Classification; Research

Research Papers in Economics

Using rule extraction to improve the comprehensibility of predictive models.

Author: Baesens Bart
Huysmans Johan
Vanthienen Jan

Whereas newer machine learning techniques, like artificial neural networks and support vector machines, have shown superior performance in various benchmarking studies, the application of these techniques remains largely restricted to research environments. A more widespread adoption of these techniques is foiled by their lack of explanation capability, which is required in some application areas, like medical diagnosis or credit scoring. To overcome this restriction, various algorithms have been proposed to extract a meaningful description of the underlying 'black box' models. These algorithms' dual goal is to mimic the behavior of the black box as closely as possible while at the same time they have to ensure that the extracted description is maximally comprehensible. In this research report, we first develop a formal definition of rule extraction and comment on the inherent trade-off between accuracy and comprehensibility. Afterwards, we develop a taxonomy by which rule extraction algorithms can be classified and discuss some criteria by which these algorithms can be evaluated. Finally, an in-depth review of the most important algorithms is given. This report is concluded by pointing out some general shortcomings of existing techniques and opportunities for future research.

Keywords: Models; Model; Algorithms; Criteria; Opportunities; Research; Learning; Neural networks; Networks; Performance; Benchmarking; Studies; Area; Credit; Credit scoring; Behavior; Time

Research Papers in Economics

A Comparison Study on Rule Extraction from Neural Network Ensembles, Boosted Shallow Trees, and SVMs

Author: Guido Bologna
Yoichi Hayashi
Publication venue: 'Hindawi Limited'
Publication date: 01/01/2018

Crossref

Accuracy and interpretability trade-offs in machine learning applied to safer gambling

Author: Dragicevic S.
Garcez A.
Percy C.
Sarkar S.
Slabaugh G. G.
Weyde T.
Publication venue: CEUR Workshop Proceedings
Publication date: 26/12/2016

Responsible gambling is an area of research and industry which seeks to understand the pathways to harm from gambling and implement programmes to reduce or prevent harm that gambling might cause. There is a growing body of research that has used gambling behavioural data to model and predict harmful gambling, and the industry is showing increasing interest in technologies that can help gambling operators to better predict harm and prevent it through appropriate interventions. However, industry surveys and feedback clearly indicate that in order to enable wider adoption of such data-driven methods, industry and policy makers require a greater understanding of how machine learning methods make these predictions. In this paper, we make use of the TREPAN algorithm for extracting decision trees from Neural Networks and Random Forests. We present the first comparative evaluation of predictive performance and tree properties for extracted trees, which is also the first comparative evaluation of knowledge extraction for safer gambling. Results indicate that TREPAN extracts better performing trees than direct learning of decision trees from the data. Overall, trees extracted with TREPAN from different models offer a good compromise between prediction accuracy and interpretability. TREPAN can produce decision trees with extended tests rules of different forms, so that interpretability depends on multiple factors. We present detailed results and a discussion of the trade-offs with regard to performance and interpretability and use in the gambling industry.

City Research Online

The Need for Knowledge Extraction: Understanding Harmful Gambling Behavior with Neural Networks

Author: Dragicevic S.
França M. V. M.
Garcez A.
Percy C.
Slabaugh G. G.
Weyde T.
Publication venue: 'IOS Press'
Publication date: 01/01/2016

Responsible gambling is a field of study that involves supporting gamblers so as to reduce the harm that their gambling activity might cause. Recently in the literature, machine learning algorithms have been introduced as a way to predict potentially harmful gambling based on patterns of gambling behavior, such as trends in amounts wagered and the time spent gambling. In this paper, neural network models are analyzed to help predict the outcome of a partial proxy for harmful gambling behavior: when a gambler “self-excludes”, requesting a gambling operator to prevent them from accessing gambling opportunities. Drawing on survey and interview insights from industry and public officials as to the importance of interpretability, a variant of the knowledge extraction algorithm TREPAN is proposed which can produce compact, human-readable logic rules efficiently, given a neural network trained on gambling data. To the best of our knowledge, this paper reports the first industrial-strength application of knowledge extraction from neural networks, which otherwise are black boxes unable to provide the explanatory insights which are crucially required in this area of application. We show that through knowledge extraction one can explore and validate the kinds of behavioral and demographic profiles that best predict self-exclusion, while developing a machine learning approach with greater potential for adoption by industry and treatment providers. Experimental results reported in this paper indicate that the rules extracted can achieve high fidelity to the trained neural network while maintaining competitive accuracy and providing useful insight to domain experts in responsible gambling.

City Research Online

A Survey Of Methods For Explaining Black Box Models

Author: Giannotti Fosca
Guidotti Riccardo
Monreale Anna
Pedreschi Dino
Ruggieri Salvatore
Turini Franco
Publication date: 01/01/2018

In the last years many accurate decision support systems have been constructed as black boxes, that is as systems that hide their internal logic to the user. This lack of explanation constitutes both a practical and an ethical issue. The literature reports many approaches aimed at overcoming this crucial weakness, sometimes at the cost of sacrificing accuracy for interpretability. The applications in which black box decision systems can be used are various, and each approach is typically developed to provide a solution for a specific problem and, as a consequence, delineating explicitly or implicitly its own definition of interpretability and explanation. The aim of this paper is to provide a classification of the main problems addressed in the literature with respect to the notion of explanation and the type of black box system. Given a problem definition, a black box type, and a desired explanation, this survey should help the researcher to find the proposals more useful for his own work. The proposed classification of approaches to open black box models should also be useful for putting the many research open questions in perspective.

Comment: This work is currently under review on an international journal.

arXiv.org e-Print Archive

Archivio istituzionale della Ricerca - Scuola Normale Superiore

Archivio della Ricerca - Università di Pisa

A survey of methods for explaining black box models

Author: Giannotti Fosca
Guidotti Riccardo
Monreale Anna
Pedreschi Dino
Ruggieri Salvatore
Turini Franco
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2018

In recent years, many accurate decision support systems have been constructed as black boxes, that is as systems that hide their internal logic to the user. This lack of explanation constitutes both a practical and an ethical issue. The literature reports many approaches aimed at overcoming this crucial weakness, sometimes at the cost of sacrificing accuracy for interpretability. The applications in which black box decision systems can be used are various, and each approach is typically developed to provide a solution for a specific problem and, as a consequence, it explicitly or implicitly delineates its own definition of interpretability and explanation. The aim of this article is to provide a classification of the main problems addressed in the literature with respect to the notion of explanation and the type of black box system. Given a problem definition, a black box type, and a desired explanation, this survey should help the researcher to find the proposals more useful for his own work. The proposed classification of approaches to open black box models should also be useful for putting the many research open questions in perspective.

Archivio della Ricerca - Università di Pisa

Data mining techniques for protein sequence analysis

Author: Hamby Stephen Edward
Publication date: 09/12/2010

This thesis concerns two areas of bioinformatics related by their role in protein structure and function: protein structure prediction and post-translational modification of proteins. The dihedral angles Ψ and Φ are predicted using support vector regression. For the prediction of Ψ dihedral angles the addition of structural information is examined and the normalisation of Ψ and Φ dihedral angles is examined. An application of the dihedral angles is investigated. The relationship between dihedral angles and three-bond J couplings determined from NMR experiments is described by the Karplus equation. We investigate the determination of the correct solution of the Karplus equation using predicted Φ dihedral angles. Glycosylation is an important post-translational modification of proteins involved in many different facets of biology. The work here investigates the prediction of N-linked and O-linked glycosylation sites using the random forest machine learning algorithm and pairwise patterns in the data. This methodology produces more accurate results when compared to state-of-the-art prediction methods. The black box nature of random forest is addressed by using the Trepan algorithm to generate a decision tree with comprehensible rules that represents the decision-making process of random forest. The prediction of our program GPP does not distinguish between glycans at a given glycosylation site. We use farthest first clustering, with the idea of classifying each glycosylation site by the sugar linking the glycan to protein. This thesis demonstrates the prediction of protein backbone torsion angles and improves the current state of the art for the prediction of glycosylation sites. It also investigates potential applications and the interpretation of these methods.

Nottingham eTheses

Data mining techniques for protein sequence analysis

Author: Hamby Stephen Edward

Nottingham ePrints