TE2Rules: Extracting Rule Lists from Tree Ensembles
Tree Ensemble (TE) models (e.g. Gradient Boosted Trees and Random Forests)
often provide higher prediction performance compared to single decision trees.
However, TE models generally lack transparency and interpretability, as humans
have difficulty understanding their decision logic. This paper presents a novel
approach to convert a TE trained for a binary classification task into a rule
list (RL) that is globally equivalent to the TE and comprehensible to a
human. This RL captures all necessary and sufficient conditions for decision
making by the TE. Experiments on benchmark datasets demonstrate that, compared
to state-of-the-art methods, (i) predictions from the RL generated by TE2Rules
have high fidelity with respect to the original TE, (ii) the RL from TE2Rules
has high interpretability measured by the number and the length of the decision
rules, (iii) the run-time of the TE2Rules algorithm can be reduced significantly at
the cost of slightly lower fidelity, and (iv) the RL is a fast alternative to
the state-of-the-art rule-based instance-level outcome explanation techniques.
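The fidelity criterion in point (i) can be illustrated with a small sketch. This is not the TE2Rules algorithm: the three hand-written trees, the feature names `age` and `income`, the thresholds, and the rule encoding are all hypothetical, chosen only to show how the agreement between a rule list and an ensemble might be measured.

```python
from itertools import product

# Hypothetical toy ensemble: three hand-written trees over two numeric features.
def tree_a(x):
    return 1 if x["age"] > 30 else 0

def tree_b(x):
    return 1 if x["income"] > 50 else 0

def tree_c(x):
    return 1 if x["age"] > 30 and x["income"] > 20 else 0

def ensemble_predict(x):
    """Majority vote over the three trees."""
    return 1 if tree_a(x) + tree_b(x) + tree_c(x) >= 2 else 0

def rule_fires(rule, x):
    """A rule is a conjunction of (feature, threshold) terms meaning feature > threshold."""
    return all(x[feature] > threshold for feature, threshold in rule)

def rule_list_predict(rules, x):
    """Predict the positive class if any rule in the list fires."""
    return 1 if any(rule_fires(rule, x) for rule in rules) else 0

def fidelity(rules, samples):
    """Fraction of samples on which the rule list agrees with the ensemble."""
    agree = sum(rule_list_predict(rules, x) == ensemble_predict(x) for x in samples)
    return agree / len(samples)

# Assumed evaluation grid covering both sides of every threshold.
samples = [{"age": a, "income": i} for a, i in product([20, 40], [10, 30, 60])]

full_rules = [[("age", 30), ("income", 20)]]   # matches the ensemble on this grid
short_rules = [[("age", 30)]]                  # shorter rule, lower fidelity

print(fidelity(full_rules, samples))
print(fidelity(short_rules, samples))
```

On this toy grid the shorter rule list trades some fidelity for brevity, which mirrors the interpretability and fidelity trade-off the abstract describes.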
A comparison among interpretative proposals for Random Forests
The growing success of Machine Learning (ML) is driving significant improvements in predictive models and facilitating their integration in various application fields. Despite this success, some limitations and disadvantages remain: the most significant is the lack of interpretability, which prevents users from understanding how particular decisions are made. Our study focuses on one of the best-performing and most widely used models in the Machine Learning framework, the Random Forest model. It is known as an efficient ensemble learning model, as it ensures high predictive precision, flexibility, and immediacy; its construction process is recognized as intuitive and understandable, but it is also considered a Black Box model due to the large number of deep decision trees produced within it.
The aim of this research is twofold. We present a survey of interpretative proposals for Random Forest, and then we perform a machine learning experiment comparing two methodologies, inTrees and NodeHarvest, that represent the main approaches in the rule extraction framework. The proposed experiment compares the methods' performance on six real datasets covering different data characteristics: number of observations, balanced/unbalanced response, and the presence of categorical and numerical predictors. This study contributes a review of the methods and tools proposed for ensemble tree interpretation and identifies, in the class of rule extraction approaches, the best proposal.
ASSOCIATION RULES IN RANDOM FOREST FOR THE MOST INTERPRETABLE MODEL
Random forest is one of the most popular ensemble methods and has many advantages. However, random forest is a "black-box" model, so it is difficult to interpret. This study discusses the interpretation of random forest with the association rules technique, using rules extracted from each decision tree in the random forest model. The analysis involves simulation and empirical data and aims to determine the factors that affect the poverty status of households in Tasikmalaya. The empirical data were sourced from Badan Pusat Statistik (BPS), specifically the National Socio-Economic Survey (SUSENAS) data for West Java Province in 2019. The results on simulation data show that the association rules technique can extract the set of rules that characterize the target variable. The application of interpretable random forest to empirical data shows that the rules that most distinguish the poverty status of households in Tasikmalaya are house wall materials and the main source of drinking water, house wall materials and cooking fuel, and house wall materials and motorcycle ownership.
Explaining Random Forest Predictions with Association Rules
Random forests frequently achieve state-of-the-art predictive performance. However, the logic behind their predictions cannot be easily understood, since they are the result of averaging often hundreds or thousands of, possibly conflicting, individual predictions. Instead of presenting all the individual predictions, an alternative is proposed, by which the predictions are explained using association rules generated from itemsets representing paths in the trees of the forest. An empirical investigation is presented, in which alternative ways of generating the association rules are compared with respect to explainability, as measured by the fraction of predictions for which there is no applicable rule and by the fraction of predictions for which there is at least one applicable rule that conflicts with the forest prediction. For the considered datasets, it can be seen that most predictions can be explained by the discovered association rules, which have a high level of agreement with the underlying forest. The results do not single out a clear winner of the considered alternatives in terms of unexplained and disagreement rates, but show that they are associated with substantial differences in computational cost.
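The two explainability measures named in this abstract, the unexplained rate and the disagreement rate, can be sketched as follows. The item strings, the rules, and the forest labels are hypothetical illustrations, and the representation of a rule as an (antecedent itemset, label) pair is an assumption rather than the paper's own data structure.

```python
def explanation_rates(instances, rules):
    """Return (unexplained_rate, disagreement_rate).

    instances: list of (set of items describing the instance, forest prediction)
    rules:     list of (antecedent itemset, predicted label)
    A rule applies to an instance when its antecedent is a subset of the items.
    """
    unexplained = 0
    conflicting = 0
    for items, forest_label in instances:
        applicable = [label for antecedent, label in rules if antecedent <= items]
        if not applicable:
            unexplained += 1   # no rule covers this prediction
        elif any(label != forest_label for label in applicable):
            conflicting += 1   # at least one applicable rule contradicts the forest
    n = len(instances)
    return unexplained / n, conflicting / n

# Hypothetical rules mined from tree paths.
rules = [
    (frozenset({"x1<=5"}), "A"),
    (frozenset({"x1>5", "x2>3"}), "B"),
]

# Hypothetical instances with the forest's predicted label.
instances = [
    ({"x1<=5", "x2<=3"}, "A"),  # covered, agrees
    ({"x1>5", "x2>3"}, "B"),    # covered, agrees
    ({"x1>5", "x2<=3"}, "A"),   # no applicable rule
    ({"x1<=5", "x2>3"}, "B"),   # covered, but the rule says "A"
]

unexplained_rate, disagreement_rate = explanation_rates(instances, rules)
print(unexplained_rate, disagreement_rate)
```

Counting the two cases separately keeps the measures disjoint: an instance is either uncovered, or covered with a possible conflict.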
Transparent computational intelligence models for pharmaceutical tableting process
Purpose: The pharmaceutical industry is tightly regulated owing to health concerns. Over the years, the use of computational intelligence (CI) tools has increased in pharmaceutical research and development, manufacturing, and quality control. Quality characteristics of tablets, such as tensile strength, are important indicators of expected tablet performance. Predictive yet transparent CI models that can be analysed for insights into the formulation and development process are therefore of interest. Methods: This work uses data from a galenical tableting study and computational intelligence methods such as decision trees, random forests, fuzzy systems, artificial neural networks, and symbolic regression to establish models for the outcome of tensile strength. Data were divided into training and test folds according to a ten-fold cross-validation scheme, and RMSE was used as the evaluation metric. Tree-based ensembles and symbolic regression methods are presented as transparent models with extracted rules and a mathematical formula, respectively, explaining the CI models in greater detail. Results: CI models for the tensile strength of tablets based on the formulation design and process parameters have been established. The best models exhibit a normalized RMSE of 7 %. Rules from fuzzy systems and random forests are shown to increase the transparency of CI models. A mathematical formula generated by symbolic regression is presented as a transparent model. Conclusions: CI models explain the variation of tensile strength according to formulation and manufacturing process characteristics. CI models can be further analyzed to extract actionable knowledge, making the artificial learning process more transparent and acceptable for use in pharmaceutical quality and safety domains.
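The evaluation scheme described under Methods, ten-fold cross-validation scored by RMSE, might be sketched as below. Several things here are assumptions: the abstract does not say how RMSE was normalized, so division by the observed range is used; the one-variable line fit is a stand-in for the paper's CI models; and the data are synthetic.

```python
import math

def nrmse(y_true, y_pred):
    """RMSE divided by the range of the observed values (assumed normalization)."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse) / (max(y_true) - min(y_true))

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return slope, mean_y - slope * mean_x

def cross_validated_nrmse(xs, ys, k=10):
    """Collect out-of-fold predictions over k folds, then score them once."""
    predictions = [0.0] * len(xs)
    for fold in range(k):
        test_idx = set(range(fold, len(xs), k))  # every k-th sample goes to this fold
        train = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i not in test_idx]
        slope, intercept = fit_line([x for x, _ in train], [y for _, y in train])
        for i in test_idx:
            predictions[i] = slope * xs[i] + intercept
    return nrmse(ys, predictions)

# Synthetic, noise-free data, so the held-out error should be close to zero.
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]
print(cross_validated_nrmse(xs, ys, k=10))
```

Scoring the pooled out-of-fold predictions gives a single figure comparable in form to the reported normalized RMSE, although the paper may have averaged per-fold scores instead.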
A Survey Of Methods For Explaining Black Box Models
In recent years, many accurate decision support systems have been
constructed as black boxes, that is, as systems that hide their internal logic
from the user. This lack of explanation constitutes both a practical and an
ethical issue. The literature reports many approaches aimed at overcoming this
crucial weakness, sometimes at the cost of sacrificing accuracy for
interpretability. The applications in which black box decision systems can be
used are various, and each approach is typically developed to provide a
solution for a specific problem and, as a consequence, delineates explicitly
or implicitly its own definition of interpretability and explanation. The aim
of this paper is to provide a classification of the main problems addressed in
the literature with respect to the notion of explanation and the type of black
box system. Given a problem definition, a black box type, and a desired
explanation, this survey should help the researcher to find the proposals most
useful for their own work. The proposed classification of approaches to open
black box models should also be useful for putting the many open research
questions in perspective.
Comment: This work is currently under review on an international journa