83 research outputs found

    TE2Rules: Extracting Rule Lists from Tree Ensembles

    Tree Ensemble (TE) models (e.g. Gradient Boosted Trees and Random Forests) often provide higher prediction performance than single decision trees. However, TE models generally lack transparency and interpretability, as humans have difficulty understanding their decision logic. This paper presents a novel approach to convert a TE trained for a binary classification task into a rule list (RL) that is globally equivalent to the TE and comprehensible to a human. This RL captures all necessary and sufficient conditions for decision making by the TE. Experiments on benchmark datasets demonstrate that, compared to state-of-the-art methods, (i) predictions from the RL generated by TE2Rules have high fidelity with respect to the original TE, (ii) the RL from TE2Rules has high interpretability as measured by the number and length of the decision rules, (iii) the run-time of the TE2Rules algorithm can be reduced significantly at the cost of slightly lower fidelity, and (iv) the RL is a fast alternative to state-of-the-art rule-based instance-level outcome explanation techniques.
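    The starting point such an approach builds on can be sketched directly against a trained ensemble. The snippet below is a minimal illustration, not the TE2Rules algorithm itself: it only enumerates the root-to-leaf paths of each tree as candidate if-then rules (TE2Rules additionally combines and filters these to reach a faithful rule list), and the dataset and model settings are illustrative choices.

```python
# Sketch: enumerate root-to-leaf paths of a trained scikit-learn forest as
# candidate rules. This is the raw material a TE-to-rule-list conversion
# starts from, NOT the TE2Rules algorithm itself.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
forest = RandomForestClassifier(n_estimators=5, max_depth=3,
                                random_state=0).fit(X, y)

def tree_to_rules(tree):
    """Return every root-to-leaf path as a list of (feature, op, threshold)."""
    t = tree.tree_
    rules = []
    def walk(node, conds):
        if t.children_left[node] == -1:              # -1 marks a leaf
            rules.append(conds)
            return
        f, thr = t.feature[node], t.threshold[node]
        walk(t.children_left[node],  conds + [(f, "<=", thr)])
        walk(t.children_right[node], conds + [(f, ">",  thr)])
    walk(0, [])
    return rules

candidate_rules = [r for est in forest.estimators_ for r in tree_to_rules(est)]
print(len(candidate_rules), "candidate rules")
```

Each rule is a conjunction of feature tests; the hard part, which the paper addresses, is pruning and merging this (potentially large) candidate set while keeping fidelity to the whole ensemble, not just to single trees.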

    A comparison among interpretative proposals for Random Forests

    The growing success of Machine Learning (ML) is driving significant improvements in predictive models, facilitating their integration in various application fields. Despite this growing success, there are limitations and disadvantages: the most significant is the lack of interpretability, which does not allow users to understand how particular decisions are made. Our study focuses on one of the best performing and most used models in the Machine Learning framework, the Random Forest model. It is known as an efficient ensemble learning model, as it ensures high predictive precision, flexibility, and immediacy; it is recognized as an intuitive and understandable approach to the construction process, but it is also considered a black-box model due to the large number of deep decision trees produced within it. The aim of this research is twofold. We present a survey of interpretative proposals for Random Forests, and then we perform a machine learning experiment comparing two methodologies, inTrees and NodeHarvest, which represent the main approaches in the rule extraction framework. The experiment compares the methods' performance on six real datasets covering different data characteristics: number of observations, balanced/unbalanced response, and the presence of categorical and numerical predictors. This study contributes a review of the methods and tools proposed for ensemble tree interpretation and identifies, within the class of rule extraction approaches, the best proposal.
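    inTrees and NodeHarvest are R packages, but the criterion such comparisons rest on, fidelity, is easy to sketch in Python: train a simple surrogate to mimic the forest's own outputs and measure how often it agrees with them. The shallow tree below is only a stand-in for the rule sets those packages would produce, and the dataset is synthetic.

```python
# Sketch of the fidelity measure used when comparing rule-extraction
# methods: how often does a simple surrogate reproduce the forest's
# predictions? A depth-3 tree stands in for an extracted rule set.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=8, random_state=1)
forest = RandomForestClassifier(n_estimators=100, random_state=1).fit(X, y)

# Train the surrogate on the forest's outputs, not the true labels:
# fidelity asks "does the simple model mimic the black box?"
y_forest = forest.predict(X)
surrogate = DecisionTreeClassifier(max_depth=3, random_state=1).fit(X, y_forest)
fidelity = (surrogate.predict(X) == y_forest).mean()
print(f"fidelity: {fidelity:.3f}")
```

A full comparison of the kind the paper runs would repeat this across datasets and trade fidelity off against rule count and rule length, i.e. against interpretability.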

    ASSOCIATION RULES IN RANDOM FOREST FOR THE MOST INTERPRETABLE MODEL

    Random forest is one of the most popular ensemble methods and has many advantages. However, random forest is a "black-box" model, so it is difficult to interpret. This study discusses the interpretation of random forests with the association-rule technique, using rules extracted from each decision tree in the random forest model. The analysis involves simulated and empirical data to determine the factors that affect the poverty status of households in Tasikmalaya. The empirical data were sourced from Badan Pusat Statistik (BPS), the National Socio-Economic Survey (SUSENAS) data for West Java Province in 2019. On the simulated data, the association-rule technique can extract the set of rules that characterize the target variable. The application of the interpretable random forest to the empirical data shows that the rules that most distinguish the poverty status of households in Tasikmalaya pair house wall materials with the main source of drinking water, with cooking fuel, and with motorcycle ownership.

    Explaining Random Forest Predictions with Association Rules

    Random forests frequently achieve state-of-the-art predictive performance. However, the logic behind their predictions cannot be easily understood, since they are the result of averaging often hundreds or thousands of possibly conflicting individual predictions. Instead of presenting all the individual predictions, an alternative is proposed: the predictions are explained using association rules generated from itemsets representing paths in the trees of the forest. An empirical investigation is presented, in which alternative ways of generating the association rules are compared with respect to explainability, as measured by the fraction of predictions for which there is no applicable rule and by the fraction of predictions for which there is at least one applicable rule that conflicts with the forest prediction. For the considered datasets, most predictions can be explained by the discovered association rules, which have a high level of agreement with the underlying forest. The results do not single out a clear winner among the considered alternatives in terms of unexplained and disagreement rates, but show that the alternatives differ substantially in computational cost.
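    The two explainability metrics named above are straightforward to compute once a rule set exists. The sketch below uses a hard-coded stand-in for the forest's output and two toy rules, so only the bookkeeping, not the rule mining, reflects the paper's method.

```python
# Sketch of the two metrics from the abstract: the unexplained rate
# (no rule applies to a prediction) and the disagreement rate (only
# conflicting rules apply). Forest output and rules are toy stand-ins.
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((100, 2))
forest_pred = (X[:, 0] > 0.5).astype(int)   # stand-in for forest output

# Each rule: (feature index, threshold, comparison, predicted class)
rules = [(0, 0.6, ">", 1), (0, 0.4, "<=", 0)]

def applicable(rule, x):
    f, thr, op, _ = rule
    return x[f] > thr if op == ">" else x[f] <= thr

unexplained = disagree = 0
for x, pred in zip(X, forest_pred):
    hits = [r for r in rules if applicable(r, x)]
    if not hits:
        unexplained += 1
    elif all(r[3] != pred for r in hits):
        disagree += 1
print(unexplained / len(X), disagree / len(X))
```

Here the instances falling between the two thresholds are covered by no rule, illustrating how a compact rule set trades coverage for simplicity, exactly the tension the paper's comparison measures.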

    Transparent computational intelligence models for pharmaceutical tableting process

    Purpose: The pharmaceutical industry is tightly regulated owing to health concerns. Over the years, the use of computational intelligence (CI) tools has increased in pharmaceutical research and development, manufacturing, and quality control. Quality characteristics of tablets, such as tensile strength, are important indicators of expected tablet performance. Predictive yet transparent CI models are needed, which can be analysed for insights into the formulation and development process. Methods: This work uses data from a galenical tableting study and computational intelligence methods (decision trees, random forests, fuzzy systems, artificial neural networks, and symbolic regression) to establish models for the outcome of tensile strength. Data were divided into training and test folds according to a ten-fold cross-validation scheme, and RMSE was used as the evaluation metric. Tree-based ensembles and symbolic regression methods are presented as transparent models, with extracted rules and a mathematical formula, respectively, explaining the CI models in greater detail. Results: CI models for the tensile strength of tablets based on the formulation design and process parameters have been established. The best models exhibit a normalized RMSE of 7 %. Rules from fuzzy systems and random forests are shown to increase the transparency of CI models. A mathematical formula generated by symbolic regression is presented as a transparent model. Conclusions: CI models explain the variation of tensile strength according to formulation and manufacturing process characteristics, and can be further analyzed to extract actionable knowledge, making the artificial learning process more transparent and acceptable for use in pharmaceutical quality and safety domains.
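    The evaluation protocol described in the Methods section, ten-fold cross-validation scored by RMSE and then normalized, can be sketched as follows. The synthetic regression data stands in for the (unavailable) galenical tableting dataset, and the model choice is illustrative.

```python
# Sketch of the paper's evaluation protocol: ten-fold cross-validation
# with RMSE on a random-forest regressor, then normalized by the target
# range. Synthetic data replaces the proprietary tableting dataset.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=200, n_features=6, noise=5.0, random_state=0)
model = RandomForestRegressor(n_estimators=100, random_state=0)

# scikit-learn reports errors as negated scores; flip the sign back.
rmse = -cross_val_score(model, X, y, cv=10,
                        scoring="neg_root_mean_squared_error")
nrmse = rmse.mean() / (y.max() - y.min())    # normalized RMSE, as a fraction
print(f"mean RMSE {rmse.mean():.2f}, normalized {100 * nrmse:.1f} %")
```

Normalizing by the observed target range is one common convention for "normalized RMSE"; the paper's 7 % figure may use a different normalizer (e.g. the mean), which the abstract does not specify.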

    A Survey Of Methods For Explaining Black Box Models

    In recent years many accurate decision support systems have been constructed as black boxes, that is, as systems that hide their internal logic from the user. This lack of explanation constitutes both a practical and an ethical issue. The literature reports many approaches aimed at overcoming this crucial weakness, sometimes at the cost of sacrificing accuracy for interpretability. The applications in which black box decision systems can be used are various, and each approach is typically developed to provide a solution for a specific problem, thereby delineating, explicitly or implicitly, its own definition of interpretability and explanation. The aim of this paper is to provide a classification of the main problems addressed in the literature with respect to the notion of explanation and the type of black box system. Given a problem definition, a black box type, and a desired explanation, this survey should help researchers find the proposals most useful for their own work. The proposed classification of approaches to opening black box models should also be useful for putting the many open research questions in perspective. Comment: This work is currently under review at an international journal.