39,181 research outputs found

    Enhancing Decision Tree based Interpretation of Deep Neural Networks through L1-Orthogonal Regularization

    Full text link
    One obstacle that has so far prevented the adoption of machine learning models, particularly in critical areas, is the lack of explainability. In this work, a practicable approach to explaining deep artificial neural networks (NNs) using an interpretable surrogate model based on decision trees is presented. Simply fitting a decision tree to a trained NN usually leads to unsatisfactory results in terms of accuracy and fidelity. Using L1-orthogonal regularization during training, however, preserves the accuracy of the NN while allowing it to be closely approximated by small decision trees. Tests on different data sets confirm that L1-orthogonal regularization yields models of lower complexity and, at the same time, higher fidelity compared to other regularizers.
    Comment: 8 pages, 18th IEEE International Conference on Machine Learning and Applications (ICMLA) 2019
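    Illustrative sketch (not from the paper): one way to realise an L1-orthogonality penalty on the weight matrices of a feed-forward network in PyTorch, added to the task loss during training. The penalty form ||W^T W - I||_1 and the weighting factor beta are assumptions made for illustration; the paper's exact formulation may differ.

    import torch
    import torch.nn as nn

    def l1_orthogonal_penalty(model):
        # Sum of L1 norms of (W^T W - I) over all linear layers; this specific
        # form is an assumption used here for illustration.
        terms = []
        for module in model.modules():
            if isinstance(module, nn.Linear):
                w = module.weight                                  # shape (out, in)
                gram = w.t() @ w                                   # (in, in)
                eye = torch.eye(gram.shape[0], device=w.device, dtype=w.dtype)
                terms.append((gram - eye).abs().sum())
        return torch.stack(terms).sum()

    model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
    criterion = nn.CrossEntropyLoss()
    beta = 1e-3                                                    # assumed regularization strength

    def training_loss(x, y):
        # Task loss plus the orthogonality penalty; the surrogate decision tree
        # would be fitted to the trained network's predictions afterwards.
        return criterion(model(x), y) + beta * l1_orthogonal_penalty(model)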

    Fisher’s decision tree

    Get PDF
    Univariate decision trees are classifiers currently used in many data mining applications. They discover partitions of the input space via hyperplanes that are orthogonal to the attribute axes, producing models that can be understood by human experts. One disadvantage of univariate decision trees is that they produce complex and inaccurate models when decision boundaries are not orthogonal to the axes. In this paper we introduce the Fisher's Tree, a classifier that combines the dimensionality reduction of Fisher's linear discriminant with the recursive decomposition strategy of decision trees to build an oblique decision tree. Our proposal generates an artificial attribute that is used to split the data recursively. The Fisher's decision tree induces oblique trees whose accuracy, size, number of leaves and training time are competitive with other decision trees reported in the literature. We use more than ten publicly available data sets to demonstrate the effectiveness of our method.
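    A minimal sketch of the core idea, assuming integer class labels and a median split on the projected artificial attribute; the authors' actual split criterion and stopping rules may differ.

    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    class FisherNode:
        def __init__(self, depth=0, max_depth=3, min_samples=10):
            self.depth, self.max_depth, self.min_samples = depth, max_depth, min_samples
            self.lda = None
            self.threshold = None
            self.left = self.right = None
            self.label = None

        def fit(self, X, y):
            # Stop and create a majority-class leaf when the node is pure or small.
            if self.depth >= self.max_depth or len(y) < self.min_samples or len(np.unique(y)) == 1:
                self.label = np.bincount(y).argmax()
                return self
            # The "artificial attribute": a 1-D Fisher discriminant projection.
            self.lda = LinearDiscriminantAnalysis(n_components=1).fit(X, y)
            z = self.lda.transform(X).ravel()
            self.threshold = np.median(z)            # assumed split rule for illustration
            mask = z <= self.threshold
            if mask.all() or (~mask).all():
                self.label = np.bincount(y).argmax()
                return self
            self.left = FisherNode(self.depth + 1, self.max_depth, self.min_samples).fit(X[mask], y[mask])
            self.right = FisherNode(self.depth + 1, self.max_depth, self.min_samples).fit(X[~mask], y[~mask])
            return self

        def predict_one(self, x):
            if self.label is not None:
                return self.label
            z = self.lda.transform(x.reshape(1, -1)).ravel()[0]
            return (self.left if z <= self.threshold else self.right).predict_one(x)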

    The Route towards the ultimate network topology

    Get PDF
    In this talk I will try to summarize our quest for a realizable network topology that optimizes performance, cost, power consumption and partitionability. We have explored Fat Trees, Dragonflies, variations of Dragonflies, Orthogonal Fat Trees, multi-layer HyperX topologies, multi-layer Full Meshes and close-to-Moore's-(graph)-bound topologies, in an attempt to decide which topology to choose, given the best routing we could find, a reasonable task placement, and a collection of synthetic and real-world workloads. Whereas a final decision for a single 'ultimate' topology remains elusive, the route towards it took us down unexpected paths that led to the discovery of new insights into topology design and properties and into the design of routing schemes.

    Dynamic Choice Under Ambiguity

    Get PDF
    This paper analyzes sophisticated dynamic choice for ambiguity-sensitive decision makers. It characterizes Consistent Planning via axioms on preferences over decision trees. Furthermore, it shows how to elicit conditional preferences from prior preferences. The key axiom is a weakening of Dynamic Consistency, termed Sophistication. The analysis accommodates arbitrary decision models and updating rules. Hence, the results indicate that (i) ambiguity attitudes, (ii) updating rules, and (iii) sophisticated dynamic choice are mutually orthogonal aspects of preferences. As an example, a characterization of prior-by-prior Bayesian updating and Consistent Planning for arbitrary maxmin-expected utility preferences is presented. The resulting sophisticated MEU preferences are then used to analyze the value of information under ambiguity; a basic trade-off between information acquisition and commitment is highlighted.
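    A small numeric illustration (not from the paper) of maxmin expected utility with prior-by-prior Bayesian updating: every prior in the set is conditioned on the observed event, and an act is evaluated by its worst-case expected utility over the updated set. The states, priors and utilities below are made-up numbers for illustration only.

    import numpy as np

    states = ["s1", "s2", "s3"]
    priors = np.array([[0.5, 0.3, 0.2],        # a set of priors (ambiguity)
                       [0.2, 0.3, 0.5]])
    utility = np.array([10.0, 0.0, 4.0])       # state-contingent utility of an act

    def maxmin_eu(priors, utility):
        # Worst-case expected utility over the set of priors.
        return min(p @ utility for p in priors)

    def update_prior_by_prior(priors, event_mask):
        # Bayesian update of every prior on the event {i : event_mask[i] == 1}.
        updated = priors * event_mask
        return updated / updated.sum(axis=1, keepdims=True)

    event = np.array([1.0, 1.0, 0.0])          # learn that state s3 did not occur
    posteriors = update_prior_by_prior(priors, event)
    print(maxmin_eu(priors, utility))          # ex-ante MEU value of the act
    print(maxmin_eu(posteriors, utility))      # conditional MEU value after updating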

    Solving for multi-class using orthogonal coding matrices

    Full text link
    A common method of generalizing binary to multi-class classification is the error correcting code (ECC). ECCs may be optimized in a number of ways, for instance by making them orthogonal. Here we test two types of orthogonal ECCs on seven different datasets using three types of binary classifier and compare them with three other multi-class methods: 1 vs. 1, one-versus-the-rest and random ECCs. The first type of orthogonal ECC, in which the codes contain no zeros, admits a fast and simple method of solving for the probabilities. Orthogonal ECCs are always more accurate than random ECCs, as predicted by recent literature. Improvements in uncertainty coefficient (U.C.) range between 0.4--17.5% (0.004--0.139, absolute), while improvements in Brier score range between 0.7--10.7%. Unfortunately, orthogonal ECCs are rarely more accurate than 1 vs. 1. Disparities are worst when the methods are paired with logistic regression, with orthogonal ECCs never beating 1 vs. 1. When the methods are paired with SVM, the losses are less significant, peaking at 1.5% relative (0.011 absolute) in uncertainty coefficient and 6.5% in Brier scores. Orthogonal ECCs are always the fastest of the five multi-class methods when paired with linear classifiers. When paired with a piecewise linear classifier, whose classification speed does not depend on the number of training samples, classifications using orthogonal ECCs were always more accurate than the remaining three methods and also faster than 1 vs. 1. Losses against 1 vs. 1 here were higher, peaking at 1.9% (0.017, absolute) in U.C. and 39% in Brier score. Gains in speed ranged between 1.1% and over 100%. Whether the speed increase is worth the penalty in accuracy will depend on the application.
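    A minimal sketch of the zero-free orthogonal-ECC idea, assuming the number of classes is a power of two so that a Hadamard matrix supplies mutually orthogonal +/-1 code rows, and assuming logistic regression as the binary learner; the closed-form probability decoding shown here is one plausible reading of the "fast and simple" solution, not necessarily the paper's exact method.

    import numpy as np
    from scipy.linalg import hadamard
    from sklearn.linear_model import LogisticRegression

    n_classes = 4                                     # must be a power of two for hadamard()
    H = hadamard(n_classes)                           # mutually orthogonal rows, entries +/-1
    code = H[1:]                                      # drop the all-ones row; shape (n_classes-1, n_classes)

    def fit_ecc(X, y):
        # One binary problem per code row: class c is relabelled to code[j, c].
        # Assumes every class appears in y so each binary problem has two labels.
        return [LogisticRegression(max_iter=1000).fit(X, (row[y] > 0).astype(int))
                for row in code]

    def predict_proba_ecc(models, X):
        # R[:, j] estimates P(code bit j = +1 | x); because the code rows are
        # orthogonal +/-1 vectors, the least-squares class probabilities have a
        # closed form, which is then clipped and renormalised onto the simplex.
        R = np.column_stack([m.predict_proba(X)[:, 1] for m in models])
        M = 2.0 * R - 1.0                             # expected value of each code bit
        P = 1.0 / n_classes + M @ code / n_classes
        P = np.clip(P, 0.0, None)
        return P / P.sum(axis=1, keepdims=True)

    # Usage (hypothetical data): models = fit_ecc(X_train, y_train)
    #                            y_pred = predict_proba_ecc(models, X_test).argmax(axis=1)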