Enhancing Decision Tree based Interpretation of Deep Neural Networks through L1-Orthogonal Regularization
One obstacle that has so far prevented the introduction of machine learning
models, particularly in critical areas, is the lack of explainability. In this work, a
practicable approach of gaining explainability of deep artificial neural
networks (NN) using an interpretable surrogate model based on decision trees is
presented. Simply fitting a decision tree to a trained NN usually leads to
unsatisfactory results in terms of accuracy and fidelity. Using L1-orthogonal
regularization during training, however, preserves the accuracy of the NN,
while it can be closely approximated by small decision trees. Tests with
different data sets confirm that L1-orthogonal regularization yields models of
lower complexity and at the same time higher fidelity compared to other
regularizers.
Comment: 8 pages, 18th IEEE International Conference on Machine Learning and
Applications (ICMLA) 201
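The abstract above does not state the form of the regularizer. As a rough illustration only, the sketch below assumes one common way to write an L1-orthogonal penalty: the sum of absolute entries of W^T W - I for a single weight matrix W. The function name `l1_orthogonal_penalty` and the list-of-rows layout are hypothetical choices for this example, not taken from the paper.

```python
def l1_orthogonal_penalty(weights):
    """Sum of |(W^T W - I)_ij| over all entries of the Gram matrix.

    `weights` is a weight matrix given as a list of rows. This form of the
    penalty is an assumption made for illustration; the abstract does not
    give the exact formula.
    """
    n_rows = len(weights)
    n_cols = len(weights[0])
    penalty = 0.0
    for i in range(n_cols):
        for j in range(n_cols):
            # Entry (i, j) of W^T W is the dot product of columns i and j.
            gram_ij = sum(weights[k][i] * weights[k][j] for k in range(n_rows))
            target = 1.0 if i == j else 0.0
            penalty += abs(gram_ij - target)
    return penalty


# Orthonormal columns incur no penalty; correlated columns do.
identity_penalty = l1_orthogonal_penalty([[1.0, 0.0], [0.0, 1.0]])
skewed_penalty = l1_orthogonal_penalty([[1.0, 1.0], [0.0, 1.0]])
```

In training, a term like this would typically be scaled by a coefficient and added to the loss for each layer; how the paper weights or combines the terms is not stated in the abstract.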
Fisher’s decision tree
Univariate decision trees are classifiers currently used in many data mining applications. This classifier discovers partitions in the input space via hyperplanes that are orthogonal to the axes of attributes, producing a model that can be understood by human experts. One disadvantage of univariate decision trees is that they produce complex and inaccurate models when decision boundaries are not orthogonal to axes. In this paper we introduce the Fisher’s Tree, a classifier that combines the dimensionality reduction of Fisher’s linear discriminant with the decomposition strategy of decision trees to produce an oblique decision tree. Our proposal generates an artificial attribute that is used to split the data recursively. The Fisher’s decision tree induces oblique trees whose accuracy, size, number of leaves and training time are competitive with those of other decision trees reported in the literature. We use more than ten publicly available data sets to demonstrate the effectiveness of our method.
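The idea of an artificial attribute can be sketched for a single two-class split. The example below assumes 2-D features, computes the Fisher direction w = S_w^{-1}(mu_a - mu_b), and uses the projection of each point onto w as the new attribute. Placing the threshold at the midpoint of the projected class means is an assumption for illustration; the abstract does not say how the paper chooses the split point. All function names are hypothetical.

```python
def mean_2d(points):
    """Component-wise mean of a list of 2-D points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(2)]


def scatter_2d(points, mu):
    """2x2 scatter matrix: sum of outer products of deviations from mu."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for p in points:
        dev = [p[0] - mu[0], p[1] - mu[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += dev[i] * dev[j]
    return s


def project(w, point):
    """Artificial attribute: projection of a point onto direction w."""
    return w[0] * point[0] + w[1] * point[1]


def fisher_split(class_a, class_b):
    """Return (w, threshold) for one oblique split between two classes."""
    mu_a, mu_b = mean_2d(class_a), mean_2d(class_b)
    s_a, s_b = scatter_2d(class_a, mu_a), scatter_2d(class_b, mu_b)
    # Within-class scatter is the sum of the per-class scatter matrices.
    s_w = [[s_a[i][j] + s_b[i][j] for j in range(2)] for i in range(2)]
    det = s_w[0][0] * s_w[1][1] - s_w[0][1] * s_w[1][0]
    if abs(det) < 1e-12:
        raise ValueError("within-class scatter matrix is singular")
    diff = [mu_a[0] - mu_b[0], mu_a[1] - mu_b[1]]
    # w = S_w^{-1} (mu_a - mu_b), using the closed-form 2x2 inverse.
    w = [(s_w[1][1] * diff[0] - s_w[0][1] * diff[1]) / det,
         (s_w[0][0] * diff[1] - s_w[1][0] * diff[0]) / det]
    # Assumed threshold: midpoint between the projected class means.
    threshold = 0.5 * (project(w, mu_a) + project(w, mu_b))
    return w, threshold


# Two classes separated by a boundary that is not parallel to either axis.
class_a = [(2.0, 3.0), (3.0, 4.0), (4.0, 5.5), (3.0, 5.0)]
class_b = [(3.0, 1.0), (4.0, 2.0), (5.0, 3.5), (4.0, 1.5)]
w, threshold = fisher_split(class_a, class_b)
```

A tree builder would apply such a split at each node and recurse on the two resulting subsets, in the same way a univariate tree recurses on axis-aligned splits.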
The Route towards the ultimate network topology
In this talk I will try to summarize our quest for a realizable network topology that optimizes performance, cost, power consumption and partitionability. We have explored Fat Trees, Dragonflies, variations of dragonflies, Orthogonal Fat Trees, multi-layer HyperX's, Multi-layer Full Meshes and close-to Moore's (graph) bound topologies in an attempt to decide, with the best routing we could find, for a reasonable task-placement, and for a collection of workloads (synthetic and real-world), which topology to choose.
Whereas a final decision for a single 'ultimate' topology remains elusive, the route towards it took us down unexpected paths that led to new insights into topology design and properties and into the design of routing schemes.
Dynamic Choice Under Ambiguity
This paper analyzes sophisticated dynamic choice for ambiguity-sensitive decision makers. It characterizes Consistent Planning via axioms on preferences over decision trees. Furthermore, it shows how to elicit conditional preferences from prior preferences. The key axiom is a weakening of Dynamic Consistency, deemed Sophistication. The analysis accommodates arbitrary decision models and updating rules. Hence, the results indicate that (i) ambiguity attitudes, (ii) updating rules, and (iii) sophisticated dynamic choice are mutually orthogonal aspects of preferences. As an example, a characterization of prior-by-prior Bayesian updating and Consistent Planning for arbitrary maxmin-expected utility preferences is presented. The resulting sophisticated MEU preferences are then used to analyze the value of information under ambiguity; a basic trade-off between information acquisition and commitment is highlighted.
Solving for multi-class using orthogonal coding matrices
A common method of generalizing binary to multi-class classification is the
error correcting code (ECC). ECCs may be optimized in a number of ways, for
instance by making them orthogonal. Here we test two types of orthogonal ECCs
on seven different datasets using three types of binary classifier and compare
them with three other multi-class methods: 1 vs. 1, one-versus-the-rest and
random ECCs. The first type of orthogonal ECC, in which the codes contain no
zeros, admits a fast and simple method of solving for the probabilities.
Orthogonal ECCs are always more accurate than random ECCs as predicted by
recent literature. Improvements in uncertainty coefficient (U.C.) range between
0.4--17.5% (0.004--0.139, absolute), while improvements in Brier score between
0.7--10.7%. Unfortunately, orthogonal ECCs are rarely more accurate than 1 vs.
1. Disparities are worst when the methods are paired with logistic regression,
with orthogonal ECCs never beating 1 vs. 1. When the methods are paired with
SVM, the losses are less significant, peaking at 1.5%, relative, 0.011 absolute
in uncertainty coefficient and 6.5% in Brier scores. Orthogonal ECCs are always
the fastest of the five multi-class methods when paired with linear
classifiers. When paired with a piecewise linear classifier, whose
classification speed does not depend on the number of training samples,
classifications using orthogonal ECCs were always more accurate than the
remaining three methods and also faster than 1 vs. 1. Losses against 1 vs. 1
here were higher, peaking at 1.9% (0.017, absolute), in U.C. and 39% in Brier
score. Gains in speed ranged between 1.1% and over 100%. Whether the speed
increase is worth the penalty in accuracy will depend on the application.
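The fast solve mentioned for codes without zeros can be illustrated with a small sketch. It assumes a coding matrix A with entries +1 and -1 whose columns are mutually orthogonal (A^T A = n I), and binary outputs r_i modeled as the probability difference between the two sides of partition i, so that r = A p. Under those assumptions p = A^T r / n. The 4x4 Hadamard-type matrix and all names below are hypothetical choices for this example, not the paper's setup.

```python
# Rows are binary problems, columns are classes. The all-ones row is not a
# real classifier: its output is fixed at 1, which encodes sum(p) == 1.
CODING = [
    [1, 1, 1, 1],
    [1, -1, 1, -1],
    [1, 1, -1, -1],
    [1, -1, -1, 1],
]


def binary_outputs(coding, class_probs):
    """Idealized binary outputs r_i = sum_j a_ij * p_j (noise-free)."""
    return [sum(a * p for a, p in zip(row, class_probs)) for row in coding]


def solve_probabilities(coding, r):
    """Recover class probabilities as p = A^T r / n.

    Valid only when the columns of `coding` are orthogonal with squared
    norm n, which is the assumption of this sketch.
    """
    n = len(coding)
    n_classes = len(coding[0])
    return [sum(coding[i][j] * r[i] for i in range(n)) / n
            for j in range(n_classes)]


true_probs = [0.5, 0.25, 0.125, 0.125]
r = binary_outputs(CODING, true_probs)
recovered = solve_probabilities(CODING, r)
```

With real classifiers the r_i are noisy estimates, so the recovered values may fall slightly outside [0, 1] and would need some form of correction; how the paper handles that is not described in the abstract.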