10,147 research outputs found
The Grammar of Interactive Explanatory Model Analysis
The growing need for in-depth analysis of predictive models leads to a series
of new methods for explaining their local and global properties. Which of these
methods is the best? It turns out that this is an ill-posed question. One
cannot sufficiently explain a black-box machine learning model using a single
method that gives only one perspective. Isolated explanations are prone to
misunderstanding, which inevitably leads to wrong or simplistic reasoning. This
problem is known as the Rashomon effect and refers to diverse, even
contradictory interpretations of the same phenomenon. Surprisingly, the
majority of methods developed for explainable machine learning focus on a
single aspect of the model behavior. In contrast, we showcase the problem of
explainability as an interactive and sequential analysis of a model. This paper
presents how different Explanatory Model Analysis (EMA) methods complement each
other and why it is essential to juxtapose them together. The introduced
process of Interactive EMA (IEMA) derives from the algorithmic side of
explainable machine learning and aims to embrace ideas developed in cognitive
sciences. We formalize the grammar of IEMA to describe potential human-model
dialogues. IEMA is implemented in the human-centered framework that adopts
interactivity, customizability and automation as its main traits. Combined,
these methods enhance the responsible approach to predictive modeling.Comment: 17 pages, 10 figures, 3 table
Boosting insights in insurance tariff plans with tree-based machine learning methods
Pricing actuaries typically operate within the framework of generalized
linear models (GLMs). With the upswing of data analytics, our study puts focus
on machine learning methods to develop full tariff plans built from both the
frequency and severity of claims. We adapt the loss functions used in the
algorithms such that the specific characteristics of insurance data are
carefully incorporated: highly unbalanced count data with excess zeros and
varying exposure on the frequency side combined with scarce, but potentially
long-tailed data on the severity side. A key requirement is the need for
transparent and interpretable pricing models which are easily explainable to
all stakeholders. We therefore focus on machine learning with decision trees:
starting from simple regression trees, we work towards more advanced ensembles
such as random forests and boosted trees. We show how to choose the optimal
tuning parameters for these models in an elaborate cross-validation scheme, we
present visualization tools to obtain insights from the resulting models and
the economic value of these new modeling approaches is evaluated. Boosted trees
outperform the classical GLMs, allowing the insurer to form profitable
portfolios and to guard against potential adverse risk selection
The Inverse Shapley Value Problem
For a weighted voting scheme used by voters to choose between two
candidates, the \emph{Shapley-Shubik Indices} (or {\em Shapley values}) of
provide a measure of how much control each voter can exert over the overall
outcome of the vote. Shapley-Shubik indices were introduced by Lloyd Shapley
and Martin Shubik in 1954 \cite{SS54} and are widely studied in social choice
theory as a measure of the "influence" of voters. The \emph{Inverse Shapley
Value Problem} is the problem of designing a weighted voting scheme which
(approximately) achieves a desired input vector of values for the
Shapley-Shubik indices. Despite much interest in this problem no provably
correct and efficient algorithm was known prior to our work.
We give the first efficient algorithm with provable performance guarantees
for the Inverse Shapley Value Problem. For any constant \eps > 0 our
algorithm runs in fixed poly time (the degree of the polynomial is
independent of \eps) and has the following performance guarantee: given as
input a vector of desired Shapley values, if any "reasonable" weighted voting
scheme (roughly, one in which the threshold is not too skewed) approximately
matches the desired vector of values to within some small error, then our
algorithm explicitly outputs a weighted voting scheme that achieves this vector
of Shapley values to within error \eps. If there is a "reasonable" voting
scheme in which all voting weights are integers at most \poly(n) that
approximately achieves the desired Shapley values, then our algorithm runs in
time \poly(n) and outputs a weighted voting scheme that achieves the target
vector of Shapley values to within error $\eps=n^{-1/8}.
CLEAR: Covariant LEAst-square Re-fitting with applications to image restoration
In this paper, we propose a new framework to remove parts of the systematic
errors affecting popular restoration algorithms, with a special focus for image
processing tasks. Generalizing ideas that emerged for regularization,
we develop an approach re-fitting the results of standard methods towards the
input data. Total variation regularizations and non-local means are special
cases of interest. We identify important covariant information that should be
preserved by the re-fitting method, and emphasize the importance of preserving
the Jacobian (w.r.t. the observed signal) of the original estimator. Then, we
provide an approach that has a "twicing" flavor and allows re-fitting the
restored signal by adding back a local affine transformation of the residual
term. We illustrate the benefits of our method on numerical simulations for
image restoration tasks
Multiclass Learning with Simplex Coding
In this paper we discuss a novel framework for multiclass learning, defined
by a suitable coding/decoding strategy, namely the simplex coding, that allows
to generalize to multiple classes a relaxation approach commonly used in binary
classification. In this framework, a relaxation error analysis can be developed
avoiding constraints on the considered hypotheses class. Moreover, we show that
in this setting it is possible to derive the first provably consistent
regularized method with training/tuning complexity which is independent to the
number of classes. Tools from convex analysis are introduced that can be used
beyond the scope of this paper
A Mathematical Formalization of Hierarchical Temporal Memory's Spatial Pooler
Hierarchical temporal memory (HTM) is an emerging machine learning algorithm,
with the potential to provide a means to perform predictions on spatiotemporal
data. The algorithm, inspired by the neocortex, currently does not have a
comprehensive mathematical framework. This work brings together all aspects of
the spatial pooler (SP), a critical learning component in HTM, under a single
unifying framework. The primary learning mechanism is explored, where a maximum
likelihood estimator for determining the degree of permanence update is
proposed. The boosting mechanisms are studied and found to be only relevant
during the initial few iterations of the network. Observations are made
relating HTM to well-known algorithms such as competitive learning and
attribute bagging. Methods are provided for using the SP for classification as
well as dimensionality reduction. Empirical evidence verifies that given the
proper parameterizations, the SP may be used for feature learning.Comment: This work was submitted for publication and is currently under
review. For associated code, see https://github.com/tehtechguy/mHT
Optimization by gradient boosting
Gradient boosting is a state-of-the-art prediction technique that
sequentially produces a model in the form of linear combinations of simple
predictors---typically decision trees---by solving an infinite-dimensional
convex optimization problem. We provide in the present paper a thorough
analysis of two widespread versions of gradient boosting, and introduce a
general framework for studying these algorithms from the point of view of
functional optimization. We prove their convergence as the number of iterations
tends to infinity and highlight the importance of having a strongly convex risk
functional to minimize. We also present a reasonable statistical context
ensuring consistency properties of the boosting predictors as the sample size
grows. In our approach, the optimization procedures are run forever (that is,
without resorting to an early stopping strategy), and statistical
regularization is basically achieved via an appropriate penalization of
the loss and strong convexity arguments
- …