5,007 research outputs found
A Model Explanation System: Latest Updates and Extensions
We propose a general model explanation system (MES) for "explaining" the
output of black box classifiers. This paper describes extensions to Turner
(2015), which is referred to frequently in the text. We use the motivating
example of a classifier trained to detect fraud in a credit card transaction
history. The key aspect is that we provide explanations applicable to a single
prediction, rather than provide an interpretable set of parameters. We focus on
explaining positive predictions (alerts). However, the presented methodology is
symmetrically applicable to negative predictions.
Comment: Presented at 2016 ICML Workshop on Human Interpretability in Machine
Learning (WHI 2016), New York, NY
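To make "explaining a single prediction" concrete, the sketch below shows a
generic perturbation-style explanation of one alert. It is illustrative only
and is not the MES procedure of Turner (2015); the toy scoring function,
features, and baseline values are assumptions.

```python
import numpy as np

def single_prediction_explanation(model_score, x, baseline):
    """Generic perturbation explanation of one alert: how much does the score
    drop when each feature is replaced by a baseline value?"""
    base_score = model_score(x)
    deltas = {}
    for i in range(len(x)):
        x_pert = x.copy()
        x_pert[i] = baseline[i]
        deltas[i] = base_score - model_score(x_pert)
    # Features whose removal lowers the score most are reported first.
    return sorted(deltas.items(), key=lambda kv: -kv[1])

# Toy black-box fraud score and transaction features (illustrative only).
score = lambda v: 1 / (1 + np.exp(-(2.0 * v[0] + 0.5 * v[1] - v[2])))
print(single_prediction_explanation(score, np.array([3.0, 1.0, 0.5]), np.zeros(3)))
```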
Stochastic Multiple Choice Learning for Training Diverse Deep Ensembles
Many practical perception systems exist within larger processes that include
interactions with users or additional components capable of evaluating the
quality of predicted solutions. In these contexts, it is beneficial to provide
these oracle mechanisms with multiple highly likely hypotheses rather than a
single prediction. In this work, we pose the task of producing multiple outputs
as a learning problem over an ensemble of deep networks -- introducing a novel
stochastic gradient descent based approach to minimize the loss with respect to
an oracle. Our method is simple to implement, agnostic to both architecture and
loss function, and parameter-free. Our approach achieves lower oracle error
compared to existing methods on a wide range of tasks and deep architectures.
We also show qualitatively that the diverse solutions produced often provide
interpretable representations of task ambiguity.
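The oracle (winner-take-all) training signal described above can be illustrated
with a minimal sketch: per-example losses are computed for every ensemble
member, but gradients flow only to the member that is currently best on each
example. The toy networks, data, and optimizer settings are assumptions, not
the authors' released code.

```python
import torch
import torch.nn as nn

# Minimal winner-take-all ("oracle") training step for a small ensemble.
members = nn.ModuleList([nn.Linear(10, 3) for _ in range(4)])   # 4 toy members
opt = torch.optim.SGD(members.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss(reduction="none")                 # per-example losses

x = torch.randn(32, 10)                  # a toy batch
y = torch.randint(0, 3, (32,))

opt.zero_grad()
# Per-member, per-example losses: shape (num_members, batch).
losses = torch.stack([loss_fn(m(x), y) for m in members])
# Oracle loss: for each example, keep only the best member, so gradients flow
# to the member that is already closest to being right on that example.
oracle_loss = losses.min(dim=0).values.mean()
oracle_loss.backward()
opt.step()
```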
DCSVM: Fast Multi-class Classification using Support Vector Machines
We present DCSVM, an efficient algorithm for multi-class classification using
Support Vector Machines. DCSVM is a divide and conquer algorithm which relies
on data sparsity in high dimensional space and performs a smart partitioning of
the whole training data set into disjoint subsets that are easily separable. A
single prediction performed between two partitions eliminates at once one or
more classes in one partition, leaving only a reduced number of candidate
classes for subsequent steps. The algorithm continues recursively, reducing the
number of classes at each step, until a final binary decision is made between
the last two classes left in the competition. In the best case scenario, our
algorithm makes a final decision between $K$ classes in $O(\log K)$ decision
steps, and in the worst case scenario DCSVM makes a final decision in $K-1$
steps, which is not worse than existing techniques.
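A best-case rendering of the class-elimination idea is sketched below: each
binary decision discards an entire partition of the remaining candidate
classes, giving the logarithmic number of steps mentioned above. The toy data,
partitioning rule, and helper names are illustrative assumptions, not the
paper's DCSVM implementation (which exploits data sparsity and may eliminate
fewer classes per step).

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.datasets import make_classification

# Toy multi-class data for the illustrative elimination tournament below.
X, y = make_classification(n_samples=600, n_features=20, n_informative=10,
                           n_classes=4, n_clusters_per_class=1, random_state=0)

def predict_by_elimination(x, X, y, candidates):
    """Split the candidate classes in two, let one binary SVM decide which
    half survives, and recurse until a single class remains."""
    if len(candidates) == 1:
        return candidates[0]
    half = len(candidates) // 2
    part_a, part_b = candidates[:half], candidates[half:]
    mask = np.isin(y, part_a + part_b)
    side = np.isin(y[mask], part_b).astype(int)      # 0 -> part_a, 1 -> part_b
    svm = SVC(kernel="linear").fit(X[mask], side)
    survivors = part_b if svm.predict(x.reshape(1, -1))[0] == 1 else part_a
    return predict_by_elimination(x, X, y, survivors)

print(predict_by_elimination(X[0], X, y, sorted(set(y))))
```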
Thyroid Cancer Malignancy Prediction From Whole Slide Cytopathology Images
We consider preoperative prediction of thyroid cancer based on
ultra-high-resolution whole-slide cytopathology images. Inspired by how human
experts perform diagnosis, our approach first identifies and classifies
diagnostic image regions containing informative thyroid cells, which only
comprise a tiny fraction of the entire image. These local estimates are then
aggregated into a single prediction of thyroid malignancy. Several unique
characteristics of thyroid cytopathology guide our deep-learning-based
approach. While our method is closely related to multiple-instance learning, it
deviates from such methods by using a supervised procedure to extract
diagnostically relevant regions. Moreover, we propose to simultaneously predict
thyroid malignancy, as well as a diagnostic score assigned by a human expert,
which further allows us to devise an improved training strategy. Experimental
results show that the proposed algorithm achieves performance comparable to
human experts, and demonstrate the potential of using the algorithm for
screening and as an assistive tool for the improved diagnosis of indeterminate
cases.
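The two ingredients above — aggregating local estimates into a slide-level
prediction and jointly supervising the malignancy label and the expert's
diagnostic score — can be sketched as follows. All tensor shapes, the mean
aggregation, and the loss weighting are illustrative assumptions, not the
paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative sketch: region-level features -> per-region estimates ->
# slide-level prediction, trained jointly on malignancy and an expert score.
region_feats = torch.randn(1, 50, 128)   # (slide, regions, feature dim) - toy values
malignant = torch.tensor([1.0])          # slide-level malignancy label
expert_score = torch.tensor([4.0])       # expert-assigned diagnostic score (assumed scale)

region_head = nn.Linear(128, 1)          # per-region malignancy evidence
score_head = nn.Linear(128, 1)           # per-region expert-score evidence

region_logits = region_head(region_feats).squeeze(-1)   # (1, 50) local estimates
slide_logit = region_logits.mean(dim=1)                 # aggregate to one prediction
slide_score = score_head(region_feats).squeeze(-1).mean(dim=1)

# Joint objective: malignancy classification plus expert-score regression.
loss = (F.binary_cross_entropy_with_logits(slide_logit, malignant)
        + 0.5 * F.mse_loss(slide_score, expert_score))
loss.backward()
```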
Interpret Federated Learning with Shapley Values
Federated Learning is introduced to protect privacy by distributing training
data across multiple parties. Each party trains its own model, and a meta-model
is constructed from the sub-models. In this way, the details of the data are
not disclosed between the parties. In this paper we investigate model
interpretation methods for Federated Learning, specifically the measurement of
feature importance in vertical Federated Learning, where the feature space of
the data is divided between two parties, namely host and guest. When the host
party interprets a single prediction of the vertical Federated Learning model,
the interpretation results, namely the feature importances, are very likely to
reveal the protected data of the guest party. We propose a method that balances
model interpretability and data privacy in vertical Federated Learning by using
Shapley values to reveal detailed feature importance for host features and a
unified importance value for the federated guest features. Our experiments
indicate robust and informative results for interpreting Federated Learning
models.
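One way to read the proposed output format: detailed per-feature attributions
on the host side, a single pooled value for the guest side. A minimal sketch,
assuming the per-feature Shapley values for one prediction have already been
computed; the indices, names, and summation rule are illustrative.

```python
import numpy as np

# Assume `attributions` holds per-feature Shapley values for one prediction,
# with host and guest feature indices known to the host party.
attributions = np.array([0.12, -0.05, 0.30, 0.08, -0.02, 0.15])   # toy values
host_idx = [0, 1, 2]                                              # host-owned features
guest_idx = [3, 4, 5]                                             # guest-owned features

host_report = {f"host_feature_{i}": attributions[i] for i in host_idx}
# Guest contributions are pooled into a single value so that individual guest
# features (and hence the guest's protected data) are not exposed to the host.
host_report["federated_guest_features"] = attributions[guest_idx].sum()
print(host_report)
```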
The Strategy of Experts for Repeated Predictions
We investigate the behavior of experts who seek to make predictions with
maximum impact on an audience. At a known future time, a certain continuous
random variable will be realized. A public prediction gradually converges to
the outcome, and an expert has access to a more accurate prediction. We study
when the expert should reveal his information, given that his reward is based
on a proper scoring rule (e.g., is proportional to the change in log-likelihood
of the outcome).
In Azar et al. (2016), we analyzed the case where the expert may make a
single prediction. In this paper, we analyze the case where the expert is
allowed to revise previous predictions. This leads to a rather different set of
dilemmas for the strategic expert. We find that it is optimal for the expert to
always tell the truth, and to make a new prediction whenever he has a new
signal. We characterize the expert's expectation for his total reward, and show
asymptotic limits.
Comment: To appear in WINE 201
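For concreteness, a reward that "is proportional to the change in
log-likelihood of the outcome" can be written as follows; the notation is ours,
not the paper's.

```latex
% Reward for revising the public prediction p_old to p_new, evaluated at the
% eventually realized outcome x (logarithmic proper scoring rule):
R \;\propto\; \log p_{\mathrm{new}}(x) - \log p_{\mathrm{old}}(x)
```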
Global Model Interpretation via Recursive Partitioning
In this work, we propose a simple but effective method to interpret black-box
machine learning models globally. That is, we use a compact binary tree, the
interpretation tree, to explicitly represent the most important decision rules
that are implicitly contained in the black-box machine learning models. This
tree is learned from the contribution matrix which consists of the
contributions of input variables to predicted scores for each single
prediction. To generate the interpretation tree, a unified process recursively
partitions the input variable space by maximizing the difference in the average
contribution of the split variable between the divided spaces. We demonstrate
the effectiveness of our method in diagnosing machine learning models on
multiple tasks. Also, it is useful for new knowledge discovery as such insights
are not easily identifiable when only looking at single predictions. In
general, our work makes it easier and more efficient for human beings to
understand machine learning models.
Comment: Accepted by The 4th IEEE International Conference on Data Science and
Systems (DSS-2018)
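The split criterion described above — choose the variable and threshold that
maximize the difference in the average contribution of the split variable
between the two sides — can be sketched as follows. The random contribution
matrix and the quantile threshold grid are toy assumptions.

```python
import numpy as np

# Toy setup: X holds input variables, C holds per-prediction contributions of
# each variable to the black-box score (e.g. from a local attribution method).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
C = rng.normal(size=(200, 5))

def best_split(X, C):
    """Pick the (variable, threshold) whose split maximizes the difference in
    the average contribution of that same variable between the two sides."""
    best = (None, None, -np.inf)
    for j in range(X.shape[1]):
        for t in np.quantile(X[:, j], [0.25, 0.5, 0.75]):   # small threshold grid
            left, right = X[:, j] <= t, X[:, j] > t
            if left.sum() == 0 or right.sum() == 0:
                continue
            gap = abs(C[left, j].mean() - C[right, j].mean())
            if gap > best[2]:
                best = (j, t, gap)
    return best

print(best_split(X, C))
```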
A similarity-based implementation of the Schaake shuffle
Contemporary weather forecasts are typically based on ensemble prediction
systems, which consist of multiple runs of numerical weather prediction models
that vary with respect to the initial conditions and/or the parameterization
of the atmosphere. Ensemble forecasts are frequently biased
and show dispersion errors and thus need to be statistically postprocessed.
However, current postprocessing approaches are often univariate and apply to a
single weather quantity at a single location and for a single prediction
horizon only, thereby failing to account for potentially crucial dependence
structures. Non-parametric multivariate postprocessing methods based on
empirical copulas, such as ensemble copula coupling or the Schaake shuffle, can
address this shortcoming. A specific implementation of the Schaake shuffle,
called the SimSchaake approach, is introduced. The SimSchaake method aggregates
univariately postprocessed ensemble forecasts using dependence patterns from
past observations. Specifically, the observations are taken from historical
dates at which the ensemble forecasts resembled the current ensemble prediction
with respect to a specific similarity criterion. The SimSchaake ensemble
outperforms all reference ensembles in an application to ensemble forecasts for
surface temperature from the European Centre for Medium-Range Weather
Forecasts.
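A compact sketch of the two steps involved: selecting a historical template by
forecast similarity, then applying the Schaake shuffle, i.e. reordering the
univariately postprocessed samples so their ranks match the ranks of the
template observations. Array shapes, the Euclidean similarity criterion, and
all variable names are illustrative assumptions, not the SimSchaake code.

```python
import numpy as np

rng = np.random.default_rng(1)
n_members, n_margins = 10, 3            # e.g. 3 (location, lead time) combinations

current_forecast = rng.normal(size=(n_members, n_margins))    # today's raw ensemble
postprocessed = rng.normal(size=(n_members, n_margins))       # calibrated samples
hist_forecasts = rng.normal(size=(50, n_members, n_margins))  # archive of past ensembles
hist_obs = rng.normal(size=(50, n_margins))                   # matching past observations

# Similarity-based template selection: pick past dates whose ensemble forecast
# is closest (here: Euclidean distance on ensemble means) to today's forecast.
dist = np.linalg.norm(hist_forecasts.mean(axis=1) - current_forecast.mean(axis=0), axis=1)
template = hist_obs[np.argsort(dist)[:n_members]]             # n_members similar dates

# Schaake shuffle: reorder the calibrated samples in each margin so that their
# ranks match the ranks of the observation template, restoring dependence.
shuffled = np.empty_like(postprocessed)
for j in range(n_margins):
    ranks = np.argsort(np.argsort(template[:, j]))            # rank of each template value
    shuffled[:, j] = np.sort(postprocessed[:, j])[ranks]
print(shuffled)
```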
Improving Object Detection from Scratch via Gated Feature Reuse
In this paper, we present a simple and parameter-efficient drop-in module for
one-stage object detectors like SSD when learning from scratch (i.e., without
pre-trained models). We call our module GFR (Gated Feature Reuse), which
exhibits two main advantages. First, we introduce a novel gate-controlled
prediction strategy enabled by Squeeze-and-Excitation to adaptively enhance or
attenuate supervision at different scales based on the input object size. As a
result, our model is more effective in detecting diverse sizes of objects.
Second, we propose a feature-pyramids structure to squeeze rich spatial and
semantic features into a single prediction layer, which strengthens feature
representation and reduces the number of parameters to learn. We apply the
proposed structure on DSOD and SSD detection frameworks, and evaluate the
performance on PASCAL VOC 2007, 2012 and COCO datasets. With fewer model
parameters, GFR-DSOD outperforms the baseline DSOD by 1.4%, 1.1%, 1.7% and
0.6%, respectively. GFR-SSD also outperforms the original SSD and SSD with
dense prediction by 3.6% and 2.8% on the VOC 2007 dataset. Code is available
at: https://github.com/szq0214/GFR-DSOD
Comment: Accepted in BMVC 2019. Code: https://github.com/szq0214/GFR-DSOD
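The gate-controlled prediction strategy builds on Squeeze-and-Excitation; for
orientation, a standard SE-style gate is sketched below. Channel sizes and the
reduction ratio are illustrative, and this is not the released GFR module.

```python
import torch
import torch.nn as nn

class SEGate(nn.Module):
    """Standard Squeeze-and-Excitation gate: global context -> per-channel
    scale in (0, 1), used to enhance or attenuate a feature map."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):                    # x: (N, C, H, W)
        squeeze = x.mean(dim=(2, 3))         # global average pooling ("squeeze")
        scale = self.fc(squeeze).unsqueeze(-1).unsqueeze(-1)
        return x * scale                     # gate the feature map ("excitation")

feat = torch.randn(2, 64, 38, 38)            # a toy multi-scale feature map
print(SEGate(64)(feat).shape)                # torch.Size([2, 64, 38, 38])
```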
Learning with Feature Evolvable Streams
Learning with streaming data has attracted much attention during the past few
years. Though most studies consider data stream with fixed features, in real
practice the features may be evolvable. For example, features of data gathered
by limited-lifespan sensors will change when these sensors are substituted by
new ones. In this paper, we propose a novel learning paradigm: \emph{Feature
Evolvable Streaming Learning} where old features would vanish and new features
would occur. Rather than relying only on the current features, we attempt to
recover the vanished features and exploit them to improve performance.
Specifically, we learn two models from the recovered features and the current
features, respectively. To benefit from the recovered features, we develop two
ensemble methods. In the first method, we combine the predictions from two
models and theoretically show that with the assistance of old features, the
performance on new features can be improved. In the second approach, we
dynamically select the best single prediction and establish a better
performance guarantee when the best model switches. Experiments on both
synthetic and real data validate the effectiveness of our proposal.
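The first ensemble method — combining the predictions of the model on
recovered features with the model on current features — is commonly realized
with an exponential-weights update; a minimal sketch under that assumption
(the learning rate, squared loss, and toy stream are illustrative, not
necessarily the paper's exact update).

```python
import numpy as np

# Online combination of two predictors: one trained on recovered (old)
# features, one on the current features.
eta = 0.5
weights = np.array([0.5, 0.5])       # [recovered-feature model, current-feature model]

def combine(pred_recovered, pred_current, y_true):
    global weights
    preds = np.array([pred_recovered, pred_current])
    combined = float(weights @ preds)            # weighted ensemble prediction
    losses = (preds - y_true) ** 2               # per-model squared loss
    weights = weights * np.exp(-eta * losses)    # downweight the worse model
    weights /= weights.sum()
    return combined

for t in range(5):                               # toy data stream
    y = np.sin(t)
    print(round(combine(np.sin(t) + 0.1, np.sin(t) - 0.5, y), 3), weights)
```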