Evidential Deep Learning to Quantify Classification Uncertainty
Deterministic neural nets have been shown to learn effective predictors on a
wide range of machine learning problems. However, as the standard approach is
to train the network to minimize a prediction loss, the resultant model remains
ignorant of its prediction confidence. Orthogonally to Bayesian neural nets
that indirectly infer prediction uncertainty through weight uncertainties, we
propose explicit modeling of the same using the theory of subjective logic. By
placing a Dirichlet distribution on the class probabilities, we treat
predictions of a neural net as subjective opinions and learn, from data, the
function that collects the evidence leading to these opinions with a
deterministic neural net. The resultant predictor for a multi-class
classification problem is
another Dirichlet distribution whose parameters are set by the continuous
output of a neural net. We provide a preliminary analysis on how the
peculiarities of our new loss function drive improved uncertainty estimation.
We observe that our method achieves unprecedented success in detecting
out-of-distribution queries and in withstanding adversarial perturbations.
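As a rough illustration of the mechanism, the sketch below implements the Dirichlet construction in PyTorch, assuming ReLU evidence and the type-II maximum-likelihood form of the loss; the paper analyzes several loss variants and adds a regularization term that is omitted here.

```python
import torch.nn.functional as F

def evidential_loss(logits, targets, num_classes):
    # Non-negative evidence per class; alpha_k = evidence_k + 1 gives the
    # parameters of a Dirichlet over the class probabilities.
    evidence = F.relu(logits)
    alpha = evidence + 1.0
    strength = alpha.sum(dim=1, keepdim=True)       # Dirichlet strength S
    y = F.one_hot(targets, num_classes).float()
    # Type-II maximum likelihood: sum_k y_k * (log S - log alpha_k).
    return (y * (strength.log() - alpha.log())).sum(dim=1).mean()

def predict_with_uncertainty(logits, num_classes):
    alpha = F.relu(logits) + 1.0
    strength = alpha.sum(dim=1, keepdim=True)
    probs = alpha / strength              # expected class probabilities
    uncertainty = num_classes / strength  # large when total evidence is scarce
    return probs, uncertainty
```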
BEBP: A Poisoning Method Against Machine Learning Based IDSs
In the big data era, machine learning is one of the fundamental techniques in
intrusion detection systems (IDSs). However, practical IDSs generally update
their decision modules by feeding in new data and periodically retraining their
learning models. Hence, attacks that compromise the data used for training or
testing classifiers significantly challenge the detection capability of machine
learning-based IDSs. Poisoning attack, which is one of the most recognized
security threats towards machine learning-based IDSs, injects some adversarial
samples into the training phase, inducing data drifting of training data and a
significant performance decrease of target IDSs over testing data. In this
paper, we adopt the Edge Pattern Detection (EPD) algorithm to design a novel
poisoning method that attacks several machine learning algorithms used
in IDSs. Specifically, we propose a boundary pattern detection algorithm to
efficiently generate the points that are near to abnormal data but considered
to be normal ones by current classifiers. Then, we introduce a Batch-EPD
Boundary Pattern (BEBP) detection algorithm to overcome the limitation of the
number of edge pattern points generated by EPD and to obtain more useful
adversarial samples. Based on BEBP, we further present a moderate but effective
poisoning method called chronic poisoning attack. Extensive experiments on
synthetic and three real network data sets demonstrate the performance of the
proposed poisoning method against several well-known machine learning
algorithms and a practical intrusion detection method named FMIFS-LSSVM-IDS. Comment: 7 pages, 5 figures, conference.
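The EPD and BEBP algorithms are specific to the paper, but the boundary-pattern search they perform can be pictured with the hypothetical sketch below: abnormal points are moved toward nearby normal ones until the current classifier starts labeling them normal. The nearest-neighbor direction, the linear interpolation, and the "label 0 = normal" convention are all illustrative assumptions, not the authors' method.

```python
import numpy as np

def boundary_points(clf, abnormal_X, normal_X, steps=20):
    """Hypothetical stand-in for EPD: find near-boundary points that look
    normal to the current classifier but sit close to abnormal data."""
    points = []
    for x in abnormal_X:
        # Nearest normal sample gives a direction toward the boundary.
        target = normal_X[np.argmin(np.linalg.norm(normal_X - x, axis=1))]
        for t in np.linspace(0.0, 1.0, steps):
            cand = (1 - t) * x + t * target
            if clf.predict(cand.reshape(1, -1))[0] == 0:  # classified as normal
                points.append(cand)
                break
    return np.array(points)
```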
Unsupervised and Semi-supervised Learning with Categorical Generative Adversarial Networks
In this paper we present a method for learning a discriminative classifier
from unlabeled or partially labeled data. Our approach is based on an objective
function that trades off mutual information between observed examples and their
predicted categorical class distribution, against robustness of the classifier
to an adversarial generative model. The resulting algorithm can either be
interpreted as a natural generalization of the generative adversarial networks
(GAN) framework or as an extension of the regularized information maximization
(RIM) framework to robust classification against an optimal adversary. We
empirically evaluate our method - which we dub categorical generative
adversarial networks (or CatGAN) - on synthetic data as well as on challenging
image classification tasks, demonstrating the robustness of the learned
classifiers. We further qualitatively assess the fidelity of samples generated
by the adversarial generator that is learned alongside the discriminative
classifier, and identify links between the CatGAN objective and discriminative
clustering algorithms (such as RIM).
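The trade-off can be written down compactly: the discriminator should be certain about real examples, uncertain about generated ones, and use all categories equally, while the generator pushes the other way. A minimal sketch of these entropy terms, assuming softmax outputs and omitting the supervised cross-entropy term used in the semi-supervised setting:

```python
import torch

def entropy(p, eps=1e-8):
    return -(p * (p + eps).log()).sum(dim=-1)

def catgan_d_loss(p_real, p_fake):
    # Minimize H[p(y|x)] on real data, maximize it on generated data,
    # and maximize the marginal entropy H[p(y)] (uniform class usage).
    return entropy(p_real).mean() - entropy(p_fake).mean() - entropy(p_real.mean(dim=0))

def catgan_g_loss(p_fake):
    # Generator: make D confident on fakes while keeping classes balanced.
    return entropy(p_fake).mean() - entropy(p_fake.mean(dim=0))
```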
On the Art and Science of Machine Learning Explanations
This text discusses several popular explanatory methods that go beyond the
error measurements and plots traditionally used to assess machine learning
models. Some of the explanatory methods are accepted tools of the trade while
others are rigorously derived and backed by long-standing theory. The methods,
decision tree surrogate models, individual conditional expectation (ICE) plots,
local interpretable model-agnostic explanations (LIME), partial dependence
plots, and Shapley explanations, vary in terms of scope, fidelity, and suitable
application domain. Along with descriptions of these methods, this text
presents real-world usage recommendations supported by a use case and public,
in-depth software examples for reproducibility. Comment: This manuscript is a preprint of the text for an invited talk at the
2019 KDD XAI workshop. A previous version has also appeared in the
proceedings of the Joint Statistical Meetings. Errata and updates available
here: https://github.com/jphall663/kdd_2019. Version 2 incorporated reviewer
feedback. Version 3 includes a minor adjustment to Figure 1. Version 4
corrects a minor typo.
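Among the methods discussed, ICE and partial dependence are simple enough to state in a few lines. The helper below is an illustrative sketch (the function name and array layout are ours), assuming any fitted estimator exposing predict():

```python
import numpy as np

def ice_and_pd(model, X, feature, grid):
    # One ICE curve per row of X: predictions as the chosen feature is
    # forced through the grid; the PD curve is the average ICE curve.
    curves = np.empty((len(X), len(grid)))
    for j, value in enumerate(grid):
        X_mod = X.copy()
        X_mod[:, feature] = value
        curves[:, j] = model.predict(X_mod)
    return curves, curves.mean(axis=0)  # (ICE curves, PD curve)
```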
Simultaneous Adversarial Training - Learn from Others Mistakes
Adversarial examples are maliciously tweaked images that can easily fool
machine learning techniques, such as neural networks, but they are normally not
visually distinguishable by human beings. One of the main approaches to solving
this problem is to retrain the networks using those adversarial examples,
namely adversarial training. However, standard adversarial training might not
actually change the decision boundaries but cause the problem of gradient
masking, resulting in a weaker ability to generate adversarial examples.
Therefore, it cannot alleviate the problem of black-box attacks, where
adversarial examples generated from other networks can transfer to the targeted
one. In order to reduce the problem of black-box attacks, we propose a novel
method that allows two networks to learn from each other's adversarial examples
and become resilient to black-box attacks. We also combine this method with a
simple domain adaptation to further improve the performance.
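One way to picture the cross-training loop is the sketch below, under our own simplifying assumptions: FGSM as the attack and equal weighting of clean and adversarial losses. The key point is that each network is trained on examples crafted against the other, so its own masked gradients offer no shelter.

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps):
    x = x.clone().detach().requires_grad_(True)
    F.cross_entropy(model(x), y).backward()
    return (x + eps * x.grad.sign()).detach()   # one-step adversarial example

def simultaneous_step(net_a, net_b, opt_a, opt_b, x, y, eps=0.03):
    adv_a = fgsm(net_a, x, y, eps)  # crafted against A, used to train B
    adv_b = fgsm(net_b, x, y, eps)  # crafted against B, used to train A
    for net, opt, adv in ((net_a, opt_a, adv_b), (net_b, opt_b, adv_a)):
        opt.zero_grad()
        loss = F.cross_entropy(net(x), y) + F.cross_entropy(net(adv), y)
        loss.backward()
        opt.step()
```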
Making Classifier Chains Resilient to Class Imbalance
Class imbalance is an intrinsic characteristic of multi-label data. Most of
the labels in multi-label data sets are associated with a small number of
training examples, far smaller than the size of the data set. Class
imbalance poses a key challenge that plagues most multi-label learning methods.
Ensemble of Classifier Chains (ECC), one of the most prominent multi-label
learning methods, is no exception to this rule, as each of the binary models it
builds is trained from all positive and negative examples of a label. To make
ECC resilient to class imbalance, we first couple it with random undersampling.
We then present two extensions of this basic approach, where we build a varying
number of binary models per label and construct chains of different sizes, in
order to improve the exploitation of majority examples with approximately the
same computational budget. Experimental results on 16 multi-label datasets
demonstrate the effectiveness of the proposed approaches in a variety of
evaluation metrics.
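A minimal sketch of the basic approach (ECC coupled with random undersampling): each binary model in a chain sees all positives of its label plus an equal-size random draw of negatives, with earlier labels' predictions appended as features. Function and variable names are illustrative.

```python
import numpy as np
from sklearn.base import clone

def fit_undersampled_chain(base_clf, X, Y, order, rng):
    # order: a permutation of label indices defining this chain.
    models, feats = [], X
    for j in order:
        y = Y[:, j]
        pos = np.flatnonzero(y == 1)
        neg_pool = np.flatnonzero(y == 0)
        neg = rng.choice(neg_pool, size=min(len(pos), len(neg_pool)), replace=False)
        idx = np.concatenate([pos, neg])          # balanced training set
        clf = clone(base_clf).fit(feats[idx], y[idx])
        models.append(clf)
        # Chain: this label's predictions feed the later binary models.
        feats = np.hstack([feats, clf.predict(feats).reshape(-1, 1)])
    return models
```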
Cost-effective Object Detection: Active Sample Mining with Switchable Selection Criteria
Though quite challenging, leveraging large-scale unlabeled or partially
labeled data in learning systems (e.g., model/classifier training) has
attracted increasing attention due to its fundamental importance. To address
this problem, many active learning (AL) methods have been proposed that employ
up-to-date detectors to retrieve representative minority samples according to
predefined confidence or uncertainty thresholds. However, these AL methods
cause the detectors to ignore the remaining majority samples (i.e., those with
low uncertainty or high prediction confidence). In this work, by developing a
principled active sample mining (ASM) framework, we demonstrate that
cost-effectively mining samples from these unlabeled majority data is key to
training more powerful object detectors while minimizing user effort.
Specifically, our ASM framework involves a switchable sample selection
mechanism for determining whether an unlabeled sample should be manually
annotated via AL or automatically pseudo-labeled via a novel self-learning
process. The proposed process is compatible with mini-batch based training
(i.e., using a batch of unlabeled or partially labeled data as a one-time
input) for object detection. In addition, a few samples with low-confidence
predictions are selected and annotated via AL. Notably, our method is suitable
for object categories that are not seen in the unlabeled data during the
learning process. Extensive experiments clearly demonstrate that our ASM
framework can achieve performance comparable to that of alternative methods but
with significantly fewer annotations. Comment: Automatically determining whether an unlabeled sample should be
manually annotated or pseudo-labeled via a novel self-learning process
(Accepted by TNNLS 2018) The source code is available at
http://kezewang.com/codes/ASM_ver1.zi
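The switchable selection mechanism reduces, at its core, to routing each unlabeled sample by the detector's confidence. The thresholds and names below are illustrative, not the paper's values:

```python
def switchable_selection(scored_samples, low_thr=0.2, high_thr=0.9):
    # scored_samples: iterable of (sample, detector confidence) pairs.
    pseudo, manual, deferred = [], [], []
    for sample, confidence in scored_samples:
        if confidence >= high_thr:
            pseudo.append(sample)    # self-learning: auto pseudo-label
        elif confidence <= low_thr:
            manual.append(sample)    # active learning: ask the annotator
        else:
            deferred.append(sample)  # revisit in a later mini-batch
    return pseudo, manual, deferred
```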
An Interpretable Model with Globally Consistent Explanations for Credit Risk
We propose a possible solution to a public challenge posed by the Fair Isaac
Corporation (FICO), which is to provide an explainable model for credit risk
assessment. Rather than present a black box model and explain it afterwards, we
provide a globally interpretable model that is as accurate as other neural
networks. Our "two-layer additive risk model" is decomposable into subscales,
where each node in the second layer represents a meaningful subscale, and all
of the nonlinearities are transparent. We provide three types of explanations
that are simpler than, but consistent with, the global model. One of these
explanation methods involves solving a minimum set cover problem to find
high-support globally-consistent explanations. We present a new online
visualization tool to allow users to explore the global model and its
explanations.
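A hypothetical rendering of such a two-layer additive risk model, with each second-layer node a transparent subscale score: the feature grouping, layer sizes, and sigmoid nonlinearity here are our own assumptions for illustration.

```python
import torch
import torch.nn as nn

class TwoLayerAdditiveRisk(nn.Module):
    def __init__(self, groups):
        # groups: list of feature-index lists, one per meaningful subscale.
        super().__init__()
        self.groups = groups
        self.subscales = nn.ModuleList(nn.Linear(len(g), 1) for g in groups)
        self.combine = nn.Linear(len(groups), 1)  # additive combination

    def forward(self, x):
        # Each subscale is a readable linear score squashed to (0, 1).
        scores = [torch.sigmoid(s(x[:, g])) for s, g in zip(self.subscales, self.groups)]
        return torch.sigmoid(self.combine(torch.cat(scores, dim=1)))
```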
Distilling Knowledge from Deep Networks with Applications to Healthcare Domain
Exponential growth in Electronic Healthcare Records (EHR) has resulted in new
opportunities and urgent needs for discovery of meaningful data-driven
representations and patterns of diseases in Computational Phenotyping research.
Deep Learning models have shown superior performance for robust prediction in
computational phenotyping tasks, but lack the interpretability that is crucial
for clinicians involved in decision-making.
In this paper, we introduce a novel knowledge-distillation approach called
Interpretable Mimic Learning, to learn interpretable phenotype features for
making robust prediction while mimicking the performance of deep learning
models. Our framework uses Gradient Boosting Trees to learn interpretable
features from deep learning models such as Stacked Denoising Autoencoder and
Long Short-Term Memory. Exhaustive experiments on a real-world clinical
time-series dataset show that our method obtains similar or better performance
than the deep learning models, and it provides interpretable phenotypes for
clinical decision making.
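The distillation pipeline itself is short: train the deep model, then fit gradient boosting trees on its soft predictions. The sketch below assumes a regression-style teacher exposing predict(); hyperparameters are illustrative.

```python
from sklearn.ensemble import GradientBoostingRegressor

def mimic_learning(deep_model, X_train, X_test):
    # Teacher's soft predictions become the student's regression targets.
    soft_targets = deep_model.predict(X_train).ravel()
    student = GradientBoostingRegressor(n_estimators=200, max_depth=3)
    student.fit(X_train, soft_targets)            # interpretable student
    return student, student.predict(X_test)
```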
Chittron: An Automatic Bangla Image Captioning System
Automatic image caption generation aims to produce an accurate description of
an image in natural language automatically. However, Bangla, the fifth most
widely spoken language in the world, lags considerably behind in research and
development in this domain. Besides, while many established data sets related
to image annotation exist for English, no such resource exists for
Bangla yet. Hence, this paper outlines the development of "Chittron", an
automatic image captioning system in Bangla. Moreover, to address the data set
availability issue, a collection of 16,000 Bangladeshi contextual images has
been accumulated and manually annotated in Bangla. This data set is then used
to train a model which integrates a pre-trained VGG16 image embedding model
with stacked LSTM layers. The model is trained to predict the caption when the
input is an image, one word at a time. The results show that the model has
successfully been able to learn a working language model and to generate
captions of images quite accurately in many cases. The results are evaluated
mainly qualitatively. However, BLEU scores are also reported. It is expected
that a better result can be obtained with a bigger and more varied data set.
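The described architecture, a pre-trained VGG16 embedding fused with stacked LSTMs that emit one word at a time, might look roughly like the Keras sketch below; layer sizes and the additive fusion are illustrative assumptions.

```python
from tensorflow.keras import applications, layers, models

def build_captioner(vocab_size, max_len, embed_dim=256):
    img_in = layers.Input(shape=(224, 224, 3))
    vgg = applications.VGG16(include_top=False, pooling="avg")
    vgg.trainable = False                                   # frozen embedding
    img_feat = layers.Dense(embed_dim, activation="relu")(vgg(img_in))

    word_in = layers.Input(shape=(max_len,))                # caption so far
    seq = layers.Embedding(vocab_size, embed_dim)(word_in)
    seq = layers.LSTM(256, return_sequences=True)(seq)      # stacked LSTMs
    seq = layers.LSTM(256)(seq)

    merged = layers.add([img_feat, seq])
    next_word = layers.Dense(vocab_size, activation="softmax")(merged)
    return models.Model([img_in, word_in], next_word)
```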