25 research outputs found
Classification Uncertainty of Deep Neural Networks Based on Gradient Information
We study the quantification of uncertainty of Convolutional Neural Networks
(CNNs) based on gradient metrics. Unlike the classical softmax entropy, such
metrics gather information from all layers of the CNN. We show for the EMNIST
digits data set that for several such metrics we achieve the same meta
classification accuracy -- i.e. the task of classifying predictions as correct
or incorrect without knowing the actual label -- as for entropy thresholding.
We apply meta classification to unknown concepts (out-of-distribution samples)
-- EMNIST/Omniglot letters, CIFAR10 and noise -- and demonstrate that meta
classification rates for unknown concepts can be increased when using entropy
together with several gradient based metrics as input quantities for a meta
classifier. Meta classifiers only trained on the uncertainty metrics of known
concepts, i.e. EMNIST digits, usually do not perform equally well for all
unknown concepts. If we however allow the meta classifier to be trained on
uncertainty metrics for some out-of-distribution samples, meta classification
for concepts remote from EMNIST digits (then termed known unknowns) can be
improved considerably
Towards a Simple Approach to Multi-step Model-based Reinforcement Learning
When environmental interaction is expensive, model-based reinforcement
learning offers a solution by planning ahead and avoiding costly mistakes.
Model-based agents typically learn a single-step transition model. In this
paper, we propose a multi-step model that predicts the outcome of an action
sequence with variable length. We show that this model is easy to learn, and
that the model can make policy-conditional predictions. We report preliminary
results that show a clear advantage for the multi-step model compared to its
one-step counterpart
Uncertainty Measures and Prediction Quality Rating for the Semantic Segmentation of Nested Multi Resolution Street Scene Images
In the semantic segmentation of street scenes the reliability of the
prediction and therefore uncertainty measures are of highest interest. We
present a method that generates for each input image a hierarchy of nested
crops around the image center and presents these, all re-scaled to the same
size, to a neural network for semantic segmentation. The resulting softmax
outputs are then post processed such that we can investigate mean and variance
over all image crops as well as mean and variance of uncertainty heat maps
obtained from pixel-wise uncertainty measures, like the entropy, applied to
each crop's softmax output. In our tests, we use the publicly available
DeepLabv3+ MobilenetV2 network (trained on the Cityscapes dataset) and
demonstrate that the incorporation of crops improves the quality of the
prediction and that we obtain more reliable uncertainty measures. These are
then aggregated over predicted segments for either classifying between IoU=0
and IoU>0 (meta classification) or predicting the IoU via linear regression
(meta regression). The latter yields reliable performance estimates for
segmentation networks, in particular useful in the absence of ground truth. For
the task of meta classification we obtain a classification accuracy of
and an AUROC of . For meta regression we obtain an
value of . These results yield significant improvements compared to
other approaches
Prediction Error Meta Classification in Semantic Segmentation: Detection via Aggregated Dispersion Measures of Softmax Probabilities
We present a method that "meta" classifies whether seg-ments predicted by a
semantic segmentation neural networkintersect with the ground truth. For this
purpose, we employ measures of dispersion for predicted pixel-wise class
probability distributions, like classification entropy, that yield heat maps of
the input scene's size. We aggregate these dispersion measures segment-wise and
derive metrics that are well-correlated with the segment-wise IoU of prediction
and ground truth. This procedure yields an almost plug and play post-processing
tool to rate the prediction quality of semantic segmentation networks on
segment level. This is especially relevant for monitoring neural networks in
online applications like automated driving or medical imaging where reliability
is of utmost importance. In our tests, we use publicly available
state-of-the-art networks trained on the Cityscapes dataset and the BraTS2017
dataset and analyze the predictive power of different metrics as well as
different sets of metrics. To this end, we compute logistic LASSO regression
fits for the task of classifying IoU=0 vs. IoU>0 per segment and obtain AUROC
values of up to 91.55%. We complement these tests with linear regression fits
to predict the segment-wise IoU and obtain prediction standard deviations of
down to 0.130 as well as values of up to 84.15%. We show that these
results clearly outperform standard approaches
The NiuTrans End-to-End Speech Translation System for IWSLT 2021 Offline Task
This paper describes the submission of the NiuTrans end-to-end speech
translation system for the IWSLT 2021 offline task, which translates from the
English audio to German text directly without intermediate transcription. We
use the Transformer-based model architecture and enhance it by Conformer,
relative position encoding, and stacked acoustic and textual encoding. To
augment the training data, the English transcriptions are translated to German
translations. Finally, we employ ensemble decoding to integrate the predictions
from several models trained with the different datasets. Combining these
techniques, we achieve 33.84 BLEU points on the MuST-C En-De test set, which
shows the enormous potential of the end-to-end model.Comment: IWSLT 202
Non-Markovian Control with Gated End-to-End Memory Policy Networks
Partially observable environments present an important open challenge in the
domain of sequential control learning with delayed rewards. Despite numerous
attempts during the two last decades, the majority of reinforcement learning
algorithms and associated approximate models, applied to this context, still
assume Markovian state transitions. In this paper, we explore the use of a
recently proposed attention-based model, the Gated End-to-End Memory Network,
for sequential control. We call the resulting model the Gated End-to-End Memory
Policy Network. More precisely, we use a model-free value-based algorithm to
learn policies for partially observed domains using this memory-enhanced neural
network. This model is end-to-end learnable and it features unbounded memory.
Indeed, because of its attention mechanism and associated non-parametric
memory, the proposed model allows us to define an attention mechanism over the
observation stream unlike recurrent models. We show encouraging results that
illustrate the capability of our attention-based model in the context of the
continuous-state non-stationary control problem of stock trading. We also
present an OpenAI Gym environment for simulated stock exchange and explain its
relevance as a benchmark for the field of non-Markovian decision process
learning.Comment: 11 pages, 1 figure, 1 tabl
Improving LSTM-CTC based ASR performance in domains with limited training data
This paper addresses the observed performance gap between automatic speech
recognition (ASR) systems based on Long Short Term Memory (LSTM) neural
networks trained with the connectionist temporal classification (CTC) loss
function and systems based on hybrid Deep Neural Networks (DNNs) trained with
the cross entropy (CE) loss function on domains with limited data. We step
through a number of experiments that show incremental improvements on a
baseline EESEN toolkit based LSTM-CTC ASR system trained on the Librispeech
100hr (train-clean-100) corpus. Our results show that with effective
combination of data augmentation and regularization, a LSTM-CTC based system
can exceed the performance of a strong Kaldi based baseline trained on the same
data.Comment: 13 pages Revised Figure
Incremental Learning of Discrete Planning Domains from Continuous Perceptions
We propose a framework for learning discrete deterministic planning domains.
In this framework, an agent learns the domain by observing the action effects
through continuous features that describe the state of the environment after
the execution of each action. Besides, the agent learns its perception
function, i.e., a probabilistic mapping between state variables and sensor data
represented as a vector of continuous random variables called perception
variables. We define an algorithm that updates the planning domain and the
perception function by (i) introducing new states, either by extending the
possible values of state variables, or by weakening their constraints; (ii)
adapts the perception function to fit the observed data (iii) adapts the
transition function on the basis of the executed actions and the effects
observed via the perception function. The framework is able to deal with
exogenous events that happen in the environment.Comment: Corrected lines 12 and 19 of algorithm 1: AL
Incremental learning abstract discrete planning domains and mappings to continuous perceptions
Most of the works on planning and learning, e.g., planning by (model based)
reinforcement learning, are based on two main assumptions: (i) the set of
states of the planning domain is fixed; (ii) the mapping between the
observations from the real word and the states is implicitly assumed or learned
offline, and it is not part of the planning domain. Consequently, the focus is
on learning the transitions between states. In this paper, we drop such
assumptions. We provide a formal framework in which (i) the agent can learn
dynamically new states of the planning domain; (ii) the mapping between
abstract states and the perception from the real world, represented by
continuous variables, is part of the planning domain; (iii) such mapping is
learned and updated along the "life" of the agent. We define an algorithm that
interleaves planning, acting, and learning, and allows the agent to update the
planning domain depending on how much it trusts the model w.r.t. the new
experiences learned by executing actions. We define a measure of coherence
between the planning domain and the real world as perceived by the agent. We
test our approach showing that the agent learns increasingly coherent models,
and that the system can scale to deal with models with an order of
states
The Utility of Abstaining in Binary Classification
We explore the problem of binary classification in machine learning, with a
twist - the classifier is allowed to abstain on any datum, professing ignorance
about the true class label without committing to any prediction. This is
directly motivated by applications like medical diagnosis and fraud risk
assessment, in which incorrect predictions have potentially calamitous
consequences. We focus on a recent spate of theoretically driven work in this
area that characterizes how allowing abstentions can lead to fewer errors in
very general settings. Two areas are highlighted: the surprising possibility of
zero-error learning, and the fundamental tradeoff between predicting
sufficiently often and avoiding incorrect predictions. We review efficient
algorithms with provable guarantees for each of these areas. We also discuss
connections to other scenarios, notably active learning, as they suggest
promising directions of further inquiry in this emerging field.Comment: Short surve