DOS: Diverse Outlier Sampling for Out-of-Distribution Detection
Modern neural networks are known to give overconfident predictions for
out-of-distribution (OOD) inputs when deployed in the open world. It is common
practice to leverage a surrogate outlier dataset to regularize the model during
training, and recent studies emphasize the role of uncertainty in designing the
sampling strategy for the outlier dataset. However, OOD samples selected solely
based on predictive uncertainty can be biased towards certain types, which may
fail to capture the full outlier distribution. In this work, we empirically
show that diversity in the sampled outliers is critical to OOD detection
performance. Motivated by this observation, we propose a straightforward and
novel sampling strategy named DOS (Diverse Outlier Sampling) to select diverse
and informative outliers. Specifically, we cluster the normalized features at
each iteration, and the most informative outlier from each cluster is selected
for model training with absent category loss. With DOS, the sampled outliers
efficiently shape a globally compact decision boundary between ID and OOD data.
Extensive experiments demonstrate the superiority of DOS, reducing the average
FPR95 by up to 25.79% on CIFAR-100 with TI-300K.
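The sampling step described above (cluster the normalized features, then keep the most informative outlier from each cluster) can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: plain k-means is assumed as the clustering algorithm, `scores` is a hypothetical informativeness value supplied by the caller (DOS derives its own criterion from the model being trained), and the absent-category loss used for training is not shown.

```python
import numpy as np


def l2_normalize(feats: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale every feature vector (row) to unit L2 norm."""
    norms = np.linalg.norm(feats, axis=1, keepdims=True)
    return feats / np.maximum(norms, eps)


def kmeans_labels(x: np.ndarray, k: int, n_iter: int = 50, seed: int = 0) -> np.ndarray:
    """Plain Lloyd's k-means; returns one cluster label per row of x."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]  # copy of k distinct rows
    labels = np.zeros(len(x), dtype=int)
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every center.
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = x[labels == j]
            if len(members) > 0:  # an empty cluster keeps its previous center
                centers[j] = members.mean(axis=0)
    return labels


def diverse_outlier_sample(feats: np.ndarray, scores: np.ndarray, k: int, seed: int = 0) -> np.ndarray:
    """Cluster normalized outlier features, then keep the highest-scoring outlier
    of each non-empty cluster. `scores` is a hypothetical informativeness score
    (higher = more informative)."""
    labels = kmeans_labels(l2_normalize(feats), k, seed=seed)
    picked = []
    for j in range(k):
        idx = np.flatnonzero(labels == j)
        if idx.size > 0:
            picked.append(int(idx[np.argmax(scores[idx])]))
    return np.array(picked, dtype=int)


# Usage: 200 random candidate outliers with 16-d features; select at most 8.
rng = np.random.default_rng(42)
candidates = rng.normal(size=(200, 16))
informativeness = rng.random(200)
batch = diverse_outlier_sample(candidates, informativeness, k=8)
print(batch)
```

Picking one sample per cluster is what enforces diversity: a purely uncertainty-ranked selection could take all eight outliers from the same region of feature space.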
Robust Uncertainty Estimation for Classification of Maritime Objects
We explore the use of uncertainty estimation in the maritime domain, showing
its efficacy on a toy dataset (CIFAR10) and validating it on an in-house
dataset, SHIPS. We present a method that combines the intra-class uncertainty
obtained with Monte Carlo Dropout with recent findings in the field of outlier
detection to obtain more holistic uncertainty measures. We explore the relationship between
the introduced uncertainty measures and examine how well they work on CIFAR10
and in a real-life setting. Our work improves the FPR95 by 8% compared to the
current highest-performing work when the models are trained without
out-of-distribution data. We increase the performance by 77% compared to a
vanilla implementation of the Wide ResNet. We release the SHIPS dataset and
show the effectiveness of our method by improving the FPR95 by 44.2% with
respect to the baseline. Our approach is model agnostic, easy to implement, and
often does not require model retraining.
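FPR95, the metric reported here and in the DOS abstract, is the false positive rate on OOD inputs at the threshold where 95% of in-distribution inputs are correctly accepted; lower is better. A minimal sketch, assuming that higher scores mean "more in-distribution" and using NumPy's default percentile interpolation (evaluation code in the literature may handle ties and interpolation slightly differently):

```python
import numpy as np


def fpr_at_95_tpr(id_scores: np.ndarray, ood_scores: np.ndarray) -> float:
    """FPR95: fraction of OOD samples still accepted as in-distribution when the
    threshold is set so that 95% of in-distribution samples are accepted.

    Assumed convention: a higher score means "more in-distribution".
    """
    # 5th percentile of the ID scores: about 95% of ID scores lie at or above it.
    threshold = np.percentile(id_scores, 5)
    return float(np.mean(ood_scores >= threshold))


id_scores = np.arange(100.0)          # toy in-distribution scores 0..99
ood_scores = np.arange(-50.0, 50.0)   # toy OOD scores -50..49
print(fpr_at_95_tpr(id_scores, ood_scores))  # 0.45
```

Here the threshold is 4.95, so the OOD scores 5 through 49 (45 of 100) are wrongly accepted.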
SCOUT: Self-aware Discriminant Counterfactual Explanations
The problem of counterfactual visual explanations is considered. A new family
of discriminant explanations is introduced. These produce heatmaps that
attribute high scores to image regions informative of a classifier prediction
but not of a counter class. They connect attributive explanations, which are
based on a single heat map, to counterfactual explanations, which account for
both predicted class and counter class. The latter are shown to be computable
by combining two discriminant explanations with reversed class pairs. It
is argued that self-awareness, namely the ability to produce classification
confidence scores, is important for the computation of discriminant
explanations, which seek to identify regions where it is easy to discriminate
between prediction and counter class. This suggests computing discriminant
explanations by combining three attribution maps. The
resulting counterfactual explanations are optimization free and thus much
faster than previous methods. To address the difficulty of their evaluation, a
proxy task and set of quantitative metrics are also proposed. Experiments under
this protocol show that the proposed counterfactual explanations outperform the
state of the art while achieving much higher speeds on popular networks. In a
human-learning machine teaching experiment, they are also shown to improve mean
student accuracy from chance level to 95%. Comment: Accepted to CVPR202
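The map combination described above needs no optimization, only elementwise operations, which is why such explanations are fast. The sketch below illustrates this under loud assumptions: the three maps (predicted class, counter class, classifier confidence) are taken as given non-negative arrays, and they are combined as a product in which the counter-class map is complemented. The paper's exact formulation may differ, and all function names are hypothetical.

```python
import numpy as np


def normalize_map(a: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Clip negatives and rescale a non-negative attribution map to [0, 1]."""
    a = np.clip(a, 0.0, None)
    return a / max(float(a.max()), eps)


def discriminant_map(attr_pred, attr_counter, attr_conf):
    """High where a region supports the predicted class, does NOT support the
    counter class, and the classifier is confident (assumed product form)."""
    return (normalize_map(attr_pred)
            * (1.0 - normalize_map(attr_counter))
            * normalize_map(attr_conf))


def counterfactual_pair(attr_a, attr_b, attr_conf):
    """Two discriminant maps with the class pair reversed: (a vs. b, b vs. a)."""
    return (discriminant_map(attr_a, attr_b, attr_conf),
            discriminant_map(attr_b, attr_a, attr_conf))


# Toy 2x2 maps: class A lives top-left, class B bottom-right, uniform confidence.
a = np.array([[1.0, 0.0], [0.0, 0.0]])
b = np.array([[0.0, 0.0], [0.0, 2.0]])
conf = np.ones((2, 2))
a_vs_b, b_vs_a = counterfactual_pair(a, b, conf)
print(a_vs_b)
print(b_vs_a)
```

In this toy case the "A vs. B" map is non-zero only at the top-left cell and the "B vs. A" map only at the bottom-right cell, so the pair highlights where each class can be told apart from the other.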
Optimal Parameter and Neuron Pruning for Out-of-Distribution Detection
For a machine learning model deployed in real-world scenarios, the ability to
detect out-of-distribution (OOD) samples is both indispensable and challenging.
Most existing OOD detection methods focus on advanced training techniques or
training-free tricks that prevent the model from yielding overconfident scores
for unknown samples. Training-based methods incur expensive training costs and
rely on OOD samples, which are not always available, while most training-free
methods cannot efficiently exploit the prior information in the training data.
In this work, we propose an Optimal Parameter and Neuron Pruning (OPNP)
approach, which aims to identify and remove the parameters and neurons that
lead to over-fitting. The method is divided into two steps. In the first step,
we evaluate the sensitivity of the model parameters and neurons by averaging
gradients over all training samples. In the second step, the parameters and
neurons with exceptionally large or close-to-zero sensitivities are removed for
prediction. Our proposal is training-free, compatible with other post-hoc
methods, and exploits information from all training data. Extensive
experiments are performed on multiple OOD detection tasks and model
architectures, showing that OPNP consistently outperforms existing methods by
a large margin. Comment: Accepted by NeurIPS 2023. 19 pages
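The two steps above can be sketched for parameters alone. This is a minimal illustration under stated assumptions, not the OPNP implementation: per-sample gradients are taken as a precomputed array, sensitivity is assumed to be the mean absolute gradient, and the "near zero" and "exceptionally large" cut-offs are hypothetical quantiles. Neuron pruning works analogously on per-neuron sensitivities.

```python
import numpy as np


def parameter_sensitivity(per_sample_grads: np.ndarray) -> np.ndarray:
    """Average gradient magnitude per parameter over all training samples.

    per_sample_grads has shape (n_samples, n_params). Taking absolute values
    before averaging is an assumption of this sketch.
    """
    return np.abs(per_sample_grads).mean(axis=0)


def keep_mask(sens: np.ndarray, low_q: float = 0.1, high_q: float = 0.9) -> np.ndarray:
    """True for parameters to keep; False for near-zero or exceptionally large
    sensitivities. low_q and high_q are hypothetical quantile cut-offs."""
    lo, hi = np.quantile(sens, [low_q, high_q])
    return (sens > lo) & (sens < hi)


# Usage: toy gradients for 4 training samples and 10 parameters.
rng = np.random.default_rng(0)
grads = rng.normal(size=(4, 10)) * np.linspace(0.0, 2.0, 10)  # parameter 0 never gets a gradient
weights = rng.normal(size=10)
mask = keep_mask(parameter_sensitivity(grads))
pruned_weights = np.where(mask, weights, 0.0)  # applied at prediction time; no retraining
```

Because only gradients of the already-trained model are needed, the procedure stays training-free, consistent with the claim above.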
Data Optimization in Deep Learning: A Survey
Large-scale, high-quality data are considered an essential factor for the
successful application of many deep learning techniques. Meanwhile, numerous
real-world deep learning tasks still have to contend with a lack of sufficient
high-quality data. Additionally, issues such as model robustness, fairness,
and trustworthiness are closely related to the training data. Consequently, a
large number of studies in the existing literature have focused on the data
aspect of deep learning tasks. Typical data optimization techniques include
data augmentation, logit perturbation, sample weighting, and data
condensation. These techniques usually come from different deep learning
subfields, and their theoretical inspirations or heuristic motivations may
seem unrelated to each other. This study aims to organize a wide range of
existing data optimization methodologies for deep learning from the previous
literature and constructs a comprehensive taxonomy for them. The taxonomy
considers diverse split dimensions, and deep sub-taxonomies are constructed
for each dimension. On the basis of the taxonomy, connections among the many
data optimization methods for deep learning are established in terms of four
aspects. We also outline several promising future directions. The constructed
taxonomy and the revealed connections are intended to aid the understanding
of existing methods and the design of novel data optimization techniques.
Furthermore, our aspiration for this survey is to promote data optimization as
an independent subdivision of deep learning. A curated, up-to-date list of
resources related to data optimization in deep learning is available at
https://github.com/YaoRujing/Data-Optimization
Supervised Learning Under Constraints
As supervised learning occupies an ever larger place in our everyday lives, it increasingly faces constrained settings. Dealing with those constraints is key to fostering new progress in the field and pushing the limits of machine learning ever further, a likely necessary step toward artificial general intelligence.
Supervised learning is an inductive paradigm in which time and data are refined into knowledge, in the form of predictive models. Admittedly, these models can be opaque, memory-demanding, and energy-consuming. In this setting, a constraint can mean any number of things. Essentially, a constraint is anything that stands in the way of supervised learning, be it a lack of time, memory, data, or understanding.
Additionally, the scope of applicability of supervised learning is so vast that it can appear daunting. It is useful in areas such as medical analysis and autonomous driving, areas for which strong guarantees are required.
All those constraints (time, memory, data, interpretability, reliability) may conflict to some extent with the traditional goal of supervised learning. In that case, the right balance between the constraints and the standard objective is problem-dependent, which calls for generic solutions that can be adapted to each problem. Alternatively, concerns may arise after learning, in which case solutions must be developed under sub-optimal conditions and the constraints add up. One example of such a situation is trying to enforce reliability once the data is no longer available.
After detailing the background in which this thesis is situated (what supervised learning is and why it is difficult, which algorithms will be used, and where it fits in the broader scope of knowledge), we discuss four different scenarios.
The first scenario is about learning a good decision forest model of limited size without first learning a large model and then compressing it. To that end, we have developed the Globally Induced Forest (GIF) algorithm, which mixes local and global optimizations to produce accurate predictions under memory constraints in reasonable time. More specifically, the global part makes it possible to sidestep the redundancy inherent in traditional decision forests.
It is shown that the proposed method is more than competitive with standard tree-based ensembles under corresponding constraints, and can sometimes even surpass much larger models.
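To make the local/global interplay concrete, here is a toy regression sketch in the spirit of GIF; it is an illustration under loud assumptions, not the thesis's algorithm. It assumes squared loss, Extra-Trees-style random thresholds for the local split choice, a shrinkage factor `lr`, and a budget counted in added nodes; all names and defaults are hypothetical. Candidate leaves from every tree compete for the next split, and the winner is whichever split most reduces the error of the whole forest. Only training predictions are tracked, so the structure needed to predict new samples is not stored.

```python
import numpy as np


def fit_gif_sketch(X, y, n_trees=4, budget=20, n_candidates=5, lr=0.5, seed=0):
    """Toy, GIF-inspired forest for squared-loss regression (training set only).

    All trees start as a root holding every sample. At each step a few candidate
    leaves, drawn from any tree, receive a random split (local part); the split
    whose children most reduce the global squared error of the whole forest is
    kept (global part). Returns training predictions and the number of nodes added.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    pred = np.full(n, float(np.mean(y)))             # start from a constant model
    leaves = [np.arange(n) for _ in range(n_trees)]  # each leaf = its sample indices
    added = 0
    while added + 2 <= budget and leaves:
        resid = y - pred                             # residuals of the current forest
        best = None
        picks = rng.choice(len(leaves), size=min(n_candidates, len(leaves)), replace=False)
        for c in picks:
            idx = leaves[c]
            f = rng.integers(d)
            lo, hi = X[idx, f].min(), X[idx, f].max()
            if lo == hi:
                continue                             # cannot split on a constant feature
            t = rng.uniform(lo, hi)                  # random threshold in [lo, hi)
            left, right = idx[X[idx, f] <= t], idx[X[idx, f] > t]
            # Under squared loss the best child weight is its mean residual, and
            # adding it lowers the global error by n_child * mean_residual**2.
            gain = sum(len(s) * resid[s].mean() ** 2 for s in (left, right))
            if best is None or gain > best[0]:
                best = (gain, c, left, right)
        if best is None:
            break
        _, c, left, right = best
        for s in (left, right):
            pred[s] += lr * resid[s].mean()          # shrunken child weight
        leaves.pop(c)
        leaves.extend([left, right])
        added += 2
    return pred, added


# Usage: a noisy 1-d signal hidden among 3 features, with a 30-node budget.
rng = np.random.default_rng(1)
X = rng.random((200, 3))
y = np.sin(6 * X[:, 0]) + 0.1 * rng.normal(size=200)
pred, added = fit_gif_sketch(X, y, budget=30)
print(added, np.mean((y - pred) ** 2), np.var(y))
```

Because each accepted split is chosen by its effect on the whole forest rather than on its own tree, nodes that would merely duplicate what other trees already capture bring little gain and are not selected, which is the redundancy the global part sidesteps.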
The second scenario corresponds to the example given above: trying to enforce reliability without data. More specifically, the focus is on out-of-distribution (OOD) detection: recognizing samples that do not come from the distribution the model was learned from. Tackling this problem with a complete lack of data is challenging. Our investigation focuses on image classification with convolutional neural networks. We propose indicators that can be computed alongside the prediction at little additional cost. These indicators prove useful, stable, and complementary for OOD detection. We also introduce a surprisingly simple yet effective summary indicator, shown to perform well across several networks and datasets. It can easily be tuned further as soon as samples become available. Overall, interesting results can be reached in all but the most severe settings, for which a data-free solution was doubtful a priori.
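As an illustration of indicators that come almost for free with the prediction, the sketch below computes three standard logit-based scores. These are common baselines from the OOD literature, assumed here for illustration only; the thesis's own indicators and its summary indicator are not specified in this abstract and are not reproduced.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    z = logits - logits.max(axis=1, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def cheap_ood_indicators(logits: np.ndarray) -> dict:
    """Per-sample indicators computed from logits the classifier already produces.
    For all three, higher values mean "more in-distribution"."""
    p = softmax(logits)
    return {
        "max_softmax": p.max(axis=1),
        "neg_entropy": (p * np.log(np.clip(p, 1e-12, None))).sum(axis=1),
        "max_logit": logits.max(axis=1),
    }


logits = np.array([[10.0, 0.0, 0.0],   # confident prediction
                   [0.0, 0.0, 0.0]])   # maximally uncertain prediction
scores = cheap_ood_indicators(logits)
print(scores)
```

None of these scores needs OOD samples or extra forward passes, which is the data-free, low-cost setting described above.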
The third scenario relates to transferring the knowledge of a large model to a smaller one in the absence of data. To do so, we propose to leverage a collection of unlabeled data, which is easy to obtain in domains such as image classification. Two schemes are proposed (and then analyzed) to achieve an optimal transfer. First, we propose a biasing mechanism in the choice of the unlabeled data, so that the focus is on the most relevant samples. Second, we design a teaching mechanism, applicable to almost all pairs of large and small networks, which allows for much better knowledge transfer between the networks. Overall, good results are obtainable in reasonable time, provided the collection actually contains relevant samples.
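Transfer on unlabeled data needs a label-free objective. The sketch below uses the standard temperature-scaled distillation loss (KL divergence between softened teacher and student outputs), which only requires the teacher's outputs on the unlabeled inputs. It is an assumed stand-in for illustration; the thesis's biasing and teaching mechanisms are not reproduced, and the temperature `T` is a hypothetical default.

```python
import numpy as np


def softmax(z: np.ndarray, T: float = 1.0) -> np.ndarray:
    z = z / T
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def distillation_loss(teacher_logits: np.ndarray, student_logits: np.ndarray, T: float = 4.0) -> float:
    """Mean KL(teacher || student) on temperature-softened outputs, scaled by T**2.
    No labels are involved, so it can be computed on an unlabeled transfer set."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    kl = (p * (np.log(np.clip(p, 1e-12, None)) - np.log(np.clip(q, 1e-12, None)))).sum(axis=1)
    return float(T ** 2 * kl.mean())


teacher = np.array([[4.0, 1.0, 0.0], [0.0, 3.0, 1.0]])  # logits on two unlabeled images
student = np.array([[2.0, 1.0, 0.5], [0.5, 1.0, 0.5]])
print(distillation_loss(teacher, student))
```

Minimizing this loss over the student's parameters pushes its softened outputs toward the teacher's; which unlabeled samples enter the average is exactly where a biasing mechanism such as the one described above would act.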
The fourth scenario tackles the problem of interpretability: what knowledge can be gleaned, more or less indirectly, from data. We discuss two subproblems. The first is to show that GIFs (see above) can be used to derive intrinsically interpretable models. The second consists of a comparative study of methods and model types (namely decision forests and neural networks) for the specific purpose of quantifying how important each variable is in a given problem. After a preliminary study on benchmark datasets, the analysis turns to a concrete biological problem: inferring gene regulatory networks from data. An ambivalent conclusion is reached: neural networks can be made to predict better than decision forests in almost all instances, but they struggle to identify the relevant variables in some situations. It seems that better, well-motivated methods need to be proposed for neural networks, especially for highly non-linear problems.
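Permutation importance is one common, model-agnostic way to quantify variable importance, applicable alike to decision forests and neural networks. The abstract does not say which importance methods were compared, so this sketch is an illustrative assumption: `predict` is any fitted model's prediction function, the error is mean squared error, and `n_repeats` is a hypothetical default.

```python
import numpy as np


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Importance of feature j = average increase in MSE when column j is shuffled."""
    rng = np.random.default_rng(seed)
    base = np.mean((predict(X) - y) ** 2)
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])  # break the link between feature j and y
            scores[j] += np.mean((predict(Xp) - y) ** 2) - base
    return scores / n_repeats


def toy_model(A):
    """Stand-in for a trained forest or network that only uses feature 0."""
    return 3.0 * A[:, 0]


rng = np.random.default_rng(0)
X = rng.random((500, 3))
y = 3.0 * X[:, 0]
print(permutation_importance(toy_model, X, y))
```

In a gene regulatory network setting, one would typically treat each gene in turn as the target and rank the other genes by such importances; a model can predict well yet spread importance over irrelevant inputs, which is one way the prediction/identification gap noted above can show up.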