Explainable and Advisable Learning for Self-driving Vehicles
Deep neural perception and control networks are likely to be a key component of self-driving vehicles. These models need to be explainable - they should provide easy-to-interpret rationales for their behavior - so that passengers, insurance companies, law enforcement, developers, etc., can understand what triggered a particular behavior. Explanations may be triggered by the neural controller itself (introspective explanations) or informed by the neural controller's output (rationalizations). Our work has focused on the challenge of generating introspective explanations of deep models for self-driving vehicles. In Chapter 3, we begin by exploring the use of visual explanations. These explanations take the form of real-time highlighted regions of an image that causally influence the network's output (steering control). In the first stage, we use a visual attention model to train a convolutional network end-to-end from images to steering angle. The attention model highlights image regions that potentially influence the network's output. Some of these are true influences, but some are spurious. We then apply a causal filtering step to determine which input regions actually influence the output. This produces more succinct visual explanations and more accurately exposes the network's behavior. In Chapter 4, we add an attention-based video-to-text model to produce textual explanations of model actions, e.g. "the car slows down because the road is wet". The attention maps of the controller and the explanation model are aligned so that explanations are grounded in the parts of the scene that mattered to the controller. We explore two approaches to attention alignment: strong and weak alignment. These explainable systems represent an externalization of tacit knowledge. The network's opaque reasoning is simplified to a situation-specific dependence on a visible object in the image. This makes them brittle and potentially unsafe in situations that do not match training data. In Chapter 5, we propose to address this issue by augmenting training data with natural language advice from a human. Advice includes guidance about what to do and where to attend. We present the first step toward advice-giving, where we train an end-to-end vehicle controller that accepts advice. The controller adapts both the way it attends to the scene (visual attention) and its control outputs (steering and speed). Further, in Chapter 6, we propose a new approach that learns vehicle control with the help of long-term (global) human advice. Specifically, our system learns to summarize its visual observations in natural language, predict an appropriate action response (e.g. "I see a pedestrian crossing, so I stop"), and predict the controls accordingly.
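As a rough illustration of the attention-based controller this abstract describes, the sketch below weights a CNN feature grid with a learned spatial attention map and regresses a steering angle from the attended features; the returned attention map is what would be inspected (and causally filtered) as a visual explanation. The toy encoder, module names, and dimensions are illustrative assumptions, not the dissertation's architecture.

```python
# Hypothetical sketch of a spatially-attentive steering controller.
import torch
import torch.nn as nn

class AttentionController(nn.Module):
    def __init__(self, feat_dim=64):
        super().__init__()
        # small conv encoder producing a spatial grid of features (assumed sizes)
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, feat_dim, 5, stride=2), nn.ReLU(),
        )
        self.attn = nn.Conv2d(feat_dim, 1, 1)  # one attention logit per grid cell
        self.head = nn.Linear(feat_dim, 1)     # steering-angle regressor

    def forward(self, img):
        f = self.encoder(img)                               # (B, C, H, W)
        b, c, h, w = f.shape
        a = torch.softmax(self.attn(f).view(b, -1), dim=1)  # normalized attention
        ctx = (f.view(b, c, -1) * a.unsqueeze(1)).sum(-1)   # attention-weighted context
        return self.head(ctx), a.view(b, h, w)              # steering + inspectable map

steering, attn_map = AttentionController()(torch.randn(2, 3, 80, 160))
```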
Learning Support and Trivial Prototypes for Interpretable Image Classification
Prototypical part network (ProtoPNet) methods have been designed to achieve
interpretable classification by associating predictions with a set of training
prototypes, which we refer to as trivial prototypes because they are trained to
lie far from the classification boundary in the feature space. Note that it is
possible to make an analogy between ProtoPNet and support vector machine (SVM)
given that the classification from both methods relies on computing similarity
with a set of training points (i.e., trivial prototypes in ProtoPNet, and
support vectors in SVM). However, while trivial prototypes are located far from
the classification boundary, support vectors are located close to this
boundary, and we argue that this discrepancy with the well-established SVM
theory can result in ProtoPNet models with inferior classification accuracy. In
this paper, we aim to improve the classification of ProtoPNet with a new method
to learn support prototypes that lie near the classification boundary in the
feature space, as suggested by the SVM theory. In addition, we target the
improvement of classification results with a new model, named ST-ProtoPNet,
which exploits our support prototypes and the trivial prototypes to provide
more effective classification. Experimental results on CUB-200-2011, Stanford
Cars, and Stanford Dogs datasets demonstrate that ST-ProtoPNet achieves
state-of-the-art classification accuracy and interpretability results. We also
show that the proposed support prototypes tend to be better localised in the
object of interest rather than in the background region.
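For readers unfamiliar with the ProtoPNet family, the sketch below shows the similarity computation the analogy to SVMs rests on: each class logit is a weighted sum of similarities between local patch features and learned prototype vectors. ST-ProtoPNet, as described, would run this with both a support and a trivial prototype bank and combine the logits. The log-activation follows the original ProtoPNet formulation; the function name and shapes are assumptions for illustration.

```python
# Hypothetical sketch of prototype-based classification (ProtoPNet-style).
import torch

def prototype_logits(feat, prototypes, proto_class):
    """feat: (B, C, H, W) patch features; prototypes: (P, C) learned vectors;
    proto_class: (P, K) prototype-to-class weights."""
    b, c, h, w = feat.shape
    patches = feat.permute(0, 2, 3, 1).reshape(b, h * w, c)              # (B, HW, C)
    d = torch.cdist(patches, prototypes.unsqueeze(0).expand(b, -1, -1))  # (B, HW, P)
    dmin = d.min(dim=1).values              # each prototype's distance to its nearest patch
    sim = torch.log((dmin ** 2 + 1) / (dmin ** 2 + 1e-4))  # ProtoPNet's log-activation
    return sim @ proto_class                # (B, K) class logits

logits = prototype_logits(torch.randn(2, 64, 7, 7), torch.randn(10, 64), torch.randn(10, 5))
```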
I saw, I conceived, I concluded: Progressive Concepts as Bottlenecks
Concept bottleneck models (CBMs) include a bottleneck of human-interpretable
concepts providing explainability and intervention during inference by
correcting the predicted, intermediate concepts. This makes CBMs attractive for
high-stakes decision-making. In this paper, we take the quality assessment of
fetal ultrasound scans as a real-life use case for CBM decision support in
healthcare. For this case, simple binary concepts are not sufficiently
reliable, as they are mapped directly from images of highly variable quality,
for which variable model calibration might lead to unstable binarized concepts.
Moreover, scalar concepts do not provide the intuitive spatial feedback
requested by users.
To address this, we design a hierarchical CBM imitating the sequential expert
decision-making process of "seeing", "conceiving" and "concluding". Our model
first passes through a layer of visual, segmentation-based concepts, and next a
second layer of property concepts directly associated with the decision-making
task. We note that experts can intervene on both the visual and property
concepts during inference. Additionally, we increase the bottleneck capacity by
considering task-relevant concept interaction.
Our application of ultrasound scan quality assessment is challenging, as it
relies on balancing the (often poor) image quality against an assessment of the
visibility and geometric properties of standardized image content. Our
validation shows that, in contrast with previous CBM models, our CBMs
actually outperform equivalent concept-free models in terms of predictive
performance. Moreover, we illustrate how interventions can further improve
performance beyond the state of the art.
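A minimal sketch of the two-stage bottleneck described above, assuming sigmoid concept scores and a toy linear backbone (both assumptions, not the paper's architecture): the image is mapped to visual concepts ("seeing"), those to property concepts ("conceiving"), and only concepts reach the classifier ("concluding"). An expert intervention simply overwrites a predicted concept vector at either stage.

```python
# Hypothetical sketch of a hierarchical concept bottleneck with interventions.
import torch
import torch.nn as nn

class HierarchicalCBM(nn.Module):
    def __init__(self, n_visual=8, n_property=4, n_classes=3):
        super().__init__()
        self.backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 128), nn.ReLU())
        self.to_visual = nn.Linear(128, n_visual)           # "seeing"
        self.to_property = nn.Linear(n_visual, n_property)  # "conceiving"
        self.decide = nn.Linear(n_property, n_classes)      # "concluding"

    def forward(self, img, visual_override=None, property_override=None):
        v = torch.sigmoid(self.to_visual(self.backbone(img)))
        if visual_override is not None:    # expert intervention on visual concepts
            v = visual_override
        p = torch.sigmoid(self.to_property(v))
        if property_override is not None:  # expert intervention on property concepts
            p = property_override
        return self.decide(p), v, p        # prediction plus both concept layers

logits, v, p = HierarchicalCBM()(torch.randn(2, 3, 64, 64))
```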
PDiscoNet: Semantically consistent part discovery for fine-grained recognition
Fine-grained classification often requires recognizing specific object parts,
such as beak shape and wing patterns for birds. Encouraging a fine-grained
classification model to first detect such parts and then use them to infer
the class could help us gauge whether the model is indeed looking at the right
details, more directly than interpretability methods that provide a single
attribution map. We propose PDiscoNet to discover object parts by using only
image-level class labels along with priors encouraging the parts to be:
discriminative, compact, distinct from each other, equivariant to rigid
transforms, and active in at least some of the images. In addition to using the
appropriate losses to encode these priors, we propose to use part-dropout,
where full part feature vectors are dropped at once to prevent a single part
from dominating in the classification, and part feature vector modulation,
which makes the information coming from each part distinct from the perspective
of the classifier. Our results on CUB, CelebA, and PartImageNet show that the
proposed method provides substantially better part discovery performance than
previous methods while not requiring any additional hyper-parameter tuning and
without penalizing the classification performance. The code is available at
https://github.com/robertdvdk/part_detection.
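Of the ingredients listed above, part-dropout is the most self-contained; a minimal sketch under an assumed layout (one pooled feature vector per discovered part) could look like this, zeroing entire part vectors at once so no single part can dominate the classifier.

```python
# Hypothetical sketch of part-dropout over per-part feature vectors.
import torch

def part_dropout(part_feats, p=0.3, training=True):
    """part_feats: (B, num_parts, C); drops whole part vectors with probability p."""
    if not training or p == 0:
        return part_feats
    keep = (torch.rand(part_feats.shape[:2], device=part_feats.device) > p).float()
    # rescale kept parts, as in standard dropout, to preserve expected magnitude
    return part_feats * keep.unsqueeze(-1) / (1 - p)

out = part_dropout(torch.randn(2, 8, 64))  # e.g. 8 discovered parts, 64-dim each
```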
Deep filter banks for texture recognition, description, and segmentation
Visual textures have played a key role in image understanding because they
convey important semantics of images, and because texture representations that
pool local image descriptors in an orderless manner have had a tremendous
impact in diverse applications. In this paper we make several contributions to
texture understanding. First, instead of focusing on texture instance and
material category recognition, we propose a human-interpretable vocabulary of
texture attributes to describe common texture patterns, complemented by a new
describable texture dataset for benchmarking. Second, we look at the problem of
recognizing materials and texture attributes in realistic imaging conditions,
including when textures appear in clutter, developing corresponding benchmarks
on top of the recently proposed OpenSurfaces dataset. Third, we revisit classic
texture representations, including bag-of-visual-words and the Fisher vectors,
in the context of deep learning and show that these have excellent efficiency
and generalization properties if the convolutional layers of a deep model are
used as filter banks. We obtain in this manner state-of-the-art performance in
numerous datasets well beyond textures, an efficient method to apply deep
features to image regions, as well as benefit in transferring features from one
domain to another.
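A minimal sketch of the "convolutional layers as filter banks" idea: the spatial positions of a conv feature map are read as local descriptors and pooled in an orderless way. Plain mean pooling stands in here for the Fisher-vector encoder the paper actually uses, and the torchvision VGG-16 backbone is an illustrative assumption.

```python
# Hypothetical sketch: CNN conv layers as a filter bank with orderless pooling.
import torch
from torchvision.models import vgg16

backbone = vgg16(weights=None).features[:17]  # conv layers only, no fully-connected head
img = torch.randn(1, 3, 224, 224)             # any input size works without FC layers
with torch.no_grad():
    fmap = backbone(img)                      # (1, 256, 28, 28) filter responses
desc = fmap.flatten(2).transpose(1, 2)        # (1, HW, 256): one descriptor per position
pooled = desc.mean(dim=1)                     # orderless pooling discards spatial layout
```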