Background Adaptive Faster R-CNN for Semi-Supervised Convolutional Object Detection of Threats in X-Ray Images
Recently, progress has been made in the supervised training of Convolutional
Object Detectors (e.g. Faster R-CNN) for threat recognition in carry-on luggage
using X-ray images. This is part of the Transportation Security
Administration's (TSA's) mission to protect air travelers in the United States.
While more training data with threats may reliably improve performance for this
class of deep algorithm, it is expensive to stage in realistic contexts. By
contrast, data from the real world can be collected quickly with minimal cost.
In this paper, we present a semi-supervised approach for threat recognition
which we call Background Adaptive Faster R-CNN. This approach is a training
method for two-stage object detectors which uses Domain Adaptation methods from
the field of deep learning. The data sources described earlier make two
"domains": a hand-collected data domain of images with threats, and a
real-world domain of images assumed without threats. Two domain discriminators,
one for discriminating object proposals and one for image features, are
adversarially trained to prevent encoding domain-specific information. Without
this penalty a Convolutional Neural Network (CNN) can learn to identify domains
based on superficial characteristics, and minimize a supervised loss function
without improving its ability to recognize objects. For the hand-collected
data, only object proposals and image features from backgrounds are used. The
losses for these domain-adaptive discriminators are added to the Faster R-CNN
losses of images from both domains. This can reduce threat detection false
alarm rates by matching the statistics of extracted features from
hand-collected backgrounds to real world data. Performance improvements are
demonstrated on two independently collected datasets of labeled threats.
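The adversarial training described above can be sketched numerically. The following is a minimal NumPy illustration, not the paper's implementation: a hypothetical linear domain discriminator is trained to tell hand-collected background features from real-world features, and the detector's objective enters that loss with a reversed sign (the gradient-reversal trick), so the backbone is pushed to make the two domains' feature statistics indistinguishable. The function names, feature shapes, and the weighting `lam` are all assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def discriminator_loss(feats_threat_bg, feats_real, w):
    """Binary cross-entropy of a (hypothetical) linear domain discriminator.

    feats_threat_bg: background-only features from the hand-collected domain
    feats_real:      features from the real-world (assumed threat-free) domain
    w:               discriminator weight vector
    """
    p_a = sigmoid(feats_threat_bg @ w)  # should predict domain label 1
    p_b = sigmoid(feats_real @ w)       # should predict domain label 0
    eps = 1e-9
    return -(np.log(p_a + eps).mean() + np.log(1.0 - p_b + eps).mean()) / 2.0

def detector_objective(supervised_loss, disc_loss, lam=0.1):
    """Combined objective seen by the backbone: the supervised Faster R-CNN
    loss plus the discriminator loss with a reversed sign, so minimising it
    makes the domains harder to discriminate."""
    return supervised_loss - lam * disc_loss
```

In a full two-stage detector this term would be computed twice, once on image-level features and once on object proposals, matching the paper's two discriminators.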
f-VAEGAN-D2: A Feature Generating Framework for Any-Shot Learning
When labeled training data is scarce, a promising data augmentation approach
is to generate visual features of unknown classes using their attributes. To
learn the class conditional distribution of CNN features, these models rely on
pairs of image features and class attributes. Hence, they cannot make use of
the abundance of unlabeled data samples. In this paper, we tackle any-shot
learning problems, i.e. zero-shot and few-shot learning, in a unified feature generating
framework that operates in both inductive and transductive learning settings.
We develop a conditional generative model that combines the strengths of VAEs
and GANs and, in addition, via an unconditional discriminator, learns the marginal
feature distribution of unlabeled images. We empirically show that our model
learns highly discriminative CNN features for five datasets, i.e. CUB, SUN, AWA
and ImageNet, and establish a new state-of-the-art in any-shot learning, i.e.
inductive and transductive (generalized) zero- and few-shot learning settings.
We also demonstrate that our learned features are interpretable: we visualize
them by inverting them back to pixel space, and we explain them by
generating textual arguments for why they are associated with a certain label.
Comment: Accepted at CVPR 201
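The core generative step above (sampling a latent code and decoding it, conditioned on class attributes, into a synthetic CNN feature) can be sketched in a few lines. This is a simplified illustration under stated assumptions: the one-layer linear decoder, the conditioning-by-concatenation, and all names are hypothetical stand-ins, not the paper's architecture.

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """VAE reparameterisation trick: z = mu + sigma * eps, eps ~ N(0, I)."""
    eps = rng.standard_normal(np.shape(mu))
    return mu + np.exp(0.5 * log_var) * eps

def generate_features(class_attr, decoder_w, rng, latent_dim=8):
    """Decode a prior latent sample, conditioned on class attributes, into a
    synthetic CNN feature vector (hypothetical one-layer linear decoder)."""
    z = reparameterize(np.zeros(latent_dim), np.zeros(latent_dim), rng)
    cond = np.concatenate([z, class_attr])    # condition by concatenation
    return np.maximum(cond @ decoder_w, 0.0)  # ReLU, like real CNN features
```

A typical use would be to generate many such features for an unseen class from its attribute vector, then train an ordinary softmax classifier on the synthetic set.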
Label-Assemble: Leveraging Multiple Datasets with Partial Labels
The success of deep learning relies heavily on large and diverse datasets
with extensive labels, but we often only have access to several small datasets
associated with partial labels. In this paper, we start a new initiative,
"Label-Assemble", that aims to unleash the full potential of partially labeled
data from an assembly of public datasets. Specifically, we introduce a new
dynamic adapter to encode different visual tasks, which addresses the
challenges of incomparable, heterogeneous, or even conflicting labeling
protocols. We also employ pseudo-labeling and consistency constraints to
harness data with missing labels and to mitigate the domain gap across
datasets. From rigorous evaluations on three natural imaging and six medical
imaging tasks, we discover that learning from "negative examples" facilitates
both classification and segmentation of classes of interest. This sheds new
light on the computer-aided diagnosis of rare diseases and emerging pandemics,
wherein "positive examples" are hard to collect, yet "negative examples" are
relatively easy to assemble. Apart from exceeding prior art on the ChestXray
benchmark, our model is particularly strong at identifying diseases in minority
classes, yielding an improvement of over 3 points on average. Remarkably, when
using existing partial labels, our model's performance is on par with that using
full labels, eliminating the need for an additional 40% of annotation costs. Code
will be made available at https://github.com/MrGiovanni/LabelAssemble
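The two semi-supervised ingredients named above, pseudo-labeling for samples with missing labels and a consistency constraint across augmented views, can be sketched as follows. This is a generic illustration of those techniques, not the repository's code; the confidence threshold and function names are assumptions.

```python
def pseudo_labels(probs, threshold=0.9):
    """Keep only confident predictions as pseudo-labels.

    probs: per-sample lists of class probabilities for unlabeled data.
    Returns (sample_index, class_index) pairs above the confidence threshold.
    """
    kept = []
    for i, p in enumerate(probs):
        c = max(range(len(p)), key=p.__getitem__)  # argmax class
        if p[c] >= threshold:
            kept.append((i, c))
    return kept

def consistency_loss(p_weak, p_strong):
    """Mean squared disagreement between predictions on two augmented views
    of the same image; minimising it enforces prediction consistency."""
    return sum((a - b) ** 2 for a, b in zip(p_weak, p_strong)) / len(p_weak)
```

Samples below the threshold contribute only through the consistency term, which is one common way to harness data whose labels are missing.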
Online learning and detection of faces with low human supervision
The final publication is available at link.springer.com.

We present an efficient, online, and interactive approach for computing a classifier, called Wild Lady Ferns (WiLFs), for face learning and detection with little human supervision. More precisely, on the one hand, WiLFs combine online boosting and extremely randomized trees (Random Ferns) to progressively compute an efficient and discriminative classifier. On the other hand, WiLFs use an interactive human-machine approach that combines two complementary learning strategies to considerably reduce the degree of human supervision during learning. While the first strategy corresponds to query-by-boosting active learning, which requests human assistance on difficult samples as a function of the classifier's confidence, the second strategy is a memory-based learning scheme that uses k Exemplar-based Nearest Neighbors (kENN) to automatically assist the classifier. A pre-trained Convolutional Neural Network (CNN) is used to perform kENN with high-level feature descriptors. The proposed approach is therefore fast (WiLFs run at 1 FPS using code that is not fully optimized), accurate (we obtain detection rates over 82% on complex datasets), and labor-saving (human assistance percentages of less than 20%).
As a byproduct, we demonstrate that WiLFs also perform semi-automatic annotation during learning: while the classifier is being computed, WiLFs discover face instances in input images which are subsequently used to train the classifier online. The advantages of our approach are demonstrated on synthetic and publicly available databases, showing detection rates comparable to offline approaches that require larger amounts of hand-made training data.
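The division of labor between the human annotator and the automatic assistant can be sketched as a simple routing rule plus a nearest-neighbour stand-in. This is a minimal illustration of the idea, not the paper's method: the confidence band and the plain 1-nearest-neighbour labelling are assumptions replacing the boosting-confidence criterion and the kENN assistant.

```python
def route_sample(confidence, low=0.4, high=0.6):
    """Route an unlabeled face candidate: samples the classifier is unsure
    about (confidence near the decision boundary) go to the human annotator;
    confident ones are labelled automatically. The [low, high] band is an
    assumed stand-in for the paper's query-by-boosting criterion."""
    return "human" if low <= confidence <= high else "auto"

def auto_label(feature, exemplars):
    """Label by the nearest stored exemplar (a minimal nearest-neighbour
    stand-in for the kENN assistant). exemplars: list of (feature, label)."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(exemplars, key=lambda e: dist(feature, e[0]))[1]
```

The effect is that human effort concentrates on the ambiguous samples, which is what keeps the reported assistance percentage low.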
Latent Space Regularization for Unsupervised Domain Adaptation in Semantic Segmentation
Deep convolutional neural networks for semantic segmentation achieve
outstanding accuracy, however they also have a couple of major drawbacks:
first, they do not generalize well to distributions slightly different from that
of the training data; second, they require a huge amount of labeled data
for their optimization. In this paper, we introduce feature-level space-shaping
regularization strategies to reduce the domain discrepancy in semantic
segmentation. In particular, we jointly enforce a clustering
objective, a perpendicularity constraint and a norm alignment goal on the
feature vectors corresponding to source and target samples. Additionally, we
propose a novel measure able to capture the relative efficacy of an adaptation
strategy compared to supervised training. We verify the effectiveness of such
methods in the autonomous driving setting, achieving state-of-the-art results on
multiple synthetic-to-real road-scene benchmarks.
Comment: Accepted at CVPR-WAD 2021, 11 pages, 7 figures, 1 table
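The three feature-space-shaping terms named above (a clustering objective, a perpendicularity constraint, and norm alignment) can each be written as a small penalty on feature vectors. The NumPy sketch below is a generic rendering of those three ideas, not the paper's exact losses; the function names and the squared-cosine penalty are assumptions.

```python
import numpy as np

def clustering_loss(feats, centroids, labels):
    """Pull each feature toward its class centroid (clustering objective)."""
    return np.mean(np.sum((feats - centroids[labels]) ** 2, axis=1))

def perpendicularity_loss(centroids):
    """Penalise cosine similarity between different class centroids, pushing
    class directions toward mutual orthogonality."""
    c = centroids / np.linalg.norm(centroids, axis=1, keepdims=True)
    sim = c @ c.T
    n = len(centroids)
    off_diag = sim[~np.eye(n, dtype=bool)]
    return np.mean(off_diag ** 2)

def norm_alignment_loss(feats, target_norm):
    """Push all feature norms toward a shared target value, aligning the
    magnitudes of source and target features."""
    return np.mean((np.linalg.norm(feats, axis=1) - target_norm) ** 2)
```

Applied jointly to source and target features, the three terms shape one shared latent space: tight per-class clusters, well-separated class directions, and comparable feature magnitudes across domains.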