5,930 research outputs found

    Background Adaptive Faster R-CNN for Semi-Supervised Convolutional Object Detection of Threats in X-Ray Images

    Full text link
    Recently, progress has been made in the supervised training of Convolutional Object Detectors (e.g. Faster R-CNN) for threat recognition in carry-on luggage using X-ray images. This is part of the Transportation Security Administration's (TSA's) mission to protect air travelers in the United States. While more training data with threats may reliably improve performance for this class of deep algorithm, it is expensive to stage in realistic contexts. By contrast, data from the real world can be collected quickly with minimal cost. In this paper, we present a semi-supervised approach for threat recognition which we call Background Adaptive Faster R-CNN. This approach is a training method for two-stage object detectors which uses Domain Adaptation methods from the field of deep learning. The data sources described earlier make two "domains": a hand-collected data domain of images with threats, and a real-world domain of images assumed without threats. Two domain discriminators, one for discriminating object proposals and one for image features, are adversarially trained to prevent encoding domain-specific information. Without this penalty a Convolutional Neural Network (CNN) can learn to identify domains based on superficial characteristics, and minimize a supervised loss function without improving its ability to recognize objects. For the hand-collected data, only object proposals and image features from backgrounds are used. The losses for these domain-adaptive discriminators are added to the Faster R-CNN losses of images from both domains. This can reduce threat detection false alarm rates by matching the statistics of extracted features from hand-collected backgrounds to real world data. Performance improvements are demonstrated on two independently-collected datasets of labeled threats

    f-VAEGAN-D2: A Feature Generating Framework for Any-Shot Learning

    Full text link
    When labeled training data is scarce, a promising data augmentation approach is to generate visual features of unknown classes using their attributes. To learn the class conditional distribution of CNN features, these models rely on pairs of image features and class attributes. Hence, they can not make use of the abundance of unlabeled data samples. In this paper, we tackle any-shot learning problems i.e. zero-shot and few-shot, in a unified feature generating framework that operates in both inductive and transductive learning settings. We develop a conditional generative model that combines the strength of VAE and GANs and in addition, via an unconditional discriminator, learns the marginal feature distribution of unlabeled images. We empirically show that our model learns highly discriminative CNN features for five datasets, i.e. CUB, SUN, AWA and ImageNet, and establish a new state-of-the-art in any-shot learning, i.e. inductive and transductive (generalized) zero- and few-shot learning settings. We also demonstrate that our learned features are interpretable: we visualize them by inverting them back to the pixel space and we explain them by generating textual arguments of why they are associated with a certain label.Comment: Accepted at CVPR 201

    Label-Assemble: Leveraging Multiple Datasets with Partial Labels

    Full text link
    The success of deep learning relies heavily on large and diverse datasets with extensive labels, but we often only have access to several small datasets associated with partial labels. In this paper, we start a new initiative, "Label-Assemble", that aims to unleash the full potential of partially labeled data from an assembly of public datasets. Specifically, we introduce a new dynamic adapter to encode different visual tasks, which addresses the challenges of incomparable, heterogeneous, or even conflicting labeling protocols. We also employ pseudo-labeling and consistency constraints to harness data with missing labels and to mitigate the domain gap across datasets. From rigorous evaluations on three natural imaging and six medical imaging tasks, we discover that learning from "negative examples" facilitates both classification and segmentation of classes of interest. This sheds new light on the computer-aided diagnosis of rare diseases and emerging pandemics, wherein "positive examples" are hard to collect, yet "negative examples" are relatively easier to assemble. Apart from exceeding prior arts in the ChestXray benchmark, our model is particularly strong in identifying diseases of minority classes, yielding over 3-point improvement on average. Remarkably, when using existing partial labels, our model performance is on-par with that using full labels, eliminating the need for an additional 40% of annotation costs. Code will be made available at https://github.com/MrGiovanni/LabelAssemble

    Online learning and detection of faces with low human supervision

    Get PDF
    The final publication is available at link.springer.comWe present an efficient,online,and interactive approach for computing a classifier, called Wild Lady Ferns (WiLFs), for face learning and detection using small human supervision. More precisely, on the one hand, WiLFs combine online boosting and extremely randomized trees (Random Ferns) to compute progressively an efficient and discriminative classifier. On the other hand, WiLFs use an interactive human-machine approach that combines two complementary learning strategies to reduce considerably the degree of human supervision during learning. While the first strategy corresponds to query-by-boosting active learning, that requests human assistance over difficult samples in function of the classifier confidence, the second strategy refers to a memory-based learning which uses ¿ Exemplar-based Nearest Neighbors (¿ENN) to assist automatically the classifier. A pre-trained Convolutional Neural Network (CNN) is used to perform ¿ENN with high-level feature descriptors. The proposed approach is therefore fast (WilFs run in 1 FPS using a code not fully optimized), accurate (we obtain detection rates over 82% in complex datasets), and labor-saving (human assistance percentages of less than 20%). As a byproduct, we demonstrate that WiLFs also perform semi-automatic annotation during learning, as while the classifier is being computed, WiLFs are discovering faces instances in input images which are used subsequently for training online the classifier. The advantages of our approach are demonstrated in synthetic and publicly available databases, showing comparable detection rates as offline approaches that require larger amounts of handmade training data.Peer ReviewedPostprint (author's final draft

    Latent Space Regularization for Unsupervised Domain Adaptation in Semantic Segmentation

    Full text link
    Deep convolutional neural networks for semantic segmentation achieve outstanding accuracy, however they also have a couple of major drawbacks: first, they do not generalize well to distributions slightly different from the one of the training data; second, they require a huge amount of labeled data for their optimization. In this paper, we introduce feature-level space-shaping regularization strategies to reduce the domain discrepancy in semantic segmentation. In particular, for this purpose we jointly enforce a clustering objective, a perpendicularity constraint and a norm alignment goal on the feature vectors corresponding to source and target samples. Additionally, we propose a novel measure able to capture the relative efficacy of an adaptation strategy compared to supervised training. We verify the effectiveness of such methods in the autonomous driving setting achieving state-of-the-art results in multiple synthetic-to-real road scenes benchmarks.Comment: Accepted at CVPR-WAD 2021, 11 pages, 7 figures, 1 table
    • …
    corecore