Partial Label Learning with Self-Guided Retraining
Partial label learning deals with the problem where each training instance is
assigned a set of candidate labels, only one of which is correct. This paper
makes the first attempt to leverage the idea of self-training for dealing
with partially labeled examples. Specifically, we propose a unified formulation
with proper constraints to train the desired model and perform pseudo-labeling
jointly. For pseudo-labeling, unlike traditional self-training, which manually
selects the ground-truth label once its confidence is high enough, we introduce
a maximum infinity norm regularization on the model outputs to achieve this
desideratum automatically, which results in a convex-concave
optimization problem. We show that optimizing this convex-concave problem is
equivalent to solving a set of quadratic programming (QP) problems. By
proposing an upper-bound surrogate objective function, we instead solve only
one QP problem, which improves optimization efficiency. Extensive experiments
on synthesized and real-world datasets demonstrate that the proposed approach
significantly outperforms the state-of-the-art partial label learning
approaches.
Comment: 8 pages, accepted by AAAI-19
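As a rough illustration of the joint training and pseudo-labeling idea, the sketch below alternates between fitting a linear softmax model to soft pseudo-labels and re-committing each instance to its most confident candidate label. The function names, the simple gradient-descent training, and the hard re-labeling step are illustrative assumptions, not the paper's QP-based formulation with the infinity-norm regularizer.
```python
# Hypothetical sketch of self-guided pseudo-labeling for partial-label data.
import numpy as np

def fit_partial_label(X, candidate_sets, n_classes, n_rounds=10, lr=0.1, epochs=200):
    """X: (n, d) features; candidate_sets: list of sets of candidate label indices."""
    n, d = X.shape
    W = np.zeros((d, n_classes))
    # Initialise pseudo-labels uniformly over each candidate set.
    Q = np.zeros((n, n_classes))
    for i, cand in enumerate(candidate_sets):
        Q[i, list(cand)] = 1.0 / len(cand)

    for _ in range(n_rounds):
        # (1) Train a linear softmax model on the current soft pseudo-labels.
        for _ in range(epochs):
            logits = X @ W
            logits -= logits.max(axis=1, keepdims=True)
            P = np.exp(logits)
            P /= P.sum(axis=1, keepdims=True)
            W -= lr * X.T @ (P - Q) / n
        # (2) Re-estimate pseudo-labels: put all mass on the candidate label with
        #     the largest model output (the "one confident label" behaviour the
        #     infinity-norm regularizer is meant to encourage).
        scores = X @ W
        Q[:] = 0.0
        for i, cand in enumerate(candidate_sets):
            cand = list(cand)
            Q[i, cand[int(np.argmax(scores[i, cand]))]] = 1.0
    return W

# Toy usage: two classes, each instance carries the true label plus a possible distractor.
X = np.random.randn(100, 5)
true_y = (X[:, 0] > 0).astype(int)
cands = [{y, np.random.randint(2)} for y in true_y]
W = fit_partial_label(X, cands, n_classes=2)
```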
Grad-CAM++: Improved Visual Explanations for Deep Convolutional Networks
Over the last decade, Convolutional Neural Network (CNN) models have been
highly successful in solving complex vision problems. However, these deep
models are perceived as "black box" methods, given the limited understanding
of their internal functioning. There has been a significant
recent interest in developing explainable deep learning models, and this paper
is an effort in this direction. Building on a recently proposed method called
Grad-CAM, we propose a generalized method called Grad-CAM++ that can provide
better visual explanations of CNN model predictions, in terms of better object
localization as well as explaining occurrences of multiple object instances in
a single image, compared to the state of the art. We provide a mathematical
derivation for the proposed method, which uses a weighted combination of the
positive partial derivatives of the last convolutional layer feature maps with
respect to a specific class score as weights to generate a visual explanation
for the corresponding class label. Our extensive experiments and evaluations,
both subjective and objective, on standard datasets showed that Grad-CAM++
provides promising human-interpretable visual explanations for a given CNN
architecture across multiple tasks including classification, image caption
generation and 3D action recognition; as well as in new settings such as
knowledge distillation.
Comment: 17 Pages, 15 Figures, 11 Tables. Accepted in the proceedings of IEEE
Winter Conf. on Applications of Computer Vision (WACV2018). Extended version
is under review at IEEE Transactions on Pattern Analysis and Machine
Intelligence.
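A rough sketch of a Grad-CAM++-style heatmap computation is given below. It follows the widely used approximation in which the higher-order derivatives in the weighting term are expressed through powers of the first-order gradients; the backbone (resnet18), the hooked layer (layer4), and the helper names are illustrative assumptions rather than the paper's reference implementation.
```python
# Hedged sketch of a Grad-CAM++-style class activation map for a torchvision CNN.
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet18(weights=None).eval()
feats, grads = {}, {}
model.layer4.register_forward_hook(lambda m, i, o: feats.update(a=o))
model.layer4.register_full_backward_hook(lambda m, gi, go: grads.update(g=go[0]))

def gradcam_pp(x, class_idx):
    logits = model(x)
    model.zero_grad()
    logits[0, class_idx].backward()                          # gradient of the class score
    A, dY = feats["a"][0].detach(), grads["g"][0].detach()   # (C, H, W) each
    # alpha weights from the Grad-CAM++ derivation (power-of-gradient approximation)
    num = dY.pow(2)
    den = 2 * dY.pow(2) + (A * dY.pow(3)).sum(dim=(1, 2), keepdim=True)
    alpha = num / (den + 1e-8)
    w = (alpha * F.relu(dY)).sum(dim=(1, 2))                 # per-channel importance
    cam = F.relu((w[:, None, None] * A).sum(dim=0))          # (H, W) heatmap
    return cam / (cam.max() + 1e-8)

heatmap = gradcam_pp(torch.randn(1, 3, 224, 224), class_idx=0)
```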
WeakTr: Exploring Plain Vision Transformer for Weakly-supervised Semantic Segmentation
This paper explores the properties of the plain Vision Transformer (ViT) for
Weakly-supervised Semantic Segmentation (WSSS). The class activation map (CAM)
is of critical importance for understanding a classification network and
launching WSSS. We observe that different attention heads of ViT focus on
different image areas. We therefore propose a novel weight-based method that
estimates the importance of the attention heads end-to-end, while the
self-attention maps are adaptively fused into high-quality CAM results that
tend to cover objects more completely. Besides, we propose a ViT-based gradient
clipping decoder for
online retraining with the CAM results to complete the WSSS task. We name this
plain Transformer-based Weakly-supervised learning framework WeakTr. It
achieves the state-of-the-art WSSS performance on standard benchmarks, i.e.,
78.4% mIoU on the val set of PASCAL VOC 2012 and 50.3% mIoU on the val set of
COCO 2014. Code is available at https://github.com/hustvl/WeakTr.
Comment: 20 pages, 11 figures
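The sketch below illustrates the head-importance idea in isolation: a learnable weight per attention head, normalized with softmax, fuses per-head attention maps into an affinity matrix that refines a coarse class-activation map. The module name, tensor shapes, and the way the fused affinity is applied are assumptions made for illustration; this is not the released WeakTr code.
```python
# Hedged sketch of weighted fusion of ViT attention heads for CAM refinement.
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Fuse attention maps from L layers x H heads with learned importance weights."""
    def __init__(self, n_layers, n_heads):
        super().__init__()
        self.head_weight = nn.Parameter(torch.zeros(n_layers * n_heads))

    def forward(self, attn, coarse_cam):
        # attn: (L*H, N, N) patch-to-patch attention; coarse_cam: (C, N) class scores per patch
        w = torch.softmax(self.head_weight, dim=0)       # importance of each head
        fused = (w[:, None, None] * attn).sum(dim=0)     # (N, N) fused affinity matrix
        refined = coarse_cam @ fused.t()                 # propagate CAM along affinities
        return refined

fusion = AttentionFusion(n_layers=12, n_heads=6)
attn = torch.rand(72, 196, 196).softmax(dim=-1)
cam = torch.rand(20, 196)
refined_cam = fusion(attn, cam)   # the head weights are trained end-to-end with the classifier
```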
STRIDE: Structure-guided Generation for Inverse Design of Molecules
Machine learning, and especially deep learning, has had an increasing impact on
molecule and materials design. In particular, given the growing access to an
abundance of high-quality small-molecule data for generative modeling in drug
design, results for drug discovery have been promising. However, for many
important classes of materials such as catalysts, antioxidants, and
metal-organic frameworks, such large datasets are not available. Such families
of molecules with limited samples and structural similarities are especially
prevalent for industrial applications. As is well-known, retraining and even
fine-tuning are challenging on such small datasets. Novel, practically
applicable molecules are most often derivatives of well-known molecules,
suggesting approaches to addressing data scarcity. To address this problem, we
introduce STRIDE, a generative molecule workflow that generates
novel molecules with an unconditional generative model guided by known
molecules without any retraining. We generate molecules outside of the training
data from a highly specialized set of antioxidant molecules. Our generated
molecules have, on average, 21.7% lower synthetic accessibility scores, and the
guidance also reduces their ionization potential by 5.9%.
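As a simplified illustration of guiding a frozen, unconditional generator with a known molecule, the sketch below filters unconditionally sampled SMILES by Tanimoto similarity to a seed structure. The sample_unconditional callable and the similarity-threshold guidance are hypothetical placeholders; the paper's actual guidance mechanism may differ.
```python
# Hedged sketch of structure-guided sampling around a known seed molecule (RDKit).
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

def fingerprint(smiles):
    mol = Chem.MolFromSmiles(smiles)
    return AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=2048) if mol else None

def guided_generation(seed_smiles, sample_unconditional, n_samples=1000, min_sim=0.4):
    """Keep unconditionally generated molecules that stay close to the seed structure."""
    seed_fp = fingerprint(seed_smiles)
    kept = []
    for _ in range(n_samples):
        smi = sample_unconditional()          # one SMILES from the frozen generator (hypothetical)
        fp = fingerprint(smi)
        if fp and DataStructs.TanimotoSimilarity(seed_fp, fp) >= min_sim:
            kept.append(smi)
    return kept
```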
Learning Generalized Reactive Policies using Deep Neural Networks
We present a new approach to learning for planning, where knowledge acquired
while solving a given set of planning problems is used to plan faster in
related, but new problem instances. We show that a deep neural network can be
used to learn and represent a \emph{generalized reactive policy} (GRP) that
maps a problem instance and a state to an action, and that the learned GRPs
efficiently solve large classes of challenging problem instances. In contrast
to prior efforts in this direction, our approach significantly reduces the
dependence of learning on handcrafted domain knowledge or feature selection.
Instead, the GRP is trained from scratch using a set of successful execution
traces. We show that our approach can also be used to automatically learn a
heuristic function that can be used in directed search algorithms. We evaluate
our approach using an extensive suite of experiments on two challenging
planning problem domains and show that our approach facilitates learning
complex decision making policies and powerful heuristic functions with minimal
human input. Videos of our results are available at goo.gl/Hpy4e3
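A minimal sketch of a generalized reactive policy is shown below: a feed-forward network maps an encoded problem instance and state to action logits and is trained by imitation on successful execution traces. The encodings, dimensions, and architecture are illustrative assumptions, not the paper's network.
```python
# Hedged sketch of a generalized reactive policy trained on execution traces.
import torch
import torch.nn as nn

class GRP(nn.Module):
    def __init__(self, instance_dim, state_dim, n_actions, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(instance_dim + state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )
    def forward(self, instance, state):
        return self.net(torch.cat([instance, state], dim=-1))  # action logits

policy = GRP(instance_dim=32, state_dim=64, n_actions=10)
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# One imitation step on a batch of (instance, state, expert_action) tuples from traces.
inst, state = torch.randn(16, 32), torch.randn(16, 64)
expert_action = torch.randint(0, 10, (16,))
loss = loss_fn(policy(inst, state), expert_action)
opt.zero_grad(); loss.backward(); opt.step()
```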
Hand Keypoint Detection in Single Images using Multiview Bootstrapping
We present an approach that uses a multi-camera system to train fine-grained
detectors for keypoints that are prone to occlusion, such as the joints of a
hand. We call this procedure multiview bootstrapping: first, an initial
keypoint detector is used to produce noisy labels in multiple views of the
hand. The noisy detections are then triangulated in 3D using multiview geometry
or marked as outliers. Finally, the reprojected triangulations are used as new
labeled training data to improve the detector. We repeat this process,
generating more labeled data in each iteration. We derive a result analytically
relating the minimum number of views to achieve target true and false positive
rates for a given detector. The method is used to train a hand keypoint
detector for single images. The resulting keypoint detector runs in realtime on
RGB images and has accuracy comparable to methods that use depth sensors. The
single view detector, triangulated over multiple views, enables 3D markerless
hand motion capture with complex object interactions.
Comment: CVPR 2017
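The sketch below illustrates one bootstrapping round under the assumption of known camera projection matrices: a keypoint detected in several views is triangulated with a linear (DLT) solver, views with large reprojection error are marked as outliers, and the reprojections of the triangulated point become new 2D labels. The threshold and helper names are illustrative, not the paper's exact procedure.
```python
# Hedged sketch of one multiview-bootstrapping round with DLT triangulation.
import numpy as np

def triangulate(points_2d, projections):
    """DLT triangulation of one keypoint observed in several calibrated views."""
    A = []
    for (u, v), P in zip(points_2d, projections):   # P is a 3x4 projection matrix
        A.append(u * P[2] - P[0])
        A.append(v * P[2] - P[1])
    _, _, Vt = np.linalg.svd(np.asarray(A))
    X = Vt[-1]
    return X[:3] / X[3]

def reproject(X, P):
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

def bootstrap_labels(points_2d, projections, max_err=8.0):
    X = triangulate(points_2d, projections)
    labels = []
    for pt, P in zip(points_2d, projections):
        err = np.linalg.norm(reproject(X, P) - pt)
        labels.append(None if err > max_err else reproject(X, P))  # None marks an outlier view
    return X, labels  # reprojected labels feed the next detector-training round
```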
Unlocking Low-Light-Rainy Image Restoration by Pairwise Degradation Feature Vector Guidance
Rain in the dark is a common natural phenomenon. Photos captured in such
conditions significantly degrade the performance of various nighttime
applications, such as autonomous driving, surveillance systems, and night
photography. While
existing methods designed for low-light enhancement or deraining show promising
performance, they have limitations in simultaneously addressing the task of
brightening low light and removing rain. Furthermore, using a cascade approach,
such as ``deraining followed by low-light enhancement'' or vice versa, may lead
to difficult-to-handle rain patterns or excessively blurred and overexposed
images. To overcome these limitations, we propose an end-to-end network that
can jointly handle low-light enhancement and deraining. Our
network mainly includes a Pairwise Degradation Feature Vector Extraction
Network (P-Net) and a Restoration Network (R-Net). P-Net can learn degradation
feature vectors on the dark and light areas separately, using contrastive
learning to guide the image restoration process. The R-Net is responsible for
restoring the image. We also introduce an effective Fast Fourier - ResNet
Detail Guidance Module (FFR-DG) that initially guides image restoration using
detail images that contain no degradation information but focus on texture
details. Additionally, we contribute a dataset containing synthetic
and real-world low-light-rainy images. Extensive experiments demonstrate that
our network outperforms existing methods in both synthetic and complex
real-world scenarios.
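As a rough sketch of a Fast-Fourier/ResNet style block in the spirit of the FFR-DG module, the code below combines a spatial residual branch with a branch that convolves the real and imaginary parts of the image spectrum; the channel sizes and the exact fusion are assumptions for illustration, not the paper's architecture.
```python
# Hedged sketch of a Fast-Fourier/ResNet style block (spatial + spectral branches).
import torch
import torch.nn as nn

class FFRBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.spatial = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1),
        )
        self.spectral = nn.Sequential(
            nn.Conv2d(2 * ch, 2 * ch, 1), nn.ReLU(inplace=True),
            nn.Conv2d(2 * ch, 2 * ch, 1),
        )

    def forward(self, x):
        # Frequency branch: process real/imag parts of the 2D FFT, then invert.
        freq = torch.fft.rfft2(x, norm="ortho")
        f = self.spectral(torch.cat([freq.real, freq.imag], dim=1))
        real, imag = torch.chunk(f, 2, dim=1)
        freq_out = torch.fft.irfft2(torch.complex(real, imag), s=x.shape[-2:], norm="ortho")
        return x + self.spatial(x) + freq_out   # residual fusion of both branches

block = FFRBlock(ch=16)
out = block(torch.randn(1, 16, 64, 64))
```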