Weakly Supervised Point Clouds Transformer for 3D Object Detection
The annotation of 3D datasets is required for semantic segmentation and
object detection in scene understanding. In this paper we present a framework
for weakly supervising a point cloud transformer used for 3D object detection.
The aim is to decrease the amount of supervision needed for training, given
the high cost of annotating 3D datasets. We propose an Unsupervised Voting
Proposal Module, which learns randomly preset anchor points and uses a voting
network to select high-quality anchor points. Information is then distilled
into the student and teacher networks. For the student network, we apply a
ResNet to efficiently extract local characteristics, though this can also lose
much global information. To provide the student network an input that
incorporates both global and local information, we adopt the self-attention
mechanism of the transformer to extract global features, and the ResNet layers
to extract region proposals. The teacher network supervises the classification
and regression of the student network using a model pre-trained on ImageNet.
On the challenging KITTI dataset, our experimental results achieve the highest
average precision compared with the most recent weakly supervised 3D object
detectors.
Comment: International Conference on Intelligent Transportation Systems
(ITSC), 202
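The teacher-student distillation step described above can be sketched as a loss that pulls the student's class scores and box regressions toward the teacher's. This is an illustrative sketch, not the authors' code; the helper names (`soft_cross_entropy`, `smooth_l1`, `distill_loss`) and the 7-parameter 3D box layout are assumptions.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def soft_cross_entropy(student_logits, teacher_logits):
    """Classification distillation: student matches the teacher's soft labels."""
    p_t = softmax(teacher_logits)
    p_s = softmax(student_logits)
    return -sum(t * math.log(s) for t, s in zip(p_t, p_s))

def smooth_l1(student_box, teacher_box):
    """Regression distillation on a 3D box, e.g. (x, y, z, w, l, h, yaw)."""
    total = 0.0
    for s, t in zip(student_box, teacher_box):
        d = abs(s - t)
        total += 0.5 * d * d if d < 1.0 else d - 0.5
    return total

def distill_loss(student, teacher, box_weight=1.0):
    """Combined distillation loss supervising the student's two heads."""
    cls = soft_cross_entropy(student["logits"], teacher["logits"])
    reg = smooth_l1(student["box"], teacher["box"])
    return cls + box_weight * reg
```

In this form the teacher never needs ground-truth 3D boxes at distillation time, which is what lets the student train under weak supervision.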
Tell Me What They're Holding: Weakly-supervised Object Detection with Transferable Knowledge from Human-object Interaction
In this work, we introduce a novel weakly supervised object detection (WSOD)
paradigm to detect objects belonging to rare classes with few examples, using
transferable knowledge from human-object interactions (HOI).
While WSOD shows lower performance than full supervision, we focus on HOI as
the main context, which can strongly supervise complex semantics in images.
Therefore, we propose a novel module called RRPN (relational region
proposal network) which outputs an object-localizing attention map only with
human poses and action verbs. In the source domain, we fully train an object
detector and the RRPN with full supervision of HOI. With transferred knowledge
about localization map from the trained RRPN, a new object detector can learn
unseen objects with weak verbal supervision of HOI without bounding box
annotations in the target domain. Because the RRPN is designed as an add-on
module, we can apply it not only to object detection but also to other
domains such as semantic segmentation. The experimental results on the HICO-DET
dataset show the possibility that the proposed method can be a cheap
alternative for the current supervised object detection paradigm. Moreover,
qualitative results demonstrate that our model can properly localize unseen
objects on HICO-DET and V-COCO datasets.
Comment: AAAI 2020 Oral, Camera Ready
Adversarial Soft-detection-based Aggregation Network for Image Retrieval
In recent years, compact representations based on Convolutional Neural
Network (CNN) activations have achieved remarkable performance in image
retrieval. However, retrieving an object of interest that takes up only a
small part of the whole image remains a challenging problem. It is therefore
important to extract discriminative representations that contain regional
information about the pivotal small object. In this paper, we propose a novel
adversarial soft-detection-based aggregation (ASDA) method free from bounding
box annotations for image retrieval, based on adversarial detector and soft
region proposal layer. Our trainable adversarial detector generates semantic
maps based on adversarial erasing strategy to preserve more discriminative and
detailed information. Computed from semantic maps corresponding to various
discriminative patterns and semantic contents, our soft region proposal has
arbitrary shape rather than being restricted to rectangles, and it reflects
the significance of objects. Aggregation based on the trainable soft region
proposal highlights discriminative semantic contents and suppresses
background noise.
We conduct comprehensive experiments on standard image retrieval datasets.
Our weakly supervised ASDA method achieves state-of-the-art performance on most
datasets. The results demonstrate that the proposed ASDA method is effective
for image retrieval.
Comment: 10 pages, 6 figures
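The soft-proposal aggregation described above can be sketched as weighting each spatial position of a CNN feature map by a semantic map before sum-pooling into a retrieval descriptor. A minimal sketch, assuming nested-list tensors; the function name and L2 normalization are illustrative choices, not the paper's exact formulation.

```python
import math

def aggregate(feature_map, semantic_map):
    """Weighted sum-pooling of an H x W x C feature map by an H x W
    semantic map, followed by L2 normalization for retrieval."""
    h, w = len(feature_map), len(feature_map[0])
    c = len(feature_map[0][0])
    desc = [0.0] * c
    for i in range(h):
        for j in range(w):
            wgt = semantic_map[i][j]       # soft, arbitrary-shaped weight
            for k in range(c):
                desc[k] += wgt * feature_map[i][j][k]
    norm = math.sqrt(sum(v * v for v in desc)) or 1.0
    return [v / norm for v in desc]
```

Because the weights are soft and per-position, background locations with near-zero semantic response contribute almost nothing to the descriptor, which is the suppression effect the abstract claims.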
Weakly Supervised Object Detection with Segmentation Collaboration
Weakly supervised object detection aims at learning precise object detectors,
given image category labels. In recent prevailing works, this problem is
generally formulated as a multiple instance learning module guided by an image
classification loss. The object bounding box is assumed to be the one
contributing most to the classification among all proposals. However, the
region contributing most is also likely to be a crucial part or the supporting
context of an object. To obtain a more accurate detector, in this work we
propose a novel end-to-end weakly supervised detection approach, where a newly
introduced generative adversarial segmentation module interacts with the
conventional detection module in a collaborative loop. The collaboration
mechanism takes full advantage of the complementary interpretations of the
weakly supervised localization task, namely detection and segmentation tasks,
forming a more comprehensive solution. Consequently, our method obtains more
precise object bounding boxes, rather than parts or irrelevant surroundings.
The proposed method achieves an accuracy of 51.0% on the PASCAL VOC 2007
dataset, outperforming the state of the art and demonstrating its superiority
for weakly supervised object detection.
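The multiple-instance-learning formulation this line of work builds on (WSDDN-style, per the abstract) can be sketched as two parallel softmaxes: one over classes per proposal, one over proposals per class, whose product summed over proposals gives an image-level score trainable with image labels alone. A sketch, not the paper's exact model; the function name and logit layout are assumptions.

```python
import math

def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]

def mil_image_scores(cls_logits, det_logits):
    """cls_logits, det_logits: [num_proposals][num_classes].
    Returns per-proposal scores and image-level class scores."""
    n, c = len(cls_logits), len(cls_logits[0])
    # softmax over classes, computed per proposal
    p_cls = [softmax(row) for row in cls_logits]
    # softmax over proposals, computed per class (one column at a time)
    p_det = [softmax([det_logits[i][j] for i in range(n)]) for j in range(c)]
    scores = [[p_cls[i][j] * p_det[j][i] for j in range(c)] for i in range(n)]
    image = [sum(scores[i][j] for i in range(n)) for j in range(c)]
    return scores, image
```

The proposal with the highest per-class score is then taken as the detected box, which is exactly the assumption the abstract criticizes: that proposal may be a discriminative part rather than the whole object.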
Collaborative Learning for Weakly Supervised Object Detection
Weakly supervised object detection has recently received much attention,
since it only requires image-level labels instead of the bounding-box labels
consumed in strongly supervised learning. Nevertheless, the savings in
labeling expense usually come at the cost of model accuracy. In this paper,
we propose a
simple but effective weakly supervised collaborative learning framework to
resolve this problem, which trains a weakly supervised learner and a strongly
supervised learner jointly by enforcing partial feature sharing and prediction
consistency. For object detection, taking WSDDN-like architecture as weakly
supervised detector sub-network and Faster-RCNN-like architecture as strongly
supervised detector sub-network, we propose an end-to-end Weakly Supervised
Collaborative Detection Network. As there is no strong supervision available to
train the Faster-RCNN-like sub-network, a new prediction consistency loss is
defined to enforce consistency of predictions between the two sub-networks as
well as within the Faster-RCNN-like sub-networks. At the same time, the two
detectors are designed to partially share features to further guarantee the
model consistency at the perceptual level. Extensive experiments on the
PASCAL VOC 2007 and 2012 datasets have demonstrated the effectiveness of the
proposed framework.
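The prediction-consistency idea above can be sketched as follows: with no ground-truth boxes available for the Faster-RCNN-like branch, its per-proposal class probabilities are pulled toward those of the weakly supervised branch on the same proposals. A hedged sketch using a simple cross-branch KL term; the paper's actual loss may take a different form.

```python
import math

def kl_div(p, q, eps=1e-8):
    """KL divergence between two probability vectors (eps-smoothed)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def consistency_loss(weak_scores, strong_scores):
    """Average divergence between the weakly supervised branch's and the
    strongly supervised branch's per-proposal class distributions."""
    pairs = list(zip(weak_scores, strong_scores))
    return sum(kl_div(p, q) for p, q in pairs) / len(pairs)
```

Combined with partial feature sharing, a term like this is what lets the strongly supervised sub-network train without any box annotations.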
Activity Driven Weakly Supervised Object Detection
Weakly supervised object detection aims at reducing the amount of supervision
required to train detection models. Such models are traditionally learned from
images/videos labelled only with the object class and not the object bounding
box. In our work, we try to leverage not only the object class labels but also
the action labels associated with the data. We show that the action depicted in
the image/video can provide strong cues about the location of the associated
object. We learn a spatial prior for the object dependent on the action (e.g.
"ball" is closer to "leg of the person" in "kicking ball"), and incorporate
this prior to simultaneously train a joint object detection and action
classification model. We conducted experiments on both video datasets and image
datasets to evaluate the performance of our weakly supervised object detection
model. Our approach outperformed the current state-of-the-art (SOTA) method by
more than 6% in mAP on the Charades video dataset.
Comment: CVPR'19 camera ready
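The action-dependent spatial prior described above ("ball" near "leg" for "kicking") can be sketched as a Gaussian over the object's position relative to a person keypoint, with parameters per action. The prior table below is a hypothetical stand-in for the learned parameters, not values from the paper.

```python
import math

# hypothetical learned priors: action -> (mean_dx, mean_dy, sigma),
# offsets expressed relative to a person keypoint
SPATIAL_PRIOR = {
    "kicking": (0.0, 0.8, 0.3),    # object expected low, near the leg
    "drinking": (0.1, -0.4, 0.2),  # object expected high, near the face
}

def prior_score(action, person_xy, candidate_xy):
    """Likelihood of a candidate box center under the action's spatial prior."""
    mdx, mdy, sigma = SPATIAL_PRIOR[action]
    dx = candidate_xy[0] - (person_xy[0] + mdx)
    dy = candidate_xy[1] - (person_xy[1] + mdy)
    return math.exp(-(dx * dx + dy * dy) / (2 * sigma * sigma))
```

During joint training, a score like this would re-weight candidate object locations so the detector and the action classifier reinforce each other.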
Soft Proposal Networks for Weakly Supervised Object Localization
Weakly supervised object localization remains challenging, where only image
labels instead of bounding boxes are available during training. Object proposal
is an effective component in localization, but often computationally expensive
and incapable of joint optimization with some of the remaining modules. In this
paper, to the best of our knowledge, we for the first time integrate weakly
supervised object proposal into convolutional neural networks (CNNs) in an
end-to-end learning manner. We design a network component, Soft Proposal (SP),
to be plugged into any standard convolutional architecture to introduce the
nearly cost-free object proposal, orders of magnitude faster than
state-of-the-art methods. In the SP-augmented CNNs, referred to as Soft
Proposal Networks (SPNs), iteratively evolved object proposals are generated
based on the deep feature maps then projected back, and further jointly
optimized with network parameters, with image-level supervision only. Through
the unified learning process, SPNs learn better object-centric filters,
discover more discriminative visual evidence, and suppress background
interference, significantly boosting both weakly supervised object localization
and classification performance. We report the best results on popular
benchmarks, including PASCAL VOC, MS COCO, and ImageNet.
Comment: ICCV 201
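The iteratively evolved proposal map described above can be sketched as a random walk over a similarity graph of spatial positions: the map is repeatedly propagated through the transition matrix and then coupled back into the features. A simplified sketch of the mechanism, with made-up function names; the SPN formulation has more detail.

```python
def evolve_proposal(similarity, steps=10):
    """Evolve a soft proposal map over N spatial positions by a random
    walk on a feature-similarity transition matrix (N x N)."""
    n = len(similarity)
    prop = [1.0 / n] * n  # uniform initial proposal map
    for _ in range(steps):
        prop = [sum(similarity[j][i] * prop[j] for j in range(n))
                for i in range(n)]
        s = sum(prop)
        prop = [p / s for p in prop]  # renormalize each iteration
    return prop

def apply_proposal(features, prop):
    """Couple the proposal map back into the features (elementwise gate),
    so the map and the filters are optimized jointly."""
    return [[f * prop[i] for f in features[i]] for i in range(len(features))]
```

Because the walk only needs the feature maps already computed by the backbone, the proposal comes nearly for free, which is the cost argument the abstract makes.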
Zero-Annotation Object Detection with Web Knowledge Transfer
Object detection is one of the major problems in computer vision, and has
been extensively studied. Most of the existing detection works rely on
labor-intensive supervision, such as ground truth bounding boxes of objects or
at least image-level annotations. On the contrary, we propose an object
detection method that does not require any form of human annotation on target
tasks, by exploiting freely available web images. In order to facilitate
effective knowledge transfer from web images, we introduce a multi-instance
multi-label domain adaptation learning framework with two key innovations. First
of all, we propose an instance-level adversarial domain adaptation network with
attention on foreground objects to transfer the object appearances from web
domain to target domain. Second, to preserve the class-specific semantic
structure of transferred object features, we propose a simultaneous transfer
mechanism to transfer the supervision across domains through pseudo strong
label generation. With our end-to-end framework that simultaneously learns a
weakly supervised detector and transfers knowledge across domains, we achieved
significant improvements over baseline methods on the benchmark datasets.
Comment: Accepted in ECCV 201
Weakly supervised object detection using pseudo-strong labels
Object detection is an important task in computer vision. A variety of methods
have been proposed, but methods using weak labels still do not achieve
satisfactory results. In this paper, we propose a new framework that uses a
weakly supervised method's output as pseudo-strong labels to train a strongly
supervised model. One weakly supervised method is treated as a black box to
generate class-specific bounding boxes on the training dataset. A de-noising
method is then applied to the noisy bounding boxes, and the de-noised
pseudo-strong labels are used to train a strongly supervised object detection
network. The whole framework remains weakly supervised because the entire
process uses only image-level labels. Experimental results on PASCAL VOC 2007
demonstrate the validity of our framework: we obtain 43.4% mean average
precision, compared to 39.5% for the previous best result and 34.5% for the
initial method. The framework is simple and distinct, and is promising for
easy application to other methods.
Comment: 7 pages
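The three-stage pipeline above can be sketched end to end. The weak detector, the de-noising rule, and the box format here are hypothetical placeholders, not the paper's implementation; the de-noising thresholds in particular are invented for illustration.

```python
def denoise(boxes, min_score=0.5, min_area=0.001):
    """Stage 2: drop low-confidence and degenerate boxes produced by the
    black-box weak detector. Each box is (x1, y1, x2, y2, score, cls)."""
    kept = []
    for x1, y1, x2, y2, score, cls in boxes:
        if score >= min_score and (x2 - x1) * (y2 - y1) >= min_area:
            kept.append((x1, y1, x2, y2, score, cls))
    return kept

def build_pseudo_strong_labels(weak_detector, images):
    """Stages 1 + 2: run the weak detector on the training set, then
    de-noise its boxes into pseudo-strong labels."""
    return {img: denoise(weak_detector(img)) for img in images}

# Stage 3 would train a strongly supervised detector (e.g. Faster R-CNN)
# on these pseudo labels exactly as if they were ground truth.
```

Since every label originates from image-level supervision, the overall training remains weakly supervised, as the abstract notes.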
Weakly- and Semi-Supervised Object Detection with Expectation-Maximization Algorithm
Object detection when provided image-level labels instead of instance-level
labels (i.e., bounding boxes) during training is an important problem in
computer vision, since large scale image datasets with instance-level labels
are extremely costly to obtain. In this paper, we address this challenging
problem by developing an Expectation-Maximization (EM) based object detection
method using deep convolutional neural networks (CNNs). Our method is
applicable to both the weakly-supervised and semi-supervised settings.
Extensive experiments on the PASCAL VOC 2007 benchmark show that (1) in the weakly
supervised setting, our method provides significant detection performance
improvement over current state-of-the-art methods, (2) having access to a small
number of strongly (instance-level) annotated images, our method can almost
match the performance of the fully supervised Fast RCNN. We share our source
code at https://github.com/ZiangYan/EM-WSD.
Comment: 9 pages
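The EM loop described above treats the proposal containing the true object as a latent variable. A schematic sketch: the E-step computes a posterior over proposals from the current detector's scores, and the M-step re-fits the detector toward high-posterior proposals. The scoring function and the "expected box" update below are placeholders for the paper's CNN.

```python
def e_step(proposals, score_fn):
    """E-step: posterior over which proposal is the object's true box,
    given the current detector's (unnormalized, positive) scores."""
    scores = [score_fn(p) for p in proposals]
    total = sum(scores) or 1.0
    return [s / total for s in scores]

def m_step(proposals, posterior):
    """M-step: update the model toward high-posterior proposals. Here we
    return the posterior-weighted expected box as a stand-in for the
    paper's CNN parameter update."""
    dim = len(proposals[0])
    return [sum(w * p[k] for w, p in zip(posterior, proposals))
            for k in range(dim)]
```

In the semi-supervised setting, the few instance-level labels simply fix the latent variable for those images, which is why the same loop covers both regimes.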