Activity Driven Weakly Supervised Object Detection
Weakly supervised object detection aims at reducing the amount of supervision
required to train detection models. Such models are traditionally learned from
images/videos labelled only with the object class and not the object bounding
box. In our work, we try to leverage not only the object class labels but also
the action labels associated with the data. We show that the action depicted in
the image/video can provide strong cues about the location of the associated
object. We learn a spatial prior for the object dependent on the action (e.g.
"ball" is closer to "leg of the person" in "kicking ball"), and incorporate
this prior to simultaneously train a joint object detection and action
classification model. We conducted experiments on both video datasets and image
datasets to evaluate the performance of our weakly supervised object detection
model. Our approach outperformed the current state-of-the-art (SOTA) method by
more than 6% in mAP on the Charades video dataset.
Comment: CVPR'19 camera ready
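To make the idea of an action-dependent spatial prior concrete, here is a minimal sketch that is not the authors' method. It assumes a hand-set Gaussian prior over the offset between the object centre and one body keypoint for each action, and uses it only to re-rank candidate boxes at inference. The paper learns its prior and trains detection and action classification jointly. `ACTION_PRIORS`, the keypoint names and all numbers are hypothetical.

```python
import math

# Hypothetical, hand-set priors (NOT values from the paper): for each action,
# the body part the object sits near, the mean (dx, dy) offset of the object
# centre from that part in pixels, and an isotropic standard deviation.
ACTION_PRIORS = {
    "kicking ball": {"part": "leg", "mean": (0.0, 20.0), "std": 30.0},
    "drinking from cup": {"part": "hand", "mean": (0.0, -10.0), "std": 20.0},
}


def box_center(box):
    """Centre (cx, cy) of a box given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


def spatial_prior(box, action, keypoints):
    """Unnormalised Gaussian likelihood of the box centre under the action's prior."""
    p = ACTION_PRIORS[action]
    kx, ky = keypoints[p["part"]]
    cx, cy = box_center(box)
    dx = cx - kx - p["mean"][0]
    dy = cy - ky - p["mean"][1]
    return math.exp(-(dx * dx + dy * dy) / (2.0 * p["std"] ** 2))


def rerank(proposals, action, keypoints):
    """Multiply each (box, detector_score) pair by the prior and sort best-first."""
    scored = [(score * spatial_prior(box, action, keypoints), box) for box, score in proposals]
    return sorted(scored, key=lambda t: t[0], reverse=True)


if __name__ == "__main__":
    keypoints = {"leg": (100.0, 200.0), "hand": (150.0, 120.0)}
    proposals = [((90, 210, 130, 250), 0.6), ((300, 50, 340, 90), 0.7)]
    for score, box in rerank(proposals, "kicking ball", keypoints):
        print(box, round(score, 3))
```

In this example the box near the leg overtakes the box with the higher raw detector score. That is the kind of cue the abstract describes for "kicking ball".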
Learning Intelligent Dialogs for Bounding Box Annotation
We introduce Intelligent Annotation Dialogs for bounding box annotation. We
train an agent to automatically choose a sequence of actions for a human
annotator to produce a bounding box in a minimal amount of time. Specifically,
we consider two actions: box verification, where the annotator verifies a box
generated by an object detector, and manual box drawing. We explore two kinds
of agents, one based on predicting the probability that a box will be
positively verified, and the other based on reinforcement learning. We
demonstrate that (1) our agents are able to learn efficient annotation
strategies in several scenarios, automatically adapting to the image
difficulty, the desired quality of the boxes, and the detector strength; (2) in
all scenarios the resulting annotation dialogs speed up annotation compared to
manual box drawing alone and box verification alone, while also outperforming
any fixed combination of verification and drawing in most scenarios; (3) in a
realistic scenario where the detector is iteratively re-trained, our agents
evolve a series of strategies that reflect the shifting trade-off between
verification and drawing as the detector grows stronger.
Comment: This paper appeared at CVPR 201
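The first kind of agent above relies on a predicted probability that each box will be accepted. The sketch below shows one way such probabilities could drive the choice between verification and drawing. It is an illustration under assumptions, not the paper's agent. It assumes a family of strategies of the form "verify the top-k detector boxes in order, stop at the first accepted one, otherwise draw", and picks the k with the lowest expected time. The timing constants and function names are hypothetical.

```python
def expected_time(probs, k, t_verify, t_draw):
    """Expected annotation time of: verify boxes 1..k in order, stop at the first
    accepted one, and fall back to manual drawing if all k are rejected."""
    total = 0.0
    p_reach = 1.0  # probability the dialog is still running at this step
    for p in probs[:k]:
        total += p_reach * t_verify
        p_reach *= 1.0 - p  # continue only if this box was rejected
    total += p_reach * t_draw
    return total


def best_strategy(probs, t_verify, t_draw):
    """Return (k, expected_time) minimising expected time over k = 0..len(probs)."""
    options = [(k, expected_time(probs, k, t_verify, t_draw)) for k in range(len(probs) + 1)]
    return min(options, key=lambda kv: kv[1])


if __name__ == "__main__":
    # Hypothetical timings in seconds; not measurements from the paper.
    T_VERIFY, T_DRAW = 1.8, 7.0
    print(best_strategy([0.8, 0.3], T_VERIFY, T_DRAW))  # confident detector: verify
    print(best_strategy([0.05], T_VERIFY, T_DRAW))      # hard image: draw directly
```

With a strong detector the cheapest plan is to verify. When acceptance is unlikely, drawing straight away wins. This mirrors the adaptation to image difficulty and detector strength that the abstract reports.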
Weakly Supervised Coupled Networks for Visual Sentiment Analysis
Automatic assessment of sentiment from visual content has gained considerable
attention with the increasing tendency to express opinions online. In this
paper, we address the problem of visual sentiment analysis, which requires
high-level abstraction in the recognition process. Existing methods based on
convolutional neural networks learn sentiment representations from the holistic
image appearance. However, different image regions can influence the intended
expression to different degrees. This paper presents a weakly supervised coupled
convolutional network with two branches to leverage localized information. The
first branch detects a sentiment-specific soft map by training a fully
convolutional network with the cross spatial pooling strategy, which requires
only image-level labels and thereby significantly reduces the annotation burden.
The second branch uses both holistic and localized information by coupling the
sentiment map with deep features for robust classification. We integrate the
sentiment detection and classification branches into a unified deep framework
and optimize the network end-to-end. Extensive experiments on six benchmark
datasets demonstrate that the proposed method performs favorably against
state-of-the-art methods for visual sentiment analysis.
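The two-branch design can be illustrated with a toy, pure-Python sketch on small nested-list "feature maps". It is only a rough approximation in the spirit of the abstract, not the published network. The first helper assumes that each sentiment class has several response maps, each spatially max-pooled and then averaged, as one plausible reading of cross spatial pooling. The second helper couples a given soft map with the features by element-wise weighting and concatenates the holistic and localized descriptors. All function names are hypothetical.

```python
def global_max(m):
    """Spatial max over an H x W map given as a list of rows."""
    return max(max(row) for row in m)


def global_avg(m):
    """Spatial mean over an H x W map given as a list of rows."""
    return sum(sum(row) for row in m) / (len(m) * len(m[0]))


def class_scores(maps_per_class):
    """Assumed pooling scheme: spatial max of each of a class's k maps,
    then the mean over those k responses, giving one score per class."""
    return [sum(global_max(m) for m in maps) / len(maps) for maps in maps_per_class]


def couple(features, soft_map):
    """Concatenate holistic descriptors (plain spatial mean of each of the C
    feature maps) with localized ones (mean after weighting by the soft map)."""
    holistic = [global_avg(f) for f in features]
    localized = [
        global_avg([[v * w for v, w in zip(frow, mrow)] for frow, mrow in zip(f, soft_map)])
        for f in features
    ]
    return holistic + localized


if __name__ == "__main__":
    # One class with two 2x2 response maps, and one 2x2 feature channel.
    print(class_scores([[[[0, 1], [2, 3]], [[1, 1], [1, 1]]]]))
    print(couple([[[1, 2], [3, 4]]], [[1, 0], [0, 0]]))
```

Only the pooled class scores need image-level labels, which is why this kind of branch can be trained without region annotations. The coupled descriptor keeps both the whole-image view and the map-focused view for the classifier.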