Top-down neural attention by excitation backprop
We aim to model the top-down attention of a Convolutional Neural Network (CNN) classifier for generating task-specific attention maps. Inspired by a top-down human visual attention model, we propose a new backpropagation scheme, called Excitation Backprop, to pass top-down signals downwards in the network hierarchy via a probabilistic Winner-Take-All process. Furthermore, we introduce the concept of contrastive attention to make the top-down attention maps more discriminative. In experiments, we demonstrate the accuracy and generalizability of our method in weakly supervised localization tasks on the MS COCO, PASCAL VOC07 and ImageNet datasets. The usefulness of our method is further validated in the text-to-region association task. On the Flickr30k Entities dataset, we achieve promising performance in phrase localization by leveraging the top-down attention of a CNN model that has been trained on weakly labeled web images.

https://arxiv.org/abs/1608.00507 (Accepted manuscript)
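The probabilistic Winner-Take-All step can be pictured for a single fully connected layer as in the minimal NumPy sketch below, written only from the description in the abstract; the function name, array shapes, and the restriction to positive (excitatory) weights are illustrative assumptions, not the authors' released code. Each parent neuron normalizes the contributions a_i * w_ij of its child neurons into conditional winning probabilities, and child probabilities are obtained by marginalizing over parents.

    import numpy as np

    def excitation_backprop_linear(p_out, activations, weights, eps=1e-12):
        # p_out:       (n_out,) top-down winning probabilities of the parent neurons
        # activations: (n_in,)  non-negative child activations (e.g. after a ReLU)
        # weights:     (n_out, n_in) layer weights; only positive weights excite
        w_pos = np.clip(weights, 0.0, None)           # keep excitatory connections only
        contrib = w_pos * activations[None, :]        # excitation a_i * w_ij per parent j
        z = contrib.sum(axis=1, keepdims=True) + eps  # per-parent normalization constant
        p_cond = contrib / z                          # P(child i wins | parent j); rows sum to 1
        return p_cond.T @ p_out                       # P(child i) = sum_j P(i | j) P(j)

Applied layer by layer from the class output down to an intermediate layer, per-neuron probabilities of this kind can be aggregated into a task-specific attention map, which is the role Excitation Backprop plays in the localization experiments above.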
Object Detection in 20 Years: A Survey
Object detection, as one of the most fundamental and challenging problems in computer vision, has received great attention in recent years. Its development in the past two decades can be regarded as an epitome of computer vision history. If we think of today's object detection as a technical aesthetic empowered by deep learning, then turning back the clock 20 years we would witness the wisdom of the cold-weapon era. This paper extensively reviews 400+ papers on object detection in the light of its technical evolution, spanning over a quarter-century (from the 1990s to 2019). A number of topics are covered, including the milestone detectors in history, detection datasets, metrics, fundamental building blocks of the detection system, speed-up techniques, and the recent state-of-the-art detection methods. This paper also reviews some important detection applications, such as pedestrian detection, face detection, and text detection, and makes an in-depth analysis of their challenges as well as technical improvements in recent years.

Comment: This work has been submitted to the IEEE TPAMI for possible publication.
Fluid Annotation: A Human-Machine Collaboration Interface for Full Image Annotation
We introduce Fluid Annotation, an intuitive human-machine collaboration
interface for annotating the class label and outline of every object and
background region in an image. Fluid annotation is based on three principles:
(I) Strong Machine-Learning aid. We start from the output of a strong neural
network model, which the annotator can edit by correcting the labels of
existing regions, adding new regions to cover missing objects, and removing
incorrect regions. The edit operations are also assisted by the model. (II)
Full image annotation in a single pass. As opposed to performing a series of
small annotation tasks in isolation, we propose a unified interface for full
image annotation in a single pass. (III) Empower the annotator. We empower the
annotator to choose what to annotate and in which order. This enables
concentrating on what the machine does not already know, i.e., putting human effort only on the machine's errors. This helps use the annotation budget effectively. Through extensive experiments on the COCO+Stuff dataset, we demonstrate that Fluid Annotation leads to accurate annotations very efficiently, taking three times less annotation time than the popular LabelMe interface.

Comment: ACM Multimedia 2018. Live demo is available at fluidann.appspot.co