A-Fast-RCNN: Hard Positive Generation via Adversary for Object Detection
How do we learn an object detector that is invariant to occlusions and
deformations? Our current solution is to use a data-driven strategy: collect
large-scale datasets that contain object instances under different conditions.
The hope is that the final classifier can use these examples to learn
invariances. But is it really possible to see all the occlusions in a dataset?
We argue that, like categories, occlusions and object deformations also follow a
long tail. Some occlusions and deformations are so rare that they hardly ever
occur; yet we want to learn a model invariant to such occurrences. In this
paper, we propose an alternative solution: we propose to learn an adversarial
network that generates examples with occlusions and deformations. The goal of
the adversary is to generate examples that are difficult for the object
detector to classify. In our framework, both the original detector and the
adversary are learned jointly. Our experimental results indicate a 2.3% mAP
boost on VOC07 and a 2.6% mAP boost on the VOC2012 object detection challenge
compared to the Fast-RCNN pipeline. We also release the code for this paper.
Comment: CVPR 2017 Camera Ready
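The adversary's objective, picking the occlusion the detector finds hardest, can be illustrated with a greedy search. This is a minimal sketch, not the paper's learned adversarial network: it assumes a hypothetical linear detector head (`true_class_score`) over a single-channel feature grid and simply tries every square block to drop, keeping the one that maximizes the logistic loss on the true class. In the joint training described above, the detector would then be updated on such hardened examples.

```python
import math


def true_class_score(features, weights):
    """Hypothetical linear detector head: elementwise product summed over the grid."""
    return sum(f * w for frow, wrow in zip(features, weights) for f, w in zip(frow, wrow))


def logistic_loss(score):
    """log(1 + exp(-score)) for a positive example, written to avoid overflow."""
    return max(0.0, -score) + math.log1p(math.exp(-abs(score)))


def occlude(features, top, left, size):
    """Return a copy of `features` with a size x size block of cells set to zero."""
    out = [row[:] for row in features]
    for i in range(top, top + size):
        for j in range(left, left + size):
            out[i][j] = 0.0
    return out


def hardest_occlusion(features, weights, size):
    """Greedy stand-in for the adversary: try every block, keep the highest-loss one."""
    best = None
    for top in range(len(features) - size + 1):
        for left in range(len(features[0]) - size + 1):
            score = true_class_score(occlude(features, top, left, size), weights)
            loss = logistic_loss(score)
            if best is None or loss > best[0]:
                best = (loss, top, left)
    return best  # (loss, top row, left column)


# Toy example: the hypothetical detector relies mostly on the center cell.
features = [[1.0] * 3 for _ in range(3)]
weights = [[0.1, 0.1, 0.1], [0.1, 2.0, 0.1], [0.1, 0.1, 0.1]]
loss, top, left = hardest_occlusion(features, weights, size=1)
clean_loss = logistic_loss(true_class_score(features, weights))
```

The exhaustive search is only practical for tiny grids; a learned adversary amortizes this choice into a single forward pass.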
Deep Detection of People and their Mobility Aids for a Hospital Robot
Robots operating in populated environments encounter many different types of
people, some of whom might need particularly cautious interaction because of
physical impairments or advanced age. Robots therefore need to recognize such
needs to provide appropriate assistance, guidance or
other forms of support. In this paper, we propose a depth-based perception
pipeline that estimates the position and velocity of people in the environment
and categorizes them according to the mobility aids they use: pedestrian,
person in a wheelchair, person in a wheelchair with a person pushing them, person
with crutches and person using a walker. We present a fast region proposal
method that feeds a Region-based Convolutional Network (Fast R-CNN). With this,
we speed up the object detection process by a factor of seven compared to a
dense sliding window approach. We furthermore propose a probabilistic position,
velocity and class estimator to smooth the CNN's detections and account for
occlusions and misclassifications. In addition, we introduce a new hospital
dataset with over 17,000 annotated RGB-D images. Extensive experiments confirm
that our pipeline successfully keeps track of people and their mobility aids,
even in challenging situations with multiple people from different categories
and frequent occlusions. Videos of our experiments and the dataset are
available at http://www2.informatik.uni-freiburg.de/~kollmitz/MobilityAids
Comment: 7 pages, ECMR 2017, dataset and videos:
http://www2.informatik.uni-freiburg.de/~kollmitz/MobilityAids
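The probabilistic position, velocity and class estimator can be pictured as one filter per tracked person. The sketch below is an assumption-laden stand-in, not the authors' estimator: it runs a constant-velocity Kalman filter along a single axis (a real tracker would cover both ground-plane axes) and a discrete Bayes filter over the five mobility-aid classes, with a hypothetical "sticky" class transition (`stay_prob`) and a hypothetical detector confusion model (`hit_prob`). All noise values are illustrative.

```python
CLASSES = ["pedestrian", "wheelchair", "push_wheelchair", "crutches", "walker"]


class Track:
    """Hypothetical per-person track: 1D Kalman filter plus a class Bayes filter."""

    def __init__(self, z0, meas_var=0.01, accel_var=1e-4, stay_prob=0.9):
        self.x = [z0, 0.0]                        # state: [position, velocity]
        self.P = [[meas_var, 0.0], [0.0, 10.0]]   # large initial velocity uncertainty
        self.r, self.q, self.stay = meas_var, accel_var, stay_prob
        n = len(CLASSES)
        self.belief = [1.0 / n] * n               # uniform prior over classes

    def predict(self, dt):
        p, v = self.x
        self.x = [p + dt * v, v]                  # constant-velocity motion model
        (a, b), (c, d) = self.P
        # P <- F P F^T + Q with F = [[1, dt], [0, 1]] and assumed Q = q * I
        self.P = [[a + dt * (b + c) + dt * dt * d + self.q, b + dt * d],
                  [c + dt * d, d + self.q]]
        n = len(self.belief)
        # sticky transition: keep the class with prob `stay`, else switch uniformly
        self.belief = [self.stay * b_k + (1 - self.stay) * (1 - b_k) / (n - 1)
                       for b_k in self.belief]

    def update(self, z, observed_class, hit_prob=0.7):
        (a, b), (c, d) = self.P
        s = a + self.r                            # innovation variance, H = [1, 0]
        k0, k1 = a / s, c / s                     # Kalman gain
        y = z - self.x[0]                         # innovation
        self.x = [self.x[0] + k0 * y, self.x[1] + k1 * y]
        self.P = [[a - k0 * a, b - k0 * b], [c - k1 * a, d - k1 * b]]  # (I - K H) P
        n = len(self.belief)
        # assumed confusion model: detector is right with `hit_prob`, else uniform
        like = [hit_prob if k == observed_class else (1 - hit_prob) / (n - 1)
                for k in range(n)]
        post = [l * b_k for l, b_k in zip(like, self.belief)]
        total = sum(post)
        self.belief = [p_k / total for p_k in post]


# Usage: a person moving at 1 m/s, detected as "walker" every second.
t = Track(0.0)
for k in range(1, 10):
    t.predict(1.0)
    t.update(float(k), CLASSES.index("walker"))
```

With the sticky transition, a single misclassified frame only shifts the class posterior instead of flipping the label outright.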
Grid Loss: Detecting Occluded Faces
Detection of partially occluded objects is a challenging computer vision
problem. Standard Convolutional Neural Network (CNN) detectors fail if parts of
the detection window are occluded, since not every sub-part of the window is
discriminative on its own. To address this issue, we propose a novel loss layer
for CNNs, named grid loss, which minimizes the error rate on sub-blocks of a
convolution layer independently rather than over the whole feature map. This
results in parts being more discriminative on their own, enabling the detector
to recover if the detection window is partially occluded. Because our loss
layer can be mapped back to a regular fully connected layer, it incurs no
additional computational cost at runtime compared to standard CNNs. We
demonstrate our method for face detection on several public face detection
benchmarks and show that our method outperforms regular CNNs, is suitable for
real-time applications and achieves state-of-the-art performance.
Comment: accepted to ECCV 201
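The grid loss idea can be sketched for a single-channel feature map split into non-overlapping blocks, each with its own linear classifier. This sketch assumes hinge losses and a weighting factor `lam` between the whole-window term and the per-block terms; those choices and the helper names are illustrative, not taken from the paper. The last function shows why runtime cost is unchanged: scattering the per-block weights into one weight map yields a fully connected layer whose score equals the sum of the block scores.

```python
def blocks(fmap, size):
    """Yield (block_row, block_col, values) for non-overlapping size x size blocks."""
    for bi in range(0, len(fmap), size):
        for bj in range(0, len(fmap[0]), size):
            vals = [fmap[i][j] for i in range(bi, bi + size) for j in range(bj, bj + size)]
            yield bi // size, bj // size, vals


def hinge(y, score):
    """Hinge loss for a label y in {-1, +1}."""
    return max(0.0, 1.0 - y * score)


def grid_loss(fmap, y, block_w, block_b, size, lam=1.0):
    """Whole-window hinge loss plus an independent hinge loss on every block."""
    block_scores = []
    for r, c, vals in blocks(fmap, size):
        s = sum(v * w for v, w in zip(vals, block_w[r][c])) + block_b[r][c]
        block_scores.append(s)
    global_score = sum(block_scores)  # what the detector uses at test time
    loss = hinge(y, global_score) + lam * sum(hinge(y, s) for s in block_scores)
    return loss, global_score


def as_fc_layer(block_w, block_b, size, height, width):
    """Scatter per-block weights into one height x width weight map and one bias."""
    W = [[0.0] * width for _ in range(height)]
    for r, row in enumerate(block_w):
        for c, w in enumerate(row):
            for k, val in enumerate(w):
                W[r * size + k // size][c * size + k % size] = val
    bias = sum(sum(row) for row in block_b)
    return W, bias


# Usage: a 4x4 map split into a 2x2 grid of 2x2 blocks (toy values).
fmap = [[float(4 * i + j) for j in range(4)] for i in range(4)]
block_w = [[[0.1, -0.2, 0.3, 0.0] for _ in range(2)] for _ in range(2)]
block_b = [[0.5, -0.5], [0.25, 0.0]]
loss, score = grid_loss(fmap, 1, block_w, block_b, size=2)
W, bias = as_fc_layer(block_w, block_b, size=2, height=4, width=4)
fc_score = sum(W[i][j] * fmap[i][j] for i in range(4) for j in range(4)) + bias
```

The per-block terms only shape training; at test time the single merged weight map is evaluated once.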
Classification of Occluded Objects using Fast Recurrent Processing
Recurrent neural networks are powerful tools for handling incomplete data
problems in computer vision, thanks to their significant generative
capabilities. However, the computational demand for these algorithms is too
high to work in real time without specialized hardware or software solutions.
In this paper, we propose a framework for adding recurrent processing
capabilities to a feedforward network without sacrificing much computational
efficiency. We assume a mixture model and generate samples of the
last hidden layer according to the class decisions of the output layer, modify
the hidden layer activity using the samples, and propagate to lower layers. For
the visual occlusion problem, the iterative procedure emulates a
feedforward-feedback loop, filling in the missing hidden layer activity with meaningful
representations. The proposed algorithm is tested on a widely used dataset and
shown to achieve a 2% improvement in classification accuracy for occluded
objects. When compared to Restricted Boltzmann Machines, our algorithm shows
superior performance for occluded object classification.
Comment: arXiv admin note: text overlap with arXiv:1409.8576 by other authors
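The fill-in loop can be sketched at the level of the last hidden layer. This simplified version rests on assumptions the abstract does not state: one mixture component per class represented by its mean activity (`class_means`), a known mask of occluded hidden units, a blending rate `alpha`, and the posterior-weighted expected activity used in place of random samples so the result is deterministic. Propagation to lower layers is omitted.

```python
import math


def softmax(scores):
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(h, W):
    """Output layer: class posteriors from linear scores W[k] . h."""
    return softmax([sum(w * x for w, x in zip(row, h)) for row in W])


def fill_in(h, W, class_means, mask, steps=5, alpha=0.5):
    """Move occluded units (mask[i] == 0) toward the mixture's expected activity
    under the current class posterior, re-classifying after each step."""
    h = list(h)
    for _ in range(steps):
        post = classify(h, W)
        expected = [sum(p * mu[i] for p, mu in zip(post, class_means))
                    for i in range(len(h))]
        h = [x if m else (1 - alpha) * x + alpha * e
             for x, m, e in zip(h, mask, expected)]
    return h, classify(h, W)


# Usage: class 0 normally activates units 0 and 1; unit 1 is occluded (zeroed).
W = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
class_means = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
occluded = [1.0, 0.0, 0.0, 0.0]
mask = [1, 0, 1, 1]
before = classify(occluded, W)
filled, after = fill_in(occluded, W, class_means, mask)
```

Each pass feeds the current class decision back into the hidden layer, which is the feedforward-feedback loop in miniature.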