67 research outputs found
Deep Regionlets for Object Detection
In this paper, we propose a novel object detection framework named "Deep
Regionlets" by establishing a bridge between deep neural networks and
conventional detection schema for accurate generic object detection. Motivated
by the abilities of regionlets for modeling object deformation and multiple
aspect ratios, we incorporate regionlets into an end-to-end trainable deep
learning framework. The deep regionlets framework consists of a region
selection network and a deep regionlet learning module. Specifically, given a
detection bounding box proposal, the region selection network provides guidance
on where to select regions to learn the features from. The regionlet learning
module focuses on local feature selection and transformation to alleviate local
variations. To this end, we first realize non-rectangular region selection
within the detection framework to accommodate variations in object appearance.
Moreover, we design a "gating network" within the regionlet learning module to
enable soft regionlet selection and pooling. The Deep Regionlets framework is
trained end-to-end without additional efforts. We perform ablation studies and
conduct extensive experiments on the PASCAL VOC and Microsoft COCO datasets.
The proposed framework outperforms state-of-the-art algorithms, such as
RetinaNet and Mask R-CNN, even without additional segmentation labels.
Comment: Accepted to ECCV 201
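The "soft regionlet selection and pooling" the abstract mentions can be pictured as a gating step: each regionlet feature receives a softmax gate score, and the pooled feature is the gate-weighted combination. A minimal NumPy sketch under that reading (function name and toy inputs are illustrative; in the actual framework the gate scores come from a learned gating network, not hand-set values):

```python
import numpy as np

def soft_regionlet_pool(regionlet_feats, gate_scores):
    """Soft regionlet selection: softmax over raw gate scores yields one
    weight per regionlet; pooling is the gate-weighted sum of features.

    regionlet_feats: (R, D) array of R regionlet feature vectors
    gate_scores:     (R,) raw scores (in the paper, a gating network output)
    """
    shifted = gate_scores - gate_scores.max()        # numerical stability
    gates = np.exp(shifted) / np.exp(shifted).sum()  # softmax over regionlets
    return gates @ regionlet_feats                   # (D,) pooled feature

# toy example: three regionlets with 2-D features, gate favouring the first
feats = np.array([[1.0, 0.0],
                  [0.0, 1.0],
                  [1.0, 1.0]])
scores = np.array([2.0, 0.0, 0.0])
pooled = soft_regionlet_pool(feats, scores)
```

Because the gates are soft (a differentiable softmax rather than a hard argmax), the selection remains trainable end-to-end, which matches the abstract's emphasis on training without additional efforts.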
Part Detector Discovery in Deep Convolutional Neural Networks
Current fine-grained classification approaches often rely on a robust
localization of object parts to extract localized feature representations
suitable for discrimination. However, part localization is a challenging task
due to the large variation of appearance and pose. In this paper, we show how
pre-trained convolutional neural networks can be used for robust and efficient
object part discovery and localization without the necessity to actually train
the network on the current dataset. Our approach called "part detector
discovery" (PDD) is based on analyzing the gradient maps of the network outputs
and finding activation centers spatially related to annotated semantic parts or
bounding boxes.
This allows us not only to obtain excellent performance on the CUB200-2011
dataset but also, in contrast to previous approaches, to perform detection and
bird classification jointly without requiring bounding box annotations during
testing or ground-truth parts during training. The code is available at
http://www.inf-cv.uni-jena.de/part_discovery and
https://github.com/cvjena/PartDetectorDisovery.
Comment: Accepted for publication at the Asian Conference on Computer Vision (ACCV) 201
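The core localization step the abstract describes, finding "activation centers" in gradient maps, can be sketched as taking the gradient-weighted centroid of a saliency map. The NumPy snippet below shows only that centroid step on a synthetic map; in PDD the map itself comes from backpropagating a network output through a pre-trained CNN, which is not reproduced here:

```python
import numpy as np

def activation_center(grad_map):
    """Gradient-weighted centroid of a (H, W) saliency/gradient map,
    used as a candidate location for a semantic part."""
    H, W = grad_map.shape
    total = grad_map.sum()
    ys, xs = np.mgrid[0:H, 0:W]
    cy = (ys * grad_map).sum() / total
    cx = (xs * grad_map).sum() / total
    return cy, cx

# synthetic gradient map: diffuse activation with a peak at (10, 5)
g = np.zeros((16, 16))
g[9:12, 4:7] = 0.5
g[10, 5] = 2.0
cy, cx = activation_center(g)
```

The centroid is robust to the diffuse, blob-like responses typical of gradient maps, which is why a center estimate can be related to annotated parts even without training on the target dataset.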
Track, then Decide: Category-Agnostic Vision-based Multi-Object Tracking
The most common paradigm for vision-based multi-object tracking is
tracking-by-detection, due to the availability of reliable detectors for
several important object categories such as cars and pedestrians. However,
future mobile systems will need a capability to cope with rich human-made
environments, in which obtaining detectors for every possible object category
would be infeasible. In this paper, we propose a model-free multi-object
tracking approach that uses a category-agnostic image segmentation method to
track objects. We present an efficient segmentation mask-based tracker which
associates pixel-precise masks reported by the segmentation. Our approach can
utilize semantic information whenever it is available for classifying objects
at the track level, while retaining the capability to track generic unknown
objects in the absence of such information. We demonstrate experimentally that
our approach achieves performance comparable to state-of-the-art
tracking-by-detection methods for popular object categories such as cars and
pedestrians. Additionally, we show that the proposed method can discover and
robustly track a large variety of other objects.
Comment: ICRA'18 submission
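The association of "pixel-precise masks reported by the segmentation" can be illustrated with mask IoU plus a greedy one-to-one matching. This is a deliberately simplified sketch (the helper names and the 0.5 threshold are illustrative; the paper's tracker is more involved than a single greedy pass):

```python
import numpy as np

def mask_iou(a, b):
    """Intersection-over-union of two boolean segmentation masks."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union else 0.0

def associate(track_masks, det_masks, thresh=0.5):
    """Greedily match each track's last mask to an unused detection mask,
    accepting a match only when IoU exceeds the threshold."""
    pairs, used = [], set()
    for t, tm in enumerate(track_masks):
        best, best_iou = None, thresh
        for d, dm in enumerate(det_masks):
            if d in used:
                continue
            iou = mask_iou(tm, dm)
            if iou > best_iou:
                best, best_iou = d, iou
        if best is not None:
            used.add(best)
            pairs.append((t, best))
    return pairs

# two tracks, two detections: one object moved slightly, one stayed put
m1 = np.zeros((8, 8), bool); m1[0:4, 0:4] = True
m2 = np.zeros((8, 8), bool); m2[1:5, 0:4] = True   # m1 shifted down one row
m3 = np.zeros((8, 8), bool); m3[5:8, 5:8] = True   # a separate object
matches = associate([m1, m3], [m3, m2])
```

Because masks rather than boxes are compared, the association stays category-agnostic: no detector or class label is needed, matching the model-free setting of the abstract.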
SINet: A Scale-insensitive Convolutional Neural Network for Fast Vehicle Detection
Vision-based vehicle detection approaches have achieved remarkable success in
recent years with the development of deep convolutional neural networks (CNNs).
However, existing CNN-based algorithms suffer from the fact that convolutional
features are scale-sensitive in object detection, whereas traffic images and
videos commonly contain vehicles with a large variance of scales. In this
paper, we delve into the source of scale sensitivity, and
reveal two key issues: 1) existing RoI pooling destroys the structure of small
scale objects, 2) the large intra-class distance for a large variance of scales
exceeds the representation capability of a single network. Based on these
findings, we present a scale-insensitive convolutional neural network (SINet)
for fast detection of vehicles with a large variance of scales. First, we present
a context-aware RoI pooling to maintain the contextual information and original
structure of small scale objects. Second, we present a multi-branch decision
network to minimize the intra-class distance of features. These lightweight
techniques add no extra time complexity yet yield a prominent improvement in
detection accuracy. The proposed techniques can be applied to any deep network
architecture while keeping it trainable end-to-end. Our SINet achieves
state-of-the-art performance in terms of accuracy and speed (up to 37 FPS) on
the KITTI benchmark and a new highway dataset, which contains a large variance
of scales and extremely small objects.
Comment: Accepted by IEEE Transactions on Intelligent Transportation Systems (T-ITS)
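The first issue the abstract identifies, standard RoI pooling destroying the structure of small objects, motivates the context-aware pooling. A rough single-channel NumPy sketch of the idea: an RoI smaller than the output grid is enlarged before pooling instead of being collapsed. All names here are illustrative, and SINet enlarges features with bilinear deconvolution, whereas this sketch uses nearest-neighbour upsampling to stay dependency-free:

```python
import numpy as np

def context_aware_roi_pool(feat, roi, out=4):
    """Sketch of context-aware RoI pooling on one feature channel.
    feat: (H, W) feature map; roi: (y0, x0, y1, x1) in feature coordinates.
    Undersized RoIs are upsampled to the output grid so their spatial
    structure survives; larger RoIs get regular per-bin max pooling."""
    y0, x0, y1, x1 = roi
    crop = feat[y0:y1, x0:x1]
    h, w = crop.shape
    if h < out or w < out:
        # small RoI: enlarge to out x out (nearest-neighbour stand-in for
        # the bilinear deconvolution used in the paper)
        ys = np.arange(out) * h // out
        xs = np.arange(out) * w // out
        return crop[np.ix_(ys, xs)].astype(float)
    # regular max pooling into an out x out grid
    pooled = np.empty((out, out))
    for i in range(out):
        for j in range(out):
            pooled[i, j] = crop[i*h//out:(i+1)*h//out,
                                j*w//out:(j+1)*w//out].max()
    return pooled

feat = np.arange(64, dtype=float).reshape(8, 8)
small = context_aware_roi_pool(feat, (0, 0, 2, 2))  # 2x2 RoI -> upsampled
large = context_aware_roi_pool(feat, (0, 0, 8, 8))  # full map -> max-pooled
```

The key contrast: plain max pooling would reduce the 2x2 RoI to a near-constant grid, whereas enlarging it first preserves the relative layout of its activations, which is what the abstract credits for the gains on extremely small vehicles.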
- …