SANet: Structure-Aware Network for Visual Tracking
Convolutional neural networks (CNNs) have drawn increasing interest in visual
tracking owing to their power in feature extraction. Most existing CNN-based
trackers treat tracking as a classification problem; however, they are
sensitive to similar distractors because their CNN models mainly focus on
inter-class classification. To address this problem, we use the self-structure
information of the object to distinguish it from distractors. Specifically, we
utilize a recurrent neural network (RNN) to model the object structure and
incorporate it into a CNN to improve robustness to similar distractors. Since
convolutional layers at different levels characterize the object from
different perspectives, we use multiple RNNs to model the object structure at
the respective levels. Extensive experiments on three benchmarks, OTB100,
TC-128 and VOT2015, show that the proposed algorithm outperforms other
methods. Code is released at
http://www.dabi.temple.edu/~hbling/code/SANet/SANet.html.
Comment: In CVPR Deep Vision Workshop, 2017
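As a rough illustration of the idea, the following PyTorch sketch pairs each convolutional level with its own recurrent module that scans the feature map as a spatial sequence, so the classifier sees structure-aware rather than purely discriminative features. All layer sizes, module names and the residual fusion scheme are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class StructureRNN(nn.Module):
    """Encodes self-structure of one conv level by scanning rows as sequences."""
    def __init__(self, channels, hidden):
        super().__init__()
        self.rnn = nn.GRU(input_size=channels, hidden_size=hidden, batch_first=True)
        self.proj = nn.Conv2d(hidden, channels, kernel_size=1)

    def forward(self, x):                                   # x: (B, C, H, W)
        b, c, h, w = x.shape
        seq = x.permute(0, 2, 3, 1).reshape(b * h, w, c)    # each row is a sequence
        out, _ = self.rnn(seq)                              # (B*H, W, hidden)
        out = out.reshape(b, h, w, -1).permute(0, 3, 1, 2)  # back to (B, hidden, H, W)
        return x + self.proj(out)                           # residual fusion (assumed)

class SANetSketch(nn.Module):
    def __init__(self, levels=(64, 128, 256), hidden=64):
        super().__init__()
        chans = [3] + list(levels)
        self.convs = nn.ModuleList(
            nn.Sequential(nn.Conv2d(cin, cout, 3, stride=2, padding=1), nn.ReLU())
            for cin, cout in zip(chans[:-1], chans[1:]))
        # one RNN per conv level, since each level views the object differently
        self.rnns = nn.ModuleList(StructureRNN(c, hidden) for c in levels)
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(levels[-1], 2))  # target vs. background

    def forward(self, x):
        for conv, rnn in zip(self.convs, self.rnns):
            x = rnn(conv(x))
        return self.head(x)

scores = SANetSketch()(torch.randn(2, 3, 64, 64))  # (2, 2) classification logits
```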
End-to-end Projector Photometric Compensation
Projector photometric compensation aims to modify a projector input image
such that it compensates for disturbance from the appearance of the projection
surface. In this paper, for the first time, we formulate the compensation
problem as an end-to-end learning problem and propose a convolutional neural
network, named CompenNet, to implicitly learn the complex compensation
function. CompenNet consists of a UNet-like backbone network and an autoencoder
subnet. This architecture encourages rich multi-level interactions between the
camera-captured projection surface image and the input image, and thus captures
both photometric and environment information of the projection surface. In
addition, the visual details and interaction information are carried to deeper
layers along the multi-level skip convolution layers. The architecture is of
particular importance for the projector compensation task, for which only a
small training dataset is allowed in practice. Another contribution we make is
a novel evaluation benchmark, which is independent of the system setup and thus
quantitatively verifiable. Such a benchmark was not previously available, to
the best of our knowledge, because conventional evaluation requires the
hardware system to actually project the final results. Our key idea, motivated
by our end-to-end problem formulation, is to use a reasonable surrogate that
avoids the projection process and is therefore setup-independent. Our method is
evaluated carefully on the benchmark, and the results show that our end-to-end
learning solution outperforms state-of-the-art methods both qualitatively and
quantitatively by a significant margin.
Comment: To appear in the 2019 IEEE Conference on Computer Vision and Pattern
Recognition (CVPR). Source code and dataset are available at
https://github.com/BingyaoHuang/CompenNet
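To make the formulation concrete, here is a minimal PyTorch sketch of an end-to-end compensation network in the spirit described above: a UNet-like branch processes the input image, an autoencoder subnet encodes the camera-captured surface image, and surface features are injected at matching resolutions through skips. Channel counts and layer names are assumptions for illustration, not CompenNet's published configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv(cin, cout, stride=1):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, stride, 1), nn.ReLU())

class CompenSketch(nn.Module):
    def __init__(self):
        super().__init__()
        # autoencoder subnet: encodes photometric/environment surface info
        self.s_enc1, self.s_enc2 = conv(3, 32, stride=2), conv(32, 64, stride=2)
        # UNet-like backbone over the projector input image
        self.enc1, self.enc2 = conv(3, 32, stride=2), conv(32, 64, stride=2)
        self.mid = conv(64, 64)
        self.dec2 = conv(64 + 64, 32)          # fuses surface and image features
        self.dec1 = conv(32 + 32, 16)
        self.out = nn.Conv2d(16, 3, 3, 1, 1)

    def forward(self, x, s):                   # x: input image, s: captured surface
        s1, e1 = self.s_enc1(s), self.enc1(x)
        s2, e2 = self.s_enc2(s1), self.enc2(e1)
        m = self.mid(e2)
        d2 = self.dec2(torch.cat([m + s2, e2], dim=1))   # multi-level interaction
        d2 = F.interpolate(d2, scale_factor=2.0)
        d1 = self.dec1(torch.cat([d2 + s1, e1], dim=1))  # skip carries visual detail
        d1 = F.interpolate(d1, scale_factor=2.0)
        return torch.sigmoid(self.out(d1))               # compensated input image

# Training would pair projected-and-captured images with their desired
# appearance, so the network learns the inverse photometric mapping.
net = CompenSketch()
x_star = net(torch.randn(1, 3, 128, 128), torch.randn(1, 3, 128, 128))
```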
M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network
Feature pyramids are widely exploited by both state-of-the-art one-stage
object detectors (e.g., DSSD, RetinaNet, RefineDet) and two-stage object
detectors (e.g., Mask R-CNN, DetNet) to alleviate the problem arising from
scale variation across object instances. Although these object detectors with
feature pyramids achieve encouraging results, they have some limitations
because they simply construct the feature pyramid from the inherent
multi-scale, pyramidal architecture of backbones that were actually designed
for the object classification task. In this work, we present a method called
the Multi-Level Feature Pyramid Network (MLFPN) to construct more effective
feature pyramids for detecting objects of different scales. First, we fuse
multi-level features (i.e., multiple layers) extracted by the backbone as the
base feature. Second, we feed the base feature into a block of alternating
joint Thinned U-shape Modules and Feature Fusion Modules and exploit the
decoder layers of each U-shape module as features for detecting objects.
Finally, we gather the decoder layers of equivalent scales (sizes) to build a
feature pyramid for object detection, in which every feature map consists of
layers (features) from multiple levels. To evaluate the effectiveness of the
proposed MLFPN, we design and train a powerful end-to-end one-stage object
detector, called M2Det, by integrating MLFPN into the SSD architecture; it
achieves better detection performance than state-of-the-art one-stage
detectors. Specifically, on the MS-COCO benchmark, M2Det achieves an AP of
41.0 at 11.8 FPS with a single-scale inference strategy and an AP of 44.2 with
a multi-scale inference strategy, which are new state-of-the-art results among
one-stage detectors. The code will be made available at
\url{https://github.com/qijiezhao/M2Det}.
Comment: AAAI19
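The MLFPN construction can be sketched as follows in PyTorch: a base feature fused from two backbone levels is fed through alternating Feature Fusion Modules and Thinned U-shape Modules, and decoder maps of equal scale from every U-shape module are concatenated to form the final pyramid. Channel counts, module simplifications and names are illustrative assumptions, not the released M2Det code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv(cin, cout, stride=1):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, stride, 1), nn.ReLU())

class TUM(nn.Module):
    """Thinned U-shape Module: encoder downsamples; decoder emits one map per scale."""
    def __init__(self, c, scales=3):
        super().__init__()
        self.downs = nn.ModuleList(conv(c, c, stride=2) for _ in range(scales - 1))
        self.smooth = nn.ModuleList(conv(c, c) for _ in range(scales))

    def forward(self, x):
        enc = [x]
        for down in self.downs:                 # encoder path
            enc.append(down(enc[-1]))
        outs = [self.smooth[-1](enc[-1])]       # smallest scale first
        for i in range(len(enc) - 2, -1, -1):   # U-shape decoder with skips
            up = F.interpolate(outs[0], size=enc[i].shape[-2:])
            outs.insert(0, self.smooth[i](enc[i] + up))
        return outs                             # largest -> smallest scale

class MLFPNSketch(nn.Module):
    def __init__(self, c=64, levels=4, scales=3):
        super().__init__()
        self.ffm1 = conv(256 + 512, c)          # FFMv1: fuse two backbone levels
        self.ffm2 = nn.ModuleList(conv(c + c, c) for _ in range(levels - 1))
        self.tums = nn.ModuleList(TUM(c, scales) for _ in range(levels))

    def forward(self, shallow, deep):           # assumed 256- and 512-channel maps
        deep = F.interpolate(deep, size=shallow.shape[-2:])
        base = self.ffm1(torch.cat([shallow, deep], dim=1))
        pyramids = [self.tums[0](base)]
        for ffm, tum in zip(self.ffm2, self.tums[1:]):
            # FFMv2: fuse the base feature with the previous TUM's largest map
            x = ffm(torch.cat([base, pyramids[-1][0]], dim=1))
            pyramids.append(tum(x))
        # gather equal-scale decoder maps across all levels -> final pyramid
        return [torch.cat(maps, dim=1) for maps in zip(*pyramids)]

pyramid = MLFPNSketch()(torch.randn(1, 256, 40, 40), torch.randn(1, 512, 20, 20))
print([p.shape for p in pyramid])   # three maps, each with 4 * 64 channels
```

A detection head like SSD's would then attach classification and box-regression convolutions to each map of this pyramid.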