SANet: Structure-Aware Network for Visual Tracking
Convolutional neural networks (CNNs) have drawn increasing interest in visual
tracking owing to their powerful feature extraction capability. Most existing
CNN-based trackers treat tracking as a classification problem. However, these
trackers are sensitive to similar distractors because their CNN models mainly
focus on inter-class classification. To address this problem, we use the
self-structure information of the object to distinguish it from distractors.
Specifically, we utilize a recurrent neural network (RNN) to model the object
structure and incorporate it into the CNN to improve its robustness to similar
distractors. Considering that convolutional layers at different levels
characterize the object from different perspectives, we use multiple RNNs to
model the object structure at the respective levels. Extensive experiments
on three benchmarks, OTB100, TC-128 and VOT2015, show that the proposed
algorithm outperforms other methods. Code is released at
http://www.dabi.temple.edu/~hbling/code/SANet/SANet.html.
Comment: In CVPR Deep Vision Workshop, 2017.
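A minimal sketch of the multi-level structure modeling described above, assuming PyTorch, GRU-based structure modules, and raster-scan traversal of the feature maps; the layer widths and fusion scheme are illustrative assumptions, not the authors' released architecture:

```python
import torch
import torch.nn as nn

class StructureRNN(nn.Module):
    """Models the spatial self-structure of one CNN feature map with a GRU."""
    def __init__(self, channels, hidden=128):
        super().__init__()
        self.rnn = nn.GRU(channels, hidden, batch_first=True)
        self.proj = nn.Conv2d(hidden, channels, kernel_size=1)

    def forward(self, feat):                       # feat: (B, C, H, W)
        b, c, h, w = feat.shape
        seq = feat.flatten(2).transpose(1, 2)      # (B, H*W, C), raster-scan order
        out, _ = self.rnn(seq)                     # structure-aware encoding
        out = out.transpose(1, 2).reshape(b, -1, h, w)
        return feat + self.proj(out)               # fuse back into the CNN feature

class MultiLevelStructureHead(nn.Module):
    """One StructureRNN per CNN level, followed by a binary target classifier."""
    def __init__(self, level_channels=(256, 512), hidden=128):
        super().__init__()
        self.structure = nn.ModuleList(StructureRNN(c, hidden) for c in level_channels)
        self.cls = nn.Linear(sum(level_channels), 2)   # target vs. background

    def forward(self, feats):                      # feats: list of per-level maps
        pooled = [m(f).mean(dim=(2, 3)) for m, f in zip(self.structure, feats)]
        return self.cls(torch.cat(pooled, dim=1))
```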
End-to-end Projector Photometric Compensation
Projector photometric compensation aims to modify a projector input image
such that it can compensate for disturbance from the appearance of the
projection surface. In this paper, for the first time, we formulate the
compensation problem as an end-to-end learning problem and propose a
convolutional neural network, named CompenNet, to implicitly learn the complex
compensation function. CompenNet consists of a UNet-like backbone network and
an autoencoder subnet. Such an architecture encourages rich multi-level
interactions between the camera-captured projection surface image and the
input image, and thus captures both the photometric and environment
information of the projection surface. In addition, the visual details and
interaction information are carried to deeper layers along the multi-level
skip convolution layers. This architecture is of particular importance for the
projector compensation task, for which only a small training dataset is
allowed in practice. Another contribution we make is a novel evaluation
benchmark, which is independent of the system setup and thus quantitatively
verifiable. Such a benchmark was not previously available, to the best of our
knowledge, because conventional evaluation requires the hardware system to
actually project the final results. Our key idea, motivated by our end-to-end
problem formulation, is to use a reasonable surrogate to avoid the projection
process and thus be setup-independent. Our method is carefully evaluated on
the benchmark, and the results show that our end-to-end learning solution
outperforms state-of-the-art methods both qualitatively and quantitatively by
a significant margin.
Comment: To appear in the 2019 IEEE Conference on Computer Vision and Pattern
Recognition (CVPR). Source code and dataset are available at
https://github.com/BingyaoHuang/compenne
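A minimal sketch of the two-input architecture described above, assuming PyTorch: siamese encoders for the projector input image and the camera-captured surface image interact at multiple levels, and skip connections carry the fused features to the decoder. The channel widths, layer counts, and summation-based fusion are illustrative assumptions, not the released CompenNet:

```python
import torch
import torch.nn as nn

def conv(cin, cout, stride=1):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, stride, 1), nn.ReLU(inplace=True))

class CompenNetLike(nn.Module):
    """Siamese encoders for the input image x and the camera-captured surface
    image s; their per-level features are summed (multi-level interaction)
    and passed to the decoder through skip connections."""
    def __init__(self):
        super().__init__()
        self.e1, self.e2, self.e3 = conv(3, 32, 2), conv(32, 64, 2), conv(64, 128, 2)
        self.s1, self.s2, self.s3 = conv(3, 32, 2), conv(32, 64, 2), conv(64, 128, 2)
        self.d3 = nn.ConvTranspose2d(128, 64, 2, 2)
        self.d2 = nn.ConvTranspose2d(64, 32, 2, 2)
        self.up = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False)
        self.out = nn.Conv2d(32, 3, 3, 1, 1)

    def forward(self, x, s):
        f1, g1 = self.e1(x), self.s1(s)            # 1/2 resolution
        f2, g2 = self.e2(f1), self.s2(g1)          # 1/4 resolution
        f3, g3 = self.e3(f2), self.s3(g2)          # 1/8 resolution
        y = self.d3(f3 + g3) + (f2 + g2)           # skip: interaction at 1/4
        y = self.d2(y) + (f1 + g1)                 # skip: interaction at 1/2
        return torch.sigmoid(self.out(self.up(y)))  # compensation image, full res
```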
CheckerPose: Progressive Dense Keypoint Localization for Object Pose Estimation with Graph Neural Network
Estimating the 6-DoF pose of a rigid object from a single RGB image is a
crucial yet challenging task. Recent studies have shown the great potential of
dense correspondence-based solutions, yet improvements are still needed to
reach practical deployment. In this paper, we propose a novel pose estimation
algorithm named CheckerPose, which improves on three main aspects. Firstly,
CheckerPose densely samples 3D keypoints from the surface of the 3D object and
finds their 2D correspondences progressively in the 2D image. Compared to
previous solutions that conduct dense sampling in the image space, our strategy
enables correspondence searching in a 2D grid (i.e., pixel coordinates).
Secondly, for our 3D-to-2D correspondence, we design a compact binary code
representation for 2D image locations. This representation not only allows for
progressive correspondence refinement but also converts the correspondence
regression into a more efficient classification problem. Thirdly, we adopt a
graph neural network to explicitly model the interactions among the sampled 3D
keypoints, further boosting the reliability and accuracy of the
correspondences. Together, these novel components make CheckerPose a strong
pose estimation algorithm. When evaluated on the popular Linemod, Linemod-O,
and YCB-V object pose estimation benchmarks, CheckerPose clearly boosts the
accuracy of correspondence-based methods and achieves state-of-the-art
performance.
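A minimal sketch of the binary code idea described above: each bit pair halves the current search cell along x and y, so predicting a 2D location becomes a sequence of binary classifications, and a longer code prefix yields a finer localization. The interleaved bit order and the number of stages are assumptions for illustration, not the paper's exact encoding:

```python
def encode_location(x, y, width, height, stages=7):
    """Encode pixel (x, y) as interleaved binary codes of its coordinates:
    each (bx, by) pair records which half of the current cell holds the point."""
    bits, x0, x1, y0, y1 = [], 0.0, float(width), 0.0, float(height)
    for _ in range(stages):
        xm, ym = (x0 + x1) / 2, (y0 + y1) / 2
        bx, by = int(x >= xm), int(y >= ym)
        bits += [bx, by]
        x0, x1 = (xm, x1) if bx else (x0, xm)
        y0, y1 = (ym, y1) if by else (y0, ym)
    return bits

def decode_location(bits, width, height):
    """Recover the location as the center of the cell selected by the bits."""
    x0, x1, y0, y1 = 0.0, float(width), 0.0, float(height)
    for bx, by in zip(bits[0::2], bits[1::2]):
        xm, ym = (x0 + x1) / 2, (y0 + y1) / 2
        x0, x1 = (xm, x1) if bx else (x0, xm)
        y0, y1 = (ym, y1) if by else (y0, ym)
    return (x0 + x1) / 2, (y0 + y1) / 2

# A coarse code prefix localizes the keypoint roughly; more bit pairs refine it,
# and a network can predict each bit as a binary classification.
code = encode_location(300, 120, width=640, height=480)
print(decode_location(code[:6], 640, 480))   # coarse estimate after 3 stages
print(decode_location(code, 640, 480))       # refined estimate after 7 stages
```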
CompenNet++: End-to-end Full Projector Compensation
Full projector compensation aims to modify a projector input image such that
it can compensate for both geometric and photometric disturbance of the
projection surface. Traditional methods usually solve the two parts separately,
although they are known to correlate with each other. In this paper, we propose
the first end-to-end solution, named CompenNet++, to solve the two problems
jointly. Our work non-trivially extends CompenNet, which was recently proposed
for photometric compensation with promising performance. First, we propose a
novel geometric correction subnet, which is designed with a cascaded
coarse-to-fine structure to learn the sampling grid directly from photometric
sampling images. Second, by concatenating the geometric correction subnet with
CompenNet, CompenNet++ accomplishes full projector compensation and is
end-to-end trainable. Third, after training, we significantly simplify both the
geometric and photometric compensation parts, which largely improves the
run-time efficiency. Moreover, we construct the first setup-independent
full compensation benchmark to facilitate study on this topic. In our
thorough experiments, our method shows clear advantages over previous methods,
with promising compensation quality, while being practically convenient.
Comment: To appear in ICCV 2019. High-res supplementary material:
https://www3.cs.stonybrook.edu/~hling/publication/CompenNet++_sup-high-res.pdf.
Code: https://github.com/BingyaoHuang/CompenNet-plusplu
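A minimal sketch of the geometric correction stage described above, assuming PyTorch: a coarse sampling grid is upsampled to full resolution (coarse-to-fine) and used to warp the camera image before photometric compensation (e.g., by a CompenNet-style subnet such as the sketch given earlier). In CompenNet++ the grid is predicted by a subnet from photometric sampling images; here it is a free learned parameter purely for illustration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GeometricCorrection(nn.Module):
    """Warps the camera image with a learned sampling grid: coarse (x, y)
    offsets are bilinearly upsampled to full resolution (coarse-to-fine)
    and added to an identity grid before grid_sample."""
    def __init__(self, coarse=16):
        super().__init__()
        # Hypothetical simplification: offsets as a free parameter; the paper
        # instead predicts them from photometric sampling images.
        self.offsets = nn.Parameter(torch.zeros(1, 2, coarse, coarse))

    def forward(self, img):                        # img: (B, 3, H, W)
        b, _, h, w = img.shape
        theta = torch.eye(2, 3, device=img.device).unsqueeze(0).expand(b, -1, -1)
        identity = F.affine_grid(theta, size=(b, 3, h, w), align_corners=False)
        fine = F.interpolate(self.offsets.expand(b, -1, -1, -1), size=(h, w),
                             mode='bilinear', align_corners=False)
        grid = identity + fine.permute(0, 2, 3, 1)     # (B, H, W, 2), in [-1, 1]
        return F.grid_sample(img, grid, align_corners=False)

# Usage sketch: geometric correction followed by photometric compensation.
# warped = GeometricCorrection()(camera_image)
# compensated = photometric_subnet(warped, surface_image)  # e.g. a CompenNet-style subnet
```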