51,741 research outputs found
Distributed Representation of Geometrically Correlated Images with Compressed Linear Measurements
This paper addresses the problem of distributed coding of images whose
correlation is driven by the motion of objects or positioning of the vision
sensors. It concentrates on the problem where images are encoded with
compressed linear measurements. We propose a geometry-based correlation model
in order to describe the common information in pairs of images. We assume that
the constitutive components of natural images can be captured by visual
features that undergo local transformations (e.g., translation) in different
images. We first identify prominent visual features by computing a sparse
approximation of a reference image with a dictionary of geometric basis
functions. We then pose a regularized optimization problem to estimate the
corresponding features in correlated images given by quantized linear
measurements. The estimated features have to comply with the compressed
information and to represent consistent transformation between images. The
correlation model is given by the relative geometric transformations between
corresponding features. We then propose an efficient joint decoding algorithm
that estimates the compressed images such that they stay consistent with both
the quantized measurements and the correlation model. Experimental results show
that the proposed algorithm effectively estimates the correlation between
images in multi-view datasets. In addition, the proposed algorithm provides
effective decoding performance that compares advantageously to independent
coding solutions as well as state-of-the-art distributed coding schemes based
on disparity learning
Unsupervised Monocular Depth Estimation with Left-Right Consistency
Learning based methods have shown very promising results for the task of
depth estimation in single images. However, most existing approaches treat
depth prediction as a supervised regression problem and as a result, require
vast quantities of corresponding ground truth depth data for training. Just
recording quality depth data in a range of environments is a challenging
problem. In this paper, we innovate beyond existing approaches, replacing the
use of explicit depth data during training with easier-to-obtain binocular
stereo footage.
We propose a novel training objective that enables our convolutional neural
network to learn to perform single image depth estimation, despite the absence
of ground truth depth data. Exploiting epipolar geometry constraints, we
generate disparity images by training our network with an image reconstruction
loss. We show that solving for image reconstruction alone results in poor
quality depth images. To overcome this problem, we propose a novel training
loss that enforces consistency between the disparities produced relative to
both the left and right images, leading to improved performance and robustness
compared to existing approaches. Our method produces state of the art results
for monocular depth estimation on the KITTI driving dataset, even outperforming
supervised methods that have been trained with ground truth depth.Comment: CVPR 2017 ora
On Pairwise Costs for Network Flow Multi-Object Tracking
Multi-object tracking has been recently approached with the min-cost network
flow optimization techniques. Such methods simultaneously resolve multiple
object tracks in a video and enable modeling of dependencies among tracks.
Min-cost network flow methods also fit well within the "tracking-by-detection"
paradigm where object trajectories are obtained by connecting per-frame outputs
of an object detector. Object detectors, however, often fail due to occlusions
and clutter in the video. To cope with such situations, we propose to add
pairwise costs to the min-cost network flow framework. While integer solutions
to such a problem become NP-hard, we design a convex relaxation solution with
an efficient rounding heuristic which empirically gives certificates of small
suboptimality. We evaluate two particular types of pairwise costs and
demonstrate improvements over recent tracking methods in real-world video
sequences
- …