Robust and Efficient Graph Correspondence Transfer for Person Re-identification
Spatial misalignment caused by variations in poses and viewpoints is one of
the most critical issues that hinders the performance improvement in existing
person re-identification (Re-ID) algorithms. To address this problem, in this
paper, we present a robust and efficient graph correspondence transfer (REGCT)
approach for explicit spatial alignment in Re-ID. Specifically, we propose to
establish the patch-wise correspondences of positive training pairs via graph
matching. By exploiting both spatial and visual contexts of human appearance in
graph matching, meaningful semantic correspondences can be obtained. To
circumvent the cumbersome \emph{on-line} graph matching in the testing phase, we
propose to transfer the \emph{off-line} learned patch-wise correspondences from
the positive training pairs to test pairs. In detail, for each test pair, the
training pairs with similar pose-pair configurations are selected as
references. The matching patterns (i.e., the correspondences) of the selected
references are then utilized to calculate the patch-wise feature distances of
this test pair. To enhance the robustness of correspondence transfer, we design
a novel pose context descriptor to accurately model human body configurations,
and present an approach to measure the similarity between a pair of pose
context descriptors. Meanwhile, to improve testing efficiency, we propose a
correspondence template ensemble method using the voting mechanism, which
significantly reduces the number of patch-wise matchings involved in distance
calculation. With the aforementioned strategies, the REGCT model can effectively
and efficiently handle the spatial misalignment problem in Re-ID. Extensive
experiments on five challenging benchmarks, including VIPeR, Road, PRID450S,
3DPES and CUHK01, demonstrate the superior performance of REGCT over other
state-of-the-art approaches.
Comment: Tech. Report. The source code is available at
http://www.dabi.temple.edu/~hbling/code/gct.htm. arXiv admin note: text
overlap with arXiv:1804.0024
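The voting-based correspondence template described in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `top_k` parameter, and the use of a summed Euclidean distance are assumptions made purely for illustration.

```python
from collections import Counter
from math import sqrt


def build_template(reference_matchings, top_k):
    """Vote over the matchings of the selected reference pairs.

    Each matching is a list of (probe_patch_index, gallery_patch_index)
    tuples. The top_k most frequently voted patch pairs form the template.
    (Hypothetical helper; top_k is an assumed parameter.)
    """
    votes = Counter()
    for matching in reference_matchings:
        votes.update(matching)
    return [pair for pair, _ in votes.most_common(top_k)]


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def pair_distance(probe_feats, gallery_feats, template):
    """Sum patch-wise feature distances over the template pairs only."""
    return sum(euclidean(probe_feats[i], gallery_feats[j]) for i, j in template)


# Three reference pairs vote; (0, 0) gets 3 votes, (1, 1) gets 2, (1, 2) gets 1.
references = [[(0, 0), (1, 1)], [(0, 0), (1, 2)], [(0, 0), (1, 1)]]
template = build_template(references, top_k=2)

probe = [[0.0, 0.0], [1.0, 0.0]]
gallery = [[3.0, 4.0], [1.0, 0.0], [9.0, 9.0]]
distance = pair_distance(probe, gallery, template)
```

Because only the template pairs are evaluated, the number of patch-wise comparisons per test pair is bounded by `top_k` rather than by the total number of transferred correspondences.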
Geometric Hypergraph Learning for Visual Tracking
Graph-based representation is widely used in visual tracking to find
correct correspondences between target parts in consecutive frames. However,
most graph-based trackers consider pairwise geometric relations between local
parts. They do not make full use of the target's intrinsic structure, thereby
making the representation easily disturbed by errors in pairwise affinities
when large deformation and occlusion occur. In this paper, we propose a
geometric hypergraph learning based tracking method, which fully exploits
high-order geometric relations among multiple correspondences of parts in
consecutive frames. Then visual tracking is formulated as the mode-seeking
problem on the hypergraph in which vertices represent correspondence hypotheses
and hyperedges describe high-order geometric relations. Besides, a
confidence-aware sampling method is developed to select representative vertices
and hyperedges to construct the geometric hypergraph for more robustness and
scalability. The experiments are carried out on two challenging datasets
(VOT2014 and Deform-SOT) to demonstrate that the proposed method performs
favorably against other existing trackers.
DASC: Robust Dense Descriptor for Multi-modal and Multi-spectral Correspondence Estimation
Establishing dense correspondences between multiple images is a fundamental
task in many applications. However, finding a reliable correspondence in
multi-modal or multi-spectral images still remains unsolved due to their
challenging photometric and geometric variations. In this paper, we propose a
novel dense descriptor, called dense adaptive self-correlation (DASC), to
estimate multi-modal and multi-spectral dense correspondences. Based on the
observation that self-similarity within images is robust to imaging
modality variations, we define the descriptor as a series of adaptive
self-correlation similarity measures between patches sampled by randomized
receptive field pooling, in which the sampling pattern is obtained by
discriminative learning. The computational redundancy of dense descriptors is
dramatically reduced by applying fast edge-aware filtering. Furthermore, in
order to address geometric variations including scale and rotation, we propose
a geometry-invariant DASC (GI-DASC) descriptor that effectively leverages the
DASC through a superpixel-based representation. For a quantitative evaluation
of the GI-DASC, we build a novel multi-modal benchmark with varying photometric
and geometric conditions. Experimental results demonstrate the outstanding
performance of the DASC and GI-DASC in many cases of multi-modal and
multi-spectral dense correspondences.
NM-Net: Mining Reliable Neighbors for Robust Feature Correspondences
Feature correspondence selection is pivotal to many feature-matching based
tasks in computer vision. Searching for spatially k-nearest neighbors is a
common strategy for extracting local information in many previous works.
However, there is no guarantee that the spatially k-nearest neighbors of
correspondences are consistent because the spatial distribution of false
correspondences is often irregular. To address this issue, we present a
compatibility-specific mining method to search for consistent neighbors.
Moreover, in order to extract and aggregate more reliable features from
neighbors, we propose a hierarchical network named NM-Net with a series of
convolution layers taking the generated graph as input, which is insensitive to
the order of correspondences. Our experimental results show that the proposed
method achieves state-of-the-art performance on four datasets with various
inlier ratios and varying numbers of feature consistencies.
Comment: IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2019) (oral)
The Video Genome
Fast evolution of Internet technologies has led to an explosive growth of
video data available in the public domain and created unprecedented challenges
in the analysis, organization, management, and control of such content. The
problems encountered in video analysis such as identifying a video in a large
database (e.g. detecting pirated content on YouTube), putting together video
fragments, finding similarities and common ancestry between different versions
of a video, have counterparts in genetic research and
analysis of DNA and protein sequences. In this paper, we exploit the analogy
between genetic sequences and videos and propose an approach to video analysis
motivated by genomic research. Representing video information as video DNA
sequences and applying bioinformatic algorithms makes it possible to search,
match, and compare videos in large-scale databases. We show an application for
content-based metadata mapping between versions of annotated video.
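As a rough illustration of matching videos the way genetic sequences are matched, the sketch below scores the best shared fragment between two token sequences with Smith-Waterman local alignment. The abstract does not say which bioinformatic algorithms are used, so the choice of Smith-Waterman, the scoring values, and the idea of one character per quantized frame descriptor are all assumptions.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-1):
    """Smith-Waterman local alignment score of two token sequences.

    Each token stands for a hypothetical quantized frame descriptor.
    Only the best score is returned; no traceback is performed.
    """
    cols = len(b) + 1
    prev = [0] * cols
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * cols
        for j in range(1, cols):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


# Two "videos" sharing the three-frame fragment C, D, E.
original = "ABCDEF"
edited = "XXCDEYY"
shared_score = local_alignment_score(original, edited)
```

A high local score relative to the sequence lengths suggests that one video contains a fragment of the other, which is the kind of evidence a version or piracy search would look for.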
Semi-dense Stereo Matching using Dual CNNs
A robust solution for semi-dense stereo matching is presented. It utilizes
two CNN models for computing stereo matching cost and performing
confidence-based filtering, respectively. Compared to existing CNN-based
matching cost generation approaches, our method feeds additional global
information into the network so that the learned model can better handle
challenging cases, such as lighting changes and lack of textures. Through
utilizing non-parametric transforms, our method is also more self-reliant than
most existing semi-dense stereo approaches, which rely heavily on the adjustment
of parameters. The experimental results on the Middlebury Stereo dataset
demonstrate that the proposed approach outperforms the state-of-the-art
semi-dense stereo approaches.
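To make the idea of a non-parametric transform concrete, here is a small sketch of the census transform, a well-known example of this family. The abstract does not name the transforms it uses, so treating the census transform as representative is an assumption, and the function names are hypothetical.

```python
def census_transform(image, radius=1):
    """Census transform of a 2D grayscale image given as a list of rows.

    Each interior pixel is encoded as a bit string with one bit per
    neighbour, set when that neighbour is darker than the centre.
    Border pixels are left as 0 for simplicity.
    """
    height, width = len(image), len(image[0])
    codes = [[0] * width for _ in range(height)]
    for y in range(radius, height - radius):
        for x in range(radius, width - radius):
            centre = image[y][x]
            code = 0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    if dy == 0 and dx == 0:
                        continue
                    code = (code << 1) | int(image[y + dy][x + dx] < centre)
            codes[y][x] = code
    return codes


def hamming(a, b):
    """Number of differing bits between two census codes."""
    return bin(a ^ b).count("1")


patch = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
brighter = [[value + 10 for value in row] for row in patch]

code = census_transform(patch)[1][1]
code_brighter = census_transform(brighter)[1][1]
```

The code depends only on the ordering of intensities, so a uniform brightness offset leaves it unchanged and no threshold needs tuning, which is the sense in which such transforms are parameter-light.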
SafeDrive: Enhancing Lane Appearance for Autonomous and Assisted Driving Under Limited Visibility
Autonomous detection of lane markers improves road safety, and purely visual
tracking is desirable for widespread vehicle compatibility and reducing sensor
intrusion, cost, and energy consumption. However, visual approaches are often
ineffective because of factors such as occlusion, poor weather
conditions, and paint wear-off. We present an approach to enhance lane marker
appearance for assisted and autonomous driving, particularly under poor
visibility. Our method, named SafeDrive, attempts to improve visual lane
detection approaches in drastically degraded visual conditions. SafeDrive finds
lane markers in alternate imagery of the road at the vehicle's location and
reconstructs a sparse 3D model of the surroundings. By estimating the geometric
relationship between this 3D model and the current view, the lane markers are
projected onto the visual scene; any lane detection algorithm can be
subsequently used to detect lanes in the resulting image. SafeDrive does not
require additional sensors other than vision and location data. We demonstrate
the effectiveness of our approach on a number of test cases obtained from
actual driving data recorded in urban settings.
Comment: arXiv admin note: text overlap with arXiv:1701.0844
Comparative evaluation of 2D feature correspondence selection algorithms
Correspondence selection, which aims to find correct feature correspondences
among raw feature matches, is pivotal for a number of feature-matching-based
tasks. Various 2D (image) correspondence selection algorithms have been
proposed over decades of progress. Unfortunately, the lack of an in-depth
evaluation makes it difficult for developers to choose a proper algorithm given
a specific application. This paper fills this gap by evaluating eight 2D
correspondence selection algorithms ranging from classical methods to the most
recent ones on four standard datasets. The diversity of the experimental datasets
introduces various nuisances including zoom, rotation, blur, viewpoint change, JPEG
compression, light change, different rendering styles and multi-structures for
comprehensive testing. To further create different distributions of initial
matches, a set of combinations of detector and descriptor is also taken into
consideration. We measure the quality of a correspondence selection algorithm
from four perspectives, i.e., precision, recall, F-measure and efficiency.
According to the evaluation results, the current advantages and limitations of all
considered algorithms are summarized, which could serve as a
"user guide" for developers.
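The three accuracy measures named in the abstract above follow the standard definitions, which can be sketched as below. The function name and the representation of matches as integer indices are assumptions for illustration; efficiency (run time) is not covered here.

```python
def selection_scores(selected, ground_truth_inliers):
    """Precision, recall and F-measure of a correspondence selection.

    selected: indices of the matches the algorithm kept.
    ground_truth_inliers: indices of the matches that are truly correct.
    """
    selected = set(selected)
    truth = set(ground_truth_inliers)
    true_positives = len(selected & truth)
    precision = true_positives / len(selected) if selected else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    if precision + recall == 0:
        f_measure = 0.0
    else:
        f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# The algorithm keeps 4 matches, 3 of which are among the 6 true inliers.
precision, recall, f_measure = selection_scores([0, 1, 2, 3], [1, 2, 3, 4, 5, 6])
```

Here precision is 3/4 and recall is 3/6, so an algorithm that keeps few but mostly correct matches scores well on precision while being penalized on recall; the F-measure balances the two.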
Photo Stylistic Brush: Robust Style Transfer via Superpixel-Based Bipartite Graph
With the rapid development of social networks and multimedia technology,
customized image and video stylization has been widely used for various
social-media applications. In this paper, we explore the problem of
exemplar-based photo style transfer, which provides a flexible and convenient
way to evoke fantastic visual impressions. Rather than investigating some fixed
artistic patterns to represent certain styles as was done in some previous
works, our work emphasizes styles related to a series of visual effects in the
photograph, e.g. color, tone, and contrast. We propose a photo stylistic brush,
an automatic robust style transfer approach based on Superpixel-based BIpartite
Graph (SuperBIG). A two-step bipartite graph algorithm with different
granularity levels is employed to aggregate pixels into superpixels and find
their correspondences. In the first step, with the extracted hierarchical
features, a bipartite graph is constructed to describe the content similarity
for pixel partition to produce superpixels. In the second step, superpixels in
the input/reference image are rematched to form a new superpixel-based
bipartite graph, and superpixel-level correspondences are generated by a
bipartite matching. Finally, the refined correspondence guides SuperBIG to
perform the transformation in a decorrelated color space. Extensive
experimental results demonstrate the effectiveness and robustness of the
proposed method for transferring various styles of exemplar images, even for
some challenging cases, such as night images.
Automated Tracking and Estimation for Control of Non-rigid Cloth
This report is a summary of research conducted on cloth tracking for
automated textile manufacturing during a two-semester-long research course at
Georgia Tech. This work was completed in 2009. Advances in current sensing
technology such as the Microsoft Kinect would now allow me to relax certain
assumptions and generally improve the tracking performance. This is because a
major part of my approach described in this paper was to track features in a 2D
image and use these to estimate the cloth deformation. Innovations such as the
Kinect would improve estimation due to the automatic depth information obtained
when tracking 2D pixel locations. Additionally, higher resolution camera images
would probably give better quality feature tracking. However, although I would
use different technology now to implement this tracker, the algorithm described
and implemented in this paper is still a viable approach which is why I am
publishing this as a tech report for reference. In addition, although the
related work is a bit exhaustive, it will be useful to a reader who is new to
methods for tracking and estimation as well as modeling of cloth.