VehicleNet: Learning Robust Visual Representation for Vehicle Re-identification
One fundamental challenge of vehicle re-identification (re-id) is to learn
robust and discriminative visual representation, given the significant
intra-class vehicle variations across different camera views. As the existing
vehicle datasets are limited in terms of training images and viewpoints, we
propose to build a unique large-scale vehicle dataset (called VehicleNet) by
harnessing four public vehicle datasets, and design a simple yet effective
two-stage progressive approach to learning more robust visual representation
from VehicleNet. The first stage of our approach is to learn the generic
representation for all domains (i.e., source vehicle datasets) by training with
the conventional classification loss. This stage relaxes the full alignment
between the training and testing domains, as it is agnostic to the target
vehicle domain. The second stage is to fine-tune the trained model purely based
on the target vehicle set, by minimizing the distribution discrepancy between
our VehicleNet and any target domain. We discuss our proposed multi-source
dataset VehicleNet and evaluate the effectiveness of the two-stage progressive
representation learning through extensive experiments. We achieve the
state-of-the-art accuracy of 86.07% mAP on the private test set of the AICity
Challenge, and competitive results on two other public vehicle re-id datasets,
i.e., VeRi-776 and VehicleID. We hope this new VehicleNet dataset and the
learned robust representations can pave the way for vehicle re-id in real-world environments.
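
A minimal PyTorch sketch of the two-stage progressive training described above, assuming a ResNet-50 backbone, plain cross-entropy as the classification loss, and caller-supplied data loaders that yield (image, identity_label) batches; the hyperparameters and helper names are illustrative placeholders, not the authors' exact settings.

    import torch
    import torch.nn as nn
    from torchvision import models

    def build_model(num_ids):
        # ResNet-50 backbone with an identity-classification head.
        model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
        model.fc = nn.Linear(model.fc.in_features, num_ids)
        return model

    def train_epochs(model, loader, epochs, lr, device="cuda"):
        # Conventional classification training with cross-entropy loss.
        model.to(device).train()
        optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
        criterion = nn.CrossEntropyLoss()
        for _ in range(epochs):
            for images, labels in loader:
                images, labels = images.to(device), labels.to(device)
                loss = criterion(model(images), labels)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
        return model

    def two_stage_training(vehiclenet_loader, target_loader,
                           num_source_ids, num_target_ids):
        # Stage 1: learn a generic representation over all merged source
        # domains (the combined VehicleNet identities) with a classification loss.
        model = build_model(num_source_ids)
        model = train_epochs(model, vehiclenet_loader, epochs=12, lr=0.01)
        # Stage 2: swap the classifier head and fine-tune purely on the target
        # vehicle set to reduce the source/target distribution discrepancy.
        model.fc = nn.Linear(model.fc.in_features, num_target_ids)
        model = train_epochs(model, target_loader, epochs=8, lr=0.001)
        return model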
Attribute Descent: Simulating Object-Centric Datasets on the Content Level and Beyond
This article aims to use graphic engines to simulate a large amount of
training data that comes with free annotations and, ideally, strongly resembles
real-world data. Between synthetic and real data, a two-level domain gap exists,
involving the content level and the appearance level. While the latter is concerned
with appearance style, the former problem arises from a different mechanism,
i.e., content mismatch in attributes such as camera viewpoint, object placement
and lighting conditions. In contrast to the widely-studied appearance-level
gap, the content-level discrepancy has not been broadly studied. To address the
content-level misalignment, we propose an attribute descent approach that
automatically optimizes engine attributes to enable synthetic data to
approximate real-world data. We verify our method on object-centric tasks,
wherein an object takes up a major portion of an image. In these tasks, the
search space is relatively small, and the optimization of each attribute yields
sufficiently obvious supervision signals. We collect a new synthetic asset
VehicleX, and reformat and reuse the existing synthetic assets ObjectX and
PersonX. Extensive experiments on image classification and object
re-identification confirm that adapted synthetic data can be effectively used
in three scenarios: training with synthetic data only, training data
augmentation, and numerically understanding dataset content.
Comment: Preprint, accepted to IEEE Trans. on PAMI
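
A minimal sketch of the attribute descent idea, assuming the engine attributes are discretized into candidate grids and that the caller supplies a measure_gap function (for example, rendering a batch with the graphic engine and computing a discrepancy such as FID against real images); the grid contents and function names are hypothetical placeholders, not the authors' exact pipeline.

    def attribute_descent(candidates, measure_gap, num_rounds=2):
        # candidates: attribute name -> list of candidate values (e.g., camera
        # height, light intensity). measure_gap: maps a full attribute setting
        # to a synthetic-vs-real discrepancy score (lower is better).
        current = {name: values[0] for name, values in candidates.items()}
        for _ in range(num_rounds):
            # Coordinate descent: optimize one attribute at a time while
            # holding the remaining attributes fixed at their current values.
            for name, values in candidates.items():
                scored = [(measure_gap({**current, name: v}), v) for v in values]
                _, best_value = min(scored, key=lambda s: s[0])
                current[name] = best_value
        return current

Because each object occupies most of the image and each attribute grid is small, this greedy coordinate search keeps the per-attribute evaluation cheap and its supervision signal clear, which is the setting the abstract describes.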