AI Oriented Large-Scale Video Management for Smart City: Technologies, Standards and Beyond
Deep learning has achieved substantial success in a series of tasks in
computer vision. Intelligent video analysis, which can be broadly applied to
video surveillance in various smart city applications, can also be driven by
such powerful deep learning engines. Deploying deep neural network models
in large-scale video analysis in practice still poses unprecedented
challenges for large-scale video data management. Deep feature coding,
instead of video coding, provides a practical solution for handling the
large-scale video surveillance data. To enable interoperability in the context
of deep feature coding, standardization is urgent and important. However, due
to the explosion of deep learning algorithms and the particularity of feature
coding, there are numerous remaining problems in the standardization process.
This paper envisions the future deep feature coding standard for the AI
oriented large-scale video management, and discusses existing techniques,
standards and possible solutions for these open problems.
Comment: 8 pages, 8 figures, 5 tables
VOC-ReID: Vehicle Re-identification based on Vehicle-Orientation-Camera
Vehicle re-identification is a challenging task due to high intra-class
variances and small inter-class variances. In this work, we focus on the
failure cases caused by similar background and shape. They pose severe bias on
similarity, making it easier to neglect fine-grained information. To reduce the
bias, we propose an approach named VOC-ReID, taking the triplet
vehicle-orientation-camera as a whole and reforming background/shape similarity
as camera/orientation re-identification. At first, we train models for vehicle,
orientation and camera re-identification respectively. Then we use orientation
and camera similarity as penalty to get final similarity. Besides, we propose a
high performance baseline boosted by bag of tricks and weakly supervised data
augmentation. Our algorithm achieves the second place in vehicle
re-identification at the NVIDIA AI City Challenge 2020.
Comment: AICity2020 Challenge, CVPR 2020 workshop, code available at GitHub (link in abstract)
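The penalty step in the VOC-ReID abstract above can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name `fuse_similarity` and the weights `w_orient` and `w_cam` are hypothetical, and since the abstract does not state the exact form of the penalty, a simple weighted subtraction is assumed.

```python
def fuse_similarity(vehicle_sim, orient_sim, cam_sim, w_orient=0.1, w_cam=0.1):
    """Penalise vehicle similarity by orientation and camera similarity.

    Assumption: the penalty is a weighted subtraction; the weights are
    illustrative placeholders, not values taken from the paper.
    """
    return vehicle_sim - w_orient * orient_sim - w_cam * cam_sim


# Two gallery candidates with the same raw vehicle similarity to the query.
# Candidate A shares orientation and camera with the query; candidate B does not.
score_a = fuse_similarity(0.80, orient_sim=0.9, cam_sim=0.9)
score_b = fuse_similarity(0.80, orient_sim=0.1, cam_sim=0.1)
```

Under these assumptions, candidate B ranks above candidate A, because part of A's similarity is attributed to shared background and shape rather than identity.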
Exploring Spatial Significance via Hybrid Pyramidal Graph Network for Vehicle Re-identification
Existing vehicle re-identification methods commonly use spatial pooling
operations to aggregate feature maps extracted via off-the-shelf backbone
networks. They neglect the spatial significance of feature maps,
which eventually degrades vehicle re-identification performance. In this paper,
firstly, an innovative spatial graph network (SGN) is proposed to elaborately
explore the spatial significance of feature maps. The SGN stacks multiple
spatial graphs (SGs). Each SG assigns feature map's elements as nodes and
utilizes spatial neighborhood relationships to determine edges among nodes.
During the SGN's propagation, each node and its spatial neighbors on an SG are
aggregated to the next SG. On the next SG, each aggregated node is re-weighted
with a learnable parameter to find the significance at the corresponding
location. Secondly, a novel pyramidal graph network (PGN) is designed to
comprehensively explore the spatial significance of feature maps at multiple
scales. The PGN organizes multiple SGNs in a pyramidal manner and makes each
SGN handle feature maps of a specific scale. Finally, a hybrid pyramidal graph
network (HPGN) is developed by embedding the PGN behind a ResNet-50 based
backbone network. Extensive experiments on three large scale vehicle databases
(i.e., VeRi776, VehicleID, and VeRi-Wild) demonstrate that the proposed HPGN is
superior to state-of-the-art vehicle re-identification approaches.
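One propagation step of a spatial graph, as described in the HPGN abstract above, might look like the sketch below. It is not the authors' implementation: a 4-connected neighbourhood, mean aggregation, scalar features, and a per-location `weights` grid standing in for the learnable parameter are all assumptions made for illustration.

```python
def spatial_graph_step(feature_map, weights):
    """One propagation step on a 2-D grid of scalar features.

    Assumptions: 4-connected neighbourhood, mean aggregation, and one
    weight per location standing in for the learnable parameter.
    """
    rows, cols = len(feature_map), len(feature_map[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Collect the node itself and its in-bounds spatial neighbours.
            coords = [(r, c), (r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            values = [feature_map[i][j] for i, j in coords
                      if 0 <= i < rows and 0 <= j < cols]
            # Aggregate, then re-weight the aggregated node at this location.
            out[r][c] = weights[r][c] * sum(values) / len(values)
    return out


fmap = [[1.0, 2.0], [3.0, 4.0]]
ones = [[1.0, 1.0], [1.0, 1.0]]
step = spatial_graph_step(fmap, ones)
```

Stacking several such steps would correspond to an SGN, and running one stack per feature-map scale to the pyramidal arrangement the abstract describes.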
Cross Domain Knowledge Learning with Dual-branch Adversarial Network for Vehicle Re-identification
The widespread adoption of vehicles has made people's lives easier over
the last decades. However, the large number of vehicles
poses the critical but challenging problem of vehicle re-identification (reID).
Until now, for most vehicle reID algorithms, both the training and testing
processes have been conducted on the same annotated datasets under supervision.
However, even a well-trained model can still suffer a severe performance drop
due to the domain bias between the training dataset and real-world
scenes.
To address this problem, this paper proposes a domain adaptation framework
for vehicle reID (DAVR), which narrows the cross-domain bias by fully
exploiting the labeled data from the source domain to adapt the target domain.
DAVR develops an image-to-image translation network named Dual-branch
Adversarial Network (DAN), which translates images from the source
domain (well-labeled) into the style of the target domain (unlabeled) without
any annotation while preserving identity information from the source domain. Then the
generated images are employed to train the vehicle reID model by a proposed
attention-based feature learning model with more reasonable styles. Through the
proposed framework, the well-trained reID model has better domain adaptation
ability for various scenes in real-world situations. Comprehensive experimental
results have demonstrated that our proposed DAVR can achieve excellent
performances on both VehicleID dataset and VeRi-776 dataset.
Comment: arXiv admin note: substantial text overlap with arXiv:1903.0786
A survey of advances in vision-based vehicle re-identification
Vehicle re-identification (V-reID) has become significantly popular in the
community due to its applications and research significance. In particular, the
V-reID is an important problem that still faces numerous open challenges. This
paper reviews different V-reID methods including sensor based methods, hybrid
methods, and vision based methods which are further categorized into
hand-crafted feature based methods and deep feature based methods. The vision
based methods make the V-reID problem particularly interesting, and our review
systematically addresses and evaluates these methods for the first time. We
conduct experiments on four comprehensive benchmark datasets and compare the
performances of recent hand-crafted feature based methods and deep feature
based methods. We present a detailed analysis of these methods in terms of mean
average precision (mAP) and cumulative matching curve (CMC). These analyses
provide objective insight into the strengths and weaknesses of these methods.
We also provide the details of different V-reID datasets and critically discuss
the challenges and future trends of V-reID methods.
Comment: 17 pages; 21 figures; journal paper
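The two metrics named in the survey abstract above, mean average precision (mAP) and the cumulative matching curve (CMC), can be computed as in the sketch below. The function names and the toy ranking data are illustrative; the sketch assumes each query comes with a ranked list of booleans marking correct matches, and that every relevant gallery image appears somewhere in that list.

```python
def average_precision(ranked_matches):
    """AP for one query; ranked_matches[i] is True if rank i+1 is a correct match."""
    hits, precisions = 0, []
    for rank, is_match in enumerate(ranked_matches, start=1):
        if is_match:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def cmc_curve(all_ranked_matches, max_rank):
    """CMC at rank k: fraction of queries with a correct match in the top k."""
    curve = []
    for k in range(1, max_rank + 1):
        found = sum(1 for m in all_ranked_matches if any(m[:k]))
        curve.append(found / len(all_ranked_matches))
    return curve


# Toy data: two queries, three ranked gallery results each.
queries = [[True, False, True], [False, True, False]]
mean_ap = sum(average_precision(q) for q in queries) / len(queries)
cmc = cmc_curve(queries, max_rank=3)
```

mAP rewards ranking all correct matches early, while CMC only asks whether at least one correct match appears within the top k, which is why the two are usually reported together.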
Deep Visual Re-Identification with Confidence
Transportation systems often rely on understanding the flow of vehicles or
pedestrians. From traffic monitoring at the city scale to commuters in train
terminals, recent progress in sensing technology makes it possible to use
cameras to better understand the demand, i.e., better track moving agents
(e.g., vehicles and pedestrians). Whether the cameras are mounted on drones,
vehicles, or fixed in the built environment, they inevitably remain scattered.
We need to develop the technology to re-identify the same agents across images
captured from non-overlapping field-of-views, referred to as the visual
re-identification task. State-of-the-art methods learn a neural network based
representation trained with the cross-entropy loss function. We argue that such
a loss function is not suited to the visual re-identification task, and hence propose
to model confidence in the representation learning framework. We show the
impact of our confidence-based learning framework with three methods: label
smoothing, confidence penalty, and deep variational information bottleneck.
They all show a boost in performance validating our claim. Our contribution is
generic to any agent of interest, i.e., vehicles or pedestrians, and outperforms
highly specialized state-of-the-art methods across 5 datasets. The source code
and models are shared towards an open science mission.
Comment: Show improvements on vehicle Re-ID datasets; Methods Clarified
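Two of the three methods listed in the abstract above, label smoothing and the confidence penalty, have widely used textbook forms, sketched below for a single sample. This is not the paper's released code; the hyperparameters `eps` and `beta` are placeholder values, and the deep variational information bottleneck is omitted because it needs a stochastic encoder.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_smoothing_loss(logits, target, eps=0.1):
    """Cross-entropy against a target smoothed uniformly over all classes."""
    probs = softmax(logits)
    n = len(logits)
    smoothed = [(1 - eps) * (1.0 if i == target else 0.0) + eps / n
                for i in range(n)]
    return -sum(q * math.log(p) for q, p in zip(smoothed, probs))


def confidence_penalty_loss(logits, target, beta=0.1):
    """Cross-entropy minus beta times the entropy of the prediction."""
    probs = softmax(logits)
    entropy = -sum(p * math.log(p) for p in probs)
    return -math.log(probs[target]) - beta * entropy


logits = [2.0, 1.0, 0.1]
plain_ce = -math.log(softmax(logits)[0])
smoothed_loss = label_smoothing_loss(logits, target=0)
penalised_loss = confidence_penalty_loss(logits, target=0)
```

Both variants discourage over-confident predictions: setting `eps=0` or `beta=0` recovers the plain cross-entropy loss.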
Discriminative Feature and Dictionary Learning with Part-aware Model for Vehicle Re-identification
With the development of smart cities, urban surveillance video analysis will
play a further significant role in intelligent transportation systems.
Identifying the same target vehicle in large datasets from non-overlapping
cameras should be highlighted, which has grown into a hot topic in promoting
intelligent transportation systems. However, vehicle re-identification (re-ID)
technology is a challenging task since vehicles of the same design or
manufacturer show similar appearance. To fill these gaps, we tackle this
challenge by proposing Triplet Center Loss based Part-aware Model (TCPM) that
leverages the discriminative features in part details of vehicles to refine the
accuracy of vehicle re-identification. TCPM is based on part discovery: it
partitions the vehicle along horizontal and vertical directions to strengthen
the details of the vehicle and reinforce the internal consistency of the parts.
In addition, to eliminate intra-class differences in local regions of the
vehicle, we propose external memory modules to emphasize the consistency of
each part to learn the discriminating features, which forms a global dictionary
over all categories in the dataset. In TCPM, triplet-center loss is introduced to
ensure that the extracted features of each vehicle part have intra-class consistency and
inter-class separability. Experimental results show that our proposed TCPM has
a large advantage over the existing state-of-the-art methods on the benchmark
datasets VehicleID and VeRi-776.
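The triplet-center loss mentioned in the TCPM abstract above pulls a feature toward its own class center and pushes it away from the nearest other center by a margin. The sketch below shows that general form for one sample; the Euclidean distance, the `margin` value, and the toy centers are assumptions, and the paper may use a different distance or apply the loss per part.

```python
import math


def triplet_center_loss(feature, label, centers, margin=1.0):
    """Hinge on (distance to own center) + margin - (distance to nearest other center)."""
    dist = [math.dist(feature, c) for c in centers]
    pos = dist[label]
    neg = min(d for j, d in enumerate(dist) if j != label)
    return max(0.0, pos + margin - neg)


# Toy setup: two class centers in a 2-D feature space.
centers = [[0.0, 0.0], [3.0, 4.0]]
loss_correct = triplet_center_loss([0.0, 0.0], label=0, centers=centers)
loss_wrong = triplet_center_loss([0.0, 0.0], label=1, centers=centers)
```

A feature sitting on its own center and far from the other incurs zero loss, while a feature sitting on the wrong center is penalised by the full distance plus the margin.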
Parsing-based View-aware Embedding Network for Vehicle Re-Identification
Vehicle Re-Identification is to find images of the same vehicle from various
views in the cross-camera scenario. The main challenges of this task are the
large intra-instance distance caused by different views and the subtle
inter-instance discrepancy caused by similar vehicles. In this paper, we
propose a parsing-based view-aware embedding network (PVEN) to achieve the
view-aware feature alignment and enhancement for vehicle ReID. First, we
introduce a parsing network to parse a vehicle into four different views, and
then align the features by mask average pooling. Such alignment provides a
fine-grained representation of the vehicle. Second, in order to enhance the
view-aware features, we design a common-visible attention to focus on the
common visible views, which not only shortens the distance among
intra-instances, but also enlarges the discrepancy of inter-instances. The PVEN
helps capture the stable discriminative information of a vehicle under different
views. The experiments conducted on three datasets show that our model
outperforms state-of-the-art methods by a large margin.
Comment: 10 pages, 6 figures
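Mask average pooling, which the PVEN abstract above uses to align features per view, averages a feature map over only the pixels that a mask selects. The sketch below handles a single channel and a single mask; the function name and the zero fallback for an empty mask are illustrative choices, not details from the paper.

```python
def mask_average_pool(feature_map, mask):
    """Average feature values over positions where the mask is non-zero.

    Assumption: an empty mask (view not visible) yields 0.0.
    """
    total = sum(f * m
                for f_row, m_row in zip(feature_map, mask)
                for f, m in zip(f_row, m_row))
    area = sum(m for m_row in mask for m in m_row)
    return total / area if area > 0 else 0.0


fmap = [[1.0, 2.0], [3.0, 4.0]]
front_mask = [[1, 0], [0, 1]]   # hypothetical mask for one parsed view
empty_mask = [[0, 0], [0, 0]]   # view not visible in this image
pooled = mask_average_pool(fmap, front_mask)
pooled_empty = mask_average_pool(fmap, empty_mask)
```

Applying this once per parsed view and per channel would give one feature vector per view, so that the same vehicle part is compared across images regardless of where it appears.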
Vehicle Re-identification with Viewpoint-aware Metric Learning
This paper considers the vehicle re-identification (re-ID) problem. The extreme
viewpoint variation (up to 180 degrees) poses great challenges for existing
approaches. Inspired by the human recognition process, we propose
a novel viewpoint-aware metric learning approach. It learns two metrics for
similar viewpoints and different viewpoints in two feature spaces,
respectively, giving rise to viewpoint-aware network (VANet). During training,
two types of constraints are applied jointly. During inference, the viewpoint is
first estimated and the corresponding metric is used. Experimental results
confirm that VANet significantly improves re-ID accuracy, especially when the
pair is observed from different viewpoints. Our method establishes a new
state-of-the-art on two benchmarks.
Comment: Accepted by ICCV 201
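The inference rule in the VANet abstract above (estimate the viewpoint, then use the matching metric) can be sketched as below. The dictionary keys `view`, `feat_s`, and `feat_d` are hypothetical names for the estimated viewpoint and the embeddings in the similar-viewpoint and different-viewpoint feature spaces; Euclidean distance is also an assumption.

```python
import math


def viewpoint_aware_distance(sample_a, sample_b):
    """Pick the feature space according to whether the viewpoints agree.

    Assumption: each sample carries an estimated 'view' label plus two
    embeddings, 'feat_s' (similar-viewpoint space) and 'feat_d'
    (different-viewpoint space).
    """
    same_view = sample_a["view"] == sample_b["view"]
    key = "feat_s" if same_view else "feat_d"
    return math.dist(sample_a[key], sample_b[key])


query = {"view": "front", "feat_s": [0.0, 0.0], "feat_d": [1.0, 1.0]}
gallery_front = {"view": "front", "feat_s": [3.0, 4.0], "feat_d": [9.0, 9.0]}
gallery_rear = {"view": "rear", "feat_s": [9.0, 9.0], "feat_d": [1.0, 2.0]}

d_same = viewpoint_aware_distance(query, gallery_front)
d_diff = viewpoint_aware_distance(query, gallery_rear)
```

Keeping two spaces means distances between same-view pairs and cross-view pairs are each measured by a metric trained for that case.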
A Two-Stream Siamese Neural Network for Vehicle Re-Identification by Using Non-Overlapping Cameras
We describe in this paper a Two-Stream Siamese Neural Network for vehicle
re-identification. The proposed network is fed simultaneously with small coarse
patches of the vehicle's shape, with 96 x 96 pixels, in one stream, and fine
features extracted from license plate patches, easily readable by humans, with
96 x 48 pixels, in the other one. Then, we combined the strengths of both
streams by merging the Siamese distance descriptors with a sequence of fully
connected layers, as an attempt to tackle a major problem in the field, false
alarms caused by a huge number of car designs and models with nearly the same
appearance or by similar license plate strings. In our experiments, with 2
hours of videos containing 2982 vehicles, extracted from two low-cost cameras
in the same roadway, 546 ft away, we achieved an F-measure and accuracy of 92.6%
and 98.7%, respectively. We show that the proposed network, available at
https://github.com/icarofua/siamese-two-stream, outperforms other One-Stream
architectures, even if they use higher resolution image features.
Comment: 5 pages, 6 figures, To appear in IEEE International Conference on
Image Processing (ICIP), Sept. 22-25, 2019, Taipei, Taiwan