Online Multi-Object Tracking Using CNN-based Single Object Tracker with Spatial-Temporal Attention Mechanism
In this paper, we propose a CNN-based framework for online MOT. The
framework exploits the merits of single object trackers in adapting appearance
models and searching for the target in the next frame. Naively applying a
single object tracker to MOT, however, suffers from computational inefficiency
and from drift caused by occlusion. Our framework achieves computational
efficiency by sharing features and using ROI-Pooling to obtain individual
features for each target, while online-learned target-specific CNN layers
adapt the appearance model of each target. We further introduce a
spatial-temporal attention mechanism (STAM) to handle the drift caused by
occlusion and interaction among targets. A visibility map of each target is
learned and used to infer a spatial attention map, which is then applied to
weight the features. In addition, the occlusion status estimated from the
visibility map controls the online updating process through a weighted loss on
training samples with different occlusion statuses in different frames; this
can be regarded as a temporal attention mechanism. The proposed algorithm
achieves 34.3% and 46.0% MOTA on the challenging MOT15 and MOT16 benchmarks,
respectively.
Comment: Accepted at the International Conference on Computer Vision (ICCV) 201
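The spatial weighting of ROI features by a visibility-derived attention map, and the visibility-controlled loss weighting, can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the paper's implementation; all function names are hypothetical:

```python
import numpy as np

def spatial_attention_pool(roi_feat, visibility):
    """Weight ROI features with a spatial attention map inferred from a
    visibility map, then pool. roi_feat: (C, H, W); visibility: (H, W) in [0, 1]."""
    # Normalise the visibility map into an attention map summing to 1.
    attn = visibility / (visibility.sum() + 1e-8)
    # Weight each channel spatially and pool over the ROI.
    weighted = roi_feat * attn[None, :, :]
    return weighted.sum(axis=(1, 2))  # (C,) attended feature vector

def temporal_loss_weight(occlusion_score):
    """Temporal-attention analogue: heavily occluded training samples
    contribute less to the online update (illustrative weighting)."""
    return 1.0 - occlusion_score
```

A fully visible target (visibility ≈ 1 everywhere) reduces to plain average pooling, while occluded regions are suppressed before pooling.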
Detection Recovery in Online Multi-Object Tracking with Sparse Graph Tracker
In existing joint detection and tracking methods, pairwise relational
features are used to match previous tracklets to current detections. However,
these features may not be discriminative enough for a tracker to identify a
target among a large number of detections, and selecting only high-scored
detections for tracking can miss detections whose confidence scores are low.
In the online setting, this results in disconnected tracklets that cannot be
recovered. To address this, we present Sparse Graph Tracker (SGT), a novel
online graph tracker that uses higher-order relational features, made more
discriminative by aggregating the features of neighboring detections and their
relations. SGT converts video data into a graph in which detections, their
connections, and the relational features of two connected nodes are
represented by nodes, edges, and edge features, respectively. The strong edge
features allow SGT to track targets using tracking candidates selected from
the top-K scored detections with a large K. As a result, even low-scored
detections can be tracked and missed detections recovered. Robustness to the
choice of K is demonstrated through extensive experiments. On the MOT16/17/20
and HiEve Challenge benchmarks, SGT outperforms state-of-the-art trackers at
real-time inference speed; in particular, it shows a large MOTA improvement on
MOT20 and the HiEve Challenge. Code is available at
https://github.com/HYUNJS/SGT.
Comment: Accepted to WACV 2023; fix figure
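The graph construction over top-K scored detections can be sketched as below; this is a simplified illustration (2-D positions plus a score, with a toy relational edge feature), not SGT's actual feature design, and all names are assumptions:

```python
import numpy as np

def build_sparse_graph(tracklets, detections, k=3):
    """Connect each tracklet node to the top-K scored detections and attach a
    simple relational edge feature (position offset + score difference).
    tracklets/detections: arrays of rows (x, y, score)."""
    edges, edge_feats = [], []
    for i, t in enumerate(tracklets):
        # Keep the top-K detections by score, so even low-scored detections
        # can enter the graph as tracking candidates when K is large.
        order = np.argsort(-detections[:, 2])[:k]
        for j in order:
            d = detections[j]
            edges.append((i, j))
            edge_feats.append([d[0] - t[0], d[1] - t[1], d[2] - t[2]])
    return edges, np.array(edge_feats)
```

With a large K the candidate set includes low-confidence detections, which is what lets a learned edge scorer recover detections a score threshold would have dropped.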
Deconvolutional networks for point-cloud vehicle detection and tracking in driving scenarios
Vehicle detection and tracking is a core ingredient for developing autonomous driving applications in urban scenarios. Recent image-based Deep Learning (DL) techniques are obtaining breakthrough results in these perception tasks. However, DL research has not yet advanced much towards processing 3D point clouds from lidar range-finders. These sensors are very common in autonomous vehicles since, despite not providing as semantically rich information as images, their performance is more robust under harsh weather conditions than vision sensors. In this paper we present a full vehicle detection and tracking system that works with 3D lidar information only. Our detection step uses a Convolutional Neural Network (CNN) that receives as input a featured representation of the 3D information provided by a Velodyne HDL-64 sensor and returns a per-point classification of whether it belongs to a vehicle or not. The classified point cloud is then geometrically processed to generate observations for a multi-object tracking system implemented via a number of Multi-Hypothesis Extended Kalman Filters (MH-EKF) that estimate the position and velocity of the surrounding vehicles. The system is thoroughly evaluated on the KITTI tracking dataset, and we show the performance boost provided by our CNN-based vehicle detector over a standard geometric approach. Our lidar-based approach uses only about 4% of the data needed for an image-based detector while achieving similarly competitive results.
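The position-and-velocity estimation step can be sketched with a single predict/update cycle of a constant-velocity Kalman filter; this is a linear simplification of the paper's Multi-Hypothesis EKF, and all parameter values are illustrative assumptions:

```python
import numpy as np

def cv_kalman_step(x, P, z, dt=0.1, q=1.0, r=0.5):
    """One predict/update step of a constant-velocity Kalman filter over the
    state (px, py, vx, vy), given a position observation z = (px, py)."""
    F = np.array([[1., 0., dt, 0.],   # constant-velocity motion model
                  [0., 1., 0., dt],
                  [0., 0., 1., 0.],
                  [0., 0., 0., 1.]])
    H = np.array([[1., 0., 0., 0.],   # we observe position only
                  [0., 1., 0., 0.]])
    Q = q * np.eye(4)                 # process noise
    R = r * np.eye(2)                 # observation noise
    # Predict.
    x = F @ x
    P = F @ P @ F.T + Q
    # Update with the position observation.
    y = z - H @ x
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ y
    P = (np.eye(4) - K @ H) @ P
    return x, P
```

A multi-hypothesis extension would run several such filters per target under different data-association hypotheses and keep the most likely ones.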
Learning Non-Uniform Hypergraph for Multi-Object Tracking
The majority of Multi-Object Tracking (MOT) algorithms based on the
tracking-by-detection scheme do not use higher-order dependencies among
objects or tracklets, which makes them less effective in handling complex
scenarios. In this work, we present a new near-online MOT algorithm based on a
non-uniform hypergraph, which can model different degrees of dependencies
among tracklets within a unified objective. The nodes in the hypergraph
correspond to tracklets, and hyperedges of different degrees encode various
kinds of dependencies among them. Rather than setting the weights of
hyperedges of different degrees empirically, they are learned automatically
using the structural support vector machine (SSVM) algorithm. Experiments on
several challenging datasets (PETS09, the ParkingLot sequence, SubwayFace, and
the MOT16 benchmark) demonstrate that our method achieves favorable
performance against state-of-the-art MOT methods.
Comment: 11 pages, 4 figures, accepted by AAAI 201
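The idea of one objective combining hyperedges of different degrees through per-degree weights can be sketched as a weighted sum of hyperedge potentials; in the paper the weights are learned by SSVM, whereas here they are simply given, and all names and values are illustrative:

```python
def hypergraph_score(hyperedges, w):
    """Score a tracklet labeling on a non-uniform hypergraph as a weighted sum
    of hyperedge potentials, with one learned weight per hyperedge degree.
    hyperedges: list of (node_tuple, potential); w: degree -> weight."""
    total = 0.0
    for nodes, potential in hyperedges:
        degree = len(nodes)          # non-uniform: degrees may differ
        total += w[degree] * potential
    return total

# A degree-2 (pairwise) and a degree-3 hyperedge with example potentials.
edges = [((0, 1), 0.8), ((1, 2, 3), 0.5)]
w = {2: 1.0, 3: 0.4}                 # per-degree weights (SSVM would fit these)
```

Setting all weights for degrees above 2 to zero recovers an ordinary pairwise (graph-based) objective, which is the special case most tracking-by-detection methods use.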
Protein Tracking by CNN-Based Candidate Pruning and Two-Step Linking with Bayesian Network
Protein trafficking plays a vital role in understanding many biological
processes and diseases. Automated tracking of protein vesicles is challenging
due to their erratic behaviour, changing appearance, and visual clutter. In
this paper we present a novel tracking approach that uses a two-step linking
process exploiting a probabilistic graphical model to predict tracklet
linkage. The vesicles are initially detected with the help of a candidate
selection process, where candidates are identified by a multi-scale
spot-enhancing filter and then pruned and selected by a lightweight
convolutional neural network. At the linking stage, tracklets are formed based
on distance and detection assignment, implemented via a combinatorial
optimization algorithm. Each tracklet is described by a number of parameters
used to evaluate the probability of tracklet connection through inference over
a Bayesian network. Tracking results are presented for confocal fluorescence
microscopy data of protein trafficking in epithelial cells. The proposed
method achieves a root mean square error (RMSE) of 1.39 for vesicle
localisation and a score of 0.7 for the degree of track matching with ground
truth. The presented method is also evaluated against the state-of-the-art
"TrackMate" framework.
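The first linking step, assigning detections to tracklet endpoints, can be sketched with a greedy nearest-neighbour assignment; this is a simple stand-in for the paper's combinatorial optimization, and the function name and distance threshold are illustrative assumptions:

```python
import numpy as np

def link_detections(tracklet_ends, detections, max_dist=5.0):
    """Greedily assign each detection to the nearest tracklet endpoint within
    max_dist. tracklet_ends: (T, 2) last positions; detections: (D, 2).
    Returns {tracklet_index: detection_index}."""
    # Pairwise Euclidean distances between endpoints and detections.
    dists = np.linalg.norm(tracklet_ends[:, None, :] - detections[None, :, :],
                           axis=2)
    links, used = {}, set()
    # Process candidate pairs in order of increasing distance, taking each
    # tracklet and detection at most once.
    for i, j in sorted(np.ndindex(*dists.shape), key=lambda ij: dists[ij]):
        if i in links or j in used or dists[i, j] > max_dist:
            continue
        links[i] = j
        used.add(j)
    return links
```

The second step, deciding whether two tracklets belong to the same track, would then score each candidate pair of tracklets with the Bayesian network described above.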