3,243 research outputs found
Multi-object Tracking via End-to-end Tracklet Searching and Ranking
Recent works in multiple object tracking use sequence models to calculate the
similarity score between the detections and the previous tracklets. However,
the forced exposure to ground truth in the training stage leads to the
training-inference discrepancy problem, i.e., exposure bias, where association
errors can accumulate during inference and make the trajectories drift. In
this paper, we propose a novel method for optimizing tracklet consistency,
which directly takes the prediction errors into account by introducing an
online, end-to-end tracklet search training process. Notably, our methods
directly optimize the whole tracklet score instead of pairwise affinity. With
a sequence model as the appearance encoder of tracklets, our tracker achieves
a remarkable performance gain over the conventional tracklet association baseline.
Our methods also achieve state-of-the-art results on the MOT15~17 challenge
benchmarks using public detections and online settings.
Training Algorithms for Multiple Object Tracking
Multiple object tracking is a crucial computer vision task. It aims at locating objects of interest in image sequences, maintaining their identities, and identifying their trajectories over time. A large portion of current research focuses on tracking pedestrians and other types of objects that often exhibit predictable behaviours, which allow us, as humans, to track them. Nevertheless, most existing approaches rely solely on simple affinity or appearance cues to maintain the identities of the tracked objects, ignoring their behaviour. This presents a challenge when objects of interest are invisible or indistinguishable for a long period of time.
In this thesis, we focus on enhancing the quality of multiple object trackers by learning and exploiting long-range models of object behaviour. Such behaviours come in different forms, be it a physical model of ball motion, a model of the interaction between the ball and the players in sports, or motion patterns of pedestrians or cars that are specific to a particular scene.
In the first part of the thesis, we begin with the task of tracking the ball and the players in team sports. We propose a model that tracks both types of objects simultaneously, while respecting the physical laws of ball motion in free fall and the interaction constraints that appear when players are in possession of the ball. We show that both the presence of the behaviour models and the simultaneous solution of both tasks improve tracking performance in basketball, volleyball, and soccer.
In the second part of the thesis, we focus on motion models of pedestrian and car behaviour that emerge in outdoor scenes. Such motion models are inherently global, as they determine where people starting from one location tend to end up much later in time. Imposing such global constraints while keeping the tracking problem tractable presents a challenge, which is why many approaches rely on local affinity measures. We formulate the problem of simultaneously tracking the objects and learning their behaviour patterns. We show that our approach, when applied in conjunction with a number of state-of-the-art trackers, improves their performance by forcing their output to follow the learned motion patterns of the scene.
In the last part of the thesis, we study a newly emerging class of models for multiple object tracking that appeared recently thanks to the availability of large-scale datasets: sequence models. While such models could potentially learn arbitrarily long-range behaviours, training them presents several challenges. We propose a training scheme and a loss function that significantly improve the quality of training of such models. We demonstrate that simply using our training scheme and loss allows us to learn a scoring function for trajectories, which enables us to outperform state-of-the-art methods on several tracking benchmarks.
Simple Unsupervised Multi-Object Tracking
Multi-object tracking has seen a lot of progress recently, albeit with
substantial annotation costs for developing better and larger labeled datasets.
In this work, we remove the need for annotated datasets by proposing an
unsupervised re-identification network, thus sidestepping the labeling costs
required for training entirely. Given unlabeled videos, our proposed method
(SimpleReID) first generates tracking labels using SORT and trains a ReID
network to predict the generated labels using cross-entropy loss. We demonstrate
that SimpleReID performs substantially better than simpler alternatives, and we
recover the full performance of its supervised counterpart consistently across
diverse tracking frameworks. These observations are unusual because unsupervised
ReID is not expected to excel in crowded scenarios with occlusions and drastic
viewpoint changes. By incorporating our unsupervised SimpleReID with
CenterTrack trained on augmented still images, we establish a new
state-of-the-art performance on popular datasets like MOT16/17 without using
tracking supervision, beating the current best (CenterTrack) by 0.2-0.3 MOTA and
4.4-4.8 IDF1 scores. We further provide evidence for limited scope for
improvement in IDF1 scores beyond our unsupervised ReID in the studied
settings. Our investigation suggests reconsidering the trend towards more
sophisticated, supervised, end-to-end trackers by showing the promise of simpler
unsupervised alternatives.
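The two-step recipe described above (a tracker produces identity labels, and a ReID network is trained to classify them) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `pseudo_labels` uses a greedy IoU matcher as a simplified stand-in for SORT (which additionally uses a Kalman filter and Hungarian assignment), and `cross_entropy` only shows the loss such a classifier would minimize. All function names and the `iou_threshold` value are assumptions made for the example.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def pseudo_labels(frames, iou_threshold=0.3):
    """Give every box a track ID; the IDs then serve as class labels.

    Greedy frame-to-frame IoU matching, a hypothetical stand-in for SORT.
    """
    next_id = 0
    previous = []  # (box, track_id) pairs from the previous frame
    labelled = []
    for boxes in frames:
        current, taken = [], set()
        for box in boxes:
            best_id, best_iou = None, iou_threshold
            for prev_box, prev_id in previous:
                overlap = iou(box, prev_box)
                if prev_id not in taken and overlap >= best_iou:
                    best_id, best_iou = prev_id, overlap
            if best_id is None:  # no sufficiently overlapping track: new identity
                best_id = next_id
                next_id += 1
            taken.add(best_id)
            current.append((box, best_id))
        labelled.append([track_id for _, track_id in current])
        previous = current
    return labelled


def cross_entropy(logits, label):
    """Softmax cross-entropy of one sample against its pseudo-label."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]
```

In a full pipeline, each labelled box would be cropped from its frame and fed to the ReID network, whose logits over the generated identities are scored with `cross_entropy`.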
A Deep Learning Bidirectional Temporal Tracking Algorithm for Automated Blood Cell Counting from Non-invasive Capillaroscopy Videos
Oblique back-illumination capillaroscopy has recently been introduced as a
method for high-quality, non-invasive blood cell imaging in human capillaries.
To make this technique practical for clinical blood cell counting, solutions
for automatic processing of acquired videos are needed. Here, we take the first
step towards this goal, by introducing a deep learning multi-cell tracking
model, named CycleTrack, which achieves accurate blood cell counting from
capillaroscopic videos. CycleTrack combines two simple online tracking models,
SORT and CenterTrack, and is tailored to features of capillary blood cell flow.
Blood cells are tracked by displacement vectors in two opposing temporal
directions (forward- and backward-tracking) between consecutive frames. This
approach yields accurate tracking despite rapidly moving and deforming blood
cells. The proposed model outperforms other baseline trackers, achieving 65.57%
Multiple Object Tracking Accuracy and 73.95% ID F1 score on test videos.
Compared to manual blood cell counting, CycleTrack achieves 96.58 ± 2.43%
cell counting accuracy on 8 test videos of 1,000 frames each, compared to
93.45% and 77.02% accuracy for standalone CenterTrack and SORT, with almost no
additional time expense. It takes 800 s to track and count approximately 8,000
blood cells from 9,600 frames captured in a typical one-minute video. Moreover,
the blood cell velocity measured by CycleTrack demonstrates a consistent,
pulsatile pattern within the physiological range of heart rate. Lastly, we
discuss future improvements for the CycleTrack framework, which would enable
clinical translation of the oblique back-illumination microscope towards a
real-time and non-invasive point-of-care blood cell counting and analyzing
technology.
Comment: 10 pages, 6 figures
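One way to picture tracking with displacement vectors in two temporal directions is a mutual-consistency check between consecutive frames. The sketch below is an assumption-laden illustration, not CycleTrack's actual matching rule (which the abstract does not specify): a cell in the earlier frame is linked to a cell in the later frame only if the forward displacement lands near that cell and the backward displacement of that cell lands back near the original. The function names and the `max_dist` threshold are hypothetical.

```python
import math


def nearest(point, candidates):
    """Index of the candidate closest to point (Euclidean distance)."""
    return min(range(len(candidates)), key=lambda i: math.dist(point, candidates[i]))


def bidirectional_matches(prev_cells, prev_forward, next_cells, next_backward, max_dist=5.0):
    """Link cells across two frames when forward and backward predictions agree.

    prev_forward[i]  : predicted displacement of prev_cells[i] into the next frame
    next_backward[j] : predicted displacement of next_cells[j] back into the previous frame
    """
    if not prev_cells or not next_cells:
        return []
    forward_pred = [(x + dx, y + dy) for (x, y), (dx, dy) in zip(prev_cells, prev_forward)]
    backward_pred = [(x + dx, y + dy) for (x, y), (dx, dy) in zip(next_cells, next_backward)]
    matches = []
    for i, pred in enumerate(forward_pred):
        j = nearest(pred, next_cells)
        if math.dist(pred, next_cells[j]) > max_dist:
            continue  # forward prediction lands on no detected cell
        back = backward_pred[j]
        if nearest(back, prev_cells) == i and math.dist(back, prev_cells[i]) <= max_dist:
            matches.append((i, j))  # both directions agree
    return matches
```

Cells in the later frame that end up in no pair would be treated as newly appearing cells and counted once, which is where a tracker of this kind yields a cell count.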
DyGLIP: A Dynamic Graph Model with Link Prediction for Accurate Multi-Camera Multiple Object Tracking
Multi-Camera Multiple Object Tracking (MC-MOT) is a significant computer
vision problem due to its emerging applicability in several real-world
applications. Despite a large number of existing works, solving the data
association problem in any MC-MOT pipeline is arguably one of the most
challenging tasks. Developing a robust MC-MOT system, however, is still highly
challenging due to many practical issues such as inconsistent lighting
conditions, varying object movement patterns, or the trajectory occlusions of
the objects between the cameras. To address these problems, this work
proposes a new Dynamic Graph Model with Link Prediction (DyGLIP)
approach to solve the data association task. Compared to existing methods, our
new model offers several advantages, including better feature representations
and the ability to recover from lost tracks during camera transitions.
Moreover, our model works gracefully regardless of the overlapping ratios
between the cameras. Experimental results show that we outperform existing
MC-MOT algorithms by a large margin on several practical datasets. Notably, our
model works favorably in online settings but can be extended to an incremental
approach for large-scale datasets.
Comment: accepted at CVPR 202
SoDA: Multi-Object Tracking with Soft Data Association
Robust multi-object tracking (MOT) is a prerequisite for a safe deployment of
self-driving cars. Tracking objects, however, remains a highly challenging
problem, especially in cluttered autonomous driving scenes in which objects
tend to interact with each other in complex ways and frequently get occluded.
We propose a novel approach to MOT that uses attention to compute track
embeddings that encode the spatiotemporal dependencies between observed
objects. This attention measurement encoding allows our model to relax hard
data associations, which may lead to unrecoverable errors. Instead, our model
aggregates information from all object detections via soft data associations.
The resulting latent space representation allows our model to learn to reason
about occlusions in a holistic data-driven way and maintain track estimates for
objects even when they are occluded. Our experimental results on the Waymo
Open Dataset suggest that our approach leverages modern large-scale datasets and
performs favorably compared to the state of the art in visual multi-object
tracking.
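The contrast between hard and soft data association can be illustrated with a small attention computation. The sketch below is a generic scaled dot-product attention over detection embeddings, written under the assumption that a track and each detection are already represented as fixed-length vectors; it is not SoDA's architecture, and the function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_associate(track_embedding, detection_embeddings):
    """Attend from one track to all detections instead of picking a single match.

    Returns the attention weights and the weighted sum of detection embeddings.
    """
    scale = math.sqrt(len(track_embedding))
    scores = [
        sum(t * d for t, d in zip(track_embedding, det)) / scale
        for det in detection_embeddings
    ]
    weights = softmax(scores)
    aggregated = [
        sum(w * det[k] for w, det in zip(weights, detection_embeddings))
        for k in range(len(track_embedding))
    ]
    return weights, aggregated
```

A hard association would keep only the detection with the largest weight; keeping the full weighted sum means an early wrong pick is not locked in, which is the property the abstract attributes to soft associations.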
Robust Online Multi-target Visual Tracking using a HISP Filter with Discriminative Deep Appearance Learning
We propose a novel online multi-target visual tracker based on the recently
developed Hypothesized and Independent Stochastic Population (HISP) filter. The
HISP filter combines advantages of traditional tracking approaches like MHT and
point-process-based approaches like PHD filter, and it has linear complexity
while maintaining track identities. We apply this filter for tracking multiple
targets in video sequences acquired under varying environmental conditions and
target densities using a tracking-by-detection approach. We also adopt a deep CNN
appearance representation by training a verification-identification network
(VerIdNet) on large-scale person re-identification data sets. We construct an
augmented likelihood in a principled manner using these deep CNN appearance
features and spatio-temporal information. Furthermore, we solve the problem of
two or more targets having an identical label by considering the weight propagated
with each confirmed hypothesis. Extensive experiments on the MOT16 and MOT17
benchmark data sets show that our tracker significantly outperforms several
state-of-the-art trackers in terms of tracking accuracy.
ArTIST: Autoregressive Trajectory Inpainting and Scoring for Tracking
One of the core components in online multiple object tracking (MOT)
frameworks is associating new detections with existing tracklets, typically
done via a scoring function. Despite the great advances in MOT, designing a
reliable scoring function remains a challenge. In this paper, we introduce a
probabilistic autoregressive generative model to score tracklet proposals by
directly measuring the likelihood that a tracklet represents natural motion.
One key property of our model is its ability to generate multiple likely
futures of a tracklet given partial observations. This allows us to not only
score tracklets but also effectively maintain existing tracklets when the
detector fails to detect some objects even for a long time, e.g., due to
occlusion, by sampling trajectories so as to inpaint the gaps caused by
misdetection. Our experiments demonstrate the effectiveness of our approach to
scoring and inpainting tracklets on several MOT benchmark datasets. We
additionally show the generality of our generative model by using it to produce
future representations in the challenging task of human motion prediction.
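Scoring a tracklet by the likelihood of its motion can be sketched with a toy autoregressive model. The example below assumes a constant-velocity Gaussian (each velocity is distributed around the previous one with a fixed standard deviation) purely for illustration; ArTIST learns its distribution with a neural network, and the names and the `std` parameter here are hypothetical.

```python
import math


def gaussian_log_pdf(x, mean, std):
    """Log density of a 1-D Gaussian."""
    return -0.5 * math.log(2 * math.pi * std * std) - (x - mean) ** 2 / (2 * std * std)


def tracklet_log_likelihood(positions, std=1.0):
    """Sum of log p(v_t | v_{t-1}) under a toy constant-velocity Gaussian model.

    positions is a list of (x, y) centres; higher scores mean smoother motion.
    """
    velocities = [
        (x2 - x1, y2 - y1) for (x1, y1), (x2, y2) in zip(positions, positions[1:])
    ]
    total = 0.0
    for (pvx, pvy), (vx, vy) in zip(velocities, velocities[1:]):
        total += gaussian_log_pdf(vx, pvx, std) + gaussian_log_pdf(vy, pvy, std)
    return total
```

When choosing which detection extends a tracklet, each candidate continuation would be scored this way and the highest-scoring one kept; sampling from the same conditional distribution is what would fill gaps during missed detections.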
Interactive effects of orthography and semantics in Chinese picture naming
Posters - Language Production/Writing: abstract no. 4035. Picture-naming performance in English and Dutch is enhanced by presentation of a word that is similar in form to the picture name. However, it is unclear whether facilitation has an orthographic or a phonological locus. We investigated the loci of the facilitation effect in Cantonese Chinese speakers by manipulating semantic, orthographic, and phonological similarity at three SOAs (-100, 0, and +100 msec). We identified an effect of orthographic facilitation that was independent of and larger than phonological facilitation across all SOAs. Semantic interference was also found at SOAs of -100 and 0 msec. Critically, an interaction of semantics and orthography was observed at an SOA of +100 msec. This interaction suggests that independent effects of orthographic facilitation on picture naming are located either at the level of semantic processing or at the lemma level and are not due to the activation of picture name segments at the level of phonological retrieval.