
    Multi-object Tracking via End-to-end Tracklet Searching and Ranking

    Recent works in multiple object tracking use sequence models to calculate the similarity score between detections and previous tracklets. However, the forced exposure to ground truth during training leads to a training-inference discrepancy, i.e., exposure bias, where association errors can accumulate at inference time and make the trajectories drift. In this paper, we propose a novel method for optimizing tracklet consistency that directly takes prediction errors into account by introducing an online, end-to-end tracklet search training process. Notably, our method directly optimizes the whole-tracklet score instead of pairwise affinities. With sequence models as appearance encoders of tracklets, our tracker achieves a remarkable performance gain over a conventional tracklet association baseline. Our method also achieves state-of-the-art results on the MOT15-17 challenge benchmarks under public detections and online settings.
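    As a rough illustration of the difference between pairwise affinity and whole-tracklet scoring described above, the sketch below scores each candidate detection by how consistent the extended tracklet remains as a sequence. The mean-pooling encoder, the cosine-consistency score, and the greedy association step are stand-ins chosen for brevity; they are assumptions, not the paper's sequence model or training procedure.

```python
# Minimal sketch (not the paper's code): score whole candidate tracklets
# instead of pairwise detection-to-tracklet affinities.
import numpy as np

def encode_tracklet(features):
    """Toy sequence encoder: mean-pool per-detection appearance features."""
    return np.mean(features, axis=0)

def tracklet_score(tracklet_features, candidate_feature):
    """Score the *extended* tracklet as a whole, not just the last pair."""
    extended = np.vstack([tracklet_features, candidate_feature])
    emb = encode_tracklet(extended)
    # Consistency: how close each detection stays to the tracklet embedding.
    sims = extended @ emb / (
        np.linalg.norm(extended, axis=1) * np.linalg.norm(emb) + 1e-8)
    return float(sims.mean())

def associate(tracklets, detections):
    """Greedy association by whole-tracklet score (illustrative only)."""
    assignments = {}
    for t_id, t_feats in tracklets.items():
        scores = [tracklet_score(t_feats, d) for d in detections]
        if scores:
            assignments[t_id] = int(np.argmax(scores))
    return assignments

# Usage with random features (2 tracklets, 3 detections, 8-dim appearance)
rng = np.random.default_rng(0)
tracklets = {0: rng.normal(size=(4, 8)), 1: rng.normal(size=(5, 8))}
detections = [rng.normal(size=8) for _ in range(3)]
print(associate(tracklets, detections))
```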

    Training Algorithms for Multiple Object Tracking

    Multiple object tracking is a crucial computer vision task. It aims at locating objects of interest in image sequences, maintaining their identities, and identifying their trajectories over time. A large portion of current research focuses on tracking pedestrians and other types of objects that often exhibit predictable behaviours, which allow us, as humans, to track those objects. Nevertheless, most existing approaches rely solely on simple affinity or appearance cues to maintain the identities of the tracked objects, ignoring their behaviour. This presents a challenge when objects of interest are invisible or indistinguishable for a long period of time. In this thesis, we focus on enhancing the quality of multiple object trackers by learning and exploiting long-range models of object behaviour. Such behaviours come in different forms, be it a physical model of ball motion, a model of interaction between the ball and the players in sports, or motion patterns of pedestrians or cars that are specific to a particular scene. In the first part of the thesis, we begin with the task of tracking the ball and the players in team sports. We propose a model that tracks both types of objects simultaneously, while respecting the physical laws of ball motion when in free fall and the interaction constraints that appear when players are in possession of the ball. We show that both the presence of the behaviour models and the simultaneous solution of both tasks improve tracking performance in basketball, volleyball, and soccer. In the second part of the thesis, we focus on motion models of pedestrian and car behaviour that emerge in outdoor scenes. Such motion models are inherently global, as they determine where people starting from one location tend to end up much later in time. Imposing such global constraints while keeping the tracking problem tractable presents a challenge, which is why many approaches rely on local affinity measures. We formulate the problem of simultaneously tracking the objects and learning their behaviour patterns. We show that our approach, when applied in conjunction with a number of state-of-the-art trackers, improves their performance by forcing their output to follow the learned motion patterns of the scene. In the last part of the thesis, we study a new class of models for multiple object tracking that has emerged recently thanks to the availability of large-scale datasets: sequence models for multiple object tracking. While such models could potentially learn arbitrarily long-range behaviours, training them presents several challenges. We propose a training scheme and a loss function that significantly improve the quality of training of such models. We demonstrate that simply using our training scheme and loss allows us to learn a scoring function for trajectories, which enables us to outperform state-of-the-art methods on several tracking benchmarks.
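    The last part of the thesis centres on training sequence models to score whole trajectories. One minimal, assumption-laden way to sketch that idea is a margin ranking loss that pushes a learned scorer to rate real trajectories above perturbed ones; the GRU scorer, the random perturbation, and all hyperparameters below are illustrative choices, not the thesis implementation.

```python
# Illustrative sketch: train a sequence model to prefer real trajectories
# over perturbed (drifting) ones via a margin ranking loss.
import torch
import torch.nn as nn

class TrajectoryScorer(nn.Module):
    def __init__(self, dim=4, hidden=32):
        super().__init__()
        self.rnn = nn.GRU(dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, traj):                  # traj: (batch, time, dim) boxes
        _, h = self.rnn(traj)                 # h: (1, batch, hidden)
        return self.head(h[-1]).squeeze(-1)   # one score per trajectory

scorer = TrajectoryScorer()
opt = torch.optim.Adam(scorer.parameters(), lr=1e-3)
loss_fn = nn.MarginRankingLoss(margin=1.0)

real = torch.randn(16, 20, 4)                 # stand-in ground-truth tracks
fake = real + 0.5 * torch.randn_like(real)    # perturbed tracks
target = torch.ones(16)                       # "real should score higher"

loss = loss_fn(scorer(real), scorer(fake), target)
opt.zero_grad(); loss.backward(); opt.step()
print(float(loss))
```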

    Simple Unsupervised Multi-Object Tracking

    Multi-object tracking has seen a lot of progress recently, albeit with substantial annotation costs for developing better and larger labeled datasets. In this work, we remove the need for annotated datasets by proposing an unsupervised re-identification network, thus entirely sidestepping the labeling costs required for training. Given unlabeled videos, our proposed method (SimpleReID) first generates tracking labels using SORT and then trains a ReID network to predict the generated labels using a cross-entropy loss. We demonstrate that SimpleReID performs substantially better than simpler alternatives and that it consistently recovers the full performance of its supervised counterpart across diverse tracking frameworks. These observations are unusual because unsupervised ReID is not expected to excel in crowded scenarios with occlusions and drastic viewpoint changes. By incorporating our unsupervised SimpleReID with CenterTrack trained on augmented still images, we establish a new state-of-the-art performance on popular datasets like MOT16/17 without using tracking supervision, beating the current best (CenterTrack) by 0.2-0.3 MOTA and 4.4-4.8 IDF1 points. We further provide evidence that there is limited scope for improving IDF1 scores beyond our unsupervised ReID in the studied settings. Our investigation suggests reconsidering the need for more sophisticated, supervised, end-to-end trackers by showing promise in simpler unsupervised alternatives.
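    The SimpleReID recipe described above (SORT-generated pseudo-labels plus cross-entropy training) can be sketched as follows. The tiny backbone, the run_sort_tracker placeholder, and all hyperparameters are assumptions made for illustration only, not the paper's network or training setup.

```python
# Sketch of the recipe in the abstract: 1) run an off-the-shelf SORT tracker
# on unlabeled video to get track IDs, 2) treat those IDs as pseudo-labels and
# train a ReID network with cross-entropy.
import torch
import torch.nn as nn

def run_sort_tracker(video_frames):
    """Hypothetical placeholder: returns (crop, pseudo_id) pairs from SORT."""
    raise NotImplementedError

class ReIDNet(nn.Module):
    def __init__(self, num_pseudo_ids, feat_dim=128):
        super().__init__()
        self.backbone = nn.Sequential(            # tiny stand-in backbone
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, feat_dim))
        self.classifier = nn.Linear(feat_dim, num_pseudo_ids)

    def forward(self, crops):
        feats = self.backbone(crops)              # embeddings used at test time
        return feats, self.classifier(feats)      # logits used only for training

def train_step(model, opt, crops, pseudo_ids):
    """One cross-entropy step on SORT-generated identity labels."""
    _, logits = model(crops)
    loss = nn.functional.cross_entropy(logits, pseudo_ids)
    opt.zero_grad(); loss.backward(); opt.step()
    return float(loss)

model = ReIDNet(num_pseudo_ids=100)
opt = torch.optim.Adam(model.parameters(), lr=3e-4)
crops = torch.randn(8, 3, 64, 32)                 # dummy person crops
pseudo_ids = torch.randint(0, 100, (8,))          # dummy SORT track IDs
print(train_step(model, opt, crops, pseudo_ids))
```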

    A Deep Learning Bidirectional Temporal Tracking Algorithm for Automated Blood Cell Counting from Non-invasive Capillaroscopy Videos

    Oblique back-illumination capillaroscopy has recently been introduced as a method for high-quality, non-invasive blood cell imaging in human capillaries. To make this technique practical for clinical blood cell counting, solutions for automatic processing of the acquired videos are needed. Here, we take the first step towards this goal by introducing a deep learning multi-cell tracking model, named CycleTrack, which achieves accurate blood cell counting from capillaroscopic videos. CycleTrack combines two simple online tracking models, SORT and CenterTrack, and is tailored to the features of capillary blood cell flow. Blood cells are tracked by displacement vectors in two opposing temporal directions (forward- and backward-tracking) between consecutive frames. This approach yields accurate tracking despite rapidly moving and deforming blood cells. The proposed model outperforms other baseline trackers, achieving 65.57% Multiple Object Tracking Accuracy and a 73.95% ID F1 score on test videos. Compared to manual blood cell counting, CycleTrack achieves 96.58 ± 2.43% cell counting accuracy across 8 test videos of 1,000 frames each, compared to 93.45% and 77.02% accuracy for CenterTrack and SORT alone, with almost no additional time expense. It takes 800 s to track and count approximately 8,000 blood cells from 9,600 frames captured in a typical one-minute video. Moreover, the blood cell velocity measured by CycleTrack shows a consistent, pulsatile pattern within the physiological range of heart rate. Lastly, we discuss future improvements to the CycleTrack framework, which would enable clinical translation of the oblique back-illumination microscope towards a real-time, non-invasive point-of-care blood cell counting and analysis technology. Comment: 10 pages, 6 figures
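    A rough sketch of the forward/backward displacement idea described above: match cell centroids between consecutive frames in both temporal directions and keep only the mutually consistent pairs. The displacement inputs, the nearest-neighbour matching, and the distance threshold are hypothetical simplifications; CycleTrack's actual matching builds on CenterTrack and SORT components.

```python
# Illustrative sketch (assumed, not the CycleTrack code): bidirectional,
# cycle-consistent matching of cell centroids between consecutive frames.
import numpy as np

def nearest_match(src, dst):
    """For each point in src, index of the nearest point in dst."""
    d = np.linalg.norm(src[:, None, :] - dst[None, :, :], axis=-1)
    return d.argmin(axis=1)

def bidirectional_match(prev_pts, next_pts, fwd_disp, bwd_disp, max_dist=5.0):
    """Keep pairs agreed on by forward- and backward-tracking."""
    fwd = nearest_match(prev_pts + fwd_disp, next_pts)   # frame t  -> t+1
    bwd = nearest_match(next_pts + bwd_disp, prev_pts)   # frame t+1 -> t
    pairs = []
    for i, j in enumerate(fwd):
        dist = np.linalg.norm(prev_pts[i] + fwd_disp[i] - next_pts[j])
        if bwd[j] == i and dist <= max_dist:              # cycle-consistent
            pairs.append((i, j))
    return pairs

# Toy usage: two cells moving roughly to the right between frames.
prev_pts = np.array([[10.0, 10.0], [30.0, 12.0]])
next_pts = np.array([[12.0, 10.5], [33.0, 12.5]])
fwd_disp = np.array([[2.0, 0.5], [3.0, 0.5]])             # assumed predicted motion
bwd_disp = -fwd_disp
print(bidirectional_match(prev_pts, next_pts, fwd_disp, bwd_disp))
```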

    DyGLIP: A Dynamic Graph Model with Link Prediction for Accurate Multi-Camera Multiple Object Tracking

    Multi-Camera Multiple Object Tracking (MC-MOT) is a significant computer vision problem due to its emerging applicability in several real-world applications. Despite a large number of existing works, solving the data association problem in any MC-MOT pipeline remains arguably one of the most challenging tasks, and developing a robust MC-MOT system is still difficult due to many practical issues such as inconsistent lighting conditions, varying object movement patterns, or trajectory occlusions of objects between cameras. To address these problems, this work proposes a new Dynamic Graph Model with Link Prediction (DyGLIP) approach to solve the data association task. Compared to existing methods, our new model offers several advantages, including better feature representations and the ability to recover from lost tracks during camera transitions. Moreover, our model works gracefully regardless of the overlapping ratios between cameras. Experimental results show that we outperform existing MC-MOT algorithms by a large margin on several practical datasets. Notably, our model works favorably in online settings but can be extended to an incremental approach for large-scale datasets. Comment: accepted at CVPR 2021
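    Framing cross-camera association as link prediction can be sketched as scoring candidate edges between track nodes from their feature embeddings. Everything below (the sigmoid-over-cosine score, the threshold, the random node features) is an assumption for illustration only, not the DyGLIP graph model.

```python
# Minimal sketch (assumed, not DyGLIP itself): cross-camera association as
# link prediction between track-node embeddings.
import numpy as np

def link_probability(emb_a, emb_b):
    """Sigmoid over the dot product of L2-normalised node embeddings."""
    a = emb_a / (np.linalg.norm(emb_a) + 1e-8)
    b = emb_b / (np.linalg.norm(emb_b) + 1e-8)
    return 1.0 / (1.0 + np.exp(-a @ b))

def associate_across_cameras(cam1_tracks, cam2_tracks, threshold=0.6):
    """Predict identity links between tracks seen by two cameras."""
    links = []
    for i, ea in cam1_tracks.items():
        for j, eb in cam2_tracks.items():
            p = link_probability(ea, eb)
            if p >= threshold:
                links.append((i, j, round(float(p), 3)))
    return links

# Toy usage: the same person appears in both cameras with small feature noise.
rng = np.random.default_rng(1)
shared = rng.normal(size=16)
cam1 = {"c1_t0": shared + 0.1 * rng.normal(size=16), "c1_t1": rng.normal(size=16)}
cam2 = {"c2_t0": shared + 0.1 * rng.normal(size=16)}
print(associate_across_cameras(cam1, cam2))
```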

    SoDA: Multi-Object Tracking with Soft Data Association

    Robust multi-object tracking (MOT) is a prerequisite for the safe deployment of self-driving cars. Tracking objects, however, remains a highly challenging problem, especially in cluttered autonomous driving scenes in which objects tend to interact with each other in complex ways and frequently get occluded. We propose a novel approach to MOT that uses attention to compute track embeddings that encode the spatiotemporal dependencies between observed objects. This attention measurement encoding allows our model to relax hard data associations, which may lead to unrecoverable errors. Instead, our model aggregates information from all object detections via soft data associations. The resulting latent space representation allows our model to learn to reason about occlusions in a holistic, data-driven way and to maintain track estimates for objects even when they are occluded. Our experimental results on the Waymo Open Dataset suggest that our approach leverages modern large-scale datasets and performs favorably compared to the state of the art in visual multi-object tracking.
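    The soft data association idea can be illustrated with plain scaled dot-product attention, in which each track query attends over all detections instead of committing to a single hard match. The shapes, temperature, and update rule below are assumptions, not the SoDA architecture.

```python
# Illustrative sketch: soft association as attention weights over detections.
import numpy as np

def softmax(x):
    z = np.exp(x - x.max(axis=-1, keepdims=True))
    return z / z.sum(axis=-1, keepdims=True)

def soft_associate(track_queries, detection_feats, temperature=1.0):
    """Return attention weights and the attention-weighted track update."""
    # Scaled dot-product attention: tracks are queries, detections are keys/values.
    scores = track_queries @ detection_feats.T / np.sqrt(track_queries.shape[-1])
    weights = softmax(scores / temperature)       # soft data association
    updated = weights @ detection_feats           # aggregated embedding per track
    return weights, updated

# Toy usage: 2 active tracks, 3 detections in the current frame.
rng = np.random.default_rng(2)
tracks = rng.normal(size=(2, 8))
detections = rng.normal(size=(3, 8))
weights, updated = soft_associate(tracks, detections)
print(np.round(weights, 2))                       # each row sums to 1
```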

    Robust Online Multi-target Visual Tracking using a HISP Filter with Discriminative Deep Appearance Learning

    We propose a novel online multi-target visual tracker based on the recently developed Hypothesized and Independent Stochastic Population (HISP) filter. The HISP filter combines advantages of traditional tracking approaches like MHT with those of point-process-based approaches like the PHD filter, and it has linear complexity while maintaining track identities. We apply this filter to tracking multiple targets in video sequences acquired under varying environmental conditions and target densities, using a tracking-by-detection approach. We also adopt a deep CNN appearance representation by training a verification-identification network (VerIdNet) on large-scale person re-identification data sets. We construct an augmented likelihood in a principled manner from these deep CNN appearance features and spatio-temporal information. Furthermore, we solve the problem of two or more targets sharing an identical label by considering the weight propagated with each confirmed hypothesis. Extensive experiments on the MOT16 and MOT17 benchmark data sets show that our tracker significantly outperforms several state-of-the-art trackers in terms of tracking accuracy.
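    One plausible, assumed form of an augmented likelihood that fuses deep appearance features with spatio-temporal information is a weighted product of an appearance similarity term and a motion-gating term, sketched below. The specific terms, the Gaussian gate, and the weight alpha are illustrative and not the paper's formulation.

```python
# Minimal sketch (assumed form): fuse appearance and spatio-temporal cues
# into one likelihood used to weight detection-to-hypothesis assignments.
import numpy as np

def appearance_likelihood(track_feat, det_feat):
    """Cosine similarity of re-identification features, mapped to (0, 1)."""
    cos = track_feat @ det_feat / (
        np.linalg.norm(track_feat) * np.linalg.norm(det_feat) + 1e-8)
    return 0.5 * (cos + 1.0)

def motion_likelihood(predicted_box, det_box, sigma=20.0):
    """Gaussian penalty on the distance between predicted and detected centres."""
    d = np.linalg.norm(np.asarray(predicted_box[:2]) - np.asarray(det_box[:2]))
    return float(np.exp(-0.5 * (d / sigma) ** 2))

def augmented_likelihood(track_feat, det_feat, predicted_box, det_box, alpha=0.7):
    """Weighted combination of the appearance and spatio-temporal terms."""
    return (appearance_likelihood(track_feat, det_feat) ** alpha *
            motion_likelihood(predicted_box, det_box) ** (1.0 - alpha))

# Toy usage with random 128-dim appearance features and nearby box centres.
rng = np.random.default_rng(3)
track_feat, det_feat = rng.normal(size=128), rng.normal(size=128)
print(augmented_likelihood(track_feat, det_feat, (100, 200), (108, 205)))
```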

    ArTIST: Autoregressive Trajectory Inpainting and Scoring for Tracking

    One of the core components in online multiple object tracking (MOT) frameworks is associating new detections with existing tracklets, typically done via a scoring function. Despite the great advances in MOT, designing a reliable scoring function remains a challenge. In this paper, we introduce a probabilistic autoregressive generative model that scores tracklet proposals by directly measuring the likelihood that a tracklet represents natural motion. One key property of our model is its ability to generate multiple likely futures of a tracklet given partial observations. This allows us not only to score tracklets but also to effectively maintain existing tracklets when the detector fails to detect some objects, even for a long time (e.g., due to occlusion), by sampling trajectories so as to inpaint the gaps caused by misdetections. Our experiments demonstrate the effectiveness of our approach to scoring and inpainting tracklets on several MOT benchmark datasets. We additionally show the generality of our generative model by using it to produce future representations in the challenging task of human motion prediction.
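    A minimal sketch of the score-and-inpaint idea: model per-frame box offsets with a recurrent network that outputs a distribution over the next offset, score a tracklet by the log-likelihood of its observed offsets, and sample offsets to fill frames the detector missed. The GRU, the Gaussian output head, and the toy data are assumptions; ArTIST's actual model and training differ.

```python
# Illustrative sketch (assumed architecture, not ArTIST): an autoregressive
# model over box offsets that scores tracklets and inpaints missing frames.
import torch
import torch.nn as nn

class MotionModel(nn.Module):
    def __init__(self, dim=4, hidden=32):
        super().__init__()
        self.rnn = nn.GRU(dim, hidden, batch_first=True)
        self.mu = nn.Linear(hidden, dim)
        self.log_sigma = nn.Linear(hidden, dim)

    def step(self, deltas):
        """Predict a Gaussian over the next offset from the past offsets."""
        out, _ = self.rnn(deltas)                 # (batch, time, hidden)
        h = out[:, -1]
        return self.mu(h), self.log_sigma(h).exp()

def score_tracklet(model, deltas):
    """Log-likelihood of each observed offset given the offsets before it."""
    total = 0.0
    for t in range(1, deltas.shape[1]):
        mu, sigma = model.step(deltas[:, :t])
        total += torch.distributions.Normal(mu, sigma).log_prob(deltas[:, t]).sum()
    return total

def inpaint(model, deltas, n_missing):
    """Sample offsets for frames the detector missed (e.g. under occlusion)."""
    filled = deltas
    for _ in range(n_missing):
        mu, sigma = model.step(filled)
        sample = torch.distributions.Normal(mu, sigma).sample()
        filled = torch.cat([filled, sample.unsqueeze(1)], dim=1)
    return filled

model = MotionModel()
deltas = torch.randn(1, 6, 4) * 0.1               # toy per-frame box offsets
print(float(score_tracklet(model, deltas)))
print(inpaint(model, deltas, n_missing=3).shape)  # (1, 9, 4)
```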

    Interactive effects of orthography and semantics in Chinese picture naming

    Picture-naming performance in English and Dutch is enhanced by presentation of a word that is similar in form to the picture name. However, it is unclear whether this facilitation has an orthographic or a phonological locus. We investigated the loci of the facilitation effect in Cantonese Chinese speakers by manipulating semantic, orthographic, and phonological similarity at three SOAs (−100, 0, and +100 msec). We identified an effect of orthographic facilitation that was independent of and larger than phonological facilitation across all SOAs. Semantic interference was also found at SOAs of −100 and 0 msec. Critically, an interaction of semantics and orthography was observed at an SOA of +100 msec. This interaction suggests that independent effects of orthographic facilitation on picture naming are located either at the level of semantic processing or at the lemma level, and are not due to the activation of picture-name segments at the level of phonological retrieval.