
    Learning Online Smooth Predictors for Realtime Camera Planning using Recurrent Decision Trees

    We study the problem of online prediction for realtime camera planning, where the goal is to predict smooth trajectories that correctly track and frame objects of interest (e.g., players in a basketball game). The conventional approach for training predictors does not directly consider temporal consistency, and often produces undesirable jitter. Although post-hoc smoothing (e.g., via a Kalman filter) can mitigate this issue to some degree, it is not ideal due to overly stringent modeling assumptions (e.g., Gaussian noise). We propose a recurrent decision tree framework that can directly incorporate temporal consistency into a data-driven predictor, as well as a learning algorithm that can efficiently learn such temporally smooth models. Our approach does not require any post-processing, making online smooth predictions much easier to generate when the noise model is unknown. We apply our approach to sports broadcasting: given noisy player detections, we learn where the camera should look based on human demonstrations. Our experiments exhibit significant improvements over conventional baselines and showcase the practicality of our approach.
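    A minimal sketch of the core idea, not the paper's algorithm: a regression tree whose feature vector includes its own previous output, so temporal smoothness is learned directly rather than imposed by post-hoc filtering. The synthetic "operator pan" data, the scikit-learn tree, and the teacher-forced training step are illustrative assumptions.

```python
# Sketch only: recurrent prediction with an off-the-shelf regression tree.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Stand-in for training data: a human operator's smooth camera pan (target)
# and jittery player-centroid detections (input), one value per frame.
T = 500
operator_pan = np.cumsum(rng.normal(0, 0.05, T))         # smooth demonstration
player_centroid = operator_pan + rng.normal(0, 0.5, T)   # noisy detections

# Recurrent feature vector: [current detection, previous demonstrated pan].
X = np.column_stack([player_centroid[1:], operator_pan[:-1]])
y = operator_pan[1:]
tree = DecisionTreeRegressor(max_depth=6).fit(X, y)

# Online rollout: feed the tree's own previous prediction back as input.
pred = [operator_pan[0]]
for t in range(1, T):
    pred.append(float(tree.predict(np.array([[player_centroid[t], pred[-1]]]))[0]))

print("prediction jitter:", np.std(np.diff(pred)))
print("detection jitter: ", np.std(np.diff(player_centroid)))
```

    The paper's learning algorithm additionally handles the mismatch between training on demonstrated previous outputs and rolling out on the model's own predictions, which this sketch ignores.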

    Take your Eyes off the Ball: Improving Ball-Tracking by Focusing on Team Play

    Accurate video-based ball tracking in team sports is important for automated game analysis, and has proven very difficult because the ball is often occluded by the players. In this paper, we propose a novel approach to addressing this issue by formulating the tracking in terms of deciding which player, if any, is in possession of the ball at any given time. This is very different from standard approaches that first attempt to track the ball and only then to assign possession. We show that our method substantially increases performance when applied to long basketball and soccer sequences.
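    To illustrate the possession-based formulation (a sketch under assumed per-frame possession scores, not the paper's exact model), the temporal part can be viewed as a Viterbi decoding over "which player holds the ball", with a penalty for switching possession:

```python
# Sketch: possession assignment smoothed over time with a Viterbi pass.
import numpy as np

def viterbi_possession(frame_scores, switch_penalty=2.0):
    """frame_scores: (T, K) per-frame log-scores that player k holds the ball
    (one column may represent 'ball in free flight / no possession')."""
    T, K = frame_scores.shape
    dp = frame_scores[0].copy()
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        # staying with the same player is free; switching costs a penalty
        trans = dp[None, :] - switch_penalty * (1 - np.eye(K))
        back[t] = trans.argmax(axis=1)
        dp = frame_scores[t] + trans.max(axis=1)
    path = [int(dp.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Toy example: 3 players, possession moves from player 0 to player 2.
rng = np.random.default_rng(1)
scores = rng.normal(0, 0.3, (10, 3))
scores[:5, 0] += 2.0
scores[5:, 2] += 2.0
print(viterbi_possession(scores))   # roughly [0, 0, 0, 0, 0, 2, 2, 2, 2, 2]
```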

    Detecting Regions of Interest in Dynamic Scenes with Camera Motions

    Presented at the 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 16-21 June 2012, Providence, RI. DOI: 10.1109/CVPR.2012.6247809
    We present a method to detect regions of interest in moving camera views of dynamic scenes with multiple moving objects. We start by extracting a global motion tendency that reflects the scene context by tracking movements of objects in the scene. We then use Gaussian process regression to represent the extracted motion tendency as a stochastic vector field. The generated stochastic field is robust to noise and can handle video from an uncalibrated moving camera. We use the stochastic field to predict important future regions of interest as the scene evolves dynamically. We evaluate our approach on a variety of videos of team sports and compare the detected regions of interest to the camera motion generated by actual camera operators. Our experimental results demonstrate that our approach is computationally efficient and provides better predictions than previously proposed RBF-based approaches.
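    A minimal sketch of the underlying idea, not the authors' implementation: fit a Gaussian process that maps image positions to observed object displacements, giving a stochastic motion field (predictive mean plus uncertainty) that can be queried anywhere in the frame. The toy data, kernel choice, and scikit-learn API are assumptions of this sketch.

```python
# Sketch: a stochastic motion field via Gaussian process regression.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)

# Toy "tracked movements": positions in [0,1]^2 and per-frame displacements
# that drift rightwards, faster near the top of the frame.
positions = rng.uniform(0, 1, size=(200, 2))
dx = 0.5 + 0.5 * positions[:, 1] + rng.normal(0, 0.05, 200)
dy = rng.normal(0, 0.05, 200)

kernel = RBF(length_scale=0.3) + WhiteKernel(noise_level=0.01)
gp_dx = GaussianProcessRegressor(kernel=kernel).fit(positions, dx)
gp_dy = GaussianProcessRegressor(kernel=kernel).fit(positions, dy)

# Query the field on a coarse grid; cells with large expected motion
# (and low uncertainty) are candidate regions of interest.
grid = np.stack(np.meshgrid(np.linspace(0, 1, 5), np.linspace(0, 1, 5)), -1).reshape(-1, 2)
mean_dx, std_dx = gp_dx.predict(grid, return_std=True)
mean_dy, _ = gp_dy.predict(grid, return_std=True)
speed = np.hypot(mean_dx, mean_dy)
best = speed.argmax()
print("fastest expected motion near", grid[best], "±", round(float(std_dx[best]), 3))
```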

    Training Algorithms for Multiple Object Tracking

    Multiple object tracking is a crucial computer vision task. It aims at locating objects of interest in image sequences, maintaining their identities, and identifying their trajectories over time. A large portion of current research focuses on tracking pedestrians and other objects that exhibit predictable behaviours, which is what allows us, as humans, to track them. Nevertheless, most existing approaches rely solely on simple affinity or appearance cues to maintain the identities of the tracked objects, ignoring their behaviour. This presents a challenge when objects of interest are invisible or indistinguishable for a long period of time. In this thesis, we focus on enhancing the quality of multiple object trackers by learning and exploiting long-ranging models of object behaviour. Such behaviours come in different forms, be it a physical model of ball motion, a model of interaction between the ball and the players in sports, or motion patterns of pedestrians or cars that are specific to a particular scene.

    In the first part of the thesis, we begin with the task of tracking the ball and the players in team sports. We propose a model that tracks both types of objects simultaneously, while respecting the physical laws of ball motion when in free fall, and the interaction constraints that appear when players are in possession of the ball. We show that both the presence of the behaviour models and the simultaneous solution of both tasks aid tracking performance in basketball, volleyball, and soccer.

    In the second part of the thesis, we focus on motion models of pedestrian and car behaviour that emerge in outdoor scenes. Such motion models are inherently global, as they determine where people starting from one location tend to end up much later in time. Imposing such global constraints while keeping the tracking problem tractable presents a challenge, which is why many approaches rely on local affinity measures. We formulate the problem of simultaneously tracking the objects and learning their behaviour patterns. We show that our approach, when applied in conjunction with a number of state-of-the-art trackers, improves their performance by forcing their output to follow the learned motion patterns of the scene.

    In the last part of the thesis, we study a new class of models for multiple object tracking that has emerged recently thanks to the availability of large-scale datasets: sequence models for multiple object tracking. While such models could potentially learn arbitrarily long-ranging behaviours, training them presents several challenges. We propose a training scheme and a loss function that significantly improve the quality of training of such models. We demonstrate that simply using our training scheme and loss allows a scoring function for trajectories to be learned, which enables us to outperform state-of-the-art methods on several tracking benchmarks.
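    The last part hinges on learning a scoring function for whole trajectories. A toy sketch of that idea (not the thesis's training scheme or loss; the PyTorch model, data, and margin objective are assumptions) trains a small recurrent scorer so that plausible trajectories outscore perturbed ones:

```python
# Sketch: a recurrent trajectory scorer trained with a margin-ranking loss.
import torch
import torch.nn as nn

class TrajectoryScorer(nn.Module):
    def __init__(self, dim=2, hidden=32):
        super().__init__()
        self.rnn = nn.GRU(dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, traj):                 # traj: (B, T, 2) position sequences
        _, h = self.rnn(traj)
        return self.head(h[-1]).squeeze(-1)  # one score per trajectory

torch.manual_seed(0)
scorer = TrajectoryScorer()
opt = torch.optim.Adam(scorer.parameters(), lr=1e-3)
loss_fn = nn.MarginRankingLoss(margin=1.0)

for step in range(200):
    t = torch.linspace(0, 1, 20).view(1, 20, 1)
    positive = torch.cat([t, 0.5 * t], dim=-1).repeat(8, 1, 1)    # smooth, plausible motion
    negative = positive + 0.3 * torch.randn_like(positive)        # jittery, switch-like negatives
    s_pos, s_neg = scorer(positive), scorer(negative)
    loss = loss_fn(s_pos, s_neg, torch.ones_like(s_pos))          # want s_pos > s_neg + margin
    opt.zero_grad(); loss.backward(); opt.step()

print("score(true) - score(perturbed) =", (s_pos.mean() - s_neg.mean()).item())
```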

    Tracking Interacting Objects in Image Sequences

    Object tracking in image sequences is a key challenge in computer vision. Its goal is to follow objects that move or evolve over time while preserving the identity of each object. However, most existing approaches focus on one class of objects and model only very simple interactions, such as the fact that different objects do not occupy the same spatial location at a given time instance. They ignore that objects may interact in more complex ways. For example, in a parking lot, a person may get into a car and become invisible in the scene. In this thesis, we focus on tracking interacting objects in image sequences. We show that by exploiting the relationship between different objects, we can achieve more reliable tracking results. We explore a wide range of applications, such as tracking players and the ball in team sports, tracking cars and people in a parking lot, and tracking dividing cells in biomedical imagery.

    We start by tracking the ball in team sports, which is a very challenging task because the ball is often occluded by the players. We propose a sequential approach that tracks the players first, and then tracks the ball by deciding which player, if any, is in possession of the ball at any given time. This is very different from standard approaches that first attempt to track the ball and only then to assign possession. We show that our method substantially increases performance when applied to long basketball and soccer sequences.

    We then focus on simultaneously tracking interacting objects. We achieve this by formulating the tracking problem as a network-flow Mixed Integer Program, and expressing the fact that one object can appear or disappear at locations of another in terms of linear flow constraints. We demonstrate our method on scenes involving cars and passengers, bags being carried and dropped by people, and balls being passed from one player to the next in team sports. In particular, we show that by estimating jointly and globally the trajectories of different types of objects, the presence of those not initially detected based solely on image evidence can be inferred from the detections of the others.

    We finally extend our approach to dividing cells in biomedical imagery. In this case, cells interact by overlapping with each other and giving birth to daughter cells. We propose a novel approach to automatically detecting and tracking cell populations in time-lapse images. Unlike earlier approaches that rely on linking a predetermined and potentially incomplete set of detections, we generate an overcomplete set of competing detection hypotheses. We then perform detection and tracking simultaneously by solving an integer program to find the optimal and consistent subset. This eliminates the need for heuristics to handle missed detections due to occlusions and complex morphology. We demonstrate the effectiveness of our approach on a range of challenging image sequences consisting of clumped cells and show that it outperforms state-of-the-art techniques.
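    The network-flow integer-program idea can be illustrated with a toy model (a sketch only; the PuLP modelling library, the 1-D grid world, and the scores are assumptions of this example, not the thesis's formulation): the ball must be somewhere in every frame, can only move between neighbouring cells, and may become invisible only at a cell occupied by a player.

```python
# Sketch: a tiny integer program where a ball may "disappear into" a player.
import pulp

T, L = 4, 5                                                         # frames, 1-D grid cells
player_at = {(t, l): int(l == 2) for t in range(T) for l in range(L)}  # a player sits at cell 2
ball_score = {(t, l): -0.2 for t in range(T) for l in range(L)}        # hallucinated visibility is penalised
ball_score[(0, 0)] = ball_score[(1, 1)] = 1.0                          # detections fade after frame 1 (occlusion)

cells = list(ball_score)
prob = pulp.LpProblem("interacting_objects", pulp.LpMaximize)
vis = pulp.LpVariable.dicts("visible", cells, cat="Binary")         # ball visible at (frame, cell)
held = pulp.LpVariable.dicts("held", cells, cat="Binary")           # ball held by a player at (frame, cell)

prob += pulp.lpSum(ball_score[k] * vis[k] for k in cells)

for t in range(T):
    # conservation: the ball is somewhere, visible or held, exactly once per frame
    prob += pulp.lpSum(vis[(t, l)] + held[(t, l)] for l in range(L)) == 1
    for l in range(L):
        # the ball can be held (invisible) only where a player is detected
        prob += held[(t, l)] <= player_at[(t, l)]
        if t > 0:
            # it can only stay or move to a neighbouring cell between frames
            neigh = [m for m in (l - 1, l, l + 1) if 0 <= m < L]
            prob += (vis[(t, l)] + held[(t, l)]
                     <= pulp.lpSum(vis[(t - 1, m)] + held[(t - 1, m)] for m in neigh))

prob.solve(pulp.PULP_CBC_CMD(msg=0))
for t in range(T):
    where = [l for l in range(L) if pulp.value(vis[(t, l)]) + pulp.value(held[(t, l)]) > 0.5]
    print(f"frame {t}: ball at cell {where}")
```

    Once the detections stop, the linear constraints force the solver to explain the ball as being held at the player's cell, which is the flavour of "appear or disappear at locations of another object" described above.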