
    Tracking Table Tennis Balls in Real Match Scenes for Umpiring Applications

    Judging the legitimacy of table tennis services presents many challenges where technology can be judiciously applied to enhance decision-making. This paper presents a purpose-built system that automatically detects and tracks the ball during table tennis services to enable precise real-time judgment of their legitimacy. The system comprises a suite of algorithms that adaptively exploit spatial and temporal information from real match video sequences, which are generally characterised by high object motion allied with object blurring and occlusion. Experimental results on a diverse set of table tennis test sequences corroborate the system's performance in facilitating consistently accurate and efficient decision-making over the validity of a service.
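    The paper's specific algorithms are not reproduced here, but a minimal sketch of the general idea of combining temporal and spatial cues to detect a small, fast-moving ball might look as follows (OpenCV; all function names and thresholds are hypothetical):

import cv2
import numpy as np

def detect_ball_candidates(prev_gray, curr_gray, min_area=5, max_area=200):
    """Return centroids of small moving blobs between two grayscale frames."""
    # Temporal cue: frame differencing highlights moving pixels.
    diff = cv2.absdiff(curr_gray, prev_gray)
    _, mask = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
    # Close small gaps so a blurred ball streak forms one connected blob.
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, np.ones((3, 3), np.uint8))
    # Spatial cue: keep only blobs whose area is plausible for a ball.
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    centroids = []
    for c in contours:
        if min_area <= cv2.contourArea(c) <= max_area:
            m = cv2.moments(c)
            if m["m00"] > 0:
                centroids.append((m["m10"] / m["m00"], m["m01"] / m["m00"]))
    return centroids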

    Tracking by Animation: Unsupervised Learning of Multi-Object Attentive Trackers

    Online Multi-Object Tracking (MOT) from videos is a challenging computer vision task which has been studied extensively for decades. Most existing MOT algorithms are based on the Tracking-by-Detection (TBD) paradigm combined with popular machine learning approaches that largely reduce the human effort needed to tune algorithm parameters. However, the commonly used supervised learning approaches require labeled data (e.g., bounding boxes), which is expensive to obtain for videos. Also, the TBD framework is usually suboptimal since it is not end-to-end, i.e., it treats detection and tracking as separate tasks rather than solving them jointly. To achieve both label-free and end-to-end learning of MOT, we propose a Tracking-by-Animation framework, where a differentiable neural model first tracks objects from input frames and then animates these objects into reconstructed frames. Learning is then driven by the reconstruction error through backpropagation. We further propose Reprioritized Attentive Tracking to improve the robustness of data association. Experiments conducted on both synthetic and real video datasets show the potential of the proposed model. Our project page is publicly available at https://github.com/zhen-he/tracking-by-animation. Comment: CVPR 2019
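    As a rough illustration of the framework's structure only (not the authors' architecture, which uses attention-based trackers and a more elaborate renderer), a minimal differentiable tracker-renderer loop trained purely on reconstruction error could be sketched in PyTorch as follows; all module shapes are invented:

import torch
import torch.nn as nn

class TrackerRenderer(nn.Module):
    def __init__(self, n_objects=3, state_dim=32, frame_dim=64 * 64):
        super().__init__()
        self.n_objects, self.state_dim = n_objects, state_dim
        # Tracker: frame -> per-object latent states (attention omitted).
        self.tracker = nn.Sequential(
            nn.Linear(frame_dim, 256), nn.ReLU(),
            nn.Linear(256, n_objects * state_dim),
        )
        # Renderer: each object state -> its appearance layer.
        self.renderer = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, frame_dim), nn.Sigmoid(),
        )

    def forward(self, frame):                     # frame: (batch, frame_dim)
        states = self.tracker(frame).view(-1, self.n_objects, self.state_dim)
        layers = self.renderer(states)            # (batch, n_objects, frame_dim)
        recon = layers.sum(dim=1).clamp(max=1.0)  # naive additive composition
        return states, recon

model = TrackerRenderer()
frame = torch.rand(8, 64 * 64)
states, recon = model(frame)
loss = ((recon - frame) ** 2).mean()  # reconstruction error alone drives learning
loss.backward()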

    End-to-end Learning of Multi-sensor 3D Tracking by Detection

    In this paper we propose a novel approach to tracking by detection that can exploit both camera and LIDAR data to produce very accurate 3D trajectories. Towards this goal, we formulate the problem as a linear program that can be solved exactly, and learn convolutional networks for detection as well as matching in an end-to-end manner. We evaluate our model on the challenging KITTI dataset and show very competitive results. Comment: Presented at the IEEE International Conference on Robotics and Automation (ICRA), 2018
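    As a hedged illustration of the association step only: frame-to-frame matching can be posed as a linear program whose relaxation is exact because the assignment polytope has integral vertices, echoing the "solved exactly" claim above. The SciPy sketch below (names invented) does not reproduce the paper's learned detection and matching networks:

import numpy as np
from scipy.optimize import linprog

def associate(cost):
    """cost[i, j]: matching cost between track i and detection j (square matrix)."""
    n = cost.shape[0]
    # Variables x[i, j] in [0, 1], flattened row-major.
    row_con = np.zeros((n, n * n))
    col_con = np.zeros((n, n * n))
    for i in range(n):
        row_con[i, i * n:(i + 1) * n] = 1.0  # each track matches one detection
        col_con[i, i::n] = 1.0               # each detection matches one track
    A_eq = np.vstack([row_con, col_con])
    b_eq = np.ones(2 * n)
    res = linprog(cost.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, 1))
    # The LP optimum sits at an integral vertex, so rounding is safe.
    return res.x.reshape(n, n).round().astype(int)

matches = associate(np.array([[0.1, 0.9], [0.8, 0.2]]))
print(matches)  # -> [[1 0], [0 1]]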

    Object Tracking

    Object tracking consists of estimating the trajectories of moving objects in a sequence of images. Automating computer-based object tracking is a difficult task: the dynamics of the many changing parameters that represent the features and motion of the objects, as well as temporary partial or full occlusion of the tracked objects, have to be considered. This monograph presents the development of object tracking algorithms, methods and systems. Both the state of the art of object tracking methods and new trends in research are described in this book. Fourteen chapters are split into two sections: Section 1 presents new theoretical ideas, whereas Section 2 presents real-life applications. Despite the variety of topics contained in this monograph, it constitutes a consistent body of knowledge in the field of computer object tracking. The editor's intention was to follow up the very quick progress in the development of methods as well as the extension of their applications.

    Visual Tracking and Illumination Recovery via Sparse Representation

    Compressive sensing, or sparse representation, has played a fundamental role in many fields of science. It shows that signals and images can be reconstructed from far fewer measurements than what is usually considered necessary. Sparsity leads to efficient estimation, efficient compression, dimensionality reduction, and efficient modeling. Recently, there has been growing interest in compressive sensing in computer vision, and it has been successfully applied to face recognition, background subtraction, object tracking and other problems. Sparsity can be achieved by solving the compressive sensing problem using L1 minimization. In this dissertation, we present the results of a study of applying sparse representation to illumination recovery, object tracking, and simultaneous tracking and recognition.

    Illumination recovery, also known as inverse lighting, is the problem of recovering an illumination distribution in a scene from the appearance of objects located in the scene. It is used for Augmented Reality, where the virtual objects match the existing image and cast convincing shadows on the real scene rendered with the recovered illumination. Shadows in a scene are caused by the occlusion of incoming light, and thus contain information about the lighting of the scene. Although shadows have been used in determining the 3D shape of the object that casts them onto the scene, few studies have focused on the illumination information they provide. In this dissertation, we recover the illumination of a scene from a single image with cast shadows, given the geometry of the scene. Images with cast shadows can be quite complex and therefore cannot be well approximated by low-dimensional linear subspaces. However, in this study we show that the set of images produced by a Lambertian scene with cast shadows can be efficiently represented by a sparse set of images generated by directional light sources. We first model an image with cast shadows as composed of a diffusive part (without cast shadows) and a residual part that captures the cast shadows. Then, we express the problem in an L1-regularized least squares formulation with nonnegativity constraints (as light has to be nonnegative at any point in space). This sparse representation enjoys an effective and fast solution, thanks to recent advances in compressive sensing. In experiments on both synthetic and real data, our approach performs favorably in comparison to several previously proposed methods.

    Visual tracking, which consistently infers the motion of a desired target in a video sequence, has been an active and fruitful research topic in computer vision for decades. It has many practical applications such as surveillance, human-computer interaction, medical imaging and so on. Many of the challenges in designing a robust tracking algorithm come from the enormous unpredictable variations in the target, such as deformations, fast motion, occlusions, background clutter, and lighting changes. To tackle these challenges, we propose a robust visual tracking method by casting tracking as a sparse approximation problem in a particle filter framework. In this framework, occlusion, noise and other challenging issues are addressed seamlessly through a set of trivial templates. Specifically, to find the tracking target in a new frame, each target candidate is sparsely represented in the space spanned by target templates and trivial templates. The sparsity is achieved by solving an L1-regularized least squares problem.
    The candidate with the smallest projection error is then taken as the tracking target. After that, tracking is continued using a Bayesian state inference framework in which a particle filter is used for propagating sample distributions over time. Three additional components further improve the robustness of our approach: 1) a velocity-incorporated motion model that helps concentrate the samples on the true target location in the next frame, 2) nonnegativity constraints that help filter out clutter similar to tracked targets in reversed intensity patterns, and 3) a dynamic template update scheme that keeps track of the most representative templates throughout the tracking procedure. We test the proposed approach on many challenging sequences involving heavy occlusions, drastic illumination changes, large scale changes, non-rigid object movement, out-of-plane rotation, and large pose variations. The proposed approach shows excellent performance in comparison with four previously proposed trackers.

    We also extend the work to simultaneous tracking and recognition for vehicle classification in IR video sequences. We attempt to resolve the uncertainties in tracking and recognition at the same time by introducing a static template set that stores target images under various conditions such as different poses, lighting, and so on. The recognition results at each frame are propagated to produce the final result for the whole video. The tracking result is evaluated at each frame, and low confidence in tracking performance initiates a new cycle of tracking and classification. We demonstrate the robustness of the proposed method on vehicle tracking and classification using outdoor IR video sequences.
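    A minimal sketch of the L1 tracking step described above, assuming scikit-learn's Lasso as a stand-in nonnegative L1 solver (the dissertation's own solver and its template-update logic are not shown; all names are illustrative): each candidate patch is coded over target templates plus trivial (identity) templates, and scored by how well the target-template part alone reconstructs it.

import numpy as np
from sklearn.linear_model import Lasso

def candidate_score(y, T, lam=0.01):
    """Smaller score = better candidate. y: (d,) patch, T: (d, k) target templates."""
    d, k = T.shape
    A = np.hstack([T, np.eye(d)])  # trivial templates absorb occlusion and noise
    solver = Lasso(alpha=lam, positive=True, fit_intercept=False, max_iter=5000)
    c = solver.fit(A, y).coef_     # nonnegative, sparse coefficients
    y_hat = T @ c[:k]              # reconstruction from target templates only
    return np.sum((y - y_hat) ** 2)  # projection error

rng = np.random.default_rng(0)
T = rng.random((100, 10))
y = T @ np.abs(rng.random(10)) * 0.1  # candidate roughly in the template span
print(candidate_score(y, T))          # low error: candidate matches the target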

    Real-time Aerial Vehicle Detection and Tracking using a Multi-modal Optical Sensor

    Vehicle tracking from an aerial platform poses a number of unique challenges, including the small number of pixels representing a vehicle, large camera motion, and parallax error. For these reasons, it is accepted to be a more challenging task than traditional object tracking, and it is generally tackled through a number of different sensor modalities. Recently, the Wide Area Motion Imagery (WAMI) sensor platform has received considerable attention, as it can provide higher resolution single-band imagery in addition to its large area coverage. Still, richer sensory information is required to persistently track vehicles, along with more research on the application of WAMI to tracking. With advancements in sensor technology, hyperspectral data acquisition at video frame rates has become possible, which can be crucial in identifying objects even in low-resolution scenes. For this reason, in this thesis a multi-modal optical sensor concept is considered to improve tracking in adverse scenes. The Rochester Institute of Technology Multi-object Spectrometer is capable of collecting limited hyperspectral data at desired locations in addition to full-frame single-band imagery. By acquiring hyperspectral data quickly, tracking can be achieved at reasonable frame rates, which turns out to be essential for persistent tracking. On the other hand, the relatively high cost of hyperspectral data acquisition and transmission needs to be taken into account to design a realistic tracking system. By acquiring extended data only for the pixels of interest, we can address or avoid the unique challenges posed by aerial tracking. In this direction, we integrate limited hyperspectral data to improve measurement-to-track association. Also, a target detection method based on hyperspectral data is presented to avoid the parallax effect and reduce the clutter density. Finally, the proposed system is evaluated on realistic, synthetic scenarios generated by the Digital Imaging and Remote Sensing Image Generation (DIRSIG) software.
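    As one hedged illustration of how limited hyperspectral samples could strengthen measurement-to-track association, a spectral-angle gate is a standard spectral-matching score; the thesis's actual association model is not reproduced here, and the function names and threshold below are invented.

import numpy as np

def spectral_angle(s1, s2):
    """Angle (radians) between two spectra; smaller = more similar."""
    cos = np.dot(s1, s2) / (np.linalg.norm(s1) * np.linalg.norm(s2))
    return np.arccos(np.clip(cos, -1.0, 1.0))

def associate_measurement(track_signatures, measurement_spectrum, max_angle=0.1):
    """Return index of the best-matching track, or None if no track is close enough."""
    angles = [spectral_angle(sig, measurement_spectrum) for sig in track_signatures]
    best = int(np.argmin(angles))
    return best if angles[best] <= max_angle else None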

    Robust and Efficient Inference of Scene and Object Motion in Multi-Camera Systems

    Multi-camera systems have the ability to overcome some of the fundamental limitations of single-camera systems. Having multiple viewpoints of a scene goes a long way toward limiting the influence of the restricted field of view, occlusion, blur, and poor resolution of an individual camera. This dissertation addresses robust and efficient inference of scene and object motion in multi-camera and multi-sensor systems.

    The first part of the dissertation discusses the role of constraints introduced by projective imaging in robust inference of multi-camera/sensor-based object motion. We discuss the role of the homography and epipolar constraints for fusing object motion perceived by individual cameras. For planar scenes, the homography constraint provides a natural mechanism for data association. For scenes that are not planar, the epipolar constraint provides a weaker multi-view relationship. We use the epipolar constraint for tracking in multi-camera and multi-sensor networks. In particular, we show that the epipolar constraint reduces the dimensionality of the state space of the problem by introducing a "shared" state space for the joint tracking problem. This allows for robust tracking even when one of the sensors fails due to poor SNR or occlusion.

    The second part of the dissertation deals with challenges in the computational aspects of tracking algorithms that are common to such systems. Much of the inference in multi-camera and multi-sensor networks deals with complex non-linear models corrupted by non-Gaussian noise. Particle filters provide approximate Bayesian inference in such settings. We analyze the computational drawbacks of traditional particle filtering algorithms, and present a method for implementing the particle filter using the Independent Metropolis-Hastings sampler, which is highly amenable to pipelined implementations and parallelization. We analyze implementations of the proposed algorithm, and in particular concentrate on implementations that have minimum processing times.

    The last part of the dissertation deals with the efficient sensing paradigm of compressive sensing (CS) applied to signals in imaging, such as natural images and reflectance fields. We propose a hybrid signal model based on the assumption that most real-world signals exhibit subspace compressibility as well as sparse representations. We show that several real-world visual signals, such as images, reflectance fields, and videos, are better approximated by this hybrid of the two models. We derive optimal hybrid linear projections of the signal and show that theoretical guarantees and algorithms designed for CS can be easily extended to hybrid subspace-compressive sensing. Such methods reduce the amount of information sensed by a camera, and help in reducing the so-called data deluge problem in large multi-camera systems.
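    For the epipolar-constraint-based association described above, a minimal sketch of the underlying geometry: a cross-camera match (x, x') must satisfy x'^T F x = 0, so the point-to-epipolar-line distance is a natural gating cost. The fundamental matrix F is assumed known from calibration, and the function names are illustrative; the joint tracker itself is not shown.

import numpy as np

def epipolar_distance(F, x1, x2):
    """Distance of x2 (pixels, camera 2) from the epipolar line of x1 (camera 1).

    F: (3, 3) fundamental matrix; x1, x2: (2,) pixel coordinates.
    """
    p1 = np.array([x1[0], x1[1], 1.0])
    p2 = np.array([x2[0], x2[1], 1.0])
    line = F @ p1  # epipolar line [a, b, c] in image 2: a*u + b*v + c = 0
    return abs(p2 @ line) / np.hypot(line[0], line[1])

# Gating for association: fuse measurements only when the pair is consistent
# with the epipolar geometry, e.g.
#   if epipolar_distance(F, x_cam1, x_cam2) < 2.0: consider them the same object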

    Configurable Input Devices for 3D Interaction using Optical Tracking

    Three-dimensional interaction with virtual objects is one of the aspects that needs to be addressed in order to increase the usability and usefulness of virtual reality. Human beings have difficulty understanding 3D spatial relationships and manipulating 3D user interfaces, which require the control of multiple degrees of freedom simultaneously. Conventional interaction paradigms known from the desktop computer, such as interaction devices like the mouse and keyboard, may be insufficient or even inappropriate for 3D spatial interaction tasks. The aim of the research in this thesis is to develop the technology required to improve 3D user interaction. This can be accomplished by allowing interaction devices to be constructed such that their use is apparent from their structure, and by enabling efficient development of new input devices for 3D interaction. The driving vision in this thesis is that for effective and natural direct 3D interaction, the structure of an interaction device should be specifically tuned to the interaction task. Two aspects play an important role in this vision. First, interaction devices should be structured such that interaction techniques are as direct and transparent as possible. Interaction techniques define the mapping between interaction task parameters and the degrees of freedom of interaction devices. Second, the underlying technology should enable developers to rapidly construct and evaluate new interaction devices.

    The thesis is organized as follows. In Chapter 2, a review of the optical tracking field is given. The tracking pipeline is discussed, existing methods are reviewed, and improvement opportunities are identified. In Chapters 3 and 4 the focus is on the development of optical tracking techniques for rigid objects. The goal of the tracking method presented in Chapter 3 is to reduce the occlusion problem. The method exploits projection-invariant properties of line pencil markers, and the fact that line features only need to be partially visible. In Chapter 4, the aim is to develop a tracking system that supports devices of arbitrary shapes and allows for rapid development of new interaction devices. The method is based on subgraph isomorphism to identify point clouds. To support the development of new devices in the virtual environment, an automatic model estimation method is used. Chapter 5 provides an analysis of three optical tracking systems based on different principles. The first system is based on an optimization procedure that matches the 3D device model points to the 2D data points detected in the camera images. The other systems are the tracking methods discussed in Chapters 3 and 4. In Chapter 6 an analysis of various filtering and prediction methods is given. These techniques can be used to make the tracking system more robust against noise and to reduce the latency problem. Chapter 7 focusses on optical tracking of composite input devices, i.e., input devices that consist of multiple rigid parts that can have combinations of rotational and translational degrees of freedom with respect to each other. Techniques are developed to automatically generate a 3D model of a segmented input device from motion data, and to use this model to track the device. In Chapter 8, the presented techniques are combined to create a configurable input device, which supports direct and natural co-located interaction. In this chapter, the goal of the thesis is realized. The device can be configured such that its structure reflects the parameters of the interaction task. In Chapter 9, the configurable interaction device is used to study the influence of spatial device structure with respect to the interaction task at hand. The driving vision of this thesis, that the spatial structure of an interaction device should match that of the task, is analyzed and evaluated by performing a user study.

    The concepts and techniques developed in this thesis allow researchers to rapidly construct and apply new interaction devices for 3D interaction in virtual environments. Devices can be constructed such that their spatial structure reflects the 3D parameters of the interaction task at hand. The interaction technique then becomes a transparent one-to-one mapping that directly mediates the functions of the device to the task. The developed configurable interaction devices can be used to construct intuitive spatial interfaces, and allow researchers to rapidly evaluate new device configurations and to efficiently perform studies on the relation between the spatial structure of devices and the interaction task.
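    A hedged sketch of the Chapter 4 idea of identifying a device's marker constellation inside a tracked point cloud by subgraph isomorphism on inter-point distances (using networkx; the thesis's actual matcher and its automatic model estimation are more involved, and all names and tolerances here are invented):

import itertools
import networkx as nx
from networkx.algorithms import isomorphism

def distance_graph(points, max_edge=0.3):
    """Graph whose nodes are 3D marker points and edges carry pairwise distances."""
    g = nx.Graph()
    for i, p in enumerate(points):
        g.add_node(i, pos=p)
    for i, j in itertools.combinations(range(len(points)), 2):
        d = sum((a - b) ** 2 for a, b in zip(points[i], points[j])) ** 0.5
        if d <= max_edge:  # keep only short edges so the graphs stay sparse
            g.add_edge(i, j, dist=d)
    return g

def find_device(model_points, scene_points, tol=0.005):
    """Map model marker indices to scene point indices, or None if not found."""
    scene_g = distance_graph(scene_points)
    model_g = distance_graph(model_points)
    # Two edges match when their lengths agree within a measurement tolerance.
    edge_ok = lambda e_scene, e_model: abs(e_scene["dist"] - e_model["dist"]) <= tol
    gm = isomorphism.GraphMatcher(scene_g, model_g, edge_match=edge_ok)
    for mapping in gm.subgraph_isomorphisms_iter():  # yields scene-node -> model-node
        return {model: scene for scene, model in mapping.items()}
    return None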