    Tracking Objects as Points

    Tracking has traditionally been the art of following interest points through space and time. This changed with the rise of powerful deep networks. Nowadays, tracking is dominated by pipelines that perform object detection followed by temporal association, also known as tracking-by-detection. In this paper, we present a simultaneous detection and tracking algorithm that is simpler, faster, and more accurate than the state of the art. Our tracker, CenterTrack, applies a detection model to a pair of images and detections from the prior frame. Given this minimal input, CenterTrack localizes objects and predicts their associations with the previous frame. That's it. CenterTrack is simple, online (no peeking into the future), and real-time. It achieves 67.3% MOTA on the MOT17 challenge at 22 FPS and 89.4% MOTA on the KITTI tracking benchmark at 15 FPS, setting a new state of the art on both datasets. CenterTrack is easily extended to monocular 3D tracking by regressing additional 3D attributes. Using monocular video input, it achieves 28.3% AMOTA@0.2 on the newly released nuScenes 3D tracking benchmark, substantially outperforming the monocular baseline on this benchmark while running at 28 FPS. Comment: ECCV 2020 Camera-ready version. Updated track rebirth results. Code available at https://github.com/xingyizhou/CenterTrac
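
The abstract above says only that the tracker localizes objects and predicts their associations with the previous frame. As a rough illustration of how point-based association can work, here is a minimal sketch that assumes each detection carries a predicted displacement to its previous-frame center and that matching is greedy by confidence. The names `greedy_associate`, `offset`, and `max_dist` are hypothetical, and this is not presented as the authors' implementation.

```python
import math


def greedy_associate(detections, prev_tracks, max_dist):
    """Assign a track id to every detection.

    detections:  list of dicts with 'center' (x, y), 'offset' (dx, dy), 'score'.
                 'offset' is assumed to point from the current center to the
                 object's center in the previous frame.
    prev_tracks: list of dicts with 'id' and 'center' (x, y).
    max_dist:    hypothetical matching radius in pixels.
    Returns a list of track ids aligned with `detections`.
    """
    next_id = max((t["id"] for t in prev_tracks), default=-1) + 1
    assigned = [None] * len(detections)
    used = set()
    # Process the most confident detections first.
    order = sorted(range(len(detections)), key=lambda i: -detections[i]["score"])
    for i in order:
        cx, cy = detections[i]["center"]
        dx, dy = detections[i]["offset"]
        px, py = cx + dx, cy + dy  # predicted position in the previous frame
        best_id, best_dist = None, max_dist
        for track in prev_tracks:
            if track["id"] in used:
                continue
            tx, ty = track["center"]
            dist = math.hypot(px - tx, py - ty)
            if dist <= best_dist:
                best_id, best_dist = track["id"], dist
        if best_id is None:
            # No previous track close enough: start a new track.
            best_id = next_id
            next_id += 1
        else:
            used.add(best_id)
        assigned[i] = best_id
    return assigned


prev = [{"id": 0, "center": (10, 10)}, {"id": 1, "center": (50, 50)}]
dets = [
    {"center": (12, 10), "offset": (-2, 0), "score": 0.9},
    {"center": (53, 50), "offset": (-3, 0), "score": 0.8},
    {"center": (100, 100), "offset": (0, 0), "score": 0.5},
]
ids = greedy_associate(dets, prev, max_dist=5)
```

In this toy input the first two detections map back onto the two existing tracks and the third, which has no nearby predecessor, receives a fresh id.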

    CoMaL Tracking: Tracking Points at the Object Boundaries

    Traditional point tracking algorithms such as the KLT use local 2D information aggregation for feature detection and tracking, due to which their performance degrades at the object boundaries that separate multiple objects. Recently, CoMaL Features have been proposed that handle such a case. However, they proposed a simple tracking framework where the points are re-detected in each frame and matched. This is inefficient and may also lose many points that are not re-detected in the next frame. We propose a novel tracking algorithm to accurately and efficiently track CoMaL points. For this, the level line segment associated with the CoMaL points is matched to MSER segments in the next frame using shape-based matching and the matches are further filtered using texture-based matching. Experiments show improvements over a simple re-detect-and-match framework as well as KLT in terms of speed/accuracy on different real-world applications, especially at the object boundaries. Comment: 10 pages, 10 figures, to appear in 1st Joint BMTT-PETS Workshop on Tracking and Surveillance, CVPR 201

    Tracking Objects as Pixel-wise Distributions

    Multi-object tracking (MOT) requires detecting and associating objects through frames. Unlike tracking via detected bounding boxes or tracking objects as points, we propose tracking objects as pixel-wise distributions. We instantiate this idea on a transformer-based architecture, P3AFormer, with pixel-wise propagation, prediction, and association. P3AFormer propagates pixel-wise features guided by flow information to pass messages between frames. Furthermore, P3AFormer adopts a meta-architecture to produce multi-scale object feature maps. During inference, a pixel-wise association procedure is proposed to recover object connections through frames based on the pixel-wise prediction. P3AFormer yields 81.2% in terms of MOTA on the MOT17 benchmark, the first among all transformer networks to reach 80% MOTA in the literature. P3AFormer also outperforms state-of-the-art methods on the MOT20 and KITTI benchmarks. Comment: Accepted in ECCV22 as an oral presentation paper. The code and project page are at https://github.com/dvlab-research/ECCV22-P3AFormer-Tracking-Objects-as-Pixel-wise-Distribution

    Lower bounds for Arrangement-based Range-Free Localization in Sensor Networks

    Colanders are location-aware entities that collaborate to determine the approximate location of mobile or static objects when beacons from an object are received by all colanders that are within distance $R$ of it. This model, referred to as arrangement-based localization, does not require distance estimation between entities, which has been shown to be highly erroneous in practice. Colanders are applicable to localization in sensor networks and tracking of mobile objects. A set $S \subset \mathbb{R}^2$ is an $(R,\epsilon)$-colander if, by placing receivers at the points of $S$, a wireless device with transmission radius $R$ can be localized to within a circle of radius $\epsilon$. We present tight upper and lower bounds on the size of $(R,\epsilon)$-colanders. We measure the expected size of colanders that will form $(R,\epsilon)$-colanders if they are distributed uniformly over the plane

    Capturing Hands in Action using Discriminative Salient Points and Physics Simulation

    Hand motion capture is a popular research field, recently gaining more attention due to the ubiquity of RGB-D sensors. However, even the most recent approaches focus on the case of a single isolated hand. In this work, we focus on hands that interact with other hands or objects and present a framework that successfully captures motion in such interaction scenarios for both rigid and articulated objects. Our framework combines a generative model with discriminatively trained salient points to achieve a low tracking error and with collision detection and physics simulation to achieve physically plausible estimates even in case of occlusions and missing visual data. Since all components are unified in a single objective function which is almost everywhere differentiable, it can be optimized with standard optimization techniques. Our approach works for monocular RGB-D sequences as well as setups with multiple synchronized RGB cameras. For a qualitative and quantitative evaluation, we captured 29 sequences with a large variety of interactions and up to 150 degrees of freedom. Comment: Accepted for publication by the International Journal of Computer Vision (IJCV) on 16.02.2016 (submitted on 17.10.14). A combination into a single framework of an ECCV'12 multicamera-RGB and a monocular-RGBD GCPR'14 hand tracking paper with several extensions, additional experiments and detail

    Detecting shadows and low-lying objects in indoor and outdoor scenes using homographies

    Many computer vision applications apply background suppression techniques for the detection and segmentation of moving objects in a scene. While these algorithms tend to work well in controlled conditions they often fail when applied to unconstrained real-world environments. This paper describes a system that detects and removes erroneously segmented foreground regions that are close to a ground plane. These regions include shadows, changing background objects and other low-lying objects such as leaves and rubbish. The system uses a set-up of two or more cameras and requires no 3D reconstruction or depth analysis of the regions. Therefore, a strong camera calibration of the set-up is not necessary. A geometric constraint called a homography is exploited to determine if foreground points are on or above the ground plane. The system takes advantage of the fact that regions in images off the homography plane will not correspond after a homography transformation. Experimental results using real world scenes from a pedestrian tracking application illustrate the effectiveness of the proposed approach
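
The geometric idea in the abstract above (points on the homography plane correspond after the transformation, points off it do not) can be illustrated with a small sketch. It assumes a known 3x3 ground-plane homography `H` from view A to view B and binary foreground masks for both views. The per-pixel agreement test and the function names are simplifying assumptions for illustration, not the paper's region-based procedure.

```python
def apply_homography(H, x, y):
    """Map pixel (x, y) through the 3x3 homography H (nested lists)."""
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    u = (H[0][0] * x + H[0][1] * y + H[0][2]) / w
    v = (H[1][0] * x + H[1][1] * y + H[1][2]) / w
    return u, v


def ground_plane_pixels(fg_a, fg_b, H):
    """Return foreground pixels of view A that are consistent with the plane.

    fg_a, fg_b: foreground masks as lists of rows (truthy = foreground).
    A pixel of view A is treated as lying on the plane (e.g. a shadow) when
    its homography-mapped position is also foreground in view B.
    """
    height_b, width_b = len(fg_b), len(fg_b[0])
    on_plane = set()
    for y, row in enumerate(fg_a):
        for x, is_fg in enumerate(row):
            if not is_fg:
                continue
            u, v = apply_homography(H, x, y)
            ui, vi = round(u), round(v)
            inside = 0 <= ui < width_b and 0 <= vi < height_b
            if inside and fg_b[vi][ui]:
                on_plane.add((x, y))
    return on_plane


# Toy example: the homography is a one-pixel shift to the right.
H = [[1, 0, 1], [0, 1, 0], [0, 0, 1]]
fg_a = [[1, 0, 0], [0, 1, 0], [0, 0, 0]]
fg_b = [[0, 1, 0], [0, 0, 0], [0, 0, 0]]
plane_pixels = ground_plane_pixels(fg_a, fg_b, H)
```

Foreground pixels of view A that are not returned would be candidates for objects above the plane, which such a system would keep rather than suppress.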

    Satellite Articulation Sensing using Computer Vision

    Autonomous on-orbit satellite servicing benefits from an inspector satellite that can gain as much information as possible about the primary satellite. This includes performance of articulated objects such as solar arrays, antennas, and sensors. A method for building an articulated model from monocular imagery using tracked feature points and the known relative inspection route is developed. Two methods are also developed for tracking the articulation of a satellite in real-time given an articulated model using both tracked feature points and image silhouettes. Performance is evaluated for multiple inspection routes and the effect of inspection route noise is assessed. Additionally, a satellite model is built and used to collect stop-motion images simulating articulated motion over an inspection route under simulated space illumination. The images are used in the silhouette articulation tracking method and successful tracking is demonstrated qualitatively. Finally, a human pose tracking algorithm is modified for tracking the satellite articulation demonstrating the applicability of human tracking methods to satellite articulation tracking methods when an articulated model is available

    Tracking by 3D Model Estimation of Unknown Objects in Videos

    Most model-free visual object tracking methods formulate the tracking task as object location estimation given by a 2D segmentation or a bounding box in each video frame. We argue that this representation is limited and instead propose to guide and improve 2D tracking with an explicit object representation, namely the textured 3D shape and 6DoF pose in each video frame. Our representation tackles a complex long-term dense correspondence problem between all 3D points on the object for all video frames, including frames where some points are invisible. To achieve that, the estimation is driven by re-rendering the input video frames as well as possible through differentiable rendering, which has not been used for tracking before. The proposed optimization minimizes a novel loss function to estimate the best 3D shape, texture, and 6DoF pose. We improve the state-of-the-art in 2D segmentation tracking on three different datasets with mostly rigid objects

    Comparison of Natural Feature Descriptors for Rigid-Object Tracking for Real-Time Augmented Reality

    This paper presents a comparison of natural feature descriptors for rigid object tracking for augmented reality (AR) applications. AR relies on object tracking in order to identify a physical object and to superimpose a virtual object on it. Natural feature tracking (NFT) is one approach for computer vision-based object tracking. NFT utilizes interest points of a physical object, represents them as descriptors, and matches the descriptors against reference descriptors in order to identify a physical object to track. In this research, we investigate four different natural feature descriptors (SIFT, SURF, FREAK, ORB) and their capability to track rigid objects. Rigid objects need robust descriptors since they need to describe the objects in a 3D space. AR applications are also real-time applications; thus, fast feature matching is mandatory. FREAK and ORB are binary descriptors, which promise a higher performance in comparison to SIFT and SURF. We deployed a test in which we match feature descriptors to artificial rigid objects. The results indicate that the SIFT descriptor is the most promising solution in our addressed domain, AR-based assembly training
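
Binary descriptors such as FREAK and ORB are typically compared with the Hamming distance, which is one reason they tend to match quickly. The sketch below shows brute-force nearest-neighbour matching under that distance, with each descriptor stored as a Python integer bit string. The function names and the `max_distance` threshold are illustrative assumptions, not part of the paper's test setup.

```python
def hamming(a, b):
    """Number of differing bits between two descriptors stored as ints."""
    return bin(a ^ b).count("1")


def match_descriptors(query, reference, max_distance):
    """Match each query descriptor to its nearest reference descriptor.

    Returns (query_index, reference_index, distance) tuples for matches whose
    Hamming distance does not exceed the hypothetical `max_distance`.
    """
    matches = []
    if not reference:
        return matches
    for qi, q in enumerate(query):
        distances = [hamming(q, r) for r in reference]
        ri = min(range(len(reference)), key=distances.__getitem__)
        if distances[ri] <= max_distance:
            matches.append((qi, ri, distances[ri]))
    return matches


# Toy 4-bit descriptors; real ORB descriptors are 256 bits long.
query = [0b1010, 0b1111]
reference = [0b1011, 0b0000]
matches = match_descriptors(query, reference, max_distance=1)
```

Floating-point descriptors such as SIFT and SURF are usually compared with a Euclidean distance instead, which costs more per comparison.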

    Object detection and tracking aided SLAM in image sequences for dynamic environment.

    Object detection in a dynamic environment is important for accurate tracking and mapping in Simultaneous Localization and Mapping (SLAM). Dynamic feature points from people or vehicles are the main cause of unreliable SLAM performance. Previous researchers have used varied techniques to solve this problem, such as semantic segmentation, optical flow, and moving consistency check algorithms. In this proposal, Object Detection and Tracking SLAM (ODTS), we define a weighted grid-based attention model for a feature tracking module to track landmarks and objects. The ODTS system tracks landmarks, such as buildings in the background, and objects, such as vehicles, in the foreground. To optimize performance, a robust self-attention module is integrated. For evaluation, the trajectory of the robot is tracked, and the root mean square error (RMSE) is recorded. Additionally, the numbers of background and foreground feature points are observed for landmarks and objects. ODTS significantly reduces the tracking-loss problem and produces more accurate maps and tracking of feature points
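
The evaluation described above records the RMSE of the tracked robot trajectory. A minimal sketch of that metric follows, assuming the estimated and ground-truth trajectories are already time-synchronized and expressed in the same coordinate frame; the function name and input layout are hypothetical.

```python
import math


def trajectory_rmse(estimated, ground_truth):
    """Root mean square error between two equally long position sequences.

    Each trajectory is a sequence of positions (tuples of coordinates).
    Assumes the poses are already paired in time and aligned in space.
    """
    if not estimated or len(estimated) != len(ground_truth):
        raise ValueError("trajectories must be non-empty and of equal length")
    squared_errors = [
        sum((e - g) ** 2 for e, g in zip(est_pos, gt_pos))
        for est_pos, gt_pos in zip(estimated, ground_truth)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


estimated = [(0.0, 0.0), (3.0, 4.0)]
ground_truth = [(0.0, 0.0), (0.0, 0.0)]
rmse = trajectory_rmse(estimated, ground_truth)
```

In practice, SLAM benchmarks usually align the two trajectories with a rigid or similarity transform before computing this error; that step is omitted here.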