Tracking Objects as Points
Tracking has traditionally been the art of following interest points through
space and time. This changed with the rise of powerful deep networks. Nowadays,
tracking is dominated by pipelines that perform object detection followed by
temporal association, also known as tracking-by-detection. In this paper, we
present a simultaneous detection and tracking algorithm that is simpler,
faster, and more accurate than the state of the art. Our tracker, CenterTrack,
applies a detection model to a pair of images and detections from the prior
frame. Given this minimal input, CenterTrack localizes objects and predicts
their associations with the previous frame. That's it. CenterTrack is simple,
online (no peeking into the future), and real-time. It achieves 67.3% MOTA on
the MOT17 challenge at 22 FPS and 89.4% MOTA on the KITTI tracking benchmark at
15 FPS, setting a new state of the art on both datasets. CenterTrack is easily
extended to monocular 3D tracking by regressing additional 3D attributes. Using
monocular video input, it achieves 28.3% AMOTA@0.2 on the newly released
nuScenes 3D tracking benchmark, substantially outperforming the monocular
baseline on this benchmark while running at 28 FPS.
Comment: ECCV 2020 Camera-ready version. Updated track rebirth results. Code
available at https://github.com/xingyizhou/CenterTrac
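As a rough illustration of the association step described above (a sketch under assumptions, not the released CenterTrack code), the snippet below matches current detections to previous tracks greedily using the predicted center offsets; the array layout, the offset sign convention, and the distance threshold are illustrative choices.

```python
import numpy as np

def associate(prev_centers, prev_ids, cur_centers, cur_offsets, max_dist=50.0):
    """Greedy center-offset association (illustrative sketch).

    Each current detection carries a predicted offset to its position in the
    previous frame; it is matched to the nearest unclaimed previous center,
    or it starts a new track if no center is close enough.
    """
    next_id = max(prev_ids, default=-1) + 1
    ids, used = [], set()
    claimed = cur_centers - cur_offsets  # where each detection claims it was
    for c in claimed:
        if len(prev_centers) == 0:
            ids.append(next_id); next_id += 1
            continue
        d = np.linalg.norm(prev_centers - c, axis=1)
        d[list(used)] = np.inf           # each previous track is matched once
        j = int(np.argmin(d))
        if d[j] < max_dist:
            ids.append(prev_ids[j]); used.add(j)
        else:
            ids.append(next_id); next_id += 1
    return ids

# Toy frame pair: two existing tracks and one newly appearing object.
prev_centers = np.array([[100.0, 100.0], [300.0, 200.0]])
cur_centers = np.array([[108.0, 102.0], [305.0, 197.0], [50.0, 50.0]])
cur_offsets = np.array([[8.0, 2.0], [5.0, -3.0], [0.0, 0.0]])
print(associate(prev_centers, [0, 1], cur_centers, cur_offsets))  # [0, 1, 2]
```

On the toy frame pair, the first two detections inherit the existing identities and the third spawns a new track.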
CoMaL Tracking: Tracking Points at the Object Boundaries
Traditional point tracking algorithms such as the KLT use local 2D
information aggregation for feature detection and tracking, due to which their
performance degrades at the object boundaries that separate multiple objects.
Recently, CoMaL features have been proposed to handle such cases. However,
the accompanying tracking framework simply re-detects the points in each frame
and matches them, which is inefficient and may also lose many points that
are not re-detected in the next frame. We propose a novel tracking algorithm to
accurately and efficiently track CoMaL points. For this, the level line segment
associated with the CoMaL points is matched to MSER segments in the next frame
using shape-based matching and the matches are further filtered using
texture-based matching. Experiments show improvements over a simple
re-detect-and-match framework as well as KLT in terms of speed/accuracy on
different real-world applications, especially at the object boundaries.
Comment: 10 pages, 10 figures, to appear in 1st Joint BMTT-PETS Workshop on
Tracking and Surveillance, CVPR 201
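A minimal sketch of this matching pipeline (our illustration, not the authors' implementation): the contour of a tracked boundary segment is compared against MSER regions in the next frame with a shape distance, and surviving candidates are filtered by patch correlation. The thresholds, patch size, and the use of OpenCV's MSER and matchShapes are assumptions.

```python
import cv2
import numpy as np

def match_segment(prev_gray, cur_gray, prev_contour, prev_point, patch=15,
                  shape_thresh=0.5, ncc_thresh=0.6):
    """Match a boundary segment from the previous frame to an MSER region in
    the current frame: shape-based matching first, texture-based filtering
    second (illustrative sketch)."""
    regions, _ = cv2.MSER_create().detectRegions(cur_gray)
    x, y = prev_point
    template = prev_gray[y - patch:y + patch, x - patch:x + patch]
    best, best_ncc = None, ncc_thresh
    for pts in regions:
        contour = cv2.convexHull(pts.reshape(-1, 1, 2))
        # 1) shape-based matching against the tracked level-line segment
        if cv2.matchShapes(prev_contour, contour, cv2.CONTOURS_MATCH_I1, 0) > shape_thresh:
            continue
        # 2) texture-based filtering around the candidate region's centroid
        cx, cy = pts.mean(axis=0).astype(int)
        candidate = cur_gray[cy - patch:cy + patch, cx - patch:cx + patch]
        if candidate.shape != template.shape:
            continue
        ncc = cv2.matchTemplate(candidate, template, cv2.TM_CCOEFF_NORMED)[0, 0]
        if ncc > best_ncc:
            best, best_ncc = (int(cx), int(cy)), float(ncc)
    return best  # matched position in the current frame, or None
```

On real frames, prev_contour would come from the tracked level line and prev_point from the CoMaL point being followed.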
Tracking Objects as Pixel-wise Distributions
Multi-object tracking (MOT) requires detecting and associating objects
through frames. Unlike tracking via detected bounding boxes or tracking objects
as points, we propose tracking objects as pixel-wise distributions. We
instantiate this idea on a transformer-based architecture, P3AFormer, with
pixel-wise propagation, prediction, and association. P3AFormer propagates
pixel-wise features guided by flow information to pass messages between frames.
Furthermore, P3AFormer adopts a meta-architecture to produce multi-scale object
feature maps. During inference, a pixel-wise association procedure is proposed
to recover object connections through frames based on the pixel-wise
prediction. P3AFormer yields 81.2% MOTA on the MOT17 benchmark, the first
transformer-based network to reach 80% MOTA in the literature. P3AFormer also
outperforms the state of the art on the MOT20 and KITTI benchmarks.
Comment: Accepted to ECCV 2022 as an oral presentation. The code and project page is at
https://github.com/dvlab-research/ECCV22-P3AFormer-Tracking-Objects-as-Pixel-wise-Distribution
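The pixel-wise association idea can be pictured with a small sketch (our own illustration, not the P3AFormer code): treat each object's predicted per-pixel mass as a distribution, score pairs across frames by their Bhattacharyya coefficient, and solve the assignment with the Hungarian algorithm. The similarity measure and the toy blobs are assumptions.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate_distributions(prev_maps, cur_maps):
    """Associate objects represented as pixel-wise distributions
    (illustrative sketch): normalize each per-pixel mass map, compare
    prev/cur pairs with the Bhattacharyya coefficient, and solve the
    assignment with the Hungarian algorithm."""
    P = np.stack([m / m.sum() for m in prev_maps])   # (N, H, W)
    C = np.stack([m / m.sum() for m in cur_maps])    # (M, H, W)
    sim = np.einsum('nhw,mhw->nm', np.sqrt(P), np.sqrt(C))
    row, col = linear_sum_assignment(-sim)           # maximize similarity
    return [(int(r), int(c), float(sim[r, c])) for r, c in zip(row, col)]

# Toy usage: two Gaussian blobs per frame, each drifting slightly.
yy, xx = np.mgrid[0:64, 0:64]
blob = lambda cy, cx: np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / 50.0)
print(associate_distributions([blob(20, 20), blob(40, 45)],
                              [blob(22, 21), blob(42, 47)]))
```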
Lower bounds for Arrangement-based Range-Free Localization in Sensor Networks
Colanders are location-aware entities that collaborate to determine the
approximate location of mobile or static objects when beacons from an object
are received by all colanders within its transmission distance. This model,
referred to as arrangement-based localization, does not require distance
estimation between entities, which has been shown to be highly erroneous in
practice. Colanders are applicable to localization in sensor networks and to
the tracking of mobile objects.
A set is a colander if, by placing receivers at its points, a wireless device
with a given transmission radius can be localized to within a circle of a
given radius. We present tight upper and lower bounds on the size of such
colanders, and we measure the expected size of point sets that form colanders
when the points are distributed uniformly over the plane.
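To make the arrangement-based model concrete, here is a hedged Monte-Carlo sketch (our reading of the model, not the paper's construction): a device is known only through the subset of receivers that hear it, so the localization quality of a candidate receiver set can be estimated as the largest radius of a region whose points all share the same subset. The range value, sampling area, and grid layout are illustrative assumptions.

```python
import numpy as np

def worst_cell_radius(receivers, rng_radius, area=10.0, n=20000, seed=0):
    """Estimate the worst-case localization radius of a receiver set under
    range-free, arrangement-based localization (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    pts = rng.uniform(0.0, area, size=(n, 2))
    # Boolean signature per point: which receivers are within range.
    sig = np.linalg.norm(pts[:, None, :] - receivers[None, :, :], axis=2) <= rng_radius
    _, inv = np.unique(sig, axis=0, return_inverse=True)
    worst = 0.0
    for k in range(inv.max() + 1):
        cell = pts[inv == k]                      # indistinguishable positions
        radius = np.linalg.norm(cell - cell.mean(axis=0), axis=1).max()
        worst = max(worst, float(radius))
    return worst

# Toy usage: a 3x3 grid of receivers over a 10x10 area, range 3.
grid = np.array([[x, y] for x in (2.0, 5.0, 8.0) for y in (2.0, 5.0, 8.0)])
print(worst_cell_radius(grid, rng_radius=3.0))
```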
Capturing Hands in Action using Discriminative Salient Points and Physics Simulation
Hand motion capture is a popular research field, recently gaining more
attention due to the ubiquity of RGB-D sensors. However, even the most recent
approaches focus on the case of a single isolated hand. In this work, we focus
on hands that interact with other hands or objects and present a framework that
successfully captures motion in such interaction scenarios for both rigid and
articulated objects. Our framework combines a generative model with
discriminatively trained salient points to achieve a low tracking error and
with collision detection and physics simulation to achieve physically plausible
estimates even in case of occlusions and missing visual data. Since all
components are unified in a single objective function which is almost
everywhere differentiable, it can be optimized with standard optimization
techniques. Our approach works for monocular RGB-D sequences as well as setups
with multiple synchronized RGB cameras. For a qualitative and quantitative
evaluation, we captured 29 sequences with a large variety of interactions and
up to 150 degrees of freedom.
Comment: Accepted for publication by the International Journal of Computer
Vision (IJCV) on 16.02.2016 (submitted on 17.10.14). A combination into a
single framework of an ECCV'12 multicamera-RGB and a monocular-RGBD GCPR'14
hand tracking paper with several extensions, additional experiments and
detail
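The single, almost everywhere differentiable objective can be illustrated with a toy sketch (our illustration, not the authors' energy): a data term pulling model points towards salient-point targets plus a squared-hinge collision penalty between two proxy spheres, minimized with a standard gradient-based optimizer. All numbers, the 2D setting, and the sphere proxies are assumptions.

```python
import numpy as np
from scipy.optimize import minimize

# Observed salient points (data-term targets) and two proxy-sphere radii for
# the collision term; all values are toy numbers for illustration only.
targets = np.array([[0.0, 0.0], [1.0, 0.2], [1.1, 0.3]])
radii = (0.4, 0.4)

def objective(x):
    pts = x.reshape(-1, 2)
    # Data term: squared distance of each model point to its salient point.
    data = np.sum((pts - targets) ** 2)
    # Collision term: squared-hinge penalty if the proxy spheres interpenetrate.
    overlap = radii[0] + radii[1] - np.linalg.norm(pts[1] - pts[2])
    collision = max(overlap, 0.0) ** 2
    return data + 10.0 * collision

x0 = np.zeros(6)                       # start from a neutral configuration
res = minimize(objective, x0, method='L-BFGS-B')
print(res.x.reshape(-1, 2))            # fits the data without interpenetration
```

The squared hinge keeps the penalty differentiable almost everywhere, mirroring the property the abstract relies on for standard optimizers.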
Detecting shadows and low-lying objects in indoor and outdoor scenes using homographies
Many computer vision applications apply background suppression techniques for the detection and segmentation of moving objects in a scene. While these algorithms tend to work well in controlled conditions, they often fail when applied to unconstrained real-world environments. This paper describes a system that detects and removes erroneously segmented foreground regions that are close to a ground plane. These regions include shadows, changing background objects and other low-lying objects such as leaves and rubbish. The system uses a set-up of two or more cameras and requires no 3D reconstruction or depth analysis of the regions. Therefore, a strong camera calibration of the set-up is not necessary. A geometric constraint called a homography is exploited to determine if foreground points are on or above the ground plane. The system takes advantage of the fact that regions in images off the homography plane will not correspond after a homography transformation. Experimental results using real-world scenes from a pedestrian tracking application illustrate the effectiveness of the proposed approach.
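The plane-induced transfer test described above can be sketched in a few lines (an illustration under assumed inputs, not the paper's system): given the ground-plane homography between two views, a foreground point is transferred from one view to the other, and a small transfer error indicates the point lies on the plane. The homography, the correspondences, and the pixel tolerance below are made up for illustration.

```python
import numpy as np

def on_ground_plane(H, p_cam1, p_cam2, tol=3.0):
    """Ground-plane test via homography transfer (illustrative sketch):
    a point on the plane maps from view 1 to view 2 by H, so a small
    transfer error marks it as a shadow or low-lying region."""
    p = np.array([p_cam1[0], p_cam1[1], 1.0])
    q = H @ p
    q = q[:2] / q[2]                       # transferred position in view 2
    return np.linalg.norm(q - np.asarray(p_cam2, float)) < tol

# Toy usage with an identity-plus-shift homography standing in for a calibrated one.
H = np.array([[1.0, 0.0, 40.0],
              [0.0, 1.0, -10.0],
              [0.0, 0.0, 1.0]])
print(on_ground_plane(H, (100, 200), (140, 190)))   # True: consistent, on the plane
print(on_ground_plane(H, (100, 200), (165, 150)))   # False: off the plane
```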
Satellite Articulation Sensing using Computer Vision
Autonomous on-orbit satellite servicing benefits from an inspector satellite that can gain as much information as possible about the primary satellite. This includes the performance of articulated components such as solar arrays, antennas, and sensors. A method for building an articulated model from monocular imagery using tracked feature points and the known relative inspection route is developed. Two methods are also developed for tracking the articulation of a satellite in real time given an articulated model, using both tracked feature points and image silhouettes. Performance is evaluated for multiple inspection routes, and the effect of inspection route noise is assessed. Additionally, a satellite model is built and used to collect stop-motion images simulating articulated motion over an inspection route under simulated space illumination. The images are used in the silhouette articulation tracking method and successful tracking is demonstrated qualitatively. Finally, a human pose tracking algorithm is modified for tracking the satellite articulation, demonstrating the applicability of human tracking methods to satellite articulation tracking when an articulated model is available.
Tracking by 3D Model Estimation of Unknown Objects in Videos
Most model-free visual object tracking methods formulate the tracking task as
object location estimation given by a 2D segmentation or a bounding box in each
video frame. We argue that this representation is limited and instead propose
to guide and improve 2D tracking with an explicit object representation, namely
the textured 3D shape and 6DoF pose in each video frame. Our representation
tackles a complex long-term dense correspondence problem between all 3D points
on the object for all video frames, including frames where some points are
invisible. To achieve that, the estimation is driven by re-rendering the input
video frames as well as possible through differentiable rendering, which has
not been used for tracking before. The proposed optimization minimizes a novel
loss function to estimate the best 3D shape, texture, and 6DoF pose. We improve
the state of the art in 2D segmentation tracking on three different datasets
with mostly rigid objects.
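The analysis-by-synthesis principle behind this re-rendering loss can be sketched with a toy differentiable renderer (a stand-in for the paper's mesh renderer, used only to illustrate the idea): a soft disk with learnable position, radius, and albedo is fit to a target image by minimizing a photometric loss with gradient descent. All parameters and the toy renderer are assumptions.

```python
import torch

H = W = 64
yy, xx = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                        torch.arange(W, dtype=torch.float32), indexing='ij')

def render(cx, cy, r, albedo):
    """Toy differentiable renderer: a soft disk via a sigmoid of the distance
    to the boundary (stands in for a differentiable mesh renderer)."""
    dist = torch.sqrt((xx - cx) ** 2 + (yy - cy) ** 2 + 1e-6)
    return albedo * torch.sigmoid(r - dist)

# "Input video frame": a disk of radius 12 at (40, 24) with intensity 0.8.
with torch.no_grad():
    target = render(torch.tensor(40.0), torch.tensor(24.0),
                    torch.tensor(12.0), torch.tensor(0.8))

params = torch.tensor([30.0, 20.0, 8.0, 0.5], requires_grad=True)  # cx, cy, r, albedo
opt = torch.optim.Adam([params], lr=0.3)
for _ in range(400):
    opt.zero_grad()
    loss = ((render(*params) - target) ** 2).mean()  # photometric re-rendering loss
    loss.backward()
    opt.step()
print(params.detach())  # approaches (40, 24, 12, 0.8)
```

Replacing the toy disk with a textured mesh, a 6DoF pose, and a differentiable mesh renderer gives the kind of optimization the abstract describes.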
Comparison of Natural Feature Descriptors for Rigid-Object Tracking for Real-Time Augmented Reality
This paper presents a comparison of natural feature descriptors for rigid-object tracking for augmented reality (AR) applications. AR relies on object tracking in order to identify a physical object and to superimpose a virtual object on it. Natural feature tracking (NFT) is one approach to computer vision-based object tracking. NFT utilizes interest points of a physical object, represents them as descriptors, and matches the descriptors against reference descriptors in order to identify a physical object to track. In this research, we investigate four different natural feature descriptors (SIFT, SURF, FREAK, ORB) and their capability to track rigid objects. Rigid objects need robust descriptors since they need to describe the objects in 3D space. AR applications are also real-time applications; thus, fast feature matching is mandatory. FREAK and ORB are binary descriptors, which promise higher performance in comparison to SIFT and SURF. We deployed a test in which we match feature descriptors to artificial rigid objects. The results indicate that the SIFT descriptor is the most promising solution in our addressed domain, AR-based assembly training.
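A hedged sketch of such a descriptor benchmark (our illustration, not the paper's test harness) using OpenCV: detect and describe features in an image and a rotated copy, match with a brute-force matcher and Lowe's ratio test, and report match counts and runtime. Only ORB and SIFT are shown because SURF and FREAK require the opencv-contrib build; the synthetic image and thresholds are assumptions.

```python
import time
import cv2
import numpy as np

# Synthetic high-contrast block texture and a rotated copy simulating camera motion.
rng = np.random.default_rng(0)
small = (rng.random((60, 80)) > 0.5).astype(np.uint8) * 255
img = cv2.resize(small, (640, 480), interpolation=cv2.INTER_NEAREST)
M = cv2.getRotationMatrix2D((320, 240), 15, 1.0)
warped = cv2.warpAffine(img, M, (640, 480))

detectors = {
    'ORB':  (cv2.ORB_create(1000), cv2.NORM_HAMMING),   # binary descriptor
    'SIFT': (cv2.SIFT_create(),    cv2.NORM_L2),        # float descriptor
    # SURF and FREAK require the opencv-contrib build (cv2.xfeatures2d).
}
for name, (det, norm) in detectors.items():
    t0 = time.perf_counter()
    k1, d1 = det.detectAndCompute(img, None)
    k2, d2 = det.detectAndCompute(warped, None)
    matches = cv2.BFMatcher(norm).knnMatch(d1, d2, k=2)
    good = [m for m, n in matches if m.distance < 0.75 * n.distance]  # Lowe ratio test
    print(f'{name}: {len(good)} good matches in {time.perf_counter() - t0:.3f}s')
```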
Object detection and tracking aided SLAM in image sequences for dynamic environment.
Object detection in a dynamic environment is important for accurate tracking and mapping in Simultaneous Localization and Mapping (SLAM). Dynamic feature points from people or vehicles are the main cause of unreliable SLAM performance. Previous researchers have used varied techniques to solve this problem, such as semantic segmentation, optical flow, and moving-consistency checks. In this proposal, Object Detection and Tracking SLAM (ODTS), we define a weighted grid-based attention model for a feature tracking module to track landmarks and objects. The ODTS system tracks landmarks, such as buildings in the background, and objects, such as vehicles, in the foreground. To optimize performance, a robust self-attention module is integrated. For evaluation, the trajectory of the robot is tracked and the root mean square error (RMSE) is recorded. Additionally, the numbers of background and foreground feature points are recorded for landmarks and objects. ODTS significantly reduces the tracking-loss problem and produces more accurate maps and tracking of feature points.
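One way to picture the weighted grid-based idea (our reading of the abstract, not the ODTS implementation) is a per-cell weight map that down-weights feature points falling inside detected dynamic-object boxes, as sketched below; the grid size, weight value, and box format are assumptions.

```python
import numpy as np

def grid_weights(points, dyn_boxes, img_hw, grid=(8, 8), dyn_weight=0.1):
    """Per-point weights from a grid-based weighting of the image
    (illustrative sketch): cells overlapped by detected dynamic objects
    (vehicles, people) get a low weight so foreground points contribute
    less to pose estimation."""
    H, W = img_hw
    gh, gw = grid
    cell_w = np.ones(grid)
    for x0, y0, x1, y1 in dyn_boxes:                     # dynamic-object boxes
        r0, r1 = int(y0 / H * gh), min(int(y1 / H * gh), gh - 1)
        c0, c1 = int(x0 / W * gw), min(int(x1 / W * gw), gw - 1)
        cell_w[r0:r1 + 1, c0:c1 + 1] = dyn_weight
    rows = np.clip((points[:, 1] / H * gh).astype(int), 0, gh - 1)
    cols = np.clip((points[:, 0] / W * gw).astype(int), 0, gw - 1)
    return cell_w[rows, cols]                            # per-point weight

# Toy usage: two landmark points and one point on a detected vehicle.
pts = np.array([[50.0, 50.0], [600.0, 100.0], [320.0, 300.0]])
print(grid_weights(pts, [(250, 250, 400, 380)], img_hw=(480, 640)))  # -> [1. 1. 0.1]
```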