97,236 research outputs found
Occlusion resistant learning of intuitive physics from videos
To reach human performance on complex tasks, a key ability for artificial
systems is to understand physical interactions between objects, and predict
future outcomes of a situation. This ability, often referred to as intuitive
physics, has recently received attention and several methods were proposed to
learn these physical rules from video sequences. Yet, most of these methods are
restricted to the case where no, or only limited, occlusions occur. In this
work we propose a probabilistic formulation of learning intuitive physics in 3D
scenes with significant inter-object occlusions. In our formulation, object
positions are modeled as latent variables enabling the reconstruction of the
scene. We then propose a series of approximations that make this problem
tractable. Object proposals are linked across frames using a combination of a
recurrent interaction network, modeling the physics in object space, and a
compositional renderer, modeling the way in which objects project onto pixel
space. We demonstrate significant improvements over state-of-the-art in the
intuitive physics benchmark of IntPhys. We apply our method to a second dataset
with increasing levels of occlusions, showing it realistically predicts
segmentation masks up to 30 frames in the future. Finally, we also show results
on predicting motion of objects in real videos
Joint Detection and Tracking in Videos with Identification Features
Recent works have shown that combining object detection and tracking tasks,
in the case of video data, results in higher performance for both tasks, but
they require a high frame-rate as a strict requirement for performance. This is
assumption is often violated in real-world applications, when models run on
embedded devices, often at only a few frames per second.
Videos at low frame-rate suffer from large object displacements. Here
re-identification features may support to match large-displaced object
detections, but current joint detection and re-identification formulations
degrade the detector performance, as these two are contrasting tasks. In the
real-world application having separate detector and re-id models is often not
feasible, as both the memory and runtime effectively double.
Towards robust long-term tracking applicable to reduced-computational-power
devices, we propose the first joint optimization of detection, tracking and
re-identification features for videos. Notably, our joint optimization
maintains the detector performance, a typical multi-task challenge. At
inference time, we leverage detections for tracking (tracking-by-detection)
when the objects are visible, detectable and slowly moving in the image. We
leverage instead re-identification features to match objects which disappeared
(e.g. due to occlusion) for several frames or were not tracked due to fast
motion (or low-frame-rate videos). Our proposed method reaches the
state-of-the-art on MOT, it ranks 1st in the UA-DETRAC'18 tracking challenge
among online trackers, and 3rd overall.Comment: Accepted at Image and Vision Computing Journa
Trajectory recognition as the basis for object individuation: A functional model of object file instantiation and object token encoding
The perception of persisting visual objects is mediated by transient intermediate representations, object files, that are instantiated in response to some, but not all, visual trajectories. The standard object file concept does not, however, provide a mechanism sufficient to account for all experimental data on visual object persistence, object tracking, and the ability to perceive spatially-disconnected stimuli as coherent objects. Based on relevant anatomical, functional, and developmental data, a functional model is developed that bases object individuation on the specific recognition of visual trajectories. This model is shown to account for a wide range of data, and to generate a variety of testable predictions. Individual variations of the model parameters are expected to generate distinct trajectory and object recognition abilities. Over-encoding of trajectory information in stored object tokens in early infancy, in particular, is expected to disrupt the ability to re-identify individuals across perceptual episodes, and lead to developmental outcomes with characteristics of autism spectrum disorders
- …