15,863 research outputs found
Detect to Track and Track to Detect
Recent approaches for high accuracy detection and tracking of object
categories in video consist of complex multistage solutions that become more
cumbersome each year. In this paper we propose a ConvNet architecture that
jointly performs detection and tracking, solving the task in a simple and
effective way. Our contributions are threefold: (i) we set up a ConvNet
architecture for simultaneous detection and tracking, using a multi-task
objective for frame-based object detection and across-frame track regression;
(ii) we introduce correlation features that represent object co-occurrences
across time to aid the ConvNet during tracking; and (iii) we link the frame
level detections based on our across-frame tracklets to produce high accuracy
detections at the video level. Our ConvNet architecture for spatiotemporal
object detection is evaluated on the large-scale ImageNet VID dataset where it
achieves state-of-the-art results. Our approach provides better single model
performance than the winning method of the last ImageNet challenge while being
conceptually much simpler. Finally, we show that by increasing the temporal
stride we can dramatically increase the tracker speed.Comment: ICCV 2017. Code and models:
https://github.com/feichtenhofer/Detect-Track Results:
https://www.robots.ox.ac.uk/~vgg/research/detect-track
Real time motion estimation using a neural architecture implemented on GPUs
This work describes a neural network based architecture that represents and estimates object motion in videos. This architecture addresses multiple computer vision tasks such as image segmentation, object representation or characterization, motion analysis and tracking. The use of a neural network architecture allows for the simultaneous estimation of global and local motion and the representation of deformable objects. This architecture also avoids the problem of finding corresponding features while tracking moving objects. Due to the parallel nature of neural networks, the architecture has been implemented on GPUs that allows the system to meet a set of requirements such as: time constraints management, robustness, high processing speed and re-configurability. Experiments are presented that demonstrate the validity of our architecture to solve problems of mobile agents tracking and motion analysis.This work was partially funded by the Spanish Government DPI2013-40534-R grant and Valencian Government GV/2013/005 grant
Real time motion estimation using a neural architecture implemented on GPUs
This work describes a neural network based architecture that represents and estimates object motion in videos. This architecture addresses multiple computer vision tasks such as image segmentation, object representation or characterization, motion analysis and tracking. The use of a neural network architecture allows for the simultaneous estimation of global and local motion and the representation of deformable objects. This architecture also avoids the problem of finding corresponding features while tracking moving objects. Due to the parallel nature of neural networks, the architecture has been implemented on GPUs that allows the system to meet a set of requirements such as: time constraints management, robustness, high processing speed and re-configurability. Experiments are presented that demonstrate the validity of our architecture to solve problems of mobile agents tracking and motion analysis
Learning Intelligent Dialogs for Bounding Box Annotation
We introduce Intelligent Annotation Dialogs for bounding box annotation. We
train an agent to automatically choose a sequence of actions for a human
annotator to produce a bounding box in a minimal amount of time. Specifically,
we consider two actions: box verification, where the annotator verifies a box
generated by an object detector, and manual box drawing. We explore two kinds
of agents, one based on predicting the probability that a box will be
positively verified, and the other based on reinforcement learning. We
demonstrate that (1) our agents are able to learn efficient annotation
strategies in several scenarios, automatically adapting to the image
difficulty, the desired quality of the boxes, and the detector strength; (2) in
all scenarios the resulting annotation dialogs speed up annotation compared to
manual box drawing alone and box verification alone, while also outperforming
any fixed combination of verification and drawing in most scenarios; (3) in a
realistic scenario where the detector is iteratively re-trained, our agents
evolve a series of strategies that reflect the shifting trade-off between
verification and drawing as the detector grows stronger.Comment: This paper appeared at CVPR 201
- …