A Universal Update-pacing Framework For Visual Tracking
This paper proposes a novel framework to alleviate the model drift problem in
visual tracking, which is based on paced updates and trajectory selection.
Given a base tracker, an ensemble of trackers is generated, in which each
tracker's update behavior is paced and each tracker traces the target object
forward and backward to generate a pair of trajectories in an interval. Then,
we implicitly perform self-examination based on the trajectory pair of each tracker
and select the most robust tracker. The proposed framework can effectively
leverage the temporal context of sequential frames and avoid learning corrupted
information. Extensive experiments on the standard benchmark suggest that the
proposed framework achieves superior performance against state-of-the-art
trackers.
Comment: Submitted to ICIP 201
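The trajectory-selection idea in the abstract above can be illustrated with a forward-backward consistency check. This is only a minimal sketch under stated assumptions: the abstract says the self-examination is implicit and does not give a formula, so the explicit mean-distance score, the function names `forward_backward_error` and `select_most_consistent`, and the tracker names in the usage note are all hypothetical.

```python
from math import hypot


def forward_backward_error(forward, backward):
    """Mean distance between a forward trajectory and the time-reversed
    backward trajectory. Both are lists of (x, y) target centers.

    Assumption: `backward` is stored in tracking order, i.e. it starts at
    the last frame of the interval and ends at the first frame.
    """
    if len(forward) != len(backward):
        raise ValueError("trajectories must cover the same frames")
    if not forward:
        raise ValueError("trajectories must not be empty")
    aligned_backward = backward[::-1]  # re-order to first frame -> last frame
    distances = [
        hypot(fx - bx, fy - by)
        for (fx, fy), (bx, by) in zip(forward, aligned_backward)
    ]
    return sum(distances) / len(distances)


def select_most_consistent(trajectory_pairs):
    """Pick the tracker whose forward and backward trajectories agree best.

    `trajectory_pairs` maps a tracker name to a (forward, backward) tuple.
    Returns the chosen name and the per-tracker error dictionary.
    """
    errors = {
        name: forward_backward_error(fwd, bwd)
        for name, (fwd, bwd) in trajectory_pairs.items()
    }
    best = min(errors, key=errors.get)
    return best, errors
```

For example, with two hypothetical ensemble members, the one whose backward pass returns along the forward path is selected:

```python
pairs = {
    "fast_update": ([(0, 0), (1, 0), (2, 0)], [(2, 0), (1, 3), (0, 4)]),
    "slow_update": ([(0, 0), (1, 0), (2, 0)], [(2, 0), (1, 0), (0, 0)]),
}
best, errors = select_most_consistent(pairs)
```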
Temporal HeartNet: Towards Human-Level Automatic Analysis of Fetal Cardiac Screening Video
We present an automatic method to describe clinically useful information
about scanning, and to guide image interpretation in ultrasound (US) videos of
the fetal heart. Our method is able to jointly predict the visibility, viewing
plane, location and orientation of the fetal heart at the frame level. The
contributions of the paper are three-fold: (i) a convolutional neural network
architecture is developed for multi-task prediction, which is computed by
sliding a 3x3 window spatially through convolutional maps; (ii) an anchor
mechanism and an Intersection over Union (IoU) loss are applied to improve
localization accuracy; and (iii) a recurrent architecture is designed to
recursively compute regional convolutional features temporally over sequential
frames, allowing each prediction to be conditioned on the whole video. This
results in a spatial-temporal model that precisely describes detailed heart
parameters in challenging US videos. We report results on a real-world clinical
dataset, where our method achieves performance on par with expert annotations.
Comment: To appear in MICCAI, 201
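As an illustration of the Intersection over Union (IoU) measure mentioned in the abstract above, the following sketch computes IoU for two axis-aligned boxes and a simple loss derived from it. It is written under stated assumptions: the abstract does not give the loss formula or the box parameterization (the paper also predicts orientation), so the `(x1, y1, x2, y2)` box format and the `1 - IoU` loss form are illustrative choices, not the authors' definition.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def iou_loss(predicted_box, target_box):
    """Assumed loss form: 0 for a perfect match, 1 for no overlap."""
    return 1.0 - iou(predicted_box, target_box)
```

A loss of this kind scores the box as a whole rather than penalizing each coordinate independently, which is one common motivation for using IoU in localization.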
Temporally Resolved Intensity Contouring (TRIC) for characterization of the absolute spatio-temporal intensity distribution of a relativistic, femtosecond laser pulse
Today's high-power laser systems are capable of reaching photon intensities
up to W/cm^2, generating plasmas when interacting with material. The
high intensity and ultrashort laser pulse duration (fs) make direct observation
of plasma dynamics a challenging task. In the field of laser-plasma physics and
especially for the acceleration of ions, the spatio-temporal intensity
distribution is one of the most critical aspects. We describe a novel method
based on a single-shot (i.e. single laser pulse) chirped probing scheme, taking
nine sequential frames at frame rates up to THz. This technique, which we
refer to as temporally resolved intensity contouring (TRIC), enables single-shot
measurement of laser-plasma dynamics. Using TRIC, we demonstrate the
reconstruction of the complete spatio-temporal intensity distribution of a
high-power laser pulse in the focal plane at full pulse energy with
sub-picosecond resolution.
Comment: Daniel Haffa, Jianhui Bin and Martin Speicher are corresponding
authors
Slow and steady feature analysis: higher order temporal coherence in video
How can unlabeled video augment visual learning? Existing methods perform
"slow" feature analysis, encouraging the representations of temporally close
frames to exhibit only small differences. While this standard approach captures
the fact that high-level visual signals change slowly over time, it fails to
capture *how* the visual content changes. We propose to generalize slow feature
analysis to "steady" feature analysis. The key idea is to impose a prior that
higher order derivatives in the learned feature space must be small. To this
end, we train a convolutional neural network with a regularizer on tuples of
sequential frames from unlabeled video. It encourages feature changes over time
to be smooth, i.e., similar to the most recent changes. Using five diverse
datasets, including unlabeled YouTube and KITTI videos, we demonstrate our
method's impact on object, scene, and action recognition tasks. We further show
that our features learned from unlabeled video can even surpass a standard
heavily supervised pretraining approach.
Comment: in Computer Vision and Pattern Recognition (CVPR) 2016, Las Vegas,
NV, June 201
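The difference between the "slow" and "steady" priors described in the abstract above can be sketched with two penalty terms on feature vectors of sequential frames. This is a simplified illustration under stated assumptions: the abstract does not give the exact regularizer, so the squared-Euclidean form and the function names `slow_penalty` and `steady_penalty` are hypothetical, and feature vectors are plain lists of floats rather than network outputs.

```python
def slow_penalty(z_a, z_b):
    """First-order term: features of neighboring frames should be close."""
    return sum((b - a) ** 2 for a, b in zip(z_a, z_b))


def steady_penalty(z_a, z_b, z_c):
    """Second-order term: the change from frame b to c should resemble the
    most recent change from frame a to b (a discrete second derivative)."""
    return sum(
        ((c - b) - (b - a)) ** 2
        for a, b, c in zip(z_a, z_b, z_c)
    )
```

For features that move along a straight line, such as `[0, 0]`, `[1, 2]`, `[2, 4]`, the steady penalty is zero even though the slow penalty between neighboring frames is not, which is the sense in which the second-order prior captures how the content changes rather than only how much.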
FCN-rLSTM: Deep Spatio-Temporal Neural Networks for Vehicle Counting in City Cameras
In this paper, we develop deep spatio-temporal neural networks to
sequentially count vehicles from low quality videos captured by city cameras
(citycams). Citycam videos have low resolution, low frame rate, high occlusion
and large perspective, making most existing methods lose their efficacy. To
overcome limitations of existing methods and incorporate the temporal
information of traffic video, we design a novel FCN-rLSTM network to jointly
estimate vehicle density and vehicle count by connecting fully convolutional
neural networks (FCN) with long short term memory networks (LSTM) in a residual
learning fashion. Such design leverages the strengths of FCN for pixel-level
prediction and the strengths of LSTM for learning complex temporal dynamics.
The residual learning connection reformulates the vehicle count regression as
learning residual functions with reference to the sum of densities in each
frame, which significantly accelerates the training of networks. To preserve
feature map resolution, we propose a Hyper-Atrous combination to integrate
atrous convolution in FCN and combine feature maps of different convolution
layers. FCN-rLSTM enables refined feature representation and a novel end-to-end
trainable mapping from pixels to vehicle count. We extensively evaluated the
proposed method on different counting tasks with three datasets, and the
experimental results demonstrate its effectiveness and robustness. In
particular, FCN-rLSTM reduces the mean absolute error (MAE) from 5.31 to 4.21
on TRANCOS, and reduces the MAE from 2.74 to 1.53 on WebCamT. The training process
is accelerated by a factor of 5 on average.
Comment: Accepted by International Conference on Computer Vision (ICCV), 201
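The residual formulation of the count described in the abstract above, together with the MAE metric it reports, can be sketched as follows. This is a toy illustration under stated assumptions: in the paper the residual comes from the LSTM, whereas here it is passed in as a plain number, and the function names and the nested-list density map are hypothetical.

```python
def density_count(density_map):
    """Base count: the sum of a per-pixel vehicle density map (list of rows)."""
    return sum(sum(row) for row in density_map)


def residual_count(density_map, residual):
    """Final count = sum of densities in the frame + a learned correction.

    Assumption: `residual` stands in for the output of the temporal model.
    """
    return density_count(density_map) + residual


def mean_absolute_error(predicted, actual):
    """MAE between predicted and ground-truth counts over a set of frames."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)
```

Because the density sum already gives a rough count, the temporal model only has to learn a small correction, which is consistent with the abstract's statement that the residual connection speeds up training.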
Below Horizon Aircraft Detection Using Deep Learning for Vision-Based Sense and Avoid
Commercial operation of unmanned aerial vehicles (UAVs) would benefit from an
onboard ability to sense and avoid (SAA) potential mid-air collision threats.
In this paper we present a new approach for detection of aircraft below the
horizon. We address some of the challenges faced by existing vision-based SAA
methods such as detecting stationary aircraft (that have no relative motion to
the background), rejecting moving ground vehicles, and simultaneous detection
of multiple aircraft. We propose a multi-stage, vision-based aircraft detection
system which utilises deep learning to produce candidate aircraft that we track
over time. We evaluate the performance of our proposed system on real flight
data where we demonstrate detection ranges comparable to the state of the art
with the additional capability of detecting stationary aircraft, rejecting
moving ground vehicles, and tracking multiple aircraft.