MOR-UAV: A Benchmark Dataset and Baselines for Moving Object Recognition in UAV Videos
Visual data collected from Unmanned Aerial Vehicles (UAVs) has opened a new frontier of computer vision that requires automated analysis of aerial images and videos. However, existing UAV datasets primarily focus on object detection, and an object detector does not differentiate between moving and non-moving objects. Given a real-time UAV video stream, how can we both localize and classify the moving objects, i.e., perform moving object recognition (MOR)? MOR is an essential task for various UAV vision-based applications, including aerial surveillance, search and rescue, event recognition, and urban and rural scene understanding. To the best of our knowledge, no labeled dataset is available for MOR evaluation in UAV videos.
Therefore, in this paper, we introduce MOR-UAV, a large-scale video dataset for MOR in aerial videos. We achieve this by labeling axis-aligned bounding boxes for moving objects, which requires fewer computational resources than producing pixel-level estimates. We annotate 89,783 moving object instances collected from 30 UAV videos comprising 10,948 frames, covering diverse scenarios such as varying weather conditions, occlusion, changing flight altitudes, and multiple camera views. We assign labels for two vehicle categories (car and heavy vehicle).
Furthermore, we propose a deep unified framework, MOR-UAVNet, for MOR in UAV videos. Since this is the first attempt at MOR in UAV videos, we present 16 baseline results based on the proposed framework over the MOR-UAV dataset through quantitative and qualitative experiments. We also analyze the motion-salient regions in the network through multiple layer visualizations. MOR-UAVNet works online at inference, as it requires only a few past frames. Moreover, it does not require predefined target initialization from the user. Experiments also demonstrate that the MOR-UAV dataset is quite challenging.
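The abstract describes a detector that conditions on a short window of past frames so it can localize and classify only the moving objects. Below is a minimal sketch of that idea in PyTorch; the frame-stacking design, the layer sizes, and the name TinyMORNet are illustrative assumptions, not the authors' MOR-UAVNet.

```python
# Minimal sketch of a moving-object-recognition (MOR) model: the detector
# sees a short history of frames so it can separate moving from static
# objects. This is NOT the authors' MOR-UAVNet; the design and names here
# are illustrative assumptions.
import torch
import torch.nn as nn

class TinyMORNet(nn.Module):
    def __init__(self, num_past_frames: int = 4, num_classes: int = 2):
        super().__init__()
        in_ch = 3 * num_past_frames  # RGB frames stacked along channels
        self.backbone = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Dense per-cell prediction: class scores plus 4 box offsets.
        self.head = nn.Conv2d(128, num_classes + 4, 1)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (B, T, 3, H, W); stack the temporal window along channels
        # so the network can compare past frames and respond to motion.
        b, t, c, h, w = frames.shape
        x = frames.reshape(b, t * c, h, w)
        return self.head(self.backbone(x))  # (B, num_classes + 4, H/8, W/8)

# Online inference needs only a sliding window of the most recent frames,
# with no predefined target initialization from the user:
window = torch.randn(1, 4, 3, 256, 256)  # the 4 most recent frames
out = TinyMORNet()(window)  # dense moving-object class scores and boxes
```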
Learning by Analogy: Reliable Supervision from Transformations for Unsupervised Optical Flow Estimation
Unsupervised learning of optical flow, which leverages the supervision from
view synthesis, has emerged as a promising alternative to supervised methods.
However, the objective of unsupervised learning is likely to be unreliable in
challenging scenes. In this work, we present a framework to use more reliable
supervision from transformations. It simply twists the general unsupervised
learning pipeline by running another forward pass with transformed data from
augmentation, and using the transformed predictions of the original data as the self-supervision signal. Besides, we further introduce a lightweight multi-frame network with a highly shared flow decoder. Our method consistently achieves a leap in performance on several benchmarks, with the best accuracy among deep unsupervised methods. It also achieves results competitive with recent fully supervised methods while using far fewer parameters.
Comment: Accepted to CVPR 2020, https://github.com/lliuz/ARFlow
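The transformation-based self-supervision described above is straightforward to sketch: predictions on the original image pair, transformed consistently with the augmentation, act as pseudo labels for a second forward pass on the augmented pair. The sketch below uses a horizontal flip as the transformation; the flip choice, the L1 loss, and the toy network are assumptions for illustration, not the ARFlow implementation.

```python
# Sketch of self-supervision from transformations: the transformed
# prediction on the original data supervises a second forward pass on the
# augmented data. Illustrative only; not the ARFlow code.
import torch
import torch.nn.functional as F

def flip_flow(flow: torch.Tensor) -> torch.Tensor:
    # Horizontally flipping the image pair flips the flow field spatially
    # and negates its x component.
    f = torch.flip(flow, dims=[-1])
    return torch.cat([-f[:, :1], f[:, 1:]], dim=1)

def self_supervision_loss(net, img1, img2):
    # First pass: prediction on the original pair.
    flow = net(img1, img2)                                  # (B, 2, H, W)
    # Transform the prediction and stop gradients: it becomes a pseudo label.
    pseudo = flip_flow(flow).detach()
    # Second pass: prediction on the transformed (augmented) pair.
    flow_aug = net(torch.flip(img1, dims=[-1]), torch.flip(img2, dims=[-1]))
    return F.l1_loss(flow_aug, pseudo)

# Smoke test with a toy flow network; any net mapping two images to a flow
# field of shape (B, 2, H, W) works here.
class ToyFlowNet(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = torch.nn.Conv2d(6, 2, 3, padding=1)
    def forward(self, img1, img2):
        return self.conv(torch.cat([img1, img2], dim=1))

loss = self_supervision_loss(ToyFlowNet(),
                             torch.randn(2, 3, 64, 64),
                             torch.randn(2, 3, 64, 64))
loss.backward()
```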
Unsupervised Monocular Depth Reconstruction of Non-Rigid Scenes
Monocular depth reconstruction of complex and dynamic scenes is a highly
challenging problem. While learning-based methods have offered promising results for rigid scenes, even in the unsupervised setting, little to no literature addresses the same problem for dynamic and deformable scenes. In this
work, we present an unsupervised monocular framework for dense depth estimation
of dynamic scenes, which jointly reconstructs rigid and non-rigid parts without
explicitly modelling the camera motion. Using dense correspondences, we derive
a training objective that aims to opportunistically preserve pairwise distances
between reconstructed 3D points. In this process, the dense depth map is
learned implicitly under the as-rigid-as-possible hypothesis. Our method provides promising results, demonstrating its capability to reconstruct 3D from challenging videos of non-rigid scenes. Furthermore, the proposed method also provides unsupervised motion segmentation as an auxiliary output.
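The training objective sketched in the abstract, preserving pairwise distances between corresponding reconstructed 3D points, can be illustrated in a few lines. The back-projection, the random pair sampling, and all names below are assumptions for illustration, not the authors' implementation.

```python
# Sketch of a pairwise-distance-preservation (as-rigid-as-possible) loss:
# back-project corresponding pixels in two frames to 3D using predicted
# depths, then ask that distances between sampled point pairs be preserved.
# Illustrative only; not the authors' code.
import torch

def backproject(depth: torch.Tensor, pix: torch.Tensor,
                K_inv: torch.Tensor) -> torch.Tensor:
    # depth: (N,) depths at pixel locations pix: (N, 2);
    # K_inv: (3, 3) inverse camera intrinsics.
    ones = torch.ones(pix.shape[0], 1)
    rays = (K_inv @ torch.cat([pix, ones], dim=1).T).T  # (N, 3) viewing rays
    return rays * depth.unsqueeze(1)                    # (N, 3) 3D points

def pairwise_distance_loss(depth1, depth2, pix1, pix2, K_inv,
                           num_pairs: int = 1024):
    # pix1[i] in frame 1 corresponds to pix2[i] in frame 2 (dense matches).
    p1 = backproject(depth1, pix1, K_inv)
    p2 = backproject(depth2, pix2, K_inv)
    # Sample random point pairs (i, j) and compare their 3D distances across
    # the two frames; rigid and locally rigid motion preserves them.
    i = torch.randint(0, p1.shape[0], (num_pairs,))
    j = torch.randint(0, p1.shape[0], (num_pairs,))
    d1 = (p1[i] - p1[j]).norm(dim=1)
    d2 = (p2[i] - p2[j]).norm(dim=1)
    return (d1 - d2).abs().mean()
```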