SpaceNet MVOI: a Multi-View Overhead Imagery Dataset
Detection and segmentation of objects in overhead imagery is a challenging
task. The variable density, random orientation, small size, and
instance-to-instance heterogeneity of objects in overhead imagery call for
approaches distinct from existing models designed for natural scene datasets.
Though new overhead imagery datasets are being developed, they almost
universally comprise a single view taken from directly overhead ("at nadir"),
failing to address a critical variable: look angle. By contrast, views vary in
real-world overhead imagery, particularly in dynamic scenarios such as natural
disasters where first looks are often over 40 degrees off-nadir. This
represents an important challenge to computer vision methods, as changing view
angle adds distortions, alters resolution, and changes lighting. At present,
the impact of these perturbations for algorithmic detection and segmentation of
objects is untested. To address this problem, we present an open source
Multi-View Overhead Imagery dataset, termed SpaceNet MVOI, with 27 unique looks
from a broad range of viewing angles (-32.5 degrees to 54.0 degrees). Each of
these images covers the same 665 square km geographic extent and is annotated
with 126,747 building footprint labels, enabling direct assessment of the
impact of viewpoint perturbation on model performance. We benchmark multiple
leading segmentation and object detection models on: (1) building detection,
(2) generalization to unseen viewing angles and resolutions, and (3)
sensitivity of building footprint extraction to changes in resolution. We find
that state-of-the-art segmentation and object detection models struggle to
identify buildings in off-nadir imagery and generalize poorly to unseen views,
presenting an important benchmark to explore the broadly relevant challenge of
detecting small, heterogeneous target objects in visually dynamic contexts.
Comment: Accepted into IEEE International Conference on Computer Vision (ICCV)
201
Object Detection in 20 Years: A Survey
Object detection, as one of the most fundamental and challenging problems in
computer vision, has received great attention in recent years. Its development
in the past two decades can be regarded as an epitome of computer vision
history. If we think of today's object detection as a technical aesthetics
under the power of deep learning, then turning back the clock 20 years we would
witness the wisdom of the cold weapon era. This paper extensively reviews 400+
papers on object detection in light of its technical evolution, spanning
over a quarter-century's time (from the 1990s to 2019). A number of topics have
been covered in this paper, including the milestone detectors in history,
detection datasets, metrics, fundamental building blocks of the detection
system, speed-up techniques, and recent state-of-the-art detection methods.
This paper also reviews some important detection applications, such as
pedestrian detection, face detection, text detection, etc., and makes an in-depth
analysis of their challenges as well as technical improvements in recent years.
Comment: This work has been submitted to the IEEE TPAMI for possible
publication
Video Object Detection with an Aligned Spatial-Temporal Memory
We introduce Spatial-Temporal Memory Networks for video object detection. At
its core, a novel Spatial-Temporal Memory module (STMM) serves as the recurrent
computation unit to model long-term temporal appearance and motion dynamics.
The STMM's design enables full integration of pretrained backbone CNN weights,
which we find to be critical for accurate detection. Furthermore, in order to
tackle object motion in videos, we propose a novel MatchTrans module to align
the spatial-temporal memory from frame to frame. Our method produces
state-of-the-art results on the benchmark ImageNet VID dataset, and our
ablative studies clearly demonstrate the contribution of our different design
choices. We release our code and models at
http://fanyix.cs.ucdavis.edu/project/stmn/project.html