SpaceNet MVOI: a Multi-View Overhead Imagery Dataset
Detection and segmentation of objects in overhead imagery is a challenging
task. The variable density, random orientation, small size, and
instance-to-instance heterogeneity of objects in overhead imagery call for
approaches distinct from existing models designed for natural scene datasets.
Though new overhead imagery datasets are being developed, they almost
universally comprise a single view taken from directly overhead ("at nadir"),
failing to address a critical variable: look angle. By contrast, views vary in
real-world overhead imagery, particularly in dynamic scenarios such as natural
disasters where first looks are often over 40 degrees off-nadir. This
represents an important challenge to computer vision methods, as changing view
angle adds distortions, alters resolution, and changes lighting. At present,
the impact of these perturbations for algorithmic detection and segmentation of
objects is untested. To address this problem, we present an open source
Multi-View Overhead Imagery dataset, termed SpaceNet MVOI, with 27 unique looks
from a broad range of viewing angles (-32.5 degrees to 54.0 degrees). Each of
these images covers the same 665 square km geographic extent and is annotated
with 126,747 building footprint labels, enabling direct assessment of the
impact of viewpoint perturbation on model performance. We benchmark multiple
leading segmentation and object detection models on: (1) building detection,
(2) generalization to unseen viewing angles and resolutions, and (3)
sensitivity of building footprint extraction to changes in resolution. We find
that state-of-the-art segmentation and object detection models struggle to
identify buildings in off-nadir imagery and generalize poorly to unseen views,
presenting an important benchmark to explore the broadly relevant challenge of
detecting small, heterogeneous target objects in visually dynamic contexts.
Comment: Accepted into IEEE International Conference on Computer Vision (ICCV)
201
Digital image correlation (DIC) analysis of the 3 December 2013 Montescaglioso landslide (Basilicata, Southern Italy). Results from a multi-dataset investigation
Image correlation remote sensing monitoring techniques are becoming key tools for
providing effective qualitative and quantitative information suitable for natural hazard assessments,
specifically for landslide investigation and monitoring. In recent years, these techniques have
been successfully integrated and shown to be complementary and competitive with more standard
remote sensing techniques, such as satellite or terrestrial Synthetic Aperture Radar interferometry.
The objective of this article is to apply the proposed in-depth calibration and validation analysis,
referred to as the Digital Image Correlation technique, to measure landslide displacement.
The availability of a multi-dataset for the 3 December 2013 Montescaglioso landslide, characterized
by different types of imagery, such as LANDSAT 8 OLI (Operational Land Imager) and TIRS
(Thermal Infrared Sensor), high-resolution airborne optical orthophotos, Digital Terrain Models
and COSMO-SkyMed Synthetic Aperture Radar, allows for the retrieval of the actual landslide
displacement field at values ranging from a few meters (2–3 m in the north-eastern sector of the
landslide) to 20–21 m (local peaks on the central body of the landslide). Furthermore, comprehensive
sensitivity analyses and statistics-based processing approaches are used to identify the role of the
background noise that affects the whole dataset. This noise is directly proportional to
the different geometric and temporal resolutions of the processed imagery. Moreover, the accuracy
of the environmental-instrumental background noise evaluation allowed the actual displacement
measurements to be correctly calibrated and validated, thereby leading to a better definition of
the threshold values of the maximum Digital Image Correlation sub-pixel accuracy and reliability
(ranging from 1/10 to 8/10 pixel) for each processed dataset.
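As a rough illustration of the idea behind image correlation (not the calibrated workflow used in the article), the sketch below estimates the integer-pixel offset between two co-registered image patches from the peak of their FFT-based cross-correlation, then converts that offset to metres. The function names and the ground sample distance parameter are assumptions made for this example, and sub-pixel refinement is omitted.

```python
import numpy as np


def estimate_shift(reference, moved):
    """Return the integer (row, col) shift that maps `reference` onto `moved`.

    Uses circular cross-correlation computed with FFTs, so it assumes both
    patches have the same shape and that the true shift is less than half
    the patch size in each direction.
    """
    ref = reference - reference.mean()
    mov = moved - moved.mean()
    # Cross-correlation theorem: corr = IFFT(FFT(moved) * conj(FFT(reference)))
    corr = np.fft.ifft2(np.fft.fft2(mov) * np.conj(np.fft.fft2(ref))).real
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Peaks past the midpoint correspond to negative shifts (wrap-around).
    shift = [p if p <= n // 2 else p - n for p, n in zip(peak, corr.shape)]
    return tuple(int(s) for s in shift)


def displacement_in_metres(shift, gsd_m):
    """Convert a pixel shift to a ground distance.

    `gsd_m` is the ground sample distance in metres per pixel
    (a hypothetical input; it depends on the sensor being processed).
    """
    return float(np.hypot(shift[0], shift[1]) * gsd_m)
```

Because the measured shift scales with pixel size, any fixed error in pixels translates into a larger error in metres for coarser imagery, which is one way to read the dependence on geometric resolution described in the abstract.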
HP-GAN: Probabilistic 3D human motion prediction via GAN
Predicting and understanding human motion dynamics has many applications,
such as motion synthesis, augmented reality, security, and autonomous vehicles.
Due to the recent success of generative adversarial networks (GAN), there has
been much interest in probabilistic estimation and synthetic data generation
using deep neural network architectures and learning algorithms.
We propose a novel sequence-to-sequence model for probabilistic human motion
prediction, trained with a modified version of improved Wasserstein generative
adversarial networks (WGAN-GP), in which we use a custom loss function designed
for human motion prediction. Our model, which we call HP-GAN, learns a
probability density function of future human poses conditioned on previous
poses. It predicts multiple sequences of possible future human poses, each from
the same input sequence but a different vector z drawn from a random
distribution. Furthermore, to quantify the quality of the non-deterministic
predictions, we simultaneously train a motion-quality-assessment model that
learns the probability that a given skeleton sequence is a real human motion.
We test our algorithm on two of the largest skeleton datasets: NTURGB-D and
Human3.6M. We train our model on both single and multiple action types. Its
predictive power for long-term motion estimation is demonstrated by generating
multiple plausible futures of more than 30 frames from just 10 frames of input.
We show that most sequences generated from the same input have a probability of
more than 50% of being judged as a real human sequence. We will release all the
code used in this paper on GitHub.
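The sampling step described in the abstract (one input sequence, several futures, each from a different vector z) can be sketched as follows. This is only a toy under stated assumptions: `toy_generator` is a hypothetical stand-in with random weights, not the HP-GAN network, and the pose dimension, z dimension, and weight shapes are made up for the example.

```python
import numpy as np


def toy_generator(past_poses, z, weights, horizon):
    """Hypothetical stand-in for a trained generator.

    Rolls the last observed pose forward for `horizon` frames, mixing in the
    noise vector z at every step. A real model would condition on the whole
    input sequence; this toy only uses its final frame.
    """
    w_pose, w_z = weights
    last = past_poses[-1]
    future = []
    for _ in range(horizon):
        last = np.tanh(last @ w_pose + z @ w_z)
        future.append(last)
    return np.stack(future)  # shape: (horizon, pose_dim)


def sample_futures(past_poses, weights, num_samples, horizon, z_dim, rng):
    """Draw `num_samples` futures from the same input, one fresh z per sample."""
    futures = []
    for _ in range(num_samples):
        z = rng.standard_normal(z_dim)  # a different random vector each time
        futures.append(toy_generator(past_poses, z, weights, horizon))
    return np.stack(futures)  # shape: (num_samples, horizon, pose_dim)


# Usage: 10 observed frames in, 5 candidate futures of 30 frames each.
rng = np.random.default_rng(0)
pose_dim, z_dim = 6, 4  # assumed sizes, chosen only to keep the demo small
weights = (rng.standard_normal((pose_dim, pose_dim)) * 0.5,
           rng.standard_normal((z_dim, pose_dim)) * 0.5)
past = rng.standard_normal((10, pose_dim))
futures = sample_futures(past, weights, num_samples=5, horizon=30,
                         z_dim=z_dim, rng=rng)
```

In the setup the abstract describes, each sampled sequence would then be scored by the separately trained motion-quality-assessment model; that part is not sketched here.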