SpaceNet MVOI: a Multi-View Overhead Imagery Dataset

Authors: Bastidas Alexei; Kumar Varun; Lindenbaum David; McPherson Sean; Shermeyer Jacob; Tang Hanlin; Van Etten Adam; Weir Nicholas
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 15/08/2019

Detection and segmentation of objects in overhead imagery is a challenging task. The variable density, random orientation, small size, and instance-to-instance heterogeneity of objects in overhead imagery call for approaches distinct from existing models designed for natural scene datasets. Though new overhead imagery datasets are being developed, they almost universally comprise a single view taken from directly overhead ("at nadir"), failing to address a critical variable: look angle. By contrast, views vary in real-world overhead imagery, particularly in dynamic scenarios such as natural disasters, where first looks are often more than 40 degrees off-nadir. This represents an important challenge to computer vision methods, as changing the view angle adds distortions, alters resolution, and changes lighting. At present, the impact of these perturbations on algorithmic detection and segmentation of objects is untested. To address this problem, we present an open-source Multi-View Overhead Imagery dataset, termed SpaceNet MVOI, with 27 unique looks from a broad range of viewing angles (-32.5 degrees to 54.0 degrees). Each of these images covers the same 665 square km geographic extent and is annotated with 126,747 building footprint labels, enabling direct assessment of the impact of viewpoint perturbation on model performance. We benchmark multiple leading segmentation and object detection models on: (1) building detection, (2) generalization to unseen viewing angles and resolutions, and (3) sensitivity of building footprint extraction to changes in resolution. We find that state-of-the-art segmentation and object detection models struggle to identify buildings in off-nadir imagery and generalize poorly to unseen views, presenting an important benchmark to explore the broadly relevant challenge of detecting small, heterogeneous target objects in visually dynamic contexts.

Comment: Accepted into IEEE International Conference on Computer Vision (ICCV) 201

arXiv.org e-Print Archive

Crossref
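The abstract benchmarks models on building detection but does not state the scoring rule. A common choice in SpaceNet challenges is an F1 score in which a predicted footprint counts as a true positive when its intersection-over-union (IoU) with a not-yet-matched ground-truth footprint is at least 0.5; treat that threshold as an assumption here. The minimal sketch below uses axis-aligned boxes instead of polygons to stay self-contained, and its greedy one-to-one matching is a simplification. The function names `iou` and `f1_at_iou` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical box format: (xmin, ymin, xmax, ymax); real footprints are polygons.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def f1_at_iou(preds: List[Box], truths: List[Box], thresh: float = 0.5) -> float:
    """F1 with greedy one-to-one matching at an assumed IoU threshold."""
    unmatched = list(range(len(truths)))
    tp = 0
    for p in preds:
        # Pick the unmatched ground truth with the highest IoU, if it clears the threshold.
        best_j, best_iou = None, thresh
        for j in unmatched:
            v = iou(p, truths[j])
            if v >= best_iou:
                best_j, best_iou = j, v
        if best_j is not None:
            unmatched.remove(best_j)
            tp += 1
    fp = len(preds) - tp
    fn = len(truths) - tp
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

For example, with ground truths `[(0, 0, 10, 10), (20, 20, 30, 30)]` and predictions `[(1, 1, 10, 10), (50, 50, 60, 60)]`, the first prediction matches (IoU 0.81), the second is a false positive, and one building is missed, so precision, recall, and F1 are all 0.5. Scoring each look angle separately with such a metric is one way to expose the off-nadir degradation the abstract reports.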

Digital image correlation (DIC) analysis of the 3 December 2013 Montescaglioso landslide (Basilicata, Southern Italy). Results from a multi-dataset investigation

Authors: Bozzano Francesca; Caporossi Paolo; Mazzanti Paolo
Publication venue: MDPI AG
Publication date: 01/01/2018

Image correlation remote sensing monitoring techniques are becoming key tools for providing effective qualitative and quantitative information suitable for natural hazard assessment, specifically for landslide investigation and monitoring. In recent years, these techniques have been successfully integrated with, and shown to be complementary to and competitive with, more standard remote sensing techniques such as satellite or terrestrial Synthetic Aperture Radar (SAR) interferometry. The objective of this article is to apply an in-depth calibration and validation analysis to the Digital Image Correlation (DIC) technique for measuring landslide displacement. A multi-dataset is available for the 3 December 2013 Montescaglioso landslide, comprising different types of imagery: LANDSAT 8 OLI (Operational Land Imager) and TIRS (Thermal Infrared Sensor) scenes, high-resolution airborne optical orthophotos, Digital Terrain Models, and COSMO-SkyMed SAR images. This multi-dataset allows the actual landslide displacement field to be retrieved, with values ranging from a few meters (2–3 m in the north-eastern sector of the landslide) to 20–21 m (local peaks on the central body of the landslide). Furthermore, comprehensive sensitivity analyses and statistics-based processing approaches are used to identify the role of the background noise that affects the whole dataset. This noise is directly proportional to the different geometric and temporal resolutions of the processed imagery. Moreover, the accurate evaluation of the environmental-instrumental background noise allowed the actual displacement measurements to be correctly calibrated and validated, leading to a better definition of the threshold values of the maximum DIC sub-pixel accuracy and reliability (ranging from 1/10 to 8/10 pixel) for each processed dataset.

Multidisciplinary Digital Publishing Institute

Directory of Open Access Journals

Archivio della ricerca- Università di Roma La Sapienza
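The abstract does not describe the authors' DIC software, so the following is only a minimal sketch of the general area-based DIC idea, assuming single-band images already co-registered on the same grid. A template around a point in the reference image is compared with shifted windows in the second image by normalized cross-correlation (NCC), and the integer peak is then refined with a parabolic fit, which is one common way of reaching the sub-pixel precision the abstract quantifies. The names `ncc` and `dic_displacement`, the window half-size, and the search radius are hypothetical choices.

```python
import numpy as np


def ncc(a: np.ndarray, b: np.ndarray) -> float:
    """Zero-mean normalized cross-correlation of two equally sized patches."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0


def dic_displacement(ref: np.ndarray, cur: np.ndarray, center, half: int = 8, search: int = 5):
    """Estimate the (dy, dx) shift, in pixels, of the patch around `center` from `ref` to `cur`."""
    cy, cx = center
    tmpl = ref[cy - half:cy + half + 1, cx - half:cx + half + 1]
    size = 2 * search + 1
    scores = np.full((size, size), -np.inf)
    # Exhaustive integer search over the correlation surface.
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            win = cur[cy + dy - half:cy + dy + half + 1, cx + dx - half:cx + dx + half + 1]
            scores[dy + search, dx + search] = ncc(tmpl, win)
    iy, ix = np.unravel_index(np.argmax(scores), scores.shape)

    def parabolic(m1: float, c0: float, p1: float) -> float:
        # Vertex of the parabola through (-1, m1), (0, c0), (1, p1).
        d = m1 - 2 * c0 + p1
        return 0.0 if d == 0 else 0.5 * (m1 - p1) / d

    # Sub-pixel refinement along each axis (skipped at the edge of the search window).
    sub_y = parabolic(scores[iy - 1, ix], scores[iy, ix], scores[iy + 1, ix]) if 0 < iy < size - 1 else 0.0
    sub_x = parabolic(scores[iy, ix - 1], scores[iy, ix], scores[iy, ix + 1]) if 0 < ix < size - 1 else 0.0
    return float(iy - search + sub_y), float(ix - search + sub_x)
```

Shifting a random texture with `np.roll(ref, (2, -3), axis=(0, 1))` and calling `dic_displacement(ref, cur, (32, 32))` recovers a displacement within half a pixel of `(2, -3)`. Converting pixels to meters would then use each dataset's ground sampling distance, which is one reason the achievable accuracy differs between datasets.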

HP-GAN: Probabilistic 3D human motion prediction via GAN

Authors: Barsoum Emad; Kender John; Liu Zicheng
Publication date: 27/11/2017

Predicting and understanding human motion dynamics has many applications, such as motion synthesis, augmented reality, security, and autonomous vehicles. Due to the recent success of generative adversarial networks (GANs), there has been much interest in probabilistic estimation and synthetic data generation using deep neural network architectures and learning algorithms. We propose a novel sequence-to-sequence model for probabilistic human motion prediction, trained with a modified version of the improved Wasserstein generative adversarial network (WGAN-GP), in which we use a custom loss function designed for human motion prediction. Our model, which we call HP-GAN, learns a probability density function of future human poses conditioned on previous poses. It predicts multiple sequences of possible future human poses, each from the same input sequence but with a different vector z drawn from a random distribution. Furthermore, to quantify the quality of the non-deterministic predictions, we simultaneously train a motion-quality-assessment model that learns the probability that a given skeleton sequence is real human motion. We test our algorithm on two of the largest skeleton datasets: NTU RGB+D and Human3.6M. We train our model on both single and multiple action types. Its predictive power for long-term motion estimation is demonstrated by generating multiple plausible futures of more than 30 frames from just 10 frames of input. We show that most sequences generated from the same input have a probability greater than 50% of being judged as a real human sequence. We will release all the code used in this paper on GitHub.

arXiv.org e-Print Archive

Crossref
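HP-GAN is trained with a modified WGAN-GP whose custom loss the abstract does not spell out, so the sketch below shows only the standard WGAN-GP critic objective it builds on. The critic score gap E[D(fake)] - E[D(real)] is combined with a penalty λ·E[(‖∇D(x̂)‖₂ - 1)²] evaluated at random interpolates x̂ = ε·real + (1 - ε)·fake. To run with NumPy alone, it assumes a hypothetical toy critic D(x) = tanh(x·w) whose input gradient is written out analytically; a real model would obtain this gradient by automatic differentiation. The penalty weight λ = 10 is the value commonly used with WGAN-GP and is an assumption here, as are all function names.

```python
import numpy as np


def critic(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Hypothetical toy critic D(x) = tanh(x . w): one score per row of x."""
    return np.tanh(x @ w)


def critic_grad(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Analytic gradient of the toy critic w.r.t. its input: (1 - tanh^2(x . w)) * w."""
    s = np.tanh(x @ w)
    return (1.0 - s ** 2)[:, None] * w[None, :]


def gradient_penalty(real, fake, w, lam=10.0, rng=None) -> float:
    """WGAN-GP penalty: lam * mean((||grad D(x_hat)||_2 - 1)^2) on real/fake interpolates."""
    rng = np.random.default_rng() if rng is None else rng
    eps = rng.random((real.shape[0], 1))      # one mixing weight per sample
    x_hat = eps * real + (1.0 - eps) * fake   # random points on the real-fake segments
    norms = np.linalg.norm(critic_grad(x_hat, w), axis=1)
    return float(lam * np.mean((norms - 1.0) ** 2))


def critic_loss(real, fake, w, lam=10.0, rng=None) -> float:
    """Critic objective to minimize: E[D(fake)] - E[D(real)] + gradient penalty."""
    gap = critic(fake, w).mean() - critic(real, w).mean()
    return float(gap + gradient_penalty(real, fake, w, lam, rng))
```

In HP-GAN's setting, `real` would hold flattened ground-truth future pose sequences and `fake` the generator's predictions for different draws of z; the penalty pushes the critic toward unit gradient norm, a soft version of the 1-Lipschitz constraint the Wasserstein formulation requires.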