Multiview Aerial Visual Recognition (MAVREC): Can Multi-view Improve Aerial Visual Perception?
Despite the commercial abundance of UAVs, aerial data acquisition remains
challenging, and the existing Asia and North America-centric open-source UAV
datasets are small-scale or low-resolution and lack diversity in scene
contextuality. Additionally, the color content of the scenes, solar-zenith
angle, and population density of different geographies influence the data
diversity. Together, these two factors lead to suboptimal aerial-visual
perception in deep neural network (DNN) models trained primarily on
ground-view data, including open-world foundational models.
To pave the way for a transformative era of aerial detection, we present
Multiview Aerial Visual RECognition or MAVREC, a video dataset where we record
synchronized scenes from different perspectives: a ground camera and a
drone-mounted camera. MAVREC consists of around 2.5 hours of industry-standard
2.7K resolution video sequences, more than 0.5 million frames, and 1.1 million
annotated bounding boxes. This makes MAVREC the largest ground and aerial-view
dataset, and the fourth largest among all drone-based datasets across all
modalities and tasks. Through our extensive benchmarking on MAVREC, we
recognize that augmenting object detectors with ground-view images from the
corresponding geographical location is a superior pre-training strategy for
aerial detection. Building on this strategy, we benchmark MAVREC with a
curriculum-based semi-supervised object detection approach that leverages
labeled (ground and aerial) and unlabeled (aerial-only) images to enhance
aerial detection. We publicly release the MAVREC dataset:
https://mavrec.github.io