4,188 research outputs found
Smart environment monitoring through micro unmanned aerial vehicles
In recent years, the improvements of small-scale Unmanned Aerial Vehicles (UAVs) in terms of flight time, automatic control, and remote transmission are promoting the development of a wide range of practical applications. In aerial video surveillance, the monitoring of broad areas still has many challenges due to the achievement of different tasks in real-time, including mosaicking, change detection, and object detection. In this thesis work, a small-scale UAV based vision system to maintain regular surveillance over target areas is proposed. The system works in two modes. The first mode allows to monitor an area of interest by performing several flights. During the first flight, it creates an incremental geo-referenced mosaic of an area of interest and classifies all the known elements (e.g., persons) found on the ground by an improved Faster R-CNN architecture previously trained. In subsequent reconnaissance flights, the system searches for any changes (e.g., disappearance of persons) that may occur in the mosaic by a histogram equalization and RGB-Local Binary Pattern (RGB-LBP) based algorithm. If present, the mosaic is updated. The second mode, allows to perform a real-time classification by using, again, our improved Faster R-CNN model, useful for time-critical operations. Thanks to different design features, the system works in real-time and performs mosaicking and change detection tasks at low-altitude, thus allowing the classification even of small objects. The proposed system was tested by using the whole set of challenging video sequences contained in the UAV Mosaicking and Change Detection (UMCD) dataset and other public datasets. The evaluation of the system by well-known performance metrics has shown remarkable results in terms of mosaic creation and updating, as well as in terms of change detection and object detection
RGB2LIDAR: Towards Solving Large-Scale Cross-Modal Visual Localization
We study an important, yet largely unexplored problem of large-scale
cross-modal visual localization by matching ground RGB images to a
geo-referenced aerial LIDAR 3D point cloud (rendered as depth images). Prior
works were demonstrated on small datasets and did not lend themselves to
scaling up for large-scale applications. To enable large-scale evaluation, we
introduce a new dataset containing over 550K pairs (covering 143 km^2 area) of
RGB and aerial LIDAR depth images. We propose a novel joint embedding based
method that effectively combines the appearance and semantic cues from both
modalities to handle drastic cross-modal variations. Experiments on the
proposed dataset show that our model achieves a strong result of a median rank
of 5 in matching across a large test set of 50K location pairs collected from a
14km^2 area. This represents a significant advancement over prior works in
performance and scale. We conclude with qualitative results to highlight the
challenging nature of this task and the benefits of the proposed model. Our
work provides a foundation for further research in cross-modal visual
localization.Comment: ACM Multimedia 202
- …