5,487 research outputs found

    Cross-Task Transfer for Geotagged Audiovisual Aerial Scene Recognition

    Get PDF
    Aerial scene recognition is a fundamental task in remote sensing and has recently received increased interest. While the visual information from overhead images with powerful models and efficient algorithms yields considerable performance on scene recognition, it still suffers from the variation of ground objects, lighting conditions etc. Inspired by the multi-channel perception theory in cognition science, in this paper, for improving the performance on the aerial scene recognition, we explore a novel audiovisual aerial scene recognition task using both images and sounds as input. Based on an observation that some specific sound events are more likely to be heard at a given geographic location, we propose to exploit the knowledge from the sound events to improve the performance on the aerial scene recognition. For this purpose, we have constructed a new dataset named AuDio Visual Aerial sceNe reCognition datasEt (ADVANCE). With the help of this dataset, we evaluate three proposed approaches for transferring the sound event knowledge to the aerial scene recognition task in a multimodal learning framework, and show the benefit of exploiting the audio information for the aerial scene recognition. The source code is publicly available for reproducibility purposes.Comment: ECCV 202

    Automated Multi-Modal Search and Rescue using Boosted Histogram of Oriented Gradients

    Get PDF
    Unmanned Aerial Vehicles (UAVs) provides a platform for many automated tasks and with an ever increasing advances in computing, these tasks can be more complex. The use of UAVs is expanded in this thesis with the goal of Search and Rescue (SAR), where a UAV can assist fast responders to search for a lost person and relay possible search areas back to SAR teams. To identify a person from an aerial perspective, low-level Histogram of Oriented Gradients (HOG) feature descriptors are used over a segmented region, provided from thermal data, to increase classification speed. This thesis also introduces a dataset to support a Bird’s-Eye-View (BEV) perspective and tests the viability of low level HOG feature descriptors on this dataset. The low-level feature descriptors are known as Boosted Histogram of Oriented Gradients (BHOG) features, which discretizes gradients over varying sized cells and blocks that are trained with a Cascaded Gentle AdaBoost Classifier using our compiled BEV dataset. The classification is supported by multiple sensing modes with color and thermal videos to increase classification speed. The thermal video is segmented to indicate any Region of Interest (ROI) that are mapped to the color video where classification occurs. The ROI decreases classification time needed for the aerial platform by eliminating a per-frame sliding window. Testing reveals that with the use of only color data iv and a classifier trained for a profile of a person, there is an average recall of 78%, while the thermal detection results with an average recall of 76%. However, there is a speed up of 2 with a video of 240x320 resolution. The BEV testing reveals that higher resolutions are favored with a recall rate of 71% using BHOG features, and 92% using Haar-Features. In the lower resolution BEV testing, the recall rates are 42% and 55%, for BHOG and Haar-Features, respectively
    • …
    corecore