
    Building-road Collaborative Extraction from Remotely Sensed Images via Cross-Interaction

    Buildings are the basic carriers of social production and human life; roads are the links that interconnect social networks. Building and road information has important application value in frontier fields such as regional coordinated development, disaster prevention, and autonomous driving. Mapping buildings and roads from very high-resolution (VHR) remote sensing images has become a hot research topic. However, existing methods often ignore the strong spatial correlation between roads and buildings and extract them in isolation. To fully utilize the complementary advantages of buildings and roads, we propose a building-road collaborative extraction method based on multi-task and cross-scale feature interaction that improves the accuracy of both tasks in a complementary way. A multi-task interaction module is proposed to exchange information across tasks while preserving the unique information of each task, which tackles the seesaw phenomenon in multi-task learning. Considering the variation in appearance and structure between buildings and roads, a cross-scale interaction module is designed to automatically learn the optimal receptive field for each task. Compared with many existing methods that train each task individually, the proposed collaborative extraction method exploits the complementary advantages of buildings and roads through inter-task and inter-scale feature interactions, and automatically selects the optimal receptive field for each task. Experiments on a wide range of urban and rural scenarios show that the proposed algorithm achieves building-road extraction with outstanding performance and efficiency.

    Comment: 34 pages, 9 figures, submitted to ISPRS Journal of Photogrammetry and Remote Sensing
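
    A minimal sketch of the cross-task interaction idea, assuming PyTorch feature maps from a building branch and a road branch; the gating design and all names here are illustrative assumptions, not the paper's exact module:

        import torch
        import torch.nn as nn

        class CrossTaskInteraction(nn.Module):
            """Exchange information between two task branches while keeping
            a residual path that preserves each task's unique features."""
            def __init__(self, channels: int):
                super().__init__()
                # Each gate decides how much of the other task's features to absorb.
                self.gate_building = nn.Sequential(
                    nn.Conv2d(2 * channels, channels, kernel_size=1), nn.Sigmoid())
                self.gate_road = nn.Sequential(
                    nn.Conv2d(2 * channels, channels, kernel_size=1), nn.Sigmoid())

            def forward(self, f_building, f_road):
                joint = torch.cat([f_building, f_road], dim=1)
                # Residual connections keep task-specific information intact.
                f_b = f_building + self.gate_building(joint) * f_road
                f_r = f_road + self.gate_road(joint) * f_building
                return f_b, f_r

        # Usage: two 64-channel feature maps from a shared backbone.
        fb, fr = torch.randn(1, 64, 128, 128), torch.randn(1, 64, 128, 128)
        fb2, fr2 = CrossTaskInteraction(64)(fb, fr)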

    Cross-Task Transfer for Geotagged Audiovisual Aerial Scene Recognition

    Aerial scene recognition is a fundamental task in remote sensing and has recently received increased interest. While visual information from overhead images, combined with powerful models and efficient algorithms, yields considerable performance on scene recognition, it still suffers from variations in ground objects, lighting conditions, etc. Inspired by the multi-channel perception theory in cognitive science, in this paper we explore a novel audiovisual aerial scene recognition task that uses both images and sounds as input to improve aerial scene recognition performance. Based on the observation that specific sound events are more likely to be heard at a given geographic location, we propose to exploit knowledge from sound events to improve performance on aerial scene recognition. For this purpose, we have constructed a new dataset named AuDio Visual Aerial sceNe reCognition datasEt (ADVANCE). With the help of this dataset, we evaluate three proposed approaches for transferring the sound event knowledge to the aerial scene recognition task in a multimodal learning framework, and show the benefit of exploiting audio information for aerial scene recognition. The source code is publicly available for reproducibility purposes.

    Comment: ECCV 2020
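
    One common way to realize such knowledge transfer, sketched here under the assumption of a pretrained audio branch whose sound-event posteriors are distilled into the image network; the loss form and all names are illustrative, not necessarily one of the paper's three approaches:

        import torch.nn.functional as F

        def transfer_loss(scene_logits, scene_labels,
                          event_logits_img, event_logits_audio,
                          alpha=0.5, T=2.0):
            # Standard scene-recognition loss on the visual branch.
            ce = F.cross_entropy(scene_logits, scene_labels)
            # Distillation: the image branch mimics the audio branch's
            # temperature-softened sound-event posteriors.
            kl = F.kl_div(
                F.log_softmax(event_logits_img / T, dim=1),
                F.softmax(event_logits_audio / T, dim=1),
                reduction="batchmean") * T * T
            return ce + alpha * kl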

    Machine Learning for Robust Understanding of Scene Materials in Hyperspectral Images

    The major challenges in hyperspectral (HS) imaging and data analysis are expensive sensors, the high dimensionality of the signal, limited ground truth, and spectral variability. This dissertation develops and analyzes machine learning based methods to address these problems. In the first part, we examine one of the most important HS data analysis tasks: vegetation parameter estimation. We present two Gaussian process based approaches for improving the accuracy of vegetation parameter retrieval when ground truth is limited and/or spectral variability is high. The first is the adoption of covariance functions based on well-established metrics, such as spectral angle and spectral correlation, which are known to be better measures of similarity for spectral data. The second is the joint modeling of related vegetation parameters by multitask Gaussian processes, so that the prediction accuracy of the vegetation parameter of interest can be improved with the aid of related vegetation parameters for which a larger set of ground truth is available. The efficacy of the proposed methods is demonstrated by comparing them against state-of-the-art approaches on three real-world HS datasets and one synthetic dataset. In the second part, we demonstrate how Bayesian optimization can be applied to jointly tune the different components of hyperspectral data analysis frameworks for better performance. Experimental validation on a spatial-spectral classification framework consisting of a classifier and a Markov random field is provided. In the third part, we investigate whether high-dimensional HS spectra can be reconstructed from low-dimensional multispectral (MS) signals that can be obtained from much cheaper, lower-spectral-resolution sensors. A novel end-to-end convolutional residual neural network architecture is proposed that can simultaneously optimize both the MS bands and the transformation to reconstruct HS spectra from MS signals by analyzing a large quantity of HS data. The learned bands can be implemented in sensor hardware, and the learned transformation can be incorporated in the data processing pipeline to build a low-cost hyperspectral data collection system. Using a diverse set of real-world datasets, we show how the proposed approach of optimizing the MS bands along with the transformation, rather than just optimizing the transformation with fixed bands as proposed by previous studies, can drastically increase the reconstruction accuracy. Additionally, we investigate the prospects of using the reconstructed HS spectra for land cover classification.
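
    The spectral-angle covariance idea is concrete enough to sketch. Assuming spectra are stored row-wise in NumPy arrays, a squared-exponential kernel driven by spectral angle rather than Euclidean distance might look like this (hyperparameter names are illustrative, not the dissertation's exact formulation):

        import numpy as np

        def spectral_angle(X, Y):
            """Pairwise spectral angle (radians) between rows of X and Y."""
            Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
            Yn = Y / np.linalg.norm(Y, axis=1, keepdims=True)
            cos = np.clip(Xn @ Yn.T, -1.0, 1.0)
            return np.arccos(cos)

        def sam_kernel(X, Y, variance=1.0, lengthscale=0.1):
            """Squared-exponential covariance driven by spectral angle."""
            theta = spectral_angle(X, Y)
            return variance * np.exp(-0.5 * (theta / lengthscale) ** 2)

        # Usage: covariance matrix for 5 spectra with 200 bands each.
        K = sam_kernel(np.random.rand(5, 200), np.random.rand(5, 200))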

    Hierarchical Disentanglement-Alignment Network for Robust SAR Vehicle Recognition

    Vehicle recognition is a fundamental problem in SAR image interpretation. However, robustly recognizing vehicle targets in SAR is challenging due to large intraclass variations and small interclass variations. Additionally, the lack of large datasets further complicates the task. Inspired by the analysis of target signature variations and deep learning explainability, this paper proposes a novel domain alignment framework named the Hierarchical Disentanglement-Alignment Network (HDANet) to achieve robustness under various operating conditions. In brief, HDANet integrates feature disentanglement and alignment into a unified framework with three modules: domain data generation, multitask-assisted mask disentanglement, and domain alignment of target features. The first module generates diverse data for alignment, with three simple but effective data augmentation methods designed to simulate target signature variations. The second module disentangles target features from background clutter using a multitask-assisted mask, preventing clutter from interfering with subsequent alignment. The third module employs a contrastive loss for domain alignment, extracting robust target features from the generated diverse data and the disentangled features. Finally, the proposed method demonstrates impressive robustness across nine operating conditions in the MSTAR dataset, and extensive qualitative and quantitative analyses validate the effectiveness of our framework.
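
    A generic form of the contrastive alignment step, assuming paired features of the same target under two simulated operating conditions; this NT-Xent-style loss is a stand-in for, not necessarily identical to, HDANet's objective:

        import torch
        import torch.nn.functional as F

        def alignment_loss(feats_a, feats_b, temperature=0.1):
            """feats_a[i] and feats_b[i] are two views of target i."""
            za = F.normalize(feats_a, dim=1)
            zb = F.normalize(feats_b, dim=1)
            logits = za @ zb.T / temperature       # pairwise cosine similarities
            targets = torch.arange(za.size(0))     # positives sit on the diagonal
            return F.cross_entropy(logits, targets)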

    HED-UNet: Combined Segmentation and Edge Detection for Monitoring the Antarctic Coastline

    Deep learning-based coastline detection algorithms have begun to outshine traditional statistical methods in recent years. However, they are usually trained only as single-purpose models, either to segment land and water or to delineate the coastline. In contrast, a human annotator will usually keep a mental map of both segmentation and delineation when performing manual coastline detection. To account for this task duality, we devise a new model that unites the two approaches. Taking inspiration from the main building blocks of a semantic segmentation framework (UNet) and an edge detection framework (HED), both tasks are combined in a natural way. Training is made efficient by employing deep supervision on side predictions at multiple resolutions. Finally, a hierarchical attention mechanism is introduced to adaptively merge these multiscale predictions into the final model output. The advantages of this approach over other traditional and deep learning-based methods for coastline detection are demonstrated on a dataset of Sentinel-1 imagery covering parts of the Antarctic coast, where coastline detection is notoriously difficult. An implementation of our method is available at https://github.com/khdlr/HED-UNet.

    Comment: This work has been accepted by IEEE TGRS for publication. Copyright may be transferred without notice, after which this version may no longer be accessible.
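
    A minimal sketch of merging deeply supervised side outputs, assuming PyTorch; the simple softmax-over-scales weighting below is a stand-in for the paper's hierarchical attention mechanism:

        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        class SideMerge(nn.Module):
            def __init__(self, num_scales: int):
                super().__init__()
                # One learnable attention logit per scale.
                self.scale_logits = nn.Parameter(torch.zeros(num_scales))

            def forward(self, side_preds, full_size):
                # Upsample every side prediction to full resolution; during
                # training, a loss on each of these gives deep supervision.
                ups = [F.interpolate(p, size=full_size, mode="bilinear",
                                     align_corners=False) for p in side_preds]
                w = torch.softmax(self.scale_logits, dim=0)
                # Weighted fusion of the multiscale predictions.
                return sum(wi * pi for wi, pi in zip(w, ups))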