6 research outputs found

    A Weakly Supervised Approach for Estimating Spatial Density Functions from High-Resolution Satellite Imagery

    We propose a neural network component, the regional aggregation layer, that makes it possible to train a pixel-level density estimator using only coarse-grained density aggregates, which reflect the number of objects in an image region. Our approach is simple to use and does not require domain-specific assumptions about the nature of the density function. We evaluate our approach on several synthetic datasets. In addition, we use this approach to learn to estimate high-resolution population and housing density from satellite imagery. In all cases, we find that our approach results in better density estimates than a commonly used baseline. We also show how our housing density estimator can be used to classify buildings as residential or non-residential. Comment: 10 pages, 8 figures. ACM SIGSPATIAL 2018, Seattle, US
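    The idea of supervising a pixel-level map with region-level totals can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy data, and the choice of a mean absolute error loss are all assumptions made for illustration.

```python
def regional_aggregate(density, regions, num_regions):
    """Sum per-pixel densities inside each region.

    density: 2D list of non-negative floats (predicted objects per pixel).
    regions: 2D list of ints, same shape, giving each pixel's region id.
    """
    totals = [0.0] * num_regions
    for density_row, region_row in zip(density, regions):
        for value, region_id in zip(density_row, region_row):
            totals[region_id] += value
    return totals


def aggregate_l1_loss(predicted_totals, true_totals):
    """Mean absolute error between predicted and known region counts.

    The L1 form is an assumption; the abstract does not name a loss.
    """
    pairs = list(zip(predicted_totals, true_totals))
    return sum(abs(p - t) for p, t in pairs) / len(pairs)


# Hypothetical 2x3 density map covering two regions.
density = [[0.5, 0.5, 0.0],
           [1.0, 0.0, 2.0]]
regions = [[0, 0, 1],
           [0, 1, 1]]

totals = regional_aggregate(density, regions, num_regions=2)
loss = aggregate_l1_loss(totals, true_totals=[2.0, 3.0])
```

    Only the region totals enter the loss, so no pixel-level labels are needed; in a real network the summation would be differentiable and gradients would flow back to every pixel of the density map.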

    Dense Dilated Convolutions Merging Network for Land Cover Classification

    Land cover classification of remote sensing images is a challenging task due to limited amounts of annotated data, highly imbalanced classes, frequent incorrect pixel-level annotations, and an inherent complexity in the semantic segmentation task. In this article, we propose a novel architecture called the dense dilated convolutions' merging network (DDCM-Net) to address this task. The proposed DDCM-Net consists of dense dilated image convolutions merged with varying dilation rates. This effectively utilizes rich combinations of dilated convolutions that enlarge the network's receptive fields with fewer parameters and features compared with the state-of-the-art approaches in the remote sensing domain. Importantly, DDCM-Net obtains fused local- and global-context information, in effect incorporating surrounding discriminative capability for multiscale and complex-shaped objects with similar color and textures in very high-resolution aerial imagery. We demonstrate the effectiveness, robustness, and flexibility of the proposed DDCM-Net on the publicly available ISPRS Potsdam and Vaihingen data sets, as well as the DeepGlobe land cover data set. Our single model, trained on three-band Potsdam and Vaihingen data sets, achieves better accuracy in terms of both mean intersection over union (mIoU) and F1-score compared with other published models trained with more than three-band data. We further validate our model on the DeepGlobe data set, achieving a state-of-the-art result of 56.2% mIoU with far fewer parameters and at a lower computational cost compared with related recent work. Code available at https://github.com/samleoqh/DDCM-Semantic-Segmentation-PyTorch. Comment: Semantic Segmentation, 12 pages, TGRS-2020 early access in IEEE Transactions on Geoscience and Remote Sensing, 2020.
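    The claim that dilation enlarges the receptive field without adding parameters can be checked with simple arithmetic. The sketch below assumes a plain sequential stack of stride-1 convolutions, which is simpler than the merging scheme in DDCM-Net, and the dilation rates are hypothetical rather than taken from the paper.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field (in pixels, per axis) of stacked stride-1 convolutions.

    Each layer with dilation d widens the field by (kernel_size - 1) * d.
    """
    field = 1
    for dilation in dilations:
        field += (kernel_size - 1) * dilation
    return field


def weights_per_channel_pair(kernel_size, num_layers):
    """2D kernel weights per input/output channel pair; independent of dilation."""
    return num_layers * kernel_size * kernel_size


# Four 3x3 layers, without and with (hypothetical) increasing dilation rates.
plain = receptive_field(3, [1, 1, 1, 1])
dilated = receptive_field(3, [1, 2, 4, 8])
weights = weights_per_channel_pair(3, 4)
```

    Both stacks use the same number of kernel weights, yet the dilated stack covers 31 pixels per axis against 9 for the undilated one.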

    Computational Efficiency Studies in Computer Vision Tasks

    Computer vision has made massive progress in recent years, thanks to advances in hardware and algorithms. Most methods, however, are performance-driven and give little consideration to energy efficiency. This dissertation proposes methods that boost computational efficiency for three vision tasks: ultra-high-resolution image segmentation, optical character recognition (OCR) for Unmanned Aerial Vehicle (UAV) videos, and multiple object detection for UAV videos. The pattern distribution of ultra-high-resolution images is usually unbalanced: while part of an image contains complex, fine-grained patterns such as boundaries, most areas are composed of simple, repeated patterns. In the first chapter, we propose to learn a skip map, which guides a segmentation network to skip simple patterns and hence reduces computational complexity. Specifically, the skip map highlights simple-pattern areas that can be down-sampled for processing at a lower resolution, while the remaining complex part is still segmented at the original resolution. Applied to the state-of-the-art ultra-high-resolution image segmentation network GLNet, our proposed skip map saves more than 30% of the computation while maintaining comparable segmentation performance. In the second chapter, we propose an end-to-end OCR framework for UAV videos. We first revisit RCNN’s crop & resize training strategy and empirically find that it outperforms aligned RoI sampling on a real-world video text dataset captured by UAV. We further propose a multi-stage image processor that takes videos’ redundancy, continuity, and mixed degradation into account to reduce energy consumption. Lastly, the model is pruned and quantized before being deployed on a Raspberry Pi. Our proposed energy-efficient video text spotting solution, dubbed E²VTS, outperforms all previous methods by achieving a competitive tradeoff between energy efficiency and performance.
In the last chapter, we propose an energy-efficient solution for multiple object detection in videos. Besides designing a fast multiple object detector, we propose a data synthesis method and a knowledge transfer-based annotation method to overcome class imbalance and domain gap issues. This solution was implemented for the LPCVC 2021 UAV challenge and judged to be the first-place winner.
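    A back-of-the-envelope model shows why the skip map from the first chapter reduces computation. It is only a sketch under stated assumptions: cost is taken to be proportional to the number of pixels processed, the cost of predicting the skip map is ignored, and the input values are hypothetical rather than figures from the dissertation.

```python
def relative_cost(simple_fraction, downsample_factor):
    """Cost relative to segmenting the whole image at full resolution.

    simple_fraction: share of the image area flagged as simple by the skip map.
    downsample_factor: per-axis down-sampling applied to the simple areas.
    Assumes cost scales linearly with the number of pixels processed.
    """
    complex_cost = 1.0 - simple_fraction
    simple_cost = simple_fraction / (downsample_factor ** 2)
    return complex_cost + simple_cost


# Hypothetical case: half the image is simple and is processed at half resolution.
cost = relative_cost(simple_fraction=0.5, downsample_factor=2)
savings = 1.0 - cost
```

    Under this model the savings grow with the share of simple area and with the down-sampling factor, but they can never exceed the simple-area share itself.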

    Advancing Land Cover Mapping in Remote Sensing with Deep Learning

    Automatic mapping of land cover in remote sensing data plays an increasingly significant role in several earth observation (EO) applications, such as sustainable development, autonomous agriculture, and urban planning. Due to the complexity of the real ground surface and environment, accurate classification of land cover types faces many challenges. This thesis provides novel deep learning-based solutions to land cover mapping challenges such as how to deal with intricate objects and imbalanced classes in multi-spectral and high-spatial resolution remote sensing data. The first work presents a novel model to learn richer multi-scale and global contextual representations in very high-resolution remote sensing images, namely the dense dilated convolutions' merging (DDCM) network. The proposed method is lightweight, flexible and extendable, so that it can be used as a simple yet effective encoder and decoder module to address different classification and semantic mapping challenges. Extensive experiments on different benchmark remote sensing datasets demonstrate that the proposed method achieves better performance while consuming far fewer computational resources than other published methods. Next, a novel graph model is developed for capturing long-range pixel dependencies in remote sensing images to improve land cover mapping. One key component in the method is the self-constructing graph (SCG) module that can effectively construct global context relations (latent graph structure) without requiring prior knowledge graphs. The proposed SCG-based models achieved competitive performance on different representative remote sensing datasets with faster training and lower computational cost compared to strong baseline models.
The third work introduces a new framework, namely the multi-view self-constructing graph (MSCG) network, which extends the vanilla SCG model to capture multi-view context representations with rotation invariance and thereby improve segmentation performance. In addition, a novel adaptive class weighting loss function is developed to alleviate the class imbalance commonly found in EO datasets for semantic segmentation. Experiments on benchmark data demonstrate that the proposed framework is computationally efficient and robustly produces improved segmentation results for imbalanced classes. To address the key challenges in multi-modal land cover mapping of remote sensing data, namely, 'what', 'how' and 'where' to effectively fuse multi-source features and to efficiently learn optimal joint representations of different modalities, the last work presents a compact and scalable multi-modal deep learning framework (MultiModNet) based on two novel modules: the pyramid attention fusion module and the gated fusion unit. The proposed MultiModNet outperforms the strong baselines on two representative remote sensing datasets with fewer parameters and at a lower computational cost. Extensive ablation studies also validate the effectiveness and flexibility of the framework.

    Recent advances in video analytics for rail network surveillance for security, trespass and suicide prevention— a survey

    Railway network systems are by design open and accessible to people, but this presents challenges in the prevention of events such as terrorism, trespass, and suicide fatalities. With the rapid advancement of machine learning, numerous computer vision methods have been developed in closed-circuit television (CCTV) surveillance systems for the purposes of managing public spaces. These methods are built on multiple types of sensors and are designed to automatically detect static objects and unexpected events, monitor people, and prevent potential dangers. This survey focuses on recently developed CCTV surveillance methods for rail networks, discusses the challenges they face and their advantages and disadvantages, and presents a vision for future railway surveillance systems. State-of-the-art methods for object detection and behaviour recognition applied to rail network surveillance systems are introduced, and the ethics of handling personal data and the use of automated systems are also considered.