A Weakly Supervised Approach for Estimating Spatial Density Functions from High-Resolution Satellite Imagery
We propose a neural network component, the regional aggregation layer, that
makes it possible to train a pixel-level density estimator using only
coarse-grained density aggregates, which reflect the number of objects in an
image region. Our approach is simple to use and does not require
domain-specific assumptions about the nature of the density function. We
evaluate our approach on several synthetic datasets. In addition, we use this
approach to learn to estimate high-resolution population and housing density
from satellite imagery. In all cases, we find that our approach results in
better density estimates than a commonly used baseline. We also show how our
housing density estimator can be used to classify buildings as residential or
non-residential.
Comment: 10 pages, 8 figures. ACM SIGSPATIAL 2018, Seattle, US
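The key idea of the abstract can be illustrated with a minimal sketch: a layer that sums a pixel-level density map over coarse regions, so the estimator can be trained from region-level counts alone. The function names, shapes, and squared-error loss below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def regional_aggregate(density_map, region_ids, num_regions):
    """Forward pass of a (hypothetical) regional aggregation layer:
    sum the predicted pixel densities within each coarse region."""
    totals = np.zeros(num_regions)
    for r in range(num_regions):
        totals[r] = density_map[region_ids == r].sum()
    return totals

def aggregate_loss(density_map, region_ids, region_counts):
    """Squared error between aggregated densities and the coarse
    count labels -- the only supervision the approach requires."""
    preds = regional_aggregate(density_map, region_ids, len(region_counts))
    return float(((preds - region_counts) ** 2).mean())

# Toy example: a 4x4 density map split into four 2x2 regions.
density = np.full((4, 4), 0.5)
regions = np.repeat(np.repeat(np.arange(4).reshape(2, 2), 2, axis=0),
                    2, axis=1)
totals = regional_aggregate(density, regions, 4)  # each region sums to 2.0
```

Because the aggregation is just a sum, gradients flow back to every pixel, which is what makes pixel-level training from coarse labels possible.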
Dense Dilated Convolutions Merging Network for Land Cover Classification
Land cover classification of remote sensing images is a challenging task due
to limited amounts of annotated data, highly imbalanced classes, frequent
incorrect pixel-level annotations, and an inherent complexity in the semantic
segmentation task. In this article, we propose a novel architecture called the
dense dilated convolutions' merging network (DDCM-Net) to address this task.
The proposed DDCM-Net consists of dense dilated image convolutions merged with
varying dilation rates. This effectively utilizes rich combinations of dilated
convolutions that enlarge the network's receptive fields with fewer parameters
and features compared with the state-of-the-art approaches in the remote
sensing domain. Importantly, DDCM-Net obtains fused local- and global-context
information, in effect incorporating surrounding discriminative capability for
multiscale and complex-shaped objects with similar color and textures in very
high-resolution aerial imagery. We demonstrate the effectiveness, robustness,
and flexibility of the proposed DDCM-Net on the publicly available ISPRS
Potsdam and Vaihingen data sets, as well as the DeepGlobe land cover data set.
Our single model, trained on three-band Potsdam and Vaihingen data sets,
achieves better accuracy in terms of both mean intersection over union (mIoU)
and F1-score compared with other published models trained with more than
three-band data. We further validate our model on the DeepGlobe data set,
achieving state-of-the-art result 56.2% mIoU with much fewer parameters and at
a lower computational cost compared with related recent work. Code available at
https://github.com/samleoqh/DDCM-Semantic-Segmentation-PyTorch
Comment: Semantic Segmentation, 12 pages, TGRS-2020 early access in IEEE Transactions on Geoscience and Remote Sensing, 2020
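To make the "dense dilated convolutions merging" idea concrete, here is a single-channel toy sketch: dilated convolutions with growing rates enlarge the receptive field cheaply, and each stage's output is merged back into its input before the next stage. The summation used as the merge step is a simplified stand-in for the network's learned 1x1 merging convolutions; kernels, rates, and the merge rule are assumptions for illustration only.

```python
import numpy as np

def dilated_conv2d(x, kernel, dilation):
    """Single-channel 3x3 dilated convolution with zero 'same' padding.
    Taps are spaced `dilation` pixels apart, enlarging the receptive
    field without adding parameters."""
    k = kernel.shape[0]
    pad = dilation * (k // 2)
    xp = np.pad(x, pad)
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            patch = xp[i:i + dilation * k:dilation,
                       j:j + dilation * k:dilation]
            out[i, j] = (patch * kernel).sum()
    return out

def ddcm_block(x, kernels, dilations):
    """Densely chain dilated convolutions: each stage sees the merged
    output of all previous stages (summation stands in for the real
    architecture's learned merging convolution -- an assumption)."""
    merged = x.astype(float)
    for kern, d in zip(kernels, dilations):
        merged = merged + dilated_conv2d(merged, kern, d)
    return merged

# Usage with an identity kernel (center tap = 1), so each stage
# returns its input and the dense merge doubles the signal per stage.
x = np.arange(16.0).reshape(4, 4)
k = np.zeros((3, 3))
k[1, 1] = 1.0
y = ddcm_block(x, [k, k], dilations=[1, 2])  # doubled twice -> 4 * x
```

The real DDCM-Net operates on multi-channel feature maps with learned kernels; this sketch only shows why stacking varying dilation rates fuses local and global context at low parameter cost.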
Computational Efficiency Studies in Computer Vision Tasks
Computer vision has made massive progress in recent years, thanks to advances in hardware and algorithms. Most methods are performance-driven and give little consideration to energy efficiency. This dissertation proposes methods for boosting computational efficiency in three different vision tasks: ultra-high-resolution image segmentation, optical character recognition (OCR) for Unmanned Aerial Vehicle (UAV) videos, and multiple-object detection for UAV videos.
The pattern distribution of ultra-high-resolution images is usually unbalanced: while part of an image contains complex, fine-grained patterns such as boundaries, most areas are composed of simple, repeated patterns. In the first chapter, we propose to learn a skip map that guides a segmentation network to skip simple patterns and hence reduce computational complexity. Specifically, the skip map highlights simple-pattern areas that can be down-sampled and processed at a lower resolution, while the remaining complex parts are still segmented at the original resolution. Applied to the state-of-the-art ultra-high-resolution image segmentation network GLNet, our proposed skip map saves more than 30% of the computation while maintaining comparable segmentation performance.
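A block-wise sketch of skip-map-guided segmentation can clarify the mechanism. Here a block is flagged "simple" when its local variance is low, and such blocks receive the cheap half-resolution result; the 2x2 block size, variance criterion, and threshold are all illustrative assumptions, and the real skip map is learned rather than hand-crafted. For simplicity this sketch still runs the full-resolution pass everywhere; a real implementation would run it only on the complex blocks, which is where the savings come from.

```python
import numpy as np

def skip_map(image, block=2, threshold=0.01):
    """True where a block's variance is low enough that it can be
    safely processed at reduced resolution (illustrative criterion)."""
    h, w = image.shape
    skip = np.zeros((h // block, w // block), dtype=bool)
    for i in range(0, h, block):
        for j in range(0, w, block):
            patch = image[i:i + block, j:j + block]
            skip[i // block, j // block] = patch.var() < threshold
    return skip

def segment(image, seg_fn, block=2):
    """Use the cheap half-resolution result for skipped (simple)
    blocks and the full-resolution result for complex blocks."""
    skip = skip_map(image, block)
    full = seg_fn(image)                     # expensive full-res pass
    low = seg_fn(image[::block, ::block])    # cheap low-res pass
    up = np.repeat(np.repeat(low, block, 0), block, 1)    # nearest upsample
    mask = np.repeat(np.repeat(skip, block, 0), block, 1)
    return np.where(mask, up, full)

# Toy usage: threshold-based "segmenter" on a piecewise-constant image.
image = np.zeros((4, 4))
image[2:, 2:] = 1.0
seg = segment(image, lambda x: (x > 0.5).astype(int))
```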
In the second chapter, we propose an end-to-end OCR framework for UAV videos. We first revisit RCNN's crop-and-resize training strategy and empirically find that it outperforms aligned RoI sampling on a real-world video text dataset captured by a UAV. We further propose a multi-stage image processor that takes videos' redundancy, continuity, and mixed degradation into account to reduce energy consumption. Lastly, the model is pruned and quantized before being deployed on a Raspberry Pi. Our proposed energy-efficient video text spotting solution, dubbed E²VTS, outperforms all previous methods by achieving a competitive tradeoff between energy efficiency and performance.
In the last chapter, we propose an energy-efficient solution for multiple-object detection in video. Besides designing a fast multiple-object detector, we propose a data-synthesis and knowledge-transfer-based annotation method to overcome class-imbalance and domain-gap issues. This solution was entered in the LPCVC 2021 UAV challenge and was judged the first-place winner.
Advancing Land Cover Mapping in Remote Sensing with Deep Learning
Automatic mapping of land cover in remote sensing data plays an increasingly significant role in several earth observation (EO) applications, such as sustainable development, autonomous agriculture, and urban planning. Due to the complexity of the real ground surface and environment, accurate classification of land cover types faces many challenges. This thesis provides novel deep learning-based solutions to land cover mapping challenges, such as how to deal with intricate objects and imbalanced classes in multi-spectral and high-spatial-resolution remote sensing data.
The first work presents a novel model, the dense dilated convolutions' merging (DDCM) network, that learns richer multi-scale and global contextual representations in very high-resolution remote sensing images. The proposed method is lightweight, flexible, and extendable, so it can be used as a simple yet effective encoder or decoder module to address different classification and semantic mapping challenges. Intensive experiments on different benchmark remote sensing datasets demonstrate that the proposed method achieves better performance while consuming far fewer computational resources than other published methods.
Next, a novel graph model is developed for capturing long-range pixel dependencies in remote sensing images to improve land cover mapping. One key component in the method is the self-constructing graph (SCG) module that can effectively construct global context relations (latent graph structure) without requiring prior knowledge graphs. The proposed SCG-based models achieved competitive performance on different representative remote sensing datasets with faster training and lower computational cost compared to strong baseline models.
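The core of the SCG idea, inferring a latent graph from the features themselves rather than a prior knowledge graph, can be sketched with a similarity-based adjacency. The ReLU similarity, zeroed diagonal, and row normalisation below are illustrative stand-ins for the module's learned projections, not the thesis's actual formulation.

```python
import numpy as np

def self_constructing_graph(node_feats):
    """Build a latent adjacency from pairwise node similarity:
    A = row-normalise(ReLU(X X^T)). Each node (e.g. a region of a
    downsampled feature map) attends to similar nodes anywhere in
    the image, capturing long-range dependencies."""
    sim = np.maximum(node_feats @ node_feats.T, 0.0)
    np.fill_diagonal(sim, 0.0)  # ignore self-similarity (illustrative choice)
    row_sums = sim.sum(axis=1, keepdims=True) + 1e-8
    return sim / row_sums

def graph_conv(node_feats, weights):
    """One graph-convolution step over the self-constructed graph."""
    A = self_constructing_graph(node_feats)
    return np.maximum(A @ node_feats @ weights, 0.0)

# Toy usage: three nodes with 2-d features.
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
out = graph_conv(X, np.eye(2))
```

Because the adjacency is computed from the data, no hand-built graph is needed, which is the property the abstract highlights.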
The third work introduces a new framework, the multi-view self-constructing graph (MSCG) network, extending the vanilla SCG model to capture multi-view context representations with rotation invariance for improved segmentation performance. Meanwhile, a novel adaptive class weighting loss function is developed to alleviate the class imbalance commonly found in EO datasets for semantic segmentation. Experiments on benchmark data demonstrate that the proposed framework is computationally efficient and robust, producing improved segmentation results for imbalanced classes.
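A common way to weight classes adaptively is to scale the per-pixel loss by the inverse frequency of each class; the sketch below illustrates that idea with a weighted cross-entropy. The inverse-frequency rule and mean-1 normalisation are assumptions for illustration, simpler than the adaptive weighting developed in the thesis.

```python
import numpy as np

def class_weights(label_map, num_classes, eps=1e-8):
    """Inverse-frequency class weights, normalised so the mean weight
    is 1. Rare classes get weights above 1, frequent classes below."""
    freqs = np.bincount(label_map.ravel(), minlength=num_classes) + eps
    w = 1.0 / freqs
    return w * num_classes / w.sum()

def weighted_cross_entropy(probs, labels, weights):
    """Per-pixel cross-entropy, each term scaled by the weight of its
    ground-truth class. probs: (N, C) softmax outputs; labels: (N,)."""
    flat = labels.ravel()
    picked = probs[np.arange(flat.size), flat]
    return float(-(weights[flat] * np.log(picked + 1e-8)).mean())

# Toy usage: class 1 is 3x rarer than class 0, so it is upweighted.
labels = np.array([0, 0, 0, 1])
w = class_weights(labels, num_classes=2)
probs = np.array([[0.9, 0.1], [0.9, 0.1], [0.9, 0.1], [0.2, 0.8]])
loss = weighted_cross_entropy(probs, labels, w)
```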
To address the key challenges in multi-modal land cover mapping of remote sensing data, namely 'what', 'how', and 'where' to effectively fuse multi-source features and efficiently learn optimal joint representations of different modalities, the last work presents a compact and scalable multi-modal deep learning framework (MultiModNet) based on two novel modules: the pyramid attention fusion module and the gated fusion unit. The proposed MultiModNet outperforms strong baselines on two representative remote sensing datasets with fewer parameters and at a lower computational cost. Extensive ablation studies also validate the effectiveness and flexibility of the framework.
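The gating concept behind such fusion units can be sketched in a few lines: a learned gate decides, per element, how much each modality contributes to the fused representation. The single linear gate below is a simplified stand-in for MultiModNet's actual gated fusion unit; the weight shapes and sigmoid gate are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_fusion(feat_a, feat_b, gate_weights):
    """Fuse two modality features (e.g. optical and SAR) with a learned
    element-wise gate: fused = g * a + (1 - g) * b, where g is computed
    from both inputs. gate_weights: (2*D, D) for D-dim features."""
    g = sigmoid(np.concatenate([feat_a, feat_b], axis=-1) @ gate_weights)
    return g * feat_a + (1.0 - g) * feat_b

# Toy usage: with zero gate weights the gate is 0.5 everywhere,
# so the fusion reduces to a plain average of the two modalities.
a = np.ones((3, 2))
b = np.zeros((3, 2))
fused = gated_fusion(a, b, np.zeros((4, 2)))
```

In a trained model the gate learns where each modality is more reliable, which addresses the 'where to fuse' question the abstract raises.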
Recent advances in video analytics for rail network surveillance for security, trespass and suicide prevention— a survey
Railway network systems are by design open and accessible to people, but this presents challenges in the prevention of events such as terrorism, trespass, and suicide fatalities. With the rapid advancement of machine learning, numerous computer vision methods have been developed for closed-circuit television (CCTV) surveillance systems for the purpose of managing public spaces.
These methods are built on multiple types of sensors and are designed to automatically detect static objects and unexpected events, monitor people, and prevent potential dangers. This survey focuses on recently developed CCTV surveillance methods for rail networks, discusses the challenges they face, their advantages and disadvantages, and a vision for future railway surveillance systems. State-of-the-art methods for object detection and behaviour recognition applied to rail network surveillance systems are introduced, and the ethics of handling personal data and the use of automated systems are also considered.