532 research outputs found

    A Comprehensive Survey of Deep Learning in Remote Sensing: Theories, Tools and Challenges for the Community

    In recent years, deep learning (DL), a re-branding of neural networks (NNs), has risen to the top in numerous areas, including computer vision (CV), speech recognition, and natural language processing. While remote sensing (RS) poses a number of unique challenges, primarily related to sensors and applications, RS inevitably draws on many of the same theories as CV, e.g., statistics, fusion, and machine learning, to name a few. This means that the RS community should be aware of, if not at the leading edge of, advancements like DL. Herein, we provide the most comprehensive survey of state-of-the-art RS DL research. We also review recent developments in the DL field that can be used in DL for RS. Namely, we focus on theories, tools and challenges for the RS community. Specifically, we focus on unsolved challenges and opportunities as they relate to (i) inadequate data sets, (ii) human-understandable solutions for modelling physical phenomena, (iii) Big Data, (iv) non-traditional heterogeneous data sources, (v) DL architectures and learning algorithms for spectral, spatial and temporal data, (vi) transfer learning, (vii) an improved theoretical understanding of DL systems, (viii) high barriers to entry, and (ix) training and optimizing DL systems.
    Comment: 64 pages, 411 references. To appear in Journal of Applied Remote Sensing.
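
    Among the challenges listed above, transfer learning (vi) is one of the most directly actionable for practitioners. As a purely illustrative sketch (not taken from the survey), the snippet below fine-tunes a small pretrained-style backbone on an RS classification task by freezing the feature extractor and training only a new classification head; the backbone, class count, and patch size are all hypothetical placeholders.

```python
import torch
import torch.nn as nn

# Hypothetical backbone standing in for any CNN pretrained on natural
# images (e.g., on ImageNet); in practice you would load real weights.
backbone = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)

# Freeze the pretrained feature extractor: only the new head is trained,
# the usual recipe when labeled RS data are scarce.
for p in backbone.parameters():
    p.requires_grad = False

num_rs_classes = 10  # e.g., land-cover categories (assumed)
head = nn.Linear(64, num_rs_classes)
model = nn.Sequential(backbone, head)

optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# One illustrative training step on a dummy batch of RS image patches.
x = torch.randn(8, 3, 64, 64)                  # 64x64 RGB patches
y = torch.randint(0, num_rs_classes, (8,))
loss = criterion(model(x), y)
loss.backward()
optimizer.step()
```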

    Segmentation of RADARSAT-2 Dual-Polarization Sea Ice Imagery

    The mapping of sea ice is an important task for understanding global climate and for safe shipping. Currently, sea ice maps are created by human analysts with the help of remote sensing imagery, including synthetic aperture radar (SAR) imagery. While the maps are generally correct, they can be somewhat subjective and do not have pixel-level resolution due to the time-consuming nature of manual segmentation. Therefore, automated sea ice mapping algorithms such as the multivariate iterative region growing with semantics (MIRGS) sea ice image segmentation algorithm are needed. MIRGS was designed to work with one-channel single-polarization SAR imagery from the RADARSAT-1 satellite. The launch of RADARSAT-2 has made two-channel dual-polarization SAR imagery available for sea ice mapping. Dual-polarization imagery provides more information for distinguishing ice types, and one of the channels is less sensitive to changes in backscatter caused by the SAR incidence angle. In the past, this change in backscatter due to the incidence angle was a key limitation that prevented automatic segmentation of full SAR scenes. This thesis investigates techniques to make use of the dual-polarization data in MIRGS. An evaluation of MIRGS with RADARSAT-2 data showed that some detail was lost and that the incidence angle caused segmentation errors. Several data fusion schemes were investigated to determine whether they could improve performance: gradient generation methods designed to take advantage of dual-polarization data, feature space fusion using linear and non-linear transforms, and image fusion methods based on wavelet combination rules were implemented and tested. The MIRGS parameters were tuned to find the best settings for segmenting dual-polarization data. Results show that the standard MIRGS algorithm with default parameters provides the highest accuracy, so no changes are necessary for dual-polarization data. A hierarchical segmentation scheme that segments the dual-polarization channels separately was implemented to overcome the incidence angle errors; the technique is effective but requires more user input than the standard MIRGS algorithm.
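
    The hierarchical idea described above (segment the less incidence-angle-sensitive channel first, then refine each coarse region using the other channel) can be illustrated with a toy example. The sketch below is not MIRGS itself; it substitutes simple k-means clustering for iterative region growing with semantics, and the channel layout and cluster counts are assumptions made for illustration only.

```python
import numpy as np
from sklearn.cluster import KMeans

def hierarchical_dualpol_segmentation(hh, hv, n_coarse=3, n_fine=2):
    """Toy two-stage segmentation of a dual-polarization SAR scene.

    Stage 1 clusters the HV channel, which is less sensitive to the
    incidence-angle-driven backscatter trend; stage 2 refines each
    coarse region using the HH channel.
    """
    h, w = hv.shape
    coarse = KMeans(n_clusters=n_coarse, n_init=10).fit_predict(
        hv.reshape(-1, 1)).reshape(h, w)

    labels = np.zeros((h, w), dtype=int)
    next_label = 0
    for c in range(n_coarse):
        mask = coarse == c
        if mask.sum() < n_fine:       # too few pixels to split further
            labels[mask] = next_label
            next_label += 1
            continue
        fine = KMeans(n_clusters=n_fine, n_init=10).fit_predict(
            hh[mask].reshape(-1, 1))
        labels[mask] = fine + next_label
        next_label += n_fine
    return labels

# Synthetic dual-pol scene: two "ice types" plus speckle-like noise.
rng = np.random.default_rng(0)
hh = rng.normal(0.0, 1.0, (128, 128)); hh[:, 64:] += 3.0
hv = rng.normal(0.0, 1.0, (128, 128)); hv[64:, :] += 3.0
print(np.unique(hierarchical_dualpol_segmentation(hh, hv)))
```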

    Collaborative Learning in Computer Vision

    The science of designing machines to extract meaningful information from digital images, videos, and other visual inputs is known as Computer Vision (CV). Deep learning algorithms tackle CV problems by automatically learning task-specific features. In particular, Deep Neural Networks (DNNs) have become an essential component in CV solutions due to their ability to encode large amounts of data and their capacity to manipulate billions of model parameters. Unlike machines, humans learn by rapidly constructing abstract models, undoubtedly because good teachers supply their students with much more than just the correct answer; they also provide intuitive comments, comparisons, and explanations. In deep learning, the availability of such auxiliary information at training time (but not at test time) is referred to as learning with Privileged Information (PI). Typically, predictions (e.g., soft labels) produced by a larger, stronger teacher network are used as structured knowledge to supervise the training of a smaller student network, helping the student generalize better than it would if trained from scratch. This dissertation focuses on the category of deep learning systems known as Collaborative Learning, where one DNN model helps other models, or several models help each other, during training to achieve strong generalization and thus high performance. The question we address here is thus the following: how can we take advantage of PI for training a deep learning model, knowing that, at test time, such PI might be missing? In this context, we introduce new methods to tackle several challenging real-world computer vision problems. First, we propose a method for model compression that leverages PI in a teacher-student framework along with customizable block-wise optimization for learning a target-specific lightweight structure of the neural network. In particular, the proposed resource-aware optimization is employed on suitable parts of the student network while respecting the expected resource budget (e.g., floating-point operations per inference and model parameters). In addition, soft predictions produced by the teacher network are leveraged as a source of PI, forcing the student to preserve baseline performance during network structure optimization. Second, we propose a multiple-model learning method for action recognition, specifically devised for challenging video footage in which actions are not explicitly shown but only implicitly referred to. We use such videos as stimuli and involve a large sample of subjects to collect a high-definition EEG and video dataset. Next, we employ collaborative learning in a multi-modal setting, i.e., the EEG (teacher) model helps the video (student) model by distilling knowledge (the implicit meaning of the visual stimuli) to it, sharply boosting recognition performance. The goal of Unsupervised Domain Adaptation (UDA) methods is to use labeled source data together with unlabeled target-domain data to train a model that generalizes well on the target domain. In contrast, we cast UDA as a pseudo-label refinery problem in the challenging source-free scenario, i.e., where the source-domain data are inaccessible during training. We propose the Negative Ensemble Learning (NEL) technique, a unified method for adaptive noise filtering and progressive pseudo-label refinement.
In particular, the ensemble members collaboratively learn with a Disjoint Set of Residual Labels, an outcome of output-prediction consensus, to refine the challenging noise associated with the inferred pseudo-labels. A single model trained with the refined pseudo-labels achieves superior performance on the target domain without using any source data samples. We conclude this dissertation with a method that extends our previous study by incorporating Continual Learning into Source-Free UDA. Our new method comprises two stages: a Source-Free UDA pipeline based on pseudo-label refinement, and a procedure for extracting class-conditioned source-style images by leveraging the pre-trained source model. While stage 1 retains the same collaborative character, in stage 2 the collaboration is indirect, i.e., the pre-trained source model provides the only means of generating source-style synthetic images, which ultimately helps the final model preserve good performance on both the source and target domains. In each study, we consider heterogeneous CV tasks. Nevertheless, through an extensive pool of experiments on various benchmarks with diverse complexities and challenges, we show that the collaborative learning framework outperforms related state-of-the-art methods by a considerable margin.
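
    The teacher-student supervision that recurs throughout this summary centers on soft labels acting as privileged information. The following is a generic sketch of the standard soft-label distillation objective (Hinton-style knowledge distillation), not the exact losses used in the dissertation; the temperature and mixing weight are illustrative defaults.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets,
                      temperature=4.0, alpha=0.5):
    """Soft-label distillation: the teacher's temperature-softened
    predictions supervise the student alongside the hard labels."""
    # KL divergence between softened teacher and student distributions;
    # the T^2 factor keeps gradients comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * temperature ** 2

    # Ordinary cross-entropy on the ground-truth labels.
    hard = F.cross_entropy(student_logits, targets)
    return alpha * soft + (1.0 - alpha) * hard

# Dummy batch: 8 samples, 5 classes.
student_logits = torch.randn(8, 5, requires_grad=True)
teacher_logits = torch.randn(8, 5)        # produced by a frozen teacher
targets = torch.randint(0, 5, (8,))
loss = distillation_loss(student_logits, teacher_logits, targets)
loss.backward()
```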

    Radar and RGB-depth sensors for fall detection: a review

    This paper reviews recent works in the literature on the use of systems based on radar and RGB-Depth (RGB-D) sensors for fall detection, and discusses outstanding research challenges and trends in this field. Systems that reliably detect fall events and promptly alert carers and first responders have gained significant interest in the past few years in order to address the societal issue of an increasing number of elderly people living alone, with the associated risk of falls and the consequences in terms of health treatments, reduced well-being, and costs. The interest in radar and RGB-D sensors stems from their capability to enable contactless and non-intrusive monitoring, an advantage for practical deployment and for users’ acceptance and compliance compared with other sensor technologies such as video-cameras or wearables. Furthermore, the possibility of combining and fusing information from these heterogeneous types of sensors is expected to improve the overall performance of practical fall detection systems. Researchers from different fields can benefit from the multidisciplinary knowledge and awareness of the latest developments in radar and RGB-D sensors discussed in this paper.
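
    To make the fusion point concrete, the toy sketch below performs decision-level (late) fusion of per-class scores from a hypothetical radar-based classifier and a hypothetical RGB-D-based classifier; the weighting scheme and class coding are assumptions, and real systems may instead fuse features or learn the combination weights.

```python
import numpy as np

def fuse_scores(radar_probs, rgbd_probs, w_radar=0.5):
    """Decision-level fusion of per-class probabilities from a
    radar-based and an RGB-D-based fall classifier (minimal sketch)."""
    fused = w_radar * radar_probs + (1.0 - w_radar) * rgbd_probs
    return fused.argmax(axis=1)   # 0 = no fall, 1 = fall (assumed coding)

# Dummy scores for a batch of 4 observation windows.
radar = np.array([[0.8, 0.2], [0.4, 0.6], [0.3, 0.7], [0.9, 0.1]])
rgbd  = np.array([[0.7, 0.3], [0.2, 0.8], [0.6, 0.4], [0.8, 0.2]])
print(fuse_scores(radar, rgbd))   # fused fall / no-fall decisions
```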