68 research outputs found

    X-ModalNet: A Semi-Supervised Deep Cross-Modal Network for Classification of Remote Sensing Data

    Get PDF
    This paper addresses the problem of semi-supervised transfer learning with limited cross-modality data in remote sensing. A large amount of multi-modal earth observation images, such as multispectral imagery (MSI) or synthetic aperture radar (SAR) data, are openly available on a global scale, enabling parsing global urban scenes through remote sensing imagery. However, their ability in identifying materials (pixel-wise classification) remains limited, due to the noisy collection environment and poor discriminative information as well as limited number of well-annotated training images. To this end, we propose a novel cross-modal deep-learning framework, called X-ModalNet, with three well-designed modules: self-adversarial module, interactive learning module, and label propagation module, by learning to transfer more discriminative information from a small-scale hyperspectral image (HSI) into the classification task using a large-scale MSI or SAR data. Significantly, X-ModalNet generalizes well, owing to propagating labels on an updatable graph constructed by high-level features on the top of the network, yielding semi-supervised cross-modality learning. We evaluate X-ModalNet on two multi-modal remote sensing datasets (HSI-MSI and HSI-SAR) and achieve a significant improvement in comparison with several state-of-the-art methods

    A Review of Landcover Classification with Very-High Resolution Remotely Sensed Optical Images—Analysis Unit, Model Scalability and Transferability

    Get PDF
    As an important application in remote sensing, landcover classification remains one of the most challenging tasks in very-high-resolution (VHR) image analysis. As the rapidly increasing number of Deep Learning (DL) based landcover methods and training strategies are claimed to be the state-of-the-art, the already fragmented technical landscape of landcover mapping methods has been further complicated. Although there exists a plethora of literature review work attempting to guide researchers in making an informed choice of landcover mapping methods, the articles either focus on the review of applications in a specific area or revolve around general deep learning models, which lack a systematic view of the ever advancing landcover mapping methods. In addition, issues related to training samples and model transferability have become more critical than ever in an era dominated by data-driven approaches, but these issues were addressed to a lesser extent in previous review articles regarding remote sensing classification. Therefore, in this paper, we present a systematic overview of existing methods by starting from learning methods and varying basic analysis units for landcover mapping tasks, to challenges and solutions on three aspects of scalability and transferability with a remote sensing classification focus including (1) sparsity and imbalance of data; (2) domain gaps across different geographical regions; and (3) multi-source and multi-view fusion. We discuss in detail each of these categorical methods and draw concluding remarks in these developments and recommend potential directions for the continued endeavor

    A Review of Landcover Classification with Very-High Resolution Remotely Sensed Optical Images—Analysis Unit, Model Scalability and Transferability

    Get PDF
    As an important application in remote sensing, landcover classification remains one of the most challenging tasks in very-high-resolution (VHR) image analysis. As the rapidly increasing number of Deep Learning (DL) based landcover methods and training strategies are claimed to be the state-of-the-art, the already fragmented technical landscape of landcover mapping methods has been further complicated. Although there exists a plethora of literature review work attempting to guide researchers in making an informed choice of landcover mapping methods, the articles either focus on the review of applications in a specific area or revolve around general deep learning models, which lack a systematic view of the ever advancing landcover mapping methods. In addition, issues related to training samples and model transferability have become more critical than ever in an era dominated by data-driven approaches, but these issues were addressed to a lesser extent in previous review articles regarding remote sensing classification. Therefore, in this paper, we present a systematic overview of existing methods by starting from learning methods and varying basic analysis units for landcover mapping tasks, to challenges and solutions on three aspects of scalability and transferability with a remote sensing classification focus including (1) sparsity and imbalance of data; (2) domain gaps across different geographical regions; and (3) multi-source and multi-view fusion. We discuss in detail each of these categorical methods and draw concluding remarks in these developments and recommend potential directions for the continued endeavor

    Deep multitask learning with label interdependency distillation for multicriteria street-level image classification

    Get PDF
    Multitask learning (MTL) aims at beneficial joint solving of multiple prediction problems by sharing information across different tasks. However, without adequate consideration of interdependencies, MTL models are prone to miss valuable information. In this paper, we introduce a novel deep MTL architecture that specifically encodes cross-task interdependencies within the setting of multiple image classification problems. Based on task-wise interim class label probability predictions by an intermediately supervised hard parameter sharing convolutional neural network, interdependencies are inferred in two ways: i) by directly stacking label probability sequences to the image feature vector (i.e., multitask stacking), and ii) by passing probability sequences to gated recurrent unit-based recurrent neural networks to explicitly learn cross-task interdependency representations and stacking those to the image feature vector (i.e., interdependency representation learning). The proposed MTL architecture is applied as a tool for generic multi-criteria building characterization using street-level imagery related to risk assessments toward multiple natural hazards. Experimental results for classifying buildings according to five vulnerability-related target variables (i.e., five learning tasks), namely height, lateral load-resisting system material, seismic building structural type, roof shape, and block position are obtained for the Chilean capital Santiago de Chile. Our MTL methods with cross-task label interdependency modeling consistently outperform single task learning (STL) and classical hard parameter sharing MTL alike. Even when starting already from high classification accuracy levels, estimated generalization capabilities can be further improved by considerable margins of accumulated task-specific residuals beyond +6% Îş. Thereby, the combination of multitask stacking and interdependency representation learning attains the highest accuracy estimates for the addressed task and data setting (up to cross-task accuracy mean values of 88.43% overall accuracy and 84.49% Îş). From an efficiency perspective, the proposed MTL methods turn out to be substantially favorable compared to STL in terms of training time consumption

    Building Detection from Very High Resolution Remotely Sensed Imagery Using Deep Neural Networks

    Get PDF
    The past decades have witnessed a significant change in human societies with a fast pace and rapid urbanization. The boom of urbanization is contributed by the influx of people to the urban area and comes with building construction and deconstruction. The estimation of both residential and industrial buildings is important to reveal and demonstrate the human activities of the regions. As a result, it is essential to effectively and accurately detect the buildings in urban areas for urban planning and population monitoring. The automatic building detection method in remote sensing has always been a challenging task, because small targets cannot be identified in images with low resolution, as well as the complexity in the various scales, structure, and colours of urban buildings. However, the development of techniques improves the performance of the building detection task, by taking advantage of the accessibility of very high-resolution (VHR) remotely sensed images and the innovation of object detection methods. The purpose of this study is to develop a framework for the automatic detection of urban buildings from the VHR remotely sensed imagery at a large scale by using the state-of-art deep learning network. The thesis addresses the research gaps and difficulties as well as the achievements in building detection. The conventional hand-crafted methods, machine learning methods, and deep learning methods are reviewed and discussed. The proposed method employs a deep convolutional neural network (CNN) for building detection. Two input datasets with different spatial resolutions were used to train and validate the CNN model, and a testing dataset was used to evaluate the performance of the proposed building detection method. The experiment result indicates that the proposed method performs well at both building detection and outline segmentation task with a total precision of 0.92, a recall of 0.866, an F1-score of 0.891. In conclusion, this study proves the feasibility of CNN on solving building detection challenges using VHR remotely sensed imagery

    Analysis of sensorimotor rhythms based on lower-limbs motor imagery for brain-computer interface

    Get PDF
    Over recent years significant advancements in the field of assistive technologies have been observed. One of the paramount needs for the development and advancement that urged researchers to contribute in the field other than congenital or diagnosed chronic disorders, is the rising number of affectees from accidents, natural calamity (due to climate change), or warfare, worldwide resulting in spinal cord injuries (SCI), neural disorder, or amputation (interception) of limbs, that impede a human to live a normal life. In addition to this, more than ten million people in the world are living with some form of handicap due to the central nervous system (CNS) disorder, which is precarious. Biomedical devices for rehabilitation are the center of research focus for many years. For people with lost motor control, or amputation, but unscathed sensory control, instigation of control signals from the source, i.e. electrophysiological signals, is vital for seamless control of assistive biomedical devices. Control signals, i.e. motion intentions, arouse    in the sensorimotor cortex of the brain that can be detected using invasive or non-invasive modality. With non-invasive modality, the electroencephalography (EEG) is used to record these motion intentions encoded in electrical activity of the cortex, and are deciphered to recognize user intent for locomotion. They are further transferred to the actuator, or end effector of the assistive device for control purposes. This can be executed via the brain-computer interface (BCI) technology. BCI is an emerging research field that establishes a real-time bidirectional connection between the human brain and a computer/output device. Amongst its diverse applications, neurorehabilitation to deliver sensory feedback and brain controlled biomedical devices for rehabilitation are most popular. While substantial literature on control of upper-limb assistive technologies controlled via BCI is there, less is known about the lower-limb (LL) control of biomedical devices for navigation or gait assistance via BCI. The types  of EEG signals compatible with an independent BCI are the oscillatory/sensorimotor rhythms (SMR) and event-related potential (ERP). These signals have successfully been used in BCIs for navigation control of assistive devices. However, ERP paradigm accounts for a voluminous setup for stimulus presentation to the user during operation of BCI assistive device. Contrary to this, the SMR does not require large setup for activation of cortical activity; it instead depends on the motor imagery (MI) that is produced synchronously or asynchronously by the user. MI is a covert cognitive process also termed kinaesthetic motor imagery (KMI) and elicits clearly after rigorous training trials, in form of event-related desynchronization (ERD) or synchronization (ERS), depending on imagery activity or resting period. It usually comprises of limb movement tasks, but is not limited to it in a BCI paradigm. In order to produce detectable features that correlate to the user¿s intent, selection of cognitive task is an important aspect to improve the performance of a BCI. MI used in BCI predominantly remains associated with the upper- limbs, particularly hands, due to the somatotopic organization of the motor cortex. The hand representation area is substantially large, in contrast to the anatomical location of the LL representation areas in the human sensorimotor cortex. The LL area is located within the interhemispheric fissure, i.e. between the mesial walls of both hemispheres of the cortex. This makes it arduous to detect EEG features prompted upon imagination of LL. Detailed investigation of the ERD/ERS in the mu and beta oscillatory rhythms during left and right LL KMI tasks is required, as the user¿s intent to walk is of paramount importance associated to everyday activity. This is an important area of research, followed by the improvisation of the already existing rehabilitation system that serves the LL affectees. Though challenging, solution to these issues is also imperative for the development of robust controllers that follow the asynchronous BCI paradigms to operate LL assistive devices seamlessly. This thesis focusses on the investigation of cortical lateralization of ERD/ERS in the SMR, based on foot dorsiflexion KMI and knee extension KMI separately. This research infers the possibility to deploy these features in real-time BCI by finding maximum possible classification accuracy from the machine learning (ML) models. EEG signal is non-stationary, as it is characterized by individual-to-individual and trial-to-trial variability, and a low signal-to-noise ratio (SNR), which is challenging. They are high in dimension with relatively low number of samples available for fitting ML models to the data. These factors account for ML methods that were developed into the tool of choice  to analyse single-trial EEG data. Hence, the selection of appropriate ML model for true detection of class label with no tradeoff of overfitting is crucial. The feature extraction part of the thesis constituted of testing the band-power (BP) and the common spatial pattern (CSP) methods individually. The study focused on the synchronous BCI paradigm. This was to ensure the exhibition of SMR for the possibility of a practically viable control system in a BCI. For the left vs. right foot KMI, the objective was to distinguish the bilateral tasks, in order to use them as unilateral commands in a 2-class BCI for controlling/navigating a robotic/prosthetic LL for rehabilitation. Similar was the approach for left-right knee KMI. The research was based on four main experimental studies. In addition to the four studies, the research is also inclusive of the comparison of intra-cognitive tasks within the same limb, i.e. left foot vs. left knee and right foot vs. right knee tasks, respectively (Chapter 4). This added to another novel contribution towards the findings based on comparison of different tasks within the same LL. It provides basis to increase the dimensionality of control signals within one BCI paradigm, such as a BCI-controlled LL assistive device with multiple degrees of freedom (DOF) for restoration of locomotion function. This study was based on analysis of statistically significant mu ERD feature using BP feature extraction method. The first stage of this research comprised of the left vs. right foot KMI tasks, wherein the ERD/ERS that elicited in the mu-beta rhythms were analysed using BP feature extraction method (Chapter 5). Three individual features, i.e. mu ERD, beta ERD, and beta ERS were investigated on EEG topography and time-frequency (TF) maps, and average time course of power percentage, using the common average reference and bipolar reference methods. A comparative study was drawn for both references to infer the optimal method. This was followed by ML, i.e. classification of the three feature vectors (mu ERD, beta ERD, and beta ERS), using linear discriminant analysis (LDA), support vector machine (SVM), and k-nearest neighbour (KNN) algorithms, separately. Finally, the multiple correction statistical tests were done, in order to predict maximum possible classification accuracy amongst all paradigms for the most significant feature. All classifier models were supported with the statistical techniques of k-fold cross validation and evaluation of area under receiver-operator characteristic curves (AUC-ROC) for prediction of the true class label. The highest classification accuracy of 83.4% ± 6.72 was obtained with KNN model for beta ERS feature. The next study was based on enhancing the classification accuracy obtained from previous study. It was based on using similar cognitive tasks as study in Chapter 5, however deploying different methodology for feature extraction and classification procedure. In the second study, ERD/ERS from mu and beta rhythms were extracted using CSP and filter bank common spatial pattern (FBCSP) algorithms, to optimize the individual spatial patterns (Chapter 6). This was followed by ML process, for which the supervised logistic regression (Logreg) and LDA were deployed separately. Maximum classification accuracy resulted in 77.5% ± 4.23 with FBCSP feature vector and LDA model, with a maximum kappa coefficient of 0.55 that is in the moderate range of agreement between the two classes. The left vs. right foot discrimination results were nearly same, however the BP feature vector performed better than CSP. The third stage was based on the deployment of novel cognitive task of left vs. right knee extension KMI. Analysis of the ERD/ERS in the mu-beta rhythms was done for verification of cortical lateralization via BP feature vector (Chapter 7). Similar to Chapter 5, in this study the analysis of ERD/ERS features was done on the EEG topography and TF maps, followed by the determination of average time course and peak latency of feature occurrence. However, for this study, only mu ERD and beta ERS features were taken into consideration and the EEG recording method only comprised of common average reference. This was due to the established results from the foot study earlier, in Chapter 5, where beta ERD features showed less average amplitude. The LDA and KNN classification algorithms were employed. Unexpectedly, the left vs. right knee KMI reflected the highest accuracy of 81.04% ± 7.5 and an AUC-ROC = 0.84, strong enough to be used in a real-time BCI as two independent control features. This was using KNN model for beta ERS feature. The final study of this research followed the same paradigm as used in Chapter 6, but for left vs. right knee KMI cognitive task (Chapter 8). Primarily this study aimed at enhancing the resulting accuracy from Chapter 7, using CSP and FBCSP methods with Logreg and LDA models respectively. Results were in accordance with those of the already established foot KMI study, i.e. BP feature vector performed better than the CSP. Highest classification accuracy of 70.00% ± 2.85 with kappa score of 0.40 was obtained with Logreg using FBCSP feature vector. Results stipulated the utilization of ERD/ERS in mu and beta bands, as independent control features for discrimination of bilateral foot or the novel bilateral knee KMI tasks. Resulting classification accuracies implicate that any 2-class BCI, employing unilateral foot, or knee KMI, is suitable for real-time implementation. In conclusion, this thesis demonstrates the possible EEG pre-processing, feature extraction and classification methods to instigate a real-time BCI from the conducted studies. Following this, the critical aspects of latency in information transfer rate, SNR, and tradeoff between dimensionality and overfitting needs to be taken care of, during design of real-time BCI controller. It also highlights that there is a need for consensus over the development of standardized methods of cognitive tasks for MI based BCI. Finally, the application of wireless EEG for portable assistance is essential as it will contribute to lay the foundations of the development of independent asynchronous BCI based on SMR

    Enabling Artificial Intelligence Analytics on The Edge

    Get PDF
    This thesis introduces a novel distributed model for handling in real-time, edge-based video analytics. The novelty of the model relies on decoupling and distributing the services into several decomposed functions, creating virtual function chains (V F C model). The model considers both computational and communication constraints. Theoretical, simulation and experimental results have shown that the V F C model can enable the support of heavy-load services to an edge environment while improving the footprint of the service compared to state-of-the art frameworks. In detail, results on the V F C model have shown that it can reduce the total edge cost, compared with a monolithic and a simple frame distribution models. For experimenting on a real-case scenario, a testbed edge environment has been developed, where the aforementioned models, as well as a general distribution framework (Apache Spark ©), have been deployed. A cloud service has also been considered. Experiments have shown that V F C can outperform all alternative approaches, by reducing operational cost and improving the QoS. Finally, a migration model, a caching model and a QoS monitoring service based on Long-Term-Short-Term models are introduced
    • …
    corecore