145 research outputs found
A Comprehensive Survey on Graph Neural Networks
Deep learning has revolutionized many machine learning tasks in recent years, ranging from image classification and video processing to speech recognition and natural language understanding. The data in these tasks are typically represented in the Euclidean space. However, there is an increasing number of applications where data are generated from non-Euclidean domains and are represented as graphs with complex relationships and interdependency between objects. The complexity of graph data has imposed significant challenges on existing machine learning algorithms. Recently, many studies on extending deep learning approaches to graph data have emerged. In this article, we provide a comprehensive overview of graph neural networks (GNNs) in the data mining and machine learning fields. We propose a new taxonomy to divide the state-of-the-art GNNs into four categories, namely, recurrent GNNs, convolutional GNNs, graph autoencoders, and spatial-temporal GNNs. We further discuss the applications of GNNs across various domains and summarize the open-source codes, benchmark data sets, and model evaluation of GNNs. Finally, we propose potential research directions in this rapidly growing field.
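As a concrete instance of the convolutional-GNN category mentioned above, a single graph-convolution layer in the common symmetric-normalisation form can be sketched as follows (the toy graph, features, and weight matrix are illustrative, not from the survey):

```python
import numpy as np

def gcn_layer(A, H, W):
    """One graph-convolution step: H' = ReLU(D^-1/2 (A+I) D^-1/2 H W)."""
    A_hat = A + np.eye(A.shape[0])            # add self-loops
    d = A_hat.sum(axis=1)                     # node degrees
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))    # symmetric normalisation
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W, 0.0)

# Toy 3-node path graph with 2-d node features and a 2-d output
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
H = np.eye(3)[:, :2]                          # simple one-hot-style features
W = np.ones((2, 2))
H_next = gcn_layer(A, H, W)
```

Each node's new representation aggregates its own features and its neighbours', which is the propagation rule most convolutional GNNs in the survey's taxonomy build on.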
Image-free Classifier Injection for Zero-Shot Classification
Zero-shot learning models achieve remarkable results on image classification
for samples from classes that were not seen during training. However, such
models must be trained from scratch with specialised methods: therefore, access
to a training dataset is required when the need for zero-shot classification
arises. In this paper, we aim to equip pre-trained models with zero-shot
classification capabilities without the use of image data. We achieve this with
our proposed Image-free Classifier Injection with Semantics (ICIS) that injects
classifiers for new, unseen classes into pre-trained classification models in a
post-hoc fashion without relying on image data. Instead, the existing
classifier weights and simple class-wise descriptors, such as class names or
attributes, are used. ICIS has two encoder-decoder networks that learn to
reconstruct classifier weights from descriptors (and vice versa), exploiting
(cross-)reconstruction and cosine losses to regularise the decoding process.
Notably, ICIS can be cheaply trained and applied directly on top of pre-trained
classification models. Experiments on benchmark ZSL datasets show that ICIS
produces unseen classifier weights that achieve strong (generalised) zero-shot
classification performance. Code is available at
https://github.com/ExplainableML/ImageFreeZSL. Comment: Accepted at ICCV 202
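The core idea of mapping class descriptors to classifier weights and regularising with a cosine loss can be sketched in miniature. The single linear map below is a stand-in for ICIS's two encoder-decoder networks, and all dimensions and data are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def cosine_loss(pred, target, eps=1e-8):
    """1 - cosine similarity, averaged over rows (one row per class)."""
    num = (pred * target).sum(axis=1)
    den = np.linalg.norm(pred, axis=1) * np.linalg.norm(target, axis=1) + eps
    return float(np.mean(1.0 - num / den))

# Toy setup: 5 seen classes, 16-d descriptors (e.g. attribute vectors),
# 32-d classifier weight vectors taken from a pre-trained head.
D = rng.normal(size=(5, 16))      # class descriptors
W = rng.normal(size=(5, 32))      # classifier weights (reconstruction targets)

# Fit a linear descriptor->weight map on the seen classes (a stand-in for
# the learned encoder-decoder networks in the actual method).
M = np.linalg.lstsq(D, W, rcond=None)[0]
W_hat = D @ M
loss = cosine_loss(W_hat, W)
```

An unseen class would then be "injected" post hoc by decoding its descriptor, e.g. `w_new = d_new @ M`, and appending `w_new` to the pre-trained classifier head without touching any image data.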
Advancing Land Cover Mapping in Remote Sensing with Deep Learning
Automatic mapping of land cover in remote sensing data plays an increasingly significant role in several earth observation (EO) applications, such as sustainable development, autonomous agriculture, and urban planning. Due to the complexity of the real ground surface and environment, accurate classification of land cover types is facing many challenges. This thesis provides novel deep learning-based solutions to land cover mapping challenges such as how to deal with intricate objects and imbalanced classes in multi-spectral and high-spatial resolution remote sensing data.
The first work presents a novel model to learn richer multi-scale and global contextual representations in very high-resolution remote sensing images, namely the dense dilated convolutions' merging (DDCM) network. The proposed method is lightweight, flexible, and extendable, so that it can be used as a simple yet effective encoder and decoder module to address different classification and semantic mapping challenges. Intensive experiments on different benchmark remote sensing datasets demonstrate that the proposed method achieves better performance while consuming far fewer computational resources than other published methods.
Next, a novel graph model is developed for capturing long-range pixel dependencies in remote sensing images to improve land cover mapping. One key component in the method is the self-constructing graph (SCG) module that can effectively construct global context relations (latent graph structure) without requiring prior knowledge graphs. The proposed SCG-based models achieved competitive performance on different representative remote sensing datasets with faster training and lower computational cost compared to strong baseline models.
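The idea of constructing a latent graph directly from pixel features, with no prior adjacency, can be illustrated with a minimal dot-product sketch. The actual SCG module uses a learned (variational) formulation; here the projection matrix, sizes, and sigmoid link are simplifying assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def self_construct_graph(X, P):
    """Infer a latent adjacency from node features alone:
    Z = X P (projected embeddings), A = sigmoid(Z Z^T)."""
    Z = X @ P
    return 1.0 / (1.0 + np.exp(-(Z @ Z.T)))

X = rng.normal(size=(6, 8))   # 6 pixel-nodes with 8-d features
P = rng.normal(size=(8, 4))   # projection (learned in the real model)
A = self_construct_graph(X, P)
```

The resulting dense, symmetric `A` encodes long-range pairwise relations between all pixels, which a downstream graph network can then propagate over.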
The third work introduces a new framework, namely the multi-view self-constructing graph (MSCG) network, extending the vanilla SCG model to capture multi-view context representations with rotation invariance for improved segmentation performance. Meanwhile, a novel adaptive class weighting loss function is developed to alleviate the class imbalance commonly found in EO datasets for semantic segmentation. Experiments on benchmark data demonstrate that the proposed framework is computationally efficient and robust, producing improved segmentation results for imbalanced classes.
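Class-weighted losses of this kind can be sketched with a simple static inverse-frequency scheme; the thesis's adaptive class weighting adjusts weights during training, so the fixed weights, counts, and probabilities below are illustrative stand-ins:

```python
import numpy as np

# Inverse-frequency class weights: rare classes receive larger weights.
counts = np.array([900., 90., 10.])              # pixels per class (toy)
weights = counts.sum() / (len(counts) * counts)

def weighted_ce(probs, labels):
    """Mean cross-entropy over pixels, scaled by the true class's weight."""
    p = probs[np.arange(labels.size), labels]    # probability of true class
    return float(np.mean(-weights[labels] * np.log(p + 1e-8)))

probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.2, 0.2, 0.6]])
labels = np.array([0, 1, 2])
loss = weighted_ce(probs, labels)
```

Errors on the rare class (here class 2) are penalised far more heavily than on the dominant class, which counteracts the gradient imbalance caused by skewed pixel counts.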
To address the key challenges in multi-modal land cover mapping of remote sensing data, namely, 'what', 'how' and 'where' to effectively fuse multi-source features and to efficiently learn optimal joint representations of different modalities, the last work presents a compact and scalable multi-modal deep learning framework (MultiModNet) based on two novel modules: the pyramid attention fusion module and the gated fusion unit. The proposed MultiModNet outperforms strong baselines on two representative remote sensing datasets with fewer parameters and at a lower computational cost. Extensive ablation studies also validate the effectiveness and flexibility of the framework.
Few-shot image classification: current status and research trends
Conventional image classification methods usually require a large number of training samples for the training model. However, in practical scenarios, the amount of available sample data is often insufficient, which easily leads to overfitting in network construction. Few-shot learning provides an effective solution to this problem and has been a hot research topic. This paper provides an intensive survey of the state-of-the-art techniques in image classification based on few-shot learning. According to the different deep learning mechanisms, the existing algorithms are divided into four categories: transfer learning based, meta-learning based, data augmentation based, and multimodal based methods. Transfer learning based methods transfer useful prior knowledge from the source domain to the target domain. Meta-learning based methods employ past prior knowledge to guide the learning of new tasks. Data augmentation based methods expand the amount of sample data with auxiliary information. Multimodal based methods use the information of the auxiliary modality to facilitate the implementation of image classification tasks. This paper also summarizes the few-shot image datasets available in the literature, and experimental results from some representative algorithms are provided to compare their performance and analyze their pros and cons. In addition, the applications of existing research outcomes on few-shot image classification in different practical fields are discussed. Finally, a few future research directions are identified. © 2022 by the authors. Licensee MDPI, Basel, Switzerland.
GraphMAE2: A Decoding-Enhanced Masked Self-Supervised Graph Learner
Graph self-supervised learning (SSL), including contrastive and generative
approaches, offers great potential to address the fundamental challenge of
label scarcity in real-world graph data. Among these techniques, masked graph
autoencoders (e.g., GraphMAE), a generative approach, have recently produced
promising results. The idea behind
this is to reconstruct the node features (or structures)--that are randomly
masked from the input--with the autoencoder architecture. However, the
performance of masked feature reconstruction naturally relies on the
discriminability of the input features and is usually vulnerable to disturbance
in the features. In this paper, we present a masked self-supervised learning
framework GraphMAE2 with the goal of overcoming this issue. The idea is to
impose regularization on feature reconstruction for graph SSL. Specifically, we
design the strategies of multi-view random re-mask decoding and latent
representation prediction to regularize the feature reconstruction. The
multi-view random re-mask decoding is to introduce randomness into
reconstruction in the feature space, while the latent representation prediction
is to enforce the reconstruction in the embedding space. Extensive experiments
show that GraphMAE2 can consistently generate top results on various public
datasets, including at least 2.45% improvements over state-of-the-art baselines
on ogbn-Papers100M with 111M nodes and 1.6B edges. Comment: Accepted to WWW'2
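The mask-encode-re-mask-decode pipeline described above can be sketched end to end. The linear maps stand in for GraphMAE2's GNN encoder and decoder, and the [MASK] token, dimensions, mask pattern, and re-mask rate are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
N, F, D = 6, 4, 3                       # nodes, feature dim, hidden dim
X = rng.normal(size=(N, F))
mask_tok = np.zeros(F)                  # stands in for a learnable [MASK] token

# 1) input masking: hide the features of selected nodes
mask = np.array([True, False, True, False, True, False])
X_in = np.where(mask[:, None], mask_tok, X)

# 2) encode (linear stand-in; GraphMAE2 uses a GNN encoder)
W_enc = rng.normal(size=(F, D))
Z = X_in @ W_enc

# 3) multi-view random re-mask decoding: re-mask the latent codes with K
#    independent random masks before decoding, injecting randomness into
#    feature-space reconstruction
W_dec = rng.normal(size=(D, F))
K = 3
recon_views = [
    np.where(rng.random(N)[:, None] < 0.5, 0.0, Z) @ W_dec
    for _ in range(K)
]

# 4) reconstruction loss only on the originally masked nodes, averaged
#    over the K decoded views
loss = float(np.mean([np.mean((R[mask] - X[mask]) ** 2) for R in recon_views]))
```

The paper's second regularizer, latent representation prediction, would add an analogous loss in the embedding space (predicting `Z` for masked nodes from a target encoder); it is omitted here to keep the sketch minimal.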