39 research outputs found
Spectral–Spatial Classification of Hyperspectral Imagery with 3D Convolutional Neural Network
Recent research has shown that using spectral–spatial information can considerably improve the performance of hyperspectral image (HSI) classification. HSI data is typically presented in the format of 3D cubes. Thus, 3D spatial filtering naturally offers a simple and effective method for simultaneously extracting the spectral–spatial features within such images. In this paper, a 3D convolutional neural network (3D-CNN) framework is proposed for accurate HSI classification. The proposed method views the HSI cube data altogether without relying on any preprocessing or post-processing, extracting the deep spectral–spatial-combined features effectively. In addition, it requires fewer parameters than other deep learning-based methods. Thus, the model is lighter, less likely to over-fit, and easier to train. For comparison and validation, we test the proposed method along with three other deep learning-based HSI classification methods, namely stacked autoencoder (SAE), deep belief network (DBN), and 2D-CNN-based methods, on three real-world HSI datasets captured by different sensors. Experimental results demonstrate that our 3D-CNN-based method outperforms these state-of-the-art methods and sets a new record
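To make the idea of 3D spatial filtering over an HSI cube concrete, the following is a minimal NumPy sketch of a single valid-mode 3D filter sliding jointly over the two spatial axes and the spectral axis. It is an illustration only, not the paper's network: the function name `conv3d_valid`, the patch size, and the kernel size are assumptions chosen for the example.

```python
import numpy as np

def conv3d_valid(cube, kernel):
    """Valid-mode 3D cross-correlation of one cube (H, W, bands) with one kernel."""
    h, w, b = cube.shape
    kh, kw, kb = kernel.shape
    out = np.zeros((h - kh + 1, w - kw + 1, b - kb + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            for k in range(out.shape[2]):
                # Each output value mixes a spatial neighbourhood AND a band window.
                window = cube[i:i + kh, j:j + kw, k:k + kb]
                out[i, j, k] = np.sum(window * kernel)
    return out

rng = np.random.default_rng(0)
patch = rng.standard_normal((5, 5, 20))   # hypothetical 5x5 pixel patch with 20 bands
kernel = rng.standard_normal((3, 3, 7))   # hypothetical 3x3 spatial, 7-band kernel
features = conv3d_valid(patch, kernel)
print(features.shape)  # (3, 3, 14)
```

Because the kernel spans several bands at once, the output keeps a (shortened) spectral axis, which a 2D convolution applied band by band would not couple in the same way.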
NAR-Former V2: Rethinking Transformer for Universal Neural Network Representation Learning
As more deep learning models are being applied in real-world applications,
there is a growing need for modeling and learning the representations of neural
networks themselves. An efficient representation can be used to predict target
attributes of networks without the need for actual training and deployment
procedures, facilitating efficient network deployment and design. Recently,
inspired by the success of Transformer, some Transformer-based representation
learning frameworks have been proposed and achieved promising performance in
handling cell-structured models. However, graph neural network (GNN) based
approaches still dominate the field of learning representation for the entire
network. In this paper, we revisit Transformer and compare it with GNN to
analyse their different architecture characteristics. We then propose a
modified Transformer-based universal neural network representation learning
model NAR-Former V2. It can learn efficient representations from both
cell-structured networks and entire networks. Specifically, we first take the
network as a graph and design a straightforward tokenizer to encode the network
into a sequence. Then, we incorporate the inductive representation learning
capability of GNN into Transformer, enabling Transformer to generalize better
when encountering unseen architecture. Additionally, we introduce a series of
simple yet effective modifications to enhance the ability of the Transformer in
learning representation from graph structures. Our proposed method surpasses
the GNN-based method NNLP by a significant margin in latency estimation on the
NNLQP dataset. Furthermore, regarding accuracy prediction on the NASBench101
and NASBench201 datasets, our method achieves highly comparable performance to
other state-of-the-art methods.
Comment: 9 pages, 2 figures, 6 tables. Code is available at
https://github.com/yuny220/NAR-Former-V
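The step of treating a network as a graph and encoding it into a sequence can be sketched as follows. This is a hypothetical toy tokenizer, not the NAR-Former V2 tokenizer: the function name `tokenize`, the token layout (operation name plus positions of predecessor tokens), and the example network are all assumptions made for illustration. It relies on `graphlib` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

def tokenize(ops, edges):
    """Encode a DAG as a sequence of (op_name, predecessor_positions) tokens.

    ops:   dict mapping node id -> operation name
    edges: list of (src, dst) pairs
    """
    preds = {node: [] for node in ops}
    for src, dst in edges:
        preds[dst].append(src)
    # TopologicalSorter expects node -> predecessors and yields predecessors first.
    order = list(TopologicalSorter(preds).static_order())
    position = {node: i for i, node in enumerate(order)}
    return [(ops[node], sorted(position[p] for p in preds[node])) for node in order]

# Hypothetical four-node network: one input feeding two branches that are added.
ops = {"in": "input", "c1": "conv3x3", "c2": "conv1x1", "add": "add"}
edges = [("in", "c1"), ("in", "c2"), ("c1", "add"), ("c2", "add")]
tokens = tokenize(ops, edges)
for token in tokens:
    print(token)
```

Each token carries only local information (its own operation and where its inputs sit in the sequence), which is one simple way to let a sequence model see graph connectivity.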
Fast-ParC: Position Aware Global Kernel for ConvNets and ViTs
Transformer models have made tremendous progress in various fields in recent
years. In the field of computer vision, vision transformers (ViTs) have also become
strong alternatives to convolutional neural networks (ConvNets), yet they have
not been able to replace ConvNets since both have their own merits. For
instance, ViTs are good at extracting global features with attention mechanisms
while ConvNets are more efficient in modeling local relationships due to their
strong inductive bias. A natural idea that arises is to combine the strengths
of both ConvNets and ViTs to design new structures. In this paper, we propose a
new basic neural network operator named position-aware circular convolution
(ParC) and its accelerated version Fast-ParC. The ParC operator can capture
global features by using a global kernel and circular convolution while keeping
location sensitiveness by employing position embeddings. Our Fast-ParC further
reduces the O(n^2) time complexity of ParC to O(n log n) using Fast Fourier
Transform. This acceleration makes it possible to use global convolution in the
early stages of models with large feature maps, yet still maintains the overall
computational cost comparable with using 3x3 or 7x7 kernels. The proposed
operation can be used in a plug-and-play manner to 1) convert ViTs to
pure-ConvNet architectures that enjoy wider hardware support and achieve higher
inference speed; 2) replace traditional convolutions in the deep stages of
ConvNets to improve accuracy by enlarging the effective receptive field.
Experimental results show that our ParC op can effectively enlarge the receptive
field of traditional ConvNets, and adopting the proposed op benefits both ViTs
and ConvNet models on all three popular vision tasks, image classification,
object
Comment: 19 pages, 8 figures, 11 tables. A preliminary version of this paper
has been published in ECCV 2022 and can be found in arXiv:2203.0395
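The complexity reduction mentioned in the abstract rests on the convolution theorem: a circular convolution with a global kernel costs O(n^2) when computed directly, but O(n log n) when computed as a pointwise product in the Fourier domain. The sketch below shows this for a 1D signal in NumPy. It is not the Fast-ParC implementation; the function names, the 1D setting, and the way a position embedding `pos` is simply added to the input are assumptions for illustration.

```python
import numpy as np

def circular_conv_direct(x, w):
    """O(n^2) circular convolution: y[i] = sum_j w[j] * x[(i - j) mod n]."""
    n = len(x)
    y = np.zeros(n)
    for i in range(n):
        for j in range(n):
            y[i] += w[j] * x[(i - j) % n]
    return y

def circular_conv_fft(x, w):
    """O(n log n) circular convolution via the convolution theorem."""
    return np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(w)))

rng = np.random.default_rng(0)
n = 64
x = rng.standard_normal(n)      # one row of a feature map (hypothetical)
pos = rng.standard_normal(n)    # position embedding (hypothetical, normally learned)
w = rng.standard_normal(n)      # global kernel, as long as the input

direct = circular_conv_direct(x + pos, w)
fast = circular_conv_fft(x + pos, w)
print(np.allclose(direct, fast))  # True
```

Without the position term, a circular convolution is equivariant to cyclic shifts of the input, so adding a position-dependent signal is one simple way to keep the output sensitive to location.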
Teacher Agent: A Non-Knowledge Distillation Method for Rehearsal-based Video Incremental Learning
With the rise in popularity of video-based social media, new categories of
videos are constantly being generated, creating an urgent need for robust
incremental learning techniques for video understanding. One of the biggest
challenges in this task is catastrophic forgetting, where the network tends to
forget previously learned data while learning new categories. To overcome this
issue, knowledge distillation is a widely used technique for rehearsal-based
video incremental learning that involves transferring important information on
similarities among different categories to enhance the student model.
Therefore, it is preferable to have a strong teacher model to guide the
students. However, the limited performance of the network itself and the
occurrence of catastrophic forgetting can result in the teacher network making
inaccurate predictions for some memory exemplars, ultimately limiting the
student network's performance. Based on these observations, we propose a
teacher agent capable of generating stable and accurate soft labels to replace
the output of the teacher model. This method circumvents the problem of
misleading knowledge caused by inaccurate predictions of the teacher model and
avoids the computational overhead of loading the teacher model for knowledge
distillation. Extensive experiments demonstrate the advantages of our method,
yielding significant performance improvements while utilizing only half the
resolution of video clips in the incremental phases as input compared to recent
state-of-the-art methods. Moreover, our method surpasses the performance of
joint training when employing four times the number of samples in episodic
memory.
Comment: Under review; Do We Really Need Knowledge Distillation for
Class-incremental Video Learning
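As a rough illustration of training a student against soft labels rather than a teacher network's live output, the sketch below computes a cross-entropy between a given soft-label distribution and the student's temperature-scaled softmax. How the teacher agent actually produces its soft labels is not described in the abstract, so `soft_labels` here is a placeholder array, and the function names and temperature value are assumptions.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over the last axis."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def soft_label_loss(student_logits, soft_labels, temperature=2.0):
    """Mean cross-entropy between soft labels and the student's softened prediction."""
    p = softmax(student_logits, temperature)
    return float(-np.mean(np.sum(soft_labels * np.log(p + 1e-12), axis=-1)))

temperature = 2.0
soft_labels = np.array([[0.7, 0.2, 0.1]])         # placeholder soft label for one exemplar
agree = np.log(soft_labels) * temperature         # student logits that reproduce the soft label
disagree = agree[:, ::-1]                         # student logits with the class order reversed

loss_agree = soft_label_loss(agree, soft_labels, temperature)
loss_disagree = soft_label_loss(disagree, soft_labels, temperature)
print(loss_agree < loss_disagree)  # True
```

Since the soft labels are plain arrays, no teacher model needs to be loaded or run while the student is trained, which mirrors the overhead saving the abstract describes.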