From Quantum Graph Computing to Quantum Graph Learning: A Survey
Quantum computing (QC) is a new computational paradigm grounded in quantum
physics. Notable progress has been made, spurring a series of quantum
algorithms that exploit quantum computational power. In this paper, we provide
a targeted survey of the development of QC for graph-related tasks. We first
elaborate on the connections between quantum mechanics and graph theory to show
that, for some graph-related problems, quantum computers can efficiently
produce useful solutions that classical systems cannot. Given its practicality
and wide applicability, we then briefly review typical graph learning
techniques designed for various tasks. Inspired by these powerful methods, we
note that advanced quantum algorithms have been proposed for characterizing
graph structures. We give a snapshot of quantum graph learning, with the
expectation that it will serve as a catalyst for subsequent research. We
further discuss the challenges of using quantum algorithms in graph learning,
and future directions towards more flexible and versatile quantum graph
learning solvers.
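As an illustrative aside (not taken from the survey itself), MaxCut is a canonical graph problem targeted by quantum algorithms such as QAOA. The brute-force sketch below shows the objective such solvers aim to approximate without exhaustive enumeration; the graph and helper name are arbitrary choices for illustration.

```python
# Illustrative sketch (not from the survey): the MaxCut objective that quantum
# algorithms such as QAOA optimize over a graph. We enumerate all 2^n cuts
# classically here; quantum approaches aim to approximate the optimum without
# exhaustive search.
import itertools


def maxcut_brute_force(edges, n_nodes):
    """Return the best cut value and node assignment for a small graph."""
    best_value, best_assignment = -1, None
    for bits in itertools.product([0, 1], repeat=n_nodes):
        # An edge is cut when its endpoints lie on different sides.
        value = sum(1 for u, v in edges if bits[u] != bits[v])
        if value > best_value:
            best_value, best_assignment = value, bits
    return best_value, best_assignment


# Toy 4-node cycle graph: the optimal cut value is 4.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(maxcut_brute_force(edges, n_nodes=4))
```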
Can Large Pre-trained Models Help Vision Models on Perception Tasks?
The recent upsurge of pre-trained large models (e.g., GPT-4) has swept across
the entire deep learning community. Such powerful large language models (LLMs)
demonstrate advanced generative ability and multimodal understanding
capability, quickly achieving new state-of-the-art performance on a variety of
benchmarks. A pre-trained LLM usually plays the role of a universal AI model
that can conduct various tasks, including context reasoning, article analysis,
and image content comprehension. However, given the prohibitively high memory
and computational cost of deploying such a large model, conventional models
(such as CNNs and ViTs) remain essential for many visual perception tasks. In
this paper, we propose to enhance the representation ability of ordinary vision
models on perception tasks (e.g., image classification) by taking advantage of
large pre-trained models. We present a new learning paradigm in which the
knowledge extracted from large pre-trained models is used to help models like
CNNs and ViTs learn enhanced representations and achieve better performance.
First, we curate a high-quality description set by prompting a multimodal LLM
to generate descriptive text for all training images. We then feed these
detailed descriptions into a pre-trained text encoder to extract text
embeddings whose rich semantic information encodes the content of the images.
During training, the text embeddings serve as extra supervisory signals and are
aligned with the image representations learned by the vision models. This
alignment helps the vision models learn better and achieve higher accuracy with
the assistance of pre-trained LLMs. We conduct extensive experiments verifying
that the proposed algorithm consistently improves the performance of various
vision models with heterogeneous architectures.
Comment: 9 pages, 5 figures
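A minimal PyTorch sketch of the alignment idea described above. The module names, the projection head, and the cosine-alignment loss are illustrative assumptions rather than the paper's exact implementation; the text embeddings are assumed to be precomputed from the LLM-generated descriptions.

```python
# Minimal sketch: align vision features with precomputed text embeddings from
# LLM-generated image descriptions. Names and the cosine loss are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AlignedVisionModel(nn.Module):
    def __init__(self, backbone, feat_dim, text_dim, num_classes):
        super().__init__()
        self.backbone = backbone                   # e.g., a CNN or ViT trunk
        self.classifier = nn.Linear(feat_dim, num_classes)
        self.proj = nn.Linear(feat_dim, text_dim)  # map image feats to text space

    def forward(self, images):
        feats = self.backbone(images)              # assumed shape (B, feat_dim)
        return self.classifier(feats), self.proj(feats)


def training_loss(logits, labels, img_emb, text_emb, alpha=0.5):
    """Cross-entropy plus cosine alignment with the text embedding."""
    ce = F.cross_entropy(logits, labels)
    align = 1.0 - F.cosine_similarity(img_emb, text_emb, dim=-1).mean()
    return ce + alpha * align
```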
Efficient Vision Transformers via Fine-Grained Manifold Distillation
This paper studies the model compression problem of vision transformers.
Benefiting from the self-attention module, transformer architectures have shown
extraordinary performance on many computer vision tasks. Although network
performance is boosted, transformers often require more computational
resources, including memory usage and inference complexity. In contrast to
existing knowledge distillation approaches, we propose to excavate useful
information from the teacher transformer through the relationship between
images and their divided patches. We then explore an efficient fine-grained
manifold distillation approach that simultaneously calculates cross-image,
cross-patch, and randomly selected manifolds in the teacher and student models.
Experimental results on several benchmarks demonstrate the superiority of the
proposed algorithm for distilling portable transformer models with higher
performance. For example, our approach achieves 75.06% Top-1 accuracy on the
ImageNet-1k dataset when training a DeiT-Tiny model, outperforming other ViT
distillation methods.
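A hedged sketch of the manifold-matching idea: normalized patch-similarity matrices are computed in both teacher and student and matched with an MSE loss. The paper's cross-image and random-sampling manifolds are simplified here to the within-image case.

```python
# Sketch: match patch-level relation (manifold) matrices between teacher and
# student ViTs. This covers only the within-image manifold for illustration.
import torch
import torch.nn.functional as F


def manifold_loss(student_patches, teacher_patches):
    """student_patches, teacher_patches: (B, N, D) patch token features."""
    s = F.normalize(student_patches, dim=-1)
    t = F.normalize(teacher_patches, dim=-1)
    # Patch-similarity matrices within each image: shape (B, N, N).
    rel_s = s @ s.transpose(-2, -1)
    rel_t = t @ t.transpose(-2, -1)
    return F.mse_loss(rel_s, rel_t)
```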
One-for-All: Bridge the Gap Between Heterogeneous Architectures in Knowledge Distillation
Knowledge distillation (KD) has proven to be a highly effective approach for
enhancing model performance through a teacher-student training scheme. However,
most existing distillation methods are designed under the assumption that the
teacher and student models belong to the same model family, particularly the
hint-based approaches. By using centered kernel alignment (CKA) to compare the
learned features between heterogeneous teacher and student models, we observe
significant feature divergence. This divergence illustrates the ineffectiveness
of previous hint-based methods in cross-architecture distillation. To tackle
the challenge in distilling heterogeneous models, we propose a simple yet
effective one-for-all KD framework called OFA-KD, which significantly improves
the distillation performance between heterogeneous architectures. Specifically,
we project intermediate features into an aligned latent space such as the
logits space, where architecture-specific information is discarded.
Additionally, we introduce an adaptive target enhancement scheme to prevent the
student from being disturbed by irrelevant information. Extensive experiments
with various architectures, including CNN, Transformer, and MLP, demonstrate
the superiority of our OFA-KD framework in enabling distillation between
heterogeneous architectures. In particular, when equipped with our OFA-KD, the
student models achieve notable performance improvements, with a maximum gain of
8.0% on the CIFAR-100 dataset and 0.7% on the ImageNet-1K dataset. PyTorch code
and checkpoints can be found at https://github.com/Hao840/OFAKD.
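A hedged sketch of the core OFA-KD idea: a student's intermediate feature is projected into the logits space and distilled against target-enhanced teacher logits. The projector design and the enhancement rule below are illustrative assumptions; the official repository has the exact formulation.

```python
# Sketch: distill heterogeneous architectures in the logits space. The
# projector and the target-enhancement rule are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LogitsProjector(nn.Module):
    """Maps an architecture-specific intermediate feature to class logits.

    Assumes the feature has already been pooled to a (B, feat_dim) vector.
    """

    def __init__(self, feat_dim, num_classes):
        super().__init__()
        self.head = nn.Linear(feat_dim, num_classes)

    def forward(self, feat):
        return self.head(feat)


def ofa_kd_loss(student_branch_logits, teacher_logits, labels, tau=4.0, gamma=1.0):
    """KL distillation in logits space with a simple target-enhancement term."""
    t_prob = F.softmax(teacher_logits / tau, dim=-1)
    # Boost the ground-truth class mass so the student is less disturbed by
    # irrelevant (non-target) information, then renormalize.
    one_hot = F.one_hot(labels, t_prob.size(-1)).float()
    t_prob = t_prob + gamma * one_hot * t_prob
    t_prob = t_prob / t_prob.sum(dim=-1, keepdim=True)
    log_s = F.log_softmax(student_branch_logits / tau, dim=-1)
    return F.kl_div(log_s, t_prob, reduction="batchmean") * tau * tau
```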
Beyond Dropout: Feature Map Distortion to Regularize Deep Neural Networks
Deep neural networks often contain a large number of trainable parameters for
extracting powerful features from given datasets. On one hand, these massive
trainable parameters significantly enhance the performance of deep networks. On
the other hand, they bring the risk of over-fitting. To this end, dropout-based
methods disable some elements of the output feature maps during training to
reduce the co-adaptation of neurons. Although these approaches can enhance the
generalization ability of the resulting models, conventional binary dropout is
not the optimal solution. We therefore investigate the empirical Rademacher
complexity of the intermediate layers of deep neural networks and propose a
feature distortion method (Disout) to address the over-fitting problem. During
training, randomly selected elements in the feature maps are replaced with
specific values derived by exploiting the generalization error bound. The
superiority of the proposed feature map distortion for producing deep neural
networks with higher test performance is analyzed and demonstrated on several
benchmark image datasets.
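A hedged sketch of the feature-distortion idea: unlike dropout's zeroing, randomly selected feature-map elements are replaced with perturbed values. The uniform perturbation used here is an illustrative stand-in for the bound-derived distortion in the paper.

```python
# Sketch: a dropout-like module that perturbs, rather than zeroes, randomly
# selected feature-map elements. The uniform noise is an assumption standing
# in for Disout's bound-derived distortion values.
import torch
import torch.nn as nn


class FeatureDistortion(nn.Module):
    def __init__(self, dist_prob=0.1, alpha=1.0):
        super().__init__()
        self.dist_prob = dist_prob   # fraction of elements to distort
        self.alpha = alpha           # distortion strength

    def forward(self, x):
        if not self.training:
            return x                 # identity at inference, like dropout
        mask = torch.rand_like(x) < self.dist_prob
        # Scale the perturbation by each sample's feature-value range.
        dims = tuple(range(1, x.dim()))
        span = x.amax(dim=dims, keepdim=True) - x.amin(dim=dims, keepdim=True)
        noise = (torch.rand_like(x) - 0.5) * self.alpha * span
        return torch.where(mask, x + noise, x)
```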