43 research outputs found
One-for-All: Bridge the Gap Between Heterogeneous Architectures in Knowledge Distillation
Knowledge distillation (KD) has proven to be a highly effective approach for
enhancing model performance through a teacher-student training scheme. However,
most existing distillation methods are designed under the assumption that the
teacher and student models belong to the same model family, particularly the
hint-based approaches. By using centered kernel alignment (CKA) to compare the
learned features between heterogeneous teacher and student models, we observe
significant feature divergence. This divergence illustrates the ineffectiveness
of previous hint-based methods in cross-architecture distillation. To tackle
the challenge of distilling heterogeneous models, we propose a simple yet
effective one-for-all KD framework called OFA-KD, which significantly improves
the distillation performance between heterogeneous architectures. Specifically,
we project intermediate features into an aligned latent space such as the
logits space, where architecture-specific information is discarded.
Additionally, we introduce an adaptive target enhancement scheme to prevent the
student from being disturbed by irrelevant information. Extensive experiments
with various architectures, including CNN, Transformer, and MLP, demonstrate
the superiority of our OFA-KD framework in enabling distillation between
heterogeneous architectures. In particular, when equipped with OFA-KD, the
student models achieve notable performance improvements, with a maximum gain of
8.0% on the CIFAR-100 dataset and 0.7% on the ImageNet-1K dataset. PyTorch code
and checkpoints can be found at https://github.com/Hao840/OFAKD
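
As a concrete illustration of distilling in an architecture-agnostic logits space, here is a minimal PyTorch sketch (not the official OFA-KD implementation, which is linked above); the module and function names are assumptions made for this example.

import torch
import torch.nn as nn
import torch.nn.functional as F

class LogitSpaceProjector(nn.Module):
    """Hypothetical head mapping an intermediate student feature to class logits."""
    def __init__(self, in_channels: int, num_classes: int):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(in_channels, num_classes)

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        if feat.dim() == 4:                 # CNN feature map: B x C x H x W
            feat = self.pool(feat).flatten(1)
        elif feat.dim() == 3:               # ViT/MLP tokens: B x N x C
            feat = feat.mean(dim=1)
        return self.fc(feat)

def logits_space_kd_loss(student_feat, teacher_logits, projector, tau: float = 4.0):
    """KL divergence between the projected student feature and the teacher logits."""
    student_logits = projector(student_feat)
    p_teacher = F.softmax(teacher_logits / tau, dim=-1)
    log_p_student = F.log_softmax(student_logits / tau, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * tau * tau

Because the distillation signal lives in the logits space, the spatial or token layout of the student's features no longer has to match the teacher's, which is what makes such a scheme architecture-agnostic.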
Evaluation of Magnetic Materials for Very High Frequency Power Applications
This paper investigates the loss characteristics of RF magnetic materials for power conversion applications in the 10 to 100 MHz range. A measurement method is proposed that provides a direct measurement of an inductor quality factor QL as a function of inductor current at RF frequencies, and enables indirect calculation of core loss as a function of flux density. Possible sources of error in measurement and calculation are evaluated and addressed. The proposed method is used to identify loss characteristics of several commercial RF magnetic-core materials. The loss characteristics of these materials, which have not previously been available, are illustrated and compared in tables and figures. The use of the method and data is demonstrated in the design of a magnetic-core inductor, which is applied in a 30-MHz inverter. The results of this paper are thus useful for the design of magnetic components for very high frequency applications.
Sponsors: Sheila and Emanuel Landsman Foundation; Interconnect Focus Center (United States. Defense Advanced Research Projects Agency and Semiconductor Research Corporation)
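
As a rough numerical illustration of the indirect core-loss calculation described above (convert a measured QL into an equivalent series resistance, subtract an estimated winding loss, and relate the remainder to peak flux density), here is a short Python sketch; the copper-loss value, turn count, and core area are placeholder assumptions, not data from the paper.

import math

def core_loss_from_q(freq_hz, inductance_h, i_peak, q_measured,
                     r_copper_ohm, n_turns, core_area_m2):
    """Return (peak flux density in tesla, estimated core loss in watts)."""
    omega = 2 * math.pi * freq_hz
    esr_total = omega * inductance_h / q_measured       # total series resistance from QL
    i_rms = i_peak / math.sqrt(2)                       # sinusoidal drive assumed
    p_total = i_rms ** 2 * esr_total                    # total inductor loss
    p_copper = i_rms ** 2 * r_copper_ohm                # estimated winding (copper) loss
    p_core = max(p_total - p_copper, 0.0)               # remainder attributed to the core
    b_peak = inductance_h * i_peak / (n_turns * core_area_m2)  # ungapped-core estimate
    return b_peak, p_core

# Example with placeholder values: 100 nH inductor at 30 MHz, 1 A peak, QL = 120.
b_pk, p_core = core_loss_from_q(30e6, 100e-9, 1.0, 120, 0.05, 3, 10e-6)
print(f"B_pk = {b_pk * 1e3:.1f} mT, estimated core loss = {p_core * 1e3:.1f} mW")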
Can Large Pre-trained Models Help Vision Models on Perception Tasks?
The recent upsurge of pre-trained large models (e.g., GPT-4) has swept across the entire deep learning community. Such powerful large language models (LLMs) demonstrate advanced generative ability and multimodal understanding capability, quickly achieving new state-of-the-art results on a variety of benchmarks. A pre-trained LLM usually plays the role of a universal AI model that can conduct various tasks, including contextual reasoning, article analysis, and image content comprehension. However, given the prohibitively high memory and computational cost of deploying such a large model, conventional models (such as CNNs and ViTs) are still essential for many visual perception tasks. In this paper, we propose to enhance the representation ability of ordinary vision models for perception tasks (e.g., image classification) by taking advantage of large pre-trained models. We present a new learning paradigm in which the knowledge extracted from large pre-trained models is utilized to help models such as CNNs and ViTs learn enhanced representations and achieve better performance. First, we curate a high-quality description set by prompting a multimodal LLM to generate descriptive text for all training images. We then feed these detailed descriptions into a pre-trained text encoder to extract text embeddings whose rich semantic information encodes the content of the images. During training, the text embeddings serve as extra supervision signals and are aligned with the image representations learned by the vision models. This alignment process helps the vision models learn better and achieve higher accuracy with the assistance of pre-trained LLMs. Extensive experiments verify that the proposed algorithm consistently improves the performance of various vision models with heterogeneous architectures.
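
A minimal sketch of the kind of training objective the abstract describes, under assumed names: precomputed text embeddings (from LLM-generated descriptions) supervise the vision model's representation alongside the usual classification loss. This is an illustration of the paradigm, not the paper's exact recipe.

import torch
import torch.nn as nn
import torch.nn.functional as F

class AlignedClassifier(nn.Module):
    """Hypothetical wrapper: any CNN/ViT backbone plus a projection into text space."""
    def __init__(self, backbone: nn.Module, feat_dim: int, text_dim: int, num_classes: int):
        super().__init__()
        self.backbone = backbone
        self.head = nn.Linear(feat_dim, num_classes)   # standard classification head
        self.proj = nn.Linear(feat_dim, text_dim)      # image feature -> text-embedding space

    def forward(self, images: torch.Tensor):
        feat = self.backbone(images)                   # B x feat_dim
        return self.head(feat), self.proj(feat)

def training_loss(logits, img_emb, labels, text_emb, alpha: float = 0.5):
    """Cross-entropy plus cosine alignment with frozen, precomputed text embeddings."""
    ce = F.cross_entropy(logits, labels)
    align = 1.0 - F.cosine_similarity(img_emb, text_emb, dim=-1).mean()
    return ce + alpha * align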
A Survey on Transformer Compression
The Transformer plays a vital role in natural language processing (NLP) and computer vision (CV), especially for constructing large language models (LLMs) and large vision models (LVMs). Model compression methods reduce the memory and computational cost of the Transformer, a necessary step for deploying large language/vision models on practical devices. Given the unique architecture of the Transformer, with its alternating attention and feedforward neural network (FFN) modules, specific compression techniques are usually
required. The efficiency of these compression methods is also paramount, as
retraining large models on the entire training dataset is usually impractical.
This survey provides a comprehensive review of recent compression methods, with
a specific focus on their application to Transformer-based models. The
compression methods are primarily categorized into pruning, quantization,
knowledge distillation, and efficient architecture design (Mamba, RetNet, RWKV,
etc.). In each category, we discuss compression methods for both language and
vision tasks, highlighting common underlying principles. Finally, we delve into
the relations among various compression methods and discuss further research directions in this domain.
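
The survey itself is a review and contains no code; as a small concrete illustration of one of the categories it covers (unstructured pruning), the sketch below zeroes out the smallest-magnitude weights of a Transformer FFN block with PyTorch's standard pruning utilities. The layer sizes and sparsity level are arbitrary assumptions.

import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# A Transformer-style FFN block: expand, nonlinearity, project back.
ffn = nn.Sequential(nn.Linear(768, 3072), nn.GELU(), nn.Linear(3072, 768))

# Zero out the 50% smallest-magnitude weights in each linear layer.
for module in ffn:
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.5)
        prune.remove(module, "weight")        # bake the pruning mask into the weights

linear_layers = [m for m in ffn if isinstance(m, nn.Linear)]
sparsity = sum((m.weight == 0).float().mean().item() for m in linear_layers) / len(linear_layers)
print(f"average FFN weight sparsity: {sparsity:.0%}")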
Efficient Vision Transformers via Fine-Grained Manifold Distillation
This paper studies the model compression problem of vision transformers. Benefiting from the self-attention module, transformer architectures have shown extraordinary performance on many computer vision tasks. Although network performance is boosted, transformers often require more computational resources, including memory usage and inference complexity. In contrast to existing knowledge distillation approaches, we propose to excavate useful information from the teacher transformer through the relationships between images and their divided patches. We then explore an efficient fine-grained manifold distillation approach that simultaneously calculates cross-image, cross-patch, and randomly selected manifolds in the teacher and student models. Experimental results on several benchmarks demonstrate the superiority of the proposed algorithm for distilling portable transformer models with higher performance. For example, our approach achieves 75.06% Top-1 accuracy on the ImageNet-1k dataset when training a DeiT-Tiny model, outperforming other ViT distillation methods.
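
To give a flavor of manifold-style relation distillation, the sketch below builds patch-level similarity (relation) matrices in the teacher and student and matches them; the cross-image / cross-patch / randomly selected decomposition of the actual method is collapsed here into a single randomly sampled relation matrix, so this is a simplification rather than the paper's algorithm.

import torch
import torch.nn.functional as F

def manifold_kd_loss(f_student: torch.Tensor, f_teacher: torch.Tensor,
                     num_samples: int = 256) -> torch.Tensor:
    """f_*: patch tokens of shape B x N x C (channel widths may differ between models)."""
    B, N, _ = f_student.shape
    # Pool patches from all images in the batch and sample a random subset.
    idx = torch.randperm(B * N)[:num_samples]
    s = F.normalize(f_student.reshape(B * N, -1)[idx], dim=-1)
    t = F.normalize(f_teacher.reshape(B * N, -1)[idx], dim=-1)
    # Relation (manifold) matrices: pairwise cosine similarities between sampled patches.
    rel_s = s @ s.t()
    rel_t = t @ t.t()
    return F.mse_loss(rel_s, rel_t)

In practice such a term would be added, with a small weight, to the usual classification and logit-distillation losses.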
SAM-DiffSR: Structure-Modulated Diffusion Model for Image Super-Resolution
Diffusion-based super-resolution (SR) models have recently garnered significant attention due to their potent restoration capabilities. However, conventional diffusion models sample noise from a single distribution, constraining their ability to handle real-world scenes and complex textures across semantic regions. With the success of the Segment Anything Model (SAM), generating sufficiently fine-grained region masks can enhance the detail recovery of diffusion-based SR models. However, directly integrating SAM into SR models results in a much higher computational cost. In this paper, we propose the SAM-DiffSR model, which utilizes the fine-grained structural information from SAM when sampling noise to improve image quality without additional computational cost during inference. During training, we encode structural position information into the segmentation mask from SAM. The encoded mask is then integrated into the forward diffusion process by modulating the sampled noise with it. This adjustment allows us to independently adapt the noise mean within each corresponding segmentation area. The diffusion model is trained to estimate this modulated noise. Crucially, our proposed framework does NOT change the reverse diffusion process and does NOT require SAM at inference. Experimental results demonstrate the effectiveness of our proposed method, showcasing superior performance in suppressing artifacts and surpassing existing diffusion-based methods by up to 0.74 dB in PSNR on the DIV2K dataset. The code and dataset are available at
https://github.com/lose4578/SAM-DiffSR
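
A minimal sketch of the structure-modulated forward diffusion described above: standard Gaussian noise receives a per-region mean offset derived from the SAM segmentation mask, and the denoiser is trained to predict this modulated noise. The mask-encoding function here is a placeholder assumption, not the paper's actual encoder or code from the linked repository.

import torch

def encode_mask_to_mean(region_ids: torch.Tensor, scale: float = 0.1) -> torch.Tensor:
    """Placeholder encoding: map each region id to a small constant offset."""
    return scale * torch.sin(region_ids.float())

def modulated_forward_diffusion(x0, alpha_bar_t, region_ids):
    """
    x0:          B x C x H x W clean (high-resolution) image
    alpha_bar_t: B x 1 x 1 x 1 cumulative noise-schedule term at step t
    region_ids:  B x 1 x H x W integer segmentation mask from SAM
    """
    mean_map = encode_mask_to_mean(region_ids)          # per-region mean offset
    eps = torch.randn_like(x0)
    eps_mod = eps + mean_map                            # shift the noise mean per region
    x_t = alpha_bar_t.sqrt() * x0 + (1 - alpha_bar_t).sqrt() * eps_mod
    return x_t, eps_mod                                 # the model learns to predict eps_mod

Because only the forward (training-time) noise is modulated, sampling at inference proceeds exactly as in a standard diffusion SR model, which is why no extra inference cost is incurred.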
LightCLIP: Learning Multi-Level Interaction for Lightweight Vision-Language Models
Vision-language pre-training such as CLIP has shown promising performance on various downstream tasks such as zero-shot image classification and image-text retrieval. Most existing CLIP-like works adopt relatively large image encoders such as ResNet-50 and ViT, while lightweight counterparts are rarely discussed. In this paper, we propose a multi-level interaction paradigm for training lightweight CLIP models. First, to mitigate the problem that some image-text pairs are not in strict one-to-one correspondence, we improve the conventional global instance-level alignment objective by progressively softening the labels of negative samples. Second, a relaxed bipartite-matching-based token-level alignment objective is introduced for finer-grained alignment between image patches and textual words. Moreover, based on the observation that the accuracy of a CLIP model does not increase correspondingly as the text encoder grows, an extra masked language modeling (MLM) objective is leveraged to maximize the potential of the shortened text encoder. In practice, an auxiliary fusion module that injects unmasked image embeddings into masked text embeddings at different network stages is proposed to enhance the MLM. Extensive experiments show that, without introducing additional computational cost during inference, the proposed method achieves higher performance on multiple downstream tasks.
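
As a simplified sketch of the softened instance-level alignment: the usual CLIP contrastive loss, but with one-hot targets smoothed so that negatives of not-strictly-unique pairs are penalized less. The progressive softening schedule is reduced to a fixed epsilon here, and the token-level and MLM objectives are omitted, so this illustrates only the first component.

import torch
import torch.nn.functional as F

def soft_clip_loss(img_emb: torch.Tensor, txt_emb: torch.Tensor,
                   temperature: float = 0.07, eps: float = 0.1) -> torch.Tensor:
    """Symmetric contrastive loss with label-smoothed (softened) negative targets."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature          # B x B similarity matrix
    B = logits.size(0)
    targets = torch.full((B, B), eps / (B - 1), device=logits.device)
    targets.fill_diagonal_(1.0 - eps)                     # softened one-hot targets
    loss_i2t = -(targets * F.log_softmax(logits, dim=1)).sum(dim=1).mean()
    loss_t2i = -(targets * F.log_softmax(logits.t(), dim=1)).sum(dim=1).mean()
    return 0.5 * (loss_i2t + loss_t2i)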
Circuits and passive components for radio-frequency power conversion
Thesis (Ph.D.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2010. Cataloged from the PDF version of the thesis. Includes bibliographical references (p. 141-149).
This thesis focuses on developing technology for high-efficiency power converters operating at very high frequencies. The work involves two aspects of such converters: RF (radio-frequency) power circuit design techniques, and magnetic material characterization and application. In the area of circuit design techniques, the thesis investigates a new class of matching networks that overcomes a major limitation of RF converter circuits: their high sensitivity to loading conditions. These networks, termed resistance compression networks, serve to substantially decrease the variation in effective resistance seen by a tuned RF inverter as loading conditions change. The thesis explores the operation, performance characteristics, and design of these networks, and presents experimental results demonstrating their performance. The thesis also presents analysis and design considerations for lumped (inductor and capacitor) matching networks operating at high efficiency (> 95%). Formulas for calculating matching network efficiency are given and used to evaluate the optimum number of matching stages as a function of conversion ratio. Both simulation and experimental results are presented that validate the analytical formulation.
In the area of magnetic materials and applications, the thesis investigates the loss characteristics of several commercial RF magnetic materials for power conversion applications in the 10 MHz to 100 MHz range. A measurement method is proposed to identify the loss characteristics of different commercial RF magnetic-core materials. The loss characteristics of these materials, which have not previously been available, are illustrated and compared in tables and figures. Based on the results of this characterization, the thesis describes a procedure for designing magnetic components with low-permeability magnetic materials for very high frequency power conversion applications. This procedure provides a method to compare and evaluate different magnetic materials for given specifications of a magnetic-core inductor. Important quantities, e.g., the quality factor and size of the inductor, can be predicted before the final design. The thesis also investigates related problems such as the optimization of a magnetic-core inductor.
by Yehui Han. Ph.D.
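
As a rough illustration of the matching-network efficiency analysis mentioned above, the sketch below uses the standard first-order approximation that an L-section stage with transformation ratio r and inductor quality factor Q has efficiency of roughly 1 - sqrt(r - 1)/Q, and evaluates the optimum number of cascaded stages for a given overall ratio. This is a textbook-level approximation for illustration, not necessarily the exact formulation derived in the thesis, and the example values are placeholders.

def lmatch_efficiency(ratio: float, n_stages: int, q_inductor: float) -> float:
    """Approximate efficiency of n_stages cascaded L-sections with total ratio `ratio`."""
    per_stage_ratio = ratio ** (1.0 / n_stages)
    per_stage_eff = 1.0 - (per_stage_ratio - 1.0) ** 0.5 / q_inductor
    return per_stage_eff ** n_stages

# Example with placeholder values: 50:1 resistance transformation, inductor Q of 100.
ratio, q = 50.0, 100.0
for n in range(1, 7):
    print(f"{n} stage(s): efficiency ~ {lmatch_efficiency(ratio, n, q):.3f}")
best = max(range(1, 7), key=lambda n: lmatch_efficiency(ratio, n, q))
print(f"optimum number of stages ~ {best}")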