43 research outputs found

    One-for-All: Bridge the Gap Between Heterogeneous Architectures in Knowledge Distillation

    Knowledge distillation (KD) has proven to be a highly effective approach for enhancing model performance through a teacher-student training scheme. However, most existing distillation methods are designed under the assumption that the teacher and student models belong to the same model family, particularly the hint-based approaches. By using centered kernel alignment (CKA) to compare the features learned by heterogeneous teacher and student models, we observe significant feature divergence. This divergence illustrates the ineffectiveness of previous hint-based methods in cross-architecture distillation. To tackle the challenge of distilling heterogeneous models, we propose a simple yet effective one-for-all KD framework called OFA-KD, which significantly improves the distillation performance between heterogeneous architectures. Specifically, we project intermediate features into an aligned latent space, such as the logit space, where architecture-specific information is discarded. Additionally, we introduce an adaptive target enhancement scheme to prevent the student from being disturbed by irrelevant information. Extensive experiments with various architectures, including CNNs, Transformers, and MLPs, demonstrate the superiority of our OFA-KD framework in enabling distillation between heterogeneous architectures. When equipped with OFA-KD, the student models achieve notable performance improvements, with a maximum gain of 8.0% on the CIFAR-100 dataset and 0.7% on the ImageNet-1K dataset. PyTorch code and checkpoints can be found at https://github.com/Hao840/OFAKD
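
    A minimal sketch of the logit-space idea described above, assuming a simple pooled-and-linear exit head, a temperature of 4, and a KL-based objective; the authors' exact head design and adaptive target enhancement scheme are not reproduced here:

        # Sketch only: project an intermediate student feature into the shared logit
        # space and distill against the teacher's logits there. Head, temperature,
        # and loss weight are illustrative assumptions, not the OFA-KD implementation.
        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        class LogitSpaceProjector(nn.Module):
            """Maps an intermediate student feature map to the shared logit space."""
            def __init__(self, in_channels: int, num_classes: int):
                super().__init__()
                self.pool = nn.AdaptiveAvgPool2d(1)            # collapse spatial dims
                self.fc = nn.Linear(in_channels, num_classes)  # architecture-agnostic space

            def forward(self, feat: torch.Tensor) -> torch.Tensor:
                return self.fc(self.pool(feat).flatten(1))

        def logit_space_kd_loss(branch_logits, teacher_logits, labels, tau=4.0, alpha=1.0):
            """Cross-entropy on labels plus KL between softened branch and teacher logits."""
            kd = F.kl_div(
                F.log_softmax(branch_logits / tau, dim=1),
                F.softmax(teacher_logits / tau, dim=1),
                reduction="batchmean",
            ) * tau * tau
            ce = F.cross_entropy(branch_logits, labels)
            return ce + alpha * kd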

    Evaluation of Magnetic Materials for Very High Frequency Power Applications

    This paper investigates the loss characteristics of RF magnetic materials for power conversion applications in the 10 MHz to 100 MHz range. A measurement method is proposed that provides a direct measurement of the inductor quality factor Q_L as a function of inductor current at RF frequencies, and enables indirect calculation of core loss as a function of flux density. Possible sources of error in measurement and calculation are evaluated and addressed. The proposed method is used to identify the loss characteristics of several commercial RF magnetic-core materials. The loss characteristics of these materials, which have not previously been available, are illustrated and compared in tables and figures. The use of the method and data is demonstrated in the design of a magnetic-core inductor, which is applied in a 30-MHz inverter. The results of this paper are thus useful for the design of magnetic components for very high frequency applications. Sponsors: Sheila and Emanuel Landsman Foundation; Interconnect Focus Center (United States Defense Advanced Research Projects Agency and Semiconductor Research Corporation).
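
    A rough numerical illustration of how core loss can be inferred from a measured quality factor; the paper's actual procedure is more careful about error sources, and the ungapped-toroid flux-density formula and example values below are assumptions for illustration only:

        # Back-of-the-envelope relations: Q_L = omega*L / R_esr, total loss = Irms^2 * R_esr,
        # core loss = total loss minus an estimate of winding (copper) loss.
        import math

        MU0 = 4e-7 * math.pi  # permeability of free space, H/m

        def esr_from_q(freq_hz, inductance_h, q_measured):
            """Equivalent series resistance implied by a measured quality factor."""
            return 2 * math.pi * freq_hz * inductance_h / q_measured

        def core_loss_w(freq_hz, inductance_h, q_measured, i_rms, r_winding):
            """Total inductor loss from Q, minus an estimated winding resistance loss."""
            r_total = esr_from_q(freq_hz, inductance_h, q_measured)
            return i_rms ** 2 * (r_total - r_winding)

        def peak_flux_density_t(n_turns, i_peak, mu_r, path_length_m):
            """Peak flux density in an ungapped toroid: B = mu0 * mur * N * Ipk / le."""
            return MU0 * mu_r * n_turns * i_peak / path_length_m

        # Example (hypothetical values): 100 nH inductor at 30 MHz, measured Q of 150 at 1 A rms,
        # with an estimated 50 mOhm of winding resistance.
        p_core = core_loss_w(30e6, 100e-9, 150, 1.0, r_winding=0.05)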

    Can Large Pre-trained Models Help Vision Models on Perception Tasks?

    The recent upsurge in pre-trained large models (e.g., GPT-4) has swept across the entire deep learning community. Such powerful large language models (LLMs) demonstrate advanced generative ability and multimodal understanding capability, quickly achieving new state-of-the-art performance on a variety of benchmarks. A pre-trained LLM usually plays the role of a universal AI model that can conduct various tasks, including context reasoning, article analysis, and image content comprehension. However, considering the prohibitively high memory and computational cost of deploying such a large model, conventional models (such as CNNs and ViTs) are still essential for many visual perception tasks. In this paper, we propose to enhance the representation ability of ordinary vision models for perception tasks (e.g., image classification) by taking advantage of large pre-trained models. We present a new learning paradigm in which the knowledge extracted from large pre-trained models is utilized to help models like CNNs and ViTs learn enhanced representations and achieve better performance. First, we curate a high-quality description set by prompting a multimodal LLM to generate descriptive text for all training images. We then feed these detailed descriptions into a pre-trained encoder to extract text embeddings with rich semantic information that encodes the content of the images. During training, the text embeddings serve as extra supervision signals and are aligned with the image representations learned by the vision models. The alignment process helps vision models learn better and achieve higher accuracy with the assistance of pre-trained LLMs. We conduct extensive experiments to verify that the proposed algorithm consistently improves the performance of various vision models with heterogeneous architectures. Comment: 9 pages, 5 figures
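
    A minimal sketch of the alignment step described above, assuming the text embeddings are precomputed from a frozen encoder and that a linear projection plus a cosine-similarity term is used; the projection head, loss form, and weight are assumptions, not the paper's implementation:

        # Sketch: classification loss plus alignment of projected vision features
        # to precomputed text embeddings of LLM-generated image descriptions.
        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        class AlignmentHead(nn.Module):
            """Projects a vision feature into the text-embedding space."""
            def __init__(self, vision_dim: int, text_dim: int):
                super().__init__()
                self.proj = nn.Linear(vision_dim, text_dim)

            def forward(self, vision_feat: torch.Tensor) -> torch.Tensor:
                return self.proj(vision_feat)

        def classification_with_alignment(logits, labels, vision_feat, text_emb, head, beta=0.5):
            """Cross-entropy plus (1 - cosine similarity) between vision and text embeddings."""
            ce = F.cross_entropy(logits, labels)
            v = F.normalize(head(vision_feat), dim=-1)
            t = F.normalize(text_emb, dim=-1)
            align = (1.0 - (v * t).sum(dim=-1)).mean()
            return ce + beta * align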

    A Survey on Transformer Compression

    Transformers play a vital role in the realms of natural language processing (NLP) and computer vision (CV), especially for constructing large language models (LLMs) and large vision models (LVMs). Model compression methods reduce the memory and computational cost of Transformers, which is a necessary step for implementing large language/vision models on practical devices. Given the unique architecture of the Transformer, featuring alternating attention and feedforward neural network (FFN) modules, specific compression techniques are usually required. The efficiency of these compression methods is also paramount, as retraining large models on the entire training dataset is usually impractical. This survey provides a comprehensive review of recent compression methods, with a specific focus on their application to Transformer-based models. The compression methods are primarily categorized into pruning, quantization, knowledge distillation, and efficient architecture design (Mamba, RetNet, RWKV, etc.). In each category, we discuss compression methods for both language and vision tasks, highlighting common underlying principles. Finally, we delve into the relation between the various compression methods and discuss further directions in this domain. Comment: Model Compression, Transformer, Large Language Model, Large Vision Model
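
    As a toy, self-contained illustration of one of the four categories named above (unstructured magnitude pruning of a single linear layer), not taken from the survey itself; the sparsity level and layer size are arbitrary assumptions:

        # Zero out the smallest-magnitude weights of a linear layer in place.
        import torch
        import torch.nn as nn

        def magnitude_prune_(linear: nn.Linear, sparsity: float = 0.5) -> None:
            """Set the smallest `sparsity` fraction of weights (by magnitude) to zero."""
            w = linear.weight.data
            k = int(w.numel() * sparsity)
            if k == 0:
                return
            threshold = w.abs().flatten().kthvalue(k).values  # k-th smallest magnitude
            mask = w.abs() > threshold
            w.mul_(mask)

        layer = nn.Linear(768, 3072)          # a typical Transformer FFN projection size
        magnitude_prune_(layer, sparsity=0.5)  # roughly half the weights become zero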

    Efficient Vision Transformers via Fine-Grained Manifold Distillation

    This paper studies the model compression problem for vision transformers. Benefiting from the self-attention module, Transformer architectures have shown extraordinary performance on many computer vision tasks. Although network performance is boosted, Transformers often require more computational resources, including memory usage and inference complexity. In contrast to existing knowledge distillation approaches, we propose to excavate useful information from the teacher transformer through the relationships between images and their divided patches. We then explore an efficient fine-grained manifold distillation approach that simultaneously calculates cross-image, cross-patch, and randomly selected manifolds in the teacher and student models. Experimental results on several benchmarks demonstrate the superiority of the proposed algorithm for distilling portable transformer models with higher performance. For example, our approach achieves 75.06% Top-1 accuracy on the ImageNet-1k dataset when training a DeiT-Tiny model, which outperforms other ViT distillation methods.
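
    A simplified sketch of relation (manifold) matching between teacher and student patch tokens; the paper's cross-image / cross-patch / randomly sampled decomposition is more involved, and the sampling size here is an assumption:

        # Match the cosine-similarity structure over (image, patch) token pairs.
        import torch
        import torch.nn.functional as F

        def relation_matrix(tokens: torch.Tensor) -> torch.Tensor:
            """Cosine-similarity Gram matrix over all (image, patch) embeddings, shape (B*N, B*N)."""
            b, n, d = tokens.shape
            flat = F.normalize(tokens.reshape(b * n, d), dim=-1)
            return flat @ flat.t()

        def manifold_distill_loss(student_tokens, teacher_tokens, num_samples=256):
            """MSE between randomly sampled sub-blocks of the two relation matrices."""
            rs = relation_matrix(student_tokens)   # student: (B, N, D_s)
            rt = relation_matrix(teacher_tokens)   # teacher: (B, N, D_t), same B and N
            idx = torch.randperm(rs.size(0), device=rs.device)[:num_samples]
            return F.mse_loss(rs[idx][:, idx], rt[idx][:, idx])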

    SAM-DiffSR: Structure-Modulated Diffusion Model for Image Super-Resolution

    Diffusion-based super-resolution (SR) models have recently garnered significant attention due to their potent restoration capabilities. However, conventional diffusion models sample noise from a single distribution, constraining their ability to handle real-world scenes and complex textures across semantic regions. With the success of the Segment Anything Model (SAM), sufficiently fine-grained region masks can be generated to enhance the detail recovery of diffusion-based SR models. However, directly integrating SAM into SR models would result in a much higher computational cost. In this paper, we propose the SAM-DiffSR model, which utilizes the fine-grained structure information from SAM in the process of sampling noise to improve image quality without additional computational cost during inference. During training, we encode structural position information into the segmentation mask from SAM. The encoded mask is then integrated into the forward diffusion process by modulating the sampled noise with it. This adjustment allows us to independently adapt the noise mean within each corresponding segmentation area. The diffusion model is trained to estimate this modulated noise. Crucially, our proposed framework does NOT change the reverse diffusion process and does NOT require SAM at inference. Experimental results demonstrate the effectiveness of the proposed method, showcasing superior performance in suppressing artifacts and surpassing existing diffusion-based methods by up to 0.74 dB in PSNR on the DIV2K dataset. The code and dataset are available at https://github.com/lose4578/SAM-DiffSR
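
    A minimal sketch of a structure-modulated forward process in the spirit described above: the sampled Gaussian noise receives a region-wise mean shift derived from the SAM mask, and the model is trained to predict the modulated noise. How the shift is produced from the mask here (a learnable per-region embedding) is an assumption, not the paper's encoding:

        # Standard q(x_t | x_0) with the noise mean shifted per segmentation region.
        import torch
        import torch.nn as nn

        class RegionMeanShift(nn.Module):
            """Turns an integer segmentation mask (B, H, W, torch.long) into a per-pixel mean offset."""
            def __init__(self, max_regions: int = 64, channels: int = 3):
                super().__init__()
                self.embed = nn.Embedding(max_regions, channels)

            def forward(self, mask: torch.Tensor) -> torch.Tensor:
                return self.embed(mask).permute(0, 3, 1, 2)  # (B, C, H, W)

        def forward_diffuse(x0, t, alphas_cumprod, mask, shift_module):
            """Forward diffusion step with region-modulated noise; returns x_t and the training target."""
            noise = torch.randn_like(x0) + shift_module(mask)   # modulated noise
            a_bar = alphas_cumprod[t].view(-1, 1, 1, 1)
            xt = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * noise
            return xt, noise  # the denoiser is trained to estimate this modulated noise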

    LightCLIP: Learning Multi-Level Interaction for Lightweight Vision-Language Models

    Vision-language pre-training methods such as CLIP have shown promising performance on various downstream tasks such as zero-shot image classification and image-text retrieval. Most existing CLIP-like works adopt relatively large image encoders such as ResNet-50 and ViT, while lightweight counterparts are rarely discussed. In this paper, we propose a multi-level interaction paradigm for training lightweight CLIP models. First, to mitigate the problem that some image-text pairs are not in strict one-to-one correspondence, we improve the conventional global instance-level alignment objective by progressively softening the labels of negative samples. Second, a relaxed bipartite-matching-based token-level alignment objective is introduced for finer-grained alignment between image patches and textual words. Moreover, based on the observation that the accuracy of the CLIP model does not increase correspondingly as the parameters of the text encoder increase, an extra masked language modeling (MLM) objective is leveraged to maximize the potential of the shortened text encoder. In practice, an auxiliary fusion module that injects unmasked image embeddings into masked text embeddings at different network stages is proposed to enhance the MLM. Extensive experiments show that, without introducing additional computational cost during inference, the proposed method achieves higher performance on multiple downstream tasks.
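
    A sketch of a softened instance-level alignment objective in the spirit of the first point above: instead of one-hot contrastive targets, negatives receive a small amount of label mass. The specific softening schedule (how epsilon changes during training) is an assumption and is left as a fixed value here; batch size is assumed to be at least 2:

        # CLIP-style contrastive loss with soft targets on the negatives.
        import torch
        import torch.nn.functional as F

        def soft_clip_loss(image_emb, text_emb, temperature=0.07, epsilon=0.1):
            img = F.normalize(image_emb, dim=-1)
            txt = F.normalize(text_emb, dim=-1)
            logits = img @ txt.t() / temperature                 # (B, B) similarity matrix
            n = logits.size(0)
            target = torch.full_like(logits, epsilon / (n - 1))  # spread epsilon over negatives
            target.fill_diagonal_(1.0 - epsilon)                 # keep most mass on the matched pair
            loss_i2t = -(target * F.log_softmax(logits, dim=1)).sum(dim=1).mean()
            loss_t2i = -(target * F.log_softmax(logits.t(), dim=1)).sum(dim=1).mean()
            return 0.5 * (loss_i2t + loss_t2i)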

    Circuits and passive components for radio-frequency power conversion

    Thesis (Ph.D.) -- Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2010. Cataloged from the PDF version of the thesis. Includes bibliographical references (p. 141-149). This thesis focuses on developing technology for high-efficiency power converters operating at very high frequencies. The work in the thesis involves two aspects of such converters: rf (radio-frequency) power circuit design techniques, and magnetic material characterization and application. On the circuit design side, the thesis investigates a new class of matching networks that overcomes a major limitation of rf converter circuits: their high sensitivity to loading conditions. These networks, termed resistance compression networks, serve to substantially decrease the variation in effective resistance seen by a tuned rf inverter as loading conditions change. The thesis explores the operation, performance characteristics, and design of these networks, and presents experimental results demonstrating their performance. The thesis also presents analysis and design considerations for lumped (inductor and capacitor) matching networks operating at high efficiency (> 95%). Formulas for calculating matching network efficiency are given and used to evaluate the optimum number of matching stages as a function of conversion ratio. Both simulation and experimental results are presented that validate the analytical formulation. On the magnetic materials side, the thesis investigates the loss characteristics of several commercial rf magnetic materials for power conversion applications in the 10 MHz to 100 MHz range. A measurement method is proposed to identify the loss characteristics of different commercial rf magnetic-core materials. The loss characteristics of these materials, which have not previously been available, are illustrated and compared in tables and figures. Based on the results of the magnetic material characterization, the thesis describes a procedure for designing magnetic components with low-permeability magnetic materials for very high frequency power conversion applications. This procedure provides a method to compare and evaluate different magnetic materials for the given specifications of a magnetic-core inductor. Important quantities, e.g., the quality factor and size of the inductor, can be predicted before the final design. The thesis also investigates related problems such as the optimization of a magnetic-core inductor. by Yehui Han. Ph.D.
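
    A rough numerical illustration of the matching-network trade-off mentioned above, using the common low-loss approximation that each L-section stage has a loss factor of roughly Q_transform / Q_component with Q_transform = sqrt(r - 1) for a per-stage resistance ratio r; the thesis's exact formulas and optimum-stage derivation may differ, and the example values are assumptions:

        # Approximate efficiency of cascaded L-section matching stages versus stage count.
        import math

        def stage_efficiency(ratio: float, q_component: float) -> float:
            """Approximate efficiency of one L-section transforming resistance by 'ratio' (> 1)."""
            q_transform = math.sqrt(ratio - 1.0)
            return 1.0 - q_transform / q_component

        def cascade_efficiency(total_ratio: float, stages: int, q_component: float) -> float:
            """Split the total transformation evenly across identical stages and multiply efficiencies."""
            per_stage = total_ratio ** (1.0 / stages)
            return stage_efficiency(per_stage, q_component) ** stages

        # Example: a 100:1 resistance transformation with component Q of 100.
        # The printed table shows an interior optimum in the number of stages.
        for n in range(1, 6):
            print(n, round(cascade_efficiency(100.0, n, 100.0), 4))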