10 research outputs found

    I-ViT: Integer-only Quantization for Efficient Vision Transformer Inference

    Vision Transformers (ViTs) have achieved state-of-the-art performance on various computer vision applications. These models, however, have considerable storage and computational overheads, making their deployment and efficient inference on edge devices challenging. Quantization is a promising approach to reducing model complexity; unfortunately, existing efforts to quantize ViTs rely on simulated quantization (aka fake quantization), which retains floating-point arithmetic during inference and thus contributes little to model acceleration. In this paper, we propose I-ViT, an integer-only quantization scheme for ViTs, enabling them to perform the entire computational graph of inference with integer arithmetic and bit-shifting, without any floating-point operations. In I-ViT, linear operations (e.g., MatMul and Dense) follow the integer-only pipeline with dyadic arithmetic, and non-linear operations (e.g., Softmax, GELU, and LayerNorm) are approximated by the proposed lightweight integer-only arithmetic methods. In particular, I-ViT applies the proposed Shiftmax and ShiftGELU, which are designed to use integer bit-shifting to approximate the corresponding floating-point operations. We evaluate I-ViT on various benchmark models, and the results show that integer-only INT8 quantization achieves accuracy comparable to (or even higher than) the full-precision (FP) baseline. Furthermore, we utilize TVM for practical hardware deployment on the GPU's integer arithmetic units, achieving a 3.72-4.11× inference speedup over the FP model.
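
    The core trick, replacing exp(x) with a shift-realized power of two, can be illustrated with a short NumPy sketch. This is not the paper's Shiftmax: the quantization scale is folded to 1 and the fractional part of the exponent is simply floored, both simplifying assumptions, so the approximation is cruder than the published method.

        import numpy as np

        def int_softmax_sketch(q):
            """Integer-only softmax approximation over the last axis.

            exp(x) is rewritten as 2**(x*log2(e)) and the power of two is
            realized with bit shifts; only integer multiplies, shifts, and
            divides are used.
            """
            q = q.astype(np.int64)
            q = q - q.max(axis=-1, keepdims=True)   # non-positive integer logits
            LOG2E_Q8 = 369                          # round(log2(e) * 2**8), fixed-point constant
            z = (q * LOG2E_Q8) >> 8                 # z ~= q * log2(e), z <= 0
            ONE = np.int64(1) << 30
            pow2 = ONE >> np.minimum(-z, 31)        # integer approximation of 2**z
            denom = pow2.sum(axis=-1, keepdims=True)
            return (pow2 << 16) // denom            # probabilities in Q16 fixed point

        # Treating small integer logits directly as real logits:
        probs = int_softmax_sketch(np.array([[5, 3, 1, 0]])) / 2.0**16
        print(probs)   # ~[0.87, 0.11, 0.014, 0.003]; float softmax gives [0.86, 0.12, 0.016, 0.006]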

    PSAQ-ViT V2: Towards Accurate and General Data-Free Quantization for Vision Transformers

    Data-free quantization can potentially address data privacy and security concerns in model compression, and has therefore been widely investigated. Recently, PSAQ-ViT designed a relative value metric, patch similarity, to generate data from pre-trained vision transformers (ViTs), making the first attempt at data-free quantization for ViTs. In this paper, we propose PSAQ-ViT V2, a more accurate and general data-free quantization framework for ViTs, built on top of PSAQ-ViT. More specifically, following the patch similarity metric in PSAQ-ViT, we introduce an adaptive teacher-student strategy, which facilitates the constant cyclic evolution of the generated samples and the quantized model (student) in a competitive and interactive fashion under the supervision of the full-precision model (teacher), thus significantly improving the accuracy of the quantized model. Moreover, without auxiliary category guidance, we employ task- and model-independent prior information, making the general-purpose scheme compatible with a broad range of vision tasks and models. Extensive experiments are conducted on various models for image classification, object detection, and semantic segmentation tasks, and PSAQ-ViT V2, with a naive quantization strategy and without access to real-world data, consistently achieves competitive results, showing potential as a powerful baseline for data-free quantization of ViTs. For instance, with Swin-S as the (backbone) model, 8-bit quantization reaches 82.13% top-1 accuracy on ImageNet, 50.9 box AP and 44.1 mask AP on COCO, and 47.2 mIoU on ADE20K. We hope that the accurate and general PSAQ-ViT V2 can serve as a potential and practical solution in real-world applications involving sensitive data. Code is released and merged at: https://github.com/zkkli/PSAQ-ViT. Comment: Accepted by TNNLS 202
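
    The adaptive teacher-student strategy described above alternates between refreshing the synthetic samples and distilling the quantized model. The sketch below shows that alternation with tiny stand-in modules; the module definitions, loss choice, and learning rates are illustrative assumptions, not the released PSAQ-ViT V2 code, and the patch-similarity prior itself is omitted.

        import torch
        import torch.nn.functional as F

        # Stand-ins: `teacher` plays the frozen full-precision ViT, `student` its
        # quantized counterpart, and `samples` the synthetic images being evolved.
        teacher = torch.nn.Linear(64, 10).eval()
        teacher.requires_grad_(False)
        student = torch.nn.Linear(64, 10)
        samples = torch.randn(32, 64, requires_grad=True)

        opt_samples = torch.optim.Adam([samples], lr=0.1)
        opt_student = torch.optim.Adam(student.parameters(), lr=1e-3)

        def disagreement(s_out, t_out):
            # KL divergence between teacher and student predictions.
            return F.kl_div(F.log_softmax(s_out, -1), F.softmax(t_out, -1),
                            reduction="batchmean")

        for step in range(100):
            # (1) Sample step: push samples toward regions where the student
            #     disagrees with the teacher (competitive phase).
            gen_loss = -disagreement(student(samples), teacher(samples))
            opt_samples.zero_grad()
            gen_loss.backward()
            opt_samples.step()

            # (2) Student step: on the refreshed samples, pull the student's
            #     predictions back toward the teacher's (distillation phase).
            kd_loss = disagreement(student(samples.detach()),
                                   teacher(samples.detach()))
            opt_student.zero_grad()
            kd_loss.backward()
            opt_student.step()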

    Patch Similarity Aware Data-Free Quantization for Vision Transformers

    Vision transformers have recently achieved great success on various computer vision tasks; nevertheless, their high model complexity makes them challenging to deploy on resource-constrained devices. Quantization is an effective approach to reduce model complexity, and data-free quantization, which can address data privacy and security concerns during model deployment, has received widespread interest. Unfortunately, all existing methods, such as BN regularization, were designed for convolutional neural networks and cannot be applied to vision transformers, whose model architectures differ significantly. In this paper, we propose PSAQ-ViT, a Patch Similarity Aware data-free Quantization framework for Vision Transformers, which enables the generation of "realistic" samples based on the vision transformer's unique properties for calibrating the quantization parameters. Specifically, we analyze the self-attention module's properties and reveal a general difference (patch similarity) in its processing of Gaussian noise and real images. This insight guides us to design a relative value metric to optimize the Gaussian noise so that it approximates real images, which are then utilized to calibrate the quantization parameters. Extensive experiments and ablation studies are conducted on various benchmarks to validate the effectiveness of PSAQ-ViT, which can even outperform real-data-driven methods. Comment: Accepted to ECCV 202
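
    A generic version of the underlying recipe, summarizing patch-to-patch similarity inside an attention block and optimizing Gaussian noise against that statistic, is sketched below. The similarity summary (entropy of normalized cosine similarities), the stand-in feature extractor, and the target value are assumptions for illustration; they are not the paper's exact metric or training setup.

        import torch
        import torch.nn.functional as F

        def patch_similarity_entropy(patch_features):
            # patch_features: (B, N, D) patch embeddings from a self-attention block.
            x = F.normalize(patch_features, dim=-1)
            sim = x @ x.transpose(-1, -2)                 # (B, N, N) cosine similarities
            p = F.softmax(sim.flatten(1), dim=-1)         # similarities as a distribution
            return -(p * (p + 1e-12).log()).sum(dim=-1)   # per-sample entropy

        # Toy usage: drive noise so its patch-similarity entropy matches a target
        # value (a stand-in for the structure observed on real images).
        block = torch.nn.Linear(16, 16).requires_grad_(False)   # frozen stand-in ViT block
        noise = torch.randn(4, 49, 16, requires_grad=True)      # 4 images, 49 patches
        target = torch.tensor(4.0)

        opt = torch.optim.Adam([noise], lr=0.05)
        for _ in range(50):
            loss = (patch_similarity_entropy(block(noise)) - target).pow(2).mean()
            opt.zero_grad()
            loss.backward()
            opt.step()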

    Patch-wise Mixed-Precision Quantization of Vision Transformer

    As emerging hardware begins to support mixed bit-width arithmetic computation, mixed-precision quantization is widely used to reduce the complexity of neural networks. However, Vision Transformers (ViTs) require complex self-attention computation to guarantee the learning of powerful feature representations, which makes mixed-precision quantization of ViTs still challenging. In this paper, we propose a novel patch-wise mixed-precision quantization (PMQ) scheme for efficient inference of ViTs. Specifically, we design a lightweight global metric, which is faster than existing methods, to measure the sensitivity of each component in ViTs to quantization errors. Moreover, we introduce a Pareto frontier approach to automatically allocate the optimal bit-precision according to the sensitivity. To further reduce the computational complexity of self-attention in the inference stage, we propose a patch-wise module to reallocate the bit-widths of patches in each layer. Extensive experiments on the ImageNet dataset show that our method greatly reduces the search cost and facilitates the application of mixed-precision quantization to ViTs.
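
    The sensitivity-driven bit allocation can be pictured as a small search over candidate assignments that keeps only the Pareto-optimal (size, predicted error) points. The error model and the exhaustive search below are illustrative assumptions; they stand in for, but do not reproduce, the metric and allocation procedure proposed in the paper.

        from itertools import product

        def allocate_bits(sensitivity, sizes, candidate_bits=(4, 6, 8), budget_bits=None):
            """Return Pareto-optimal bit assignments, or the most accurate one
            within a total bit budget.

            sensitivity[i]: sensitivity of component i to quantization (higher = keep more bits)
            sizes[i]:       parameter count of component i
            Assumed error model: error ~ sensitivity / 4**bits.
            """
            points = []
            for assign in product(candidate_bits, repeat=len(sensitivity)):
                cost = sum(b * n for b, n in zip(assign, sizes))
                err = sum(s / (4 ** b) for s, b in zip(sensitivity, assign))
                points.append((cost, err, assign))

            points.sort()                       # by cost, then error
            pareto, best_err = [], float("inf")
            for cost, err, assign in points:    # keep non-dominated points only
                if err < best_err:
                    pareto.append((cost, err, assign))
                    best_err = err

            if budget_bits is None:
                return pareto
            feasible = [p for p in pareto if p[0] <= budget_bits]
            return min(feasible, key=lambda p: p[1]) if feasible else None

        # Three components with different sensitivities and sizes, 6-bit average budget.
        print(allocate_bits([5.0, 1.0, 0.2], [100, 200, 50], budget_bits=6 * 350))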

    BinaryViT: Towards Efficient and Accurate Binary Vision Transformers

    Vision Transformers (ViTs) have emerged as the fundamental architecture for most computer vision fields, but their considerable memory and computation costs hinder their application on resource-limited devices. As one of the most powerful compression methods, binarization reduces the computation of a neural network by quantizing the weights and activations to ±1. Although existing binarization methods have demonstrated excellent performance on Convolutional Neural Networks (CNNs), the full binarization of ViTs is still under-studied and suffers a significant performance drop. In this paper, we first argue empirically that the severe performance degradation is mainly caused by weight oscillation during binarization training and information distortion in the activations of ViTs. Based on these analyses, we propose BinaryViT, an accurate full binarization scheme for ViTs, which pushes the quantization of ViTs to the limit. Specifically, we propose a novel gradient regularization scheme (GRS) that drives the weights toward a bimodal distribution, reducing oscillation during binarization training. Moreover, we design an activation shift module (ASM) to adaptively tune the activation distribution and reduce the information distortion caused by binarization. Extensive experiments on the ImageNet dataset show that our BinaryViT consistently surpasses the strong baseline by 2.05% and improves the accuracy of fully binarized ViTs to a usable level. Furthermore, our method achieves impressive savings of 16.2× and 17.7× in model size and OPs compared to the full-precision DeiT-S. Comment: We will be making some significant changes to the paper, including the title and methodology. We therefore wish to withdraw the paper for now.
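
    A minimal example of the baseline binarization being pushed further here, sign-based ±1 weights trained with a straight-through estimator, is sketched below. The proposed GRS regularizer and ASM module are not reproduced; the scaling by the mean weight magnitude is a common convention assumed for illustration.

        import torch

        class BinarizeSTE(torch.autograd.Function):
            """Binarize to ±1 in the forward pass; pass gradients straight
            through (clipped to |w| <= 1) in the backward pass."""
            @staticmethod
            def forward(ctx, w):
                ctx.save_for_backward(w)
                return torch.sign(w) + (w == 0).float()    # map exact zeros to +1
            @staticmethod
            def backward(ctx, grad_out):
                (w,) = ctx.saved_tensors
                return grad_out * (w.abs() <= 1).float()   # straight-through estimator

        class BinaryLinear(torch.nn.Linear):
            """Linear layer whose latent full-precision weights are binarized
            on the fly and rescaled by their mean magnitude."""
            def forward(self, x):
                alpha = self.weight.abs().mean()
                w_bin = BinarizeSTE.apply(self.weight) * alpha
                return torch.nn.functional.linear(x, w_bin, self.bias)

        # Gradients still reach the latent weights through the estimator.
        layer = BinaryLinear(8, 4)
        layer(torch.randn(2, 8)).sum().backward()
        print(layer.weight.grad.shape)   # torch.Size([4, 8])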

    Spectral purification improves monitoring accuracy of the comprehensive growth evaluation index for film-mulched winter wheat

    To further improve the utility of unmanned aerial vehicle (UAV) remote sensing for quickly and accurately monitoring the growth of winter wheat under film mulching, this study examined ridge mulching, ridge–furrow full mulching, and flat-cropping full mulching treatments in winter wheat. Based on the fuzzy comprehensive evaluation (FCE) method, four agronomic parameters (leaf area index, above-ground biomass, plant height, and leaf chlorophyll content) were used to calculate the comprehensive growth evaluation index (CGEI) of the winter wheat, and 14 visible and near-infrared spectral indices were calculated using spectral purification technology to process the remote-sensing image data of winter wheat obtained by a multispectral UAV. Four machine learning algorithms, partial least squares, support vector machines, random forests, and artificial neural networks (ANN), were used to build the winter wheat growth monitoring model under film mulching, and accuracy evaluation and mapping of the spatial and temporal distribution of winter wheat growth status were carried out. The results showed that the CGEI of winter wheat under film mulching constructed using the FCE method could objectively and comprehensively evaluate the crop growth status. The accuracy of remote-sensing inversion of the CGEI based on the ANN model was higher than for the individual agronomic parameters, with a coefficient of determination of 0.75, a root mean square error of 8.40, and a mean absolute error of 6.53. Spectral purification could eliminate the interference of background effects caused by mulching and soil, effectively improving the accuracy of the remote-sensing inversion of winter wheat under film mulching, with the best inversion effect achieved on the ridge–furrow full mulching area after spectral purification. The results of this study provide a theoretical reference for the use of UAV remote sensing to monitor the growth status of winter wheat under film mulching.
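
    The CGEI construction, mapping each agronomic parameter to a membership score and combining the scores with weights, can be pictured with a short sketch. The linear membership functions and equal weights below are placeholders; the study derives both from the fuzzy comprehensive evaluation procedure, so this is only an illustration of the shape of the computation.

        import numpy as np

        def cgei_sketch(lai, agb, height, chlorophyll, weights=(0.25, 0.25, 0.25, 0.25)):
            """Toy composite growth index on a 0-100 scale."""
            def membership(x):                         # min-max scaling as a stand-in
                x = np.asarray(x, dtype=float)         # for the FCE membership function
                return (x - x.min()) / (x.max() - x.min() + 1e-12)

            scores = np.stack([membership(v) for v in (lai, agb, height, chlorophyll)])
            return 100.0 * np.tensordot(np.asarray(weights), scores, axes=1)

        # Example: five plots, one score per plot.
        print(cgei_sketch(lai=[2.1, 3.0, 4.2, 3.6, 2.8],
                          agb=[4.5, 6.1, 8.0, 7.2, 5.3],
                          height=[55, 62, 70, 68, 60],
                          chlorophyll=[38, 42, 47, 45, 40]))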

    Shackling Effect Induced Property Differences in Metallo-Supramolecular Polymers

    We demonstrate here the synthesis of a novel class of metallo-supramolecular polymers with a shackled structure, via the coordination of cyclic di(bis-terpyridine-triphenyl ether ester) ligands with ruthenium(II) ions. The constraint imposed by the ring topology, through the shackling of the ligands, imparts novel properties to these metallo-supramolecular polymers, including the formation of dendritic crystals, a red shift of the absorption bands in the UV–vis spectra arising from interchain charge-transfer transitions, and a typical flash-type memory behavior.

    InternLM2 Technical Report

    The evolution of Large Language Models (LLMs) like ChatGPT and GPT-4 has sparked discussions on the advent of Artificial General Intelligence (AGI). However, replicating such advancements in open-source models has been challenging. This paper introduces InternLM2, an open-source LLM that outperforms its predecessors in comprehensive evaluations across 6 dimensions and 30 benchmarks, long-context modeling, and open-ended subjective evaluations, through innovative pre-training and optimization techniques. The pre-training process of InternLM2 is meticulously detailed, highlighting the preparation of diverse data types including text, code, and long-context data. InternLM2 efficiently captures long-term dependencies, initially trained on 4k tokens before advancing to 32k tokens in the pre-training and fine-tuning stages, exhibiting remarkable performance on the 200k "Needle-in-a-Haystack" test. InternLM2 is further aligned using Supervised Fine-Tuning (SFT) and a novel Conditional Online Reinforcement Learning from Human Feedback (COOL RLHF) strategy that addresses conflicting human preferences and reward hacking. By releasing InternLM2 models at different training stages and model sizes, we provide the community with insights into the model's evolution.
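
    The progressive context-length recipe (a 4k-token stage followed by a 32k-token stage before alignment) can be summarized as a staged schedule. The concrete values below, including the RoPE base adjustment and data descriptions, are illustrative assumptions and not InternLM2's actual training configuration.

        # Toy staged schedule mirroring the report's 4k -> 32k progression.
        stages = [
            {"name": "pretrain_4k",   "seq_len": 4_096,  "rope_base": 10_000,    "data": "main corpus"},
            {"name": "pretrain_32k",  "seq_len": 32_768, "rope_base": 1_000_000, "data": "long-context mix"},
            {"name": "sft_then_rlhf", "seq_len": 32_768, "rope_base": 1_000_000, "data": "alignment data"},
        ]

        for stage in stages:
            print(f"{stage['name']:>13}: seq_len={stage['seq_len']}, rope_base={stage['rope_base']}")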