I&S-ViT: An Inclusive & Stable Method for Pushing the Limit of Post-Training ViTs Quantization
Despite the scalable performance of vision transformers (ViTs), their dense
computational costs (training & inference) undermine their position in
industrial applications. Post-training quantization (PTQ), which tunes ViTs with a
tiny dataset and runs them in a low-bit format, addresses the cost issue well but
unfortunately suffers larger performance drops in lower-bit cases. In this paper, we
introduce I&S-ViT, a novel method that regulates the PTQ of ViTs in an
inclusive and stable fashion. I&S-ViT first identifies two issues in the PTQ of
ViTs: (1) Quantization inefficiency in the prevalent log2 quantizer for
post-Softmax activations; (2) Rugged and magnified loss landscape in
coarse-grained quantization granularity for post-LayerNorm activations. Then,
I&S-ViT addresses these issues by introducing: (1) A novel shift-uniform-log2
quantizer (SULQ) that incorporates a shift mechanism followed by uniform
quantization to achieve both an inclusive domain representation and accurate
distribution approximation; (2) A three-stage smooth optimization strategy
(SOS) that amalgamates the strengths of channel-wise and layer-wise
quantization to enable stable learning. Comprehensive evaluations across
diverse vision tasks validate I&S-ViT's superiority over existing ViT PTQ
methods, particularly in low-bit scenarios. For instance, I&S-ViT elevates the
performance of 3-bit ViT-B by an impressive 50.68%.
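As a rough illustration of the shift-uniform-log2 idea described above, the sketch below shifts post-Softmax activations, maps them to the log2 domain, and quantizes uniformly there. The additive shift value, bit-width, and de-quantization path are illustrative assumptions, not the paper's exact SULQ implementation.

```python
import torch

def sulq_sketch(x, n_bits=4, shift=1e-2):
    """Hypothetical shift-uniform-log2 quantizer for post-Softmax activations."""
    x_log = torch.log2(x + shift)                  # shift, then map to the log2 domain
    lo, hi = x_log.min(), x_log.max()
    scale = (hi - lo) / (2 ** n_bits - 1)          # uniform step size in the log2 domain
    q = torch.clamp(torch.round((x_log - lo) / scale), 0, 2 ** n_bits - 1)
    return torch.pow(2.0, q * scale + lo) - shift  # de-quantize back to the original domain
```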
FAITHSCORE: Evaluating Hallucinations in Large Vision-Language Models
We introduce FAITHSCORE (Faithfulness to Atomic Image Facts Score), a
reference-free and fine-grained evaluation metric that measures the
faithfulness of the generated free-form answers from large vision-language
models (LVLMs). The FAITHSCORE evaluation first identifies sub-sentences
containing descriptive statements that need to be verified, then extracts a
comprehensive list of atomic facts from these sub-sentences, and finally
conducts consistency verification between fine-grained atomic facts and the
input image. Meta-evaluation demonstrates that our metric highly correlates
with human judgments of faithfulness. We collect two benchmark datasets (i.e.,
LLaVA-1k and MSCOCO-Cap) for evaluating LVLMs' instruction-following
hallucinations. We measure hallucinations in state-of-the-art LVLMs with
FAITHSCORE on the datasets. Results reveal that current systems are prone to
generate hallucinated content unfaithful to the image, which leaves room for
future improvements. Further, we find that current LVLMs, despite doing well on
color and counting, still struggle with long answers, relations, and multiple
objects.
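A minimal sketch of the final scoring step, under the assumption that upstream components (sub-sentence identification, atomic-fact extraction, and visual verification) have already produced per-fact consistency verdicts; those components are placeholders here, not the paper's released pipeline.

```python
from typing import List

def faithscore(fact_verdicts: List[bool]) -> float:
    """Fraction of atomic facts judged consistent with the image (assumed scoring rule)."""
    if not fact_verdicts:  # answer contains no descriptive statements to verify
        return 1.0
    return sum(fact_verdicts) / len(fact_verdicts)

# Example: 4 of 5 extracted atomic facts are supported by the image.
print(faithscore([True, True, False, True, True]))  # 0.8
```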
Spatial Re-parameterization for N:M Sparsity
This paper presents a Spatial Re-parameterization (SpRe) method for the N:M
sparsity in CNNs. SpRe stems from an observation regarding the restricted
variety of spatial sparsity that N:M sparsity exhibits compared with unstructured
sparsity. Particularly, N:M sparsity exhibits a fixed sparsity rate within the
spatial domains due to its distinctive pattern that mandates N non-zero
components among M successive weights in the input channel dimension of
convolution filters. On the contrary, we observe that unstructured sparsity
displays a substantial divergence in sparsity across the spatial domains, which
we experimentally verify to be crucial for its robust performance
retention compared with N:M sparsity. Therefore, SpRe employs the
spatial-sparsity distribution of unstructured sparsity to assign an extra
branch in conjunction with the original N:M branch at training time, which
allows the N:M sparse network to sustain a similar distribution of spatial
sparsity with unstructured sparsity. During inference, the extra branch can be
further re-parameterized into the main N:M branch, without exerting any
distortion on the sparse pattern or additional computation costs. SpRe has
achieved a commendable feat by matching the performance of N:M sparsity methods
with state-of-the-art unstructured sparsity methods across various benchmarks.
Code and models are anonymously available at
\url{https://github.com/zyxxmu/SpRe}.
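For reference, the N:M pattern the abstract describes (N non-zero entries in every group of M consecutive input-channel weights) can be sketched as a magnitude-based mask. This illustrates only the constraint itself, not SpRe's re-parameterized training branch.

```python
import torch

def nm_sparsity_mask(weight, n=2, m=4):
    """Keep the n largest-magnitude weights in every group of m consecutive
    input-channel entries of a conv filter (assumes in_channels % m == 0)."""
    out_c, in_c, kh, kw = weight.shape
    groups = weight.permute(0, 2, 3, 1).reshape(-1, m)   # groups along input channels
    keep = groups.abs().topk(n, dim=1).indices           # indices of the n survivors
    mask = torch.zeros_like(groups).scatter_(1, keep, 1.0)
    return mask.reshape(out_c, kh, kw, in_c).permute(0, 3, 1, 2)
```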
SMMix: Self-Motivated Image Mixing for Vision Transformers
CutMix is a vital augmentation strategy that determines the performance and
generalization ability of vision transformers (ViTs). However, the
inconsistency between the mixed images and the corresponding labels harms its
efficacy. Existing CutMix variants tackle this problem by generating more
consistent mixed images or more precise mixed labels, but inevitably introduce
heavy training overhead or require extra information, undermining ease of use.
To this end, we propose an efficient and effective Self-Motivated image Mixing
method (SMMix), which motivates both image and label enhancement by the model
under training itself. Specifically, we propose a max-min attention region
mixing approach that enriches the attention-focused objects in the mixed
images. Then, we introduce a fine-grained label assignment technique that
co-trains the output tokens of mixed images with fine-grained supervision.
Moreover, we devise a novel feature consistency constraint to align features
from mixed and unmixed images. Thanks to these self-motivated designs, SMMix
achieves both smaller training overhead and better performance than other
CutMix variants. In particular, SMMix improves the
accuracy of DeiT-T/S, CaiT-XXS-24/36, and PVT-T/S/M/L by more than +1% on
ImageNet-1k. The generalization capability of our method is also demonstrated
on downstream tasks and out-of-distribution datasets. Code of this project is
available at https://github.com/ChenMnZ/SMMix.
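One way to picture the attention-guided mixing described above is the sketch below, which pastes the most-attended square region of one image over the least-attended region of another. The region size, interpolation, and center-clamping are illustrative choices and not SMMix's exact max-min procedure.

```python
import torch
import torch.nn.functional as F

def attention_guided_mix(img_a, img_b, attn_a, attn_b, region=56):
    """Paste the attention-peak patch of img_a onto the attention-minimum
    location of img_b. img_* are (C, H, W); attn_* are low-res attention grids."""
    _, H, W = img_a.shape
    attn_a = F.interpolate(attn_a[None, None], size=(H, W), mode="bilinear")[0, 0]
    attn_b = F.interpolate(attn_b[None, None], size=(H, W), mode="bilinear")[0, 0]
    r = region // 2
    ya, xa = divmod(int(attn_a.argmax()), W)                 # most-attended pixel in A
    yb, xb = divmod(int(attn_b.argmin()), W)                 # least-attended pixel in B
    ya, xa = min(max(ya, r), H - r), min(max(xa, r), W - r)  # keep region inside image
    yb, xb = min(max(yb, r), H - r), min(max(xb, r), W - r)
    mixed = img_b.clone()
    mixed[:, yb - r:yb + r, xb - r:xb + r] = img_a[:, ya - r:ya + r, xa - r:xa + r]
    return mixed
```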
MultiQuant: A Novel Multi-Branch Topology Method for Arbitrary Bit-width Network Quantization
Arbitrary bit-width network quantization has received significant attention
due to its high adaptability to various bit-width requirements during runtime.
However, in this paper, we investigate existing methods and observe a
significant accumulation of quantization errors caused by frequent bit-width
switching of weights and activations, leading to limited performance. To
address this issue, we propose MultiQuant, a novel method that utilizes a
multi-branch topology for arbitrary bit-width quantization. MultiQuant
duplicates the network body into multiple independent branches and quantizes
the weights of each branch to fixed 2-bit precision while retaining the input
activations in the expected bit-width. This approach keeps the computational
cost unchanged while avoiding weight bit-width switching, thereby substantially
reducing weight quantization errors.
Additionally, we introduce an amortization branch selection strategy to
distribute quantization errors caused by activation bit-width switching among
branches to enhance performance. Finally, we design an in-place distillation
strategy that facilitates guidance between branches to further enhance
MultiQuant's performance. Extensive experiments demonstrate that MultiQuant
achieves significant performance gains compared to existing arbitrary bit-width
quantization methods. Code is at \url{https://github.com/zysxmu/MultiQuant}.
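The multi-branch idea can be sketched roughly as follows: the block is replicated into branches whose weights stay in a fixed 2-bit format, and a subset of branches is activated for the requested bit-width. The summed aggregation and the branch-selection interface are assumptions for illustration, not MultiQuant's exact scheme.

```python
import copy
import torch.nn as nn

class MultiBranchBlock(nn.Module):
    """Replicates a block into branches with fixed 2-bit weights so that only
    activation bit-widths switch at runtime (sketch, not the released code)."""
    def __init__(self, block: nn.Module, num_branches: int = 2):
        super().__init__()
        # Each copy would be weight-quantized to 2 bits at training time.
        self.branches = nn.ModuleList(copy.deepcopy(block) for _ in range(num_branches))

    def forward(self, x, active_ids):
        # 'active_ids' selects which 2-bit branches serve the requested bit-width;
        # summing their outputs is one possible aggregation choice.
        return sum(self.branches[i](x) for i in active_ids)
```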
Fine-grained Data Distribution Alignment for Post-Training Quantization
While post-training quantization is popular mostly because it avoids accessing
the original complete training dataset, its poor performance also stems from
the scarcity of calibration images. To alleviate this limitation, in
this paper, we leverage the synthetic data introduced by zero-shot quantization
with a calibration dataset and propose a fine-grained data distribution alignment
(FDDA) method to boost the performance of post-training quantization. The
method is based on two important properties of batch normalization statistics
(BNS) we observed in the deep layers of the trained network, i.e., inter-class
separation and intra-class incohesion. To preserve this fine-grained
distribution information: 1) We calculate the per-class BNS of the calibration
dataset as the BNS centers of each class and propose a BNS-centralized loss to
force the synthetic data distributions of different classes to be close to
their own centers. 2) We add Gaussian noise into the centers to imitate the
incohesion and propose a BNS-distorted loss to force the synthetic data
distribution of the same class to be close to the distorted centers. By
utilizing these two fine-grained losses, our method achieves state-of-the-art
performance on ImageNet, especially when both the first and last layers are
quantized to low bit-widths. Code is at
\url{https://github.com/zysxmu/FDDA}.
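To make the two losses concrete, here is a minimal sketch of the BNS-centralized term under the assumption that each per-class center stores a (mean, variance) pair computed on the calibration set; the distorted variant would add Gaussian noise to these centers before the same comparison. Names and the exact distance are illustrative, not the released implementation.

```python
import torch

def bns_centralized_loss(features, labels, class_centers):
    """Pull per-class batch statistics of synthetic features toward calibration
    BNS centers; class_centers[c] = (mean_c, var_c). Sketch only."""
    loss = features.new_zeros(())
    for c in labels.unique():
        f = features[labels == c]
        mean_c = f.mean(dim=0)
        var_c = f.var(dim=0, unbiased=False)
        tgt_mean, tgt_var = class_centers[int(c)]
        loss = loss + (mean_c - tgt_mean).pow(2).mean() + (var_c - tgt_var).pow(2).mean()
    return loss
```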
Simultaneous evolutionary expansion and constraint of genomic heterogeneity in multifocal lung cancer.
Recent genomic analyses have revealed substantial tumor heterogeneity across various cancers. However, it remains unclear whether and how genomic heterogeneity is constrained during tumor evolution. Here, we sequence a unique cohort of multiple synchronous lung cancers (MSLCs) to determine the relative diversity and uniformity of genetic drivers against an identical germline and environmental background. We find that each multicentric primary tumor harbors distinct oncogenic alterations, including novel mutations that are experimentally demonstrated to be functional and therapeutically targetable. However, functional studies show a strikingly constrained tumorigenic pathway underlying heterogeneous genetic variants. These results suggest that although the mutation-specific routes that cells take during oncogenesis are stochastic, genetic trajectories may be constrained by selection for functional convergence on key signaling pathways. Our findings highlight the robust evolutionary pressures that simultaneously shape the expansion and constraint of genomic diversity, a principle that holds important implications for understanding tumor evolution and optimizing therapeutic strategies.
Across cancer types, tumor heterogeneity has been observed, but how it relates to tumor evolution is unclear. Here, the authors sequence multiple synchronous lung cancers, highlighting the evolutionary pressures that simultaneously shape the expansion and constraint of genomic heterogeneity.
DiffRate: Differentiable Compression Rate for Efficient Vision Transformers
Token compression aims to speed up large-scale vision transformers (e.g.
ViTs) by pruning (dropping) or merging tokens. It is an important but
challenging task. Although recent advanced approaches achieved great success,
they need to carefully handcraft a compression rate (i.e. number of tokens to
remove), which is tedious and leads to sub-optimal performance. To tackle this
problem, we propose Differentiable Compression Rate (DiffRate), a novel token
compression method that has several appealing properties prior arts do not
have. First, DiffRate enables propagating the loss function's gradient onto the
compression ratio, which is treated as a non-differentiable hyperparameter in
previous work. As a result, different layers can automatically learn different
compression rates in a layer-wise manner without extra overhead. Second, token
pruning and merging can be naturally performed simultaneously in DiffRate,
while they were isolated in previous works. Third, extensive experiments
demonstrate that DiffRate achieves state-of-the-art performance. For example,
by applying the learned layer-wise compression rates to an off-the-shelf ViT-H
(MAE) model, we achieve a 40% FLOPs reduction and a 1.5x throughput
improvement, with a minor accuracy drop of 0.16% on ImageNet without
fine-tuning, even outperforming previous methods with fine-tuning. Codes and
models are available at https://github.com/OpenGVLab/DiffRate.
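One common way to make a discrete rate trainable, in the spirit of the abstract above, is to learn a distribution over candidate rates and take its expectation so that task-loss gradients reach the rate parameters. The candidate set and the expectation trick below are illustrative assumptions rather than DiffRate's exact differentiable proxy.

```python
import torch
import torch.nn as nn

class LearnableCompressionRate(nn.Module):
    """Expected compression rate over a fixed candidate set (illustrative sketch)."""
    def __init__(self, candidates=(0.0, 0.1, 0.2, 0.3, 0.4, 0.5)):
        super().__init__()
        self.register_buffer("candidates", torch.tensor(candidates))
        self.logits = nn.Parameter(torch.zeros(len(candidates)))

    def forward(self):
        probs = torch.softmax(self.logits, dim=0)
        return (probs * self.candidates).sum()  # differentiable w.r.t. self.logits
```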
OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models
Large language models (LLMs) have revolutionized natural language processing
tasks. However, their practical deployment is hindered by their immense memory
and computation requirements. Although recent post-training quantization (PTQ)
methods are effective in reducing memory footprint and improving the
computational efficiency of LLMs, they rely on hand-crafted quantization
parameters, which leads to low performance and fails to handle extremely
low-bit quantization.
To tackle this issue, we introduce an Omnidirectionally calibrated Quantization
(OmniQuant) technique for LLMs, which achieves good performance in diverse
quantization settings while maintaining the computational efficiency of PTQ by
efficiently optimizing various quantization parameters. OmniQuant comprises two
innovative components: Learnable Weight Clipping (LWC) and Learnable
Equivalent Transformation (LET). LWC modulates the extreme values of weights by
optimizing the clipping threshold. Meanwhile, LET tackles activation outliers
by shifting the challenge of quantization from activations to weights through a
learnable equivalent transformation. Operating within a differentiable
framework using block-wise error minimization, OmniQuant can optimize the
quantization process efficiently for both weight-only and weight-activation
quantization. For instance, the LLaMA-2 model family, with sizes from 7B to 70B, can
be processed with OmniQuant on a single A100-40G GPU within 1-16 hours using
128 samples. Extensive experiments validate OmniQuant's superior performance
across diverse quantization configurations such as W4A4, W6A6, W4A16, W3A16,
and W2A16. Additionally, OmniQuant demonstrates effectiveness in
instruction-tuned models and delivers notable improvements in inference speed
and memory reduction on real devices. Codes and models are available at
\url{https://github.com/OpenGVLab/OmniQuant}.
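The LWC component can be pictured as below: learnable factors shrink the weight clipping range before uniform fake-quantization. The sigmoid parameterization and per-tensor granularity are assumptions for illustration (the released code may differ, and a straight-through estimator would be needed for gradients through the rounding).

```python
import torch
import torch.nn as nn

class LearnableWeightClipping(nn.Module):
    """Learnable clipping of the weight range before uniform quantization (sketch)."""
    def __init__(self, n_bits=4):
        super().__init__()
        self.n_bits = n_bits
        self.gamma = nn.Parameter(torch.zeros(1))  # scales the upper clipping bound
        self.beta = nn.Parameter(torch.zeros(1))   # scales the lower clipping bound

    def forward(self, w):
        w_max = torch.sigmoid(self.gamma) * w.max()
        w_min = torch.sigmoid(self.beta) * w.min()
        scale = (w_max - w_min) / (2 ** self.n_bits - 1)
        zero_point = torch.round(-w_min / scale)
        q = torch.clamp(torch.round(w / scale) + zero_point, 0, 2 ** self.n_bits - 1)
        return (q - zero_point) * scale  # fake-quantized weights
```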