Towards Optimal Discrete Online Hashing with Balanced Similarity
When facing large-scale image datasets, online hashing serves as a promising
solution for online retrieval and prediction tasks. It encodes the online
streaming data into compact binary codes, and simultaneously updates the hash
functions to renew codes of the existing dataset. However, existing methods
update hash functions solely based on the new data batch, without
investigating the correlation between such new data and the existing dataset.
In addition, existing works update the hash functions via a relaxation
process in an approximated continuous space, and directly applying discrete
optimization in online hashing remains an open problem. In
this paper, we propose a novel supervised online hashing method, termed
Balanced Similarity for Online Discrete Hashing (BSODH), to solve the above
problems in a unified framework. BSODH employs a well-designed hashing
algorithm to preserve the similarity between the streaming data and the
existing dataset via an asymmetric graph regularization. We further identify
the "data-imbalance" problem brought by the constructed asymmetric graph, which
restricts the application of discrete optimization in our problem. Therefore, a
novel balanced similarity is further proposed, which uses two equilibrium
factors to balance the similar and dissimilar weights and eventually enables
the usage of discrete optimizations. Extensive experiments conducted on three
widely-used benchmarks demonstrate the advantages of the proposed method over
the state-of-the-art methods. Comment: 8 pages, 11 figures, conference.
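To make the balanced-similarity idea concrete, here is a minimal NumPy sketch, assuming a label-based similarity between a streaming batch and the existing dataset: similar (+1) and dissimilar (-1) entries of the asymmetric matrix are rescaled by two equilibrium factors so that the far more numerous dissimilar pairs no longer dominate the optimization. The function name, factor values, and label-matching construction are illustrative assumptions, not the paper's notation.

```python
import numpy as np

def balanced_similarity(labels_new, labels_old, eta_s=1.2, eta_d=0.2):
    """Hypothetical sketch of a BSODH-style balanced similarity.

    Builds an asymmetric similarity matrix (rows = streaming batch,
    columns = existing dataset) and rescales similar and dissimilar
    entries with two equilibrium factors eta_s and eta_d.
    """
    # +1 where labels match, -1 otherwise
    S = np.where(labels_new[:, None] == labels_old[None, :], 1.0, -1.0)
    # Equilibrium factors counteract the dominance of dissimilar pairs
    # (the "data-imbalance" problem), enabling discrete optimization.
    return np.where(S > 0, eta_s * S, eta_d * S)

# Toy example: 2 streaming samples against 3 existing samples.
S_bal = balanced_similarity(np.array([0, 1]), np.array([0, 0, 2]))
```

With the defaults above, matching pairs contribute with weight 1.2 while mismatching pairs are damped to -0.2, which is the qualitative effect the abstract attributes to the two equilibrium factors.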
SMMix: Self-Motivated Image Mixing for Vision Transformers
CutMix is a vital augmentation strategy that determines the performance and
generalization ability of vision transformers (ViTs). However, the
inconsistency between the mixed images and the corresponding labels harms its
efficacy. Existing CutMix variants tackle this problem by generating more
consistent mixed images or more precise mixed labels, but inevitably introduce
heavy training overhead or require extra information, undermining ease of use.
To this end, we propose an efficient and effective Self-Motivated image Mixing
method (SMMix), which motivates both image and label enhancement by the model
under training itself. Specifically, we propose a max-min attention region
mixing approach that enriches the attention-focused objects in the mixed
images. Then, we introduce a fine-grained label assignment technique that
co-trains the output tokens of mixed images with fine-grained supervision.
Moreover, we devise a novel feature consistency constraint to align features
from mixed and unmixed images. Thanks to the subtle designs of the
self-motivated paradigm, our SMMix incurs smaller training overhead and
achieves better performance than other CutMix variants. In particular, SMMix improves the
accuracy of DeiT-T/S, CaiT-XXS-24/36, and PVT-T/S/M/L by more than +1% on
ImageNet-1k. The generalization capability of our method is also demonstrated
on downstream tasks and out-of-distribution datasets. Code of this project is
available at https://github.com/ChenMnZ/SMMix
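The max-min attention region mixing described above can be sketched in one dimension, as a hedged simplification: replace the target's least-attended window of tokens with the source's most-attended window, and use the replaced fraction as the mix ratio for label assignment. The real method operates on 2-D ViT attention maps; the names, the window search, and the 1-D setting are assumptions for illustration.

```python
import numpy as np

def max_min_token_mix(tok_src, tok_tgt, attn_src, attn_tgt, w=2):
    """Hypothetical 1-D sketch of max-min attention region mixing.

    Picks the source's most-attended window of w tokens and pastes it
    over the target's least-attended window; returns the mixed tokens
    and the mix ratio used for label assignment.
    """
    n = len(tok_tgt)
    # Sliding-window attention mass for every window of length w.
    sums_src = np.convolve(attn_src, np.ones(w), mode="valid")
    sums_tgt = np.convolve(attn_tgt, np.ones(w), mode="valid")
    i = int(np.argmax(sums_src))   # most-attended source window
    j = int(np.argmin(sums_tgt))   # least-attended target window
    mixed = tok_tgt.copy()
    mixed[j:j + w] = tok_src[i:i + w]
    lam = w / n                    # fraction of tokens from the source
    return mixed, lam

mixed, lam = max_min_token_mix(
    np.array([10, 11, 12, 13, 14, 15]), np.array([0, 1, 2, 3, 4, 5]),
    np.array([0., 0., 5., 5., 0., 0.]), np.array([5., 5., 0., 0., 5., 5.]))
```

Pasting the attention-rich source region into the attention-poor target region is what "enriches the attention-focused objects in the mixed images" in this simplified reading.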
I&S-ViT: An Inclusive & Stable Method for Pushing the Limit of Post-Training ViTs Quantization
Despite the scalable performance of vision transformers (ViTs), their dense
computational costs (training & inference) undermine their position in
industrial applications. Post-training quantization (PTQ), tuning ViTs with a
tiny dataset and running them in a low-bit format, well addresses the cost
issue but unfortunately suffers larger performance drops in lower-bit cases.
In this paper, we
introduce I&S-ViT, a novel method that regulates the PTQ of ViTs in an
inclusive and stable fashion. I&S-ViT first identifies two issues in the PTQ of
ViTs: (1) Quantization inefficiency in the prevalent log2 quantizer for
post-Softmax activations; (2) Rugged and magnified loss landscape in
coarse-grained quantization granularity for post-LayerNorm activations. Then,
I&S-ViT addresses these issues by introducing: (1) A novel shift-uniform-log2
quantizer (SULQ) that incorporates a shift mechanism followed by uniform
quantization to achieve both an inclusive domain representation and accurate
distribution approximation; (2) A three-stage smooth optimization strategy
(SOS) that amalgamates the strengths of channel-wise and layer-wise
quantization to enable stable learning. Comprehensive evaluations across
diverse vision tasks validate I&S-ViT's superiority over existing PTQ
methods for ViTs, particularly in low-bit scenarios. For instance, I&S-ViT
elevates the performance of 3-bit ViT-B by an impressive 50.68%.
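A shift-uniform-log2 quantizer can be sketched as follows, under stated assumptions: a small shift keeps log2 finite at zero, the activation is mapped to the log domain, uniformly quantized there, and mapped back. The shift value, bit width, and exact ordering of operations are illustrative guesses, not the paper's formulation of SULQ.

```python
import numpy as np

def sulq(x, shift=1e-2, bits=3):
    """Illustrative shift-uniform-log2 quantizer for post-Softmax
    activations in [0, 1].

    The shift makes log2 well-defined at x = 0; uniform quantization in
    the log domain yields fine resolution near zero, where post-Softmax
    values concentrate.
    """
    levels = 2 ** bits - 1
    y = np.log2(x + shift)                       # to log domain
    lo, hi = np.log2(shift), np.log2(1 + shift)  # representable range
    q = np.round((y - lo) / (hi - lo) * levels)  # uniform quantization
    y_hat = q / levels * (hi - lo) + lo          # dequantize (log domain)
    return 2.0 ** y_hat - shift                  # back to activations

out = sulq(np.array([0.0, 0.5, 1.0]))
```

Note that the endpoints 0 and 1 round-trip exactly, so the quantizer covers the full post-Softmax domain inclusively, which is the qualitative property the abstract calls an "inclusive domain representation".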
Representation Disparity-aware Distillation for 3D Object Detection
In this paper, we focus on developing knowledge distillation (KD) for compact
3D detectors. We observe that off-the-shelf KD methods manifest their efficacy
only when the teacher model and student counterpart share similar intermediate
feature representations. This might explain why they are less effective in
building extreme-compact 3D detectors where significant representation
disparity arises due primarily to the intrinsic sparsity and irregularity in 3D
point clouds. This paper presents a novel representation disparity-aware
distillation (RDD) method to address the representation disparity issue and
reduce the performance gap between compact students and over-parameterized
teachers. This is accomplished by building our RDD from an innovative
perspective of information bottleneck (IB), which can effectively minimize the
disparity of proposal region pairs from student and teacher in features and
logits. Extensive experiments are performed to demonstrate the superiority of
our RDD over existing KD methods. For example, our RDD increases mAP of
CP-Voxel-S to 57.1% on the nuScenes dataset, which even surpasses the
teacher's performance while taking up only 42% of its FLOPs. Comment:
Accepted by ICCV 2023.
Spatial Re-parameterization for N:M Sparsity
This paper presents a Spatial Re-parameterization (SpRe) method for the N:M
sparsity in CNNs. SpRe stems from an observation regarding the restricted
variety in spatial sparsity present in N:M sparsity compared with unstructured
sparsity. Particularly, N:M sparsity exhibits a fixed sparsity rate within the
spatial domains due to its distinctive pattern that mandates N non-zero
components among M successive weights in the input channel dimension of
convolution filters. On the contrary, we observe that unstructured sparsity
displays a substantial divergence in sparsity across the spatial domains, which
we experimentally verify to be crucial for its robust performance
retention compared with N:M sparsity. Therefore, SpRe employs the
spatial-sparsity distribution of unstructured sparsity to assign an extra
branch in conjunction with the original N:M branch at training time, which
allows the N:M sparse network to sustain a similar distribution of spatial
sparsity with unstructured sparsity. During inference, the extra branch can be
further re-parameterized into the main N:M branch without distorting the
sparse pattern or incurring additional computation cost. SpRe
matches the performance of N:M sparsity methods
with state-of-the-art unstructured sparsity methods across various benchmarks.
Code and models are anonymously available at
https://github.com/zyxxmu/SpRe. Comment: 11 pages, 4 figures.
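For readers unfamiliar with the N:M pattern the abstract refers to, here is a generic NumPy sketch of 2:4 magnitude pruning along the last (input-channel) dimension. It illustrates the fixed per-group sparsity rate that motivates SpRe's observation; it is standard N:M masking, not SpRe's two-branch training scheme or its re-parameterization.

```python
import numpy as np

def nm_mask(w, n=2, m=4):
    """Keep the n largest-magnitude weights in every group of m
    consecutive weights along the flattened last axis: the generic
    N:M sparsity pattern. Every group has exactly n survivors, so the
    sparsity rate is fixed at n/m everywhere in the spatial domain.
    """
    g = w.reshape(-1, m)
    idx = np.argsort(-np.abs(g), axis=1)[:, :n]   # top-n per group
    mask = np.zeros_like(g)
    np.put_along_axis(mask, idx, 1.0, axis=1)
    return mask.reshape(w.shape)

w = np.array([0.1, -0.9, 0.5, 0.05, 1.0, 0.2, -0.3, 0.0])
mask = nm_mask(w)        # 2:4 pattern: two survivors per group of four
sparse_w = w * mask
```

The rigid "exactly n per m" constraint is what removes the spatial-sparsity variety that, per the abstract, unstructured sparsity exploits and SpRe's extra branch restores during training.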
Knowledge Condensation Distillation
Knowledge Distillation (KD) transfers the knowledge from a high-capacity
teacher network to strengthen a smaller student. Existing methods focus on
excavating the knowledge hints and transferring the whole knowledge to the
student. However, knowledge redundancy arises because the knowledge holds
different value for the student at different learning stages. In this paper, we
propose Knowledge Condensation Distillation (KCD). Specifically, the knowledge
value on each sample is dynamically estimated, based on which an
Expectation-Maximization (EM) framework is forged to iteratively condense a
compact knowledge set from the teacher to guide the student learning. Our
approach is easy to build on top of the off-the-shelf KD methods, with no extra
training parameters and negligible computation overhead. Thus, it offers a
new perspective on KD, in which a student that actively identifies the
teacher's knowledge in line with its own aptitude can learn more effectively
and efficiently. Experiments on standard benchmarks demonstrate that the
proposed KCD boosts the performance of the student model with even higher
distillation efficiency. Code is available at https://github.com/dzy3/KCD.
Comment: ECCV 2022.
SiMaN: Sign-to-Magnitude Network Binarization
Binary neural networks (BNNs) have attracted broad research interest due to
their efficient storage and computational ability. Nevertheless, a significant
challenge of BNNs lies in handling discrete constraints while ensuring bit
entropy maximization, which typically makes their weight optimization very
difficult. Existing methods relax the learning using the sign function, which
simply encodes positive weights into +1s, and -1s otherwise. Alternatively, we
formulate an angle alignment objective to constrain the weight binarization to
{0,+1} to solve the challenge. In this paper, we show that our weight
binarization admits an analytical solution: high-magnitude weights are
encoded into +1s, and the rest into 0s. Therefore, a high-quality discrete solution is
established in a computationally efficient manner without the sign function. We
prove that the learned weights of binarized networks roughly follow a Laplacian
distribution that does not allow entropy maximization, and further demonstrate
that it can be effectively solved by simply removing the ℓ2
regularization during network training. Our method, dubbed sign-to-magnitude
network binarization (SiMaN), is evaluated on CIFAR-10 and ImageNet,
demonstrating its superiority over sign-based state-of-the-art methods. Our source
code, experimental settings, training logs and binary models are available at
https://github.com/lmbxmu/SiMaN
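The angle-alignment view of {0,+1} binarization admits a compact sketch, hedged as one simplified reading of the analytical solution rather than the paper's exact derivation: choose the number of +1 entries k that maximizes the cosine between the binary code and the magnitude vector, i.e. (sum of top-k magnitudes) / sqrt(k), then encode the k high-magnitude weights to 1 and the rest to 0; no sign function is involved.

```python
import numpy as np

def sign_to_magnitude_binarize(w):
    """Sketch of SiMaN-style {0,+1} binarization via angle alignment.

    For a binary code b with k ones, cos(b, |w|) is proportional to
    (sum of top-k |w|) / sqrt(k), so the optimum places the ones at the
    k largest magnitudes; we pick the best k in closed form by scanning.
    """
    mags = np.sort(np.abs(w))[::-1]          # magnitudes, descending
    cums = np.cumsum(mags)                   # top-k magnitude sums
    ks = np.arange(1, len(w) + 1)
    k = int(np.argmax(cums / np.sqrt(ks))) + 1   # best alignment angle
    thresh = mags[k - 1]
    # High-magnitude weights -> 1, the rest -> 0 (ties kept, for brevity).
    return (np.abs(w) >= thresh).astype(float)

b = sign_to_magnitude_binarize(np.array([3.0, -0.1, 0.1, 2.9]))
```

The scan over k is O(n log n) due to the sort, consistent with the abstract's claim of a computationally efficient discrete solution without relaxation.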