Beyond Single Instance Multi-view Unsupervised Representation Learning
Recent unsupervised contrastive representation learning follows a Single
Instance Multi-view (SIM) paradigm where positive pairs are usually constructed
with intra-image data augmentation. In this paper, we propose an effective
approach called Beyond Single Instance Multi-view (BSIM). Specifically, we
achieve more accurate instance discrimination by measuring the joint
similarity between two randomly sampled instances and their mixture, which we
term spurious-positive pairs. We believe that learning joint similarity helps to
improve the performance when encoded features are distributed more evenly in
the latent space. We apply it as an orthogonal improvement for unsupervised
contrastive representation learning, including current outstanding methods
SimCLR, MoCo, and BYOL. We evaluate our learned representations on many
downstream benchmarks like linear classification on ImageNet-1k and PASCAL VOC
2007, and object detection on MS COCO 2017 and VOC. We obtain substantial
gains on almost all of these tasks compared with prior art.
Comment: A plug-in approach with minimal modification to existing methods
based on instance discrimination
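The spurious-positive idea can be sketched roughly as follows. This is an illustrative simplification, not the paper's exact formulation: the encoder, the linear stand-in for it, the mixing weight, and the loss weighting are all assumptions.

```python
import numpy as np

def l2_normalize(z):
    return z / np.linalg.norm(z, axis=1, keepdims=True)

def spurious_positive_loss(encode, x1, x2, lam=0.5, tau=0.1):
    """Hedged sketch of BSIM-style spurious-positive pairs.

    Two independently sampled instances are mixed; the mixture's embedding
    is pulled toward BOTH sources, weighted by the mixing coefficient."""
    x_mix = lam * x1 + (1 - lam) * x2               # inter-image mixture
    z1, z2 = l2_normalize(encode(x1)), l2_normalize(encode(x2))
    zm = l2_normalize(encode(x_mix))
    # joint similarity: the mixture acts as a positive for both constituents
    sim1 = np.sum(zm * z1, axis=1) / tau
    sim2 = np.sum(zm * z2, axis=1) / tau
    return float(-np.mean(lam * sim1 + (1 - lam) * sim2))

# toy usage with a random linear map standing in for the encoder
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4))
encode = lambda x: x @ W
x1, x2 = rng.normal(size=(16, 8)), rng.normal(size=(16, 8))
loss = spurious_positive_loss(encode, x1, x2)
```

In practice this term would be added alongside (not in place of) the usual contrastive objective of SimCLR, MoCo, or BYOL, which is what makes it an orthogonal, plug-in improvement.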
MoGA: Searching Beyond MobileNetV3
The evolution of MobileNets has laid a solid foundation for neural network
applications on mobile end. With the latest MobileNetV3, neural architecture
search again claimed its supremacy in network design. Unfortunately, to date
mobile methods have mainly focused on CPU latency rather than GPU latency,
even though the GPU is much preferred in practice for its faster speed, lower
overhead, and less interference. Bearing the target hardware in mind, we propose the
first Mobile GPU-Aware (MoGA) neural architecture search in order to be
precisely tailored for real-world applications. Further, the ultimate goal of
designing a mobile network is to achieve better performance while maximizing
the utilization of bounded resources. Demanding higher capability while
restraining time consumption is an inherent tension, which we alleviate with
weighted evolution techniques. Moreover, we encourage increasing the number of
parameters for higher representational power. With 200x fewer GPU days than
MnasNet, we obtain a series of models that outperform MobileNetV3 under
similar latency constraints: MoGA-A achieves 75.9% top-1 accuracy on
ImageNet, and MoGA-B reaches 75.5% at a cost of only 0.5 ms more on a mobile
GPU. MoGA-C best attests to GPU-awareness by reaching 75.3% while being slower
on the CPU but faster on the GPU. The models and test code are made available
at https://github.com/xiaomi-automl/MoGA.
Comment: Accepted by ICASSP202
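The weighted trade-off between accuracy, GPU latency, and parameter count could be folded into a single scalar fitness for evolutionary search, along these lines. The latency target, weights, and parameter bonus below are hypothetical stand-ins, not the paper's actual search objective.

```python
import math

def moga_fitness(acc, gpu_latency_ms, n_params_m,
                 target_ms=10.0, w_lat=0.5, w_par=0.05):
    """Hypothetical weighted fitness for GPU-aware evolutionary search:
    reward accuracy, penalize latency beyond a mobile-GPU budget, and
    mildly reward parameter count for representational power."""
    lat_penalty = w_lat * max(0.0, gpu_latency_ms - target_ms)
    return acc - lat_penalty + w_par * math.log(n_params_m)

# compare two made-up candidates: a GPU-fast one vs a slower, more accurate one
fit_fast = moga_fitness(acc=75.3, gpu_latency_ms=9.5, n_params_m=5.1)
fit_slow = moga_fitness(acc=75.9, gpu_latency_ms=14.0, n_params_m=5.1)
```

With this weighting, the faster candidate wins despite its slightly lower accuracy, which is the behavior a GPU-aware search needs: latency measured on the target device dominates once the budget is exceeded.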
Norm Tweaking: High-performance Low-bit Quantization of Large Language Models
As the size of large language models (LLMs) continues to grow, model
compression without sacrificing accuracy has become a crucial challenge for
deployment. While some quantization methods, such as GPTQ, have made progress
in achieving acceptable 4-bit weight-only quantization, attempts at lower bit
quantization often result in severe performance degradation. In this paper, we
introduce a technique called norm tweaking, which can be used as a plugin in
current PTQ methods to achieve high precision while being cost-efficient. Our
approach is inspired by the observation that rectifying the quantized
activation distribution to match its float counterpart can readily restore
accuracy for LLMs. To achieve this, we carefully design a tweaking strategy
that includes calibration data generation and a channel-wise distance constraint
to update the weights of normalization layers for better generalization. We
conduct extensive experiments on various datasets using several open-sourced
LLMs. Our method demonstrates significant improvements in both weight-only
quantization and joint quantization of weights and activations, surpassing
existing PTQ methods. On GLM-130B and OPT-66B, our method even achieves the
same level of accuracy at 2-bit quantization as their float counterparts. Our
simple and effective approach makes LLM quantization more practical for
real-world applications.
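The core intuition, rectifying the quantized activation distribution to match its float counterpart by adjusting normalization-layer weights, can be caricatured with a simple moment-matching loop. The actual method performs gradient-based updates with generated calibration data; the update rule, rates, and shapes below are illustrative assumptions only.

```python
import numpy as np

def tweak_norm(gamma, beta, act_q, act_f, lr=0.1, steps=50):
    """Hedged sketch of norm tweaking: adjust the per-channel scale/shift of
    a normalization layer so the quantized model's activation statistics
    drift back toward the float model's."""
    g, b = gamma.copy(), beta.copy()
    for _ in range(steps):
        out = act_q * g + b                           # quantized-path activations
        # channel-wise distance to the float reference distribution
        d_mean = out.mean(axis=0) - act_f.mean(axis=0)
        d_std = out.std(axis=0) - act_f.std(axis=0)
        # moment-matching updates (a stand-in for the paper's gradient steps)
        b -= lr * d_mean
        g -= lr * d_std / (act_q.std(axis=0) + 1e-6)
    return g, b

# toy usage: quantized activations with the wrong mean/scale per channel
rng = np.random.default_rng(1)
act_f = rng.normal(loc=0.5, scale=2.0, size=(256, 8))  # float-model activations
act_q = rng.normal(size=(256, 8))                      # mismatched quantized ones
g, b = tweak_norm(np.ones(8), np.zeros(8), act_q, act_f)
out = act_q * g + b   # channel statistics now track the float reference
```

Note that only the tiny normalization parameters are updated; the quantized weight matrices stay fixed, which is why the procedure is cheap enough to plug into existing PTQ pipelines.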