Deep neural networks have delivered remarkable performance and have been
widely used in various visual tasks. However, their large size makes
transmission and storage inconvenient. Many previous studies have explored
model size compression, but they typically treat lossy and lossless
compression methods in isolation, which makes it difficult to reach high
compression ratios efficiently. This work proposes
a post-training model size compression method that combines lossy and lossless
compression in a unified way. We first propose a unified parametric weight
transformation, which allows different lossy compression methods to be
applied jointly in a post-training manner. Then, a dedicated differentiable
counter is introduced to guide the optimization of lossy compression toward
a point better suited to the subsequent lossless compression. Additionally, our
method can easily meet a desired global compression ratio while allocating
adaptive ratios to different layers. Finally, our method achieves a stable
10× compression ratio without sacrificing accuracy and a 20×
compression ratio with minor accuracy loss in a short time. Our code is
available at https://github.com/ModelTC/L2_Compression
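
To illustrate how lossy and lossless compression interact, the following is a minimal sketch and not the implementation from the repository above. It assumes uniform quantization as the lossy step and uses the empirical entropy of the quantized symbols as a proxy for the size an ideal lossless coder would reach. The function names (quantize, entropy_bits, soft_counts), the step size, and the temperature are hypothetical choices made for this example. The soft_counts function shows one common way to make a histogram smooth in the weights, which is the general idea behind a differentiable counter; the form used in the paper may differ.

```python
import numpy as np

def quantize(w, step):
    """Lossy step: map each weight to the index of its nearest grid point."""
    return np.round(w / step).astype(np.int64)

def entropy_bits(q):
    """Empirical entropy in bits per symbol (proxy for ideal lossless coding cost)."""
    _, counts = np.unique(q, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def soft_counts(w, step, centers, temperature=0.1):
    """Smooth histogram: each weight spreads a total mass of 1 over nearby bins.

    This is an assumed softmax-style relaxation, chosen only for illustration.
    """
    dist = w[:, None] / step - centers[None, :]
    logits = -(dist ** 2) / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    assign = np.exp(logits)
    assign /= assign.sum(axis=1, keepdims=True)
    return assign.sum(axis=0)

# Synthetic weights standing in for one layer (assumption: roughly Gaussian).
rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.05, size=10000).astype(np.float32)
step = 0.02

q = quantize(w, step)
bits = entropy_bits(q)
ratio = 32.0 / bits  # estimated ratio relative to 32-bit floating point storage

centers = np.arange(q.min(), q.max() + 1)
counts = soft_counts(w, step, centers)

print(f"bits per weight: {bits:.2f}, estimated ratio: {ratio:.1f}x")
```

In this sketch, a larger step lowers the entropy and raises the estimated ratio at the cost of larger quantization error, which is the trade-off that a jointly optimized lossy stage would have to balance against accuracy.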