Topology-aware efficient and transferable model compression using graph representation and reinforcement learning

Abstract

Deep neural networks (DNNs) have found widespread applications across many domains. However, deploying these models on devices with limited computational and storage capabilities, such as mobile devices, poses significant challenges. Model compression, which aims to make these large models more efficient without significant performance loss, is an active research area, but traditional compression techniques often require expert knowledge and overlook the structural information inherent in DNNs. To address these challenges, this thesis proposes two novel techniques: Auto Graph Encoder-decoder Model Compression (AGMC) and Graph Neural Network with Reinforcement Learning (GNN-RL). Both harness graph neural networks (GNNs) and reinforcement learning (RL) to extract structural information from DNNs, modeled as computational graphs, and to automatically derive efficient compression policies that guide the compression process, yielding compact yet effective DNNs.

AGMC combines a GNN-based DNN embedding mechanism with RL to learn and apply effective compression strategies. It outperforms traditional rule-based DNN embedding techniques, delivering improved performance and higher compression ratios, and surpasses both handcrafted and learning-based model compression approaches on over-parameterized and mobile-friendly DNNs. On over-parameterized DNNs such as ResNet-56, AGMC exceeds previous state-of-the-art methods in accuracy; on compact DNNs such as MobileNet-v2, it achieves a higher compression ratio with minimal accuracy loss.

GNN-RL extends this work with a novel multi-stage graph embedding technique that captures DNN topology, combined with RL to determine an optimal compression policy. Its effectiveness is demonstrated on a diverse set of DNNs, including the ResNet family, VGG-16, MobileNet-v1/v2, and ShuffleNet. GNN-RL achieves competitive results, providing higher compression ratios with less fine-tuning and thereby significantly reducing the computational resources required while maintaining outstanding model performance. Together, these methods pave the way for more automated and efficient model compression, enabling the deployment of complex DNNs on resource-constrained devices.
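To make the shared mechanism concrete, the sketch below shows one way a DNN's computational graph can be embedded with a GNN and mapped to per-layer compression (pruning) ratios by a policy network. It is a minimal illustration under assumed details: the node features (FLOPs, parameter count, kernel size), the mean-aggregation message passing, the layer sizes, and the class names GraphEncoder and PruningPolicy are illustrative choices, not the AGMC or GNN-RL implementations.

    # Minimal sketch of the GNN + RL compression idea (plain PyTorch;
    # all names and dimensions are illustrative assumptions).
    import torch
    import torch.nn as nn

    class GraphEncoder(nn.Module):
        """Two rounds of mean-aggregation message passing over the
        DNN's computational graph, followed by a global readout."""
        def __init__(self, in_dim, hid_dim):
            super().__init__()
            self.lin1 = nn.Linear(in_dim, hid_dim)
            self.lin2 = nn.Linear(hid_dim, hid_dim)

        def forward(self, x, adj):
            # x: node features (N x in_dim); adj: row-normalized adjacency (N x N)
            h = torch.relu(self.lin1(adj @ x))   # aggregate neighbor features
            h = torch.relu(self.lin2(adj @ h))
            return h.mean(dim=0)                 # graph-level embedding

    class PruningPolicy(nn.Module):
        """Policy head mapping the graph embedding to a pruning
        ratio in (0, 1) for each prunable layer."""
        def __init__(self, hid_dim, num_layers):
            super().__init__()
            self.head = nn.Sequential(
                nn.Linear(hid_dim, hid_dim), nn.ReLU(),
                nn.Linear(hid_dim, num_layers), nn.Sigmoid())

        def forward(self, g):
            return self.head(g)

    # Toy example: a 4-layer chain network. Each node carries
    # [FLOPs, #params, kernel size], scaled to [0, 1].
    x = torch.tensor([[0.9, 0.8, 0.3],
                      [0.7, 0.6, 0.3],
                      [0.5, 0.4, 0.3],
                      [0.2, 0.9, 0.1]])
    adj = torch.eye(4)                        # self-loops
    for i, j in [(0, 1), (1, 2), (2, 3)]:     # data-flow edges
        adj[i, j] = adj[j, i] = 1.0
    adj = adj / adj.sum(dim=1, keepdim=True)  # row-normalize

    encoder = GraphEncoder(in_dim=3, hid_dim=32)
    policy = PruningPolicy(hid_dim=32, num_layers=4)
    ratios = policy(encoder(x, adj))
    print(ratios)  # per-layer pruning ratios proposed by the policy

In a full pipeline, an RL algorithm (e.g., DDPG or PPO, as an assumed choice) would repeatedly sample ratios, prune and briefly fine-tune the model, and use the resulting accuracy as the reward for updating both networks; the graph-based state is what lets the policy exploit the DNN's topology rather than a flat per-layer feature vector.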
