MARS: Masked Automatic Ranks Selection in Tensor Decompositions

Authors: Kodryan Maxim, Kropotov Dmitry, Vetrov Dmitry
Publication date: 18/06/2021

Tensor decomposition methods are known to be efficient for compressing and accelerating neural networks. However, the problem of determining the optimal decomposition structure is important yet still not well studied. Specifically, decomposition ranks are the crucial parameter controlling the compression-accuracy trade-off. In this paper, we introduce MARS, a new efficient method for the automatic selection of ranks in general tensor decompositions. During training, the procedure learns binary masks over decomposition cores that "select" the optimal tensor structure. The learning is performed via relaxed maximum a posteriori (MAP) estimation in a specific Bayesian model. The proposed method achieves better results than previous work on various tasks.

arXiv.org e-Print Archive

Tensor shape search for efficient compression of tensorized data and neural networks

Authors: He Zichang, Liang William Jiahua, Loaiciga Hugo A, Solgi Ryan, Zhang Zheng
Publication venue: eScholarship, University of California
Publication date: 01/12/2023

eScholarship - University of California

Low Rank Optimization for Efficient Deep Learning: Making A Balance between Compact Architecture and Fast Training

Authors: Chen Zhangxin, Liu Yipeng, Ou Xinwei, Zhu Ce
Publication date: 21/03/2023

Deep neural networks have achieved great success in many data processing applications. However, their high computational complexity and storage cost make deep learning hard to use on resource-constrained devices, and the high power consumption is not environmentally friendly. In this paper, we focus on low-rank optimization for efficient deep learning techniques. In the space domain, deep neural networks are compressed by low-rank approximation of the network parameters, which directly reduces the storage requirement by using a smaller number of network parameters. In the time domain, the network parameters can be trained in a few subspaces, which enables efficient training with fast convergence. Model compression in the spatial domain is summarized into three categories: pre-train, pre-set, and compression-aware methods. A series of integrable techniques, such as sparse pruning, quantization, and entropy coding, can be combined in an integrated framework with lower computational complexity and storage. Besides summarizing recent technical advances, we report two findings to motivate future work: one is that the effective rank outperforms other sparse measures for network compression; the other is a spatial and temporal balance for tensorized neural networks.

arXiv.org e-Print Archive
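
As a rough illustration of the storage claim in the abstract above, the following sketch counts parameters for a dense weight matrix and for a rank-r factorization of the same matrix. It is not taken from the paper: the function names and the layer size (1024 x 1024, rank 32) are assumptions chosen only to make the arithmetic concrete.

```python
def dense_params(m: int, n: int) -> int:
    """Parameters stored by a dense m x n weight matrix."""
    return m * n


def low_rank_params(m: int, n: int, r: int) -> int:
    """Parameters stored by a rank-r factorization W ~= A @ B,
    with A of shape (m, r) and B of shape (r, n)."""
    return r * (m + n)


def compression_ratio(m: int, n: int, r: int) -> float:
    """Dense parameter count divided by the factorized parameter count."""
    return dense_params(m, n) / low_rank_params(m, n, r)


# Hypothetical layer size and rank, chosen for illustration only.
m, n, r = 1024, 1024, 32
print(dense_params(m, n))          # 1048576
print(low_rank_params(m, n, r))    # 65536
print(compression_ratio(m, n, r))  # 16.0
```

The factorization saves storage only when r is smaller than m*n / (m + n); the accuracy cost of a given rank is a separate question that this count does not address.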

Low-Rank Tensorized Neural Networks With Tensor Geometry Optimization

Author: Solgi Ryan
Publication venue: eScholarship, University of California
Publication date: 01/01/2024

Deep neural networks have demonstrated significant achievements across various fields, yet their memory and time complexities present obstacles to implementing them on resource-constrained devices. Compressing deep neural networks using tensor decomposition can decrease both memory usage and computational costs. The performance of a low-rank tensorized network depends on the choice of hyperparameters, including the tensor rank and geometry. Previous studies have concentrated on identifying optimal tensor ranks. This thesis studies the effect of the tensor geometry used for folding data for low-rank tensor compression. It is demonstrated that tensor geometry significantly affects the compression efficiency of the tensorized data and model parameters. Consequently, a novel mathematical formulation is developed to optimize tensor geometry. The tensor geometry optimization model is adopted for efficient deployment of low-rank neural networks. The presented tensor geometry optimization model is combinatorial and thus challenging to solve. Therefore, surrogate and relaxed versions of the model are developed, and various methods, including integer linear programming, graph optimization, and random search algorithms, are applied to solve it. The proposed tensor geometry optimization achieved a notable reduction in both the memory and time complexities of neural networks while maintaining accuracy. The developed methods can be applied to hardware-software co-design of artificial intelligence (AI) accelerators, particularly on resource-constrained devices.

eScholarship - University of California
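
To make the role of tensor geometry in the abstract above more tangible, here is a minimal sketch that counts stored parameters when the same 4096 elements are folded into different shapes. It assumes a tensor-train format with one uniform internal rank, which the abstract does not specify; `tt_params`, the foldings, and the rank are hypothetical choices for illustration, not the thesis's formulation.

```python
from math import prod


def tt_params(shape: tuple, rank: int) -> int:
    """Parameter count of a tensor-train with boundary ranks 1 and a
    uniform internal rank: the sum over cores of r_prev * n_k * r_next."""
    ranks = [1] + [rank] * (len(shape) - 1) + [1]
    return sum(ranks[k] * n * ranks[k + 1] for k, n in enumerate(shape))


# Three hypothetical foldings of the same 4096 elements.
foldings = [(64, 64), (16, 16, 16), (8, 8, 8, 8)]
for shape in foldings:
    assert prod(shape) == 4096  # every folding holds the same data
    print(shape, tt_params(shape, rank=8))
# (64, 64) 1024
# (16, 16, 16) 1280
# (8, 8, 8, 8) 1152
```

This only covers the storage side. The approximation error reached at a given rank also changes with the folding, which is why choosing the geometry is an optimization problem rather than a simple count.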

Quantization-Aware and Tensor-Compressed Training of Transformers for Natural Language Understanding

Authors: Choudhary Samridhi, Kunzmann Siegfried, Yang Zi, Zhang Zheng
Publication date: 01/06/2023

Fine-tuned transformer models have shown superior performances in many natural language tasks. However, the large model size prohibits deploying high-performance transformer models on resource-constrained devices. This paper proposes a quantization-aware tensor-compressed training approach to reduce the model size, arithmetic operations, and ultimately runtime latency of transformer-based models. We compress the embedding and linear layers of transformers into small low-rank tensor cores, which significantly reduces model parameters. A quantization-aware training with learnable scale factors is used to further obtain low-precision representations of the tensor-compressed models. The developed approach can be used for both end-to-end training and distillation-based training. To improve the convergence, a layer-by-layer distillation is applied to distill a quantized and tensor-compressed student model from a pre-trained transformer. The performance is demonstrated in two natural language understanding tasks, showing up to $63\times$ compression ratio, little accuracy loss and remarkable inference and training speedup.

arXiv.org e-Print Archive