10 research outputs found
DyLoRA: Parameter Efficient Tuning of Pre-trained Models using Dynamic Search-Free Low-Rank Adaptation
With the ever-growing size of pretrained models (PMs), fine-tuning them has
become more expensive and resource-hungry. As a remedy, low-rank adapters
(LoRA) keep the main pretrained weights of the model frozen and just introduce
some learnable truncated SVD modules (so-called LoRA blocks) to the model.
While LoRA blocks are parameter-efficient, they suffer from two major problems:
first, the size of these blocks is fixed and cannot be modified after training
(for example, if we need to change the rank of LoRA blocks, then we need to
re-train them from scratch); second, optimizing their rank requires an
exhaustive search and effort. In this work, we introduce a dynamic low-rank
adaptation (DyLoRA) technique to address these two problems together. Our
DyLoRA method trains LoRA blocks for a range of ranks instead of a single rank
by sorting the representation learned by the adapter module at different ranks
during training. We evaluate our solution on different natural language
understanding (GLUE benchmark) and language generation tasks (E2E, DART and
WebNLG) using different pretrained models such as RoBERTa and GPT with
different sizes. Our results show that we can train dynamic search-free models
with DyLoRA at least 4 to 7 times faster than LoRA (depending on the task)
without significantly compromising performance. Moreover, our models perform
consistently well across a much wider range of ranks than LoRA.
Comment: Accepted to EACL 202
Sorted LLaMA: Unlocking the Potential of Intermediate Layers of Large Language Models for Dynamic Inference Using Sorted Fine-Tuning (SoFT)
The rapid advancement of large language models (LLMs) has revolutionized
natural language processing (NLP). While these models excel at understanding
and generating human-like text, their widespread deployment can be
prohibitively expensive. SortedNet is a recent training technique for enabling
dynamic inference for deep neural networks. It leverages network modularity to
create sub-models with varying computational loads, sorting them based on
computation/accuracy characteristics in a nested manner. We extend SortedNet to
generative NLP tasks, making large language models dynamic without any
pretraining and by only replacing standard Supervised Fine-Tuning (SFT) with
Sorted Fine-Tuning (SoFT) at the same costs. Our approach boosts model
efficiency, eliminating the need for multiple models for various scenarios
during inference. We show that using this approach, we are able to unlock the
potential of intermediate layers of transformers in generating the target
output. Our sub-models remain integral components of the original model,
minimizing storage requirements and transition costs between different
computational/latency budgets. By applying this approach to LLaMA 2 13B for
tuning on the Stanford Alpaca dataset and comparing it to standard tuning and
early exit via the PandaLM benchmark, we show that Sorted Fine-Tuning can
deliver models twice as fast as the original model while maintaining or
exceeding performance.
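The idea of decoding from intermediate layers with one shared output head can be sketched in a toy NumPy model. Everything here (layer count, dimensions, the tanh "block") is a hypothetical stand-in for a transformer, not the paper's architecture; the point is that the same head serves every exit depth, so the sub-models are nested inside the full model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in: each "layer" is a matrix; SoFT trains the shared head to
# decode hidden states produced at every depth, not just the last one.
n_layers, d, vocab = 6, 8, 32
layers = [rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(n_layers)]
head = rng.standard_normal((vocab, d)) / np.sqrt(d)

def logits_at_exit(h, exit_layer):
    """Run the first `exit_layer` layers, then decode with the shared head."""
    for W in layers[:exit_layer]:
        h = np.tanh(h @ W.T)   # placeholder for a transformer block
    return h @ head.T

h0 = rng.standard_normal(d)
fast = logits_at_exit(h0, 3)        # cheaper sub-model: exits halfway
full = logits_at_exit(h0, n_layers) # full model: all layers
```

Because the sub-model reuses the full model's parameters and head, switching between latency budgets needs no extra storage, matching the abstract's claim about minimal transition costs.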
SortedNet, a Place for Every Network and Every Network in its Place: Towards a Generalized Solution for Training Many-in-One Neural Networks
As the size of deep learning models continues to grow, finding optimal models
under memory and computation constraints becomes increasingly more important.
Although usually the architecture and constituent building blocks of neural
networks allow them to be used in a modular way, their training process is not
aware of this modularity. Consequently, conventional neural network training
lacks the flexibility to adapt the computational load of the model during
inference. This paper proposes SortedNet, a generalized and scalable solution
to harness the inherent modularity of deep neural networks across various
dimensions for efficient dynamic inference. Our training considers a nested
architecture for the sub-models with shared parameters and trains them together
with the main model in a sorted and probabilistic manner. This sorted training
of sub-networks enables us to scale the number of sub-networks to hundreds
using a single round of training. We utilize a novel updating scheme during
training that combines random sampling of sub-networks with gradient
accumulation to improve training efficiency. Furthermore, the sorted nature of
our training leads to a search-free sub-network selection at inference time;
and the nested architecture of the resulting sub-networks leads to minimal
storage requirement and efficient switching between sub-networks at inference.
Our general dynamic training approach is demonstrated across various
architectures and tasks, including large language models and pre-trained vision
models. Experimental results show the efficacy of the proposed approach in
achieving efficient sub-networks while outperforming state-of-the-art dynamic
training approaches. Our findings demonstrate the feasibility of training up to
160 different sub-models simultaneously, showcasing the extensive scalability
of our proposed method while maintaining 96% of the model performance
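The updating scheme described above, random sampling of nested sub-networks combined with gradient accumulation over shared parameters, can be illustrated with a minimal NumPy sketch. Here the sub-networks are simply the leading-column slices of one weight matrix and the loss is a plain MSE; these are illustrative assumptions, not the paper's actual architectures or loss.

```python
import numpy as np

rng = np.random.default_rng(0)

# One shared weight matrix; sub-networks are its nested leading slices.
d_in, d_out = 8, 4
W = rng.standard_normal((d_out, d_in)) * 0.1
x = rng.standard_normal(d_in)
target = rng.standard_normal(d_out)

grad_acc = np.zeros_like(W)
widths = rng.integers(1, d_in + 1, size=4)   # randomly sampled sub-networks
for w in widths:
    y = W[:, :w] @ x[:w]                     # forward through the sub-network
    g = 2 * np.outer(y - target, x[:w])      # MSE gradient w.r.t. the slice
    grad_acc[:, :w] += g                     # accumulate into shared weights
W -= 0.01 * grad_acc / len(widths)           # one update covers all samples
```

Because every sampled sub-network writes its gradient into the same shared matrix, one training round covers many nested models at once, which is what lets the number of sub-networks scale into the hundreds.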
Effect of carboxymethyl cellulose edible coating containing Zataria multiflora essential oil and grape seed extract on chemical attributes of rainbow trout meat
Meat products, especially fish meat, are very susceptible to lipid oxidation and microbial spoilage. In this study, gas chromatography-mass spectrometry (GC-MS) analysis of Zataria multiflora essential oil (ZEO) components was first performed; then two concentrations of ZEO (1% and 2%) and two concentrations of grape seed extract (GSE) (0.5% and 1%) were used in a carboxymethyl cellulose coating, alone and in combination, and their antioxidant effects on rainbow trout meat were evaluated over a 20-day period using the thiobarbituric acid reactive substances (TBARS) test. Their effects on total volatile basic nitrogen (TVBN) and pH were evaluated as well. The main components of ZEO were thymol and carvacrol. These components significantly decreased the production of thiobarbituric acid (TBA), TVBN, and the pH level of fish meat. The initial pH, TVBN, and TBA contents were 6.62, 12.67 mg N per 100 g, and 0.19 mg kg-1, respectively. In most treatments, significant (p < 0.05) effects on the aforementioned factors were seen during storage at 4 °C. The results indicated that the use of ZEO and GSE as natural antioxidant agents was effective in reducing undesirable chemical reactions during storage of fish meat.
Investigating the Effects of Rural ICT Centers’ Services Quality on Customers’ Satisfaction (Case Study: Rural ICT Centers of Gillan)
Information and communication technology (ICT) is regarded as a means to sustainable rural development, reducing poverty, bridging the digital divide, and preventing migration from rural areas to cities. To achieve these goals and to deliver governmental and other essential services needed by rural communities, ten thousand rural ICT centers, with an investment of $280 million, have been put into use by the government. The purpose of this study is to assess the dimensions of service quality of rural ICT centers in Guilan using the Parasuraman SERVQUAL model and its impact on satisfaction. Using a descriptive correlational research method, 384 customers of these centers were selected randomly. The standard SERVQUAL questionnaire was used to measure service quality dimensions, and satisfaction questionnaires were used to measure customer satisfaction. To analyze the data, structural equation modeling was used. The results suggest that the service quality dimensions were ranked in this order: reliability, empathy, guarantee, accountability, and tangibility; the satisfaction dimensions were ranked as satisfaction with personnel and total satisfaction with services. AMOS software was used for data analysis and presentation of the results.
Optimization of Turbine Blade Cooling Using Combined Cooling Techniques
This paper presents an analysis and optimization of turbine blade cooling systems. Since the temperature of combustion gases is very high, sometimes reaching 2400 K, the turbine blade cannot sustain the resulting thermal stress. Moreover, higher efficiency in advanced gas turbines requires an increase in inlet temperature. Common blade cooling methods are film cooling, convection cooling, impingement cooling, and combined cooling. In this paper, a numerical solution of the thermal and flow fields for the film cooling technique on the AGTB symmetrical turbine blade was obtained, and the results were validated against experimental data. The turbine blade geometry was then modified, and two combined cooling techniques (impingement/convection cooling and impingement/film cooling) were evaluated. The low-Reynolds-number k-epsilon turbulence model (AKN) was used for the turbulent flow simulations at various blowing ratios for two blade thicknesses. Comparisons between the available experimental and numerical data showed that the AKN model is capable of predicting the turbulent flow and heat transfer in turbine blade cooling. The combined techniques (impingement/convection cooling and impingement/film cooling) yielded greater cooling effectiveness and a more uniform temperature distribution than the film cooling method alone.