
    DyLoRA: Parameter Efficient Tuning of Pre-trained Models using Dynamic Search-Free Low-Rank Adaptation

    With the ever-growing size of pretrained models (PMs), fine-tuning them has become more expensive and resource-hungry. As a remedy, low-rank adapters (LoRA) keep the main pretrained weights of the model frozen and introduce only a small number of learnable truncated-SVD modules (so-called LoRA blocks). While LoRA blocks are parameter-efficient, they suffer from two major problems: first, their size is fixed and cannot be modified after training (for example, changing the rank of the LoRA blocks requires retraining them from scratch); second, optimizing their rank requires an exhaustive search. In this work, we introduce a dynamic low-rank adaptation (DyLoRA) technique that addresses both problems. DyLoRA trains LoRA blocks for a range of ranks rather than a single rank by sorting the representations learned by the adapter module at different ranks during training. We evaluate our solution on natural language understanding (the GLUE benchmark) and language generation tasks (E2E, DART and WebNLG) using pretrained models of different sizes, such as RoBERTa and GPT. Our results show that DyLoRA can train dynamic, search-free models at least 4 to 7 times faster than LoRA (depending on the task) without significantly compromising performance. Moreover, our models perform consistently well over a much larger range of ranks than LoRA.
    Comment: Accepted to EACL 202
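The nested-rank idea behind DyLoRA can be illustrated with a toy adapter layer. This is a minimal sketch under our own assumptions (names and initialization are ours, not the paper's code): a rank is sampled at each training step, and only the leading rows/columns of the LoRA factors are used, so every truncated rank remains a usable adapter at inference without retraining.

```python
import numpy as np

rng = np.random.default_rng(0)

class DyLoRALinear:
    """Toy dynamic low-rank adapter: the frozen weight W is augmented by
    truncated factors A, B; slicing the first `rank` components yields a
    valid lower-rank adapter for free."""

    def __init__(self, in_features, out_features, max_rank=8, alpha=16.0):
        self.W = rng.standard_normal((out_features, in_features))  # frozen pretrained weight
        self.A = rng.standard_normal((max_rank, in_features)) * 0.01
        self.B = np.zeros((out_features, max_rank))  # zero-init so training starts at W
        self.max_rank = max_rank
        self.scale = alpha / max_rank

    def forward(self, x, rank=None):
        if rank is None:  # during training: sample a rank for this step
            rank = int(rng.integers(1, self.max_rank + 1))
        A, B = self.A[:rank], self.B[:, :rank]  # nested truncation
        return x @ self.W.T + self.scale * (x @ A.T) @ B.T
```

Because the factors are sliced rather than re-parameterized, switching the rank at inference is a zero-cost indexing operation.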

    Continuation KD: Improved Knowledge Distillation through the Lens of Continuation Optimization

    Knowledge Distillation (KD) has been used extensively for natural language understanding (NLU) tasks to improve a small model's (the student's) generalization by transferring knowledge from a larger model (the teacher). Although KD methods achieve state-of-the-art performance in numerous settings, they suffer from several problems that limit their performance. It has been shown in the literature that the capacity gap between the teacher and the student networks can make KD ineffective. Additionally, existing KD techniques do not mitigate the noise in the teacher's output: modeling the noisy behaviour of the teacher can distract the student from learning more useful features. We propose a new KD method that addresses these problems and makes training easier than previous techniques. Inspired by continuation optimization, we design a training procedure that optimizes the highly non-convex KD objective by starting with a smoothed version of this objective and making it more complex as training proceeds. Our method (Continuation-KD) achieves state-of-the-art performance across various compact architectures on NLU (the GLUE benchmark) and computer vision tasks (CIFAR-10 and CIFAR-100).
    Comment: Published at EMNLP 2022 (Findings)
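The continuation idea can be sketched as an objective that starts from a smooth, temperature-softened distillation term and is gradually blended toward the harder cross-entropy term. This is an illustrative toy version under our own assumptions (the blending schedule and function names are ours, not the authors' formulation):

```python
import numpy as np

def softmax(z, T=1.0):
    """Numerically stable softmax with temperature T."""
    z = z / T
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def continuation_kd_loss(student_logits, teacher_logits, labels, step, total_steps, T=4.0):
    """Continuation-style KD (illustrative): weight w ramps from 0 to 1,
    moving the objective from the smoothed distillation term toward the
    harder cross-entropy term as training proceeds."""
    w = min(1.0, step / total_steps)
    # Smooth term: cross-entropy against the temperature-softened teacher.
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kd = -np.sum(p_t * np.log(p_s + 1e-12), axis=-1).mean()
    # Hard term: standard cross-entropy against the gold labels.
    p = softmax(student_logits)
    ce = -np.log(p[np.arange(len(labels)), labels] + 1e-12).mean()
    return (1.0 - w) * kd + w * ce
```

Early in training the smoothed surface dominates, which is the continuation-optimization intuition: solve an easy relaxation first, then track its solution into the harder objective.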

    Sorted LLaMA: Unlocking the Potential of Intermediate Layers of Large Language Models for Dynamic Inference Using Sorted Fine-Tuning (SoFT)

    The rapid advancement of large language models (LLMs) has revolutionized natural language processing (NLP). While these models excel at understanding and generating human-like text, their widespread deployment can be prohibitively expensive. SortedNet is a recent training technique for enabling dynamic inference in deep neural networks. It leverages network modularity to create sub-models with varying computational loads, sorting them by computation/accuracy characteristics in a nested manner. We extend SortedNet to generative NLP tasks, making large language models dynamic without any pretraining, by only replacing standard Supervised Fine-Tuning (SFT) with Sorted Fine-Tuning (SoFT) at the same cost. Our approach boosts model efficiency, eliminating the need for multiple models to cover various inference scenarios. We show that this approach can unlock the potential of the intermediate layers of transformers for generating the target output. Our sub-models remain integral components of the original model, minimizing storage requirements and the cost of transitioning between different computational/latency budgets. By applying this approach to LLaMA 2 13B, tuning on the Stanford Alpaca dataset and comparing against normal tuning and early exit via the PandaLM benchmark, we show that Sorted Fine-Tuning can deliver models twice as fast as the original model while maintaining or exceeding performance.
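The "sub-models as prefixes of one network" idea can be made concrete with a toy layer stack. This is a minimal sketch under our own assumptions (a stack of dense layers standing in for transformer blocks, with a shared output head; none of this is the authors' code):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "transformer": a stack of dense layers plus one shared output head.
layers = [rng.standard_normal((16, 16)) * 0.1 for _ in range(6)]
head = rng.standard_normal((16, 4)) * 0.1

def sub_model_forward(x, depth):
    """Run only the first `depth` layers, then the shared head.
    Every prefix of the stack is a usable sub-model: nested weights,
    no extra storage, cheap switching between latency budgets."""
    h = x
    for W in layers[:depth]:
        h = np.tanh(h @ W)
    return h @ head

def soft_step_target(x):
    """One SoFT-style training step (illustrative): sample a depth and
    compute that sub-model's output; over many steps all prefixes are
    trained together in a sorted manner."""
    depth = int(rng.integers(1, len(layers) + 1))
    return depth, sub_model_forward(x, depth)
```

At inference time, choosing a depth is just a slice of the layer list, which is why no separate early-exit models need to be stored.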

    SortedNet, a Place for Every Network and Every Network in its Place: Towards a Generalized Solution for Training Many-in-One Neural Networks

    As the size of deep learning models continues to grow, finding optimal models under memory and computation constraints becomes increasingly important. Although the architecture and constituent building blocks of neural networks usually allow them to be used in a modular way, their training process is not aware of this modularity. Consequently, conventional neural network training lacks the flexibility to adapt the computational load of the model during inference. This paper proposes SortedNet, a generalized and scalable solution that harnesses the inherent modularity of deep neural networks across various dimensions for efficient dynamic inference. Our training considers a nested architecture for the sub-models with shared parameters and trains them together with the main model in a sorted and probabilistic manner. This sorted training of sub-networks enables us to scale the number of sub-networks to hundreds using a single round of training. We utilize a novel updating scheme during training that combines random sampling of sub-networks with gradient accumulation to improve training efficiency. Furthermore, the sorted nature of our training leads to search-free sub-network selection at inference time, and the nested architecture of the resulting sub-networks leads to minimal storage requirements and efficient switching between sub-networks at inference. Our general dynamic training approach is demonstrated across various architectures and tasks, including large language models and pre-trained vision models. Experimental results show the efficacy of the proposed approach in achieving efficient sub-networks while outperforming state-of-the-art dynamic training approaches. Our findings demonstrate the feasibility of training up to 160 different sub-models simultaneously, showcasing the extensive scalability of our proposed method while maintaining 96% of the model performance.
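The combination of random sub-network sampling and gradient accumulation can be sketched on a deliberately tiny model. This is an illustrative toy under our own assumptions (a linear model whose nested sub-models use the first k weights; the paper's actual updating scheme operates on neural sub-networks):

```python
import numpy as np

rng = np.random.default_rng(1)

def sortednet_step(w, x, y, widths, lr=0.01):
    """SortedNet-style update (toy): sample a few nested sub-models -- here,
    linear models restricted to the first k weights -- accumulate their MSE
    gradients, and apply one combined update. Small sub-models therefore
    shape the leading (shared) parameters of larger ones."""
    grad = np.zeros_like(w)
    for k in rng.choice(widths, size=min(2, len(widths)), replace=False):
        pred = x[:, :k] @ w[:k]
        grad[:k] += 2.0 * x[:, :k].T @ (pred - y) / len(y)  # MSE gradient for sub-model k
    return w - lr * grad
```

Because every sampled sub-model writes into a prefix of the same weight vector, training hundreds of sub-models needs no extra parameters, mirroring the nested-sharing argument in the abstract.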

    Protocol for systematic review: peak bone mass pattern in different parts of the world

    Copyright: © 2015 Mohammadi Z. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
    Peak bone mass can be defined as the amount of bone tissue present at the end of skeletal maturation; it is an important determinant of osteoporotic fracture risk. The peak bone mass of a given part of the skeleton depends on both genetic and environmental factors. Therefore, the aim of the proposed research is a comprehensive systematic assessment of the pattern of peak bone mass in different countries across the globe. The present article describes the protocol for conducting such research.

    Re-admission Rate of Patients with Ureteral Stone: A Descriptive Study

    Introduction: Patients with acute renal colic must choose between medical treatment and intervention. The aim of this study is to evaluate the outcomes of patients discharged from the emergency department with ureteral stones smaller than 6 millimeters, and in doing so to assess the effect of diagnostic and treatment approaches on clinical outcomes and referral rate.
    Patients and Methods: This study was performed on patients with ureteral stones referred to the emergency department of Shohadaye Tajrish Hospital between May 2015 and June 2018. A checklist was completed for each patient, covering complete medical history, physical examination results and paraclinical data. Patients were then followed for 4 weeks to determine the number of hospital referrals and clinical outcomes.
    Results: 105 patients, including 81 men (77.14%), with an average age of 37.1±12.4 years were studied. The mean stone diameter was 4.2±2.1 mm. Most ureteral stones were on the right side (60%). 71 patients (67.6%) had no history of nephrolithiasis and 73 (69.5%) had no positive family history of nephrolithiasis. Ureteral stones were still observed in 42 patients (40%) after two weeks, and only one patient (1.1%) had a stone on ultrasound imaging after 4 weeks.
    Conclusion: Most patients (95%) with stones smaller than 6 mm responded to Medical Expulsive Therapy (MET) within 4 weeks and passed their ureteral calculi spontaneously.

    Design, Synthesis and Biological Evaluation of New 5,5-Diarylhydantoin Derivatives as Selective Cyclooxygenase-2 Inhibitors

    A new group of 5,5-diarylhydantoin derivatives bearing a methylsulfonyl COX-2 pharmacophore at the para position of the C-5 phenyl ring were designed and synthesized as selective COX-2 inhibitors. In vitro COX-1/COX-2 inhibition structure-activity relationships identified 5-[4-(methylsulfonyl)phenyl]-5-phenyl-hydantoin (4) as a highly potent and selective COX-2 inhibitor (COX-2 IC50 = 0.077 μM; selectivity index > 1298). It was more selective than the reference drug celecoxib (COX-2 IC50 = 0.060 μM; selectivity index = 405). A molecular modeling study in which 4 was docked in the binding site of COX-2 indicated that the p-MeSO2 COX-2 pharmacophore group on the C-5 phenyl ring is oriented in the vicinity of the COX-2 secondary pocket. The results of this study showed that the type of substituent at the N-3 position of the hydantoin ring is important for COX-2 inhibitory activity.

    Flange Wrinkling in Flexible Roll Forming Process

    Flexible roll forming is an advanced sheet metal forming process for producing variable cross-section profiles. Flange wrinkling at the transition zone, where the cross section changes, is a major defect in the flexible roll forming process. In this paper, flange wrinkling at the transition zone is studied using finite element analysis. The results showed that the strip deformation at the transition zone can be considered a combination of two strip deformations observed in the conventional roll forming process and the flanging process. According to the finite element analysis results, when flange wrinkling occurs, the compressive longitudinal strain is smaller than the necessary compressive longitudinal strain calculated by mathematical modeling to obtain the intended profile geometry in the compression zone. Therefore, comparing the compressive longitudinal strain obtained from the finite element analysis with the necessary compressive longitudinal strain is a good criterion for predicting the occurrence of flange wrinkling. A flexible roll forming setup was developed. The longitudinal strain history was obtained from the finite element simulation and compared with experimental data from the flexible roll forming setup. The results show good agreement and confirm the finite element analysis.
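The wrinkling criterion described above can be expressed as a one-line comparison. This is a hypothetical encoding under our own assumptions (function name, sign convention, and tolerance parameter are ours): wrinkling is expected when the compressive longitudinal strain from the FE analysis is smaller in magnitude than the strain the analytical model requires in the compression zone.

```python
def wrinkling_expected(fea_strain, required_strain, tol=0.0):
    """Predict flange wrinkling from the strain comparison criterion.
    Strains are given as negative (compressive) values; `tol` is an
    optional margin. Returns True when the FE strain magnitude falls
    short of the required compressive strain magnitude."""
    return abs(fea_strain) < abs(required_strain) - tol
```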

    Distributions of High-Sensitivity C-Reactive Protein, Total Cholesterol-HDL Ratio and 10-Year Cardiovascular Risk: National Population-Based Study

    The present study aimed to evaluate the distributions of high-sensitivity C-reactive protein (hs-CRP), the TC-HDL ratio, and the 10-year risk of cardiovascular disease in the Iranian adult population. We conducted a cross-sectional study on a total of 2125 adults aged 25 to 65, using data from the Third National Surveillance of Risk Factors of Non-Communicable Diseases (SuRFNCD-2007). Anthropometric indices, blood pressure and biochemical measurements had been obtained. The 10-year risk of cardiovascular events was calculated using different models. The median (interquartile range) and geometric mean (95% CI) of hs-CRP were 5.1 (3.9) and 4.1 (4.38-4.85), respectively. The mean (±SD) TC-HDL ratio was 5.94±2.84 in men and 5.37±1.97 in women (P<0.001). Unlike the risk scores (FRS and SCORE), hs-CRP levels showed no significant gender- or age-related differences. Excluding CRP levels ≥10 did not change the results. The proportions in the high-risk category using the SCORE and FRS models were 3.6% and 8.8%, respectively. In comparison with other published data, higher mean and median values of hs-CRP were observed. Higher TC-HDL ratios and cardiovascular risk in men than in women were also demonstrated. Screening for cardiovascular disease has yet to be addressed, given the considerable prevalence of elevated CRP and increased risk of cardiovascular events among various subgroups.