57 research outputs found

    Estimation of spectral lines using expectation propagation

    We consider line spectral estimation (LSE) from general linear or nonlinear measurements obtained through a generalized linear model (GLM). This paper develops an expectation propagation (EP) based LSE (EPLSE) method. The proposed method automatically estimates the model order and noise variance, and can deal with nonlinear measurements. Numerical experiments show the excellent performance of EPLSE.
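    The signal model behind LSE can be sketched with a toy example: a complex sinusoid observed in noise, with its frequency recovered from a fine FFT grid. This is only the measurement model and a simple maximum-peak baseline, not the paper's EP inference; the frequency, noise level, and grid size are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 256
f_true = 0.217            # normalized frequency (illustrative value)
n = np.arange(N)

# Line-spectral measurement: complex exponential plus circular Gaussian noise.
y = np.exp(2j * np.pi * f_true * n) \
    + 0.1 * (rng.standard_normal(N) + 1j * rng.standard_normal(N))

# Simple baseline estimator: locate the periodogram peak on a zero-padded grid.
K = 1 << 14
spectrum = np.abs(np.fft.fft(y, K))
f_hat = np.fft.fftfreq(K)[np.argmax(spectrum)]
print(round(f_hat, 3))    # close to 0.217
```

An EP-based method would replace the grid search with iterative message passing that also infers the number of sinusoids and the noise variance.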

    Quantized Side Tuning: Fast and Memory-Efficient Tuning of Quantized Large Language Models

    Finetuning large language models (LLMs) has been empirically effective on a variety of downstream tasks. Existing approaches to finetuning an LLM either focus on parameter-efficient finetuning, which only updates a small number of trainable parameters, or attempt to reduce the memory footprint during the training phase of the finetuning. Typically, the memory footprint during finetuning stems from three contributors: model weights, optimizer states, and intermediate activations. However, existing works still require considerable memory, and none can simultaneously mitigate the memory footprint of all three sources. In this paper, we present Quantized Side Tuning (QST), which enables memory-efficient and fast finetuning of LLMs through a dual-stage process. First, QST quantizes an LLM's model weights into 4-bit to reduce the memory footprint of the LLM's original weights. Second, QST introduces a side network separated from the LLM, which utilizes the hidden states of the LLM to make task-specific predictions. Using a separate side network avoids performing backpropagation through the LLM, thus reducing the memory requirement of the intermediate activations. Furthermore, QST leverages several low-rank adaptors and gradient-free downsample modules to significantly reduce the number of trainable parameters, so as to save the memory footprint of the optimizer states. Experiments show that QST can reduce the total memory footprint by up to 2.3× and speed up the finetuning process by up to 3× while achieving competent performance compared with the state of the art. Compared with full finetuning, QST can reduce the total memory footprint by up to 7×.
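    The core memory trick can be illustrated with a minimal NumPy sketch: a quantized backbone is used in forward mode only, and gradients are computed solely for a small side head on top of its hidden states. The shapes, the per-tensor 4-bit quantization stub, and the single linear side head are illustrative assumptions, not QST's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(1)

def quantize_4bit(w):
    # Symmetric per-tensor 4-bit quantization stub: integer levels in [-7, 7].
    scale = np.abs(w).max() / 7.0
    return np.round(w / scale).clip(-7, 7).astype(np.int8), scale

W_backbone = rng.standard_normal((16, 8))
q, s = quantize_4bit(W_backbone)

def backbone(x):
    # Dequantize on the fly; the backbone is frozen, so no gradients are kept.
    return np.tanh(x @ (q * s))

# Tiny regression task on top of the frozen hidden states.
X = rng.standard_normal((64, 16))
y = X.sum(axis=1, keepdims=True)
H = backbone(X)                      # forward pass only through the backbone

W_side = np.zeros((8, 1))            # the only trainable parameters
for _ in range(200):                 # plain gradient descent on the side head
    err = H @ W_side - y
    W_side -= 0.01 * (H.T @ err) / len(X)

print(H.shape, W_side.shape)
```

Because the update loop only touches `H` and `W_side`, neither backbone activations nor backbone optimizer states need to be stored, which is the source of the memory savings the abstract describes.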

    Galvatron: Efficient Transformer Training over Multiple GPUs Using Automatic Parallelism

    Transformer models have achieved state-of-the-art performance in various application domains and have gradually become the foundation of advanced large deep learning (DL) models. However, training these models efficiently over multiple GPUs remains challenging due to the large number of parallelism choices. Existing DL systems either rely on manual effort to make distributed training plans or apply parallelism combinations within a very limited search space. In this paper, we propose Galvatron, a new system framework that incorporates multiple popular parallelism dimensions and automatically finds the most efficient hybrid parallelism strategy. To explore such a vast search space effectively, we 1) use a decision tree for decomposition and pruning based on reasonable intuitions, and then 2) design a dynamic programming search algorithm to generate the optimal plan. Evaluations on four representative Transformer workloads show that Galvatron can automatically perform distributed training under different GPU memory budgets. In all evaluated scenarios, Galvatron achieves superior system throughput compared to previous work with limited parallelism.
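    The dynamic-programming idea can be sketched as a small search: pick one parallelism strategy per layer, each with a time cost and a memory cost, so that total memory stays under budget and total time is minimal. The strategy table and its costs are made-up illustrative numbers, not Galvatron's actual cost model.

```python
# Per-layer strategy table: name -> (time_cost, memory_cost). Illustrative only.
STRATEGIES = {
    "data":     (1.0, 4),
    "tensor":   (1.4, 2),
    "pipeline": (1.8, 1),
}

def best_plan(n_layers, mem_budget):
    """DP over layers: dp maps memory-used -> (best total time, plan)."""
    dp = {0: (0.0, [])}
    for _ in range(n_layers):
        nxt = {}
        for used, (t, plan) in dp.items():
            for name, (tc, mc) in STRATEGIES.items():
                u = used + mc
                if u > mem_budget:
                    continue            # prune plans over the memory budget
                cand = (t + tc, plan + [name])
                if u not in nxt or cand[0] < nxt[u][0]:
                    nxt[u] = cand
        dp = nxt
    return min(dp.values()) if dp else (float("inf"), [])

time_cost, plan = best_plan(n_layers=4, mem_budget=10)
print(time_cost, plan)
```

With this toy table, the optimum keeps the fastest strategy on one layer and spends the remaining budget on cheaper ones; Galvatron's real search additionally uses a decision tree to prune the per-layer strategy set before running the DP.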

    Improving Automatic Parallel Training via Balanced Memory Workload Optimization

    Transformer models have emerged as the leading approach for achieving state-of-the-art performance across various application domains, serving as the foundation for advanced large-scale deep learning (DL) models. However, efficiently training these models across multiple GPUs remains a complex challenge due to the abundance of parallelism options. Existing DL systems either require manual efforts to design distributed training plans or limit parallelism combinations to a constrained search space. In this paper, we present Galvatron-BMW, a novel system framework that integrates multiple prevalent parallelism dimensions and automatically identifies the most efficient hybrid parallelism strategy. To effectively navigate this vast search space, we employ a decision tree approach for decomposition and pruning based on intuitive insights. We further utilize a dynamic programming search algorithm to derive the optimal plan. Moreover, to improve resource utilization and enhance system efficiency, we propose a bi-objective optimization workflow that focuses on workload balance. Our evaluations on different Transformer models demonstrate the capabilities of Galvatron-BMW in automating distributed training under varying GPU memory constraints. Across all tested scenarios, Galvatron-BMW consistently achieves superior system throughput, surpassing previous approaches that rely on limited parallelism strategies. (arXiv admin note: substantial text overlap with arXiv:2211.1387)
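    One ingredient of workload balance can be sketched as a classic min-max partition: split a sequence of per-layer memory costs into a fixed number of contiguous pipeline stages so the heaviest stage is as light as possible. The costs and stage count below are made-up illustrative numbers, not Galvatron-BMW's actual workflow.

```python
def balance_stages(costs, n_stages):
    """Min-max linear partition: dp[k][i] is the smallest achievable
    maximum stage load when costs[:i] are split into k contiguous stages."""
    n = len(costs)
    prefix = [0]
    for c in costs:
        prefix.append(prefix[-1] + c)
    INF = float("inf")
    dp = [[INF] * (n + 1) for _ in range(n_stages + 1)]
    dp[0][0] = 0
    for k in range(1, n_stages + 1):
        for i in range(1, n + 1):
            for j in range(k - 1, i):           # j = end of previous stage
                load = prefix[i] - prefix[j]    # load of stage k = costs[j:i]
                dp[k][i] = min(dp[k][i], max(dp[k - 1][j], load))
    return dp[n_stages][n]

print(balance_stages([4, 1, 3, 2, 5, 2], n_stages=3))
```

A bi-objective workflow would trade such a balance score off against the throughput objective rather than optimizing either alone.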

    Synergistic treatment of osteosarcoma with biomimetic nanoparticles transporting doxorubicin and siRNA

    Introduction: Osteosarcoma tumors are the most common malignant bone tumors in children and adolescents. Their treatment usually requires surgical removal of all detectable cancerous tissue and multidrug chemotherapy; however, the prognosis for patients with unresectable or recurrent osteosarcoma is unfavorable. To make chemotherapy safer and more effective for osteosarcoma patients, biomimetic nanoparticles (NPs) camouflaged by mesenchymal stem cell membranes (MSCMs) were synthesized to induce osteosarcoma cell apoptosis by co-delivering the anticancer drug doxorubicin hydrochloride (DOX) and a small interfering RNA (siRNA). Importantly, these NPs have high biocompatibility and tumor-homing ability. This study aimed to improve the efficacy of osteosarcoma therapy by using the synergistic combination of DOX and an siRNA targeting the apoptosis suppressor gene survivin.
    Methods: Biomimetic NPs (DOX/siSUR-PLGA@MSCM NPs) were synthesized by co-loading DOX and survivin siRNA (siSUR) into poly(lactide-co-glycolide acid) (PLGA) via a double-emulsion solvent evaporation method. The NPs were camouflaged by MSCMs to deliver both DOX and survivin-targeting siRNA, and were characterized and evaluated in terms of cellular uptake, in vitro release, in vitro and in vivo antitumor effects, and biosafety.
    Results: DOX/siSUR-PLGA@MSCM NPs had good tumor-homing ability due to the MSCM modification. The drug-laden biomimetic NPs had good antitumor effects in homozygous MG63 tumor-bearing mice due to the synergistic effect of the drug combination.
    Conclusion: DOX/siSUR-PLGA@MSCM NPs may show improved therapeutic effects in osteosarcoma patients through the combination of a chemotherapeutic drug and gene therapy, based on their good tumor targeting and biosafety.

    Characteristics of slamming pressure and force for trimaran hull

    In this paper, the characteristics of the impact pressure and force of a trimaran section were studied by computational fluid dynamics (CFD). The time-domain features of the slamming pressure and force showed a strong correlation with the penetration depth regardless of the specific way of water entry. The effects of velocity and acceleration on the impact pressure and force were analyzed. It was found that the initial impact of the main hull and the wet-deck slamming were predominantly affected by the entry velocity, whilst the acceleration had almost no effect on the initial impact. The impact velocity presented a quadratic relation with the slamming pressure/force, and the relation between acceleration and wet-deck slamming pressure/force was linear. These results are consistent with the patterns implied by analytical models such as the Wagner or modified Logvinovich model (MLM) theories.
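    The reported relation, quadratic in entry velocity and linear in acceleration, amounts to a model of the form p ≈ c1·v² + c2·a, which can be recovered by linear least squares. The synthetic data and coefficients below are illustrative stand-ins, not values from the CFD study.

```python
import numpy as np

rng = np.random.default_rng(2)
v = rng.uniform(1.0, 5.0, 40)        # entry velocity samples (illustrative)
a = rng.uniform(-2.0, 2.0, 40)       # acceleration samples (illustrative)

# Synthetic slamming pressure following the reported functional form.
p = 3.0 * v**2 + 1.5 * a + 0.01 * rng.standard_normal(40)

# The model is linear in the features (v**2, a), so ordinary least squares
# recovers both coefficients at once.
A = np.column_stack([v**2, a])
(c1, c2), *_ = np.linalg.lstsq(A, p, rcond=None)
print(round(c1, 2), round(c2, 2))    # recovers ~3.0 and ~1.5
```

Fitting both terms jointly is what lets the velocity-squared and acceleration contributions be separated even when both vary between water-entry cases.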

    Phylogenomic analyses provide insights into primate evolution

    Comparative analysis of primate genomes within a phylogenetic context is essential for understanding the evolution of human genetic architecture and primate diversity. We present such a study of 50 primate species spanning 38 genera and 14 families, including 27 genomes first reported here, many from previously less well-represented groups, the New World monkeys and the Strepsirrhini. Our analyses reveal heterogeneous rates of genomic rearrangement and gene evolution across primate lineages. Thousands of genes under positive selection in different lineages play roles in the nervous, skeletal, and digestive systems and may have contributed to primate innovations and adaptations. Our study reveals that many key genomic innovations occurred at the Simiiformes ancestral node and may have had an impact on the adaptive radiation of the Simiiformes and human evolution.