
    Random-LTD: Random and Layerwise Token Dropping Brings Efficient Training for Large-scale Transformers

    Large-scale transformer models have become the de facto architectures for various machine learning applications, e.g., CV and NLP. However, these large models also incur prohibitive training costs. To mitigate this issue, we propose a novel random and layerwise token dropping method (random-LTD), which skips the computation of a subset of the input tokens at all middle layers. In particular, random-LTD achieves considerable speedups with accuracy comparable to the standard training baseline. Compared to other token dropping methods, random-LTD does not require (1) any importance score-based metrics, (2) any special token treatment (e.g., [CLS]), or (3) full-sequence-length training in any layers other than the first and last. In addition, we propose a new LayerToken learning rate schedule for pretraining problems that resolves the heavy tuning requirement of our training mechanism. Finally, we demonstrate that random-LTD can be applied to broader applications, including GPT and BERT pretraining as well as ViT and GPT finetuning tasks. Our results show that random-LTD can save about 33.3% of theoretical compute cost and 25.6% of wall-clock training time while achieving zero-shot evaluations similar to the baseline on GPT-3 1.3B.
    Comment: 22 pages
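    The abstract describes random-LTD only at a high level: a random subset of tokens skips each middle layer, while the first and last layers see the full sequence. Below is a toy Python sketch of that routing idea under our own assumptions. The function `random_ltd_forward`, its `keep_len` and `seed` parameters, and the list-of-vectors token representation are hypothetical illustrations, not the DeepSpeed API or the authors' implementation. In this sketch each middle layer draws a fresh uniform sample of tokens (no importance scores, no special treatment of [CLS]), processes them in their original order, and the dropped tokens bypass that layer unchanged before being merged back at their positions.

```python
import random
from typing import Callable, List, Optional, Sequence

Token = List[float]                            # one hidden vector per token (toy representation)
Layer = Callable[[List[Token]], List[Token]]   # assumed to map k tokens to k tokens


def random_ltd_forward(
    tokens: Sequence[Token],
    layers: Sequence[Layer],
    keep_len: int,
    seed: Optional[int] = None,
) -> List[Token]:
    """Toy random layerwise token dropping (hypothetical helper, not DeepSpeed's API).

    The first and last layers process the full sequence. Every middle layer
    processes only `keep_len` randomly chosen tokens; the dropped tokens skip
    that layer and keep their current hidden state.
    """
    rng = random.Random(seed)
    n = len(tokens)
    hidden = list(tokens)  # shallow copy so the caller's list is not mutated
    last = len(layers) - 1
    for i, layer in enumerate(layers):
        if i == 0 or i == last or keep_len >= n:
            hidden = layer(hidden)  # full-sequence layer
            continue
        # Uniform random subset, resampled for every middle layer.
        kept = sorted(rng.sample(range(n), keep_len))  # sorting preserves token order
        processed = layer([hidden[j] for j in kept])
        # Scatter the processed tokens back to their original positions.
        for j, h in zip(kept, processed):
            hidden[j] = h
    return hidden


if __name__ == "__main__":
    def add_one(hs: List[Token]) -> List[Token]:
        # Stand-in "layer": adds 1.0 to every feature of every token it sees.
        return [[x + 1.0 for x in h] for h in hs]

    out = random_ltd_forward([[0.0] for _ in range(10)], [add_one] * 4, keep_len=6, seed=0)
    # Full layers touch 10 tokens each, the two middle layers 6 each: 10 + 6 + 6 + 10.
    print(sum(h[0] for h in out))  # 32.0
```

    Plain lists and a fixed `keep_len` keep the example short; a practical version would work on batched tensors and would also have to handle attention masks and position information for the kept tokens.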

    DeepSpeed-VisualChat: Multi-Round Multi-Image Interleave Chat via Multi-Modal Causal Attention

    Most existing multi-modal models struggle to handle interleaved image-and-text inputs in multi-image, multi-round dialogues, and they face substantial constraints in training resources and data accessibility, which limits their adaptability and scalability across varied interaction settings. To address this, we present the DeepSpeed-VisualChat framework, designed to optimize Large Language Models (LLMs) by incorporating multi-modal capabilities, with a focus on enhancing the proficiency of Large Vision and Language Models in handling interleaved inputs. Our framework is notable for (1) its open-source support for multi-round and multi-image dialogues, (2) an innovative multi-modal causal attention mechanism, and (3) data blending techniques applied to existing datasets to ensure seamless interactions in multi-round, multi-image conversations. Compared to existing frameworks, DeepSpeed-VisualChat shows superior scalability, up to a 70B-parameter language model, representing a significant advancement in multi-modal language models and laying a solid foundation for future exploration.

    Populus trichocarpa PtNF-YA9, A Multifunctional Transcription Factor, Regulates Seed Germination, Abiotic Stress, Plant Growth and Development in Arabidopsis

    NF-YAs play important roles in abiotic stress responses. However, their characteristics and functions in the abiotic stress responses of poplar, a model woody plant, have not been fully investigated. Here, we investigated for the first time the biological functions of PtNF-YA9 (Potri.011G101000), an NF-YA gene from Populus trichocarpa. PtNF-YA9 is localized in the nucleus. PtNF-YA9 expression was reduced by mannitol, NaCl, and abscisic acid (ABA), and GUS staining of ProNF-YA9::GUS transgenic lines was likewise reduced by mannitol treatment. PtNF-YA9-overexpressing Arabidopsis lines (OxPtNA9) were sensitive to simulated drought, ABA, and salinity stress at the germination stage and showed growth arrest at the post-germination stage. These phenotypes may involve the ABA signaling pathway via regulation of ABI3, ABI4, and ABI5. At vegetative stages, OxPtNA9 lines showed reduced water loss through promoted stomatal closure and high instantaneous leaf water-use efficiency (WUE), resulting in enhanced drought tolerance. Furthermore, OxPtNA9 lines developed long primary roots on half-strength Murashige–Skoog agar medium supplemented with NaCl and showed strong tolerance to salt stress in soil. Additionally, OxPtNA9 lines displayed a dwarf phenotype, short hypocotyls, reduced leaf area and biomass, delayed flowering, and increased chlorophyll content. Taken together, our research proposes a model in which PtNF-YA9 not only plays a key role in reducing plant growth but also plays a primary role in an acclimatization strategy in response to adverse environmental conditions.

    Inhibition of A/Human/Hubei/3/2005 (H3N2) influenza virus infection by silver nanoparticles in vitro and in vivo

    Silver nanoparticles (AgNPs) have attracted much attention as antimicrobial agents and have demonstrated efficient inhibitory activity against various viruses, including human immunodeficiency virus, hepatitis B virus, and Tacaribe virus. In this study, we investigated whether AgNPs have antiviral and preventive effects against A/Human/Hubei/3/2005 (H3N2) influenza virus infection. Madin-Darby canine kidney cells infected with AgNP-treated H3N2 influenza virus showed better viability (P<0.05 versus influenza virus control) and no obvious cytopathic effects compared with an influenza virus control group and a group treated with the solvent used to prepare the AgNPs. A hemagglutination assay indicated that AgNPs significantly inhibited growth of the influenza virus in Madin-Darby canine kidney cells (P<0.01 versus the influenza virus control). AgNPs significantly reduced H3N2-induced cell apoptosis under all three treatment pathways (P<0.05 versus influenza virus control). Transmission electron microscopy of AgNP-treated H3N2 influenza viruses showed that the AgNPs and virions interacted, destroying viral morphological structures in a time-dependent manner over 30 minutes to 2 hours. In addition, intranasal AgNP administration in mice significantly enhanced survival after infection with the H3N2 influenza virus. Mice treated with AgNPs showed lower lung viral titers and only minor pathological lesions in lung tissue, and had a marked survival benefit during secondary intranasal passage in vivo. These results provide evidence that AgNPs have beneficial effects in preventing H3N2 influenza virus infection both in vitro and in vivo, and demonstrate that AgNPs can be used as potential therapeutics for inhibiting influenza outbreaks.

    DeepSpeed-Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales

    ChatGPT-like models have revolutionized various applications in artificial intelligence, from summarization and coding to translation, matching or even surpassing human performance. However, the current landscape lacks an accessible, efficient, and cost-effective end-to-end RLHF (Reinforcement Learning with Human Feedback) training pipeline for these powerful models, particularly when training at the scale of billions of parameters. This paper introduces DeepSpeed-Chat, a novel system that democratizes RLHF training, making it accessible to the AI community. DeepSpeed-Chat offers three key capabilities: an easy-to-use training and inference experience for ChatGPT-like models, a DeepSpeed-RLHF pipeline that replicates the training pipeline from InstructGPT, and a robust DeepSpeed-RLHF system that combines various optimizations for training and inference in a unified way. The system delivers unparalleled efficiency and scalability, enabling training of models with hundreds of billions of parameters in record time and at a fraction of the cost. With this development, DeepSpeed-Chat paves the way for broader access to advanced RLHF training, even for data scientists with limited resources, thereby fostering innovation and further development in the field of AI.
    Comment: 14 pages, 7 figures

    Superior performance of aptamer in tumor penetration over antibody: implication of aptamer-based theranostics in solid tumors

    Insufficient penetration of therapeutic agents into tumor tissues results in inadequate drug distribution and lower intracellular drug concentrations, leading to increased drug resistance and the resulting failure of cancer treatment. Targeted drug delivery to solid tumors followed by complete drug penetration and durable retention will significantly improve clinical outcomes of cancer therapy. Monoclonal antibodies are commonly used clinically for cancer treatment, but their large size still limits their penetration into tumor tissues. Aptamers, as "chemical antibodies", are 15-20 times smaller than antibodies. To explore whether aptamers are superior to antibodies in terms of tumor penetration, we carried out the first comprehensive study comparing the performance of an EpCAM aptamer with an EpCAM antibody in theranostic applications. Penetration and retention were studied in in vitro three-dimensional tumorspheres, by in vivo live animal imaging, and in a mouse colorectal cancer xenograft model. We found that the EpCAM aptamer not only effectively penetrated the tumorsphere cores but was also retained by tumorsphere cells for at least 24 h, whereas the EpCAM antibody showed limited tumor penetration after 4 h of incubation. In vivo live animal imaging showed that EpCAM aptamers reached maximum tumor uptake at around 10 min, followed by rapid clearance after 80 min, while the antibody signal peaked at 3 h and disappeared at 6 h after intravenous injection. The signal of PEGylated EpCAM aptamers in xenograft tumors was sustained for 26 h, 4.3-fold longer than that of the EpCAM antibody. Consistently, accumulation of the PEGylated aptamer in xenograft tumors was 1.67-fold and 6.6-fold higher than that of the antibody at 3 h and 24 h after intravenous administration, respectively. In addition, 3 h after intravenous injection, the aptamer achieved at least 4-fold better tumor penetration than the antibody in xenograft tumors at a distance of 200 μm from the blood vessels. Taken together, these data indicate that aptamers are superior to antibodies in cancer theranostics owing to their better tumor penetration, more homogeneous distribution, and longer retention at tumor sites. Thus, aptamers are promising agents for targeted tumor therapeutics and molecular imaging.

    BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

    Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built through a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.