
    Cappy: Outperforming and Boosting Large Multi-Task LMs with a Small Scorer

    Large language models (LLMs) such as T0, FLAN, and OPT-IML excel at multi-tasking under a unified instruction-following paradigm and also generalize remarkably well to unseen tasks. Despite their impressive performance, these LLMs, which range from several billion to hundreds of billions of parameters, demand substantial computational resources, making their training and inference expensive and inefficient. Furthermore, adapting these models to downstream applications, particularly complex tasks, is often infeasible due to the extensive hardware requirements for finetuning, even when parameter-efficient approaches such as prompt tuning are used. Additionally, the most powerful multi-task LLMs, such as OPT-IML-175B and FLAN-PaLM-540B, are not publicly accessible, severely limiting their customization potential. To address these challenges, we introduce a pretrained small scorer, Cappy, designed to enhance the performance and efficiency of multi-task LLMs. With merely 360 million parameters, Cappy either functions independently on classification tasks or serves as an auxiliary component that boosts LLM performance. Moreover, Cappy enables downstream supervision to be integrated efficiently without requiring LLM finetuning or access to LLM parameters. Our experiments demonstrate that, when working independently on 11 language understanding tasks from PromptSource, Cappy outperforms LLMs that are several orders of magnitude larger. In addition, on 45 complex tasks from BIG-Bench, Cappy boosts the performance of the advanced multi-task LLM FLAN-T5 by a large margin. Furthermore, Cappy can flexibly cooperate with other LLM adaptations, including finetuning and in-context learning, offering additional performance enhancement.
    Comment: In the proceedings of NeurIPS 2023; code and model available at https://github.com/tanyuqian/cappy and https://huggingface.co/btan2/cappy-large, respectively
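    As a rough illustration of how such a scorer might be used, the sketch below loads the linked Hugging Face checkpoint and reranks candidate LLM outputs by their Cappy score. The loading call, the tokenization of (instruction, candidate) pairs, and the single-logit regression head are assumptions based on the abstract and the linked pages, not a confirmed API; consult the repository for the exact interface.

```python
# Hypothetical sketch: rerank candidate answers from a larger multi-task LLM
# with a small scorer such as Cappy. The checkpoint name comes from the links
# above; the exact loading call and head shape are assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("btan2/cappy-large")
scorer = AutoModelForSequenceClassification.from_pretrained("btan2/cappy-large")
scorer.eval()

instruction = "Summarize: The meeting covered budget cuts and a new hiring freeze."
candidates = [  # e.g. sampled from FLAN-T5 or another multi-task LLM
    "The meeting discussed budget cuts and a hiring freeze.",
    "The weather was pleasant during the meeting.",
]

with torch.no_grad():
    # Score each (instruction, candidate) pair; assume one regression logit each.
    inputs = tokenizer([instruction] * len(candidates), candidates,
                       return_tensors="pt", padding=True, truncation=True)
    scores = scorer(**inputs).logits.squeeze(-1)

print(candidates[int(scores.argmax())])  # keep the highest-scoring candidate
```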

    Redco: A Lightweight Tool to Automate Distributed Training of LLMs on Any GPU/TPUs

    The recent progress of AI can be largely attributed to large language models (LLMs). However, their escalating memory requirements introduce challenges for machine learning (ML) researchers and engineers. Addressing this requires developers to partition a large model and distribute it across multiple GPUs or TPUs, which demands considerable coding and intricate configuration with existing model-parallel tools such as Megatron-LM, DeepSpeed, and Alpa. These tools require expertise in machine learning systems (MLSys), creating a bottleneck in LLM development, particularly for developers without an MLSys background. In this work, we present Redco, a lightweight and user-friendly tool crafted to automate distributed training and inference for LLMs, as well as to simplify ML pipeline development. The design of Redco emphasizes two key aspects. First, to automate model parallelism, our study identifies two straightforward rules that generate tensor-parallel strategies for any given LLM. Integrating these rules into Redco enables effortless distributed LLM training and inference, eliminating the need for additional coding or complex configuration. We demonstrate this effectiveness by applying Redco to a set of LLM architectures, such as GPT-J, LLaMA, T5, and OPT, up to 66B parameters. Second, we propose a mechanism that allows diverse ML pipelines to be customized by defining merely three functions, eliminating redundant and formulaic code such as multi-host processing. This mechanism proves adaptable across a spectrum of ML algorithms, from foundational language modeling to complex algorithms like meta-learning and reinforcement learning. Consequently, Redco implementations contain far fewer lines of code than their official counterparts. The sketch below illustrates the three-function idea.
    Comment: Released under Apache License 2.0 at https://github.com/tanyuqian/redc
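    To make the "three functions" idea concrete, the toy example below specifies an entire (single-host, toy-scale) pipeline by writing only a collate function, a loss function, and a prediction function, while a generic driver loop stays task-agnostic. The function names and signatures are hypothetical and do not reproduce Redco's actual API; see the linked repository for the real interface.

```python
# Toy illustration of a pipeline defined by three user functions.
import numpy as np

def collate_fn(examples):
    # Turn raw examples into arrays the model consumes.
    x = np.array([e["x"] for e in examples], dtype=np.float32)
    y = np.array([e["y"] for e in examples], dtype=np.float32)
    return {"x": x, "y": y}

def loss_fn(params, batch):
    # Mean-squared error of a one-parameter linear model.
    pred = params["w"] * batch["x"]
    return float(np.mean((pred - batch["y"]) ** 2))

def pred_fn(params, batch):
    # Model predictions for a batch.
    return params["w"] * batch["x"]

def train(examples, params, lr=0.01, steps=200):
    # Generic driver: a real tool would also shard the model and data across
    # devices here; this toy version just runs plain gradient descent.
    batch = collate_fn(examples)
    for _ in range(steps):
        # Analytic gradient of the MSE loss with respect to w.
        grad_w = float(np.mean(2.0 * (params["w"] * batch["x"] - batch["y"]) * batch["x"]))
        params["w"] -= lr * grad_w
    return params

data = [{"x": float(i), "y": 3.0 * float(i)} for i in range(1, 6)]
params = train(data, {"w": 0.0})
print(round(params["w"], 2), loss_fn(params, collate_fn(data)))  # w approaches 3.0
```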

    A solvent evaporation route towards fabrication of hierarchically porous ZSM-11 with highly accessible mesopores

    A solvent evaporation route to generate an organosilane-modified dry gel and its transformation into hierarchically porous ZSM-11 is reported. The material features good pore connectivity and improved acid-site accessibility towards bulky substrates.

    Portability and networked learning environments

    The portability of educational software is defined as the likelihood of software usage, with or without adaptation, in an educational environment different from that for which it was originally designed and produced. Barriers and research relevant to the portability of electronic learning resources are discussed and organised into a portability-limiting factors model. With the increase in the number and scope of networked learning environments, portability issues take on a new dimension. Using electronic (study) books as an example, the portability problem space of networked learning environments is explored.

    Phase Stability of Hexagonal/cubic Boron Nitride Nanocomposites

    Boron nitride (BN) is an exceptional material, and among its polymorphs, the two-dimensional (2D) hexagonal and three-dimensional (3D) cubic phases (h-BN and c-BN) are the most common. The phase stability regimes of these BN phases are still under debate, and h-BN/c-BN phase transformations remain a topic of interest. Here, we investigate the phase stability of 2D/3D h-BN/c-BN nanocomposites and show that the co-existence of the two phases can lead to strong non-linear optical properties and low thermal conductivity at room temperature. Furthermore, spark-plasma sintering of the nanocomposite shows complete phase transformation to 2D h-BN with improved crystalline quality, where the 3D c-BN grain size governs the nucleation and growth kinetics. Our demonstration may inform phase engineering of BN-polymorph-based nanocomposites with desirable properties for optoelectronics and thermal energy management applications.
    Comment: 29 pages, 5 figures

    Cross-cultural portability of educational software: A communication-oriented approach
