66 research outputs found

    Towards Mitigating Architecture Overfitting in Dataset Distillation

    Dataset distillation methods have demonstrated remarkable performance for neural networks trained with very limited training data. However, a significant challenge arises in the form of architecture overfitting: distilled training data synthesized with a specific network architecture (i.e., the training network) yields poor performance when used to train other network architectures (i.e., test networks). This paper addresses this issue and proposes a series of approaches, spanning both architecture design and training schemes, that can be adopted together to boost generalization performance across different network architectures trained on the distilled data. We conduct extensive experiments to demonstrate the effectiveness and generality of our methods. In particular, across various scenarios involving different sizes of distilled data, our approaches achieve performance comparable or superior to existing methods when the distilled data are used to train networks with larger capacities.
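    The evaluation protocol at issue can be sketched as follows: data distilled with one architecture is used to train a different, typically larger, test architecture. The snippet below is a minimal, hypothetical illustration of that protocol using synthetic tensors and toy ConvNets; none of the model definitions, sizes, or hyperparameters come from the paper.

```python
# Hypothetical sketch of cross-architecture evaluation of distilled data:
# train differently sized "test networks" on the same distilled set.
import torch
import torch.nn as nn

def make_convnet(width: int) -> nn.Module:
    # Toy architecture; width stands in for model capacity.
    return nn.Sequential(
        nn.Conv2d(3, width, 3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Linear(width, 10),
    )

# Stand-in for a distilled dataset: a handful of synthetic images and labels.
distilled_x = torch.randn(100, 3, 32, 32)
distilled_y = torch.randint(0, 10, (100,))

def train_on_distilled(model: nn.Module, epochs: int = 5) -> None:
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(distilled_x), distilled_y)
        loss.backward()
        opt.step()

# Architecture overfitting would show up as degraded held-out accuracy for
# test networks that differ from the network used to synthesize the data.
for width in (32, 128):          # smaller vs. larger-capacity test network
    net = make_convnet(width)
    train_on_distilled(net)
```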

    Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models

    Linear attention is an efficient attention mechanism that has recently emerged as a promising alternative to conventional softmax attention. With its ability to process tokens in linear computational complexity, linear attention can, in theory, handle sequences of unlimited length without sacrificing speed, i.e., maintaining a constant training speed for various sequence lengths with fixed memory consumption. However, due to issues with cumulative summation (cumsum), current linear attention algorithms cannot demonstrate their theoretical advantage in a causal setting. In this paper, we present Lightning Attention-2, the first linear attention implementation that enables linear attention to realize its theoretical computational benefits. To achieve this, we leverage the idea of tiling, handling the intra-block and inter-block components of the linear attention calculation separately. Specifically, we use the conventional attention computation mechanism for the intra-blocks and apply linear attention kernel tricks for the inter-blocks. The tiling technique is adopted in both the forward and backward passes to take full advantage of GPU hardware. We implement our algorithm in Triton to make it IO-aware and hardware-friendly. Experiments are conducted on different model sizes and sequence lengths. Lightning Attention-2 retains consistent training and inference speed regardless of input sequence length and is significantly faster than other attention mechanisms. The source code is available at https://github.com/OpenNLPLab/lightning-attention.
    Comment: Technical Report. Yiran Zhong is the corresponding author. The source code is available at https://github.com/OpenNLPLab/lightning-attention.
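    The intra-/inter-block split described above can be illustrated in plain PyTorch: within each tile, causal attention is computed directly, while contributions from earlier tiles come from a running key-value state. This is a hedged, unoptimized sketch, not the paper's Triton kernel; the block size, shapes, and omission of normalization are assumptions made for brevity.

```python
# Minimal sketch of tiled causal linear attention (unnormalized for brevity).
import torch

def tiled_causal_linear_attention(q, k, v, block_size=64):
    # q, k, v: (seq_len, dim); q and k are assumed already feature-mapped
    # (non-negative), as is typical for linear attention.
    seq_len, dim = q.shape
    out = torch.zeros_like(v)
    kv_state = torch.zeros(dim, dim)   # accumulated K^T V from previous blocks
    for start in range(0, seq_len, block_size):
        end = min(start + block_size, seq_len)
        qb, kb, vb = q[start:end], k[start:end], v[start:end]
        # Inter-block: contribution of all earlier blocks via the running state.
        inter = qb @ kv_state
        # Intra-block: conventional masked attention restricted to this tile.
        scores = qb @ kb.T
        causal = torch.tril(torch.ones(end - start, end - start))
        intra = (scores * causal) @ vb
        out[start:end] = intra + inter
        # Fold this block's keys/values into the state for later blocks.
        kv_state = kv_state + kb.T @ vb
    return out

out = tiled_causal_linear_attention(torch.randn(256, 32).abs(),
                                    torch.randn(256, 32).abs(),
                                    torch.randn(256, 32))
```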

    TransNormerLLM: A Faster and Better Large Language Model with Improved TransNormer

    We present TransNormerLLM, the first linear attention-based Large Language Model (LLM) that outperforms conventional softmax attention-based models in terms of both accuracy and efficiency. TransNormerLLM evolves from the previous linear attention architecture TransNormer through advanced modifications that include positional embedding, linear attention acceleration, gating mechanisms, tensor normalization, and inference acceleration and stabilization. Specifically, we use LRPE together with an exponential decay to avoid attention dilution while allowing the model to retain global interactions between tokens. Additionally, we propose Lightning Attention, a cutting-edge technique that accelerates linear attention runtime by more than a factor of two and reduces memory usage by a factor of four. To further enhance TransNormer, we leverage a gating mechanism for smooth training and a new tensor normalization scheme to accelerate the model, resulting in an acceleration of over 20%. Furthermore, we develop a robust inference algorithm that ensures numerical stability and consistent inference speed regardless of sequence length, showcasing superior efficiency during both training and inference. We also implement an efficient model parallelism scheme for TransNormerLLM, enabling seamless deployment on large-scale clusters and facilitating expansion to even more extensive models, i.e., LLMs with 175B parameters. We validate our model design through a series of ablations and train models with sizes of 385M, 1B, and 7B on our self-collected corpus. Benchmark results demonstrate that our models not only match the performance of state-of-the-art Transformer-based LLMs but are also significantly faster. Code is released at: https://github.com/OpenNLPLab/TransnormerLLM.
    Comment: Technical Report. Yiran Zhong is the corresponding author. Zhen Qin, Dong Li, Weigao Sun, Weixuan Sun, and Xuyang Shen contributed equally to this paper. Code is released at: https://github.com/OpenNLPLab/TransnormerLLM.
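    The role of the exponential decay mentioned above can be shown with a small recurrence: the key-value state is decayed at each step before the current token is added, so distant tokens are down-weighted geometrically and attention does not dilute over long contexts. The following is a hedged sketch of that generic decayed linear-attention recurrence; the decay value, shapes, and absence of LRPE are assumptions, not the paper's configuration.

```python
# Decayed linear-attention recurrence: s_t = decay * s_{t-1} + k_t^T v_t,
# o_t = q_t s_t. Illustrative only; normalization and positional encoding omitted.
import torch

def decayed_linear_attention(q, k, v, decay: float = 0.99):
    # q, k, v: (seq_len, dim); q and k assumed non-negative feature maps.
    seq_len, dim = q.shape
    state = torch.zeros(dim, dim)
    outputs = []
    for t in range(seq_len):
        state = decay * state + torch.outer(k[t], v[t])
        outputs.append(q[t] @ state)
    return torch.stack(outputs)

out = decayed_linear_attention(torch.randn(128, 64).abs(),
                               torch.randn(128, 64).abs(),
                               torch.randn(128, 64))
```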

    Fine-grained Audible Video Description

    We explore a new task for audio-visual-language modeling called fine-grained audible video description (FAVD). It aims to provide detailed textual descriptions for given audible videos, including the appearance and spatial locations of each object, the actions of moving objects, and the sounds in the videos. Existing visual-language modeling tasks often concentrate on visual cues in videos while undervaluing the language and audio modalities. FAVD, on the other hand, requires not only audio-visual-language modeling skills but also paragraph-level language generation abilities. We construct the first fine-grained audible video description benchmark (FAVDBench) to facilitate this research. For each video clip, we first provide a one-sentence summary of the video, i.e., the caption, followed by 4-6 sentences describing the visual details and 1-2 audio-related descriptions at the end. The descriptions are provided in both English and Chinese. We create two new metrics for this task: an EntityScore to gauge the completeness of entities in the visual descriptions, and an AudioScore to assess the audio descriptions. As a preliminary approach to this task, we propose an audio-visual-language transformer that extends an existing video captioning model with an additional audio branch. We combine the masked language modeling and auto-regressive language modeling losses to optimize our model so that it can produce paragraph-level descriptions. We illustrate the efficiency of our model in audio-visual-language modeling by evaluating it against the proposed benchmark using both conventional captioning metrics and our proposed metrics. We further put our benchmark to the test in video generation models, demonstrating that employing fine-grained video descriptions can create more intricate videos than using captions.
    Comment: accepted to CVPR 2023. Xuyang Shen, Dong Li, and Jinxing Zhou contribute equally. Code link: github.com/OpenNLPLab/FAVDBench, dataset link: www.avlbench.opennlplab.c
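    The loss combination mentioned above (masked LM plus auto-regressive LM) can be illustrated on toy logits. This is a hedged sketch under assumed shapes and an assumed equal weighting; the actual model, tokenizer, and loss weighting used for FAVD are not specified in the abstract.

```python
# Illustrative combination of masked-LM and auto-regressive LM losses.
import torch
import torch.nn.functional as F

vocab, seq_len, batch = 1000, 32, 4
logits = torch.randn(batch, seq_len, vocab, requires_grad=True)  # stand-in decoder output
targets = torch.randint(0, vocab, (batch, seq_len))

# Masked-LM term: score predictions only at randomly masked positions.
mask = torch.rand(batch, seq_len) < 0.15
mlm_loss = F.cross_entropy(logits[mask], targets[mask])

# Auto-regressive term: predict token t+1 from the representation at step t.
ar_loss = F.cross_entropy(logits[:, :-1].reshape(-1, vocab),
                          targets[:, 1:].reshape(-1))

loss = mlm_loss + ar_loss   # equal weighting assumed
loss.backward()
```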

    Novel nickel foam with multiple microchannels as combustion reaction support for the self-heating methanol steam reforming microreactor

    To improve the hydrogen production performance of a self-heating methanol steam reforming (MSR) microreactor, a novel nickel foam with multiple microchannels was proposed as the combustion reaction support. Wall temperatures of methanol combustion microreactors with a nickel foam catalyst support and a particle catalyst support were compared during the combustion reaction. Based on numerical simulations of the combustion reaction on nickel foam, the shape and size of the multiple microchannels were determined, and laser processing was then used to fabricate them. The experimental results show that the methanol combustion microreactor with Pt-loaded nickel foam exhibits a wall temperature distribution similar to that of the microreactor with a Pt/γ-Al2O3 particle reaction support. Compared with nickel foam without microchannels, the maximum temperature difference (ΔTmax) and the maximum temperature of the nickel foam with multiple microchannels decreased by 57.8% and 33.8 °C, respectively, at a methanol flow rate of 1.1 mL/min. The hydrogen production performance of the self-heating MSR microreactor using the nickel foam with multiple microchannels increased by about 21% at a reforming temperature of 430 °C and a methanol–water mixture flow rate of 4 mL/h.

    Intensified paraglacial slope failures due to accelerating downwasting of a temperate glacier in Mt. Gongga, southeastern Tibetan Plateau

    Topographic development via paraglacial slope failure (PSF) represents a complex interplay between geological structure, climate, and glacial denudation. Southeastern Tibet has experienced amongst the highest rates of ice mass loss in High Mountain Asia in recent decades, but few studies have focused on the implications of this mass loss for the stability of paraglacial slopes. We used repeat satellite- and unpiloted aerial vehicle (UAV)-derived imagery between 1990 and 2020 as the basis for mapping PSFs on slopes adjacent to Hailuogou Glacier (HLG), a 5 km long monsoon temperate valley glacier in the Mt. Gongga region. We observed recent lowering of the glacier tongue surface at rates of up to 0.88 m a⁻¹ between 2000 and 2016, whilst overall paraglacial bare ground area (PBGA) on glacier-adjacent slopes increased from 0.31 ± 0.27 km² in 1990 to 1.38 ± 0.06 km² in 2020. Decadal PBGA expansion rates were ∼0.01 km² a⁻¹, 0.02 km² a⁻¹, and 0.08 km² a⁻¹ in the periods 1990–2000, 2000–2011, and 2011–2020 respectively, indicating an accelerating expansion of PBGA. Three types of PSFs, comprising rockfalls, sediment-mantled slope slides, and headward gully erosion, were mapped, with a total area of 0.75 ± 0.03 km² in 2020. South-facing valley slopes (true left of the glacier) exhibited more destabilization (56% of the total PSF area) than north-facing (true right) valley slopes (44% of the total PSF area). Deformation of sediment-mantled moraine slopes (mean 1.65–2.63 ± 0.04 cm d⁻¹) and an increase in erosion activity in ice-marginal tributary valleys caused by a drop in local base level (gully headward erosion rates of 0.76–3.39 cm d⁻¹) have occurred in tandem with recent glacier downwasting. We also observe deformation of glacier ice, possibly driven by destabilization of the lateral moraine, as has been reported in other deglaciating mountain glacier catchments. The formation, evolution, and future trajectory of PSFs at HLG (as well as in other monsoon-dominated deglaciating mountain areas) are related to glacial history, including recent rapid downwasting leading to the exposure of steep, unstable bedrock and moraine slopes, and to climatic conditions that promote slope instability, such as very high seasonal precipitation and seasonal temperature fluctuations conducive to freeze–thaw and ice segregation processes.

    Disruption of a GATA4/Ankrd1 Signaling Axis in Cardiomyocytes Leads to Sarcomere Disarray: Implications for Anthracycline Cardiomyopathy

    Doxorubicin (Adriamycin) is an effective anti-cancer drug, but its clinical use is limited by a dose-dependent cardiotoxicity characterized by widespread sarcomere disarray and loss of myofilaments. Cardiac ankyrin repeat protein (CARP, ANKRD1) is a transcriptional regulatory protein that is extremely susceptible to doxorubicin; however, the mechanism(s) of doxorubicin-induced CARP depletion and its specific role in cardiomyocytes have not been completely defined. We report that doxorubicin treatment of cardiomyocytes resulted in inhibition of CARP transcription, depletion of CARP protein levels, inhibition of myofilament gene transcription, and marked sarcomere disarray. Knockdown of CARP with small interfering RNA (siRNA) similarly inhibited myofilament gene transcription and disrupted cardiomyocyte sarcomere structure. Adenoviral overexpression of CARP, however, was unable to rescue the doxorubicin-induced sarcomere disarray phenotype. Doxorubicin also induced depletion of the cardiac transcription factor GATA4 in cardiomyocytes. CARP expression is regulated in part by GATA4, prompting us to examine the relationship between GATA4 and CARP in cardiomyocytes. We show in co-transfection experiments that GATA4 operates upstream of CARP by activating the proximal CARP promoter. GATA4 siRNA knockdown in cardiomyocytes inhibited CARP expression and myofilament gene transcription and induced extensive sarcomere disarray. Adenoviral overexpression of GATA4 (AdV-GATA4) in cardiomyocytes prior to doxorubicin exposure maintained GATA4 levels, modestly restored CARP levels, and attenuated sarcomere disarray. Interestingly, siRNA-mediated depletion of CARP completely abolished the AdV-GATA4 rescue of the doxorubicin-induced sarcomere phenotype. These data demonstrate co-dependent roles for GATA4 and CARP in regulating sarcomere gene expression and maintaining sarcomeric organization in cultured cardiomyocytes. The data further suggest that concurrent depletion of GATA4 and CARP in cardiomyocytes by doxorubicin contributes in large part to myofibrillar disarray and the overall pathophysiology of anthracycline cardiomyopathy.

    A Few-Shot Learning-Based EEG and Stage Transition Sequence Generator for Improving Sleep Staging Performance

    In this study, generative adversarial networks collectively named SleepGAN are proposed to expand the training set for automatic sleep stage classification by generating both electroencephalogram (EEG) epochs and sequence relationships of sleep stages. To reach high accuracy, most existing classification methods require substantial amounts of training data, but obtaining such quantities of real EEG epochs is expensive and time-consuming. We introduce few-shot learning, a method of training a GAN using a very small set of training data. This paper presents progressive Wasserstein divergence generative adversarial networks (GANs) and a relational memory generator to generate EEG epochs and stage transition sequences, respectively. For the evaluation of our generated data, we use single-channel EEGs from the public dataset Sleep-EDF. Adding our augmented data and sequences to the training set was shown to improve the performance of the classification model. The accuracy of the model increased by approximately 1% after incorporating generated EEG epochs, and adding both the augmented data and sequences resulted in a further increase of about 3%, from the original accuracy of 79.40% to 83.06%. These results show that SleepGAN is a set of GANs capable of generating realistic EEG epochs and transition sequences even when training data are insufficient, and that it can be used to enlarge the training dataset and improve the performance of sleep stage classification models in clinical practice.
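    As context for the Wasserstein-style adversarial training mentioned above, the following is a minimal, hypothetical sketch of GAN-based augmentation of single-channel EEG epochs: a generator produces synthetic epochs and a critic scores real versus generated ones. The architectures, epoch length, and hyperparameters are placeholders and do not reflect SleepGAN's progressive or relational-memory components; the divergence penalty term is omitted for brevity.

```python
# Toy Wasserstein-style GAN for generating single-channel EEG epochs.
import torch
import torch.nn as nn

EPOCH_LEN, LATENT = 3000, 100    # e.g. 30 s at 100 Hz (assumed)

gen = nn.Sequential(nn.Linear(LATENT, 512), nn.ReLU(), nn.Linear(512, EPOCH_LEN))
critic = nn.Sequential(nn.Linear(EPOCH_LEN, 512), nn.LeakyReLU(0.2), nn.Linear(512, 1))
g_opt = torch.optim.Adam(gen.parameters(), lr=1e-4)
c_opt = torch.optim.Adam(critic.parameters(), lr=1e-4)

real_epochs = torch.randn(256, EPOCH_LEN)    # stand-in for a small set of real epochs

for step in range(100):
    # Critic update: widen the score gap between real and generated epochs
    # (gradient/divergence penalty omitted for brevity).
    fake = gen(torch.randn(64, LATENT)).detach()
    real = real_epochs[torch.randint(0, len(real_epochs), (64,))]
    c_loss = critic(fake).mean() - critic(real).mean()
    c_opt.zero_grad()
    c_loss.backward()
    c_opt.step()

    # Generator update: produce epochs the critic scores as real.
    g_loss = -critic(gen(torch.randn(64, LATENT))).mean()
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()

# Generated epochs can then be appended to the real training set for staging.
augmented = torch.cat([real_epochs, gen(torch.randn(64, LATENT)).detach()])
```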
    • …