Towards Mitigating Architecture Overfitting in Dataset Distillation
Dataset distillation methods have demonstrated remarkable performance for
neural networks trained with very limited training data. However, a significant
challenge arises in the form of architecture overfitting: distilled
training data synthesized by a specific network architecture (i.e., the training
network) yield poor performance when used to train other network architectures
(i.e., test networks). This paper addresses this issue and proposes a series of
approaches in both architecture designs and training schemes which can be
adopted together to boost the generalization performance across different
network architectures on the distilled training data. We conduct extensive
experiments to demonstrate the effectiveness and generality of our methods.
Particularly, across various scenarios involving different sizes of distilled
data, our approaches achieve comparable or superior performance to existing
methods when training on the distilled data using networks with larger
capacities.
Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models
Linear attention is an efficient attention mechanism that has recently
emerged as a promising alternative to conventional softmax attention. With its
ability to process tokens in linear computational complexities, linear
attention, in theory, can handle sequences of unlimited length without
sacrificing speed, i.e., maintaining a constant training speed for various
sequence lengths with a fixed memory consumption. However, due to the issue
with cumulative summation (cumsum), current linear attention algorithms cannot
demonstrate their theoretical advantage in a causal setting. In this paper, we
present Lightning Attention-2, the first linear attention implementation that
enables linear attention to realize its theoretical computational benefits. To
achieve this, we leverage the thought of tiling, separately handling the
intra-block and inter-block components in linear attention calculation.
Specifically, we utilize the conventional attention computation mechanism for
the intra-blocks and apply linear attention kernel tricks for the inter-blocks.
A tiling technique is adopted through both forward and backward procedures to
take full advantage of the GPU hardware. We implement our algorithm in Triton
to make it IO-aware and hardware-friendly. Various experiments are conducted on
different model sizes and sequence lengths. Lightning Attention-2 retains
consistent training and inference speed regardless of input sequence length and
is significantly faster than other attention mechanisms. The source code is
available at https://github.com/OpenNLPLab/lightning-attention.
Comment: Technical Report. Yiran Zhong is the corresponding author.
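The tiling scheme described above — exact quadratic attention inside each block, a kernel-trick running summary across blocks — can be sketched in a few lines of NumPy. This is a minimal single-head sketch of the unnormalized causal case, not the authors' IO-aware Triton kernel; the function name and block size are illustrative.

```python
import numpy as np

def tiled_linear_attention(Q, K, V, block=64):
    # Causal linear attention computed block by block: the intra-block
    # part uses a masked quadratic product, while the inter-block part
    # reuses a running K^T V summary (the linear-attention kernel trick).
    n, d = Q.shape
    out = np.zeros_like(V)
    kv = np.zeros((d, V.shape[1]))           # running sum of K^T V
    mask = np.tril(np.ones((block, block)))  # causal mask within a block
    for s in range(0, n, block):
        e = min(s + block, n)
        q, k, v = Q[s:e], K[s:e], V[s:e]
        m = mask[: e - s, : e - s]
        intra = (q @ k.T * m) @ v            # exact causal part inside the block
        inter = q @ kv                       # contribution of all earlier blocks
        out[s:e] = intra + inter
        kv += k.T @ v                        # update summary for later blocks
    return out
```

Because the inter-block contribution reads only the running `kv` summary, each block costs the same regardless of how many tokens precede it, which is where the constant per-token training cost comes from.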
TransNormerLLM: A Faster and Better Large Language Model with Improved TransNormer
We present TransNormerLLM, the first linear attention-based Large Language
Model (LLM) that outperforms conventional softmax attention-based models in
terms of both accuracy and efficiency. TransNormerLLM evolves from the previous
linear attention architecture TransNormer by making advanced modifications that
include positional embedding, linear attention acceleration, gating mechanisms,
tensor normalization, and inference acceleration and stabilization.
Specifically, we use LRPE together with an exponential decay to avoid attention
dilution issues while allowing the model to retain global interactions between
tokens. Additionally, we propose Lightning Attention, a cutting-edge technique
that accelerates linear attention by more than twice in runtime and reduces
memory usage by a remarkable four times. To further enhance the performance of
TransNormer, we leverage a gating mechanism for smooth training and a new
tensor normalization scheme to accelerate the model, resulting in a substantial
overall acceleration. Furthermore, we develop a robust inference
algorithm that ensures numerical stability and consistent inference speed,
regardless of the sequence length, showcasing superior efficiency during both
training and inference stages. We also implement an efficient model parallel
schema for TransNormerLLM, enabling seamless deployment on large-scale clusters
and facilitating expansion to even more extensive models, i.e., LLMs with 175B
parameters. We validate our model design through a series of ablations and
train models with sizes of 385M, 1B, and 7B on our self-collected corpus.
Benchmark results demonstrate that our models not only match the performance of
state-of-the-art Transformer-based LLMs but are also significantly faster. Code
is released at: https://github.com/OpenNLPLab/TransnormerLLM.
Comment: Technical Report. Yiran Zhong is the corresponding author. Zhen Qin, Dong Li, Weigao Sun, Weixuan Sun, Xuyang Shen contribute equally to this paper.
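The exponential-decay idea mentioned in the abstract — damping each query-key interaction by how far apart the two tokens are, to counter attention dilution — can be illustrated with a small NumPy sketch. The decay rate `lam` is a hypothetical value and the LRPE phase term is omitted for brevity; this shows only the decay weighting.

```python
import numpy as np

def decayed_causal_scores(Q, K, lam=0.99):
    # Causal attention scores weighted by lam ** (i - j): nearby tokens
    # interact almost fully, distant ones are damped, and future
    # positions (j > i) are masked out entirely.
    n = Q.shape[0]
    i = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    dist = np.maximum(i - j, 0)      # token distance, clipped for j > i
    decay = np.tril(lam ** dist)     # lower-triangular causal decay
    return (Q @ K.T) * decay
```

With `lam` close to 1 the model retains global interactions between tokens; smaller values concentrate attention locally.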
Fine-grained Audible Video Description
We explore a new task for audio-visual-language modeling called fine-grained
audible video description (FAVD). It aims to provide detailed textual
descriptions for the given audible videos, including the appearance and spatial
locations of each object, the actions of moving objects, and the sounds in
videos. Existing visual-language modeling tasks often concentrate on visual
cues in videos while undervaluing the language and audio modalities. On the
other hand, FAVD requires not only audio-visual-language modeling skills but
also paragraph-level language generation abilities. We construct the first
fine-grained audible video description benchmark (FAVDBench) to facilitate this
research. For each video clip, we first provide a one-sentence summary of the
video, i.e., the caption, followed by 4-6 sentences describing the visual details
and 1-2 audio-related descriptions at the end. The descriptions are provided in
both English and Chinese. We create two new metrics for this task: an
EntityScore to gauge the completeness of entities in the visual descriptions,
and an AudioScore to assess the audio descriptions. As a preliminary approach
to this task, we propose an audio-visual-language transformer that extends
an existing video captioning model with an additional audio branch. We combine the
masked language modeling and auto-regressive language modeling losses to
optimize our model so that it can produce paragraph-level descriptions. We
demonstrate the effectiveness of our model in audio-visual-language modeling by
evaluating it against the proposed benchmark using both conventional captioning
metrics and our proposed metrics. We further put our benchmark to the test in
video generation models, demonstrating that employing fine-grained video
descriptions can create more intricate videos than using captions.
Comment: accepted to CVPR 2023. Xuyang Shen, Dong Li and Jinxing Zhou contribute equally. Code link: github.com/OpenNLPLab/FAVDBench, dataset link: www.avlbench.opennlplab.c
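The combination of masked and autoregressive language-modeling losses described above can be sketched as a weighted sum of two cross-entropies. This is a minimal NumPy sketch; the mixing weight `alpha` and the function names are hypothetical, not taken from the paper.

```python
import numpy as np

def cross_entropy(logits, targets):
    # Mean token-level cross-entropy over a vocabulary,
    # computed with a numerically stable log-softmax.
    z = logits - logits.max(axis=-1, keepdims=True)
    logp = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    return -logp[np.arange(len(targets)), targets].mean()

def combined_caption_loss(mlm_logits, mlm_targets,
                          ar_logits, ar_targets, alpha=0.5):
    # Weighted sum of the masked-LM and autoregressive objectives;
    # alpha is a hypothetical mixing weight for illustration.
    return (alpha * cross_entropy(mlm_logits, mlm_targets)
            + (1 - alpha) * cross_entropy(ar_logits, ar_targets))
```

The masked-LM term encourages bidirectional grounding of entities, while the autoregressive term drives fluent paragraph-level generation.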
Novel nickel foam with multiple microchannels as combustion reaction support for the self-heating methanol steam reforming microreactor
To improve the hydrogen production performance of the self-heating methanol steam reforming (MSR) microreactor, a novel nickel foam with multiple microchannels was proposed as the combustion reaction support. Wall temperatures of methanol combustion microreactors with nickel foam and particle catalyst supports were compared during the combustion reaction. The shape and size of the multiple microchannels were determined from numerical simulation of the combustion reaction in the nickel foam, and laser processing was then used to fabricate them. The experimental results show that the methanol combustion microreactor with Pt-loaded nickel foam exhibits a wall temperature distribution similar to that of the microreactor with a Pt/γ-Al2O3 particle reaction support. Compared with nickel foam without microchannels, the maximum temperature difference (ΔTmax) and the maximum temperature of the nickel foam with multiple microchannels decreased by 57.8% and 33.8 °C, respectively, at a methanol flow rate of 1.1 mL/min. The hydrogen production performance of the self-heating MSR microreactor using the nickel foam with multiple microchannels increased by about 21% at a reforming temperature of 430 °C and a methanol–water mixture flow rate of 4 mL/h.
Intensified paraglacial slope failures due to accelerating downwasting of a temperate glacier in Mt. Gongga, southeastern Tibetan Plateau
Topographic development via paraglacial slope failure (PSF) represents a complex interplay between geological structure, climate, and glacial denudation. Southeastern Tibet has experienced amongst the highest rates of ice mass loss in High Mountain Asia in recent decades, but few studies have focused on the implications of this mass loss on the stability of paraglacial slopes. We used repeat satellite- and unpiloted aerial vehicle (UAV)-derived imagery between 1990 and 2020 as the basis for mapping PSFs from slopes adjacent to Hailuogou Glacier (HLG), a 5 km long monsoon temperate valley glacier in the Mt. Gongga region. We observed recent lowering of the glacier tongue surface at rates of up to 0.88 m a−1 in the period 2000 to 2016, whilst overall paraglacial bare ground area (PBGA) on glacier-adjacent slopes increased from 0.31 ± 0.27 km2 in 1990 to 1.38 ± 0.06 km2 in 2020. Decadal PBGA expansion rates were ∼ 0.01 km2 a−1, 0.02 km2 a−1, and 0.08 km2 a−1 in the periods 1990–2000, 2000–2011, and 2011–2020 respectively, indicating an increasing rate of expansion of PBGA. Three types of PSFs, including rockfalls, sediment-mantled slope slides, and headward gully erosion, were mapped, with a total area of 0.75 ± 0.03 km2 in 2020. South-facing valley slopes (true left of the glacier) exhibited more destabilization (56 % of the total PSF area) than north-facing (true right) valley slopes (44 % of the total PSF area). Deformation of sediment-mantled moraine slopes (mean 1.65–2.63 ± 0.04 cm d−1) and an increase in erosion activity in ice-marginal tributary valleys caused by a drop in local base level (gully headward erosion rates are 0.76–3.39 cm d−1) have occurred in tandem with recent glacier downwasting. We also observe deformation of glacier ice, possibly driven by destabilization of lateral moraine, as has been reported in other deglaciating mountain glacier catchments.
The formation, evolution, and future trajectory of PSFs at HLG (as well as other monsoon-dominated deglaciating mountain areas) are related to glacial history, including recent rapid downwasting leading to the exposure of steep, unstable bedrock and moraine slopes, and climatic conditions that promote slope instability, such as very high seasonal precipitation and seasonal temperature fluctuations that are conducive to freeze–thaw and ice segregation processes.
Disruption of a GATA4/Ankrd1 Signaling Axis in Cardiomyocytes Leads to Sarcomere Disarray: Implications for Anthracycline Cardiomyopathy
Doxorubicin (Adriamycin) is an effective anti-cancer drug, but its clinical usage is limited by a dose-dependent cardiotoxicity characterized by widespread sarcomere disarray and loss of myofilaments. Cardiac ankyrin repeat protein (CARP, ANKRD1) is a transcriptional regulatory protein that is extremely susceptible to doxorubicin; however, the mechanism(s) of doxorubicin-induced CARP depletion and its specific role in cardiomyocytes have not been completely defined. We report that doxorubicin treatment in cardiomyocytes resulted in inhibition of CARP transcription, depletion of CARP protein levels, inhibition of myofilament gene transcription, and marked sarcomere disarray. Knockdown of CARP with small interfering RNA (siRNA) similarly inhibited myofilament gene transcription and disrupted cardiomyocyte sarcomere structure. Adenoviral overexpression of CARP, however, was unable to rescue the doxorubicin-induced sarcomere disarray phenotype. Doxorubicin also induced depletion of the cardiac transcription factor GATA4 in cardiomyocytes. CARP expression is regulated in part by GATA4, prompting us to examine the relationship between GATA4 and CARP in cardiomyocytes. We show in co-transfection experiments that GATA4 operates upstream of CARP by activating the proximal CARP promoter. GATA4-siRNA knockdown in cardiomyocytes inhibited CARP expression and myofilament gene transcription, and induced extensive sarcomere disarray. Adenoviral overexpression of GATA4 (AdV-GATA4) in cardiomyocytes prior to doxorubicin exposure maintained GATA4 levels, modestly restored CARP levels, and attenuated sarcomere disarray. Interestingly, siRNA-mediated depletion of CARP completely abolished the AdV-GATA4 rescue of the doxorubicin-induced sarcomere phenotype. These data demonstrate co-dependent roles for GATA4 and CARP in regulating sarcomere gene expression and maintaining sarcomeric organization in cardiomyocytes in culture.
These data further suggest that concurrent depletion of GATA4 and CARP in cardiomyocytes by doxorubicin contributes in large part to myofibrillar disarray and the overall pathophysiology of anthracycline cardiomyopathy.
A Few-Shot Learning-Based EEG and Stage Transition Sequence Generator for Improving Sleep Staging Performance
In this study, generative adversarial networks named SleepGAN are proposed to expand the training set for automatic sleep stage classification tasks by generating both electroencephalogram (EEG) epochs and sequence relationships of sleep stages. In order to reach high accuracy, most existing classification methods require substantial amounts of training data, but obtaining such quantities of real EEG epochs is expensive and time-consuming. We introduce few-shot learning, which is a method of training a GAN using a very small set of training data. This paper presents progressive Wasserstein divergence generative adversarial networks (GANs) and a relational memory generator to generate EEG epochs and stage transition sequences, respectively. For the evaluation of our generated data, we use single-channel EEGs from the public dataset Sleep-EDF. The addition of our augmented data and sequences to the training set was shown to improve the performance of the classification model. The accuracy of the model increased by approximately 1% after incorporating generated EEG epochs. Adding both the augmented data and sequences to the training set resulted in a further increase of 3%, from the original accuracy of 79.40% to 83.06%. These results show that SleepGAN is a set of GANs capable of generating realistic EEG epochs and transition sequences under the condition of insufficient training data and can be used to enlarge the training dataset and improve the performance of sleep stage classification models in clinical practice.