130 research outputs found
Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning
The popularity of LLaMA (Touvron et al., 2023a;b) and other recently emerged
moderate-sized large language models (LLMs) highlights the potential of
building smaller yet powerful LLMs. Nevertheless, the cost of training such
models from scratch on trillions of tokens remains high. In this work, we study
structured pruning as an effective means to develop smaller LLMs from
pre-trained, larger models. Our approach employs two key techniques: (1)
targeted structured pruning, which prunes a larger model to a specified target
shape by removing layers, heads, and intermediate and hidden dimensions in an
end-to-end manner, and (2) dynamic batch loading, which dynamically updates the
composition of sampled data in each training batch based on varying losses
across different domains. We demonstrate the efficacy of our approach by
presenting the Sheared-LLaMA series, pruning the LLaMA2-7B model down to 1.3B
and 2.7B parameters. Sheared-LLaMA models outperform state-of-the-art
open-source models of equivalent sizes, such as Pythia, INCITE, and OpenLLaMA
models, on a wide range of downstream and instruction tuning evaluations, while
requiring only 3% of the compute needed to train such models from scratch.
This work provides compelling evidence that leveraging existing LLMs with
structured pruning is a far more cost-effective approach for building smaller
LLMs. Comment: The code and models are available at
https://github.com/princeton-nlp/LLM-Shearin
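The dynamic batch loading idea can be sketched as a multiplicative-weights update over domain sampling proportions. The update rule, learning rate, and loss values below are illustrative assumptions, not the paper's exact procedure:

```python
import numpy as np

def update_domain_weights(weights, current_losses, reference_losses, lr=1.0):
    # Dynamic batch loading sketch: upweight domains whose training loss
    # still exceeds a per-domain reference loss, so the next batch samples
    # more data from the lagging domains. The exponential update form and
    # the learning rate are illustrative assumptions.
    excess = np.maximum(np.asarray(current_losses) - np.asarray(reference_losses), 0.0)
    new_w = np.asarray(weights, dtype=float) * np.exp(lr * excess)
    return new_w / new_w.sum()  # renormalize to a sampling distribution

# Hypothetical four-domain example: domain 2 lags its reference the most,
# so its sampling weight grows the fastest.
w = update_domain_weights([0.25] * 4, [2.1, 1.8, 2.5, 1.9], [2.0, 1.9, 2.0, 2.0])
```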
Arena: A Learning-based Synchronization Scheme for Hierarchical Federated Learning--Technical Report
Federated learning (FL) enables collaborative model training among
distributed devices without data sharing, but existing FL suffers from poor
scalability because of global model synchronization. To address this issue,
hierarchical federated learning (HFL) has been recently proposed to let edge
servers aggregate models of devices in proximity, while synchronizing via the
cloud periodically. However, a critical open challenge remains: how to design
a good synchronization scheme, i.e., when devices and edge servers should
synchronize. Devices are heterogeneous in computing and communication
capability, and their data can be non-IID. No existing work synchronizes the
various roles in HFL (\textit{e.g.}, devices and edges) well enough to
guarantee high learning efficiency and accuracy. In this paper, we propose a learning-based
synchronization scheme for HFL systems. By collecting data such as edge models,
CPU usage, communication time, \textit{etc}., we design a deep reinforcement
learning-based approach to decide the frequencies of cloud aggregation and edge
aggregation, respectively. The proposed scheme accounts for device
heterogeneity, non-IID data, and device mobility, maximizing model accuracy
while minimizing energy overhead. We also analyze the convergence bound of the
proposed synchronization scheme. We build an HFL testbed and conduct
experiments with real data obtained from Raspberry Pi devices and Alibaba
Cloud. Extensive experiments under various settings confirm the effectiveness
of \textit{Arena}.
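The two-level synchronization being scheduled can be simulated minimally as follows; fixed frequencies stand in for the decisions Arena's RL agent would make, and all names and values are illustrative assumptions:

```python
import numpy as np

def hfl_round(device_params, edge_groups, t, edge_freq, cloud_freq):
    # One synchronization step of hierarchical FL (illustrative sketch):
    # edge servers average their devices' models every `edge_freq` steps,
    # and the cloud averages across all devices every `cloud_freq` steps.
    # `edge_groups` maps edge index -> list of device indices; models are
    # reduced to scalars for clarity.
    params = np.array(device_params, dtype=float)
    if t % edge_freq == 0:
        for devices in edge_groups.values():
            params[devices] = params[devices].mean()  # edge aggregation
    if t % cloud_freq == 0:
        params[:] = params.mean()                     # cloud aggregation
    return params
```

Choosing `edge_freq` and `cloud_freq` per device/edge, from observations such as edge models, CPU usage, and communication time, is exactly the decision the learned scheme makes.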
Systematic Analysis of Impact of Sampling Regions and Storage Methods on Fecal Gut Microbiome and Metabolome Profiles.
The contribution of human gastrointestinal (GI) microbiota and metabolites to host health has recently become much clearer. However, many confounding factors can influence the accuracy of gut microbiome and metabolome studies, resulting in inconsistencies in published results. In this study, we systematically investigated the effects of fecal sampling regions and storage and retrieval conditions on gut microbiome and metabolite profiles from three healthy children. Our analysis indicated that compared to homogenized and snap-frozen samples (standard control [SC]), different sampling regions did not affect microbial community alpha diversity, while a total of 22 of 176 identified metabolites varied significantly across different sampling regions. In contrast, storage conditions significantly influenced the microbiome and metabolome. Short-term room temperature storage had a minimal effect on the microbiome and metabolome profiles. Sample storage in RNALater showed a significant level of variation in both microbiome and metabolome profiles, independent of the storage or retrieval conditions. The effect of RNALater on the metabolome was stronger than the effect on the microbiome, and individual variability between study participants outweighed the effect of RNALater on the microbiome. We conclude that homogenizing stool samples was critical for metabolomic analysis but not necessary for microbiome analysis. Short-term room temperature storage had a minimal effect on the microbiome and metabolome profiles and is recommended for short-term fecal sample storage. In addition, our study indicates that the use of RNALater as a storage medium for stool samples in microbial and metabolomic analyses is not recommended.
IMPORTANCE The gastrointestinal microbiome and metabolome can provide a new angle to understand the development of health and disease. Stool samples are most frequently used for large-scale cohort studies. Standardized procedures for stool sample handling and storage can be a determining factor for performing microbiome or metabolome studies. In this study, we focused on the effects of stool sampling regions and stool sample storage conditions on variations in the gut microbiome composition and metabolome profile.
Differentially Private Learning with Per-Sample Adaptive Clipping
Privacy in AI has drawn growing attention from researchers and the general
public in recent years. As one way to implement privacy-preserving AI,
differentially private learning is a framework that enables AI models to
satisfy differential privacy (DP). To achieve DP in the learning process,
existing algorithms typically limit the magnitude of gradients with a constant
clipping threshold, which requires careful tuning due to its significant
impact on model performance. To address this issue, the recent works NSGD and
Auto-S propose to use normalization instead of clipping, thereby avoiding
hyperparameter tuning. However, normalization-based approaches such as NSGD
and Auto-S rely on a monotonic weight function, which imposes excessive weight
on small-gradient samples and introduces extra deviation into the update. In this
paper, we propose a Differentially Private Per-Sample Adaptive Clipping
(DP-PSAC) algorithm based on a non-monotonic adaptive weight function, which
guarantees privacy without the typical hyperparameter tuning process of using a
constant clipping while significantly reducing the deviation between the update
and the true batch-averaged gradient. We provide a rigorous theoretical
convergence analysis and show that, at the same order of convergence rate, the
proposed algorithm achieves a lower non-vanishing bound than NSGD/Auto-S, one
that is maintained over training iterations. In addition, through extensive
experimental evaluation, we show that DP-PSAC outperforms or matches the
state-of-the-art methods on multiple mainstream vision and language tasks. Comment: To appear in AAAI 2023; revised acknowledgments and citation
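The contrast between monotonic normalization and a non-monotonic per-sample weight can be sketched as below. The weight function `psac_weight` is an illustrative non-monotonic choice in the spirit of the paper, not necessarily its exact rule, and the update omits the privacy accounting:

```python
import numpy as np

def psac_weight(n, r=0.1):
    # Illustrative non-monotonic per-sample weight of the gradient norm n
    # (assumed form): unlike the monotonic 1/(n + r) used by normalization
    # methods, the weight first rises and then decays, so very small
    # gradients are not given ever-larger relative weight.
    return 1.0 / (n + r / (n + r))

def dp_psac_update(per_sample_grads, sigma=1.0, r=0.1, seed=0):
    # DP-SGD-style step: weight each per-sample gradient by its norm,
    # sum, add Gaussian noise, and average over the batch.
    g = np.asarray(per_sample_grads, dtype=float)
    norms = np.linalg.norm(g, axis=1)
    weighted = g * psac_weight(norms, r)[:, None]
    noise = np.random.default_rng(seed).normal(0.0, sigma, size=g.shape[1])
    return (weighted.sum(axis=0) + noise) / len(g)
```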
Bi-Drop: Enhancing Fine-tuning Generalization via Synchronous sub-net Estimation and Optimization
Pretrained language models have achieved remarkable success in natural
language understanding. However, fine-tuning pretrained models on limited
training data tends to overfit, which diminishes performance. This paper
presents Bi-Drop, a fine-tuning strategy that selectively updates model
parameters using gradients from various sub-nets dynamically generated by
dropout. Bi-Drop performs sub-net estimation in an in-batch manner, which
overcomes the hysteresis in sub-net updating that affects previous methods
relying on asynchronous sub-net estimation. Moreover, Bi-Drop needs only one
mini-batch to estimate the sub-net, making better use of the training data.
Experiments on the GLUE benchmark demonstrate
that Bi-Drop consistently outperforms previous fine-tuning methods.
Furthermore, empirical results also show that Bi-Drop exhibits excellent
generalization ability and robustness for domain transfer, data imbalance, and
low-resource scenarios. Comment: EMNLP 2023 Findings, camera-ready version;
co-first authors with equal contribution
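The in-batch sub-net idea can be sketched as follows: the same mini-batch is passed through several dropout-induced sub-nets, and only parameters whose gradients agree across sub-nets are updated. The consistency criterion (variance relative to mean magnitude) and the keep fraction are illustrative assumptions, not the paper's exact rule:

```python
import numpy as np

def bidrop_mask(grads_per_subnet, k=0.5):
    # Rows of `grads_per_subnet`: gradients of the SAME mini-batch under
    # different dropout masks; columns: parameters. Keep the fraction k of
    # parameters whose gradients are most consistent across sub-nets, and
    # zero out the update for the rest.
    G = np.asarray(grads_per_subnet, dtype=float)
    score = G.std(axis=0) / (np.abs(G.mean(axis=0)) + 1e-8)  # low = consistent
    keep = int(k * G.shape[1])
    mask = np.zeros(G.shape[1], dtype=bool)
    mask[np.argsort(score)[:keep]] = True
    return mask, G.mean(axis=0) * mask  # masked, averaged update direction
```

Because both sub-net gradients come from one mini-batch, the selection never lags behind the current parameters, which is the hysteresis problem described above.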
The association between retina thinning and hippocampal atrophy in Alzheimer’s disease and mild cognitive impairment: a meta-analysis and systematic review
Introduction: The retina is a “window” into the central nervous system. Previous studies have found that retinal thickness decreases over the pathological course of the Alzheimer’s disease (AD) continuum. Hippocampal atrophy is one of the typical clinical features and diagnostic criteria of AD. Earlier studies have described retinal thinning in normally aging subjects and AD patients, yet the association between retinal thickness and hippocampal atrophy in AD is unclear. Optical coherence tomography (OCT) provides non-invasive access to retinal images, and magnetic resonance imaging can outline the volume of the hippocampus. Thus, we aimed to quantify the correlation between these two parameters to determine whether the retina can serve as a new biomarker for early AD detection.
Methods: We systematically searched the PubMed, Embase, and Web of Science databases from inception to May 2023 for studies investigating the correlation between retinal thickness and hippocampal volume. The Newcastle-Ottawa Quality Assessment Scale (NOS) was used to assess study quality. Pooled correlation coefficient r values were combined after Fisher’s z transformation. Moderator effects were detected through subgroup analysis and meta-regression.
Results: Of the 1,596 citations initially identified, we excluded 1,062 studies after screening the titles and abstracts (animal models, n = 99; irrelevant literature, n = 963). Twelve studies met the inclusion criteria, of which three were excluded due to unextractable data, leaving nine studies eligible for this meta-analysis. A moderate positive correlation between retinal thickness and hippocampal volume was found across all participants with AD, mild cognitive impairment (MCI), and normal controls (NC) (r = 0.3469, 95% CI: 0.2490–0.4377, I2 = 5.0%), which was significantly higher than that in the AD group alone (r = 0.1209, 95% CI: 0.0905–0.1510, I2 = 0.0%) (p < 0.05). Among the individual layers, the peripapillary retinal nerve fiber layer (pRNFL) showed a moderate positive correlation with hippocampal volume (r = 0.1209, 95% CI: 0.0905–0.1510, I2 = 0.0%). The retinal pigmented epithelium (RPE) was also positively correlated (r = 0.1421, 95% CI: −0.0447–0.3192, I2 = 84.1%). The retinal layers and participant groups were the main sources of overall heterogeneity. Correlations in the two hemispheres did not differ significantly.
Conclusion: The correlation between RNFL thickness and hippocampal volume is more pronounced than that of the other layers in both the NC and AD groups. Whole retinal thickness is positively correlated with hippocampal volume not only in the AD continuum, especially in MCI, but also in NC.
Systematic review registration: https://www.crd.york.ac.uk/PROSPERO/, CRD42022328088
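The pooling step in the Methods (combining r values after Fisher's z transformation) can be sketched with a fixed-effect, inverse-variance weighting; the n_i − 3 weights and the sample sizes in the example are illustrative assumptions:

```python
import math

def pool_correlations(rs, ns):
    # Fixed-effect pooling of correlation coefficients via Fisher's z:
    # transform each r with atanh, average with weights w_i = n_i - 3
    # (inverse of the variance of z), and back-transform with tanh.
    zs = [math.atanh(r) for r in rs]
    ws = [n - 3 for n in ns]
    z_bar = sum(w * z for w, z in zip(ws, zs)) / sum(ws)
    return math.tanh(z_bar)

# Hypothetical example: two studies with r = 0.2 (n = 50) and r = 0.4 (n = 100).
pooled = pool_correlations([0.2, 0.4], [50, 100])
```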
The Long Non-Coding RNA ENST00000537266 and ENST00000426615 Influence Papillary Thyroid Cancer Cell Proliferation and Motility
- …