
    Learning to Skip for Language Modeling

    Overparameterized large-scale language models show impressive generalization in in-context few-shot learning. However, most language models allocate the same amount of parameters and computation to every token, regardless of the complexity or importance of the input. We argue that in language model pretraining a variable amount of computation should be assigned to different tokens, and that this can be achieved efficiently via a simple routing mechanism. Unlike conventional early-exit techniques, in which tokens can only exit at early layers, we propose a more general method that dynamically skips the execution of a layer (or module) for any input token using a binary router. In an extensive evaluation across 24 NLP tasks, we demonstrate that the proposed method significantly improves 1-shot performance over competitive baselines at only a mild extra inference cost.
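
    A minimal sketch of the core idea (in PyTorch, with hypothetical class and parameter names): a small linear router produces a per-token skip decision, and tokens routed to "skip" bypass the sub-layer through the residual path. This illustrates the general technique, not the paper's exact implementation.

    import torch
    import torch.nn as nn

    class SkipRoutedLayer(nn.Module):
        """Wraps a sub-layer with a learned per-token skip decision."""

        def __init__(self, d_model: int, sublayer: nn.Module):
            super().__init__()
            self.sublayer = sublayer             # e.g. an attention or MLP block
            self.router = nn.Linear(d_model, 1)  # per-token skip logit

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (batch, seq_len, d_model)
            soft = torch.sigmoid(self.router(x).squeeze(-1))   # (batch, seq_len)
            hard = (soft > 0.5).float()
            gate = hard + soft - soft.detach()   # straight-through: hard forward, soft gradient
            out = self.sublayer(x)
            # gate=1 tokens use the sub-layer output; gate=0 tokens keep their input unchanged.
            return gate.unsqueeze(-1) * out + (1.0 - gate.unsqueeze(-1)) * x

    # Usage on a toy batch of token embeddings.
    layer = SkipRoutedLayer(64, nn.Sequential(nn.Linear(64, 256), nn.GELU(), nn.Linear(256, 64)))
    print(layer(torch.randn(2, 10, 64)).shape)   # torch.Size([2, 10, 64])

    Note that this sketch still computes the sub-layer densely and only zeroes out its contribution; an efficient implementation would gather the non-skipped tokens and run the sub-layer on them alone, which is where the inference savings come from.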

    An Obligate Role of Oxytocin Neurons in Diet Induced Energy Expenditure

    Oxytocin neurons represent one of the major subsets of neurons in the paraventricular hypothalamus (PVH), a critical brain region for energy homeostasis. Despite substantial evidence supporting a role for oxytocin in body weight regulation, it remains controversial whether oxytocin neurons directly regulate body weight homeostasis, feeding or energy expenditure. Pharmacologic doses of oxytocin suppress feeding through a proposed melanocortin-responsive projection from the PVH to the hindbrain. In contrast, deficiency in oxytocin or its receptor leads to reduced energy expenditure without feeding abnormalities. To test the physiological function of oxytocin neurons, we specifically ablated them in adult mice. Our results show that oxytocin neuron ablation in adult animals has no effect on body weight, food intake or energy expenditure on a regular diet. Interestingly, male mice lacking oxytocin neurons are more sensitive to high-fat-diet-induced obesity, due solely to reduced energy expenditure. In addition, despite normal food intake, these mice exhibit a blunted feeding response to leptin administration. Thus, our study suggests that oxytocin neurons are required to resist the obesity associated with a high-fat diet, but that their role in feeding is permissive and can be compensated for by redundant pathways.

    Genomic insights into local adaptation and future climate-induced vulnerability of a keystone forest tree in East Asia

    Rapid global climate change poses a substantial threat to biodiversity. Assessing population vulnerability and adaptive capacity under climate change is crucial for informing conservation and mitigation strategies. Here we generate a chromosome-scale genome assembly and re-sequence the genomes of 230 individuals collected from 24 populations of Populus koreana, a pioneer and keystone tree species in temperate forests of East Asia. We integrate population genomics and environmental variables to reveal a set of climate-associated single-nucleotide polymorphisms, insertions/deletions and structural variations, most notably numerous adaptive non-coding variants distributed across the genome. We incorporate these variants into an environmental modeling scheme to predict a pronounced spatiotemporal shift of this species in response to future climate change. We further identify the populations most vulnerable to climate change, which should be prioritized for conservation, as well as many candidate genes and variants that may be useful for forest tree breeding with specific aims. Our findings highlight the importance of integrating genomic and environmental data to predict the adaptive capacity of a keystone forest tree to rapid future climate change.
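
    As a rough, hypothetical illustration of the kind of prediction the abstract describes (and not the authors' actual modeling scheme), one common approach is a genomic-offset-style calculation: fit a model of climate-associated allele frequencies against environmental variables, then score each population by how far its predicted composition shifts under future climate. The data and the simple per-SNP linear model below are purely illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    n_pops, n_snps = 24, 200                             # e.g. 24 populations, 200 climate-associated SNPs

    temp_now = rng.uniform(-5, 20, n_pops)               # current mean temperature per population
    temp_future = temp_now + rng.uniform(1, 4, n_pops)   # projected warming

    # Simulated allele frequencies that partly track temperature.
    slopes = rng.normal(0, 0.02, n_snps)
    freqs = np.clip(0.5 + np.outer(temp_now, slopes) + rng.normal(0, 0.05, (n_pops, n_snps)), 0, 1)

    # Fit one least-squares line per SNP: frequency ~ temperature.
    X_now = np.column_stack([np.ones(n_pops), temp_now])
    coef, *_ = np.linalg.lstsq(X_now, freqs, rcond=None)            # shape (2, n_snps)

    # Predicted frequencies under current vs. future climate, and the offset between them.
    pred_now = X_now @ coef
    pred_future = np.column_stack([np.ones(n_pops), temp_future]) @ coef
    offset = np.linalg.norm(pred_future - pred_now, axis=1)         # larger = more vulnerable
    print("Most vulnerable population index:", int(offset.argmax()))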

    BigSSL: Exploring the Frontier of Large-Scale Semi-Supervised Learning for Automatic Speech Recognition

    We summarize the results of a host of efforts to use giant automatic speech recognition (ASR) models pre-trained on large, diverse unlabeled datasets containing approximately a million hours of audio. We find that the combination of pre-training, self-training and scaling up model size greatly increases data efficiency, even for extremely large tasks with tens of thousands of hours of labeled data. In particular, on an ASR task with 34k hours of labeled data, by fine-tuning an 8-billion-parameter pre-trained Conformer model we can match state-of-the-art (SoTA) performance with only 3% of the training data and significantly improve on SoTA with the full training set. We also report on the universal benefits of using big pre-trained and self-trained models for a large set of downstream tasks that cover a wide range of speech domains and span multiple orders of magnitude of dataset size, including obtaining SoTA performance on many public benchmarks. In addition, we utilize the learned representations of the pre-trained networks to achieve SoTA results on non-ASR tasks.
    Comment: 14 pages, 7 figures, 13 tables; v2: minor corrections, reference baselines and bibliography updated; v3: corrections based on reviewer feedback, bibliography update
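
    A minimal, self-contained sketch of the self-training (pseudo-labeling) loop that the abstract combines with pre-training and model scaling. BigSSL does this with giant Conformer ASR models on audio; here a toy classifier on synthetic data stands in, purely to show the mechanics.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    X_labeled = rng.normal(size=(100, 16))
    y_labeled = (X_labeled[:, 0] > 0).astype(int)   # small supervised set
    X_unlabeled = rng.normal(size=(5000, 16))       # large unlabeled pool

    # 1) Train a teacher on the labeled data only.
    teacher = LogisticRegression(max_iter=1000).fit(X_labeled, y_labeled)

    # 2) Pseudo-label the unlabeled pool, keeping only confident predictions.
    probs = teacher.predict_proba(X_unlabeled)
    confident = probs.max(axis=1) > 0.9
    X_pseudo, y_pseudo = X_unlabeled[confident], probs[confident].argmax(axis=1)

    # 3) Train a student on labeled plus pseudo-labeled data (in BigSSL the student is a
    #    large pre-trained Conformer that is fine-tuned; the principle is the same).
    student = LogisticRegression(max_iter=1000).fit(
        np.vstack([X_labeled, X_pseudo]),
        np.concatenate([y_labeled, y_pseudo]),
    )
    print("student accuracy on the labeled set:", student.score(X_labeled, y_labeled))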

    PaLI-X: On Scaling up a Multilingual Vision and Language Model

    We present the training recipe and results of scaling up PaLI-X, a multilingual vision and language model, both in terms of the size of its components and the breadth of its training task mixture. Our model achieves new levels of performance on a wide range of varied and complex tasks, including multiple image-based captioning and question-answering tasks, image-based document understanding and few-shot (in-context) learning, as well as object detection, video question answering and video captioning. PaLI-X advances the state of the art on most of the vision-and-language benchmarks considered (25+ of them). Finally, we observe emerging capabilities, such as complex counting and multilingual object detection, on tasks that are not explicitly in the training mixture.
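
    As a small illustration of what a training task mixture means in practice, the sketch below samples a task for each successive training example in proportion to mixture weights. The task names and weights here are hypothetical assumptions, not PaLI-X's actual recipe.

    import random

    # Hypothetical mixture weights over the task families named in the abstract.
    task_mixture = {
        "image_captioning": 0.30,
        "visual_question_answering": 0.25,
        "document_understanding": 0.15,
        "object_detection": 0.10,
        "video_question_answering": 0.10,
        "video_captioning": 0.10,
    }

    def sample_task(mixture: dict, rng: random.Random) -> str:
        """Pick the task for the next training example, weighted by its mixture proportion."""
        tasks, weights = zip(*mixture.items())
        return rng.choices(tasks, weights=weights, k=1)[0]

    rng = random.Random(0)
    counts = {t: 0 for t in task_mixture}
    for _ in range(10_000):
        counts[sample_task(task_mixture, rng)] += 1
    print(counts)   # empirical counts roughly match the mixture weights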