SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient
Many deep learning applications benefit from using large models with billions
of parameters. Training these models is notoriously expensive due to the need
for specialized HPC clusters. In this work, we consider alternative setups for
training large models: using cheap "preemptible" instances or pooling existing
resources from multiple regions. We analyze the performance of existing
model-parallel algorithms in these conditions and find configurations where
training larger models becomes less communication-intensive. Based on these
findings, we propose SWARM parallelism, a model-parallel training algorithm
designed for poorly connected, heterogeneous and unreliable devices. SWARM
creates temporary randomized pipelines between nodes that are rebalanced in
case of failure. We empirically validate our findings and compare SWARM
parallelism with existing large-scale training approaches. Finally, we combine
our insights with compression strategies to train a large Transformer language
model with 1B shared parameters (approximately 13B before sharing) on
preemptible T4 GPUs with less than 200 Mb/s network bandwidth.
Comment: Accepted to International Conference on Machine Learning (ICML) 2023. 25 pages, 8 figures.
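The core mechanics described in the abstract, temporary randomized pipelines that are rebalanced when a node fails, can be sketched in a few lines. This is a toy illustration of the idea, not the authors' implementation: `build_pipeline`, `rebalance`, and the stage-to-worker-set representation are hypothetical names chosen here.

```python
import random

def build_pipeline(stage_workers, rng):
    """Form a temporary pipeline by picking one live worker per stage at random."""
    return [rng.choice(sorted(workers)) for workers in stage_workers]

def rebalance(stage_workers):
    """After a failure empties a stage, refill it from the most populated stage."""
    for workers in stage_workers:
        if not workers:
            donor = max(stage_workers, key=len)
            if len(donor) > 1:
                workers.add(donor.pop())
```

Because each microbatch traverses a freshly sampled pipeline, no single preemptible node is a fixed point of failure; rebalancing keeps every stage populated as the worker pool changes.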
Universal Oligonucleotide Microarray for Sub-Typing of Influenza A Virus
A universal microchip was developed for genotyping Influenza A viruses. It contains two sets of oligonucleotide probes allowing viruses to be classified by the subtypes of hemagglutinin (H1–H13, H15, H16) and neuraminidase (N1–N9). Additional sets of probes are used to detect H1N1 swine influenza viruses. Probes were selected in two steps: initially, amino acid sequences specific to each subtype were identified, and then the most specific and representative oligonucleotide probes were selected. Overall, between 19 and 24 probes were used to identify each subtype of hemagglutinin (HA) and neuraminidase (NA). Genotyping included preparation of fluorescently labeled PCR amplicons of influenza virus cDNA and their hybridization to microarrays of specific oligonucleotide probes. In 36 of the 40 samples tested, the HA and NA subtypes of Influenza A virus were unambiguously identified.
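Calling a subtype from a hybridization pattern can be illustrated with a toy scoring rule. This is only a sketch of the general idea, assuming a simple "fraction of positive probes" criterion with an ambiguity margin; the function name, thresholds, and data layout are invented here and are not the paper's actual analysis.

```python
def call_subtype(probe_sets, signals, min_frac=0.8, margin=0.3):
    """Call the subtype whose probe set shows the highest fraction of positive
    hybridization signals; return None when the call is ambiguous."""
    scores = {subtype: sum(1 for p in probes if signals.get(p, False)) / len(probes)
              for subtype, probes in probe_sets.items()}
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    best_subtype, best = ranked[0]
    second = ranked[1][1] if len(ranked) > 1 else 0.0
    if best >= min_frac and best - second >= margin:
        return best_subtype
    return None
```

Requiring both a high positive fraction and a margin over the runner-up mirrors the "unambiguous identification" criterion mentioned in the abstract.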
Distinct glutaminyl cyclase expression in Edinger–Westphal nucleus, locus coeruleus and nucleus basalis Meynert contributes to pGlu-Aβ pathology in Alzheimer’s disease
Glutaminyl cyclase (QC) was discovered recently as the enzyme catalyzing the pyroglutamate (pGlu or pE) modification of N-terminally truncated Alzheimer’s disease (AD) Aβ peptides in vivo. This modification confers resistance to proteolysis, rapid aggregation and neurotoxicity and can be prevented by QC inhibitors in vitro and in vivo, as shown in transgenic animal models. However, in mouse brain QC is only expressed by a relatively low proportion of neurons in most neocortical and hippocampal subregions. Here, we demonstrate that QC is highly abundant in subcortical brain nuclei severely affected in AD. In particular, QC is expressed by virtually all urocortin-1-positive, but not by cholinergic neurons of the Edinger–Westphal nucleus, by noradrenergic locus coeruleus and by cholinergic nucleus basalis magnocellularis neurons in mouse brain. In human brain, QC is expressed by both urocortin-1-positive and cholinergic Edinger–Westphal neurons and by locus coeruleus and nucleus basalis Meynert neurons. In brains from AD patients, these neuronal populations displayed intraneuronal pE-Aβ immunoreactivity and morphological signs of degeneration as well as extracellular pE-Aβ deposits. Adjacent AD brain structures lacking QC expression and brains from control subjects were devoid of such aggregates. This is the first demonstration of QC expression and pE-Aβ formation in subcortical brain regions affected in AD. Our results may explain the high vulnerability of defined subcortical neuronal populations and their central target areas in AD as a consequence of QC expression and pE-Aβ formation.
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
Large language models (LLMs) have been shown to be able to perform new tasks
based on a few demonstrations or natural language instructions. While these
capabilities have led to widespread adoption, most LLMs are developed by
resource-rich organizations and are frequently kept from the public. As a step
towards democratizing this powerful technology, we present BLOOM, a
176B-parameter open-access language model designed and built thanks to a
collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer
language model that was trained on the ROOTS corpus, a dataset comprising
hundreds of sources in 46 natural and 13 programming languages (59 in total).
We find that BLOOM achieves competitive performance on a wide variety of
benchmarks, with stronger results after undergoing multitask prompted
finetuning. To facilitate future research and applications using LLMs, we
publicly release our models and code under the Responsible AI License.
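As a minimal illustration of what "decoder-only" means architecturally, the sketch below implements single-head scaled dot-product attention with a causal mask in NumPy. It is a didactic toy capturing the defining constraint of models like BLOOM (each position attends only to earlier positions), not the model's actual implementation.

```python
import numpy as np

def causal_attention(q, k, v):
    """Single-head scaled dot-product attention with a causal mask:
    position i may only attend to positions <= i."""
    t, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    scores[np.triu(np.ones((t, t), dtype=bool), k=1)] = -np.inf  # hide the future
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))      # stable softmax
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v
```

A direct consequence of the mask is that the first position's output depends only on its own value vector, which is what makes autoregressive next-token training possible.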
Distributed Methods with Compressed Communication for Solving Variational Inequalities, with Theoretical Guarantees
Variational inequalities in general and saddle point problems in particular
are increasingly relevant in machine learning applications, including
adversarial learning, GANs, transport and robust optimization. With increasing
data and problem sizes necessary to train high performing models across various
applications, we need to rely on parallel and distributed computing. However,
in distributed training, communication among the compute nodes is a key
bottleneck during training, and this problem is exacerbated for high
dimensional and over-parameterized models. Due to these considerations, it is
important to equip existing methods with strategies that would allow to reduce
the volume of transmitted information during training while obtaining a model
of comparable quality. In this paper, we present the first theoretically
grounded distributed methods for solving variational inequalities and saddle
point problems using compressed communication: MASHA1 and MASHA2. Our theory
and methods allow for the use of both unbiased (such as RandK; MASHA1) and
contractive (such as TopK; MASHA2) compressors. The new algorithms support
bidirectional compression and can be adapted to the stochastic setting
with batches and to federated learning with partial participation of clients.
We empirically validated our conclusions using two experimental setups: a
standard bilinear min-max problem, and large-scale distributed adversarial
training of transformers.
Comment: Big update in v2: 71 pages, 7 algorithms, 7 theorems. New analysis for contractive compression, non-monotone analysis for unbiased and contractive compressions, partial participation.
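The two compressor families named in the abstract can be sketched as follows. This is a generic illustration of RandK (unbiased) and TopK (contractive) sparsifiers as they appear throughout the compressed-communication literature, not the MASHA code itself.

```python
import numpy as np

def rand_k(x, k, rng):
    """Unbiased RandK: keep k uniformly chosen coordinates, rescaled by d/k
    so that the compressor satisfies E[C(x)] = x."""
    out = np.zeros_like(x)
    idx = rng.choice(x.size, size=k, replace=False)
    out[idx] = x[idx] * (x.size / k)
    return out

def top_k(x, k):
    """Contractive TopK: keep the k largest-magnitude coordinates unchanged."""
    out = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-k:]
    out[idx] = x[idx]
    return out
```

RandK is unbiased but can inflate variance through the d/k rescaling, while TopK is biased yet satisfies a contraction property; the abstract's point is that MASHA1 and MASHA2 come with guarantees for each class respectively.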
Biosynthesis of lipase from the fungus Penicillium hordei
The article describes a method for isolating a lipase-producing micromycete and its identification by classical microbiological methods. The isolate was identified as Penicillium hordei. The cultivation temperature of the micromycete and the dynamics of Penicillium hordei lipase biosynthesis were studied. Screening experiments were carried out using a Plackett-Burman design, followed by optimization along the path of steepest ascent and a three-level full factorial experiment to determine the optimal concentrations of the medium components significant for lipase biosynthesis. A mathematical model was constructed describing the influence of the two significant factors on the volumetric activity of the P. hordei culture fluid. An optimal culture medium composition for lipase biosynthesis by the Penicillium hordei micromycete was formulated.
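The design-of-experiments workflow in the abstract (enumerating a full factorial design and picking the optimum of a fitted response model) can be sketched as follows. The two-factor quadratic model in the example is purely hypothetical, standing in for the paper's fitted model of volumetric lipase activity.

```python
from itertools import product

def full_factorial(levels, n_factors):
    """Enumerate every run of an n-factor full factorial design."""
    return list(product(levels, repeat=n_factors))

def best_run(design, predicted_response):
    """Pick the factor setting that maximizes a fitted response model."""
    return max(design, key=predicted_response)

# Two coded factors at three levels each -> 9 runs; the quadratic
# model below is a hypothetical fit, not the paper's actual model.
design = full_factorial([-1, 0, 1], 2)
optimum = best_run(design, lambda x: 5 + 2 * x[0] + x[1] - x[0] ** 2)
```

In practice a Plackett-Burman screen first narrows many candidate factors down to the few significant ones, and only those enter the full factorial optimization.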
Training Transformers Together
The infrastructure necessary for training state-of-the-art models is becoming
overly expensive, which makes training such models affordable only to large
corporations and institutions. Recent work proposes several methods for
training such models collaboratively, i.e., by pooling together hardware from
many independent parties and training a shared model over the Internet. In this
demonstration, we collaboratively trained a text-to-image transformer similar
to OpenAI DALL-E. We invited the viewers to join the ongoing training run,
showing them instructions on how to contribute using the available hardware. We
explained how to address the engineering challenges associated with such a
training run (slow communication, limited memory, uneven performance between
devices, and security concerns) and discussed how the viewers can set up
collaborative training runs themselves. Finally, we show that the resulting
model generates images of reasonable quality on a number of prompts.
Comment: Accepted to NeurIPS 2021 Demonstration Track. 10 pages, 2 figures.
Link: https://training-transformers-together.github.i
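One engineering challenge listed above, uneven performance between devices, is commonly handled by dropping stragglers when averaging peer updates. The sketch below illustrates that idea under assumed names; it is not the demonstration's actual code.

```python
def average_updates(peer_updates):
    """Average gradient vectors from the peers that responded in time;
    stragglers and failed peers (reported as None) are simply dropped."""
    live = [u for u in peer_updates if u is not None]
    return [sum(vals) / len(live) for vals in zip(*live)]
```

Averaging over whichever subset of volunteers answers keeps the collaborative run moving even when individual home machines are slow or disappear mid-step.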