Total positivity for matroid Schubert varieties
We define the totally nonnegative matroid Schubert variety of a linear
subspace. We show that this variety is a regular CW complex homeomorphic to a
closed ball, with strata indexed by pairs of acyclic flats of the oriented
matroid of the subspace. This closely resembles the regularity theorem for
totally nonnegative generalized flag varieties. As a corollary, we obtain a
regular CW structure on the real matroid Schubert variety of the subspace.
Comment: Comments welcome
DeepSpeed Data Efficiency: Improving Deep Learning Model Quality and Training Efficiency via Efficient Data Sampling and Routing
Recent advances in deep learning models come at the price of formidable
training cost. The increasing model size is one of the root causes, but another
less-emphasized fact is that data scale is actually increasing at a similar
speed as model scale, and the training cost is proportional to both of them.
Compared to the rapidly evolving model architecture, how to efficiently use the
training data (especially for the expensive foundation model pretraining) is
both less explored and difficult to realize due to the lack of a convenient
framework that focuses on data efficiency capabilities. To this end, we present
DeepSpeed Data Efficiency, a framework that makes better use of data, increases
training efficiency, and improves model quality. Specifically, we propose and
combine two data efficiency techniques: efficient data sampling via a general
curriculum learning library, and efficient data routing via a novel random
layerwise token dropping technique. For GPT-3 1.3B language model pretraining,
our work achieves 12.5x less data/time/cost ($3.7K if rented on Azure), while
still maintaining 95% of model quality compared to the baseline with full data
and cost ($46.3K). For GPT-3 1.3B and BERT-large pretraining, our work can also
achieve the same model quality with up to 2x less data/time/cost, or achieve
better model quality under the same data/time/cost. DeepSpeed Data Efficiency is
easy to use and tune, enabling us to easily apply it and verify its benefit on
additional tasks including GPT-3 MoE model pretraining and small-scale
GPT-2/ViT finetuning.
Comment: Published in AAAI 2024 Main Technical Track. Equal contribution by
the first 3 authors. Code has been released as a part of
https://github.com/microsoft/DeepSpeed. Part of this paper is from our
previous arXiv report (arXiv:2211.11586)
Random-LTD: Random and Layerwise Token Dropping Brings Efficient Training for Large-scale Transformers
Large-scale transformer models have become the de-facto architectures for
various machine learning applications, e.g., CV and NLP. However, those large
models also introduce prohibitive training costs. To mitigate this issue, we
propose a novel random and layerwise token dropping method (random-LTD), which
skips the computation of a subset of the input tokens at all middle layers.
Particularly, random-LTD achieves considerable speedups with accuracy
comparable to the standard training baseline. Compared to other token dropping
methods, random-LTD does not require (1) any importance score-based metrics,
(2) any special token treatment (e.g., [CLS]), or (3) full sequence length
training at many layers other than the first and the last. Besides, a new
LayerToken learning rate schedule is proposed for pretraining problems that
resolves the heavy tuning requirement for our proposed training mechanism.
Finally, we
demonstrate that random-LTD can be applied to broader applications, including
GPT and BERT pretraining as well as ViT and GPT finetuning tasks. Our results
show that random-LTD can save about 33.3% theoretical compute cost and 25.6%
wall-clock training time while achieving similar zero-shot evaluations on
GPT-3 1.3B as compared to the baseline.
Comment: 22 pages
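As a rough illustration of the token-dropping idea in the abstract above, the following Python sketch processes the full sequence at the first and last layers and only a random subset of tokens at each middle layer, while skipped tokens pass through unchanged. The names `toy_layer` and `random_ltd_forward`, the per-token toy layer (a real transformer layer also mixes tokens through attention), and the fixed `keep` count are assumptions made for illustration; they are not taken from the paper or from the DeepSpeed code.

```python
import math
import random


def toy_layer(token, scale):
    """Hypothetical stand-in for a transformer layer acting on one token vector."""
    return [v + scale * math.tanh(v) for v in token]


def random_ltd_forward(tokens, layer_scales, keep, rng):
    """Apply a stack of toy layers, dropping random tokens at the middle layers.

    tokens: list of token vectors (lists of floats).
    layer_scales: one toy parameter per layer.
    keep: number of tokens each middle layer processes (assumed fixed here).
    Returns the final token states and the number of tokens each layer processed.
    """
    n = len(tokens)
    last = len(layer_scales) - 1
    processed = []
    for i, scale in enumerate(layer_scales):
        if i == 0 or i == last:
            # First and last layers see the full sequence.
            chosen = list(range(n))
        else:
            # Middle layers see a uniformly random subset; no importance scores.
            chosen = sorted(rng.sample(range(n), keep))
        tokens = list(tokens)  # copy so skipped tokens are carried over unchanged
        for j in chosen:
            tokens[j] = toy_layer(tokens[j], scale)
        processed.append(len(chosen))
    return tokens, processed


rng = random.Random(0)
inputs = [[rng.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(16)]
output, counts = random_ltd_forward(inputs, [0.1, 0.1, 0.1, 0.1], keep=8, rng=rng)
```

With 16 tokens, 4 layers and `keep=8`, the two middle layers touch half the tokens, so this toy setup runs 48 of the 64 token-layer evaluations that full-sequence training would need.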
Molecular gas and star formation in nearby starburst galaxy mergers
We employ the Feedback In Realistic Environments (FIRE-2) physics model to
study how the properties of giant molecular clouds (GMCs) evolve during galaxy
mergers. We conduct a pixel-by-pixel analysis of molecular gas properties in
both the simulated control galaxies and galaxy major mergers. The simulated
GMC-pixels in the control galaxies follow a similar trend in a diagram of
velocity dispersion versus gas surface density to the one observed in local
spiral galaxies in the Physics at High Angular resolution in Nearby GalaxieS
(PHANGS) survey. For GMC-pixels in simulated mergers, we see a significant
increase, by a factor of 5-10, in both velocity dispersion and gas surface
density, which puts these pixels above the trend of PHANGS galaxies in the
velocity dispersion versus gas surface density diagram. This deviation may
indicate that GMCs in the simulated mergers are much less gravitationally bound
than those in the simulated control galaxies, with the virial parameter
reaching 10-100. Furthermore, we find that the increase in the virial parameter
happens at the same time as the increase in global star formation rate (SFR),
which suggests stellar feedback is responsible for dispersing the gas. We also
find that the gas depletion time is significantly lower for GMCs with a high
virial parameter during a starburst event. This is in contrast to the simple
physical picture that GMCs with a low virial parameter collapse more easily and
form stars on shorter depletion times. This might suggest that some other
physical mechanisms besides self-gravity are helping the GMCs in starbursting
mergers collapse and form stars.
Comment: 22 pages, 11 figures. Accepted to ApJ. Link to animation update
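For orientation, a commonly used definition of the virial parameter is given below for a cloud of mass M, radius R and velocity dispersion σ_v. This is an assumption for illustration: the abstract does not state which form, prefactor or symbols the authors adopt, and the substitution M = πR²Σ assumes a uniform surface density Σ.

```latex
\alpha_{\mathrm{vir}} = \frac{5\,\sigma_v^{2} R}{G M}
\quad\Longrightarrow\quad
\alpha_{\mathrm{vir}} = \frac{5\,\sigma_v^{2}}{\pi G\,\Sigma\,R}
\quad \text{for } M = \pi R^{2}\,\Sigma .
```

At a fixed pixel scale R this gives a virial parameter proportional to σ_v²/Σ, so if velocity dispersion and surface density both rise by a similar factor, the virial parameter rises by roughly that same factor.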
ZeRO++: Extremely Efficient Collective Communication for Giant Model Training
Zero Redundancy Optimizer (ZeRO) has been used to train a wide range of large
language models on massive GPU clusters due to its ease of use, efficiency,
and good scalability. However, when training on low-bandwidth clusters, or at
a scale that forces the batch size per GPU to be small, ZeRO's effective
throughput is limited by the high communication volume from gathering weights
in the forward and backward passes and from averaging gradients. This paper
introduces three communication volume reduction techniques, which we
collectively refer to as ZeRO++, targeting each of the communication
collectives in ZeRO. The first is block-quantization based all-gather. The
second is data remapping that trades additional memory for reduced
communication. The third is a novel all-to-all based quantized gradient
averaging paradigm as a replacement for the reduce-scatter collective, which
preserves accuracy despite communicating low-precision data. Collectively,
ZeRO++ reduces the communication volume of ZeRO by 4x, enabling up to 2.16x
better throughput at 384-GPU scale.
Comment: 12 pages
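To make the first technique more concrete, here is a minimal Python sketch of block-wise quantization: each block of values gets its own scale, so a few large values do not wipe out the precision of small ones elsewhere. The symmetric int8 scheme, the block size, and the function names `quantize_blocks` and `dequantize_blocks` are assumptions for illustration; the abstract does not specify these details, and this is not the DeepSpeed implementation.

```python
def quantize_blocks(values, block_size):
    """Symmetric 8-bit quantization with one scale per block (assumed scheme)."""
    blocks = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        max_abs = max(abs(v) for v in block)
        # Map the largest magnitude in the block to the int8 limit 127.
        scale = max_abs / 127 if max_abs > 0 else 1.0
        codes = [round(v / scale) for v in block]
        blocks.append((scale, codes))
    return blocks


def dequantize_blocks(blocks):
    """Reconstruct approximate values from (scale, codes) pairs."""
    return [scale * code for scale, codes in blocks for code in codes]


weights = [0.02, -0.01, 0.03, 0.015, 5.0, -4.0, 3.5, 2.0]

# Two blocks of four: the small values get their own fine-grained scale.
restored = dequantize_blocks(quantize_blocks(weights, block_size=4))

# One block of eight: the small values share a scale set by the value 5.0.
coarse = dequantize_blocks(quantize_blocks(weights, block_size=8))

block_error = abs(weights[1] - restored[1])
coarse_error = abs(weights[1] - coarse[1])
```

In this toy example the second weight is recovered far more accurately with per-block scales than with a single shared scale, which is the motivation for quantizing in blocks before communicating low-precision data.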
Uptake and toxicity studies of poly-acrylic acid functionalized silicon nanoparticles in cultured mammalian cells
Poly-acrylic acid (PAAc) terminated silicon nanoparticles (SiNPs) have been synthesized and employed as a synchronous fluorescent signal indicator in a series of cultured mammalian cells: HHL5, HepG2 and 3T3-L1. Their biological effects on cell growth and proliferation in both human and mouse cell lines have been studied. There was no evidence of in vitro cytotoxicity in the cells exposed to PAAc-terminated SiNPs when assessed by cell morphology, cell proliferation and viability, and DNA damage assays. The uptake of the nanocrystals by both HepG2 and 3T3-L1 cells was investigated by confocal microscopy and flow cytometry, which showed a clear time-dependence at higher concentrations. Reconstructed 3-D confocal microscope images showed that the PAAc-SiNPs were evenly distributed throughout the cytosol rather than attached to the outer membrane. This study provides fundamental evidence for the safe application and further modification of silicon nanoparticles, which could broaden their application as cell markers in living systems and in micelle-encapsulated drug delivery systems.
Clinical Features, Survival and Prognostic Factors of Glycogen-Rich Clear Cell Carcinoma (GRCC) of the Breast in the U.S. Population
The World Health Organization (WHO) defines glycogen-rich clear cell carcinoma (GRCC) of the breast as a carcinoma with glycogen accumulation in more than 90% of its tumor cells. Because of the rarity of this disease, its reported survival and clinical associations have been inconsistent, owing to reliance on case reports and limited case series. As a result, the prognostic implication of this cancer subtype remains unclear. Using the U.S. Surveillance, Epidemiology, and End Results (SEER) program database, we compared the incidence, demographics and prognostic factors of 155 cases of GRCC of the breast to 1,251,584 cases of other (non-GRCC) breast carcinomas. We demonstrate that GRCC is more likely to be identified as high grade and advanced stage, and more likely to have triple negative receptor status. GRCC cases display a poorer prognosis than non-GRCC carcinomas of the breast irrespective of age, American Joint Committee on Cancer (AJCC) staging, tumor grade, joint hormone receptor/human epidermal growth factor receptor 2 (HER2) status, and treatment. Similar to non-GRCC carcinomas, older age and higher AJCC/TNM staging were associated with poorer prognosis for GRCC, while treatment with surgery and radiation was associated with improved survival. Radiation, specifically in the setting of breast-conserving surgery, further improved survival compared to surgery alone. Our study highlights the poorer prognosis associated with glycogen accumulation in breast cancers and hence stresses the importance of identifying this more aggressive tumor type.
Some investigations into non passive listening
Our knowledge of the function of the auditory nervous system is based upon a wealth of data obtained, for the most part, in anaesthetised animals. More recently, it has been generally acknowledged that factors such as attention profoundly modulate the activity of sensory systems and that this can take place at many levels of processing. Imaging studies, in particular, have revealed greater activation of auditory areas and of areas outside sensory processing areas when attending to a stimulus. We present here a brief review of the consequences of such non-passive listening and go on to describe some of the experiments we are conducting to investigate them. In imaging studies using fMRI, we can demonstrate the activation of attention networks that are non-specific to the sensory modality, as well as greater and different activation of the areas of the supra-temporal plane that include primary and secondary auditory areas. The profuse descending connections of the auditory system seem likely to be part of the mechanisms subserving attention to sound. These are generally thought to be largely inactivated by anaesthesia. However, we have been able to demonstrate that even in an anaesthetised preparation, removing the descending control from the cortex leads to quite profound changes in the temporal patterns of activation by sounds in the thalamus and inferior colliculus. Some of these effects seem to be specific to the ear of stimulation and affect interaural processing. To bridge these observations, we are developing an awake behaving preparation involving freely moving animals in which it will be possible to investigate the effects of consciousness (by contrasting awake and anaesthetised), passive and active listening.
Don’t turn your back on the symptoms of psychosis: a proof-of-principle, quasi-experimental public health trial to reduce the duration of untreated psychosis in Birmingham, UK
Background: Reducing the duration of untreated psychosis (DUP) is an aspiration of international guidelines for first episode psychosis; however, public health initiatives have met with mixed results. Systematic reviews suggest that greater focus on the sources of delay within care pathways (which will vary between healthcare settings) is needed to achieve sustainable reductions in DUP (BJP 198: 256-263; 2011).
Methods/Design: A quasi-experimental trial, comparing a targeted intervention area with a ‘detection as usual’ area in the same city. As this is a proof-of-principle trial, no a priori assumptions are made regarding effect size; the key outcome will be an estimate of the potential effect size for a definitive trial. DUP and number of new cases will be collected over an 18-month period in target and control areas and compared; historical data on DUP collected in both areas over the previous three years will serve as a benchmark. The intervention will focus on reducing two significant DUP component delays within the overall care pathway: delays within the mental health service and help-seeking delay.
Discussion: This pragmatic trial will be the first to target known delays within the care pathway for those with a first episode of psychosis. If successful, this will provide a generalizable methodology that can be implemented in a variety of healthcare contexts with differing sources of delay.
Trial registration: http://www.controlled-trials.com/ISRCTN45058713
Keywords: Public mental health campaign, First-episode psychosis, Early detection, Duration of untreated psychosis, Youth mental health
RenAIssance: A Survey into AI Text-to-Image Generation in the Era of Large Model
Text-to-image generation (TTI) refers to the use of models that can
process text input and generate high-fidelity images based on text
descriptions. Text-to-image generation using neural networks can be traced
back to the emergence of the Generative Adversarial Network (GAN), followed by
the autoregressive Transformer. Diffusion models are one prominent type of
generative model, generating images through the systematic introduction of
noise over repeated steps. Owing to their impressive results on image
synthesis, diffusion models have been cemented as the major image decoder used
by text-to-image models and have brought text-to-image generation to the
forefront of machine-learning (ML) research. In the era of large models,
scaling up model size and integrating with large language models have further
improved the performance of TTI models, yielding generated images nearly
indistinguishable from real-world images and revolutionizing the way we
retrieve images. Our exploratory study has led us to think that there are
further ways of scaling text-to-image models by combining innovative model
architectures with prediction enhancement techniques. We have divided this
survey into five main sections, in which we detail the frameworks of the major
literature in order to delve into the different types of text-to-image
generation methods. Following this, we provide a detailed comparison and
critique of these methods and offer possible pathways of improvement for future
work. Regarding future work, we argue that TTI development could yield
impressive productivity improvements for creation, particularly in the context
of the AIGC era, and could be extended to more complex tasks such as video
generation and 3D generation.