Search CORE

4,212 research outputs found

Total positivity for matroid Schubert varieties

Authors: He Xuhua; Simpson Connor; Xie Kaitao
Publication date: 29/10/2023

We define the totally nonnegative matroid Schubert variety $\mathcal{Y}_V$ of a linear subspace $V \subset \mathbb{R}^n$. We show that $\mathcal{Y}_V$ is a regular CW complex homeomorphic to a closed ball, with strata indexed by pairs of acyclic flats of the oriented matroid of $V$. This closely resembles the regularity theorem for totally nonnegative generalized flag varieties. As a corollary, we obtain a regular CW structure on the real matroid Schubert variety of $V$.
Comment: Comments welcome
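For readers new to these objects, here is a hedged reminder of the usual definition of the real matroid Schubert variety of $V$: the closure of $V$ in a product of real projective lines. This definition is assumed from the standard literature and is not restated in the abstract; the totally nonnegative version $\mathcal{Y}_V$ is not defined here, and the symbols $\iota$ and $X_V$ are notation chosen for this sketch.

```latex
% Assumed standard definition (not taken from the abstract)
\iota \colon \mathbb{R}^n \hookrightarrow (\mathbb{P}^1_{\mathbb{R}})^n, \qquad
  (x_1, \dots, x_n) \mapsto ([x_1 : 1], \dots, [x_n : 1]),
\qquad
X_V := \overline{\iota(V)} \subseteq (\mathbb{P}^1_{\mathbb{R}})^n .

% Worked example: n = 2 and V = \operatorname{span}\{(1, 1)\}
\iota(V) = \{([t : 1], [t : 1]) : t \in \mathbb{R}\}, \qquad
X_V = \iota(V) \cup \{([1 : 0], [1 : 0])\}
    = \{(p, p) : p \in \mathbb{P}^1_{\mathbb{R}}\} \cong \mathbb{P}^1_{\mathbb{R}} \cong S^1 .
```

In the example, the single point at infinity closes the line up into the diagonal circle of $\mathbb{P}^1_{\mathbb{R}} \times \mathbb{P}^1_{\mathbb{R}}$.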

arXiv.org e-Print Archive

DeepSpeed Data Efficiency: Improving Deep Learning Model Quality and Training Efficiency via Efficient Data Sampling and Routing

Authors: He Yuxiong; Holmes Connor; Li Cheng; Li Conglong; Wu Xiaoxia; Yao Zhewei; Zhang Minjia
Publication date: 14/01/2024

Recent advances in deep learning models come at the price of formidable training cost. Increasing model size is one of the root causes, but another, less emphasized fact is that data scale is increasing at a similar speed as model scale, and training cost is proportional to both. Compared to rapidly evolving model architectures, how to use training data efficiently (especially for expensive foundation-model pretraining) is both less explored and difficult to realize, owing to the lack of a convenient framework focused on data efficiency. To this end, we present DeepSpeed Data Efficiency, a framework that makes better use of data, increases training efficiency, and improves model quality. Specifically, we propose and combine two data efficiency techniques: efficient data sampling via a general curriculum learning library, and efficient data routing via a novel random layerwise token dropping technique. For GPT-3 1.3B language model pretraining, our work achieves 12.5x less data/time/cost (\$3.7K if rented on Azure) while still maintaining 95% of model quality compared to the baseline with full data and cost (\$46.3K). For GPT-3 1.3B and BERT-large pretraining, our work can also achieve the same model quality with up to 2x less data/time/cost, or better model quality under the same data/time/cost. DeepSpeed Data Efficiency is easy to use and tune, enabling us to apply it and verify its benefit on additional tasks, including GPT-3 MoE model pretraining and small-scale GPT-2/ViT finetuning.
Comment: Published in AAAI 2024 Main Technical Track. Equal contribution by the first 3 authors. Code has been released as a part of https://github.com/microsoft/DeepSpeed. Part of this paper is from our previous arXiv report (arXiv:2211.11586).
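The curriculum-learning half of the framework can be pictured as a pacing function that gradually raises sample difficulty. This is a minimal sketch under stated assumptions: sequence length as the difficulty metric, a linear pacing schedule, and the helper names `linear_seqlen_schedule` and `truncate_batch` are all hypothetical illustrations, not the DeepSpeed API.

```python
from typing import List, Sequence


def linear_seqlen_schedule(step: int, total_steps: int, min_len: int,
                           max_len: int, granularity: int = 8) -> int:
    """Hypothetical linear pacing: sequence length grows from min_len to max_len."""
    if step >= total_steps:
        return max_len
    length = min_len + (max_len - min_len) * step / total_steps
    # Round down to a multiple of `granularity`, clamped to [min_len, max_len]
    length = int(length) // granularity * granularity
    return max(min_len, min(max_len, length))


def truncate_batch(batch: Sequence[Sequence[int]], seqlen: int) -> List[List[int]]:
    """Apply the curriculum by truncating every sample to the current length."""
    return [list(sample[:seqlen]) for sample in batch]
```

Early steps therefore train on short, cheap sequences, and the full length is reached only at `total_steps`; rounding to a multiple of `granularity` keeps tensor shapes hardware-friendly.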

arXiv.org e-Print Archive

Random-LTD: Random and Layerwise Token Dropping Brings Efficient Training for Large-scale Transformers

Authors: He Yuxiong; Holmes Connor; Li Cheng; Li Conglong; Wu Xiaoxia; Yao Zhewei; Zhang Minjia
Publication date: 17/11/2022

Large-scale transformer models have become the de facto architectures for various machine learning applications, e.g., CV and NLP. However, these large models also introduce prohibitive training costs. To mitigate this issue, we propose a novel random and layerwise token dropping method (random-LTD), which skips the computation of a subset of the input tokens at all middle layers. In particular, random-LTD achieves considerable speedups and accuracy comparable to the standard training baseline. Compared to other token dropping methods, random-LTD requires neither (1) importance score-based metrics, nor (2) special token treatment (e.g., [CLS]), nor (3) full-sequence-length computation in many layers: only the first and the last layers process the full sequence. In addition, a new LayerToken learning rate schedule is proposed for pretraining problems, which removes the heavy tuning requirement of the proposed training mechanism. Finally, we demonstrate that random-LTD can be applied to broader applications, including GPT and BERT pretraining as well as ViT and GPT finetuning tasks. Our results show that random-LTD can save about 33.3% of theoretical compute cost and 25.6% of wall-clock training time while achieving zero-shot evaluations similar to the baseline on GPT-3 1.3B.
Comment: 22 pages
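The routing idea (skip a random subset of tokens at every middle layer while the first and last layers see the full sequence) can be sketched in plain Python. This is a simplified illustration, not the DeepSpeed implementation: tokens are toy feature lists, `layer` is any function over a list of tokens, the number of kept tokens is fixed rather than scheduled, and `random_ltd_layer` and `forward` are hypothetical names.

```python
import random
from typing import Callable, List, Sequence

Token = List[float]                          # toy token: a small feature vector
Layer = Callable[[List[Token]], List[Token]]


def random_ltd_layer(tokens: Sequence[Token], layer: Layer, keep: int,
                     rng: random.Random) -> List[Token]:
    """Run `layer` on `keep` randomly chosen tokens; the rest skip the layer."""
    n = len(tokens)
    if keep >= n:
        return layer(list(tokens))
    # Sample positions without replacement and sort them so the kept
    # tokens stay in their original order.
    kept_idx = sorted(rng.sample(range(n), keep))
    processed = layer([tokens[i] for i in kept_idx])
    out = list(tokens)                       # dropped tokens pass through unchanged
    for i, tok in zip(kept_idx, processed):
        out[i] = tok                         # scatter processed tokens back
    return out


def forward(tokens: Sequence[Token], layers: Sequence[Layer], keep: int,
            seed: int = 0) -> List[Token]:
    """Full-length first and last layers, random token dropping in between."""
    rng = random.Random(seed)
    hidden = list(tokens)
    last = len(layers) - 1
    for depth, layer in enumerate(layers):
        if depth in (0, last):
            hidden = layer(hidden)
        else:
            # A fresh random subset is drawn at every middle layer
            hidden = random_ltd_layer(hidden, layer, keep, rng)
    return hidden
```

Because the dropped tokens bypass a middle layer instead of being discarded, the sequence keeps its full length at the output, so no special handling of tokens such as [CLS] is needed.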

arXiv.org e-Print Archive

Molecular gas and star formation in nearby starburst galaxy mergers

Authors: Bottrell Connor; Burkhart Blakesley; Hayward Christopher C.; He Hao; Hernquist Lars; Moreno Jorge; Twum Angela; Wilson Christine
Publication date: 31/03/2023

We employ the Feedback In Realistic Environments (FIRE-2) physics model to study how the properties of giant molecular clouds (GMCs) evolve during galaxy mergers. We conduct a pixel-by-pixel analysis of molecular gas properties in both the simulated control galaxies and galaxy major mergers. The simulated GMC-pixels in the control galaxies follow a trend in the diagram of velocity dispersion ($\sigma_v$) versus gas surface density ($\Sigma_{\mathrm{mol}}$) similar to the one observed in local spiral galaxies in the Physics at High Angular resolution in Nearby GalaxieS (PHANGS) survey. For GMC-pixels in simulated mergers, we see a significant increase, by a factor of 5 to 10, in both $\Sigma_{\mathrm{mol}}$ and $\sigma_v$, which puts these pixels above the trend of PHANGS galaxies in the $\sigma_v$ versus $\Sigma_{\mathrm{mol}}$ diagram. This deviation may indicate that GMCs in the simulated mergers are much less gravitationally bound than those in the simulated control galaxies, with the virial parameter ($\alpha_{\mathrm{vir}}$) reaching 10 to 100. Furthermore, we find that the increase in $\alpha_{\mathrm{vir}}$ happens at the same time as the increase in the global star formation rate (SFR), which suggests that stellar feedback is responsible for dispersing the gas. We also find that the gas depletion time is significantly lower for high-$\alpha_{\mathrm{vir}}$ GMCs during a starburst event. This contrasts with the simple physical picture in which low-$\alpha_{\mathrm{vir}}$ GMCs collapse more easily and form stars on shorter depletion times. This might suggest that physical mechanisms other than self-gravity are helping the GMCs in starbursting mergers collapse and form stars.
Comment: 22 pages, 11 figures. Accepted to ApJ. Link to animation update
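The link between the $\sigma_v$ versus $\Sigma_{\mathrm{mol}}$ diagram and gravitational boundedness runs through the virial parameter. A minimal sketch, assuming the common spherical definition $\alpha_{\mathrm{vir}} = 5\sigma_v^2 R/(GM)$ with the mass of a region of radius $R$ taken as $M = \pi R^2 \Sigma_{\mathrm{mol}}$; the paper's exact per-pixel prescription is not given in the abstract, and `virial_parameter` is a hypothetical name.

```python
import math

# Gravitational constant in pc (km/s)^2 / M_sun
G_PC_KMS2_PER_MSUN = 4.30091e-3


def virial_parameter(sigma_v: float, sigma_mol: float, radius: float) -> float:
    """alpha_vir = 5 sigma_v^2 R / (G M), assuming M = pi R^2 Sigma_mol.

    sigma_v in km/s, sigma_mol in M_sun/pc^2, radius in pc.
    """
    mass = math.pi * radius ** 2 * sigma_mol
    return 5.0 * sigma_v ** 2 * radius / (G_PC_KMS2_PER_MSUN * mass)
```

At fixed $R$ this gives $\alpha_{\mathrm{vir}} \propto \sigma_v^2/\Sigma_{\mathrm{mol}}$, so raising both $\sigma_v$ and $\Sigma_{\mathrm{mol}}$ by the same factor $k$ raises $\alpha_{\mathrm{vir}}$ by $k$; larger values correspond to less gravitationally bound gas.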

arXiv.org e-Print Archive

ZeRO++: Extremely Efficient Collective Communication for Giant Model Training

Authors: He Yuxiong; Holmes Connor; Jacobs Sam Ade; Qin Heyang; Rajbhandari Samyam; Ruwase Olatunji; Wang Guanhua; Yan Feng; Yang Lei
Publication date: 16/06/2023

Zero Redundancy Optimizer (ZeRO) has been used to train a wide range of large language models on massive GPU clusters due to its ease of use, efficiency, and good scalability. However, when training on low-bandwidth clusters, or at a scale that forces the per-GPU batch size to be small, ZeRO's effective throughput is limited by the high communication volume from gathering weights in the forward and backward passes and from averaging gradients. This paper introduces three communication volume reduction techniques, which we collectively refer to as ZeRO++, each targeting one of the communication collectives in ZeRO. The first is block-quantization-based all-gather. The second is data remapping that trades additional memory for less communication. The third is a novel all-to-all-based quantized gradient averaging paradigm that replaces the reduce-scatter collective and preserves accuracy despite communicating low-precision data. Collectively, ZeRO++ reduces the communication volume of ZeRO by 4x, enabling up to 2.16x better throughput at 384-GPU scale.
Comment: 12 pages
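Block quantization, the idea behind the first technique, gives each fixed-size block of values its own scale so that a few large values do not destroy the precision of small ones. The sketch below assumes symmetric 8-bit codes and hypothetical helper names `block_quantize` and `block_dequantize`; the block size and bit width that ZeRO++ actually uses are not stated in the abstract.

```python
from typing import List, Sequence, Tuple


def block_quantize(values: Sequence[float],
                   block_size: int) -> Tuple[List[int], List[float]]:
    """Symmetric 8-bit-style quantization with one scale per block."""
    codes: List[int] = []
    scales: List[float] = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        max_abs = max(abs(v) for v in block)
        scale = max_abs / 127 if max_abs > 0 else 1.0  # avoid dividing by zero
        scales.append(scale)
        # Round to the nearest code and clamp to the signed 8-bit range
        codes.extend(max(-127, min(127, round(v / scale))) for v in block)
    return codes, scales


def block_dequantize(codes: Sequence[int], scales: Sequence[float],
                     block_size: int) -> List[float]:
    """Recover approximate values by multiplying each code by its block's scale."""
    return [c * scales[i // block_size] for i, c in enumerate(codes)]
```

With a single block over a mix of tiny and large values, the tiny ones round to zero; splitting them into their own block keeps them, at the cost of sending one extra scale per block.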

arXiv.org e-Print Archive

Uptake and toxicity studies of poly-acrylic acid functionalized silicon nanoparticles in cultured mammalian cells

Authors: Alsharif; Altman; Arya; Bauer; Belomoin; Brunner; Chang; Chao; Chao; Connor; Coxon; Cui; Dachs; Delerue; Derfus; Dickinson; dos Santos; English; Ferrati; Fujioka; Gao; Goodman; Green; Hardman; He; He; Ho; Hoshino; Hu; Hu; Jamieson; Lewinski; Limbach; Liu; Magrez; Manna; Michalet; Monteiro-Riviere; Mosmann; Rogozhina; Ruizendaal; Ryman-Rasmussen; Sato; Sayes; Selvan; Seotsanyana-Mokhosi; Shukla; Sieval; Stark; Thanh; Thurn; Tilley; Vivero-Escoto; Wallart; Wang; Wang; Wang; Wong; Xu; Šiller
Publication venue: Wiley
Publication date: 16/02/2012

Poly-acrylic acid (PAAc) terminated silicon nanoparticles (SiNPs) have been synthesized and employed as a synchronous fluorescent signal indicator in a series of cultured mammalian cells: HHL5, HepG2 and 3T3-L1. Their biological effects on cell growth and proliferation in both human and mouse cell lines have been studied. There was no evidence of in vitro cytotoxicity in cells exposed to PAAc-terminated SiNPs when assessed by cell morphology, cell proliferation and viability, and DNA damage assays. The uptake of the nanocrystals by both HepG2 and 3T3-L1 cells was investigated by confocal microscopy and flow cytometry, which showed a clear time dependence at higher concentrations. Reconstructed 3-D confocal microscope images showed that the PAAc-SiNPs were evenly distributed throughout the cytosol rather than attached to the outer membrane. This study provides fundamental evidence for the safe application and further modification of silicon nanoparticles, which could broaden their application as cell markers in living systems and in micelle-encapsulated drug delivery systems.

Crossref

University of East Anglia digital repository

Clinical Features, Survival and Prognostic Factors of Glycogen-Rich Clear Cell Carcinoma (GRCC) of the Breast in the U.S. Population

Authors: Cheng Simon K.; Gentry Matthew S.; Guo Hua; He Chunyan; Hibshoosh Hanina; Kinslow Connor J.; Sun Ramon C.; Zhou Zhengqiu
Publication venue: UKnowledge
Publication date: 14/02/2019

The World Health Organization (WHO) defines glycogen-rich clear cell carcinoma (GRCC) of the breast as a carcinoma with glycogen accumulation in more than 90% of its tumor cells. Because the disease is rare, its reported survival and clinical associations have been inconsistent, owing to reliance on case reports and limited case series. As a result, the prognostic implication of this cancer subtype remains unclear. Using the U.S. Surveillance, Epidemiology, and End Results (SEER) program database, we compared the incidence, demographics and prognostic factors of 155 cases of GRCC of the breast with 1,251,584 cases of other (non-GRCC) breast carcinomas. We demonstrate that GRCC is more likely to be identified as high grade and advanced stage, and more likely to have triple-negative receptor status. GRCC cases display a poorer prognosis than non-GRCC carcinomas of the breast irrespective of age, AJCC staging, tumor grade, joint hormone receptor/human epidermal growth factor receptor 2 (HER2) status, and treatment. As with non-GRCC carcinomas, older age and higher American Joint Committee on Cancer (AJCC)/TNM stage were associated with poorer prognosis for GRCC, while treatment with surgery and radiation was associated with improved survival. Radiation, specifically in the setting of breast-conserving surgery, further improved survival compared with surgery alone. Our study highlights the poorer prognosis associated with glycogen accumulation in breast cancers and hence stresses the importance of identifying this more aggressive tumor type.
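The abstract compares survival between GRCC and non-GRCC cases but does not say which estimators were used. As a generic illustration only, the Kaplan-Meier product-limit estimator is one standard way to build survival curves from censored follow-up data such as registry records; `kaplan_meier` is a hypothetical helper and is not presented as the authors' method.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float],
                 events: Sequence[bool]) -> List[Tuple[float, float]]:
    """Product-limit survival estimate.

    times: follow-up time per patient; events: True if death was observed,
    False if the patient was censored. Returns (time, S(time)) at each
    distinct death time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all patients that share this time point
        while i < len(data) and data[i][0] == t:
            deaths += int(data[i][1])
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # deaths and censored patients both leave the risk set
    return curve
```

Each death time multiplies the running survival by $1 - d_i/n_i$, where $d_i$ deaths occur among the $n_i$ patients still at risk; censored patients only shrink the risk set.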

University of Kentucky

Some investigations into non passive listening

Authors: A.R. Palmer; Anderson; Anllo-Vento; Benson; Benson; C. Sumner; Carlyon; Carlyon; Connor; Cusack; D.A. Hall; D.J.K. Barrett; D.R. Moore; Degerman; Ehret; Gaese; Griffiths; Griffiths; Griffiths; Hall; Hart; He; He; Hine; Hocherman; Hubel; K. Nakamoto; Kacelnik; Kanwisher; Karnath; Lomber; Luck; Ma; Macken; Maeder; Miller; Murphy; Naatanen; Newsome; Pfingst; Populin; Pressnitzer; Reynolds; Reynolds; Rutkowski; Rutkowski; Ryan; S. Jones; Salzman; Shackleton; Sillito; Syka; Tootell; Wallace; Wallace; Wallace; Wallace; Wallace; Wallace; Wallace; Wang; Weinberger; Winer; Winer; Winer; Yan; Zatorre; Zhang
Publication venue: Elsevier (not including Cell Press)
Publication date: 21/12/2006

Our knowledge of the function of the auditory nervous system is based upon a wealth of data obtained, for the most part, in anaesthetised animals. More recently, it has been generally acknowledged that factors such as attention profoundly modulate the activity of sensory systems, and that this can take place at many levels of processing. Imaging studies, in particular, have revealed greater activation of auditory areas, and of areas outside sensory processing areas, when attending to a stimulus. We present here a brief review of the consequences of such non-passive listening and go on to describe some of the experiments we are conducting to investigate them. In imaging studies using fMRI, we can demonstrate the activation of attention networks that are not specific to the sensory modality, as well as greater and different activation of the areas of the supra-temporal plane, which include primary and secondary auditory areas. The profuse descending connections of the auditory system seem likely to be part of the mechanisms subserving attention to sound. These are generally thought to be largely inactivated by anaesthesia. However, we have been able to demonstrate that even in an anaesthetised preparation, removing the descending control from the cortex leads to quite profound changes in the temporal patterns of activation by sounds in the thalamus and inferior colliculus. Some of these effects seem to be specific to the ear of stimulation and affect interaural processing. To bridge these observations, we are developing an awake behaving preparation involving freely moving animals, in which it will be possible to investigate the effects of consciousness (by contrasting awake and anaesthetised states) and of passive versus active listening.

Crossref

Heriot Watt Pure

Nottingham Trent Institutional Repository (IRep)

The University of Manchester - Institutional Repository

Don’t turn your back on the symptoms of psychosis : a proof-of-principle, quasi-experimental public health trial to reduce the duration of untreated psychosis in Birmingham, UK

Authors: A Malla; A Yung; AK Malla; B Lloyd-Evans; BB Sheitman; C Office; Charlotte Connor; Colin Palmer; D Linszen; Department of Health; H Goldman; H Krstev; HE Lester; Helen Lester; I Melle; Index of deprivation; J Prochaska; K Brunet; L Green; M Birchwood; M Marshall; Max Birchwood; Nick Freemantle; P Dolan; Paul Patterson; RJ Drake; S Carbone; S Rashid; SP Singh; SR Kay; Sunita Channa; Swaran Singh; TK Larsen
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2013

Background: Reducing the duration of untreated psychosis (DUP) is an aspiration of international guidelines for first-episode psychosis; however, public health initiatives have met with mixed results. Systematic reviews suggest that greater focus on the sources of delay within care pathways (which will vary between healthcare settings) is needed to achieve sustainable reductions in DUP (BJP 198: 256-263; 2011). Methods/Design: This is a quasi-experimental trial comparing a targeted intervention area with a ‘detection as usual’ area in the same city. As it is a proof-of-principle trial, no a priori assumptions are made regarding effect size; the key outcome will be an estimate of the potential effect size for a definitive trial. DUP and the number of new cases will be collected over an 18-month period in the target and control areas and compared; historical data on DUP collected in both areas over the previous three years will serve as a benchmark. The intervention will focus on reducing two significant component delays of DUP within the overall care pathway: delays within the mental health service and help-seeking delay. Discussion: This pragmatic trial will be the first to target known delays within the care pathway for people with a first episode of psychosis. If successful, it will provide a generalizable methodology that can be implemented in a variety of healthcare contexts with differing sources of delay. Trial registration: http://www.controlled-trials.com/ISRCTN45058713 Keywords: Public mental health campaign, First-episode psychosis, Early detection, Duration of untreated psychosis, Youth mental health

Crossref

Springer - Publisher Connector

University of Birmingham Research Portal

UCL Discovery

PubMed Central

Warwick Research Archives Portal Repository

University of Melbourne Institutional Repository

RenAIssance: A Survey into AI Text-to-Image Generation in the Era of Large Model

Authors: Bie Fengxiang; Clifton David A.; Ghanem Adam; Golnari Pareesa; He Yuxiong; Holmes Connor; Song Shuaiwen Leon; Tao Dacheng; Wu Xiaoxia; Yang Yibo; Yao Zhewei; Zhang Minjia; Zhou Zhongzhu
Publication date: 01/09/2023

Text-to-image generation (TTI) refers to the use of models that can process text input and generate high-fidelity images based on text descriptions. Text-to-image generation using neural networks can be traced back to the emergence of the Generative Adversarial Network (GAN), followed by the autoregressive Transformer. Diffusion models are one prominent type of generative model: they are trained to reverse a process that systematically adds noise over repeated steps, so that images are generated by iterative denoising. Owing to their impressive results in image synthesis, diffusion models have been cemented as the major image decoder used by text-to-image models and have brought text-to-image generation to the forefront of machine learning (ML) research. In the era of large models, scaling up model size and integrating large language models have further improved the performance of TTI models, making generated images nearly indistinguishable from real-world images and revolutionizing the way we retrieve images. Our exploratory study leads us to think that there are further ways of scaling text-to-image models by combining innovative model architectures with prediction enhancement techniques. We have divided this survey into five main sections in which we detail the frameworks of major literature in order to delve into the different types of text-to-image generation methods. Following this, we provide a detailed comparison and critique of these methods and offer possible pathways of improvement for future work. Looking ahead, we argue that TTI development could yield impressive productivity improvements for creation, particularly in the context of the AIGC era, and could be extended to more complex tasks such as video generation and 3D generation.
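The noising process that diffusion models learn to reverse has a closed form for jumping straight to step $t$: $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\varepsilon$ with $\bar\alpha_t = \prod_{s \le t}(1-\beta_s)$ and $\varepsilon \sim \mathcal{N}(0, I)$. The sketch assumes a DDPM-style linear $\beta$ schedule from $10^{-4}$ to $0.02$ over 1000 steps and hypothetical helper names; the survey abstract does not commit to any particular schedule.

```python
import math
import random
from typing import List, Sequence


def linear_beta_schedule(num_steps: int, beta_start: float = 1e-4,
                         beta_end: float = 0.02) -> List[float]:
    """Noise variances increasing linearly from beta_start to beta_end."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas: Sequence[float]) -> List[float]:
    """alpha_bar_t = prod_{s<=t} (1 - beta_s): signal variance kept at step t."""
    out: List[float] = []
    prod = 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def q_sample(x0: Sequence[float], t: int, alpha_bar: Sequence[float],
             rng: random.Random) -> List[float]:
    """Draw x_t ~ N(sqrt(alpha_bar_t) x_0, (1 - alpha_bar_t) I) in one step."""
    a = alpha_bar[t]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for x in x0]
```

By the last step $\bar\alpha_t$ is close to zero, so $x_t$ is almost pure Gaussian noise; a generator that learns to undo these steps one at a time can then start from noise and denoise toward an image.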

arXiv.org e-Print Archive