    Total positivity for matroid Schubert varieties

    We define the totally nonnegative matroid Schubert variety $\mathcal Y_V$ of a linear subspace $V \subset \mathbb R^n$. We show that $\mathcal Y_V$ is a regular CW complex homeomorphic to a closed ball, with strata indexed by pairs of acyclic flats of the oriented matroid of $V$. This closely resembles the regularity theorem for totally nonnegative generalized flag varieties. As a corollary, we obtain a regular CW structure on the real matroid Schubert variety of $V$. Comment: Comments welcome

    Random-LTD: Random and Layerwise Token Dropping Brings Efficient Training for Large-scale Transformers

    Large-scale transformer models have become the de facto architectures for various machine learning applications, e.g., CV and NLP. However, these large models also incur prohibitive training costs. To mitigate this issue, we propose a novel random and layerwise token dropping method (random-LTD), which skips the computation of a subset of the input tokens at all middle layers. In particular, random-LTD achieves considerable speedups with accuracy comparable to the standard training baseline. Compared to other token dropping methods, random-LTD does not require (1) any importance score-based metrics, (2) any special token treatment (e.g., [CLS]), or (3) full-sequence-length training for any layers other than the first and the last. In addition, we propose a new LayerToken learning rate schedule for pretraining problems that resolves the heavy tuning requirement of our proposed training mechanism. Finally, we demonstrate that random-LTD can be applied to broader applications, including GPT and BERT pretraining as well as ViT and GPT finetuning tasks. Our results show that random-LTD can save about 33.3% of theoretical compute cost and 25.6% of wall-clock training time while achieving zero-shot evaluations similar to the baseline on GPT-3 1.3B. Comment: 22 pages
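
    The core idea of skipping a random subset of tokens at every middle layer can be illustrated with a minimal pure-Python sketch. It is not the authors' DeepSpeed implementation: the function name `random_ltd_forward`, the representation of layers as plain callables over lists of token vectors, and the fixed `keep` count are all hypothetical simplifications. The first and last layers see the full sequence; each middle layer processes a fresh random subset (no importance scores), and dropped tokens bypass that layer unchanged.

```python
import random
from typing import Callable, List, Sequence

Token = List[float]  # hypothetical: one token's hidden vector
Layer = Callable[[List[Token]], List[Token]]  # hypothetical: maps n tokens to n tokens


def random_ltd_forward(
    tokens: Sequence[Token],
    layers: Sequence[Layer],
    keep: int,
    rng: random.Random,
) -> List[Token]:
    """Run `layers` over `tokens`, randomly dropping tokens at every middle layer.

    Illustrative sketch only: the first and last layers see the full sequence;
    each middle layer processes `keep` randomly chosen tokens (original order
    preserved) and the remaining tokens skip that layer unchanged.
    """
    hidden = [list(t) for t in tokens]  # copy so the caller's input is not mutated
    n_layers = len(layers)
    for i, layer in enumerate(layers):
        if i == 0 or i == n_layers - 1 or keep >= len(hidden):
            hidden = layer(hidden)  # full-sequence layer
            continue
        # Fresh uniform random subset for this layer; no importance scores.
        kept_idx = sorted(rng.sample(range(len(hidden)), keep))
        processed = layer([hidden[j] for j in kept_idx])
        # Scatter processed tokens back into place; dropped tokens are untouched.
        for j, out in zip(kept_idx, processed):
            hidden[j] = out
    return hidden
```

    With a toy layer that adds 1.0 to every value, four layers and eight tokens with `keep=4`, every token is incremented by the first and last layers, while each middle layer increments only four tokens, so the total increment is 8 × 2 + 4 × 2 = 24.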

    Molecular gas and star formation in nearby starburst galaxy mergers

    We employ the Feedback In Realistic Environments (FIRE-2) physics model to study how the properties of giant molecular clouds (GMCs) evolve during galaxy mergers. We conduct a pixel-by-pixel analysis of molecular gas properties in both the simulated control galaxies and galaxy major mergers. In a diagram of velocity dispersion ($\sigma_v$) versus gas surface density ($\Sigma_{\mathrm{mol}}$), the simulated GMC-pixels in the control galaxies follow a trend similar to the one observed in local spiral galaxies in the Physics at High Angular resolution in Nearby GalaxieS (PHANGS) survey. For GMC-pixels in simulated mergers, we see a significant increase, by a factor of 5 to 10, in both $\Sigma_{\mathrm{mol}}$ and $\sigma_v$, which places these pixels above the trend of PHANGS galaxies in the $\sigma_v$ vs. $\Sigma_{\mathrm{mol}}$ diagram. This deviation may indicate that GMCs in the simulated mergers are much less gravitationally bound than those in the simulated control galaxies, with the virial parameter ($\alpha_{\mathrm{vir}}$) reaching 10 to 100. Furthermore, we find that the increase in $\alpha_{\mathrm{vir}}$ happens at the same time as the increase in the global star formation rate (SFR), which suggests that stellar feedback is responsible for dispersing the gas. We also find that the gas depletion time is significantly shorter for high-$\alpha_{\mathrm{vir}}$ GMCs during a starburst event. This contrasts with the simple physical picture in which low-$\alpha_{\mathrm{vir}}$ GMCs collapse more easily and form stars on shorter depletion times. This might suggest that physical mechanisms other than self-gravity are helping the GMCs in starbursting mergers collapse and form stars. Comment: 22 pages, 11 figures. Accepted to ApJ. Link to animation update
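
    Why a simultaneous rise in $\Sigma_{\mathrm{mol}}$ and $\sigma_v$ signals weaker binding can be seen from a short worked derivation. This assumes the textbook virial parameter of a uniform spherical cloud of radius $R$ and a pixel mass $M = \pi R^2 \Sigma_{\mathrm{mol}}$; the paper's exact pixel-based estimator and prefactors may differ.

```latex
\alpha_{\mathrm{vir}} = \frac{5\,\sigma_v^{2} R}{G M},
\qquad M = \pi R^{2}\,\Sigma_{\mathrm{mol}}
\quad\Longrightarrow\quad
\alpha_{\mathrm{vir}} = \frac{5\,\sigma_v^{2}}{\pi G R\,\Sigma_{\mathrm{mol}}}
\;\propto\; \frac{\sigma_v^{2}}{\Sigma_{\mathrm{mol}}} \quad (\text{fixed } R)
```

    Under these assumptions, lines of constant $\alpha_{\mathrm{vir}}$ satisfy $\sigma_v \propto \Sigma_{\mathrm{mol}}^{1/2}$, so pixels that move above such a line have larger $\alpha_{\mathrm{vir}}$. If $\sigma_v$ and $\Sigma_{\mathrm{mol}}$ both grow by the same factor $f$, then $\alpha_{\mathrm{vir}}$ grows by $f^2/f = f$; a 5 to 10 times increase in both therefore raises $\alpha_{\mathrm{vir}}$ by roughly 5 to 10 times at fixed $R$.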

    ZeRO++: Extremely Efficient Collective Communication for Giant Model Training

    Zero Redundancy Optimizer (ZeRO) has been used to train a wide range of large language models on massive GPU clusters due to its ease of use, efficiency, and good scalability. However, when training on low-bandwidth clusters, or at a scale that forces the per-GPU batch size to be small, ZeRO's effective throughput is limited by the high communication volume from gathering weights in the forward and backward passes and from averaging gradients. This paper introduces three communication volume reduction techniques, which we collectively refer to as ZeRO++, each targeting one of the communication collectives in ZeRO. The first is block-quantization-based all-gather. The second is data remapping that trades off more memory for less communication. The third is a novel all-to-all-based quantized gradient averaging paradigm that replaces the reduce-scatter collective and preserves accuracy despite communicating low-precision data. Collectively, ZeRO++ reduces the communication volume of ZeRO by 4x, enabling up to 2.16x better throughput at 384-GPU scale. Comment: 12 pages
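
    Block quantization, the idea behind the first technique, gives each fixed-size block of values its own scale, so a large outlier in one block does not destroy precision in the others. The following pure-Python sketch assumes symmetric int8 codes with one floating-point scale per block; the function names and block layout are hypothetical and this is not the ZeRO++ kernel, which runs on GPU.

```python
from typing import List, Tuple


def block_quantize(values: List[float], block_size: int) -> Tuple[List[int], List[float]]:
    """Symmetric int8 quantization with one scale per block of `block_size` values.

    Illustrative sketch: each block is scaled so its largest magnitude maps to 127.
    """
    codes: List[int] = []
    scales: List[float] = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        max_abs = max(abs(v) for v in block)
        # An all-zero block gets scale 1.0 so we never divide by zero.
        scale = max_abs / 127.0 if max_abs > 0 else 1.0
        scales.append(scale)
        # Round to nearest and clamp into the symmetric int8 range [-127, 127].
        codes.extend(max(-127, min(127, round(v / scale))) for v in block)
    return codes, scales


def block_dequantize(codes: List[int], scales: List[float], block_size: int) -> List[float]:
    """Invert block_quantize up to a rounding error of at most half a block's scale."""
    return [c * scales[i // block_size] for i, c in enumerate(codes)]
```

    For example, quantizing `[0.5, -1.0, 0.25, 0.0, 100.0, -50.0, 3.0, 7.0]` with `block_size=4` yields two scales (1/127 and 100/127), so the small values in the first block keep a resolution of about 0.008 even though the second block contains 100.0. Only the int8 codes and the per-block scales would need to be communicated.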

    Uptake and toxicity studies of poly-acrylic acid functionalized silicon nanoparticles in cultured mammalian cells

    Poly-acrylic acid (PAAc) terminated silicon nanoparticles (SiNPs) have been synthesized and employed as a synchronous fluorescent signal indicator in a series of cultured mammalian cells: HHL5, HepG2 and 3T3-L1. Their biological effects on cell growth and proliferation in both human and mouse cell lines have been studied. There was no evidence of in vitro cytotoxicity in cells exposed to PAAc-terminated SiNPs when assessed by cell morphology, cell proliferation and viability, and DNA damage assays. The uptake of the nanocrystals by both HepG2 and 3T3-L1 cells was investigated by confocal microscopy and flow cytometry, which showed a clear time dependence at higher concentrations. Reconstructed 3-D confocal microscope images showed that the PAAc-SiNPs were evenly distributed throughout the cytosol rather than attached to the outer membrane. This study provides fundamental evidence for the safe application and further modification of silicon nanoparticles, which could broaden their application as cell markers in living systems and in micelle-encapsulated drug delivery systems.

    Clinical Features, Survival and Prognostic Factors of Glycogen-Rich Clear Cell Carcinoma (GRCC) of the Breast in the U.S. Population

    The World Health Organization (WHO) defines glycogen-rich clear cell carcinoma (GRCC) of the breast as a carcinoma with glycogen accumulation in more than 90% of its tumor cells. Because the disease is rare, its reported survival and clinical associations have been inconsistent, owing to reliance on case reports and limited case series. As a result, the prognostic implication of this cancer subtype remains unclear. Using the U.S. Surveillance, Epidemiology, and End Results (SEER) program database, we compared the incidence, demographics and prognostic factors of 155 cases of GRCC of the breast to 1,251,584 cases of other (non-GRCC) breast carcinomas. We demonstrate that GRCC is more likely to be identified as high grade and advanced stage, and more likely to have triple-negative receptor status. GRCC cases display a poorer prognosis than non-GRCC carcinomas of the breast irrespective of age, American Joint Committee on Cancer (AJCC) stage, tumor grade, joint hormone receptor/human epidermal growth factor receptor 2 (HER2) status, and treatment. Similar to non-GRCC carcinomas, older age and higher AJCC/TNM stage were associated with poorer prognosis for GRCC, while treatment with surgery and radiation was associated with improved survival. Radiation, specifically in the setting of breast-conserving surgery, further improved survival compared to surgery alone. Our study highlights the poorer prognosis associated with glycogen accumulation in breast cancers and hence stresses the importance of identifying this more aggressive tumor type.

    Some investigations into non passive listening

    Our knowledge of the function of the auditory nervous system is based upon a wealth of data obtained, for the most part, in anaesthetised animals. More recently, it has been generally acknowledged that factors such as attention profoundly modulate the activity of sensory systems, and that this can take place at many levels of processing. Imaging studies, in particular, have revealed greater activation of auditory areas, and of areas outside sensory processing regions, when a subject attends to a stimulus. We present here a brief review of the consequences of such non-passive listening and go on to describe some of the experiments we are conducting to investigate them. In fMRI imaging studies, we can demonstrate the activation of attention networks that are not specific to the sensory modality, as well as greater and different activation of the areas of the supra-temporal plane that include primary and secondary auditory areas. The profuse descending connections of the auditory system seem likely to be part of the mechanisms subserving attention to sound. These are generally thought to be largely inactivated by anaesthesia. However, we have been able to demonstrate that even in an anaesthetised preparation, removing the descending control from the cortex leads to quite profound changes in the temporal patterns of activation by sounds in the thalamus and inferior colliculus. Some of these effects seem to be specific to the ear of stimulation and affect interaural processing. To bridge these observations, we are developing an awake behaving preparation involving freely moving animals, in which it will be possible to investigate the effects of consciousness (by contrasting awake and anaesthetised states) and of passive versus active listening.

    Don’t turn your back on the symptoms of psychosis : a proof-of-principle, quasi-experimental public health trial to reduce the duration of untreated psychosis in Birmingham, UK

    Background: Reducing the duration of untreated psychosis (DUP) is an aspiration of international guidelines for first-episode psychosis; however, public health initiatives have met with mixed results. Systematic reviews suggest that greater focus on the sources of delay within care pathways (which will vary between healthcare settings) is needed to achieve sustainable reductions in DUP (BJP 198: 256-263; 2011). Methods/Design: A quasi-experimental trial comparing a targeted intervention area with a ‘detection as usual’ area in the same city. As this is a proof-of-principle trial, no a priori assumptions are made regarding effect size; the key outcome will be an estimate of the potential effect size for a definitive trial. DUP and the number of new cases will be collected over an 18-month period in the target and control areas and compared; historical data on DUP collected in both areas over the previous three years will serve as a benchmark. The intervention will focus on reducing two significant component delays of DUP within the overall care pathway: delays within the mental health service and help-seeking delay. Discussion: This pragmatic trial will be the first to target known delays within the care pathway for those with a first episode of psychosis. If successful, it will provide a generalizable methodology that can be implemented in a variety of healthcare contexts with differing sources of delay. Trial registration: http://www.controlled-trials.com/ISRCTN45058713 Keywords: Public mental health campaign, First-episode psychosis, Early detection, Duration of untreated psychosis, Youth mental health

    RenAIssance: A Survey into AI Text-to-Image Generation in the Era of Large Model

    Text-to-image generation (TTI) refers to the use of models that process text input and generate high-fidelity images based on text descriptions. Text-to-image generation using neural networks can be traced back to the emergence of the Generative Adversarial Network (GAN), followed by the autoregressive Transformer. Diffusion models are one prominent type of generative model used to generate images through the systematic introduction of noise over repeated steps. Owing to their impressive results in image synthesis, diffusion models have been cemented as the major image decoder used by text-to-image models and have brought text-to-image generation to the forefront of machine-learning (ML) research. In the era of large models, scaling up model size and integrating large language models have further improved the performance of TTI models, producing results nearly indistinguishable from real-world images and revolutionizing the way we retrieve images. Our exploratory study has led us to think that there are further ways of scaling text-to-image models by combining innovative model architectures with prediction enhancement techniques. We have divided this survey into five main sections, in which we detail the frameworks of the major literature in order to delve into the different types of text-to-image generation methods. Following this, we provide a detailed comparison and critique of these methods and offer possible pathways of improvement for future work. Looking ahead, we argue that TTI development could yield impressive productivity improvements for creation, particularly in the context of the AIGC era, and could be extended to more complex tasks such as video generation and 3D generation.
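
    The "systematic introduction of noise over repeated steps" is the forward process of a diffusion model. The following minimal pure-Python sketch assumes the DDPM formulation, in which $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$ with $\bar\alpha_t = \prod_{s \le t}(1-\beta_s)$, and a linear $\beta$ schedule from 1e-4 to 0.02 (the DDPM defaults). The survey itself does not prescribe this schedule, and the function names are hypothetical. Only the noising direction is shown; the learned denoising network that reverses it is out of scope.

```python
import math
import random
from typing import List


def linear_beta_schedule(steps: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> List[float]:
    """Per-step noise variances beta_1..beta_T, increasing linearly (assumed DDPM-style schedule)."""
    if steps == 1:
        return [beta_start]
    return [beta_start + (beta_end - beta_start) * i / (steps - 1) for i in range(steps)]


def alpha_bars(betas: List[float]) -> List[float]:
    """Cumulative products abar_t = prod_{s<=t} (1 - beta_s), i.e. the surviving signal variance."""
    out: List[float] = []
    acc = 1.0
    for b in betas:
        acc *= 1.0 - b
        out.append(acc)
    return out


def noise_to_step(x0: List[float], t: int, abar: List[float], rng: random.Random) -> List[float]:
    """Sample x_t ~ q(x_t | x_0) in closed form; t is 1-indexed.

    Because Gaussian noise composes, t repeated noising steps collapse into one draw.
    """
    a = abar[t - 1]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for x in x0]
```

    With 1000 steps, $\bar\alpha_1 \approx 0.9999$, so $x_1$ is nearly the clean input, while $\bar\alpha_{1000}$ falls well below 1e-3, so $x_{1000}$ is essentially pure Gaussian noise. A text-to-image model learns to run this process in reverse, conditioned on the text.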