156 research outputs found

    Stochastic Collapse: How Gradient Noise Attracts SGD Dynamics Towards Simpler Subnetworks

    In this work, we reveal a strong implicit bias of stochastic gradient descent (SGD) that drives overly expressive networks to much simpler subnetworks, thereby dramatically reducing the number of independent parameters, and improving generalization. To reveal this bias, we identify invariant sets, or subsets of parameter space that remain unmodified by SGD. We focus on two classes of invariant sets that correspond to simpler (sparse or low-rank) subnetworks and commonly appear in modern architectures. Our analysis uncovers that SGD exhibits a property of stochastic attractivity towards these simpler invariant sets. We establish a sufficient condition for stochastic attractivity based on a competition between the loss landscape's curvature around the invariant set and the noise introduced by stochastic gradients. Remarkably, we find that an increased level of noise strengthens attractivity, leading to the emergence of attractive invariant sets associated with saddle-points or local maxima of the train loss. We observe empirically the existence of attractive invariant sets in trained deep neural networks, implying that SGD dynamics often collapses to simple subnetworks with either vanishing or redundant neurons. We further demonstrate how this simplifying process of stochastic collapse benefits generalization in a linear teacher-student framework. Finally, through this analysis, we mechanistically explain why early training with large learning rates for extended periods benefits subsequent generalization. Comment: 37 pages, 12 figures, NeurIPS 202

    The Asymmetric Maximum Margin Bias of Quasi-Homogeneous Neural Networks

    In this work, we explore the maximum-margin bias of quasi-homogeneous neural networks trained with gradient flow on an exponential loss and past a point of separability. We introduce the class of quasi-homogeneous models, which is expressive enough to describe nearly all neural networks with homogeneous activations, even those with biases, residual connections, and normalization layers, while structured enough to enable geometric analysis of its gradient dynamics. Using this analysis, we generalize the existing results of maximum-margin bias for homogeneous networks to this richer class of models. We find that gradient flow implicitly favors a subset of the parameters, unlike in the case of a homogeneous model where all parameters are treated equally. We demonstrate through simple examples how this strong favoritism toward minimizing an asymmetric norm can degrade the robustness of quasi-homogeneous models. On the other hand, we conjecture that this norm-minimization discards, when possible, unnecessary higher-order parameters, reducing the model to a sparser parameterization. Lastly, by applying our theorem to sufficiently expressive neural networks with normalization layers, we reveal a universal mechanism behind the empirical phenomenon of Neural Collapse. Comment: 33 pages, 5 figures

    Nonlinear Quantum Behavior of Ultrashort-Pulse Optical Parametric Oscillators

    The quantum features of ultrashort-pulse optical parametric oscillators (OPOs) are theoretically investigated in the nonlinear regime near and above threshold. Starting from basic premises of input-output theory, we derive a general quantum model for pulsed OPOs subject to χ(2) interactions between a multimode signal cavity and a non-resonant broadband pump field, elucidating time scale conditions required for such pulsed OPOs to admit an input-output description. By employing a supermode decomposition of the nonlinear Lindblad operators governing pump-signal interactions, we perform multimode quantum simulations in the regime of strong nonlinearity and study effects such as pump depletion and corrections to the squeezing spectrum of the linearized model. We observe non-Gaussian states with Wigner function negativity and show that multimode interactions with the pump can act as decoherence channels.

    Hybrid Simulation between Molecular Dynamics and Binary Collision Approximation Codes for Hydrogen injection onto Carbon Materials

    Molecular dynamics (MD) simulation with the modified Brenner reactive empirical bond order (REBO) potential is a powerful tool for investigating plasma-wall interaction on divertor plates in a nuclear fusion device. However, computer performance limits the size of the MD simulation box to less than several nm. To extend the size of the MD simulation, we develop a hybrid simulation code that couples an MD code using the REBO potential with a binary collision approximation (BCA) code. Using the BCA code instead of computing all particles with a high kinetic energy at every step of the MD simulation saves considerable computation time. By demonstrating hydrogen atom injection onto graphite with the hybrid simulation code, we find that the code works efficiently in a large simulation box. Comment: 5 pages, 5 figures

    Interplay of cytokines in the pathophysiology of atopic dermatitis: insights from Murin models and human

    The pathogenesis of atopic dermatitis (AD) is understood to be crucially influenced by three main factors: dysregulation of the immune response, barrier dysfunction, and pruritus. In the lesional skin of AD, various innate immune cells, including Th2 cells, type 2 innate lymphoid cells (ILC2s), and basophils, produce Th2 cytokines [interleukin (IL)-4, IL-5, IL-13, IL-31]. Alarmins such as TSLP, IL-25, and IL-33 are also produced by epidermal keratinocytes, amplifying type 2 inflammation. In the chronic phase, not only Th2 cells but also Th22 and Th17 cells increase in number, leading to suppression of filaggrin expression by IL-4, IL-13, and IL-22, which further deteriorates the epidermal barrier function. Dupilumab, which targets IL-4 and IL-13, has shown efficacy in treating moderate to severe AD. Nemolizumab, targeting IL-31RA, effectively reduces pruritus in AD patients. In addition, clinical trials with fezakinumab, targeting IL-22, have demonstrated promising results, particularly in severe AD cases. Conversely, in murine models of AD, several cytokines, initially regarded as promising therapeutic targets, have not demonstrated sufficient efficacy in clinical trials. IL-33 has been identified as a potent activator of immune cells, exacerbating AD in murine models and correlating with disease severity in human patients. However, treatments targeting IL-33 have not shown sufficient efficacy in clinical trials. Similarly, thymic stromal lymphopoietin (TSLP), integral to type 2 immune responses, induces dermatitis in animal models and is elevated in human AD, yet clinical treatments like tezepelumab exhibit limited efficacy. Therapies targeting IL-1α, IL-5, and IL-17 also failed to achieve sufficient efficacy in clinical trials. It has become clear that for treating AD, IL-4, IL-13, and IL-31 are relevant therapeutic targets during the acute phase, while IL-22 emerges as a target in more severe cases. 
    This delineation underscores the necessity of considering distinct pathophysiological aspects and therapeutic targets in AD between mouse models and humans. Consequently, this review delineates the distinct roles of cytokines in the pathogenesis of AD, juxtaposing their significance in human AD from clinical trials against insights gleaned from AD mouse models. This approach will improve our understanding of interspecies variation and facilitate a deeper insight into the pathogenesis of AD in humans.

    High-Dimensional Non-Convex Landscapes and Gradient Descent Dynamics

    In these lecture notes we present different methods and concepts developed in statistical physics to analyze gradient descent dynamics in high-dimensional non-convex landscapes. Our aim is to show how approaches developed in physics, mainly statistical physics of disordered systems, can be used to tackle open questions on high-dimensional dynamics in Machine Learning. Comment: Lectures given by G. Biroli at the 2022 Les Houches Summer School "Statistical Physics and Machine Learning"

    Quantitative activation-induced manganese-enhanced MRI reveals severity of Parkinson’s disease in mice

    We demonstrate that activation-induced manganese-enhanced magnetic resonance imaging with quantitative determination of the longitudinal relaxation time (qAIM-MRI) reveals the severity of Parkinson’s disease (PD) in mice. We first show that manganese ion-accumulation depends on neuronal activity. A highly active region was then observed by qAIM-MRI in the caudate-putamen in PD-model mice that was significantly correlated to the severity of PD, suggesting its involvement in the expression of PD symptoms.

    Advanced gastrointestinal stromal tumor with intracerebral hemorrhage during sunitinib treatment

    Herein, a 70-year-old female was initially treated with sunitinib 50 mg/day to treat an imatinib-resistant gastrointestinal stromal tumor. After sunitinib initiation, nausea, hypertension, hepatic dysfunction, anorexia, fatigue, thrombocytopenia, epistaxis, and palmoplantar erythrodysesthesia syndrome developed; the dose was reduced to 25 mg/day. Subsequently, adverse events improved, and from the fifth course onward, sunitinib 37.5 mg/day was continued. Approximately 11 months after initiating sunitinib therapy, the patient developed disturbance of consciousness, aphasia, and left hemiplegia. Computed tomography of the head revealed intracerebral hemorrhage, and the patient was hospitalized. No brain metastases, cerebral aneurysms, or cerebral arteriovenous malformations were observed. Sunitinib-induced hypertensive cerebral hemorrhage was suspected as the cause of intracerebral hemorrhage. Conservative treatments, such as antihypertensive drugs, were administered without surgical treatment. The symptoms and intracerebral hemorrhage gradually improved, and the patient was discharged from the hospital. Intracerebral hemorrhage with sunitinib is extremely rare, but has a high mortality rate. During sunitinib treatment, controlling blood pressure and thrombocytopenia is important to prevent bleeding.