151 research outputs found

    Stochastic Collapse: How Gradient Noise Attracts SGD Dynamics Towards Simpler Subnetworks

    In this work, we reveal a strong implicit bias of stochastic gradient descent (SGD) that drives overly expressive networks to much simpler subnetworks, thereby dramatically reducing the number of independent parameters, and improving generalization. To reveal this bias, we identify invariant sets, or subsets of parameter space that remain unmodified by SGD. We focus on two classes of invariant sets that correspond to simpler (sparse or low-rank) subnetworks and commonly appear in modern architectures. Our analysis uncovers that SGD exhibits a property of stochastic attractivity towards these simpler invariant sets. We establish a sufficient condition for stochastic attractivity based on a competition between the loss landscape's curvature around the invariant set and the noise introduced by stochastic gradients. Remarkably, we find that an increased level of noise strengthens attractivity, leading to the emergence of attractive invariant sets associated with saddle-points or local maxima of the train loss. We observe empirically the existence of attractive invariant sets in trained deep neural networks, implying that SGD dynamics often collapses to simple subnetworks with either vanishing or redundant neurons. We further demonstrate how this simplifying process of stochastic collapse benefits generalization in a linear teacher-student framework. Finally, through this analysis, we mechanistically explain why early training with large learning rates for extended periods benefits subsequent generalization. Comment: 37 pages, 12 figures, NeurIPS 202

    The Asymmetric Maximum Margin Bias of Quasi-Homogeneous Neural Networks

    In this work, we explore the maximum-margin bias of quasi-homogeneous neural networks trained with gradient flow on an exponential loss and past a point of separability. We introduce the class of quasi-homogeneous models, which is expressive enough to describe nearly all neural networks with homogeneous activations, even those with biases, residual connections, and normalization layers, while structured enough to enable geometric analysis of its gradient dynamics. Using this analysis, we generalize the existing results of maximum-margin bias for homogeneous networks to this richer class of models. We find that gradient flow implicitly favors a subset of the parameters, unlike in the case of a homogeneous model where all parameters are treated equally. We demonstrate through simple examples how this strong favoritism toward minimizing an asymmetric norm can degrade the robustness of quasi-homogeneous models. On the other hand, we conjecture that this norm-minimization discards, when possible, unnecessary higher-order parameters, reducing the model to a sparser parameterization. Lastly, by applying our theorem to sufficiently expressive neural networks with normalization layers, we reveal a universal mechanism behind the empirical phenomenon of Neural Collapse. Comment: 33 pages, 5 figures

    Nonlinear Quantum Behavior of Ultrashort-Pulse Optical Parametric Oscillators

    The quantum features of ultrashort-pulse optical parametric oscillators (OPOs) are theoretically investigated in the nonlinear regime near and above threshold. Starting from basic premises of input-output theory, we derive a general quantum model for pulsed OPOs subject to χ(2) interactions between a multimode signal cavity and a non-resonant broadband pump field, elucidating time scale conditions required for such pulsed OPOs to admit an input-output description. By employing a supermode decomposition of the nonlinear Lindblad operators governing pump-signal interactions, we perform multimode quantum simulations in the regime of strong nonlinearity and study effects such as pump depletion and corrections to the squeezing spectrum of the linearized model. We observe non-Gaussian states with Wigner function negativity and show that multimode interactions with the pump can act as decoherence channels.

    Hybrid Simulation between Molecular Dynamics and Binary Collision Approximation Codes for Hydrogen injection onto Carbon Materials

    Molecular dynamics (MD) simulation with a modified Brenner reactive empirical bond order (REBO) potential is a powerful tool for investigating plasma-wall interaction on divertor plates in a nuclear fusion device. However, computer performance limits the MD simulation box to a size of less than several nm. To extend the size of the MD simulation, we develop a hybrid simulation code that couples an MD code using the REBO potential with a binary collision approximation (BCA) code. Using the BCA code instead of computing all particles with a high kinetic energy at every step of the MD simulation saves considerable computation time. A demonstration of hydrogen atom injection onto graphite with the hybrid simulation code shows that it works efficiently in a large simulation box. Comment: 5 pages, 5 figures

    High-Dimensional Non-Convex Landscapes and Gradient Descent Dynamics

    In these lecture notes we present different methods and concepts developed in statistical physics to analyze gradient descent dynamics in high-dimensional non-convex landscapes. Our aim is to show how approaches developed in physics, mainly statistical physics of disordered systems, can be used to tackle open questions on high-dimensional dynamics in Machine Learning. Comment: Lectures given by G. Biroli at the 2022 Les Houches Summer School "Statistical Physics and Machine Learning"

    Quantitative activation-induced manganese-enhanced MRI reveals severity of Parkinson’s disease in mice

    We demonstrate that activation-induced manganese-enhanced magnetic resonance imaging with quantitative determination of the longitudinal relaxation time (qAIM-MRI) reveals the severity of Parkinson’s disease (PD) in mice. We first show that manganese ion-accumulation depends on neuronal activity. A highly active region was then observed by qAIM-MRI in the caudate-putamen in PD-model mice that was significantly correlated to the severity of PD, suggesting its involvement in the expression of PD symptoms.

    Advanced gastrointestinal stromal tumor with intracerebral hemorrhage during sunitinib treatment

    Herein, a 70-year-old female was initially treated with sunitinib 50 mg/day to treat an imatinib-resistant gastrointestinal stromal tumor. After sunitinib initiation, nausea, hypertension, hepatic dysfunction, anorexia, fatigue, thrombocytopenia, epistaxis, and palmoplantar erythrodysesthesia syndrome developed; the dose was reduced to 25 mg/day. Subsequently, adverse events improved, and from the fifth course onward, sunitinib 37.5 mg/day was continued. Approximately 11 months after initiating sunitinib therapy, the patient developed disturbance of consciousness, aphasia, and left hemiplegia. Computed tomography of the head revealed intracerebral hemorrhage, and the patient was hospitalized. No brain metastases, cerebral aneurysms, or cerebral arteriovenous malformations were observed. Sunitinib-induced hypertensive cerebral hemorrhage was suspected as the cause of intracerebral hemorrhage. Conservative treatments, such as antihypertensive drugs, were administered without surgical treatment. The symptoms and intracerebral hemorrhage gradually improved, and the patient was discharged from the hospital. Intracerebral hemorrhage with sunitinib is extremely rare, but has a high mortality rate. During sunitinib treatment, controlling blood pressure and thrombocytopenia is important to prevent bleeding.

    Association analysis of toll-like receptor 4 polymorphisms in Japanese primary biliary cirrhosis

    Primary biliary cirrhosis (PBC) is characterized by portal inflammation and immune-mediated destruction of intrahepatic bile ducts that often result in liver failure. Toll-like receptor (TLR) 4 recognizes lipopolysaccharides of Gram-negative bacteria. Infectious agents have been suspected to play a crucial role in PBC pathogenesis since TLR4 expression was found in bile duct epithelial cells and periportal hepatocytes in liver tissues of PBC. To assess the potential contribution of TLR4 SNPs to the development of this disease, we genotyped five SNPs in TLR4 in 261 PBC patients and 359 controls using a TaqMan assay. No significant positive associations with either PBC susceptibility or progression were uncovered. These results indicate that TLR4 polymorphisms do not play a prominent role in the development of PBC in Japanese patients. (C) 2012 American Society for Histocompatibility and Immunogenetics. Published by Elsevier Inc. All rights reserved. Article. Human Immunology 74(2):219-222 (2013). Journal article