151 research outputs found
Stochastic Collapse: How Gradient Noise Attracts SGD Dynamics Towards Simpler Subnetworks
In this work, we reveal a strong implicit bias of stochastic gradient descent
(SGD) that drives overly expressive networks to much simpler subnetworks,
thereby dramatically reducing the number of independent parameters, and
improving generalization. To reveal this bias, we identify invariant sets, or
subsets of parameter space that remain unmodified by SGD. We focus on two
classes of invariant sets that correspond to simpler (sparse or low-rank)
subnetworks and commonly appear in modern architectures. Our analysis uncovers
that SGD exhibits a property of stochastic attractivity towards these simpler
invariant sets. We establish a sufficient condition for stochastic attractivity
based on a competition between the loss landscape's curvature around the
invariant set and the noise introduced by stochastic gradients. Remarkably, we
find that an increased level of noise strengthens attractivity, leading to the
emergence of attractive invariant sets associated with saddle-points or local
maxima of the train loss. We observe empirically the existence of attractive
invariant sets in trained deep neural networks, implying that SGD dynamics
often collapses to simple subnetworks with either vanishing or redundant
neurons. We further demonstrate how this simplifying process of stochastic
collapse benefits generalization in a linear teacher-student framework.
Finally, through this analysis, we mechanistically explain why early training
with large learning rates for extended periods benefits subsequent
generalization.
Comment: 37 pages, 12 figures, NeurIPS 202
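As a rough illustration of what an invariant set is, the sketch below trains a tiny two-layer tanh network with SGD and checks that a hidden neuron whose incoming and outgoing weights both start at zero never moves. The network, the target function, the learning rate, and the names `sgd_step` and `run` are all assumptions made for this example; it is not the paper's experimental setup, and it shows only invariance, not stochastic attractivity.

```python
import math
import random


def sgd_step(W, a, x, y, lr):
    """One SGD step on squared error for f(x) = sum_j a[j] * tanh(W[j] . x)."""
    pre = [sum(wi * xi for wi, xi in zip(w, x)) for w in W]
    h = [math.tanh(p) for p in pre]
    err = sum(aj * hj for aj, hj in zip(a, h)) - y
    # Gradient w.r.t. a[j] is err * h[j]; it vanishes when W[j] = 0 (tanh(0) = 0).
    new_a = [aj - lr * err * hj for aj, hj in zip(a, h)]
    # Gradient w.r.t. W[j] is err * a[j] * (1 - h[j]^2) * x; it vanishes when a[j] = 0.
    new_W = [
        [wi - lr * err * aj * (1.0 - hj * hj) * xi for wi, xi in zip(w, x)]
        for w, aj, hj in zip(W, a, h)
    ]
    return new_W, new_a


def run(steps=200, lr=0.05, seed=0):
    """Train on single random samples; neuron 0 starts inside the invariant set."""
    rng = random.Random(seed)
    W = [[0.0, 0.0]] + [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(2)]
    a = [0.0] + [rng.gauss(0, 1) for _ in range(2)]
    W_init = [row[:] for row in W]
    for _ in range(steps):
        x = [rng.gauss(0, 1), rng.gauss(0, 1)]
        y = math.sin(x[0]) + 0.5 * x[1]  # arbitrary toy target
        W, a = sgd_step(W, a, x, y, lr)
    return W_init, W, a


W_init, W, a = run()
print("neuron 0 incoming weights:", W[0], "outgoing weight:", a[0])
```

Because both gradients for neuron 0 carry a factor that is exactly zero on this set, the neuron stays "vanished" regardless of which samples SGD draws, while the other neurons train normally.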
The Asymmetric Maximum Margin Bias of Quasi-Homogeneous Neural Networks
In this work, we explore the maximum-margin bias of quasi-homogeneous neural
networks trained with gradient flow on an exponential loss and past a point of
separability. We introduce the class of quasi-homogeneous models, which is
expressive enough to describe nearly all neural networks with homogeneous
activations, even those with biases, residual connections, and normalization
layers, while structured enough to enable geometric analysis of its gradient
dynamics. Using this analysis, we generalize the existing results of
maximum-margin bias for homogeneous networks to this richer class of models. We
find that gradient flow implicitly favors a subset of the parameters, unlike in
the case of a homogeneous model where all parameters are treated equally. We
demonstrate through simple examples how this strong favoritism toward
minimizing an asymmetric norm can degrade the robustness of quasi-homogeneous
models. On the other hand, we conjecture that this norm-minimization discards,
when possible, unnecessary higher-order parameters, reducing the model to a
sparser parameterization. Lastly, by applying our theorem to sufficiently
expressive neural networks with normalization layers, we reveal a universal
mechanism behind the empirical phenomenon of Neural Collapse.
Comment: 33 pages, 5 figures
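To make the idea of quasi-homogeneity concrete, the sketch below uses the standard mathematical definition (rescaling each parameter by its own power of a common factor rescales the output by a fixed power) on a one-hidden-unit ReLU network with biases. The network `net`, the helper `rescale`, and the exponents (1, 1, 1, 2) are assumptions chosen for illustration and may not match the paper's notation.

```python
import math


def relu(z):
    return max(z, 0.0)


def net(x, w1, b1, w2, b2):
    """Scalar one-hidden-unit ReLU network with biases in both layers."""
    return w2 * relu(w1 * x + b1) + b2


def rescale(params, t, exponents):
    """Multiply each parameter by t raised to its own exponent."""
    return tuple(p * t ** e for p, e in zip(params, exponents))


params = (0.7, -0.2, 1.5, 0.3)  # (w1, b1, w2, b2), arbitrary values
x, t = 2.0, 3.0

base = net(x, *params)
# Output bias scaled with exponent 2, all other parameters with exponent 1.
quasi = net(x, *rescale(params, t, (1, 1, 1, 2)))
# Uniform scaling, as one would use for a homogeneous model.
uniform = net(x, *rescale(params, t, (1, 1, 1, 1)))

print(math.isclose(quasi, t ** 2 * base))    # per-parameter exponents give degree 2
print(math.isclose(uniform, t ** 2 * base))  # uniform scaling does not
```

The output bias needs a larger exponent than the other parameters, which is one way to see why such a model would treat parameter groups unequally.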
Nonlinear Quantum Behavior of Ultrashort-Pulse Optical Parametric Oscillators
The quantum features of ultrashort-pulse optical parametric oscillators (OPOs) are theoretically investigated in the nonlinear regime near and above threshold. Starting from basic premises of input-output theory, we derive a general quantum model for pulsed OPOs subject to χ(2) interactions between a multimode signal cavity and a non-resonant broadband pump field, elucidating time scale conditions required for such pulsed OPOs to admit an input-output description. By employing a supermode decomposition of the nonlinear Lindblad operators governing pump-signal interactions, we perform multimode quantum simulations in the regime of strong nonlinearity and study effects such as pump depletion and corrections to the squeezing spectrum of the linearized model. We observe non-Gaussian states with Wigner function negativity and show that multimode interactions with the pump can act as decoherence channels.
Hybrid Simulation between Molecular Dynamics and Binary Collision Approximation Codes for Hydrogen injection onto Carbon Materials
Molecular dynamics (MD) simulation with modified Brenner's reactive empirical
bond order (REBO) potential is a powerful tool to investigate plasma wall
interaction on divertor plates in a nuclear fusion device. However, the size
of the MD simulation box is limited to several nm by computer
performance. To extend the size of the MD simulation, we develop a hybrid
simulation code between MD code using REBO potential and binary collision
approximation (BCA) code. Using the BCA code, instead of computing all particles
with a high kinetic energy at every step of the MD simulation, saves considerable
computation time. A demonstration of hydrogen atom injection onto
graphite shows that the hybrid simulation
code works efficiently in a large simulation box.
Comment: 5 pages, 5 figures
High-Dimensional Non-Convex Landscapes and Gradient Descent Dynamics
In these lecture notes we present different methods and concepts developed in
statistical physics to analyze gradient descent dynamics in high-dimensional
non-convex landscapes. Our aim is to show how approaches developed in physics,
mainly statistical physics of disordered systems, can be used to tackle open
questions on high-dimensional dynamics in Machine Learning.
Comment: Lectures given by G. Biroli at the 2022 Les Houches Summer School "Statistical Physics and Machine Learning"
Quantitative activation-induced manganese-enhanced MRI reveals severity of Parkinson’s disease in mice
We demonstrate that activation-induced manganese-enhanced magnetic resonance imaging with quantitative determination of the longitudinal relaxation time (qAIM-MRI) reveals the severity of Parkinson’s disease (PD) in mice. We first show that manganese ion-accumulation depends on neuronal activity. A highly active region was then observed by qAIM-MRI in the caudate-putamen in PD-model mice that was significantly correlated to the severity of PD, suggesting its involvement in the expression of PD symptoms.
Advanced gastrointestinal stromal tumor with intracerebral hemorrhage during sunitinib treatment
Herein, a 70-year-old female was initially treated with sunitinib 50 mg/day for an imatinib-resistant gastrointestinal stromal tumor. After sunitinib initiation, nausea, hypertension, hepatic dysfunction, anorexia, fatigue, thrombocytopenia, epistaxis, and palmoplantar erythrodysesthesia syndrome developed; the dose was reduced to 25 mg/day. Subsequently, adverse events improved, and from the fifth course onward, sunitinib 37.5 mg/day was continued. Approximately 11 months after initiating sunitinib therapy, the patient developed disturbance of consciousness, aphasia, and left hemiplegia. Computed tomography of the head revealed intracerebral hemorrhage, and the patient was hospitalized. No brain metastases, cerebral aneurysms, or cerebral arteriovenous malformations were observed. Sunitinib-induced hypertensive cerebral hemorrhage was suspected as the cause of intracerebral hemorrhage. Conservative treatments, such as antihypertensive drugs, were administered without surgical treatment. The symptoms and intracerebral hemorrhage gradually improved, and the patient was discharged from the hospital. Intracerebral hemorrhage with sunitinib is extremely rare, but has a high mortality rate. During sunitinib treatment, controlling blood pressure and thrombocytopenia is important to prevent bleeding.
Association analysis of toll-like receptor 4 polymorphisms in Japanese primary biliary cirrhosis
Primary biliary cirrhosis (PBC) is characterized by portal inflammation and immune-mediated destruction of intrahepatic bile ducts that often result in liver failure. Toll-like receptor (TLR) 4 recognizes lipopolysaccharides of Gram-negative bacteria. Infectious agents have been suspected to play a crucial role in PBC pathogenesis since TLR4 expression was found in bile duct epithelial cells and periportal hepatocytes in liver tissues of PBC. To assess the potential contribution of TLR4 SNPs to the development of this disease, we genotyped five SNPs in TLR4 in 261 PBC patients and 359 controls using a TaqMan assay. No significant positive associations with either PBC susceptibility or progression were uncovered. These results indicate that TLR4 polymorphisms do not play a prominent role in the development of PBC in Japanese patients. (C) 2012 American Society for Histocompatibility and Immunogenetics. Published by Elsevier Inc. All rights reserved.
Journal article: Human Immunology 74(2):219-222 (2013)