4,885 research outputs found

    Benign Oscillation of Stochastic Gradient Descent with Large Learning Rates

    In this work, we theoretically investigate the generalization properties of neural networks (NNs) trained by the stochastic gradient descent (SGD) algorithm with large learning rates. Under such a training regime, we find that the oscillation of the NN weights caused by large learning rate SGD training turns out to be beneficial to the generalization of the NN, potentially improving over the same NN trained by SGD with small learning rates, which converges more smoothly. In view of this finding, we call such a phenomenon "benign oscillation". Our theory towards demystifying such a phenomenon builds upon the feature learning perspective of deep learning. Specifically, we consider a feature-noise data generation model that consists of (i) weak features, which have a small $\ell_2$-norm and appear in each data point; (ii) strong features, which have a larger $\ell_2$-norm but only appear in a certain fraction of all data points; and (iii) noise. We prove that NNs trained by oscillating SGD with a large learning rate can effectively learn the weak features in the presence of those strong features. In contrast, NNs trained by SGD with a small learning rate can only learn the strong features and make little progress in learning the weak features. Consequently, when it comes to new testing data that consist of only weak features, the NN trained by oscillating SGD with a large learning rate can still make correct predictions consistently, while the NN trained by small learning rate SGD fails. Our theory sheds light on how large learning rate training benefits the generalization of NNs. Experimental results demonstrate our finding on "benign oscillation". Comment: 63 pages, 10 figures

    Adaptive Control Based On Neural Network


    Fenton Reagent Oxidation and Decolorizing Reaction Kinetics of Reactive Red SBE

    Fenton reagent was employed to treat and decolorize Reactive Red SBE wastewater, with the reaction monitored by on-line spectrophotometry. The effects of initial FeSO4 concentration, initial H2O2 concentration, pH, Reactive Red SBE concentration, and temperature on the decoloration of Reactive Red SBE were investigated. The results show that the Fenton oxidation process follows pseudo-first-order kinetics in the first stage and that the reaction activation energy is 2.608 kJ/mol. The decolorizing reaction rate constant (k) increases with rising FeSO4 concentration, H2O2 concentration, and temperature, but decreases with rising Reactive Red SBE concentration; the optimum pH is 3. The rate constant k is linearly correlated with both the initial FeSO4 concentration and the initial H2O2 concentration.

    A Robust and Constrained Multi-Agent Reinforcement Learning Framework for Electric Vehicle AMoD Systems

    Electric vehicles (EVs) play critical roles in autonomous mobility-on-demand (AMoD) systems, but their unique charging patterns increase the model uncertainties in AMoD systems (e.g., state transition probability). Since there usually exists a mismatch between the training and test (true) environments, incorporating model uncertainty into system design is of critical importance in real-world applications. However, model uncertainties have not yet been considered explicitly in EV AMoD system rebalancing by the existing literature, and handling them remains an urgent and challenging task. In this work, we design a robust and constrained multi-agent reinforcement learning (MARL) framework with transition kernel uncertainty for the EV rebalancing and charging problem. We then propose a robust and constrained MARL algorithm (ROCOMA) that trains a robust EV rebalancing policy to balance the supply-demand ratio and the charging utilization rate across the whole city under state transition uncertainty. Experiments show that ROCOMA can learn an effective and robust rebalancing policy. It outperforms non-robust MARL methods when there are model uncertainties. It increases the system fairness by 19.6% and decreases the rebalancing costs by 75.8%. Comment: 8 pages

    Phase retrieval from single biomolecule diffraction pattern

    In this paper, we propose the SPR (sparse phase retrieval) method, a new phase retrieval method for coherent x-ray diffraction imaging (CXDI). Conventional phase retrieval methods effectively solve the problem for high signal-to-noise ratio measurements, but would not be sufficient for single biomolecular imaging, which is expected to be realized with femtosecond x-ray free electron laser pulses. The SPR method is based on Bayesian statistics. It does not need the object boundary constraint that is required by the commonly used hybrid input-output (HIO) method; instead, a prior distribution is defined with an exponential distribution and used for the estimation. Simulation results demonstrate that the proposed method reconstructs the electron density under noisy conditions even when some central pixels are masked. Comment: 13 pages, 13 figures, submitted for a journal

    GDL-DS: A Benchmark for Geometric Deep Learning under Distribution Shifts

    Geometric deep learning (GDL) has gained significant attention in various scientific fields, chiefly for its proficiency in modeling data with intricate geometric structures. Yet, very few works have delved into its capability of tackling the distribution shift problem, a prevalent challenge in many relevant applications. To bridge this gap, we propose GDL-DS, a comprehensive benchmark designed for evaluating the performance of GDL models in scenarios with distribution shifts. Our evaluation datasets cover diverse scientific domains from particle physics and materials science to biochemistry, and encapsulate a broad spectrum of distribution shifts including conditional, covariate, and concept shifts. Furthermore, we study three levels of information access from the out-of-distribution (OOD) testing data, including no OOD information, only OOD features without labels, and OOD features with a few labels. Overall, our benchmark results in 30 different experiment settings, and evaluates 3 GDL backbones and 11 learning algorithms in each setting. A thorough analysis of the evaluation results is provided, poised to illuminate insights for GDL researchers and domain practitioners who are to use GDL in their applications. Comment: Code and data are available at https://github.com/Graph-COM/GDL_D

    Application of Machine Learning Method to Model-Based Library Approach to Critical Dimension Measurement by CD-SEM

    The model-based library (MBL) method has already been established for the accurate measurement of the critical dimension (CD) of semiconductor linewidth from a critical dimension scanning electron microscope (CD-SEM) image. In this work, the MBL method has been further investigated by combining CD-SEM image simulation with a neural network algorithm. The secondary electron linescan profiles were first calculated by a Monte Carlo simulation method, which yields the dependence of the linescan profiles on the selected values of various geometrical parameters (e.g., top CD, sidewall angle, and height) for Si and Au trapezoidal line structures. Machine learning methods were then applied to predict the linescan profiles from a randomly selected training set of the calculated profiles. The predicted results agree very well with the calculated profiles, with standard deviations of 0.1% and 6% for the relative error distributions of the Si and Au line structures, respectively. This result shows that machine learning methods can be practically applied to the MBL method for the purpose of reducing the library size, accelerating the construction of the MBL database, and enriching the content of an available MBL database.