840 research outputs found
Enhanced Adaptive Gradient Algorithms for Nonconvex-PL Minimax Optimization
In this paper, we study a class of nonconvex-nonconcave minimax optimization
problems (i.e., $\min_x \max_y f(x,y)$), where $f(x,y)$ is possibly nonconvex in
$x$, and is nonconcave and satisfies the Polyak-Lojasiewicz (PL) condition
in $y$. Moreover, we propose a class of enhanced momentum-based gradient
descent ascent methods (i.e., MSGDA and AdaMSGDA) to solve these stochastic
Nonconvex-PL minimax problems. In particular, our AdaMSGDA algorithm can use
various adaptive learning rates in updating the variables $x$ and $y$ without
relying on any global and coordinate-wise adaptive learning rates.
Theoretically, we present an effective convergence analysis framework for our
methods. Specifically, we prove that our MSGDA and AdaMSGDA methods have the
best known sample (gradient) complexity, requiring only
one sample at each loop to find an $\epsilon$-stationary solution. This
manuscript commemorates the mathematician Boris Polyak (1935-2023). Comment: 30 pages
Algorithmic Behaviours of Adagrad in Underdetermined Linear Regression
With the widespread use of over-parameterized data in deep learning, the choice of optimizer in training plays a big role in a model’s ability to generalize well, owing to solution selection bias. We consider the popular adaptive gradient method Adagrad and study its convergence and algorithmic biases in the underdetermined linear regression regime. First, we prove that Adagrad converges in this problem regime. Subsequently, we find empirically that, with sufficiently small step sizes, Adagrad promotes diffuse solutions, in the sense of uniformity among the coordinates of the solution. Additionally, we see empirically and show theoretically that, under the same conditions, Adagrad’s solution is more diffuse than the solution obtained through gradient descent. This behaviour is unexpected, as conventional data science encourages the use of optimizers that attain sparser solutions. This preference arises from some inherent advantages, such as helping to prevent overfitting and reducing the dimensionality of the data. However, we show that in the application of interpolation, diffuse solutions yield beneficial results when compared to localized solutions; namely, we experimentally observe the success of diffuse solutions when interpolating a line via the weighted sum of spike-like functions. The thesis concludes with some suggestions for possible extensions of the content in future work.
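The comparison described in this abstract can be tried on a toy problem. The following is a minimal sketch, not the thesis's experiment: the problem size, the random data, the step sizes, and the `peak_to_mean` diffuseness measure are all assumptions chosen for illustration. It runs plain coordinate-wise Adagrad and gradient descent from the origin on an underdetermined least-squares problem and lets you compare the two interpolating solutions.

```python
import math
import random


def matvec(A, x):
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def gradient(A, b, x):
    """Gradient of 0.5 * ||Ax - b||^2, i.e. A^T (Ax - b)."""
    r = [p - q for p, q in zip(matvec(A, x), b)]
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(x))]


def residual_norm(A, b, x):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(matvec(A, x), b)))


def adagrad(A, b, lr=0.1, steps=3000, eps=1e-12):
    n = len(A[0])
    x = [0.0] * n
    accum = [0.0] * n  # running sum of squared gradients, per coordinate
    for _ in range(steps):
        g = gradient(A, b, x)
        for j in range(n):
            accum[j] += g[j] ** 2
            x[j] -= lr * g[j] / (math.sqrt(accum[j]) + eps)
    return x


def gradient_descent(A, b, lr, steps=3000):
    x = [0.0] * len(A[0])
    for _ in range(steps):
        g = gradient(A, b, x)
        x = [xj - lr * gj for xj, gj in zip(x, g)]
    return x


def peak_to_mean(x):
    """Illustrative (assumed) diffuseness measure: smaller means more uniform."""
    mags = [abs(v) for v in x]
    return max(mags) / (sum(mags) / len(mags))


random.seed(0)
m, n = 5, 20  # fewer equations than unknowns: underdetermined
A = [[random.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
b = [random.gauss(0.0, 1.0) for _ in range(m)]

# 1 / ||A||_F^2 is a safe gradient descent step, since ||A||_F^2 >= lambda_max(A^T A).
gd_lr = 1.0 / sum(a * a for row in A for a in row)
x_ada = adagrad(A, b)
x_gd = gradient_descent(A, b, gd_lr)
print("Adagrad residual:", residual_norm(A, b, x_ada), "peak/mean:", peak_to_mean(x_ada))
print("GD residual:     ", residual_norm(A, b, x_gd), "peak/mean:", peak_to_mean(x_gd))
```

Both methods should drive the residual to nearly zero; which solution looks more uniform on a given random instance is left for the reader to inspect rather than asserted here.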
Electron Thermal Runaway in Atmospheric Electrified Gases: a microscopic approach
Thesis elaborated from 2018 to 2023 at the Instituto de Astrofísica de Andalucía under the supervision of Alejandro Luque (Granada, Spain) and Nikolai Lehtinen (Bergen, Norway). This thesis presents a new database of atmospheric electron-molecule collision cross sections which was published separately under the DOI :
With this new database and a new super-electron management algorithm, which significantly enhances high-energy electron statistics at previously unresolved ratios, the thesis explores general facets of the electron thermal runaway process relevant to atmospheric discharges under various conditions of temperature and gas composition, as can be encountered in the wake and formation of discharge channels.
Near-Optimal Non-Convex Stochastic Optimization under Generalized Smoothness
The generalized smooth condition, $(L_0, L_1)$-smoothness, has triggered
people's interest since it is more realistic in many optimization problems
shown by both empirical and theoretical evidence. Two recent works established
the sample complexity to obtain an $\epsilon$-stationary
point. However, both require a large batch size,
which is not only computationally burdensome
but also unsuitable for streaming applications. Additionally, these existing
convergence bounds are established only for the expected rate, which is
inadequate as they do not supply a useful performance guarantee on a single
run. In this work, we solve the prior two problems simultaneously by revisiting
a simple variant of the STORM algorithm. Specifically, under
$(L_0, L_1)$-smoothness and affine-type noises, we establish the first
near-optimal high-probability sample
complexity, which depends on the failure probability. Besides, for the
same algorithm, we also recover the optimal sample
complexity for the expected convergence with improved dependence on the
problem-dependent parameter. More importantly, our convergence results only
require a constant batch size in contrast to the previous works. Comment: The whole paper is rewritten with new results in V
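For readers unfamiliar with STORM, the recursive-momentum estimator it is built on can be sketched as follows. This is the basic textbook form with a constant step size and momentum weight, not the variant analysed in the paper; the toy oracle `noisy_quadratic_grad` and all parameter values are assumptions made for illustration. The point it shows is that each iteration draws a single sample and reuses it at both the current and previous iterate, which is why a constant batch size suffices.

```python
import random


def storm(grad, x0, lr=0.05, a=0.1, steps=500, seed=0):
    """Basic STORM-style loop: one fresh sample per step, reused at two points."""
    rng = random.Random(seed)
    x_prev = x0
    xi = rng.gauss(0.0, 1.0)
    d = grad(x_prev, xi)          # first estimate is a plain stochastic gradient
    x = x_prev - lr * d
    for _ in range(steps):
        xi = rng.gauss(0.0, 1.0)  # a single sample: batch size of one
        # Recursive momentum: correct the old estimate with the gradient
        # difference evaluated on the SAME sample at the new and old iterates.
        d = grad(x, xi) + (1.0 - a) * (d - grad(x_prev, xi))
        x_prev, x = x, x - lr * d
    return x


def noisy_quadratic_grad(x, xi, noise=0.1):
    # Hypothetical toy oracle: gradient of 0.5 * x^2 plus additive noise.
    return x + noise * xi


x_final = storm(noisy_quadratic_grad, x0=10.0)
print("final iterate:", x_final)
```

The original STORM method uses adaptive step sizes and momentum weights; constants are used here only to keep the sketch short.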
Stochastic Constrained DRO with a Complexity Independent of Sample Size
Distributionally Robust Optimization (DRO), as a popular method to train
robust models against distribution shift between training and test sets, has
received tremendous attention in recent years. In this paper, we propose and
analyze stochastic algorithms that apply to both non-convex and convex losses
for solving the Kullback-Leibler divergence constrained DRO problem. Compared with
existing methods solving this problem, our stochastic algorithms not only enjoy
competitive if not better complexity independent of sample size but also just
require a constant batch size at every iteration, which is more practical for
broad applications. We establish a nearly optimal complexity bound for finding
an $\epsilon$-stationary solution for non-convex losses and an optimal
complexity for finding an optimal solution for convex losses.
Empirical studies demonstrate the effectiveness of the proposed algorithms for
solving non-convex and convex constrained DRO problems. Comment: 37 pages, 16 figures
Personalized Federated Learning via ADMM with Moreau Envelope
Personalized federated learning (PFL) is an approach proposed to address the
issue of poor convergence on heterogeneous data. However, most existing PFL
frameworks require strong assumptions for convergence. In this paper, we
propose an alternating direction method of multipliers (ADMM) for training PFL
models with Moreau envelope (FLAME), which achieves a sublinear convergence
rate, relying on the relatively weak assumption of gradient Lipschitz
continuity. Moreover, due to the gradient-free nature of ADMM, FLAME alleviates
the need for hyperparameter tuning, particularly in avoiding the adjustment of
the learning rate when training the global model. In addition, we propose a
biased client selection strategy to expedite the convergence of training of PFL
models. Our theoretical analysis establishes the global convergence under both
unbiased and biased client selection strategies. Our experiments validate that
FLAME, when trained on heterogeneous data, outperforms state-of-the-art methods
in terms of model performance. Regarding communication efficiency, it exhibits
an average speedup of 3.75x compared to the baselines. Furthermore,
experimental results validate that the biased client selection strategy speeds
up the convergence of both personalized and global models. Comment: 15 pages
Convergence of Adam under Relaxed Assumptions
In this paper, we provide a rigorous proof of convergence of the Adaptive
Moment Estimation (Adam) algorithm for a wide class of optimization objectives.
Despite the popularity and efficiency of the Adam algorithm in training deep
neural networks, its theoretical properties are not yet fully understood, and
existing convergence proofs require unrealistically strong assumptions, such as
globally bounded gradients, to show the convergence to stationary points. In
this paper, we show that Adam provably converges to $\epsilon$-stationary
points, together with a gradient complexity bound, under far more
realistic conditions. The key to our analysis is a new proof of boundedness of
gradients along the optimization trajectory of Adam, under a generalized
smoothness assumption according to which the local smoothness (i.e., Hessian
norm when it exists) is bounded by a sub-quadratic function of the gradient
norm. Moreover, we propose a variance-reduced version of Adam with an
accelerated gradient complexity. Comment: 33 pages
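As a reminder of the update rule whose convergence is being analysed, here is a minimal sketch of standard Adam on a scalar problem. The toy objective $x^2$, the starting point, and the hyperparameter values are assumptions for illustration; the variance-reduced version proposed in the paper is not shown.

```python
import math


def adam(grad, x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=1000):
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1.0 - beta1) * g        # first-moment estimate
        v = beta2 * v + (1.0 - beta2) * g * g    # second-moment estimate
        m_hat = m / (1.0 - beta1 ** t)           # bias corrections
        v_hat = v / (1.0 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


x_final = adam(lambda x: 2.0 * x, x0=5.0)  # gradient of the toy objective x^2
print("final iterate:", x_final)
```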
Single-Call Stochastic Extragradient Methods for Structured Non-monotone Variational Inequalities: Improved Analysis under Weaker Conditions
Single-call stochastic extragradient methods, like stochastic past
extragradient (SPEG) and stochastic optimistic gradient (SOG), have gained a
lot of interest in recent years and are among the most efficient algorithms
for solving large-scale min-max optimization and variational inequality
problems (VIPs) appearing in various machine learning tasks. However, despite
their undoubted popularity, current convergence analyses of SPEG and SOG
require a bounded variance assumption. In addition, several important questions
regarding the convergence properties of these methods are still open, including
mini-batching, efficient step-size selection, and convergence guarantees under
different sampling strategies. In this work, we address these questions and
provide convergence guarantees for two large classes of structured non-monotone
VIPs: (i) quasi-strongly monotone problems (a generalization of strongly
monotone problems) and (ii) weak Minty variational inequalities (a
generalization of monotone and Minty VIPs). We introduce the expected residual
condition, explain its benefits, and show how it can be used to obtain a
strictly weaker bound than previously used growth conditions, expected
co-coercivity, or bounded variance assumptions. Equipped with this condition,
we provide theoretical guarantees for the convergence of single-call
extragradient methods for different step-size selections, including constant,
decreasing, and step-size-switching rules. Furthermore, our convergence
analysis holds under the arbitrary sampling paradigm, which includes importance
sampling and various mini-batching strategies as special cases. Comment: 37th Conference on Neural Information Processing Systems (NeurIPS 2023)
Development of a Moving Front Kinetic Monte Carlo Algorithm to Simulate Moving Interface Systems
Moving interfaces play vital roles in a wide variety of natural, technological, and industrial processes, including solids dissolution, capillary action, sessile droplet spreading, and superhydrophobicity. In each of these systems, the fundamental process behaviour is entirely dependent on the interface and on the underlying physics governing its movement. As a result, there is significant interest in studying and developing models to capture the behaviour of these moving interface systems across a wide variety of applications. However, the simulation techniques used to model moving interfaces are limited in their application: molecular-level models are unable to simulate interface behaviour over large spatial and temporal scales, whereas large-scale modelling techniques cannot account for the nanoscale processes that govern the interface behaviour or for the molecular-scale fluctuations and deviations in the interface. Furthermore, methods developed to bridge the gap between the two scales are prone to error-induced force imbalances at the interface that can result in fictitious behaviour.
In order to overcome these challenges, this study developed a novel kinetic Monte Carlo (kMC)-based modelling technique referred to as Moving Front kMC (MFkMC) to adequately and efficiently capture the molecular-scale events and forces governing the moving interface behaviour over large length and timescales. This framework was designed to capture the movement of transiently-varying interfaces in a kinetic-like manner so that its movement can be described using Monte Carlo sampling. The MFkMC algorithm accomplishes this task by evaluating the behaviour of the interfacial molecules and assigning kinetic Monte Carlo-style rate equations that describe the transition probability that a molecule would advance into the neighbouring phase, displacing an interfacial molecule from the opposing phase and thus changing the interface. The proposed algorithm was subsequently used to capture the moving interface behaviour within crystal dissolution, capillary rise, and sessile droplet spreading on both smooth and superhydrophobic surfaces. The individual system models for each application were used to analyze the behaviour within each application and to tackle challenges within each field.
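The rate-based event selection that kMC methods rely on can be sketched generically. The following is a textbook rejection-free kMC step, not the MFkMC rate equations from the thesis; the function name `kmc_step` and the example rates are assumptions for illustration. Given a list of transition rates, it picks one event with probability proportional to its rate and advances the clock by an exponentially distributed waiting time.

```python
import math
import random


def kmc_step(rates, rng):
    """One rejection-free kMC step: return (event index, time increment)."""
    total = sum(rates)
    if total <= 0.0:
        raise ValueError("no event can occur: total rate must be positive")
    # Pick event i with probability rates[i] / total.
    threshold = rng.random() * total
    cumulative = 0.0
    chosen = len(rates) - 1  # fallback guards against float round-off
    for i, r in enumerate(rates):
        cumulative += r
        if threshold < cumulative:
            chosen = i
            break
    # Advance the clock by an exponentially distributed waiting time.
    dt = -math.log(1.0 - rng.random()) / total
    return chosen, dt


# Hypothetical rates for three candidate events; the second one is forbidden.
rng = random.Random(1)
rates = [1.0, 0.0, 3.0]
counts = [0, 0, 0]
clock = 0.0
for _ in range(2000):
    event, dt = kmc_step(rates, rng)
    counts[event] += 1
    clock += dt
print("event counts:", counts, "elapsed time:", clock)
```

In an interface model, each entry of `rates` would correspond to one interfacial molecule's transition, and the rates would be recomputed after every accepted event.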
The MFkMC modelling method was initially used to capture crystal dissolution for applications in pharmaceutical drug delivery. The developed model was designed to predict the dissolution of a wide variety of crystalline minerals, regardless of their composition and crystal structure. The MFkMC approach was compared against a standard kMC model of the same system to validate the MFkMC approach and highlight its advantages and limitations. The proposed framework was used to explore ways of enhancing crystal dissolution processes by assessing the variability from environmental uncertainties and by performing robust optimization to improve the dissolution performance. The approach was used to simulate calcium carbonate dissolution within the human gastrointestinal system. Polynomial chaos expansions (PCEs) were used to propagate the parametric uncertainty through the kMC model. Robust optimization was subsequently performed to determine the crystal design parameters that achieve target dissolution specifications using low-order PCE coefficient models (LPCMs). The results showcased the applicability of the kMC crystal dissolution model and the need to account for dissolution uncertainty within key biological applications.
The MFkMC approach was additionally used to capture capillary rise in cavities of different shapes. The proposed model was adapted to capture the movement of a fluid-fluid interface, such as the moving interface present in capillary action studies, using kMC-type approaches based on the forces acting locally upon the interface. The proposed force balance-based MFkMC (FB-MFkMC) expressions were subsequently coupled with capillary action force balance equations to capture capillary rise within any axisymmetric cavity. The developed model was validated against known analytical models that capture capillary rise dynamics in perfect cylinders. Furthermore, the resulting multiscale model was used to analyze capillary rise within axisymmetric cavities of irregular shape and in cylinders subject to surface roughness. These studies showed that the FB-MFkMC algorithm can capture the macroscale behaviour of a system subject to molecular-level irregularities such as surface roughness. They also showed that phenomena such as roughness can significantly affect moving interface behaviour, which underlines the need to accommodate these phenomena.
MFkMC was furthermore extended to capture sessile droplet spreading on a smooth surface. The developed approach adapted the capillary action FB-MFkMC model to capture the spreading behaviour of a droplet based on the force balance acting upon the droplet interface, which was developed using analytical inertial and capillary expressions from the literature. This study furthermore derived a new semi-empirical expression to depict the viscous damping force acting on the droplet. The developed viscous force term depends on a fitted parameter c, whose value was observed to vary solely depending on the droplet liquid as captured predominantly by the droplet Ohnesorge number. The proposed approach was subsequently validated using data obtained both from conducted experiments and from the literature to support the robustness of the framework. The predictive capabilities of the developed model were further inspected to provide insights on the sessile droplet system behaviour.
The developed FB-MFkMC model was additionally modified to capture sessile droplet spreading on pillared superhydrophobic surfaces (SHSs). These adjustments included developing the Periodic Unit (PU) method of capturing periodic SHS pillar arrays and making the changes necessary to capture the droplet spreading behaviour across the gaps between the pillars (i.e., Cassie mode wetting). The proposed SHS-based FB-MFkMC (SHS-MFkMC) model was furthermore adapted to accommodate spontaneous Cassie-to-Wenzel (C2W) droplet transitions on the solid surface. The capabilities of the full SHS-MFkMC model to capture both radial sessile droplet spread and spontaneous C2W transitions were compared to experimental results from the literature. Furthermore, a sensitivity analysis was conducted to assess the effects of the various system parameters on the model performance and compare them with the expected system results.