On the optimal combination of tensor optimization methods
We consider the problem of minimizing a sum of functions having Lipschitz p-th order derivatives with different Lipschitz constants. To accelerate optimization in this case, we propose a general framework that allows obtaining near-optimal oracle complexity for each function in the sum separately, meaning, in particular, that the oracle of a function with a smaller Lipschitz constant is called fewer times. As a building block, we extend the current theory of tensor methods and show how to generalize near-optimal tensor methods to work with an inexact tensor step. Further, we investigate the situation in which the functions in the sum have Lipschitz derivatives of different orders. For this situation, we propose a generic way to split the oracle complexity between the parts of the sum. Our method is not optimal, which leads to the open problem of the optimal combination of oracles of different orders.
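In notation not taken from the abstract itself (m, p, and L_{p,i} are our own symbols for the number of summands, the derivative order, and the per-component Lipschitz constants), the setting can be written as

\[
\min_{x \in \mathbb{R}^n} F(x) = \sum_{i=1}^{m} f_i(x),
\qquad
\|\nabla^{p} f_i(x) - \nabla^{p} f_i(y)\| \le L_{p,i}\,\|x - y\| \quad \text{for all } x, y,
\]

and the goal is an accelerated scheme in which the number of oracle calls to each f_i is governed by its own constant L_{p,i} rather than by the largest of them.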
Cubic Regularization is the Key! The First Accelerated Quasi-Newton Method with a Global Convergence Rate of $O(k^{-2})$ for Convex Functions
In this paper, we propose the first Quasi-Newton method with a global
convergence rate of for general convex functions. Quasi-Newton
methods, such as BFGS and SR-1, are well known for their impressive practical
performance. However, they may be slower than gradient descent for general
convex functions, with the best theoretical rate of . This gap
between impressive practical performance and poor theoretical guarantees was an
open question for a long period of time. In this paper, we make a significant
step to close this gap. We improve upon the existing rate and propose the Cubic
Regularized Quasi-Newton Method with a convergence rate of . The key
to achieving this improvement is to use the Cubic Regularized Newton Method
over the Damped Newton Method as an outer method, where the Quasi-Newton update
is an inexact Hessian approximation. Using this approach, we propose the first
Accelerated Quasi-Newton method with a global convergence rate of $O(k^{-2})$
for general convex functions. In special cases where we can improve the
precision of the approximation, we achieve a global convergence rate of
, which is faster than any first-order method. To make these methods
practical, we introduce the Adaptive Inexact Cubic Regularized Newton Method
and its accelerated version, which provide real-time control of the
approximation error. We show that the proposed methods have impressive
practical performance and outperform both first- and second-order methods.
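As a rough illustration of the outer scheme described above, the following sketch performs cubic-regularized Newton steps in which the exact Hessian is replaced by a quasi-Newton approximation. It is a minimal sketch under our own assumptions, not the authors' algorithm: a BFGS update is used here (so the approximation stays positive definite and the subproblem solver stays simple), and the regularization constant M, the subproblem solver, and the toy objective are all illustrative.

import numpy as np

def cubic_step(g, B, M, iters=50):
    """Minimize  g's + 0.5 s'Bs + (M/6)||s||^3  over s, with B positive definite.
    Uses the characterization (B + (M/2) r I) s = -g with r = ||s||,
    found by bisection on the scalar r."""
    n = g.size
    solve = lambda r: np.linalg.solve(B + 0.5 * M * r * np.eye(n), -g)
    lo, hi = 0.0, 1.0
    while np.linalg.norm(solve(hi)) > hi:        # grow the bracket until ||s(hi)|| <= hi
        hi *= 2.0
    for _ in range(iters):                       # bisect on r
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if np.linalg.norm(solve(mid)) > mid else (lo, mid)
    return solve(hi)

def bfgs_update(B, s, y):
    """BFGS update of the Hessian approximation; keeps B positive definite
    when the curvature condition y's > 0 holds (otherwise the update is skipped)."""
    if y @ s > 1e-12:
        Bs = B @ s
        B = B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (y @ s)
    return B

# Toy usage on the smooth convex objective 0.5||Ax-b||^2 + sum(log cosh(x_i)).
rng = np.random.default_rng(0)
A, b = rng.standard_normal((20, 10)), rng.standard_normal(20)
grad = lambda x: A.T @ (A @ x - b) + np.tanh(x)
x, B, M = np.zeros(10), np.eye(10), 10.0         # M is an arbitrary illustrative constant
for _ in range(30):
    g = grad(x)
    s = cubic_step(g, B, M)                      # cubic-regularized step with inexact Hessian B
    B = bfgs_update(B, s, grad(x + s) - g)       # refresh the quasi-Newton approximation
    x = x + s
print("final gradient norm:", np.linalg.norm(grad(x)))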
Efficient Numerical Methods to Solve Sparse Linear Equations with Application to PageRank
In this paper, we propose three methods to solve the PageRank problem for the
transition matrices with both row and column sparsity. Our methods reduce the
PageRank problem to the convex optimization problem over the simplex. The first
algorithm is based on gradient descent in the L1 norm instead of the Euclidean one. The second algorithm extends the Frank-Wolfe method to support sparse gradient updates. The third algorithm is a mirror descent algorithm with a randomized projection. We prove convergence rates for these methods on sparse problems, and numerical experiments support their effectiveness.
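A minimal sketch of the reduction the abstract describes, under our own assumptions: PageRank is written as minimization of f(x) = 0.5 ||P'x - x||^2 over the probability simplex and solved with plain Frank-Wolfe, whose linear minimization step over the simplex always returns a vertex. The damping factor, the toy graph, and this particular objective are illustrative; the paper's algorithms additionally exploit row and column sparsity, which is not shown here.

import numpy as np

def frank_wolfe_pagerank(P, iters=500):
    """Approximately minimize 0.5 ||P'x - x||^2 over the simplex with Frank-Wolfe."""
    n = P.shape[0]
    x = np.full(n, 1.0 / n)                      # start at the simplex center
    for k in range(iters):
        r = P.T @ x - x                          # residual (P' - I) x
        grad = P @ r - r                         # gradient of 0.5 ||(P' - I) x||^2
        i = int(np.argmin(grad))                 # linear minimization over the simplex -> vertex e_i
        gamma = 2.0 / (k + 2)                    # standard Frank-Wolfe step size
        x = (1 - gamma) * x
        x[i] += gamma
    return x

# Toy usage: a 4-node graph with damping factor 0.85 (assumed, for illustration).
adj = np.array([[0, 1, 1, 0],
                [0, 0, 1, 0],
                [1, 0, 0, 1],
                [0, 0, 1, 0]], dtype=float)
P = adj / adj.sum(axis=1, keepdims=True)         # row-stochastic transition matrix
G = 0.85 * P + 0.15 / 4                          # damped (Google) matrix
print("approximate PageRank vector:", np.round(frank_wolfe_pagerank(G), 3))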
Suppressing Poisoning Attacks on Federated Learning for Medical Imaging
Collaboration among multiple data-owning entities (e.g., hospitals) can
accelerate the training process and yield better machine learning models due to
the availability and diversity of data. However, privacy concerns make it
challenging to exchange data while preserving confidentiality. Federated
Learning (FL) is a promising solution that enables collaborative training
through exchange of model parameters instead of raw data. However, most
existing FL solutions work under the assumption that participating clients are
honest and thus can fail against poisoning attacks from malicious
parties, whose goal is to deteriorate the global model performance. In this
work, we propose a robust aggregation rule called Distance-based Outlier
Suppression (DOS) that is resilient to Byzantine failures. The proposed method
computes the distance between local parameter updates of different clients and
obtains an outlier score for each client using Copula-based Outlier Detection
(COPOD). The resulting outlier scores are converted into normalized weights
using a softmax function, and a weighted average of the local parameters is
used for updating the global model. DOS aggregation can effectively suppress
parameter updates from malicious clients without the need for any
hyperparameter selection, even when the data distributions are heterogeneous.
Evaluation on two medical imaging datasets (CheXpert and HAM10000) demonstrates
the higher robustness of the DOS method against a variety of poisoning attacks in comparison to other state-of-the-art methods. The code can be found at https://github.com/Naiftt/SPAFD
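A minimal sketch of the aggregation idea, not the released code linked above: COPOD is replaced by a simple stand-in outlier score (mean pairwise distance) to keep the example dependency-free, while the softmax weighting and weighted averaging follow the abstract.

import numpy as np

def dos_like_aggregate(updates):
    """updates: array of shape (num_clients, num_params) with local model updates."""
    diffs = updates[:, None, :] - updates[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)               # pairwise distances between client updates
    scores = dists.mean(axis=1)                         # stand-in outlier score (COPOD in the paper)
    weights = np.exp(-scores) / np.exp(-scores).sum()   # softmax of negative outlier scores
    return weights @ updates                            # weighted average of local parameters

# Toy usage: 9 benign clients near the true update, 1 poisoned client far away.
rng = np.random.default_rng(0)
benign = rng.normal(loc=1.0, scale=0.1, size=(9, 5))
poisoned = rng.normal(loc=-10.0, scale=0.1, size=(1, 5))
agg = dos_like_aggregate(np.vstack([benign, poisoned]))
print("aggregated update (should stay near 1.0):", np.round(agg, 2))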
Exploiting higher-order derivatives in convex optimization methods
Exploiting higher-order derivatives in convex optimization has been known at least since the 1970s. In each iteration, higher-order (also called tensor) methods
minimize a regularized Taylor expansion of the objective function, which leads
to faster convergence rates if the corresponding higher-order derivative is
Lipschitz continuous. Recently, a series of lower iteration complexity bounds for such methods was proved, and a gap between the upper and lower complexity bounds was revealed. Moreover, it was shown that such methods are implementable, since the appropriately regularized Taylor expansion of a convex function is also convex and can thus be minimized in polynomial time. Only very recently was an algorithm with the optimal convergence rate proposed for minimizing convex functions with a Lipschitz p-th derivative. For convex functions with a Lipschitz third derivative, these developments made it possible to propose a second-order method whose convergence rate is faster than that of existing second-order methods.
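For concreteness, the regularized Taylor expansion that such tensor methods minimize at each iteration can be written as

\[
x_{k+1} \in \operatorname*{arg\,min}_{y} \Big\{ f(x_k) + \sum_{i=1}^{p} \frac{1}{i!}\, D^{i} f(x_k)[y - x_k]^{i} + \frac{M}{(p+1)!}\, \|y - x_k\|^{p+1} \Big\},
\]

where D^{i} f(x_k)[\cdot]^{i} denotes the i-th directional derivative and M is the regularization constant; the condition M \ge p L_p under which this model is convex (and hence minimizable in polynomial time) is quoted here from Nesterov's work on implementable tensor methods rather than from the abstract above.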