
    On the optimal combination of tensor optimization methods

    We consider the problem of minimizing a sum of functions whose p-th order derivatives are Lipschitz continuous with different Lipschitz constants. To accelerate optimization in this setting, we propose a general framework that achieves near-optimal oracle complexity for each function in the sum separately, meaning, in particular, that the oracle for a function with a smaller Lipschitz constant is called fewer times. As a building block, we extend the current theory of tensor methods and show how to generalize near-optimal tensor methods to work with an inexact tensor step. Further, we investigate the situation when the functions in the sum have Lipschitz derivatives of different orders. For this situation, we propose a generic way to split the oracle complexity between the parts of the sum. Our method is not optimal, which leaves open the problem of the optimal combination of oracles of different orders.
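    As a reference point for "near-optimal" (this is the known single-function bound from the tensor-methods literature, not a result quoted from the abstract): for a convex f with L_p-Lipschitz p-th derivative, near-optimal tensor methods reach accuracy \varepsilon within

        N = O\Big( \big( L_p R^{p+1} / \varepsilon \big)^{\frac{2}{3p+1}} \Big), \qquad R = \|x_0 - x^*\|,

    oracle calls. The framework sketched above aims to call the oracle of each summand f_i roughly this many times with its own constant L_{p,i}, instead of paying the largest constant in the sum for every oracle.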

    Cubic Regularization is the Key! The First Accelerated Quasi-Newton Method with a Global Convergence Rate of O(k^{-2}) for Convex Functions

    In this paper, we propose the first Quasi-Newton method with a global convergence rate of O(k^{-1}) for general convex functions. Quasi-Newton methods, such as BFGS and SR-1, are well known for their impressive practical performance. However, they may be slower than gradient descent for general convex functions, with the best theoretical rate of O(k^{-1/3}). This gap between impressive practical performance and poor theoretical guarantees has been an open question for a long time. In this paper, we make a significant step toward closing this gap. We improve upon the existing rate and propose the Cubic Regularized Quasi-Newton Method with a convergence rate of O(k^{-1}). The key to achieving this improvement is to use the Cubic Regularized Newton Method, rather than the Damped Newton Method, as the outer method, with the Quasi-Newton update serving as an inexact Hessian approximation. Using this approach, we propose the first Accelerated Quasi-Newton method with a global convergence rate of O(k^{-2}) for general convex functions. In special cases where we can improve the precision of the approximation, we achieve a global convergence rate of O(k^{-3}), which is faster than any first-order method. To make these methods practical, we introduce the Adaptive Inexact Cubic Regularized Newton Method and its accelerated version, which provide real-time control of the approximation error. We show that the proposed methods have impressive practical performance and outperform both first- and second-order methods.
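    The outer/inner structure described in the abstract can be illustrated with a short sketch. The following is a minimal, illustrative Python implementation, not the authors' algorithm: it takes cubic-regularized Newton steps in which the exact Hessian is replaced by a BFGS-style quasi-Newton approximation, and it solves the cubic subproblem by an eigendecomposition plus a one-dimensional bisection; the function names and the toy objective are invented for the example.

        import numpy as np

        def cubic_subproblem(g, B, M, tol=1e-10):
            """Minimize g^T s + 0.5 s^T B s + (M/6)||s||^3 for a PSD matrix B."""
            lam, Q = np.linalg.eigh(B)                       # B = Q diag(lam) Q^T
            gt = Q.T @ g
            s_of_r = lambda r: -gt / (lam + 0.5 * M * r)     # s(r) = -(B + (M/2) r I)^{-1} g
            hi = 1.0
            while np.linalg.norm(s_of_r(hi)) > hi:           # bracket the root of ||s(r)|| = r
                hi *= 2.0
            lo = 0.0
            while hi - lo > tol:                             # bisection on r = ||s||
                mid = 0.5 * (lo + hi)
                if np.linalg.norm(s_of_r(mid)) > mid:
                    lo = mid
                else:
                    hi = mid
            return Q @ s_of_r(hi)

        def bfgs_update(B, s, y):
            """BFGS update of the Hessian approximation; skipped when curvature s^T y <= 0."""
            sy = s @ y
            if sy <= 1e-12:
                return B
            Bs = B @ s
            return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / sy

        def cubic_quasi_newton(grad, x0, M=10.0, iters=50):
            """Outer loop: cubic-regularized Newton steps with a quasi-Newton Hessian B."""
            x, B = np.asarray(x0, dtype=float), np.eye(len(x0))
            for _ in range(iters):
                g = grad(x)
                s = cubic_subproblem(g, B, M)
                B = bfgs_update(B, s, grad(x + s) - g)
                x = x + s
            return x

        # usage on a convex toy objective f(x) = 0.5 x^T A x + log(1 + exp(x_0))
        A = np.array([[3.0, 1.0], [1.0, 2.0]])
        grad = lambda x: A @ x + np.array([1.0 / (1.0 + np.exp(-x[0])), 0.0])
        print(cubic_quasi_newton(grad, [3.0, -2.0], M=5.0))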

    Efficient Numerical Methods to Solve Sparse Linear Equations with Application to PageRank

    In this paper, we propose three methods to solve the PageRank problem for transition matrices with both row and column sparsity. Our methods reduce the PageRank problem to a convex optimization problem over the simplex. The first algorithm is based on gradient descent in the L1 norm instead of the Euclidean one. The second algorithm extends the Frank-Wolfe method to support sparse gradient updates. The third algorithm is a mirror descent algorithm with a randomized projection. We prove convergence rates for these methods for sparse problems, and numerical experiments support their effectiveness.
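    A plain (dense, non-sparsity-exploiting) Frank-Wolfe sketch gives a feel for the simplex formulation. This is an illustration under assumptions, not the paper's algorithm: it uses the common reduction min_{x in simplex} 0.5*||P^T x - x||^2 for a damped row-stochastic transition matrix P, and a 3-node cycle as toy data.

        import numpy as np

        def pagerank_frank_wolfe(P, iters=2000):
            """Minimize 0.5*||P^T x - x||^2 over the probability simplex with Frank-Wolfe."""
            n = P.shape[0]
            A = P.T - np.eye(n)                    # residual operator: A x = P^T x - x
            x = np.full(n, 1.0 / n)                # start at the barycenter of the simplex
            for k in range(iters):
                grad = A.T @ (A @ x)               # gradient of the quadratic objective
                i = int(np.argmin(grad))           # linear minimization oracle: best vertex e_i
                gamma = 2.0 / (k + 2.0)            # standard Frank-Wolfe step size
                x = (1.0 - gamma) * x
                x[i] += gamma                      # convex combination with the vertex e_i
            return x

        # usage: a 3-node cycle with damping factor 0.85
        links = np.array([[0., 1., 0.], [0., 0., 1.], [1., 0., 0.]])
        P = 0.85 * links + 0.15 / 3.0              # rows already sum to one
        print(pagerank_frank_wolfe(P))             # close to the uniform stationary vector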

    Suppressing Poisoning Attacks on Federated Learning for Medical Imaging

    Collaboration among multiple data-owning entities (e.g., hospitals) can accelerate the training process and yield better machine learning models due to the availability and diversity of data. However, privacy concerns make it challenging to exchange data while preserving confidentiality. Federated Learning (FL) is a promising solution that enables collaborative training through the exchange of model parameters instead of raw data. However, most existing FL solutions work under the assumption that participating clients are \emph{honest} and can thus fail against poisoning attacks from malicious parties, whose goal is to deteriorate the global model's performance. In this work, we propose a robust aggregation rule called Distance-based Outlier Suppression (DOS) that is resilient to Byzantine failures. The proposed method computes the distances between the local parameter updates of different clients and obtains an outlier score for each client using Copula-based Outlier Detection (COPOD). The resulting outlier scores are converted into normalized weights using a softmax function, and a weighted average of the local parameters is used to update the global model. DOS aggregation can effectively suppress parameter updates from malicious clients without the need for any hyperparameter selection, even when the data distributions are heterogeneous. Evaluation on two medical imaging datasets (CheXpert and HAM10000) demonstrates the higher robustness of the DOS method against a variety of poisoning attacks in comparison to other state-of-the-art methods. The code can be found at https://github.com/Naiftt/SPAFD.
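    A minimal sketch of a DOS-style aggregation step, under stated assumptions: the local updates are already flattened into vectors, COPOD is taken from the pyod package (fit / decision_scores_), and the detector is applied to each client's row of the pairwise-distance matrix; the exact feature construction in the paper may differ, and the demo data are synthetic.

        import numpy as np
        from pyod.models.copod import COPOD

        def dos_aggregate(updates):
            """updates: array of shape (n_clients, n_params); returns the aggregated update."""
            U = np.asarray(updates, dtype=float)
            # pairwise Euclidean distances between the clients' parameter updates
            D = np.linalg.norm(U[:, None, :] - U[None, :, :], axis=-1)
            detector = COPOD()
            detector.fit(D)                        # one outlier score per client
            scores = detector.decision_scores_
            # softmax over negative scores: likely outliers get exponentially small weight
            w = np.exp(-(scores - scores.min()))
            w /= w.sum()
            return w @ U                           # weighted average of the local updates

        # usage: four benign updates plus one poisoned (shifted) update
        rng = np.random.default_rng(0)
        benign = rng.normal(0.0, 0.1, size=(4, 16))
        poisoned = rng.normal(5.0, 0.1, size=(1, 16))
        print(dos_aggregate(np.vstack([benign, poisoned])))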

    Exploiting higher-order derivatives in convex optimization methods

    Exploiting higher-order derivatives in convex optimization has been known at least since the 1970s. In each iteration, higher-order (also called tensor) methods minimize a regularized Taylor expansion of the objective function, which leads to faster convergence rates if the corresponding higher-order derivative is Lipschitz continuous. Recently, a series of lower iteration complexity bounds for such methods was proved, and a gap between upper and lower complexity bounds was revealed. Moreover, it was shown that such methods can be implementable, since the appropriately regularized Taylor expansion of a convex function is also convex and thus can be minimized in polynomial time. Only very recently was an algorithm with the optimal convergence rate 1/k^{(3p+1)/2} proposed for minimizing convex functions with Lipschitz p-th derivative. For convex functions with Lipschitz third derivative, these developments made it possible to propose a second-order method with convergence rate 1/k^5, which is faster than the rate 1/k^{3.5} of existing second-order methods.
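    For context, the tensor step referred to above minimizes a regularized Taylor model of the objective (a standard construction in this line of work, written out here rather than taken verbatim from the abstract):

        x_{k+1} \in \arg\min_y \Big[ \sum_{i=1}^{p} \frac{1}{i!} D^i f(x_k)[y - x_k]^i + \frac{M}{(p+1)!} \|y - x_k\|^{p+1} \Big], \qquad M \ge p L_p,

    where the condition M \ge p L_p keeps the model convex, so the step can be computed in polynomial time; with the optimal rate 1/k^{(3p+1)/2}, the exponents (3p+1)/2 = 3.5 for p = 2 and (3p+1)/2 = 5 for p = 3 recover exactly the rates quoted in the abstract.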