Polyak Steps for Adaptive Fast Gradient Method
Accelerated algorithms for minimizing smooth strongly convex functions usually require knowledge of the strong convexity parameter μ. In the case of an unknown μ, current adaptive techniques are based on restart schemes. When the optimal value is known, these strategies recover the accelerated linear convergence bound without additional grid search. In this paper we propose a new approach that achieves the same bound without any restart, using an online estimate of the strong convexity parameter. We show the robustness of the Fast Gradient Method when using a sequence of upper bounds on μ. We also present a good candidate for this estimate sequence and detail consistent empirical results.
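As a rough illustration of the idea, the numpy sketch below runs a fast gradient method whose momentum coefficient is driven by an online, Polyak-style upper bound on μ computed from the known optimal value f*. The estimate formula, step sizes, and cap are illustrative assumptions, not the paper's exact estimate sequence.

```python
import numpy as np

def fgm_online_mu(f, grad_f, f_star, L, x0, iters=200):
    """Fast gradient method with an online, Polyak-style estimate of the
    strong convexity parameter mu (illustrative sketch only)."""
    x = x0.copy()
    y = x0.copy()
    for _ in range(iters):
        x_new = y - grad_f(y) / L                    # gradient step with step size 1/L
        g = grad_f(x_new)
        gap = max(f(x_new) - f_star, 1e-12)          # uses the known optimal value
        # For mu-strongly convex f, f(x) - f* <= ||grad f(x)||^2 / (2 mu),
        # so this quantity is an upper bound on mu; cap it at L to stay sane.
        mu_hat = min(g @ g / (2.0 * gap), L)
        q = mu_hat / L
        beta = (1 - np.sqrt(q)) / (1 + np.sqrt(q))   # momentum for strongly convex FGM
        y = x_new + beta * (x_new - x)
        x = x_new
    return x
```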
RAPID: Rapidly Accelerated Proximal Gradient Algorithms for Convex Minimization
In this paper, we propose a new algorithm to speed up the convergence of accelerated proximal gradient (APG) methods. In order to minimize a convex objective, our algorithm introduces a simple line search step after each proximal gradient step in APG, so that a biconvex function is minimized over a scalar step-size variable while the main variable is held fixed. We propose two new ways of constructing the auxiliary variables in APG based on the intermediate solutions of the proximal gradient and the line search steps. We prove that at any iteration our algorithm achieves a smaller upper bound on the gap between the current and optimal objective values than traditional APG methods such as FISTA, making it converge faster in practice. In fact, our algorithm can potentially be applied to many important convex optimization problems, such as sparse linear regression and kernel SVMs. Our experimental results clearly demonstrate that our algorithm converges faster than APG in all of the applications above, and is even comparable to some sophisticated solvers.
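For intuition, here is a minimal numpy sketch of the general idea: a FISTA-style loop with an extra scalar line search along the segment between the current iterate and the proximal gradient output. The grid search over the scalar and the way the next iterate is chosen are assumptions for illustration, not the construction analyzed in the paper.

```python
import numpy as np

def apg_with_line_search(F, grad_f, prox_g, L, x0, iters=100):
    """FISTA-style APG with a scalar line search after each proximal gradient
    step, in the spirit of RAPID (illustrative sketch, not the paper's rules)."""
    x_prev = x0.copy()
    x = x0.copy()
    t = 1.0
    s_grid = np.linspace(0.0, 2.0, 21)                # candidate scalar step sizes
    for _ in range(iters):
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x + ((t - 1.0) / t_next) * (x - x_prev)   # standard APG extrapolation
        p = prox_g(y - grad_f(y) / L, 1.0 / L)        # proximal gradient step
        # Line search: minimize the full objective F over a scalar along p - x.
        candidates = [x + s * (p - x) for s in s_grid]
        x_new = min(candidates, key=F)
        x_prev, x, t = x, x_new, t_next
    return x
```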
Deep Image Demosaicking using a Cascade of Convolutional Residual Denoising Networks
Demosaicking and denoising are among the most crucial steps of modern digital camera pipelines, and their joint treatment is a highly ill-posed inverse problem where at least two-thirds of the information is missing and the rest is corrupted by noise. This poses a great challenge in obtaining meaningful reconstructions, and special care is required for the efficient treatment of the problem. While several machine learning approaches have recently been introduced to deal with joint image demosaicking-denoising, in this work we propose a novel deep learning architecture which is inspired by powerful classical image regularization methods and large-scale convex optimization techniques. Consequently, our derived network is more transparent and has a clearer interpretation than alternative competitive deep learning approaches. Our extensive experiments demonstrate that our network outperforms previous approaches on both noisy and noise-free data. This improvement in reconstruction quality is attributed to the principled way we design our network architecture, which also requires fewer trainable parameters than the current state-of-the-art deep network solution. Finally, we show that our network generalizes well even when it is trained on small datasets, while keeping the overall number of trainable parameters low.
Comment: Camera-ready paper to appear in the Proceedings of ECCV 2018.
Deep Iterative Residual Convolutional Network for Single Image Super-Resolution
Deep convolutional neural networks (CNNs) have recently achieved great success on the single image super-resolution (SISR) task due to their powerful feature representation capabilities. The most recent deep-learning-based SISR methods focus on designing deeper/wider models to learn the non-linear mapping between low-resolution (LR) inputs and high-resolution (HR) outputs. These existing SR methods do not take the image observation (physical) model into account and thus require a large number of trainable parameters and a great volume of training data. To address these issues, we propose a deep Iterative Super-Resolution Residual Convolutional Network (ISRResCNet) that exploits powerful image regularization and large-scale optimization techniques by training the deep network in an iterative manner with a residual learning approach. Extensive experimental results on various super-resolution benchmarks demonstrate that our method, with few trainable parameters, improves the results for different scaling factors in comparison with state-of-the-art methods.
Comment: To appear in the proceedings of the 25th IEEE International Conference on Pattern Recognition (ICPR). arXiv admin note: text overlap with arXiv:2005.00953, arXiv:2009.0369
Adaptive restart of accelerated gradient methods under local quadratic growth condition
By analyzing accelerated proximal gradient methods under a local quadratic
growth condition, we show that restarting these algorithms at any frequency
gives a globally linearly convergent algorithm. This result was previously
known only for long enough frequencies. Then, as the rate of convergence
depends on the match between the frequency and the quadratic error bound, we
design a scheme to automatically adapt the frequency of restart from the
observed decrease of the norm of the gradient mapping. Our algorithm has a better theoretical bound than previously proposed methods for adapting to the quadratic error bound of the objective. We illustrate the efficiency of the algorithm on a Lasso problem and on a regularized logistic regression problem.
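A loose sketch of how such a restart rule can be wired around an accelerated proximal gradient loop is given below: momentum is reset once the gradient-mapping norm has dropped by a fixed factor since the last restart. The decrease factor and inner stopping rule are illustrative assumptions, not the adaptive frequency rule derived in the paper.

```python
import numpy as np

def apg_adaptive_restart(grad_f, prox_g, L, x0, outer=20, decrease=0.5, max_inner=1000):
    """Accelerated proximal gradient with restarts driven by the observed
    decrease of the gradient-mapping norm (illustrative sketch)."""
    def grad_map(x):
        # Gradient mapping G(x) = L * (x - prox_g(x - grad_f(x)/L, 1/L)).
        return L * (x - prox_g(x - grad_f(x) / L, 1.0 / L))

    x = x0.copy()
    for _ in range(outer):
        target = decrease * np.linalg.norm(grad_map(x))
        x_prev, y, t = x, x, 1.0
        for _ in range(max_inner):                     # one accelerated "epoch"
            x_new = prox_g(y - grad_f(y) / L, 1.0 / L)
            t_next = (1 + np.sqrt(1 + 4 * t * t)) / 2
            y = x_new + ((t - 1) / t_next) * (x_new - x_prev)
            x_prev, t = x_new, t_next
            if np.linalg.norm(grad_map(x_new)) <= target:
                break                                  # enough decrease: reset momentum
        x = x_new
    return x
```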
A generic online acceleration scheme for optimization algorithms via relaxation and inertia
We propose generic acceleration schemes for a wide class of optimization and iterative schemes based on relaxation and inertia. In particular, we introduce methods that automatically tune the acceleration coefficients online, and establish their convergence. This is made possible by considering the class of fixed-point iterations over averaged operators, which encompasses gradient methods, ADMM, primal-dual algorithms, and so on.
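The underlying fixed-point template is easy to state; the sketch below applies a relaxed and inertial update to a generic averaged operator T with fixed coefficients, whereas the paper's contribution is tuning these coefficients online with convergence guarantees. The particular values of rho and gamma here are illustrative.

```python
import numpy as np

def relaxed_inertial_iteration(T, x0, rho=1.0, gamma=0.3, iters=500):
    """Generic relaxed + inertial fixed-point iteration over an averaged
    operator T (sketch with fixed coefficients; the paper tunes them online)."""
    x_prev = np.asarray(x0, dtype=float)
    x = x_prev.copy()
    for _ in range(iters):
        y = x + gamma * (x - x_prev)                  # inertial extrapolation
        x_prev, x = x, (1 - rho) * y + rho * T(y)     # relaxed application of T
    return x
```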
Iterative Joint Image Demosaicking and Denoising using a Residual Denoising Network
Modern digital cameras rely on the sequential execution of separate image processing steps to produce realistic images. The first two steps are usually related to denoising and demosaicking, where the former aims to reduce noise from the sensor and the latter converts a series of light intensity readings to color images. Modern approaches try to jointly solve these problems, i.e. joint denoising-demosaicking, which is an inherently ill-posed problem given that two-thirds of the intensity information is missing and the rest is perturbed by noise. While several machine learning systems have recently been introduced to solve this problem, the majority of them rely on generic network architectures which do not explicitly take into account the physical image model. In this work we propose a novel algorithm which is inspired by powerful classical image regularization methods, large-scale optimization, and deep learning techniques. Consequently, our derived iterative optimization algorithm, which involves a trainable denoising network, has a transparent and clear interpretation compared to other black-box data-driven approaches. Our extensive experimental evaluation demonstrates that our proposed method outperforms previous approaches for both noisy and noise-free data across many different datasets. This improvement in reconstruction quality is attributed to the rigorous derivation of an iterative solution and the principled way we design our denoising network architecture, which as a result requires fewer trainable parameters than the current state-of-the-art solution and furthermore can be efficiently trained with significantly less training data than existing deep demosaicking networks. Code and results can be found at https://github.com/cig-skoltech/deep_demosaick
Comment: arXiv admin note: substantial text overlap with arXiv:1803.0521
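To make the structure of such an unrolled scheme concrete, here is a generic plug-and-play style sketch that alternates a data-consistency gradient step with a denoiser (stood in here by soft-thresholding where a residual denoising network would sit). The operators, step size, and schedule are assumptions, not the paper's derived algorithm.

```python
import numpy as np

def iterative_restore(A, At, y, denoise, tau=1.0, iters=10, x0=None):
    """Generic unrolled restoration loop: a data-consistency gradient step on
    ||A x - y||^2 / 2 followed by a denoising step (illustrative sketch)."""
    x = At(y) if x0 is None else x0.copy()
    for _ in range(iters):
        x = x - tau * At(A(x) - y)       # gradient step toward the observations
        x = denoise(x)                   # trainable denoising network would go here
    return x

# Toy usage with an identity forward operator and soft-thresholding as the
# stand-in denoiser (both are placeholders for illustration).
y = np.random.randn(64, 64)
identity = lambda v: v
soft_denoise = lambda v: np.sign(v) * np.maximum(np.abs(v) - 0.1, 0.0)
x_hat = iterative_restore(identity, identity, y, soft_denoise)
```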
When is a Convolutional Filter Easy To Learn?
We analyze the convergence of the (stochastic) gradient descent algorithm for learning a convolutional filter with a Rectified Linear Unit (ReLU) activation function. Our analysis does not rely on any specific form of the input distribution, and our proofs only use the definition of the ReLU, in contrast with previous works that are restricted to standard Gaussian input. We show that (stochastic) gradient descent with random initialization can learn the convolutional filter in polynomial time, and that the convergence rate depends on the smoothness of the input distribution and the closeness of patches. To the best of our knowledge, this is the first recovery guarantee for gradient-based algorithms learning a convolutional filter on non-Gaussian input distributions. Our theory also justifies the two-stage learning rate strategy in deep neural networks. While our focus is theoretical, we also present experiments that illustrate our theoretical findings.
Comment: Published as a conference paper at ICLR 2018.
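The analyzed setting can be mocked up in a few lines: stochastic gradient descent on a single ReLU filter fit to a teacher filter under a squared loss over patches. The data model, loss, and learning rate below are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

def learn_conv_filter(patches, w_star, lr=0.01, iters=2000, seed=0):
    """SGD for a single ReLU convolutional filter f(w, Z) = sum_i relu(z_i . w),
    fit to a teacher filter w_star (illustrative sketch of the setting)."""
    rng = np.random.default_rng(seed)
    n, k, d = patches.shape                    # n samples, k patches each, dim d
    w = rng.normal(size=d)                     # random initialization
    relu = lambda u: np.maximum(u, 0.0)
    for _ in range(iters):
        i = rng.integers(n)                    # pick one sample (stochastic step)
        Z = patches[i]                         # (k, d) patch matrix of this sample
        err = relu(Z @ w).sum() - relu(Z @ w_star).sum()
        active = (Z @ w > 0).astype(float)     # ReLU activation pattern
        w -= lr * err * (Z.T @ active)         # (sub)gradient step on 0.5 * err**2
    return w
```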
Accelerated Optimization With Orthogonality Constraints
We develop a generalization of Nesterov's accelerated gradient descent method which is designed to deal with orthogonality constraints. To demonstrate the effectiveness of our method, we perform numerical experiments which show that the number of iterations scales with the square root of the condition number, and we compare with existing state-of-the-art quasi-Newton methods on the Stiefel manifold. Our experiments show that our method outperforms these quasi-Newton methods on some large, ill-conditioned problems.
Comment: 17 pages, 4 figures.
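As a rough sketch of what an accelerated step under orthogonality constraints can look like, the following combines a tangent-space gradient step, a QR retraction back onto the Stiefel manifold, and a heavy-ball style extrapolation. The projection, retraction, and momentum rule are assumptions for illustration, not the paper's update.

```python
import numpy as np

def retract_qr(X):
    """Map a matrix back onto the Stiefel manifold via a QR decomposition."""
    Q, R = np.linalg.qr(X)
    return Q * np.sign(np.diag(R))             # fix column signs for uniqueness

def accelerated_stiefel(grad_f, X0, step=1e-2, beta=0.9, iters=300):
    """Momentum gradient descent with a QR retraction on the Stiefel manifold
    (rough sketch of the general idea; not the paper's exact method)."""
    X = X0.copy()
    Y = X0.copy()
    for _ in range(iters):
        G = grad_f(Y)
        YtG = Y.T @ G
        G_tan = G - Y @ (YtG + YtG.T) / 2       # project onto the tangent space at Y
        X_new = retract_qr(Y - step * G_tan)    # gradient step followed by retraction
        Y = retract_qr(X_new + beta * (X_new - X))  # extrapolation, retracted again
        X = X_new
    return X
```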
A generic adaptive restart scheme with applications to saddle point algorithms
We provide a simple and generic adaptive restart scheme for convex
optimization that is able to achieve worst-case bounds matching (up to constant
multiplicative factors) optimal restart schemes that require knowledge of
problem specific constants. The scheme triggers restarts whenever there is
sufficient reduction of a distance-based potential function. This potential
function is always computable.
We apply the scheme to obtain the first adaptive restart algorithm for
saddle-point algorithms including primal-dual hybrid gradient (PDHG) and
extragradient. The method improves the worst-case bounds of PDHG on bilinear
games, and numerical experiments on quadratic assignment problems and matrix
games demonstrate dramatic improvements for obtaining high-accuracy solutions.
Additionally, for accelerated gradient descent (AGD), this scheme obtains a
worst-case bound within 60% of the bound achieved by the (unknown) optimal
restart period when high accuracy is desired. In practice, the scheme is
competitive with the heuristic of O'Donoghue and Candès (2015).
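The restart trigger itself is simple to emulate; the wrapper below restarts a base method whenever a user-supplied, computable potential has dropped by a fixed factor since the last restart, and doubles the run length otherwise. The halving factor and doubling rule are illustrative assumptions, not the scheme's exact constants or potential function.

```python
def adaptive_restart(run_k_steps, potential, z0, k0=10, halving=0.5, outer=30):
    """Generic restart wrapper: restart whenever a computable, distance-based
    potential shows sufficient reduction (loose sketch of the idea)."""
    z, k = z0, k0
    phi = potential(z)
    for _ in range(outer):
        z_new = run_k_steps(z, k)              # k iterations of the base method
        phi_new = potential(z_new)
        if phi_new <= halving * phi:           # sufficient reduction: restart here
            z, phi = z_new, phi_new
        else:
            k *= 2                             # not enough progress: run longer
    return z
```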