Polyak Steps for Adaptive Fast Gradient Method
Accelerated algorithms for minimizing smooth strongly convex functions usually require knowledge of the strong convexity parameter μ. In the case of an unknown μ, current adaptive techniques are based on restart schemes. When the optimal value is known, these strategies recover the accelerated linear convergence bound without additional grid search. In this paper we propose a new approach that achieves the same bound without any restart, using an online estimate of the strong convexity parameter. We show the robustness of the Fast Gradient Method when using a sequence of upper bounds on μ. We also present a good candidate for this estimate sequence and detail consistent empirical results.
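As a rough illustration of the idea, the numpy sketch below runs a fast gradient method whose momentum coefficient is driven by an online, Polyak-style upper bound on μ computed from the known optimal value f*. The estimate formula, step sizes, and cap are illustrative assumptions, not the paper's exact estimate sequence.

```python
import numpy as np

def fgm_online_mu(f, grad_f, f_star, L, x0, iters=200):
    """Fast gradient method with an online, Polyak-style estimate of the
    strong convexity parameter mu (illustrative sketch only)."""
    x = x0.copy()
    y = x0.copy()
    for _ in range(iters):
        x_new = y - grad_f(y) / L                    # gradient step with step size 1/L
        g = grad_f(x_new)
        gap = max(f(x_new) - f_star, 1e-12)          # uses the known optimal value
        # For mu-strongly convex f, f(x) - f* <= ||grad f(x)||^2 / (2 mu),
        # so this quantity is an upper bound on mu; cap it at L to stay sane.
        mu_hat = min(g @ g / (2.0 * gap), L)
        q = mu_hat / L
        beta = (1 - np.sqrt(q)) / (1 + np.sqrt(q))   # momentum for strongly convex FGM
        y = x_new + beta * (x_new - x)
        x = x_new
    return x
```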
RAPID: Rapidly Accelerated Proximal Gradient Algorithms for Convex Minimization
In this paper, we propose a new algorithm to speed up the convergence of accelerated proximal gradient (APG) methods. In order to minimize a convex objective, our algorithm introduces a simple line search step after each proximal gradient step in APG, so that a biconvex function is minimized over a scalar step-size variable while the main variable is held fixed. We propose two new ways of constructing the auxiliary variables in APG based on the intermediate solutions of the proximal gradient and the line search steps. We prove that at any iteration our algorithm achieves a smaller upper bound on the gap between the current and optimal objective values than traditional APG methods such as FISTA, making it converge faster in practice. In fact, our algorithm can potentially be applied to many important convex optimization problems, such as sparse linear regression and kernel SVMs. Our experimental results clearly demonstrate that our algorithm converges faster than APG in all of the applications above, and is even comparable to some sophisticated solvers.
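For intuition, here is a minimal numpy sketch of the general idea: a FISTA-style loop with an extra scalar line search along the segment between the current iterate and the proximal gradient output. The grid search over the scalar and the way the next iterate is chosen are assumptions for illustration, not the construction analyzed in the paper.

```python
import numpy as np

def apg_with_line_search(F, grad_f, prox_g, L, x0, iters=100):
    """FISTA-style APG with a scalar line search after each proximal gradient
    step, in the spirit of RAPID (illustrative sketch, not the paper's rules)."""
    x_prev = x0.copy()
    x = x0.copy()
    t = 1.0
    s_grid = np.linspace(0.0, 2.0, 21)                # candidate scalar step sizes
    for _ in range(iters):
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x + ((t - 1.0) / t_next) * (x - x_prev)   # standard APG extrapolation
        p = prox_g(y - grad_f(y) / L, 1.0 / L)        # proximal gradient step
        # Line search: minimize the full objective F over a scalar along p - x.
        candidates = [x + s * (p - x) for s in s_grid]
        x_new = min(candidates, key=F)
        x_prev, x, t = x, x_new, t_next
    return x
```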
Deep Image Demosaicking using a Cascade of Convolutional Residual Denoising Networks
Demosaicking and denoising are among the most crucial steps of modern digital camera pipelines, and their joint treatment is a highly ill-posed inverse problem where at least two-thirds of the information is missing and the rest is corrupted by noise. This poses a great challenge in obtaining meaningful reconstructions, and special care is required for the efficient treatment of the problem. While several machine learning approaches have recently been introduced to deal with joint image demosaicking-denoising, in this work we propose a novel deep learning architecture which is inspired by powerful classical image regularization methods and large-scale convex optimization techniques. Consequently, our derived network is more transparent and has a clearer interpretation than alternative competitive deep learning approaches. Our extensive experiments demonstrate that our network outperforms previous approaches on both noisy and noise-free data. This improvement in reconstruction quality is attributed to the principled way we design our network architecture, which also requires fewer trainable parameters than the current state-of-the-art deep network solution. Finally, we show that our network generalizes well even when it is trained on small datasets, while keeping the overall number of trainable parameters low.
Comment: Camera-ready paper to appear in the Proceedings of ECCV 2018.
Deep Iterative Residual Convolutional Network for Single Image Super-Resolution
Deep convolutional neural networks (CNNs) have recently achieved great success on the single image super-resolution (SISR) task due to their powerful feature representation capabilities. The most recent deep-learning-based SISR methods focus on designing deeper/wider models to learn the non-linear mapping between low-resolution (LR) inputs and high-resolution (HR) outputs. These existing SR methods do not take the image observation (physical) model into account and thus require a large number of trainable parameters and a great volume of training data. To address these issues, we propose a deep Iterative Super-Resolution Residual Convolutional Network (ISRResCNet) that exploits powerful image regularization and large-scale optimization techniques by training the deep network in an iterative manner with a residual learning approach. Extensive experimental results on various super-resolution benchmarks demonstrate that our method, with few trainable parameters, improves the results for different scaling factors in comparison with state-of-the-art methods.
Comment: To appear in the proceedings of the 25th IEEE International Conference on Pattern Recognition (ICPR). arXiv admin note: text overlap with arXiv:2005.00953, arXiv:2009.0369
Adaptive restart of accelerated gradient methods under local quadratic growth condition
By analyzing accelerated proximal gradient methods under a local quadratic
growth condition, we show that restarting these algorithms at any frequency
gives a globally linearly convergent algorithm. This result was previously
known only for long enough frequencies. Then, as the rate of convergence
depends on the match between the frequency and the quadratic error bound, we
design a scheme to automatically adapt the frequency of restart from the
observed decrease of the norm of the gradient mapping. Our algorithm has a better theoretical bound than previously proposed methods for adapting to the quadratic error bound of the objective. We illustrate the efficiency of the algorithm on a Lasso problem and on a regularized logistic regression problem.
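A loose sketch of how such a restart rule can be wired around an accelerated proximal gradient loop is given below: momentum is reset once the gradient-mapping norm has dropped by a fixed factor since the last restart. The decrease factor and inner stopping rule are illustrative assumptions, not the adaptive frequency rule derived in the paper.

```python
import numpy as np

def apg_adaptive_restart(grad_f, prox_g, L, x0, outer=20, decrease=0.5, max_inner=1000):
    """Accelerated proximal gradient with restarts driven by the observed
    decrease of the gradient-mapping norm (illustrative sketch)."""
    def grad_map(x):
        # Gradient mapping G(x) = L * (x - prox_g(x - grad_f(x)/L, 1/L)).
        return L * (x - prox_g(x - grad_f(x) / L, 1.0 / L))

    x = x0.copy()
    for _ in range(outer):
        target = decrease * np.linalg.norm(grad_map(x))
        x_prev, y, t = x, x, 1.0
        for _ in range(max_inner):                     # one accelerated "epoch"
            x_new = prox_g(y - grad_f(y) / L, 1.0 / L)
            t_next = (1 + np.sqrt(1 + 4 * t * t)) / 2
            y = x_new + ((t - 1) / t_next) * (x_new - x_prev)
            x_prev, t = x_new, t_next
            if np.linalg.norm(grad_map(x_new)) <= target:
                break                                  # enough decrease: reset momentum
        x = x_new
    return x
```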
A generic online acceleration scheme for optimization algorithms via relaxation and inertia
We propose generic acceleration schemes for a wide class of optimization and iterative schemes based on relaxation and inertia. In particular, we introduce methods that automatically tune the acceleration coefficients online, and establish their convergence. This is made possible by considering the class of fixed-point iterations over averaged operators, which encompasses gradient methods, ADMM, primal-dual algorithms, and so on.
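The underlying fixed-point template is easy to state; the sketch below applies a relaxed and inertial update to a generic averaged operator T with fixed coefficients, whereas the paper's contribution is tuning these coefficients online with convergence guarantees. The particular values of rho and gamma here are illustrative.

```python
import numpy as np

def relaxed_inertial_iteration(T, x0, rho=1.0, gamma=0.3, iters=500):
    """Generic relaxed + inertial fixed-point iteration over an averaged
    operator T (sketch with fixed coefficients; the paper tunes them online)."""
    x_prev = np.asarray(x0, dtype=float)
    x = x_prev.copy()
    for _ in range(iters):
        y = x + gamma * (x - x_prev)                  # inertial extrapolation
        x_prev, x = x, (1 - rho) * y + rho * T(y)     # relaxed application of T
    return x
```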
Iterative Joint Image Demosaicking and Denoising using a Residual Denoising Network
Modern digital cameras rely on the sequential execution of separate image processing steps to produce realistic images. The first two steps are usually related to denoising and demosaicking, where the former aims to reduce noise from the sensor and the latter converts a series of light intensity readings to color images. Modern approaches try to jointly solve these problems, i.e. joint denoising-demosaicking, which is an inherently ill-posed problem given that two-thirds of the intensity information is missing and the rest is perturbed by noise. While several machine learning systems have recently been introduced to solve this problem, the majority of them rely on generic network architectures which do not explicitly take into account the physical image model. In this work we propose a novel algorithm which is inspired by powerful classical image regularization methods, large-scale optimization, and deep learning techniques. Consequently, our derived iterative optimization algorithm, which involves a trainable denoising network, has a transparent and clear interpretation compared to other black-box data-driven approaches. Our extensive experimental evaluation demonstrates that our proposed method outperforms previous approaches for both noisy and noise-free data across many different datasets. This improvement in reconstruction quality is attributed to the rigorous derivation of an iterative solution and the principled way we design our denoising network architecture, which as a result requires fewer trainable parameters than the current state-of-the-art solution and furthermore can be efficiently trained with significantly less training data than existing deep demosaicking networks. Code and results can be found at https://github.com/cig-skoltech/deep_demosaick
Comment: arXiv admin note: substantial text overlap with arXiv:1803.0521
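To make the structure of such an unrolled scheme concrete, here is a generic plug-and-play style sketch that alternates a data-consistency gradient step with a denoiser (stood in here by soft-thresholding where a residual denoising network would sit). The operators, step size, and schedule are assumptions, not the paper's derived algorithm.

```python
import numpy as np

def iterative_restore(A, At, y, denoise, tau=1.0, iters=10, x0=None):
    """Generic unrolled restoration loop: a data-consistency gradient step on
    ||A x - y||^2 / 2 followed by a denoising step (illustrative sketch)."""
    x = At(y) if x0 is None else x0.copy()
    for _ in range(iters):
        x = x - tau * At(A(x) - y)       # gradient step toward the observations
        x = denoise(x)                   # trainable denoising network would go here
    return x

# Toy usage with an identity forward operator and soft-thresholding as the
# stand-in denoiser (both are placeholders for illustration).
y = np.random.randn(64, 64)
identity = lambda v: v
soft_denoise = lambda v: np.sign(v) * np.maximum(np.abs(v) - 0.1, 0.0)
x_hat = iterative_restore(identity, identity, y, soft_denoise)
```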
When is a Convolutional Filter Easy To Learn?
We analyze the convergence of the (stochastic) gradient descent algorithm for learning a convolutional filter with a Rectified Linear Unit (ReLU) activation function. Our analysis does not rely on any specific form of the input distribution, and our proofs only use the definition of the ReLU, in contrast with previous works that are restricted to standard Gaussian input. We show that (stochastic) gradient descent with random initialization can learn the convolutional filter in polynomial time, and that the convergence rate depends on the smoothness of the input distribution and the closeness of patches. To the best of our knowledge, this is the first recovery guarantee for gradient-based algorithms learning a convolutional filter on non-Gaussian input distributions. Our theory also justifies the two-stage learning rate strategy in deep neural networks. While our focus is theoretical, we also present experiments that illustrate our theoretical findings.
Comment: Published as a conference paper at ICLR 2018.
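The analyzed setting can be mocked up in a few lines: stochastic gradient descent on a single ReLU filter fit to a teacher filter under a squared loss over patches. The data model, loss, and learning rate below are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

def learn_conv_filter(patches, w_star, lr=0.01, iters=2000, seed=0):
    """SGD for a single ReLU convolutional filter f(w, Z) = sum_i relu(z_i . w),
    fit to a teacher filter w_star (illustrative sketch of the setting)."""
    rng = np.random.default_rng(seed)
    n, k, d = patches.shape                    # n samples, k patches each, dim d
    w = rng.normal(size=d)                     # random initialization
    relu = lambda u: np.maximum(u, 0.0)
    for _ in range(iters):
        i = rng.integers(n)                    # pick one sample (stochastic step)
        Z = patches[i]                         # (k, d) patch matrix of this sample
        err = relu(Z @ w).sum() - relu(Z @ w_star).sum()
        active = (Z @ w > 0).astype(float)     # ReLU activation pattern
        w -= lr * err * (Z.T @ active)         # (sub)gradient step on 0.5 * err**2
    return w
```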
Accelerated Optimization With Orthogonality Constraints
We develop a generalization of Nesterov's accelerated gradient descent method which is designed to deal with orthogonality constraints. To demonstrate the effectiveness of our method, we perform numerical experiments which show that the number of iterations scales with the square root of the condition number, and we compare with existing state-of-the-art quasi-Newton methods on the Stiefel manifold. Our experiments show that our method outperforms these quasi-Newton methods on some large, ill-conditioned problems.
Comment: 17 pages, 4 figures.
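As a rough sketch of what an accelerated step under orthogonality constraints can look like, the following combines a tangent-space gradient step, a QR retraction back onto the Stiefel manifold, and a heavy-ball style extrapolation. The projection, retraction, and momentum rule are assumptions for illustration, not the paper's update.

```python
import numpy as np

def retract_qr(X):
    """Map a matrix back onto the Stiefel manifold via a QR decomposition."""
    Q, R = np.linalg.qr(X)
    return Q * np.sign(np.diag(R))             # fix column signs for uniqueness

def accelerated_stiefel(grad_f, X0, step=1e-2, beta=0.9, iters=300):
    """Momentum gradient descent with a QR retraction on the Stiefel manifold
    (rough sketch of the general idea; not the paper's exact method)."""
    X = X0.copy()
    Y = X0.copy()
    for _ in range(iters):
        G = grad_f(Y)
        YtG = Y.T @ G
        G_tan = G - Y @ (YtG + YtG.T) / 2       # project onto the tangent space at Y
        X_new = retract_qr(Y - step * G_tan)    # gradient step followed by retraction
        Y = retract_qr(X_new + beta * (X_new - X))  # extrapolation, retracted again
        X = X_new
    return X
```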
A generic adaptive restart scheme with applications to saddle point algorithms
We provide a simple and generic adaptive restart scheme for convex
optimization that is able to achieve worst-case bounds matching (up to constant
multiplicative factors) optimal restart schemes that require knowledge of
problem specific constants. The scheme triggers restarts whenever there is
sufficient reduction of a distance-based potential function. This potential
function is always computable.
We apply the scheme to obtain the first adaptive restart algorithm for
saddle-point algorithms including primal-dual hybrid gradient (PDHG) and
extragradient. The method improves the worst-case bounds of PDHG on bilinear
games, and numerical experiments on quadratic assignment problems and matrix
games demonstrate dramatic improvements for obtaining high-accuracy solutions.
Additionally, for accelerated gradient descent (AGD), this scheme obtains a
worst-case bound within 60% of the bound achieved by the (unknown) optimal
restart period when high accuracy is desired. In practice, the scheme is
competitive with the heuristic of O'Donoghue and Candès (2015).
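The restart trigger itself is simple to emulate; the wrapper below restarts a base method whenever a user-supplied, computable potential has dropped by a fixed factor since the last restart, and doubles the run length otherwise. The halving factor and doubling rule are illustrative assumptions, not the scheme's exact constants or potential function.

```python
def adaptive_restart(run_k_steps, potential, z0, k0=10, halving=0.5, outer=30):
    """Generic restart wrapper: restart whenever a computable, distance-based
    potential shows sufficient reduction (loose sketch of the idea)."""
    z, k = z0, k0
    phi = potential(z)
    for _ in range(outer):
        z_new = run_k_steps(z, k)              # k iterations of the base method
        phi_new = potential(z_new)
        if phi_new <= halving * phi:           # sufficient reduction: restart here
            z, phi = z_new, phi_new
        else:
            k *= 2                             # not enough progress: run longer
    return z
```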