A Simple Proximal Stochastic Gradient Method for Nonsmooth Nonconvex Optimization
We analyze stochastic gradient algorithms for optimizing nonconvex, nonsmooth
finite-sum problems. In particular, the objective function is given by the
sum of a differentiable (possibly nonconvex) component and a
possibly non-differentiable but convex component. We propose a proximal
stochastic gradient algorithm based on variance reduction, called ProxSVRG+.
Our main contribution lies in the analysis of ProxSVRG+. It recovers several
existing convergence results and improves/generalizes them (in terms of the
number of stochastic gradient oracle calls and proximal oracle calls). In
particular, ProxSVRG+ generalizes the best results given by the SCSG algorithm,
recently proposed by [Lei et al., 2017] for the smooth nonconvex case.
ProxSVRG+ is also more straightforward than SCSG and yields a simpler analysis.
Moreover, ProxSVRG+ outperforms the deterministic proximal gradient descent
(ProxGD) for a wide range of minibatch sizes, which partially solves an open
problem proposed in [Reddi et al., 2016b]. ProxSVRG+ also uses far fewer
proximal oracle calls than ProxSVRG [Reddi et al., 2016b]. Furthermore, for
nonconvex functions satisfying the Polyak-Łojasiewicz (PL) condition, we prove that
ProxSVRG+ achieves a global linear convergence rate without restart, unlike
ProxSVRG. Thus, it can automatically switch to the faster linear
convergence in regions where the objective function satisfies the PL
condition locally. In this setting, ProxSVRG+ also improves on ProxGD and
ProxSVRG/SAGA, and generalizes the results of SCSG. Finally, we
conduct several experiments, and the experimental results are consistent with
the theoretical results. Comment: 32nd Conference on Neural Information Processing Systems (NeurIPS 2018).
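For concreteness, the sketch below outlines the structure of a ProxSVRG+-style iteration: an outer batch-gradient snapshot, an inner variance-reduced loop, and a proximal step handling the nonsmooth convex part. The callables `grad_batch` and `prox_h`, the batch sizes, and the step size are illustrative assumptions rather than the paper's exact algorithm or parameter settings.

```python
import numpy as np

def prox_svrg_plus(grad_batch, prox_h, x0, n, eta=0.01, B=256, b=32,
                   n_outer=50, n_inner=64, seed=0):
    """Sketch of a ProxSVRG+-style loop for min_x (1/n) sum_i f_i(x) + h(x).

    grad_batch(x, idx): averaged stochastic gradient of the smooth part over indices idx
    prox_h(v, eta)    : proximal operator of the nonsmooth convex part h with step eta
    (illustrative parameter choices; not the paper's exact settings)
    """
    rng = np.random.default_rng(seed)
    x = x_tilde = np.asarray(x0, dtype=float)
    for _ in range(n_outer):
        # batch-gradient snapshot at the reference point (batch size B, possibly < n)
        snap_idx = rng.choice(n, size=min(B, n), replace=False)
        g_tilde = grad_batch(x_tilde, snap_idx)
        for _ in range(n_inner):
            idx = rng.choice(n, size=b, replace=False)
            # variance-reduced gradient estimator
            v = grad_batch(x, idx) - grad_batch(x_tilde, idx) + g_tilde
            # proximal gradient step handles the nonsmooth convex component
            x = prox_h(x - eta * v, eta)
        x_tilde = x
    return x
```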
Convergence of Nonconvex PnP-ADMM with MMSE Denoisers
Plug-and-Play Alternating Direction Method of Multipliers (PnP-ADMM) is a
widely used algorithm for solving inverse problems by integrating physical
measurement models and convolutional neural network (CNN) priors. PnP-ADMM has
been theoretically proven to converge for convex data-fidelity terms and
nonexpansive CNNs. However, PnP-ADMM has often been observed to converge
empirically even for expansive CNNs. This paper presents a theoretical
explanation for the observed stability of PnP-ADMM based on the interpretation
of the CNN prior as a minimum mean-squared error (MMSE) denoiser. Our
explanation parallels a similar argument recently made for the iterative
shrinkage/thresholding algorithm variant of PnP (PnP-ISTA) and relies on the
connection between MMSE denoisers and proximal operators. We also numerically
evaluate the performance gap between PnP-ADMM with a nonexpansive DnCNN
denoiser and with an expansive DRUNet denoiser, thus motivating the use of expansive
CNNs.
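For reference, the following is a minimal sketch of a PnP-ADMM iteration for a linear measurement model with a least-squares data-fidelity term; the `denoise` callable stands in for the CNN prior (e.g., a DnCNN or DRUNet denoiser), and all parameter values are illustrative assumptions, not the implementation evaluated in the paper.

```python
import numpy as np

def pnp_admm(A, y, denoise, x0, gamma=1.0, n_iter=50):
    """Sketch of a PnP-ADMM loop for a linear model y = A x + noise.

    denoise(v): stands in for the CNN prior (e.g., a DnCNN or DRUNet denoiser)
    gamma     : ADMM penalty parameter (illustrative value)
    """
    x = np.asarray(x0, dtype=float)
    z = x.copy()
    u = np.zeros_like(x)
    # precompute the proximal map of the least-squares data fidelity 0.5*||A x - y||^2
    H = np.linalg.inv(A.T @ A + gamma * np.eye(A.shape[1]))
    Aty = A.T @ y
    for _ in range(n_iter):
        x = H @ (Aty + gamma * (z - u))   # data-fidelity proximal step
        z = denoise(x + u)                # denoiser replaces the prior's proximal operator
        u = u + x - z                     # scaled dual update
    return x
```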
Comparative Analysis of Parallel Brain Activity Mapping Algorithms for High Resolution Brain Models
This paper proposes a comparative analysis between regular and parallel versions of FISTA and Tikhonov-like optimizations for solving the EEG brain mapping problem. The comparison is performed in terms of computational time reduction and estimation error achieved by the parallelized methods. Two brain models (high- and low-resolution) are used to compare the algorithms. As a result, it can be seen that, if the number of parallel processes increases, computational time decreases significantly for all the head models used in this work, without compromising the reconstruction quality. In addition, it can be concluded that the use of a high-resolution head model produces an improvement in any source reconstruction method in terms of spatial resolution.
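For context, a minimal (non-parallel) sketch of the two kinds of estimators compared in the paper is given below, assuming a linear forward model y ≈ L x with lead-field matrix L: a FISTA solver for an L1-regularized formulation and a closed-form Tikhonov (L2-regularized) solution. The regularization choices and parameters are illustrative assumptions, not the authors' exact formulations or their parallel implementations.

```python
import numpy as np

def fista_l1(L, y, lam, n_iter=200):
    """FISTA sketch for a sparse source estimate: min_x 0.5*||L x - y||^2 + lam*||x||_1.

    L: lead-field (gain) matrix mapping brain sources to EEG channels.
    """
    step = 1.0 / np.linalg.norm(L, 2) ** 2       # 1 / Lipschitz constant of the smooth term
    x = np.zeros(L.shape[1])
    z, t = x.copy(), 1.0
    for _ in range(n_iter):
        g = L.T @ (L @ z - y)                    # gradient of the data-fit term at z
        w = z - step * g
        x_new = np.sign(w) * np.maximum(np.abs(w) - step * lam, 0.0)  # soft threshold
        t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        z = x_new + ((t - 1.0) / t_new) * (x_new - x)                 # momentum step
        x, t = x_new, t_new
    return x

def tikhonov(L, y, lam):
    """Closed-form Tikhonov estimate: min_x 0.5*||L x - y||^2 + 0.5*lam*||x||^2."""
    return np.linalg.solve(L.T @ L + lam * np.eye(L.shape[1]), L.T @ y)
```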
Asynchronous Optimization Methods for Efficient Training of Deep Neural Networks with Guarantees
Asynchronous distributed algorithms are a popular way to reduce
synchronization costs in large-scale optimization, and in particular for neural
network training. However, for nonsmooth and nonconvex objectives, few
convergence guarantees exist beyond cases where closed-form proximal operator
solutions are available. As most popular contemporary deep neural networks lead
to nonsmooth and nonconvex objectives, there is now a pressing need for such
convergence guarantees. In this paper, we analyze for the first time the
convergence of stochastic asynchronous optimization for this general class of
objectives. In particular, we focus on stochastic subgradient methods allowing
for block variable partitioning, where the shared-memory-based model is
asynchronously updated by concurrent processes. To this end, we first introduce
a probabilistic model which captures key features of real asynchronous
scheduling between concurrent processes; under this model, we establish
convergence with probability one to an invariant set for stochastic subgradient
methods with momentum.
From the practical perspective, one issue with the family of methods we
consider is that it is not efficiently supported by machine learning
frameworks, as they mostly focus on distributed data-parallel strategies. To
address this, we propose a new implementation strategy for shared-memory-based
training of deep neural networks, whereby concurrent parameter servers are
utilized to train a partitioned but shared model in single- and multi-GPU
settings. Based on this implementation, we achieve on average a 1.2x speed-up
compared to state-of-the-art training methods on popular image
classification tasks without compromising accuracy.
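To illustrate the kind of update analyzed in the paper, the sketch below runs block-partitioned stochastic subgradient descent with momentum over a shared parameter vector, using Python threads so that blocks are updated asynchronously. The function names, lock-free updates, and hyperparameters are assumptions for illustration and do not reproduce the authors' parameter-server implementation.

```python
import numpy as np
import threading

def async_block_subgradient(subgrad, x, blocks, n_steps=1000, lr=0.01, beta=0.9):
    """Sketch of block-partitioned asynchronous stochastic subgradient descent with momentum.

    subgrad(x, rng): a stochastic subgradient of the objective at the shared iterate x
    x              : shared parameter vector, updated concurrently without locks
    blocks         : list of index arrays; each worker thread owns one block of variables
    """
    def worker(block, seed):
        rng = np.random.default_rng(seed)
        m = np.zeros(len(block))                 # per-block momentum buffer
        for _ in range(n_steps):
            g = subgrad(x, rng)[block]           # subgradient restricted to this worker's block
            m = beta * m + (1.0 - beta) * g      # momentum update
            x[block] -= lr * m                   # asynchronous write to the shared model

    threads = [threading.Thread(target=worker, args=(blk, i)) for i, blk in enumerate(blocks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return x
```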