Stochastic Methods for Composite and Weakly Convex Optimization Problems
We consider minimization of stochastic functionals that are compositions of a
(potentially) non-smooth convex function and a smooth function and, more
generally, stochastic weakly-convex functionals. We develop a family of
stochastic methods---including a stochastic prox-linear algorithm and a
stochastic (generalized) sub-gradient procedure---and prove that, under mild
technical conditions, each converges to first-order stationary points of the
stochastic objective. We provide experiments further investigating our methods
on non-smooth phase retrieval problems; the experiments indicate the practical
effectiveness of the procedures.
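Here is a minimal sketch of the stochastic subgradient variant. It assumes the robust phase-retrieval loss f(x) = (1/m) Σ_i |⟨a_i, x⟩² − b_i| as the non-smooth instance. That loss composes the convex h(c) = |c| with a smooth map, so it belongs to the class described above. The data generator, the step-size rule α_k = α_0/√(k+1) and all function names are illustrative assumptions, and the prox-linear variant is not shown.

```python
import math
import random


def sign(v):
    """Return -1, 0 or +1 according to the sign of v."""
    return (v > 0) - (v < 0)


def dot(a, x):
    return sum(ai * xi for ai, xi in zip(a, x))


def make_problem(d=3, m=100, seed=1):
    """Illustrative synthetic data: b_i = <a_i, x_star>^2 with a unit-norm x_star."""
    rng = random.Random(seed)
    x_star = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(dot(x_star, x_star))
    x_star = [v / norm for v in x_star]
    A = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]
    b = [dot(a, x_star) ** 2 for a in A]
    return x_star, A, b


def loss(x, A, b):
    """Robust phase-retrieval objective (1/m) * sum_i |<a_i, x>^2 - b_i|."""
    return sum(abs(dot(a, x) ** 2 - bi) for a, bi in zip(A, b)) / len(A)


def stochastic_subgradient(x0, A, b, steps=3000, alpha0=0.01, seed=0):
    """Sample one measurement per step and move along a subgradient of its term."""
    rng = random.Random(seed)
    x = list(x0)
    for k in range(steps):
        i = rng.randrange(len(A))
        r = dot(A[i], x)
        # h(c) = |c| is convex and c(x) = <a_i, x>^2 - b_i is smooth, so
        # sign(c(x)) * 2 <a_i, x> a_i is a subgradient of |c(x)|.
        coef = sign(r * r - b[i]) * 2.0 * r
        step = alpha0 / math.sqrt(k + 1)  # assumed diminishing step size
        x = [xj - step * coef * aj for xj, aj in zip(x, A[i])]
    return x


x_star, A, b = make_problem()
x0 = [v + 0.2 for v in x_star]  # start near the signal (illustrative)
x_hat = stochastic_subgradient(x0, A, b)
print(loss(x0, A, b), loss(x_hat, A, b))
```

From an arbitrary start, the guarantee quoted above only promises a stationary point, which is why this example starts close to x_star.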
Minimization of nonsmooth nonconvex functions using inexact evaluations and its worst-case complexity
An adaptive regularization algorithm using inexact function and derivatives
evaluations is proposed for the solution of composite nonsmooth nonconvex
optimization. It is shown that this algorithm needs at most
evaluations of the problem's functions and
their derivatives for finding an ε-approximate first-order stationary
point. This complexity bound therefore generalizes that provided by [Bellavia,
Gurioli, Morini and Toint, 2018] for inexact methods for smooth nonconvex
problems, and is within a factor of the optimal bound known
for smooth and nonsmooth nonconvex minimization with exact evaluations. A
practically more restrictive variant of the algorithm with worst-case
complexity is also presented. Comment: 19 pages
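The accept/reject logic of an adaptive regularization method can be sketched for a composite objective F(x) = f(x) + λ‖x‖₁. The sketch uses a first-order model and a quadratic regularization weight σ. It uses exact function and gradient values, so the accuracy control for inexact evaluations that the abstract describes is not reproduced. The test function, λ, the thresholds eta1/eta2 and the σ update factors are hypothetical choices.

```python
import math


def soft_threshold(v, t):
    """Proximal map of t * ||.||_1, applied componentwise."""
    return [math.copysign(max(abs(vi) - t, 0.0), vi) for vi in v]


def f(x):
    """Illustrative smooth nonconvex part: sum_i (x_i^2 - 1)^2 / 4."""
    return sum((xi * xi - 1.0) ** 2 / 4.0 for xi in x)


def grad_f(x):
    return [xi * (xi * xi - 1.0) for xi in x]


def F(x, lam):
    """Composite objective f(x) + lam * ||x||_1."""
    return f(x) + lam * sum(abs(xi) for xi in x)


def adaptive_regularization(x0, lam=0.1, sigma=1.0, eta1=0.1, eta2=0.9,
                            tol=1e-8, max_iter=500):
    x = list(x0)
    for _ in range(max_iter):
        g = grad_f(x)
        # Step: minimize g.s + lam*||x+s||_1 + (sigma/2)*||s||^2, i.e. a prox step.
        trial = soft_threshold([xi - gi / sigma for xi, gi in zip(x, g)], lam / sigma)
        s = [ti - xi for ti, xi in zip(trial, x)]
        if math.sqrt(sum(si * si for si in s)) < tol:
            break  # regularized step is numerically zero: approximate stationarity
        Fx = F(x, lam)
        # Decrease predicted by the unregularized model f(x) + g.s + lam*||x+s||_1.
        predicted = Fx - (f(x) + sum(gi * si for gi, si in zip(g, s))
                          + lam * sum(abs(ti) for ti in trial))
        if predicted <= 0.0:
            break  # guards against round-off once the step is tiny
        rho = (Fx - F(trial, lam)) / predicted
        if rho >= eta1:          # successful: accept the step
            x = trial
            if rho >= eta2:      # very successful: relax the regularization
                sigma = max(sigma / 2.0, 1e-8)
        else:                    # unsuccessful: reject and regularize more
            sigma *= 2.0
    return x


x0 = [2.0, -0.5, 0.1]
x = adaptive_regularization(x0)
print(F(x0, 0.1), F(x, 0.1))
```

A step is accepted only when the achieved decrease is a positive fraction of the predicted one. As a result, the returned point never has a larger objective value than x0.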
Convergence of a Stochastic Subgradient Method with Averaging for Nonsmooth Nonconvex Constrained Optimization
We prove convergence of a single time-scale stochastic subgradient method
with subgradient averaging for constrained problems with a nonsmooth and
nonconvex objective function having the property of generalized
differentiability. As a tool of our analysis, we also prove a chain rule on a
path for such functions.
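Here is a minimal sketch of a single time-scale projected method with subgradient averaging. It assumes the updates x_{k+1} = Π_X(x_k − τ_k z_k) and z_{k+1} = z_k + a τ_k (g_k − z_k), where g_k is a stochastic subgradient and one step-size sequence τ_k drives both recursions. The objective, box constraint, noise level and constants are illustrative. The step-size conditions of the actual analysis are not checked here.

```python
import math
import random


def sign(v):
    return (v > 0) - (v < 0)


def project_box(x, lo, hi):
    """Euclidean projection onto the box [lo, hi]^d."""
    return [min(max(xi, lo), hi) for xi in x]


def noisy_subgradient(x, rng, noise=0.5):
    """Stochastic subgradient of the illustrative f(x) = |x1 - 0.5| + |x2 + 2|."""
    return [sign(x[0] - 0.5) + rng.gauss(0.0, noise),
            sign(x[1] + 2.0) + rng.gauss(0.0, noise)]


def averaged_subgradient_method(x0, steps=5000, tau0=0.2, a=2.0,
                                lo=-1.0, hi=1.0, seed=0):
    rng = random.Random(seed)
    x = project_box(x0, lo, hi)
    z = [0.0] * len(x)  # running average of stochastic subgradients (search direction)
    for k in range(steps):
        tau = tau0 / math.sqrt(k + 1)  # one step-size sequence drives both updates
        g = noisy_subgradient(x, rng)
        x = project_box([xi - tau * zi for xi, zi in zip(x, z)], lo, hi)
        z = [zi + a * tau * (gi - zi) for zi, gi in zip(z, g)]  # assumed averaging form
    return x


x = averaged_subgradient_method([0.0, 0.0])
print(x)  # the constrained minimizer of f over [-1, 1]^2 is (0.5, -1)
```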
Automatic Registration and Clustering of Time Series
Clustering of time series data exhibits a number of challenges not present in
other settings, notably the problem of registration (alignment) of observed
signals. Typical approaches include pre-registration to a user-specified
template or time warping approaches which attempt to optimally align series
with a minimum of distortion. For many signals obtained from recording or
sensing devices, these methods may be unsuitable, because no template signal is
available for pre-registration and the distortion introduced by warping approaches may
obscure meaningful temporal information. We propose a new method for automatic
time series alignment within a clustering problem. Our approach, Temporal
Registration using Optimal Unitary Transformations (TROUT), is based on a novel
dissimilarity measure between time series that is easy to compute and
automatically identifies optimal alignment between pairs of time series. By
embedding our new measure in an optimization formulation, we retain well-known
advantages in computational and statistical performance. We provide an
efficient algorithm for TROUT-based clustering and demonstrate its superior
performance over a range of competitors. Comment: To appear in ICASSP 202
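The exact TROUT dissimilarity is defined in the paper and is not reproduced here. As a hedged stand-in, the sketch restricts the unitary transformations to circular time shifts, which are permutation matrices and therefore unitary. It returns both the minimal distance and the shift that attains it. The function names are hypothetical.

```python
import math


def roll(v, s):
    """Circular shift: element i of the result is v[(i - s) mod n]."""
    n = len(v)
    return [v[(i - s) % n] for i in range(n)]


def shift_invariant_distance(x, y):
    """Return (distance, shift) minimizing ||x - roll(y, s)|| over circular shifts s.

    Each circular shift is a permutation (hence unitary) transformation.
    This brute-force search costs O(n^2); an FFT cross-correlation would be O(n log n).
    """
    best = (math.inf, 0)
    for s in range(len(y)):
        ys = roll(y, s)
        d = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, ys)))
        if d < best[0]:
            best = (d, s)
    return best


x = [0.0, 1.0, 3.0, 2.0, 0.0, -1.0, -2.0, 0.5]
y = roll(x, 3)  # a delayed copy of x
print(shift_invariant_distance(x, y))  # (0.0, 5): shifting y by 5 realigns it with x
```

A matrix of such pairwise distances could be passed to a standard clustering routine. The paper instead embeds its measure directly in an optimization formulation.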
Zero Order Stochastic Weakly Convex Composite Optimization
In this paper we consider stochastic weakly convex composite problems in the
absence of a stochastic subgradient oracle. We present a
derivative-free algorithm that uses a two-point approximation to compute a
gradient estimate of the smoothed function. We prove convergence at a rate
similar to that of state-of-the-art methods, though with a larger constant, and
report numerical results showing the effectiveness of the approach.
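Here is a minimal sketch of a two-point estimator. It assumes Gaussian smoothing f_μ(x) = E_u[f(x + μu)] with u ~ N(0, I). Under that assumption, (f(x + μu) − f(x))/μ · u is an unbiased estimate of ∇f_μ(x) and needs only function values. The descent loop, step sizes and test function are illustrative assumptions. They are not the paper's algorithm and ignore its composite structure.

```python
import math
import random


def two_point_gradient(f, x, mu, rng):
    """Two-point estimate (f(x + mu*u) - f(x)) / mu * u with u ~ N(0, I).

    Its expectation is the gradient of the Gaussian smoothing
    f_mu(x) = E_u[f(x + mu*u)], so only function values of f are needed.
    """
    u = [rng.gauss(0.0, 1.0) for _ in x]
    fx = f(x)
    f_shift = f([xi + mu * ui for xi, ui in zip(x, u)])
    scale = (f_shift - fx) / mu
    return [scale * ui for ui in u]


def zeroth_order_descent(f, x0, steps=2000, alpha0=0.05, mu=1e-3, seed=0):
    """Illustrative descent loop driven only by two-point estimates."""
    rng = random.Random(seed)
    x = list(x0)
    for k in range(steps):
        g = two_point_gradient(f, x, mu, rng)
        step = alpha0 / math.sqrt(k + 1)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x


def f_abs(x):
    """Nonsmooth test function sum_i |x_i - 1|."""
    return sum(abs(xi - 1.0) for xi in x)


x_final = zeroth_order_descent(f_abs, [0.0, 0.0, 0.0])
print(f_abs([0.0, 0.0, 0.0]), f_abs(x_final))
```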
ODE Analysis of Stochastic Gradient Methods with Optimism and Anchoring for Minimax Problems and GANs
Despite remarkable empirical success, the training dynamics of generative
adversarial networks (GANs), which involve solving a minimax game using
stochastic gradients, are still poorly understood. In this work, we analyze
last-iterate convergence of simultaneous gradient descent (simGD) and its
variants under the assumption of convex-concavity, guided by a continuous-time
analysis with differential equations. First, we show that simGD, as is,
converges with stochastic sub-gradients under strict convexity in the primal
variable. Second, we generalize optimistic simGD to accommodate an optimism
rate separate from the learning rate and show its convergence with full
gradients. Finally, we present anchored simGD, a new method, and show
convergence with stochastic subgradients.
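The difference between simGD and an optimistic variant shows up on the bilinear game f(x, y) = xy, whose saddle point is (0, 0). The sketch assumes the optimistic update x_{k+1} = x_k − α g_k − β (g_k − g_{k−1}), with the mirrored ascent step for y. Here α is the learning rate and β a separate optimism rate. Full gradients are used, and the anchored method is not shown.

```python
import math


def grad_bilinear(x, y):
    """Partial derivatives of the bilinear game f(x, y) = x * y."""
    return y, x  # (df/dx, df/dy)


def sim_gd(x, y, lr=0.1, steps=2000):
    """Simultaneous gradient descent on x and ascent on y."""
    for _ in range(steps):
        gx, gy = grad_bilinear(x, y)
        x, y = x - lr * gx, y + lr * gy
    return x, y


def optimistic_sim_gd(x, y, lr=0.1, optimism=0.1, steps=2000):
    """simGD plus an assumed optimism term optimism * (g_k - g_{k-1})."""
    gx_prev, gy_prev = grad_bilinear(x, y)  # so the first step is plain simGD
    for _ in range(steps):
        gx, gy = grad_bilinear(x, y)
        x, y = (x - lr * gx - optimism * (gx - gx_prev),
                y + lr * gy + optimism * (gy - gy_prev))
        gx_prev, gy_prev = gx, gy
    return x, y


start = (1.0, 1.0)
print(math.hypot(*sim_gd(*start)))             # grows: simGD spirals away from (0, 0)
print(math.hypot(*optimistic_sim_gd(*start)))  # shrinks toward the saddle point (0, 0)
```

On this game each simGD step multiplies the distance to (0, 0) by √(1 + α²), so simGD diverges for every α > 0. The optimistic iterates contract for α = β = 0.1.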
Adaptive First- and Zeroth-order Methods for Weakly Convex Stochastic Optimization Problems
In this paper, we design and analyze a new family of adaptive subgradient
methods for solving an important class of weakly convex (possibly nonsmooth)
stochastic optimization problems. Adaptive methods that use exponential moving
averages of past gradients to update search directions and learning rates have
recently attracted a lot of attention for solving optimization problems that
arise in machine learning. Nevertheless, their convergence analysis almost
exclusively requires smoothness and/or convexity of the objective function. In
contrast, we establish non-asymptotic rates of convergence of first and
zeroth-order adaptive methods and their proximal variants for a reasonably
broad class of nonsmooth and nonconvex optimization problems. Experimental
results indicate that the proposed algorithms empirically outperform stochastic
gradient descent and its zeroth-order variant on such optimization
problems.
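An Adam-style update driven by stochastic subgradients illustrates the kind of adaptive method described above. Exponential moving averages of past subgradients and of their squares set the search direction and the per-coordinate step size. The objective, noise level, diminishing base step lr0/√t and hyperparameters are assumptions. The specific schemes, proximal variants and zeroth-order versions analyzed in the paper are not reproduced.

```python
import math
import random


def sign(v):
    return (v > 0) - (v < 0)


TARGET = [1.0, -1.0]  # illustrative minimizer


def f(x):
    """Illustrative nonsmooth objective sum_i |x_i - t_i|."""
    return sum(abs(xi - ti) for xi, ti in zip(x, TARGET))


def stochastic_subgradient(x, rng, noise=0.1):
    return [sign(xi - ti) + rng.gauss(0.0, noise) for xi, ti in zip(x, TARGET)]


def adam_subgradient(x0, steps=2000, lr0=0.1, beta1=0.9, beta2=0.999,
                     eps=1e-8, seed=0):
    """Adam-style update using stochastic subgradients instead of gradients."""
    rng = random.Random(seed)
    x = list(x0)
    m = [0.0] * len(x)  # exponential moving average of subgradients
    v = [0.0] * len(x)  # exponential moving average of squared subgradients
    for t in range(1, steps + 1):
        g = stochastic_subgradient(x, rng)
        m = [beta1 * mi + (1 - beta1) * gi for mi, gi in zip(m, g)]
        v = [beta2 * vi + (1 - beta2) * gi * gi for vi, gi in zip(v, g)]
        lr = lr0 / math.sqrt(t)  # assumed diminishing base step size
        for i in range(len(x)):
            m_hat = m[i] / (1 - beta1 ** t)  # bias corrections
            v_hat = v[i] / (1 - beta2 ** t)
            x[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


x_final = adam_subgradient([0.0, 0.0])
print(f([0.0, 0.0]), f(x_final))
```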
Learning Latent Features with Pairwise Penalties in Low-Rank Matrix Completion
Low-rank matrix completion has achieved great success in many real-world data
applications. A matrix factorization model that learns latent features is
usually employed and, to improve prediction performance, the similarities
between latent variables can be exploited by pairwise learning using the graph
regularized matrix factorization (GRMF) method. However, existing GRMF
approaches often use the squared loss to measure the pairwise differences,
which may be overly influenced by dissimilar pairs and lead to inferior
prediction. To fully empower pairwise learning for matrix completion, we
propose a general optimization framework that allows a rich class of
(non-)convex pairwise penalty functions. A new and efficient algorithm is
developed to solve the proposed optimization problem, with a theoretical
convergence guarantee under mild assumptions. For the important case in which
the latent variables form a small number of subgroups, we also analyze its
statistical guarantees. In particular, we theoretically
characterize the performance of the complexity-regularized maximum likelihood
estimator, as a special case of our framework, which is shown to have smaller
errors when compared to the standard matrix completion framework without
pairwise penalties. We conduct extensive experiments on both synthetic and real
datasets to demonstrate the superior performance of this general framework.
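The effect of the pairwise loss can be checked on a toy set of latent rows. The sketch compares the squared penalty with a capped ℓ1 penalty min(d, τ), which is assumed here as one example of a nonconvex pairwise penalty. It computes the share of the total penalty contributed by a single dissimilar pair. The objective helper shows where such a penalty would enter a factorization loss. All names and data are hypothetical, and the paper's algorithm for minimizing this objective is not shown.

```python
import math


def squared(d):
    return d * d


def capped_l1(d, tau=1.0):
    """Illustrative bounded nonconvex penalty min(d, tau)."""
    return min(d, tau)


def pairwise_penalty(U, edges, rho):
    """Sum of rho(||u_i - u_j||) over graph edges (i, j) between latent feature rows."""
    return sum(rho(math.dist(U[i], U[j])) for i, j in edges)


def objective(M_obs, U, V, edges, lam, rho):
    """Squared fit on observed entries plus lam times the pairwise penalty."""
    fit = sum((mij - sum(a * b for a, b in zip(U[i], V[j]))) ** 2
              for (i, j), mij in M_obs.items())
    return fit + lam * pairwise_penalty(U, edges, rho)


# Three similar latent rows and one dissimilar row linked by a single edge (0, 3).
U = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0]]
edges = [(0, 1), (0, 2), (1, 2), (0, 3)]
for rho in (squared, capped_l1):
    share = rho(math.dist(U[0], U[3])) / pairwise_penalty(U, edges, rho)
    print(rho.__name__, round(share, 3))
```

Under the squared penalty, the dissimilar pair accounts for nearly the whole penalty. The capped penalty bounds its contribution by τ.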
A Manifold Proximal Linear Method for Sparse Spectral Clustering with Application to Single-Cell RNA Sequencing Data Analysis
Spectral clustering is one of the fundamental unsupervised learning methods
widely used in data analysis. Sparse spectral clustering (SSC) imposes sparsity
on spectral clustering, which improves the interpretability of the model.
This paper considers a widely adopted model for SSC, which can be formulated as
an optimization problem over the Stiefel manifold with nonsmooth and nonconvex
objective. Such an optimization problem is very challenging to solve. Existing
methods usually solve its convex relaxation or need to smooth its nonsmooth
part using certain smoothing techniques. In this paper, we propose a manifold
proximal linear method (ManPL) that solves the original SSC formulation. We
also extend the algorithm to solve the multiple-kernel SSC problems, for which
an alternating ManPL algorithm is proposed. Convergence and iteration
complexity results of the proposed methods are established. We demonstrate the
advantage of our proposed methods over existing methods via single-cell RNA
sequencing data analysis.
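Assume the SSC model takes the common form min ⟨L, UUᵀ⟩ + λ‖UUᵀ‖₁ subject to UᵀU = I, where L is a graph Laplacian. Under that assumption, the sketch evaluates the objective and keeps iterates on the Stiefel manifold with a QR retraction. It uses a plain Riemannian subgradient step rather than ManPL, which instead solves a proximal subproblem at every iteration. The graph, λ, step size and iteration count are illustrative.

```python
import math


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def ssc_objective(U, L, lam):
    """<L, U U^T> + lam * ||U U^T||_1 (assumed SSC model)."""
    P = matmul(U, transpose(U))
    n = len(L)
    smooth = sum(L[i][j] * P[i][j] for i in range(n) for j in range(n))
    return smooth + lam * sum(abs(p) for row in P for p in row)


def qr_retraction(Y):
    """Map an n x k matrix onto the Stiefel manifold via Gram-Schmidt (Q factor of QR)."""
    cols = []
    for c in transpose(Y):
        v = list(c)
        for q in cols:
            proj = sum(vi * qi for vi, qi in zip(v, q))
            v = [vi - proj * qi for vi, qi in zip(v, q)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        cols.append([vi / norm for vi in v])
    return transpose(cols)


def riemannian_subgradient_step(U, L, lam, t):
    """One Riemannian subgradient step (not ManPL) followed by QR retraction."""
    P = matmul(U, transpose(U))
    S = [[(p > 0) - (p < 0) for p in row] for row in P]  # a subgradient of ||.||_1 at P
    # Euclidean subgradient w.r.t. U; L and S are symmetric.
    LU, SU = matmul(L, U), matmul(S, U)
    G = [[2 * a + 2 * lam * b for a, b in zip(r1, r2)] for r1, r2 in zip(LU, SU)]
    # Project onto the tangent space at U: G - U sym(U^T G).
    UtG = matmul(transpose(U), G)
    k = len(UtG)
    sym = [[(UtG[i][j] + UtG[j][i]) / 2 for j in range(k)] for i in range(k)]
    USym = matmul(U, sym)
    xi = [[g - us for g, us in zip(rg, ru)] for rg, ru in zip(G, USym)]
    Y = [[u - t * x for u, x in zip(ru, rx)] for ru, rx in zip(U, xi)]
    return qr_retraction(Y)


# Illustrative Laplacian L = D - W for clusters {0, 1} and {2, 3} joined by a weak edge.
W = [[0, 1, 0, 0], [1, 0, 0.1, 0], [0, 0.1, 0, 1], [0, 0, 1, 0]]
L = [[(sum(W[i]) if i == j else 0) - W[i][j] for j in range(4)] for i in range(4)]
U = qr_retraction([[1.0, 0.0], [1.0, 0.5], [0.0, 1.0], [0.2, 1.0]])
for _ in range(50):
    U = riemannian_subgradient_step(U, L, lam=0.1, t=0.05)
print(ssc_objective(U, L, 0.1))
```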
A Stochastic Subgradient Method for Nonsmooth Nonconvex Multi-Level Composition Optimization
We propose a single time-scale stochastic subgradient method for constrained
optimization of a composition of several nonsmooth and nonconvex functions. The
functions are assumed to be locally Lipschitz and differentiable in a
generalized sense. Only stochastic estimates of the values and generalized
derivatives of the functions are used. The method is parameter-free. We prove
that the method converges with probability one by associating with it a system
of differential inclusions and devising a nondifferentiable Lyapunov function
for this system. For problems with functions having Lipschitz continuous
derivatives, the method finds a point satisfying an optimality measure with
error of order , after executing iterations with constant
stepsize.
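Here is a minimal two-level sketch for minimizing f1(f2(x)) with f1(u) = |u>| replaced by f1(u) = |u|, using a noisy oracle for f2(x) = x² − 1 on X = [0, 2]. It assumes the method tracks the inner value with a linearized correction, u_{k+1} = u_k + J_k (x_{k+1} − x_k) + b τ_k (f2 sample − u_k). It also averages chain-rule subgradients within a single time-scale scheme. The oracle, constants and update form are illustrative assumptions rather than the paper's exact recursion.

```python
import math
import random


def sign(v):
    return (v > 0) - (v < 0)


def inner_oracle(x, rng, noise=0.1):
    """Hypothetical noisy oracle for f2(x) = x^2 - 1: returns (value, derivative) samples."""
    return x * x - 1.0 + rng.gauss(0.0, noise), 2.0 * x + rng.gauss(0.0, noise)


def two_level_method(x0, steps=5000, tau0=0.2, a=2.0, b=2.0, lo=0.0, hi=2.0, seed=0):
    """Minimize |f2(x)| over [lo, hi] using only stochastic values/derivatives of f2."""
    rng = random.Random(seed)
    x = min(max(x0, lo), hi)
    u, _ = inner_oracle(x, rng)  # running estimate of the inner value f2(x)
    z = 0.0                      # running average of chain-rule subgradients
    for k in range(steps):
        tau = tau0 / math.sqrt(k + 1)
        val, jac = inner_oracle(x, rng)
        g = sign(u) * jac        # subgradient of |.| at u times the inner derivative
        x_new = min(max(x - tau * z, lo), hi)
        z += a * tau * (g - z)
        u += jac * (x_new - x) + b * tau * (val - u)  # assumed linearized tracking of f2
        x = x_new
    return x


x_final = two_level_method(0.3)
print(x_final)  # the minimizer of |x^2 - 1| on [0, 2] is x = 1
```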