    Uniqueness of DRS as the 2 Operator Resolvent-Splitting and Impossibility of 3 Operator Resolvent-Splitting

    Given the success of Douglas--Rachford splitting (DRS), it is natural to ask whether DRS can be generalized. Are there other 2 operator resolvent-splittings sharing the favorable properties of DRS? Can DRS be generalized to 3 operators? This work presents the answers: no and no. In a certain sense, DRS is the unique 2 operator resolvent-splitting, and generalizing DRS to 3 operators is impossible without lifting, where lifting roughly corresponds to enlarging the problem size. The impossibility result raises a further question: how much lifting is necessary to generalize DRS to 3 operators? This work answers it by providing a novel 3 operator resolvent-splitting with provably minimal lifting that directly generalizes DRS. Comment: Published in Mathematical Programming.
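
    For orientation, the standard 2 operator DRS iteration can be sketched as follows. This is a minimal illustration, not the 3 operator splitting proposed in the paper. The test problem (a quadratic plus an l1 term), the function names, and the step size `alpha` are assumptions chosen so that the resolvents (proximal maps) have closed forms.

```python
def soft_threshold(v, t):
    """Proximal map of t * |.| evaluated at the scalar v."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def drs_quadratic_plus_l1(a, lam, alpha=1.0, num_iters=200):
    """Douglas-Rachford splitting for
           minimize 0.5 * ||x - a||^2 + lam * ||x||_1.
    Hypothetical example problem; each step uses one resolvent per operator.
    """
    def prox_f(z):
        # Resolvent of alpha * grad f with f(x) = 0.5 * ||x - a||^2
        return [(zi + alpha * ai) / (1.0 + alpha) for zi, ai in zip(z, a)]

    z = [0.0] * len(a)
    for _ in range(num_iters):
        x = prox_f(z)
        # Resolvent of alpha * subdifferential of lam * ||.||_1 at the reflected point
        y = [soft_threshold(2.0 * xi - zi, alpha * lam) for xi, zi in zip(x, z)]
        z = [zi + yi - xi for zi, yi, xi in zip(z, y, x)]
    # At a fixed point of z, prox_f(z) minimizes the sum
    return prox_f(z)


solution = drs_quadratic_plus_l1([3.0, -0.5, 1.2], lam=1.0)
```

    For this particular problem the minimizer is the soft-thresholding of `a` by `lam`, which gives a simple check on the iteration.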

    Linear Convergence of Cyclic SAGA

    In this work, we present and analyze C-SAGA, a (deterministic) cyclic variant of SAGA. C-SAGA is an incremental gradient method that minimizes a sum of differentiable convex functions by cyclically accessing their gradients. Even though the theory of stochastic algorithms is, in general, more mature than that of their cyclic counterparts, practitioners often prefer cyclic algorithms. We prove that C-SAGA converges linearly under the standard assumptions. Then, we compare its rate of convergence with that of the full gradient method, (stochastic) SAGA, and incremental aggregated gradient (IAG), both theoretically and experimentally. Comment: Published in Optimization Letters.
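
    The idea of accessing gradients cyclically can be sketched as below. This is a minimal illustration that takes the usual SAGA update and replaces the random index with a cyclic one; the exact update and step size rule analyzed in the paper may differ. The scalar quadratic test problem and all names are assumptions.

```python
def cyclic_saga(grads, x0, step, num_iters):
    """SAGA-style update with the index chosen cyclically (k mod n).

    grads: list of callables, grads[i](x) is the gradient of the i-th function.
    A table holds the most recently evaluated gradient of each function.
    """
    n = len(grads)
    x = x0
    table = [g(x) for g in grads]
    table_sum = sum(table)
    for k in range(num_iters):
        i = k % n  # deterministic cyclic access instead of random sampling
        g_new = grads[i](x)
        # New gradient, minus the stored one, plus the table average
        x = x - step * (g_new - table[i] + table_sum / n)
        table_sum += g_new - table[i]
        table[i] = g_new
    return x


# Hypothetical problem: f_i(x) = 0.5 * a_i * (x - b_i)^2
curvatures = [1.0, 0.5, 0.8, 0.6]
targets = [2.0, -1.0, 0.5, 3.0]
grads = [lambda x, a=a, b=b: a * (x - b) for a, b in zip(curvatures, targets)]

x_hat = cyclic_saga(grads, x0=0.0, step=0.02, num_iters=20000)
x_star = sum(a * b for a, b in zip(curvatures, targets)) / sum(curvatures)
```

    The step size here is deliberately conservative; it is a choice for this toy problem, not a value taken from the paper.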

    Proximal-Proximal-Gradient Method

    In this paper, we present the proximal-proximal-gradient method (PPG), a novel optimization method that is simple to implement and simple to parallelize. PPG generalizes the proximal-gradient method and ADMM and is applicable to minimization problems written as a sum of many differentiable and many non-differentiable convex functions. The non-differentiable functions can be coupled. We furthermore present a related stochastic variation, which we call stochastic PPG (S-PPG). S-PPG can be interpreted as a generalization of Finito and MISO to the sum of many coupled non-differentiable convex functions. We present many applications that can benefit from PPG and S-PPG and prove convergence for both methods. A key strength of PPG and S-PPG, compared to existing methods, is their ability to directly handle a large sum of non-differentiable non-separable functions with a constant stepsize independent of the number of functions. Such non-diminishing stepsizes allow them to be fast.

    Adaptive Importance Sampling via Stochastic Convex Programming

    We show that the variance of the Monte Carlo estimator that is importance sampled from an exponential family is a convex function of the natural parameter of the distribution. With this insight, we propose an adaptive importance sampling algorithm that simultaneously improves the choice of sampling distribution while accumulating a Monte Carlo estimate. Exploiting convexity, we prove that the method's unbiased estimator has variance that is asymptotically optimal over the exponential family.
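
    A minimal sketch of this idea, under stated assumptions: the nominal distribution is N(0, 1), the sampling family is N(theta, 1) (an exponential family with natural parameter theta), and theta is updated by projected stochastic gradient descent on the second moment of the weighted sample. The step size schedule, the projection interval `theta_bounds`, and the function names are hypothetical choices for illustration, not taken from the paper.

```python
import math
import random


def adaptive_importance_sampling(f, num_samples, step=0.05,
                                 theta_bounds=(0.0, 2.0), seed=0):
    """Estimate E[f(X)] for X ~ N(0, 1) while adapting the sampler N(theta, 1).

    Each sample is weighted by w = p(x) / q_theta(x) = exp(-theta*x + theta^2/2),
    so every term f(x) * w is unbiased no matter how theta was chosen earlier.
    """
    rng = random.Random(seed)
    lo, hi = theta_bounds
    theta = 0.0
    total = 0.0
    for k in range(1, num_samples + 1):
        x = rng.gauss(theta, 1.0)
        w = math.exp(-theta * x + 0.5 * theta * theta)
        fx = f(x)
        total += fx * w
        # Stochastic gradient of the second moment E_q[(f * w)^2] in theta
        grad = (fx * w) ** 2 * (theta - x)
        theta -= step / math.sqrt(k) * grad
        theta = min(max(theta, lo), hi)  # projection onto the assumed interval
    return total / num_samples, theta


estimate, final_theta = adaptive_importance_sampling(math.exp, 20000)
```

    With f = exp the exact value is exp(0.5), and the sampler N(1, 1) has zero variance for this integrand, so the second moment is minimized at theta = 1.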

    ODE Analysis of Stochastic Gradient Methods with Optimism and Anchoring for Minimax Problems and GANs

    Despite remarkable empirical success, the training dynamics of generative adversarial networks (GANs), which involve solving a minimax game using stochastic gradients, are still poorly understood. In this work, we analyze last-iterate convergence of simultaneous gradient descent (simGD) and its variants under the assumption of convex-concavity, guided by a continuous-time analysis with differential equations. First, we show that simGD, as is, converges with stochastic subgradients under strict convexity in the primal variable. Second, we generalize optimistic simGD to accommodate an optimism rate separate from the learning rate and show its convergence with full gradients. Finally, we present anchored simGD, a new method, and show convergence with stochastic subgradients.
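
    The difference between simGD and its optimistic variant can be seen on the bilinear game L(x, y) = x * y, a standard toy example that is convex-concave but not strictly convex. The sketch below is illustrative only: the update with a separate `optimism` rate is one natural reading of the generalization described above, and the function names and parameter values are assumptions.

```python
import math


def saddle_operator(x, y):
    """G(x, y) = (dL/dx, -dL/dy) for the bilinear game L(x, y) = x * y."""
    return (y, -x)


def sim_gd(x, y, lr, num_iters):
    """Simultaneous gradient descent-ascent with full gradients."""
    for _ in range(num_iters):
        gx, gy = saddle_operator(x, y)
        x, y = x - lr * gx, y - lr * gy
    return x, y


def optimistic_sim_gd(x, y, lr, optimism, num_iters):
    """simGD plus a correction term optimism * (G(z_k) - G(z_{k-1}))."""
    prev = saddle_operator(x, y)
    for _ in range(num_iters):
        cur = saddle_operator(x, y)
        x, y = (x - lr * cur[0] - optimism * (cur[0] - prev[0]),
                y - lr * cur[1] - optimism * (cur[1] - prev[1]))
        prev = cur
    return x, y


plain = sim_gd(1.0, 1.0, lr=0.3, num_iters=100)
optimistic = optimistic_sim_gd(1.0, 1.0, lr=0.3, optimism=0.3, num_iters=500)
plain_norm = math.hypot(*plain)
optimistic_norm = math.hypot(*optimistic)
```

    On this game the plain iterates spiral away from the saddle point (0, 0), while the optimistic iterates with equal learning and optimism rates contract toward it.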

    A New Use of Douglas-Rachford Splitting and ADMM for Identifying Infeasible, Unbounded, and Pathological Conic Programs

    In this paper, we present a method for identifying infeasible, unbounded, and pathological conic programs based on Douglas-Rachford splitting, or equivalently ADMM. When an optimization program is infeasible, unbounded, or pathological, the iterates of Douglas-Rachford splitting diverge. Somewhat surprisingly, such divergent iterates still provide useful information, which our method uses for identification. In addition, for strongly infeasible problems the method produces a separating hyperplane and informs the user on how to minimally modify the given problem to achieve strong feasibility. As a first-order method, the proposed algorithm relies on simple subroutines, and therefore is simple to implement and has low per-iteration cost.

    Risk-Constrained Kelly Gambling

    We consider the classic Kelly gambling problem with general distribution of outcomes, and an additional risk constraint that limits the probability of a drawdown of wealth to a given undesirable level. We develop a bound on the drawdown probability; using this bound instead of the original risk constraint yields a convex optimization problem that guarantees the drawdown risk constraint holds. Numerical experiments show that our bound on drawdown probability is reasonably close to the actual drawdown risk, as computed by Monte Carlo simulation. Our method is parametrized by a single parameter that has a natural interpretation as a risk-aversion parameter, allowing us to systematically trade off asymptotic growth rate and drawdown risk. Simulations show that this method yields bets that outperform fractional-Kelly bets for the same drawdown risk level or growth rate. Finally, we show that a natural quadratic approximation of our convex problem is closely connected to the classical mean-variance Markowitz portfolio selection problem.
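
    A minimal sketch of the trade-off on a single even-odds bet, under stated assumptions: the bettor wagers a fraction b of wealth on an outcome with win probability p (0.5 < p < 1), and the convex risk bound is assumed to take the form E[(wealth factor)^(-lam)] <= 1 with risk-aversion parameter lam. That form is an assumption of this sketch and is not stated in the abstract; the function names are hypothetical.

```python
import math


def growth_rate(b, p):
    """Expected log growth when betting fraction b at even odds."""
    return p * math.log(1.0 + b) + (1.0 - p) * math.log(1.0 - b)


def risk_bound(b, p, lam):
    """Assumed risk measure E[(wealth factor)^(-lam)]; convex in b."""
    return p * (1.0 + b) ** (-lam) + (1.0 - p) * (1.0 - b) ** (-lam)


def risk_constrained_bet(p, lam, tol=1e-12):
    """Largest growth-optimal fraction b with risk_bound(b) <= 1.

    Growth increases on [0, kelly], and the feasible set is an interval
    containing 0, so bisection on [0, kelly] finds the constrained optimum.
    """
    kelly = 2.0 * p - 1.0
    if kelly <= 0.0:
        return 0.0
    if risk_bound(kelly, p, lam) <= 1.0:
        return kelly  # the risk constraint is inactive
    lo, hi = 0.0, kelly
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if risk_bound(mid, p, lam) <= 1.0:
            lo = mid
        else:
            hi = mid
    return lo


cautious_bet = risk_constrained_bet(p=0.6, lam=5.0)
unconstrained_bet = risk_constrained_bet(p=0.6, lam=0.5)
```

    A larger `lam` shrinks the bet below the Kelly fraction 2p - 1, while a small `lam` leaves the Kelly bet unchanged.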

    Splitting with Near-Circulant Linear Systems: Applications to Total Variation CT and PET

    Many imaging problems, such as total variation reconstruction of X-ray computed tomography (CT) and positron-emission tomography (PET), are solved via a convex optimization problem with near-circulant, but not actually circulant, linear systems. The popular methods to solve these problems, alternating direction method of multipliers (ADMM) and primal-dual hybrid gradient (PDHG), do not directly utilize this structure. Consequently, ADMM requires a costly matrix inversion as a subroutine, and PDHG takes too many iterations to converge. In this paper, we present near-circulant splitting (NCS), a novel splitting method that leverages the near-circulant structure. We show that NCS can converge with an iteration count close to that of ADMM, while paying a computational cost per iteration close to that of PDHG. Through experiments on a CUDA GPU, we empirically validate the theory and demonstrate that NCS can effectively utilize the parallel computing capabilities of CUDA. Comment: Published in SIAM Journal on Scientific Computing.

    Vector and Matrix Optimal Mass Transport: Theory, Algorithm, and Applications

    In many applications such as color image processing, data has more than one piece of information associated with each spatial coordinate, and in such cases the classical optimal mass transport (OMT) must be generalized to handle vector-valued or matrix-valued densities. In this paper, we discuss vector and matrix optimal mass transport and present three contributions. We first present a rigorous mathematical formulation for these setups and provide analytical results including existence of solutions and strong duality. Next, we present simple, scalable, and parallelizable methods to solve the vector and matrix-OMT problems. Finally, we implement the proposed methods on a CUDA GPU and present experiments and applications. Comment: 22 pages, 5 figures, 3 tables.

