
    A quasi-Newton proximal splitting method

    A new result in convex analysis on the calculation of proximity operators in certain scaled norms is derived. We describe efficient implementations of the proximity calculation for a useful class of functions; the implementations exploit the piecewise-linear nature of the dual problem. The second part of the paper applies this result to the acceleration of convex minimization problems and leads to an elegant quasi-Newton method. The optimization method compares favorably against state-of-the-art alternatives. The algorithm has extensive applications, including signal processing, sparse recovery, machine learning, and classification.
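    As a point of reference only, the sketch below shows a plain proximal-gradient (forward-backward splitting) iteration for an l1-regularized least-squares problem, where the proximity operator of the l1 norm is the familiar soft-thresholding map. It is not the paper's scaled-norm quasi-Newton scheme; the problem data A, b and the parameters lam and step are arbitrary illustrative choices.

    # Minimal proximal-gradient sketch for min_x 0.5*||Ax - b||^2 + lam*||x||_1.
    # Illustrates the role of the proximity operator; uses a plain Euclidean prox,
    # not the scaled-norm quasi-Newton variant described in the abstract.
    import numpy as np

    def soft_threshold(v, t):
        """Proximity operator of t*||.||_1 (component-wise soft-thresholding)."""
        return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

    def proximal_gradient(A, b, lam, step, iters=500):
        x = np.zeros(A.shape[1])
        for _ in range(iters):
            grad = A.T @ (A @ x - b)                          # gradient of the smooth part
            x = soft_threshold(x - step * grad, step * lam)   # prox of the nonsmooth part
        return x

    # Example usage (random problem; step <= 1/||A||_2^2 ensures convergence):
    A = np.random.randn(50, 100)
    b = np.random.randn(50)
    x_hat = proximal_gradient(A, b, lam=0.1, step=1.0 / np.linalg.norm(A, 2) ** 2)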

    On the Universality of the Logistic Loss Function

    A loss function measures the discrepancy between the true values (observations) and their estimated fits for a given instance of data. A loss function is said to be proper (unbiased, Fisher consistent) if the fits are defined over a unit simplex and the minimizer of the expected loss is the true underlying probability of the data. Typical examples are the zero-one loss, the quadratic loss, and the Bernoulli log-likelihood loss (log-loss). In this work we show that for binary classification problems, the divergence associated with smooth, proper, and convex loss functions is bounded from above by the Kullback-Leibler (KL) divergence, up to a multiplicative normalization constant. This implies that by minimizing the log-loss (associated with the KL divergence), we minimize an upper bound on any choice of loss function from this set. This property justifies the broad use of the log-loss in regression, decision trees, deep neural networks, and many other applications. In addition, we show that the KL divergence bounds from above any separable Bregman divergence that is convex in its second argument (again up to a multiplicative normalization constant). This result introduces a new set of divergence inequalities, similar to the well-known Pinsker inequality.
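    As a small numeric illustration of the binary case (not the paper's general result), the snippet below checks the standard identity that the excess expected log-loss of predicting a probability q when the true probability is p equals the KL divergence between Bernoulli(p) and Bernoulli(q); the values p and q are arbitrary.

    # Numeric check: excess expected log-loss equals the binary KL divergence.
    import numpy as np

    def expected_log_loss(p, q):
        """Expected Bernoulli log-loss when predicting q and the true probability is p."""
        return -(p * np.log(q) + (1 - p) * np.log(1 - q))

    def binary_kl(p, q):
        """KL divergence between Bernoulli(p) and Bernoulli(q)."""
        return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

    p, q = 0.7, 0.4
    gap = expected_log_loss(p, q) - expected_log_loss(p, p)   # excess loss over the optimum
    print(np.isclose(gap, binary_kl(p, q)))                   # True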

    Let's Make Block Coordinate Descent Go Fast: Faster Greedy Rules, Message-Passing, Active-Set Complexity, and Superlinear Convergence

    Block coordinate descent (BCD) methods are widely used for large-scale numerical optimization because of their cheap iteration costs, low memory requirements, amenability to parallelization, and ability to exploit problem structure. Three main algorithmic choices influence the performance of BCD methods: the block partitioning strategy, the block selection rule, and the block update rule. In this paper we explore all three of these building blocks and propose variations for each that can lead to significantly faster BCD methods. We (i) propose new greedy block-selection strategies that guarantee more progress per iteration than the Gauss-Southwell rule; (ii) explore practical issues such as how to implement the new rules when using "variable" blocks; (iii) explore the use of message-passing to compute matrix or Newton updates efficiently on huge blocks for problems with a sparse dependency between variables; and (iv) consider optimal active-manifold identification, which leads to bounds on the "active-set complexity" of BCD methods and to superlinear convergence for certain problems with sparse solutions (and, in some cases, finite termination at an optimal solution). We support all of our findings with numerical results for the classic machine learning problems of least squares, logistic regression, multi-class logistic regression, label propagation, and L1-regularization.
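    For concreteness, the sketch below implements the classical Gauss-Southwell rule, the baseline that the paper's new greedy block-selection strategies improve upon, using single-coordinate "blocks" on a least-squares problem; the data A, b and the iteration count are arbitrary illustrative choices.

    # Greedy (Gauss-Southwell) coordinate descent for 0.5*||Ax - b||^2.
    import numpy as np

    def gauss_southwell_cd(A, b, iters=1000):
        n = A.shape[1]
        x = np.zeros(n)
        L = np.sum(A ** 2, axis=0)            # per-coordinate Lipschitz constants ||A_j||^2
        for _ in range(iters):
            grad = A.T @ (A @ x - b)
            j = np.argmax(np.abs(grad))       # Gauss-Southwell: coordinate with largest gradient
            x[j] -= grad[j] / L[j]            # exact minimization along coordinate j
        return x

    # Example usage on a random least-squares instance:
    A = np.random.randn(40, 20)
    b = np.random.randn(40)
    x_hat = gauss_southwell_cd(A, b)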

    Parallel ADMM for robust quadratic optimal resource allocation problems

    An alternating direction method of multipliers (ADMM) solver is described for optimal resource allocation problems with separable convex quadratic costs and constraints and with linear coupling constraints. We describe a parallel implementation of the solver on a graphics processing unit (GPU) using a bespoke quartic function minimizer. An application to robust optimal energy management in hybrid electric vehicles is described, and the results of numerical simulations comparing the computation times of the parallel GPU implementation with those of an equivalent serial implementation are presented.
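    As an illustration of the problem structure only (not the paper's GPU solver or its quartic-minimizer subroutine), the sketch below applies a basic ADMM splitting to a separable quadratic resource allocation problem with a single linear coupling (budget) constraint; the cost coefficients a, c, the budget B, and the penalty rho are arbitrary illustrative choices.

    # Basic ADMM sketch for  min  sum_i (a_i/2) x_i^2 + c_i x_i   s.t.  sum_i x_i = B,
    # splitting x = z with z carrying the coupling (budget) constraint.
    import numpy as np

    def admm_resource_allocation(a, c, B, rho=1.0, iters=200):
        n = len(a)
        z = np.full(n, B / n)
        u = np.zeros(n)
        for _ in range(iters):
            # x-update: separable, closed form for the quadratic costs
            x = (rho * (z - u) - c) / (a + rho)
            # z-update: Euclidean projection of x + u onto {z : sum(z) = B}
            v = x + u
            z = v + (B - v.sum()) / n
            # dual update
            u = u + x - z
        return z

    # Example usage with three resources sharing a budget of 10:
    a = np.array([2.0, 1.0, 4.0])
    c = np.array([1.0, -2.0, 0.5])
    alloc = admm_resource_allocation(a, c, B=10.0)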

    Strongly polynomial algorithm for a class of minimum-cost flow problems with separable convex objectives

    A well-studied nonlinear extension of the minimum-cost flow problem is to minimize the objective $\sum_{ij\in E} C_{ij}(f_{ij})$ over feasible flows $f$, where on every arc $ij$ of the network, $C_{ij}$ is a convex function. We give a strongly polynomial algorithm for the case when all $C_{ij}$'s are convex quadratic functions, settling an open problem raised e.g. by Hochbaum [1994]. We also give strongly polynomial algorithms for computing market equilibria in Fisher markets with linear utilities and with spending constraint utilities, that can be formulated in this framework (see Shmyrev [2009], Devanur et al. [2011]). For the latter class this resolves an open question raised by Vazirani [2010]. The running time is $O(m^4\log m)$ for quadratic costs, $O(n^4+n^2(m+n\log n)\log n)$ for Fisher markets with linear utilities, and $O(mn^3+m^2(m+n\log n)\log m)$ for spending constraint utilities. All these algorithms are presented in a common framework that addresses the general problem setting. Whereas it is impossible to give a strongly polynomial algorithm for the general problem even in an approximate sense (see Hochbaum [1994]), we show that, assuming the existence of certain black-box oracles, one can give an algorithm using only a strongly polynomial number of arithmetic operations and oracle calls. The particular algorithms can be derived by implementing these oracles in the respective settings.
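    To make the problem setting concrete (this is a generic numerical solve, not the paper's strongly polynomial algorithm), the snippet below builds a tiny three-node instance with convex quadratic arc costs and flow-conservation constraints and hands it to a general-purpose constrained optimizer; the graph, coefficients, and demands are arbitrary illustrative choices.

    # Tiny separable-convex-quadratic min-cost flow instance, solved generically.
    import numpy as np
    from scipy.optimize import minimize

    # Nodes {0,1,2}, arcs (0,1), (0,2), (1,2); send 1 unit of flow from node 0 to node 2.
    arcs = [(0, 1), (0, 2), (1, 2)]
    q = np.array([1.0, 2.0, 1.0])        # arc costs C_ij(f) = 0.5 * q * f^2
    demand = np.array([-1.0, 0.0, 1.0])  # net demand per node (negative = supply)

    # Node-arc incidence matrix: -1 at the tail of each arc, +1 at its head.
    N = np.zeros((3, len(arcs)))
    for k, (i, j) in enumerate(arcs):
        N[i, k] = -1.0
        N[j, k] = 1.0

    cost = lambda f: 0.5 * np.sum(q * f ** 2)
    # Flow conservation at nodes 0 and 1; the last node's balance is implied.
    cons = {"type": "eq", "fun": lambda f: N[:-1] @ f - demand[:-1]}
    res = minimize(cost, x0=np.zeros(len(arcs)), constraints=cons,
                   bounds=[(0, None)] * len(arcs), method="SLSQP")
    print(res.x)   # optimal flows on the three arcs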