
    On the relationship between bilevel decomposition algorithms and direct interior-point methods

    Engineers have been using bilevel decomposition algorithms to solve certain nonconvex large-scale optimization problems arising in engineering design projects. These algorithms transform the large-scale problem into a bilevel program with one upper-level problem (the master problem) and several lower-level problems (the subproblems). Unfortunately, there is analytical and numerical evidence that some of these commonly used bilevel decomposition algorithms may fail to converge even when the starting point is very close to the minimizer. In this paper, we establish a relationship between a particular bilevel decomposition algorithm, which performs only one iteration of an interior-point method when solving the subproblems, and a direct interior-point method, which solves the problem in its original (integrated) form. Using this relationship, we formally prove that the bilevel decomposition algorithm converges locally at a superlinear rate. The relevance of our analysis is that it bridges the gap between the incipient local convergence theory of bilevel decomposition algorithms and the mature theory of direct interior-point methods.
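    To make the single-iteration scheme concrete, here is a minimal numerical sketch of the idea: a master problem over a shared variable whose subproblems each receive exactly one barrier-Newton (interior-point) iteration per master update. The toy objective, step sizes, and barrier schedule below are our own illustrative choices, not the engineering-design formulation analyzed in the paper.

```python
import numpy as np

# Hypothetical toy instance of the integrated problem:
#   min_{x, y_i >= 0}  sum_i (x - a_i)^2 + (y_i - x)^2,
# with the nonnegativity constraints handled by a log-barrier -mu*log(y_i).

a = np.array([1.0, 2.0, 4.0])  # per-subproblem data (made up for the sketch)
x = 0.0                        # shared (master) variable
y = np.ones_like(a)            # subproblem variables, strictly feasible
mu = 0.1                       # barrier parameter

for k in range(200):
    # One barrier-Newton step per subproblem on phi_i(y) = (y - x)^2 - mu*log(y).
    grad = 2.0 * (y - x) - mu / y
    hess = 2.0 + mu / y**2
    step = grad / hess
    # Damp decreasing steps so every y_i stays strictly positive.
    t = np.where(step > 0, np.minimum(1.0, 0.95 * y / np.maximum(step, 1e-12)), 1.0)
    y -= t * step

    # Master gradient step on the shared variable.
    gx = np.sum(2.0 * (x - a) - 2.0 * (y - x))
    x -= 0.05 * gx

    mu *= 0.98  # drive the barrier parameter toward zero

print(x, y)  # x should approach mean(a) = 7/3, with each y_i tracking x
```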

    Constrained Deep Networks: Lagrangian Optimization via Log-Barrier Extensions

    This study investigates the optimization aspects of imposing hard inequality constraints on the outputs of CNNs. In the context of deep networks, constraints are commonly handled with penalties, owing to their simplicity and despite their well-known limitations. Lagrangian-dual optimization has been largely avoided, except for a few recent works, mainly due to the computational complexity and stability/convergence issues caused by alternating explicit dual updates/projections and stochastic optimization. Several studies showed that, surprisingly for deep CNNs, the theoretical and practical advantages of Lagrangian optimization over penalties do not materialize in practice. We propose log-barrier extensions, which approximate Lagrangian optimization of constrained-CNN problems with a sequence of unconstrained losses. Unlike standard interior-point and log-barrier methods, our formulation does not need an initial feasible solution. Furthermore, we provide a new technical result, which shows that the proposed extensions yield an upper bound on the duality gap. This generalizes the duality-gap result of standard log-barriers, yielding sub-optimality certificates for feasible solutions. While sub-optimality is not guaranteed for non-convex problems, our result shows that log-barrier extensions are a principled way to approximate Lagrangian optimization for constrained CNNs via implicit dual variables. We report comprehensive weakly supervised segmentation experiments, with various constraints, showing that our formulation substantially outperforms existing constrained-CNN methods in terms of accuracy, constraint satisfaction, and training stability, more so when dealing with a large number of constraints.
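    For concreteness, the sketch below implements one natural log-barrier extension of the kind the abstract describes: the standard barrier $-\frac{1}{t}\log(-z)$ is continued past $z = -1/t^2$ by its tangent line, so the penalty is finite and differentiable even at infeasible points and no feasible initialization is required. The switch point and constants are our assumptions for this sketch, not quotations from the paper.

```python
import numpy as np

def log_barrier_extension(z, t):
    """Smooth extension of the log-barrier -(1/t)*log(-z) to all of R.

    For z <= -1/t**2 it coincides with the standard log-barrier; beyond
    that point it continues with the barrier's tangent line at z = -1/t**2
    (slope t), so the penalty is defined and C^1 at infeasible points.
    """
    z = np.asarray(z, dtype=float)
    threshold = -1.0 / t**2
    barrier = -np.log(-np.minimum(z, threshold)) / t   # argument stays positive
    linear = t * z - np.log(1.0 / t**2) / t + 1.0 / t  # tangent line continuation
    return np.where(z <= threshold, barrier, linear)

# Example: penalty values at feasible, near-boundary, and infeasible points.
print(log_barrier_extension([-1.0, -0.01, 0.5], t=5.0))
```

    To enforce a constraint g(x) <= 0, one would add log_barrier_extension(g(x), t) to the training loss and, as with standard barrier methods, anneal t upward so the extension approaches a hard constraint.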

    Hessian barrier algorithms for linearly constrained optimization problems

    In this paper, we propose an interior-point method for linearly constrained optimization problems (possibly nonconvex). The method - which we call the Hessian barrier algorithm (HBA) - combines a forward Euler discretization of Hessian Riemannian gradient flows with an Armijo backtracking step-size policy. In this way, HBA can be seen as an alternative to mirror descent (MD), and it contains as special cases the affine scaling algorithm, regularized Newton processes, and several other iterative solution methods. Our main result is that, modulo a non-degeneracy condition, the algorithm converges to the problem's set of critical points; hence, in the convex case, the algorithm converges globally to the problem's minimum set. In the case of linearly constrained quadratic programs (not necessarily convex), we also show that the method's convergence rate is $\mathcal{O}(1/k^\rho)$ for some $\rho \in (0,1]$ that depends only on the choice of kernel function (i.e., not on the problem's primitives). These theoretical results are validated by numerical experiments on standard non-convex test functions and large-scale traffic assignment problems.
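    As a sketch of one special case named in the abstract, the code below instantiates the forward-Euler-plus-Armijo template with the log-barrier kernel on the positive orthant, where the Hessian-Riemannian direction $H(x)^{-1}\nabla f(x)$ reduces to the affine-scaling direction $x^2 \odot \nabla f(x)$. General linear constraints and the paper's other kernel choices are omitted, and the test function is a made-up quadratic.

```python
import numpy as np

def hba_log_kernel(grad_f, f, x0, alpha0=1.0, beta=0.5, sigma=1e-4, iters=100):
    """Hessian-barrier iteration on the positive orthant (affine-scaling case).

    With the log-barrier kernel h(x) = -sum(log x), the metric is
    H(x) = diag(1/x**2), so the direction H(x)^{-1} grad f(x) is simply
    x**2 * grad f(x); each iteration is a forward Euler step with Armijo
    backtracking that also keeps the iterate strictly feasible (x > 0).
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        g = grad_f(x)
        d = x**2 * g                       # H(x)^{-1} grad f(x)
        alpha, fx = alpha0, f(x)
        while np.any(x - alpha * d <= 0) or f(x - alpha * d) > fx - sigma * alpha * (g @ d):
            alpha *= beta                  # backtrack: feasibility + Armijo decrease
            if alpha < 1e-12:
                return x
        x = x - alpha * d
    return x

# Made-up test problem: f(x) = 0.5*||x - c||^2 minimized over x > 0.
c = np.array([1.0, 2.0, 0.5])
x_star = hba_log_kernel(lambda x: x - c,
                        lambda x: 0.5 * np.sum((x - c) ** 2),
                        x0=0.1 * np.ones(3))
print(x_star)  # should approach c, the interior minimizer
```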