98 research outputs found
An extension of the projected gradient method to a Banach space setting with application in structural topology optimization
For the minimization of a nonlinear cost functional under convex constraints, the relaxed projected gradient process is a well-known method. The analysis is classically performed in a Hilbert space. We generalize this method to functionals which are differentiable in a Banach space, so that it becomes possible, for example, to perform a gradient method with respect to an inner product even when the functional is only differentiable in a (non-Hilbert) Banach space. We show global convergence using Armijo backtracking on the relaxation parameter and allow the inner product and the scaling to change in every iteration. As an application we present a structural topology optimization problem based on a phase field model, where the reduced cost functional is differentiable only in a Banach space. The presented numerical results, obtained with a suitable inner product and with a pointwise chosen metric including second-order information, show the expected mesh independence of the iteration numbers; the latter yields an additional, drastic decrease in iteration numbers as well as in computation time. Moreover, we present numerical results using a BFGS update of the inner product for further optimization problems based on phase field models.
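As an illustration of the classical scheme this paper generalizes, the following is a minimal sketch of a relaxed projected gradient step with Armijo backtracking in a finite-dimensional Euclidean setting. The functional j, its gradient, the projection onto the admissible set, and all parameter values are placeholders, not the paper's concrete topology optimization setup.

    import numpy as np

    def projected_gradient(j, grad_j, project, phi0, lam=1.0,
                           sigma=1e-4, max_iter=100, tol=1e-8):
        """Relaxed projected gradient method with Armijo backtracking.

        j       : cost functional, j(phi) -> float
        grad_j  : gradient of j (here w.r.t. the Euclidean inner product)
        project : projection onto the convex admissible set
        phi0    : initial iterate
        """
        phi = phi0
        for _ in range(max_iter):
            g = grad_j(phi)
            # search direction: projected gradient point minus current iterate
            d = project(phi - lam * g) - phi
            if np.linalg.norm(d) < tol:
                break
            # Armijo backtracking on the relaxation parameter alpha
            alpha, j_phi = 1.0, j(phi)
            while j(phi + alpha * d) > j_phi + sigma * alpha * np.dot(g, d):
                alpha *= 0.5
            phi = phi + alpha * d
        return phi

In the paper's Banach-space setting the inner product used to define the gradient and the projection may change from iteration to iteration; the sketch above fixes both for simplicity.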
Convex Optimization and Extensions, with a View Toward Large-Scale Problems
Machine learning (ML) is a major source of interesting optimization problems. These problems tend to be challenging because of their enormous scale, which makes it difficult to apply traditional optimization algorithms. We explore three avenues to designing algorithms suited to handling these challenges, with a view toward large-scale ML tasks. The first is to develop better general methods for unconstrained minimization. The second is to tailor methods to the features of modern systems, namely the availability of distributed computing. The third is to use specialized algorithms to exploit specific problem structure.
Chapters 2 and 3 focus on improving quasi-Newton methods, a mainstay of unconstrained optimization. In Chapter 2, we analyze an extension of quasi-Newton methods wherein we use block updates, which add curvature information to the Hessian approximation on a higher-dimensional subspace. This defines a family of methods, Block BFGS, that form a spectrum between the classical BFGS method and Newton's method, in terms of the amount of curvature information used. We show that by adding a correction step, the Block BFGS method inherits the convergence guarantees of BFGS for deterministic problems, most notably a Q-superlinear convergence rate for strongly convex problems. To explore the tradeoff between reduced iterations and greater work per iteration of block methods, we present a set of numerical experiments.
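For reference, the classical BFGS update sits at one end of the spectrum described above. The sketch below shows its standard inverse-Hessian form; the Block BFGS variant of Chapter 2, which replaces the single pair (s, y) by blocks of curvature pairs and adds a correction step, is not reproduced here, and all names are placeholders.

    import numpy as np

    def bfgs_update(H, s, y):
        """Classical BFGS update of the inverse Hessian approximation H.

        s = x_{k+1} - x_k (step), y = grad f(x_{k+1}) - grad f(x_k).
        Block BFGS generalizes this by using matrices S, Y whose columns
        carry curvature information on a higher-dimensional subspace.
        """
        rho = 1.0 / (y @ s)          # requires the curvature condition y's > 0
        I = np.eye(len(s))
        V = I - rho * np.outer(s, y)
        return V @ H @ V.T + rho * np.outer(s, s)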
In Chapter 3, we focus on the problem of step size determination. To obviate the need for line searches, and for pre-computing fixed step sizes, we derive an analytic step size, which we call curvature-adaptive, for self-concordant functions. This adaptive step size allows us to generalize the damped Newton method of Nesterov to other iterative methods, including gradient descent and quasi-Newton methods. We provide simple proofs of convergence, including superlinear convergence for adaptive BFGS, allowing us to obtain superlinear convergence without line searches.
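For context, the damped Newton method of Nesterov that Chapter 3 generalizes uses the analytic step size 1/(1 + lambda_k), where lambda_k is the Newton decrement. A minimal sketch with placeholder gradient and Hessian callables is given below; the curvature-adaptive step sizes derived in the chapter for gradient and quasi-Newton directions are not reproduced here.

    import numpy as np

    def damped_newton(grad, hess, x, max_iter=50, tol=1e-10):
        """Nesterov's damped Newton method for a self-concordant function.

        The step size 1/(1 + lambda_k), with lambda_k the Newton decrement,
        requires no line search and no pre-computed fixed step size.
        """
        for _ in range(max_iter):
            g = grad(x)
            d = np.linalg.solve(hess(x), g)   # Newton direction (to be subtracted)
            lam = np.sqrt(g @ d)              # Newton decrement lambda(x)
            if lam < tol:
                break
            x = x - d / (1.0 + lam)           # damped Newton step
        return x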
In Chapter 4, we move from general algorithms to hardware-influenced algorithms. We consider a form of distributed stochastic gradient descent that we call Leader SGD (LSGD), which is inspired by the Elastic Averaging SGD (EASGD) method. These methods are intended for distributed settings where communication between machines may be expensive, which makes the design of their consensus mechanism important. We show that LSGD avoids an issue with spurious stationary points that affects EASGD, and we provide a convergence analysis of LSGD. In the stochastic strongly convex setting, LSGD converges at the rate O(1/k) with diminishing step sizes, matching other distributed methods. We also analyze the impact of varying communication delays and of stochasticity in the selection of the leader points, and identify conditions under which LSGD may produce better search directions than the gradient alone.
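To make the consensus mechanism concrete, the following is an illustrative sketch of one synchronous elastic-averaging step in the style of EASGD; it is not the LSGD algorithm itself. In LSGD, as described above, the attraction point would roughly be a designated leader point rather than the moving consensus average; all names and parameter values here are placeholders.

    import numpy as np

    def easgd_step(workers, center, grads, eta=0.01, rho=0.1):
        """One synchronous EASGD-style step, for illustration only.

        workers : list of local parameter vectors x_i
        center  : consensus variable
        grads   : list of stochastic gradients, one per worker
        Each worker takes a gradient step and is pulled toward the
        consensus point; the consensus point drifts toward the workers.
        """
        new_workers = []
        for x, g in zip(workers, grads):
            new_workers.append(x - eta * (g + rho * (x - center)))
        center = center + eta * rho * sum(x - center for x in workers)
        return new_workers, center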
In Chapter 5, we switch focus again, to algorithms that exploit problem structure. Specifically, we consider problems whose variables satisfy multiaffine constraints, which motivates us to apply the Alternating Direction Method of Multipliers (ADMM). Problems that can be formulated with such a structure include representation learning (e.g., with dictionaries) and deep learning. We show that ADMM can be applied directly to multiaffine problems. By extending the theory of nonconvex ADMM, we prove that ADMM is convergent on multiaffine problems satisfying certain assumptions and, more broadly, analyze the theoretical properties of ADMM for general problems, investigating the effect of different types of structure.
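As background, the standard two-block ADMM iteration (with a scaled dual variable) is sketched below for a generic linearly constrained problem min f(x) + g(z) subject to Ax + Bz = c. The multiaffine setting of Chapter 5 applies the same alternating pattern over more blocks with multiaffine coupling, which is not reproduced here; the subproblem solvers argmin_x and argmin_z are placeholders.

    import numpy as np

    def admm(argmin_x, argmin_z, A, B, c, x0, z0, num_iter=100):
        """Two-block ADMM with scaled dual variable u.

        argmin_x(z, u) and argmin_z(x, u) solve the augmented-Lagrangian
        subproblems in x and z, respectively.
        """
        x, z = x0, z0
        u = np.zeros_like(c)                  # scaled dual variable
        for _ in range(num_iter):
            x = argmin_x(z, u)                # minimize L_rho over x
            z = argmin_z(x, u)                # minimize L_rho over z
            u = u + (A @ x + B @ z - c)       # dual update on the residual
        return x, z, u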
A Distributed Newton Method for Network Utility Maximization
Most existing work uses dual decomposition and subgradient methods to solve
Network Utility Maximization (NUM) problems in a distributed manner; these
methods suffer from slow convergence. This work develops an alternative
distributed, Newton-type, fast-converging algorithm for solving
network utility maximization problems with self-concordant utility functions.
By using novel matrix splitting techniques, both primal and dual updates for
the Newton step can be computed using iterative schemes in a decentralized
manner with limited information exchange. Similarly, the stepsize can be
obtained via an iterative consensus-based averaging scheme. We show that even
when the Newton direction and the stepsize in our method are computed within
some error (due to finite truncation of the iterative schemes), the resulting
objective function value still converges superlinearly to an explicitly
characterized error neighborhood. Simulation results demonstrate significant
convergence rate improvement of our algorithm relative to the existing
subgradient methods based on dual decomposition.
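For orientation, the dual-decomposition subgradient baseline that such Newton-type schemes are compared against can be sketched as follows. The routing matrix, link capacities, and the per-source utility maximization oracle are placeholders; this is the first-order baseline, not the distributed Newton algorithm of the paper.

    import numpy as np

    def dual_decomposition_num(util_argmax, routes, capacities,
                               alpha=0.01, num_iter=1000):
        """Dual-decomposition subgradient method for NUM (baseline).

        util_argmax(q) : per-source problem; returns, for each source i,
                         the rate maximizing U_i(x_i) - q_i * x_i, where
                         q_i is the sum of link prices on source i's route
        routes         : 0/1 routing matrix (links x sources)
        capacities     : vector of link capacities
        """
        prices = np.zeros(routes.shape[0])       # one dual price per link
        for _ in range(num_iter):
            q = routes.T @ prices                 # aggregate price per source
            x = util_argmax(q)                    # local source updates
            # projected subgradient price update at each link
            prices = np.maximum(prices + alpha * (routes @ x - capacities), 0.0)
        return x, prices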
Distributed Newton-type algorithms for network resource allocation
Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2010. Cataloged from PDF version of thesis. Includes bibliographical references (p. 99-101).

Most of today's communication networks are large-scale and comprise agents with local information and heterogeneous preferences, making centralized control and coordination impractical. This has motivated much interest in developing and studying distributed algorithms for network resource allocation problems, such as Internet routing, data collection and processing in sensor networks, and cross-layer communication network design. Existing work on network resource allocation relies on dual decomposition and first-order (gradient or subgradient) methods, which involve simple computations and can be implemented in a distributed manner, yet suffer from slow convergence. Second-order methods are faster, but their direct implementation requires computation-intensive matrix inversion operations, which couple information across the network and hence cannot be implemented in a decentralized way. This thesis develops and analyzes Newton-type (second-order) distributed methods for network resource allocation problems. In particular, we focus on two general formulations: Network Utility Maximization (NUM) and network flow cost minimization problems.

For NUM problems, we develop a distributed, Newton-type, fast-converging algorithm using the properties of self-concordant utility functions. Our algorithm utilizes novel matrix splitting techniques, which enable both primal and dual Newton steps to be computed using iterative schemes in a decentralized manner with limited information exchange. Moreover, the step-size used in our method can be obtained via an iterative consensus-based averaging scheme. We show that even when the Newton direction and the step-size in our method are computed within some error (due to finite truncation of the iterative schemes), the resulting objective function value still converges superlinearly to an explicitly characterized error neighborhood. Simulation results demonstrate significant convergence rate improvement of our algorithm relative to the existing subgradient methods based on dual decomposition.

The second part of the thesis presents a distributed approach based on a Newton-type method for solving network flow cost minimization problems. The key component of our method is to represent the dual Newton direction as the limit of an iterative procedure involving the graph Laplacian, which can be implemented based only on local information. Using standard Lipschitz conditions, we analyze the convergence properties of our algorithm and show that the method converges superlinearly to an explicitly characterized error neighborhood, even when the iterative schemes used for computing the Newton direction and the stepsize are truncated. We also present simulation results to illustrate the significant performance gains of this method over the subgradient methods currently used.

by Ermin Wei. S.M.
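To illustrate the kind of local iterative procedure referred to above, the following is a generic Jacobi-type splitting iteration for a Laplacian-structured linear system L w = b: each update at a node combines only its own data with its neighbors' current values, so it needs purely local information exchange. It assumes the system has been made solvable (e.g., regularized so that L is invertible and the iteration converges) and is only a sketch, not the thesis's actual matrix splitting scheme.

    import numpy as np

    def splitting_solve(L, b, num_iter=200):
        """Jacobi-type splitting iteration for L w = b.

        Splitting L = D - A with D = diag(L): each sweep computes
            w <- D^{-1} (A w + b),
        which at every node uses only that node's entry of b and the
        current values held by its neighbors.
        """
        D = np.diag(L)                    # diagonal (degree-like) entries
        A = np.diag(D) - L                # off-diagonal (neighbor) part
        w = np.zeros_like(b, dtype=float)
        for _ in range(num_iter):
            w = (A @ w + b) / D
        return w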
- …