
    When Deep Learning Meets Polyhedral Theory: A Survey

    In the past decade, deep learning became the prevalent methodology for predictive modeling thanks to the remarkable accuracy of deep neural networks in tasks such as computer vision and natural language processing. Meanwhile, the structure of neural networks converged back to simpler representations based on piecewise constant and piecewise linear functions such as the Rectified Linear Unit (ReLU), which became the most commonly used type of activation function in neural networks. That made certain types of network structure (such as the typical fully-connected feedforward neural network) amenable to analysis through polyhedral theory and to the application of methodologies such as Linear Programming (LP) and Mixed-Integer Linear Programming (MILP) for a variety of purposes. In this paper, we survey the main topics emerging from this fast-paced area of work, which bring a fresh perspective to understanding neural networks in more detail as well as to applying linear optimization techniques to train, verify, and reduce the size of such networks.
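    As an illustration of why ReLU networks fit naturally into MILP, consider one unit y = max(0, a) whose pre-activation a = wᵀx + b is assumed to lie in known bounds [L, U] with L < 0 < U. A common big-M encoding uses one binary z and the constraints y ≥ a, y ≥ 0, y ≤ a − L(1 − z), y ≤ Uz. The Python sketch below (helper names are hypothetical; the bounds L and U are assumed given, e.g. from bound propagation) fixes a, tries both values of z, and collects the admissible y, confirming that the encoding is exact.

```python
def feasible_y_interval(a: float, z: int, L: float, U: float):
    """Return the (lo, hi) interval of y allowed by the big-M ReLU constraints
    for a fixed pre-activation a and a fixed binary z, or None if it is empty."""
    # Lower bounds on y: y >= a and y >= 0.
    lo = max(a, 0.0)
    # Upper bounds on y: y <= a - L*(1 - z) and y <= U*z.
    hi = min(a - L * (1 - z), U * z)
    return (lo, hi) if lo <= hi else None


def relu_via_big_m(a: float, L: float, U: float) -> set:
    """Collect every y value the big-M encoding admits for this a."""
    assert L < 0 < U and L <= a <= U, "bounds must bracket a and straddle zero"
    values = set()
    for z in (0, 1):
        interval = feasible_y_interval(a, z, L, U)
        if interval is not None:
            lo, hi = interval
            # Each nonempty interval collapses to the single point max(0, a).
            assert lo == hi
            values.add(lo)
    return values
```

    For example, `relu_via_big_m(-1.5, L=-2.0, U=3.0)` returns `{0.0}` (only z = 0 is feasible) and `relu_via_big_m(2.0, L=-2.0, U=3.0)` returns `{2.0}` (only z = 1 is feasible). Tighter bounds L and U generally give a tighter LP relaxation of the same encoding.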

    Optimization under uncertainty and risk: Quadratic and copositive approaches

    Robust optimization and stochastic optimization are the two main paradigms for dealing with the uncertainty inherent in almost all real-world optimization problems. The core principle of robust optimization is the introduction of parameterized families of constraints. Sometimes, these complicated semi-infinite constraints can be reduced to finitely many convex constraints, so that the resulting optimization problem can be solved using standard procedures. Hence the flexibility of robust optimization is limited by certain convexity requirements on various objects. However, a recent strain of literature has sought to expand the applicability of robust optimization by lifting variables to a properly chosen matrix space. Doing so makes it possible to handle situations where convexity requirements are not met immediately, but rather intermediately. In the domain of (possibly nonconvex) quadratic optimization, the principles of copositive optimization act as a bridge leading to the recovery of the desired convex structures. Copositive optimization has established itself as a powerful paradigm for tackling a wide range of quadratically constrained quadratic optimization problems, reformulating them into linear convex-conic optimization problems with only linear constraints and a linear objective, plus constraints forcing membership in certain matrix cones, which can be thought of as generalizations of the positive-semidefinite matrix cone. These reformulations enable the application of powerful optimization techniques, most notably convex duality, to problems that are highly nonconvex in their original form. In this text we offer readers an introduction to and tutorial on these principles of copositive optimization, and provide a review of and outlook on the literature that applies them to optimization problems involving uncertainty.
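    To make "generalization of the positive-semidefinite cone" concrete: a symmetric matrix M is copositive if xᵀMx ≥ 0 for every entrywise nonnegative x, whereas PSD requires this for every x, so every PSD matrix is copositive but not conversely. Deciding copositivity is co-NP-hard in general, but a symmetric 2×2 matrix [[a, b], [b, c]] is copositive exactly when a ≥ 0, c ≥ 0 and b ≥ −√(ac). The Python sketch below (function names are hypothetical) compares this test with the 2×2 PSD test and adds a crude sampling check over the nonnegative quadrant.

```python
import math


def is_copositive_2x2(a: float, b: float, c: float) -> bool:
    """Copositivity of [[a, b], [b, c]]: x^T M x >= 0 for all x >= 0."""
    # Diagonal entries must be nonnegative (take x = e1 or x = e2). A negative
    # off-diagonal entry is dominated because a*x1^2 + c*x2^2 >= 2*sqrt(ac)*x1*x2.
    # The `and` short-circuits, so sqrt is only evaluated when a, c >= 0.
    return a >= 0 and c >= 0 and b >= -math.sqrt(a * c)


def is_psd_2x2(a: float, b: float, c: float) -> bool:
    """Positive semidefiniteness of [[a, b], [b, c]]: x^T M x >= 0 for all x."""
    return a >= 0 and c >= 0 and a * c - b * b >= 0


def sampled_min_quadratic_form(a: float, b: float, c: float, steps: int = 1000) -> float:
    """Rough numerical check: minimum of x^T M x over unit vectors x >= 0."""
    best = math.inf
    for k in range(steps + 1):
        t = (math.pi / 2) * k / steps  # angles sweeping the nonnegative quadrant
        x1, x2 = math.cos(t), math.sin(t)
        best = min(best, a * x1 * x1 + 2 * b * x1 * x2 + c * x2 * x2)
    return best
```

    With these helpers, [[1, 2], [2, 1]] is copositive (`is_copositive_2x2(1, 2, 1)` is `True`) but not PSD (`is_psd_2x2(1, 2, 1)` is `False`; its eigenvalues are 3 and −1), while [[1, −2], [−2, 1]] fails both tests.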

    Decomposition Methods in Column Generation and Data-Driven Stochastic Optimization

    In this thesis, we focus on tackling large-scale problems arising in two-stage stochastic optimization and the related Dantzig-Wolfe decomposition. We start with a deterministic setting, where we consider linear programs with a block structure whose data cannot be stored centrally due to privacy concerns or decentralized storage of large datasets. The larger portion of the thesis is dedicated to the stochastic setting, where we study two-stage distributionally robust optimization (DRO) under the Wasserstein ambiguity set to tackle problems with limited data. In Chapter 2, joint work with Shabbir Ahmed, we propose a fully distributed Dantzig-Wolfe decomposition (DWD) algorithm using the Alternating Direction Method of Multipliers (ADMM). DWD is a classical algorithm for solving large-scale linear programs whose constraint matrix consists of independent blocks coupled by a set of linking rows, but it requires solving a master problem centrally, which can be undesirable or infeasible in certain cases due to privacy concerns or decentralized storage of data. To this end, we develop a consensus-based Dantzig-Wolfe decomposition algorithm in which the master problem is solved in a distributed fashion. We detail the computational and algorithmic challenges of our method, provide bounds on the optimality gap and feasibility violation, and perform extensive computational experiments on instances of the cutting stock problem and on synthetic instances using a Message Passing Interface (MPI) implementation, obtaining high-quality solutions in reasonable time. In Chapters 3 and 4, we turn our focus to stochastic optimization, specifically to applications where data is scarce and the underlying probability distribution is difficult to estimate. Chapter 3 is joint work with Anirudh Subramanyam and Kibaek Kim. Here, we consider two-stage conic DRO under the Wasserstein ambiguity set with zero-one uncertainties. We are motivated by problems arising in network optimization, where binary random variables represent failures of network components. We are interested in applications where such failures are rare and have a high impact, making failure probabilities difficult to estimate. Using ideas from bilinear programming and penalty methods, we provide tractable approximations of our two-stage DRO model that can be iteratively improved using lift-and-project techniques. We illustrate the computational and out-of-sample performance of our method on the optimal power flow problem with random transmission line failures and on a multi-commodity network design problem with random node failures. In Chapter 4, joint work with Alejandro Toriello and George Nemhauser, we study a two-stage model arising in natural disaster management applications, where the first stage is a facility location problem, deciding where to open facilities and pre-allocate resources, and the second stage is a fixed-charge transportation problem, routing resources to affected areas after a disaster. We solve a two-stage DRO model under the Wasserstein ambiguity set to deal with the lack of available data. The presence of binary variables in the second stage significantly complicates the problem. We develop an efficient column-and-constraint generation algorithm by leveraging the structure of our support set and second-stage value function, and show that our results extend to the case where the second stage is a fixed-charge network flow problem. We provide a detailed discussion of our implementation and end the chapter with computational experiments on synthetic instances and a case study of hurricane threats to the coastal states of the United States. We end the thesis with concluding remarks and potential directions for future research.
    Ph.D.
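    The consensus idea behind solving a master problem in a distributed fashion can be illustrated with the textbook consensus form of ADMM: each agent i keeps a local copy x_i of a shared variable, and the copies are driven to agreement with a global value z. The Python sketch below is not the thesis's DWD algorithm. It assumes, purely for illustration, that agent i holds a private scalar objective f_i(x) = (x − a_i)²/2, so the consensus optimum is the mean of the a_i and convergence is easy to check. The penalty `rho` and the iteration count are arbitrary illustrative choices.

```python
def consensus_admm(local_targets, rho=1.0, iterations=200):
    """Minimize sum_i (x - a_i)^2 / 2 by consensus ADMM in scaled dual form.

    Each agent i only needs its own a_i, the current global value z, and its
    own scaled dual u_i; only the averaging step requires communication.
    """
    n = len(local_targets)
    x = [0.0] * n  # local copies of the shared variable
    u = [0.0] * n  # scaled dual variables, one per agent
    z = 0.0        # consensus (global) variable
    for _ in range(iterations):
        # Local step (parallel): argmin_x (x - a_i)^2/2 + rho/2 * (x - z + u_i)^2.
        x = [(a + rho * (z - ui)) / (1.0 + rho) for a, ui in zip(local_targets, u)]
        # Averaging step: the only point where agents exchange information
        # (in an MPI setting this would be an all-reduce).
        z = sum(xi + ui for xi, ui in zip(x, u)) / n
        # Dual step: each agent accumulates its disagreement with the consensus.
        u = [ui + xi - z for xi, ui in zip(x, u)]
    return z, x
```

    For instance, `consensus_admm([1.0, 2.0, 6.0])` returns a consensus value of 3.0 up to rounding, with every local copy converging to it. In a DWD setting the local objectives would instead come from the blocks of the master LP, which this toy example does not model.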

    (Global) Optimization: Historical notes and recent developments

    Recent developments in (Global) Optimization are surveyed in this paper. We collected and commented on a large number of recent references which, in our opinion, well represent the vitality, depth, and breadth of scope of current computational approaches and theoretical results about nonconvex optimization problems. Before presenting the recent developments, which are subdivided into two parts related to heuristic and exact approaches, respectively, we briefly sketch the origin of the discipline and observe what survived from the initial attempts, what was not considered at all, and a few approaches that have recently been rediscovered, mostly in connection with machine learning.

    Personalized Data-Driven Learning and Optimization: Theory and Applications to Healthcare

    This dissertation is broadly about developing new personalized data-driven learning and optimization methods with theoretical performance guarantees for three important applications in healthcare operations management and medical decision-making. In these research problems, we deal with longitudinal settings, where the decision-maker needs to make multi-stage personalized decisions while collecting data between stages. In each stage, the decision-maker incorporates the newly observed data to update the current model of, or belief about, the system, and thereby makes better decisions in the next stage. This new class of data-driven learning and optimization methods learns from data over time so as to make efficient and effective decisions for each individual in real time under dynamic, uncertain environments. The theoretical contributions lie in designing and analyzing these new predictive and prescriptive learning and optimization methods and proving performance guarantees for them. The practical contributions lie in applying these methods to unmet real-world needs in healthcare operations management and medical decision-making, yielding managerial and practical insights and new functionality.
    Ph.D., Industrial & Operations Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/167949/1/keyvan_1.pd

    Exploiting Spatio-Temporal Coherence for Video Object Detection in Robotics

    This paper proposes a method to enhance video object detection for indoor environments in robotics. Concretely, it exploits knowledge about the camera motion between frames to propagate previously detected objects to successive frames. The proposal is rooted in the concepts of planar homography, to propose regions of interest in which to search for objects, and recursive Bayesian filtering, to integrate observations over time. The proposal is evaluated on six virtual indoor environments, covering the detection of nine object classes over a total of ∼7k frames. Results show that our proposal improves recall and F1-score by factors of 1.41 and 1.27, respectively, and achieves a significant reduction of the object categorization entropy (58.8%) compared to a two-stage video object detection method used as a baseline, at the cost of small time overheads (120 ms) and precision loss (0.92).
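    The two ingredients named in this abstract can be sketched in a few lines of Python. The code assumes a 3×3 homography H that maps pixel coordinates in one frame to the next (how H is estimated from camera motion is not shown), warps the four corners of a previous detection's bounding box, and takes their axis-aligned hull as a region of interest. It then applies a recursive Bayesian update to a categorical belief over object classes, treating each detector score vector as a likelihood. The function names and this likelihood model are hypothetical illustrations, not the paper's implementation.

```python
import math


def warp_point(H, x, y):
    """Apply a 3x3 homography (nested lists) to pixel (x, y)."""
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return u / w, v / w  # convert back from homogeneous coordinates


def propagate_box(H, box):
    """Warp box = (x_min, y_min, x_max, y_max); return the axis-aligned hull."""
    x0, y0, x1, y1 = box
    corners = [warp_point(H, x, y) for x, y in ((x0, y0), (x1, y0), (x1, y1), (x0, y1))]
    xs = [p[0] for p in corners]
    ys = [p[1] for p in corners]
    return min(xs), min(ys), max(xs), max(ys)


def bayes_update(prior, likelihood):
    """One recursive Bayesian step over classes: posterior ∝ prior * likelihood."""
    unnormalized = [p * l for p, l in zip(prior, likelihood)]
    total = sum(unnormalized)
    return [q / total for q in unnormalized]


def entropy(dist):
    """Shannon entropy (in nats) of a categorical distribution."""
    return -sum(p * math.log(p) for p in dist if p > 0)
```

    For example, with the pure translation `H = [[1, 0, 5], [0, 1, -3], [0, 0, 1]]`, `propagate_box(H, (10, 20, 30, 40))` returns `(15.0, 17.0, 35.0, 37.0)`, and repeatedly fusing score vectors that favor the same class drives the entropy of a uniform prior down, which is the kind of effect the reported entropy reduction measures.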