
    A Convex Surrogate Operator for General Non-Modular Loss Functions

    Empirical risk minimization frequently employs convex surrogates to underlying discrete loss functions in order to achieve computational tractability during optimization. However, classical convex surrogates can only tightly bound modular, submodular, or supermodular loss functions separately while maintaining polynomial-time computation. In this work, a novel generic convex surrogate for general non-modular loss functions is introduced, which provides for the first time a tractable solution for loss functions that are neither supermodular nor submodular. This convex surrogate is based on a submodular-supermodular decomposition whose existence and uniqueness are proven in this paper. It takes the sum of two convex surrogates that separately bound the supermodular component and the submodular component using slack rescaling and the Lovász hinge, respectively. It is further proven that this surrogate is convex, piecewise linear, an extension of the loss function, and that its subgradient can be computed in polynomial time. Empirical results are reported on a non-submodular loss based on the Sørensen-Dice difference function, and on a real-world face track dataset with tens of thousands of frames, demonstrating the improved performance, efficiency, and scalability of the novel convex surrogate.
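    As a rough structural sketch of the construction described above (the symbols below are illustrative and not taken from the paper), the loss is first decomposed and the two components are then bounded separately:

        \ell \;=\; \ell_{\mathrm{sub}} + \ell_{\mathrm{sup}},
        \qquad
        B(f) \;=\; B_{\mathrm{Lov}}\!\left(\ell_{\mathrm{sub}}, f\right) \;+\; B_{\mathrm{slack}}\!\left(\ell_{\mathrm{sup}}, f\right),

    where B_Lov denotes the Lovász hinge applied to the submodular component and B_slack the slack-rescaling surrogate applied to the supermodular component; each term is convex in the score vector f and bounds its component, so the sum is a convex bound on \ell.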

    The Lovász Hinge: A Novel Convex Surrogate for Submodular Losses

    Learning with non-modular losses is an important problem when sets of predictions are made simultaneously. The main tools for constructing convex surrogate loss functions for set prediction are margin rescaling and slack rescaling. In this work, we show that these strategies lead to tight convex surrogates iff the underlying loss function is increasing in the number of incorrect predictions. However, gradient or cutting-plane computation for these functions is NP-hard for non-supermodular loss functions. We propose instead a novel surrogate loss function for submodular losses, the Lovász hinge, which leads to O(p log p) complexity with O(p) oracle accesses to the loss function to compute a gradient or cutting-plane. We prove that the Lovász hinge is convex and yields an extension. As a result, we have developed the first tractable convex surrogates in the literature for submodular losses. We demonstrate the utility of this novel convex surrogate through several set prediction tasks, including on the PASCAL VOC and Microsoft COCO datasets.
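    To illustrate the quoted complexity, the following is a minimal sketch, assuming the Lovász hinge is obtained by evaluating the Lovász extension of the submodular set loss at the clipped vector of hinge terms 1 - y_i f_i; the exact definition and its variants in the paper may differ, and all names here are hypothetical.

        import numpy as np

        def lovasz_hinge(loss_fn, margins):
            """Hedged sketch of a Lovász-hinge-style surrogate.

            loss_fn: set function mapping a boolean mask over the p items
                     (True = item counted as mispredicted) to a loss value,
                     with loss_fn(all False) == 0, assumed submodular.
            margins: array of per-item hinge terms s_i = 1 - y_i * f_i.

            The Lovász extension of loss_fn is evaluated at the clipped
            margins: sort them in decreasing order (O(p log p)) and weight
            each by a discrete derivative of the loss (O(p) oracle calls),
            matching the complexity quoted in the abstract.
            """
            s = np.maximum(margins, 0.0)        # clip negative margins
            order = np.argsort(-s)              # decreasing order, O(p log p)
            mask = np.zeros(len(s), dtype=bool)
            surrogate, prev_loss = 0.0, 0.0
            for i in order:                     # O(p) oracle accesses
                mask[i] = True
                cur_loss = loss_fn(mask)        # loss of the top-k sorted items
                surrogate += (cur_loss - prev_loss) * s[i]
                prev_loss = cur_loss
            return surrogate

        # usage: with a modular (Hamming-style) loss the surrogate reduces
        # to a sum of per-item hinges, illustrating the extension property
        hamming = lambda mask: float(mask.sum())
        f = np.array([0.8, -0.3, 1.2])   # predicted scores
        y = np.array([1.0, 1.0, -1.0])   # ground-truth labels in {-1, +1}
        print(lovasz_hinge(hamming, 1.0 - y * f))   # 3.7 = sum of the three hinges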

    Synergies between Numerical Methods for Kinetic Equations and Neural Networks

    The overarching theme of this work is the efficient computation of large-scale systems. Here we deal with two types of mathematical challenges, which are quite different at first glance but offer similar opportunities and challenges upon closer examination. Physical descriptions of phenomena and their mathematical modeling are performed on diverse scales, ranging from nano-scale interactions of single atoms to the macroscopic dynamics of the earth's atmosphere. We consider such systems of interacting particles and explore methods to simulate them efficiently and accurately, with a focus on the kinetic and macroscopic description of interacting particle systems. Macroscopic governing equations describe the evolution of a system in time and space, whereas the more fine-grained kinetic description additionally takes the particle velocity into account. Discretizing kinetic equations that depend on space, time, and velocity variables is challenging due to the need to preserve physical solution bounds, e.g. positivity, to avoid spurious artifacts, and to remain computationally efficient. In the pursuit of overcoming the challenge of computability in both kinetic and multi-scale modeling, a wide variety of approximative methods have been established in the realm of reduced-order modeling, surrogate modeling, and model compression. For kinetic models, this may manifest in hybrid numerical solvers that switch between macroscopic and mesoscopic simulation, asymptotic-preserving schemes that bridge the gap between both physical resolution levels, or surrogate models that operate on a kinetic level but replace computationally heavy operations of the simulation by fast approximations. Thus, for the simulation of kinetic and multi-scale systems with a high spatial resolution and long temporal horizon, the quote by Paul Dirac is as relevant as it was almost a century ago.

    The first goal of the dissertation is therefore the development of acceleration strategies for kinetic discretization methods that preserve the structure of their governing equations. In particular, we investigate the use of convex neural networks to accelerate the minimal entropy closure method. Further, we develop a neural network-based hybrid solver for multi-scale systems, where kinetic and macroscopic methods are chosen based on local flow conditions.

    Furthermore, we deal with the compression and efficient computation of neural networks. Neural networks are by now successfully used in various forms in countless scientific works and technical systems, with well-known applications in image recognition and computer-aided language translation, but also as surrogate models for numerical mathematics. Although the first neural networks were already presented in the 1950s, the scientific discipline has enjoyed increasing popularity mainly during the last 15 years, since only now is sufficient computing capacity available. Remarkably, the increasing availability of computing resources is accompanied by a hunger for larger models, fueled by the common conception among machine learning practitioners and researchers that more trainable parameters equal higher performance and better generalization capabilities. The increase in model size exceeds the growth of available computing resources by orders of magnitude. Since 2012, the computational resources used in the largest neural network models have doubled every 3.4 months (see https://openai.com/blog/ai-and-compute/), as opposed to Moore's Law, which proposes a 2-year doubling period in available computing power. To some extent, Dirac's statement also applies to the recent computational challenges in the machine-learning community. The desire to evaluate and train on resource-limited devices sparked interest in model compression, where neural networks are sparsified or factorized, typically after training. The second goal of this dissertation is thus a low-rank method, originating from numerical methods for kinetic equations, to compress neural networks by low-rank factorization already during training. This dissertation thus considers synergies between kinetic models, neural networks, and numerical methods in both disciplines to develop time-, memory- and energy-efficient computational methods for both research areas.
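    The low-rank compression idea can be illustrated with a minimal sketch of a factorized layer; this shows only the generic parameter saving of a rank-r factorization, not the dissertation's specific dynamical low-rank training scheme, and all names and sizes below are illustrative.

        import numpy as np

        # A dense weight matrix W (n_out x n_in) is replaced by factors
        # U (n_out x r), S (r x r), V (n_in x r) with r much smaller than
        # n_out and n_in, so the layer stores and multiplies
        # r*(n_out + n_in + r) numbers instead of n_out*n_in.
        rng = np.random.default_rng(0)
        n_out, n_in, r = 512, 784, 16

        U = rng.standard_normal((n_out, r)) / np.sqrt(n_out)
        S = np.eye(r)
        V = rng.standard_normal((n_in, r)) / np.sqrt(n_in)

        def low_rank_layer(x):
            """Forward pass x -> U S V^T x, evaluated right-to-left so that
            no n_out x n_in matrix is ever formed."""
            return U @ (S @ (V.T @ x))

        x = rng.standard_normal(n_in)
        y = low_rank_layer(x)

        dense_params = n_out * n_in                 # 401,408 entries
        lowrank_params = r * (n_out + n_in + r)     # 20,992 entries
        print(y.shape, dense_params, lowrank_params)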