
    Primal-Dual Rates and Certificates

    We propose an algorithm-independent framework to equip existing optimization methods with primal-dual certificates. Such certificates, and the corresponding convergence-rate guarantees, are important for practitioners to diagnose progress, in particular in machine learning applications. We obtain new primal-dual convergence rates, e.g., for the Lasso as well as many L1-, Elastic-Net-, group-Lasso- and TV-regularized problems. The theory applies to any norm-regularized generalized linear model. Our approach provides efficiently computable duality gaps which are globally defined, without modifying the original problem in the region of interest. Comment: appearing at ICML 2016 - Proceedings of the 33rd International Conference on Machine Learning, New York, NY, USA, 2016. JMLR: W&CP volume 4
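    For a concrete sense of what such a certificate looks like, here is a minimal Python sketch of a Lasso duality gap obtained by rescaling the residual into a dual-feasible point. The function name, the rescaling construction and the toy data are illustrative assumptions, not necessarily the paper's exact certificate.

```python
import numpy as np

def lasso_duality_gap(X, y, w, lam):
    """Duality gap for the Lasso P(w) = 0.5 * ||X w - y||^2 + lam * ||w||_1.
    A minimal sketch using the standard residual-rescaling trick to build a
    dual-feasible point (not necessarily the paper's exact construction)."""
    r = y - X @ w                                   # primal residual
    primal = 0.5 * r @ r + lam * np.abs(w).sum()
    # Rescale the residual so that ||X^T theta||_inf <= lam (dual feasibility).
    dual_norm = np.max(np.abs(X.T @ r))
    scale = 1.0 if dual_norm == 0 else min(1.0, lam / dual_norm)
    theta = scale * r
    dual = 0.5 * y @ y - 0.5 * np.sum((y - theta) ** 2)
    return primal - dual                            # >= 0, certifies suboptimality

# Usage: the gap upper-bounds P(w) - P(w*), so it can serve as a stopping criterion.
rng = np.random.default_rng(0)
X, y = rng.standard_normal((50, 20)), rng.standard_normal(50)
print(lasso_duality_gap(X, y, np.zeros(20), lam=0.1))
```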

    Efficient and Modular Implicit Differentiation

    Automatic differentiation (autodiff) has revolutionized machine learning. It allows expressing complex computations by composing elementary ones in creative ways and removes the burden of computing their derivatives by hand. More recently, differentiation of optimization problem solutions has attracted widespread attention, with applications such as optimization as a layer, and in bi-level problems such as hyper-parameter optimization and meta-learning. However, the formulas for these derivatives often involve tedious case-by-case mathematical derivations. In this paper, we propose a unified, efficient and modular approach for implicit differentiation of optimization problems. In our approach, the user defines (in Python in the case of our implementation) a function F capturing the optimality conditions of the problem to be differentiated. Once this is done, we leverage autodiff of F and implicit differentiation to automatically differentiate the optimization problem. Our approach thus combines the benefits of implicit differentiation and autodiff. It is efficient, as it can be added on top of any state-of-the-art solver, and modular, as the optimality condition specification is decoupled from the implicit differentiation mechanism. We show that seemingly simple principles allow us to recover many recently proposed implicit differentiation methods and to create new ones easily. We demonstrate the ease of formulating and solving bi-level optimization problems using our framework. We also showcase an application to the sensitivity analysis of molecular dynamics. Comment: V2: some corrections and a link to the software.
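    As a rough illustration of the principle (not the paper's library API), the sketch below hand-rolls the implicit function theorem in JAX for a ridge-regression inner problem: the user writes the optimality condition F, and autodiff of F supplies the Jacobians needed to differentiate the solution with respect to the regularization parameter. The function names and the choice of inner problem are assumptions made for illustration.

```python
import jax
import jax.numpy as jnp

# Optimality condition F(w, lam) = 0 for ridge regression:
# grad_w [0.5 * ||X w - y||^2 + 0.5 * lam * ||w||^2] = X^T (X w - y) + lam * w.
def F(w, lam, X, y):
    return X.T @ (X @ w - y) + lam * w

def solve(lam, X, y):
    # Inner solver (closed form here; any black-box solver would do).
    return jnp.linalg.solve(X.T @ X + lam * jnp.eye(X.shape[1]), X.T @ y)

def dsolution_dlam(lam, X, y):
    """Differentiate the solution w*(lam) via the implicit function theorem:
    dw*/dlam = -(dF/dw)^{-1} dF/dlam, with both Jacobians from autodiff of F."""
    w_star = solve(lam, X, y)
    dF_dw = jax.jacobian(F, argnums=0)(w_star, lam, X, y)
    dF_dlam = jax.jacobian(F, argnums=1)(w_star, lam, X, y)
    return -jnp.linalg.solve(dF_dw, dF_dlam)

X = jax.random.normal(jax.random.PRNGKey(0), (30, 5))
y = jax.random.normal(jax.random.PRNGKey(1), (30,))
print(dsolution_dlam(0.5, X, y))
```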

    Acceleration Methods for Classic Convex Optimization Algorithms

    Unpublished doctoral thesis, defended at the Universidad Autónoma de Madrid, Escuela Politécnica Superior, Departamento de Ingeniería Informática. Defense date: 12-09-2017.
    Most Machine Learning models are defined in terms of a convex optimization problem, so developing algorithms to solve such problems quickly is of great interest to the field. This thesis focuses on two of the most widely used models, the Lasso and Support Vector Machines. The former belongs to the family of regularization methods; it was introduced in 1996 to perform variable selection and regression at the same time. This is accomplished by adding an ℓ1-regularization term to the least-squares model, achieving interpretability as well as a good generalization error. Support Vector Machines were originally formulated to solve a classification problem by finding the maximum-margin hyperplane, that is, the hyperplane that separates two sets of points and is at equal distance from both of them. SVMs were later extended to handle non-separable classes and non-linear classification problems by applying the kernel trick. A first contribution of this work is a careful analysis of all the existing algorithms for both problems, describing not only the theory behind them but also the possible advantages and disadvantages of each. Although the Lasso and SVMs solve very different problems, this thesis shows that the two are equivalent: following a recent result by Jaggi, given an instance of one model we can construct an instance of the other with the same solution, and vice versa. This equivalence makes it possible to translate theoretical and practical results, such as algorithms, between two fields that had otherwise developed independently. The thesis presents not only the theoretical result but also a practical application, which consists in solving the Lasso problem with the SMO algorithm, the state-of-the-art solver for non-linear SVMs. Experiments comparing SMO to GLMNet, one of the most popular Lasso solvers, show that SMO is competitive with GLMNet and sometimes even faster. Furthermore, motivated by a recent trend in which classical optimization methods are being rediscovered in improved forms and successfully applied to many problems, two classical momentum-based methods are also analyzed: the Heavy Ball algorithm, introduced by Polyak in 1963, and Nesterov's Accelerated Gradient, introduced by Nesterov in 1983. The thesis develops practical versions of Conjugate Gradient, which is essentially equivalent to the Heavy Ball method, and of Nesterov's acceleration for the SMO algorithm. Experiments comparing the convergence of all the methods show that the proposed algorithms can achieve faster convergence both in terms of iterations and of execution time.
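    For reference, the two momentum updates mentioned above can be written in a few lines. The sketch below is a generic Python illustration on a toy quadratic (the step size, momentum value and test function are assumptions), not the thesis's SMO-specific variants.

```python
import numpy as np

def heavy_ball(grad, x0, alpha=0.1, beta=0.9, iters=200):
    # Polyak (1963): x_{k+1} = x_k - alpha * grad(x_k) + beta * (x_k - x_{k-1})
    x_prev, x = x0.copy(), x0.copy()
    for _ in range(iters):
        x_next = x - alpha * grad(x) + beta * (x - x_prev)
        x_prev, x = x, x_next
    return x

def nesterov(grad, x0, alpha=0.1, beta=0.9, iters=200):
    # Nesterov (1983): the gradient is evaluated at the extrapolated point
    # y_k = x_k + beta * (x_k - x_{k-1});  x_{k+1} = y_k - alpha * grad(y_k)
    x_prev, x = x0.copy(), x0.copy()
    for _ in range(iters):
        y = x + beta * (x - x_prev)
        x_prev, x = x, y - alpha * grad(y)
    return x

# Toy usage on a strongly convex quadratic f(x) = 0.5 * x^T A x - b^T x.
A, b = np.diag([1.0, 10.0]), np.array([1.0, 1.0])
grad = lambda x: A @ x - b
print(heavy_ball(grad, np.zeros(2)), nesterov(grad, np.zeros(2)))
```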

    Network-Based Biomarker Discovery: Development of Prognostic Biomarkers for Personalized Medicine by Integrating Data and Prior Knowledge

    Advances in genome science and technology offer a deeper understanding of biology while at the same time improving the practice of medicine. Expression profiling of diseases such as cancer makes it possible to identify marker genes that can be used to diagnose a disease or predict future disease outcomes. Marker genes (biomarkers) are selected by scoring how well their expression levels discriminate between different classes of disease or between groups of patients with different clinical outcomes (e.g., therapy response or survival time). A current challenge is to identify new markers that are directly related to the underlying disease mechanism.
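    As a minimal illustration of the per-gene discrimination scoring described above, the sketch below ranks genes by a Welch t-statistic between two patient groups. This univariate baseline is only an assumption for illustration, not the network-based, prior-knowledge-integrating method developed in this work.

```python
import numpy as np
from scipy import stats

def rank_marker_genes(expr, labels, top_k=10):
    """Score each gene by how well its expression separates two patient groups
    (Welch t-test), then rank. A simple univariate baseline only."""
    expr = np.asarray(expr)          # shape (n_patients, n_genes)
    labels = np.asarray(labels)      # binary outcome per patient (0/1)
    group0, group1 = expr[labels == 0], expr[labels == 1]
    t, _ = stats.ttest_ind(group0, group1, axis=0, equal_var=False)
    order = np.argsort(-np.abs(t))   # most discriminative genes first
    return order[:top_k], t[order[:top_k]]

# Usage with random toy data (100 patients x 500 genes).
rng = np.random.default_rng(0)
expr = rng.normal(size=(100, 500))
labels = rng.integers(0, 2, size=100)
expr[labels == 1, :5] += 1.0         # make the first 5 genes informative
print(rank_marker_genes(expr, labels))
```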

    Building Predictive Models in R Using the caret Package

    The caret package, short for classification and regression training, contains numerous tools for developing predictive models using the rich set of models available in R. The package focuses on simplifying model training and tuning across a wide variety of modeling techniques. It also includes methods for pre-processing training data, calculating variable importance, and visualizing models. An example from computational chemistry is used to illustrate the functionality on a real data set and to benchmark the benefits of parallel processing with several types of models.