7,152 research outputs found
Q-Learning for Continuous Actions with Cross-Entropy Guided Policies
Off-Policy reinforcement learning (RL) is an important class of methods for
many problem domains, such as robotics, where the cost of collecting data is
high and on-policy methods are consequently intractable. Standard methods for
applying Q-learning to continuous-valued action domains involve iteratively
sampling the Q-function to find a good action (e.g. via hill-climbing), or by
learning a policy network at the same time as the Q-function (e.g. DDPG). Both
approaches make tradeoffs between stability, speed, and accuracy. We propose a
novel approach, called Cross-Entropy Guided Policies, or CGP, that draws
inspiration from both classes of techniques. CGP aims to combine the stability
and performance of iterative sampling policies with the low computational cost
of a policy network. Our approach trains the Q-function using iterative
sampling with the Cross-Entropy Method (CEM), while training a policy network
to imitate CEM's sampling behavior. We demonstrate that our method is more
stable to train than state-of-the-art policy network methods, matches their
inference-time compute cost, and achieves competitive total reward on standard
benchmarks.
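A minimal numpy sketch of the kind of CEM sampling policy the abstract describes: actions are drawn from a Gaussian, the top-scoring samples under the Q-function are kept as elites, and the sampling distribution is refit to them. The toy Q-function and all constants here are illustrative assumptions, not the paper's setup.

```python
import numpy as np

def toy_q(state, actions):
    # Hypothetical Q-function for illustration: peaked at action = 0.3 * state.
    return -np.square(actions - 0.3 * state)

def cem_action(state, q_fn, n_iters=10, pop=64, elite_frac=0.25, seed=0):
    rng = np.random.default_rng(seed)
    mu, sigma = 0.0, 1.0                     # initial sampling distribution
    n_elite = int(pop * elite_frac)
    for _ in range(n_iters):
        actions = rng.normal(mu, sigma, size=pop)
        q_vals = q_fn(state, actions)
        elites = actions[np.argsort(q_vals)[-n_elite:]]  # top-Q samples
        mu, sigma = elites.mean(), elites.std() + 1e-6   # refit distribution
    return mu

best = cem_action(2.0, toy_q)
print(best)   # converges near the optimum 0.3 * 2.0 = 0.6
```

In CGP, a policy network would then be trained to imitate the action this iterative sampler returns, so that inference needs only a single forward pass.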
UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction
UMAP (Uniform Manifold Approximation and Projection) is a novel manifold
learning technique for dimension reduction. UMAP is constructed from a
theoretical framework based in Riemannian geometry and algebraic topology. The
result is a practical scalable algorithm that applies to real world data. The
UMAP algorithm is competitive with t-SNE for visualization quality, and
arguably preserves more of the global structure with superior run time
performance. Furthermore, UMAP has no computational restrictions on embedding
dimension, making it viable as a general purpose dimension reduction technique
for machine learning.
Comment: Reference implementation available at http://github.com/lmcinnes/uma
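One ingredient of UMAP's construction can be sketched compactly: converting a point's k-nearest-neighbor distances into fuzzy edge weights via the smoothed-kNN calibration (rho is the distance to the nearest neighbor; sigma is found by binary search so the weights sum to log2(k)). This numpy sketch is only an illustration of that step, not the reference implementation.

```python
import numpy as np

def smooth_knn_weights(dists, n_iter=64):
    """dists: sorted positive distances from one point to its k neighbors."""
    k = len(dists)
    rho = dists[0]                 # local connectivity: nearest neighbor
    target = np.log2(k)            # calibration target for the weight sum
    lo, hi, sigma = 0.0, np.inf, 1.0
    for _ in range(n_iter):        # binary search for sigma
        s = np.exp(-np.maximum(dists - rho, 0.0) / sigma).sum()
        if s > target:
            hi = sigma
            sigma = (lo + hi) / 2
        else:
            lo = sigma
            sigma = sigma * 2 if np.isinf(hi) else (lo + hi) / 2
    return np.exp(-np.maximum(dists - rho, 0.0) / sigma)

w = smooth_knn_weights(np.array([0.5, 1.0, 1.5, 2.0]))
print(w)   # the nearest neighbor always gets weight exactly 1.0
```

These per-point weights are then symmetrized into a fuzzy graph whose low-dimensional layout is optimized, which is where the Riemannian-geometry framing enters.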
ADMM-SOFTMAX: An ADMM Approach for Multinomial Logistic Regression
We present ADMM-Softmax, an alternating direction method of multipliers
(ADMM) for solving multinomial logistic regression (MLR) problems. Our method
is geared toward supervised classification tasks with many examples and
features. It decouples the nonlinear optimization problem in MLR into three
steps that can be solved efficiently. In particular, each iteration of
ADMM-Softmax consists of a linear least-squares problem, a set of independent
small-scale smooth, convex problems, and a trivial dual variable update.
Solution of the least-squares problem can be accelerated by pre-computing a
factorization or preconditioner, and the separability in the smooth, convex
problem can be easily parallelized across examples. For two image
classification problems, we demonstrate that ADMM-Softmax leads to improved
generalization compared to Newton-Krylov, quasi-Newton, and stochastic
gradient descent methods.
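The three-step alternating structure described above can be sketched on a toy problem (the data, dimensions, penalty parameter, and inner solver below are all made-up illustrations, not the authors' code): a linear least-squares W-update, independent small smooth convex Z-updates solved here with a few gradient steps, and a trivial dual update.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, c = 60, 5, 3                         # examples, features, classes
X = rng.normal(size=(n, d))
y = rng.integers(0, c, size=n)
Y = np.eye(c)[y]                           # one-hot labels

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def ce_loss(W):
    return -np.mean(np.log(softmax(X @ W)[np.arange(n), y]))

rho = 1.0
W = np.zeros((d, c))
Z = X @ W                                  # auxiliary variable: Z = XW
U = np.zeros_like(Z)                       # scaled dual variable
loss0 = ce_loss(W)
for _ in range(50):
    # Step 1: W-update -- a linear least-squares problem.
    W = np.linalg.lstsq(X, Z + U, rcond=None)[0]
    # Step 2: Z-update -- independent small smooth convex problems,
    # min_z CE(z, y_i) + (rho/2)||z - (W x_i - u_i)||^2, via gradient steps.
    V = X @ W - U
    for _ in range(20):
        Z -= 0.1 * ((softmax(Z) - Y) + rho * (Z - V))
    # Step 3: trivial dual variable update.
    U += Z - X @ W
print(f"CE loss: {loss0:.3f} -> {ce_loss(W):.3f}")
```

Because the Z-update decouples row by row, it is the step that parallelizes across examples, and the least-squares system in step 1 is where a cached factorization pays off.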
NeuralNetwork-Viterbi: A Framework for Weakly Supervised Video Learning
Video learning is an important task in computer vision and has experienced
increasing interest over recent years. Since even a small number of videos
easily comprises several million frames, methods that do not rely on a
frame-level annotation are of special importance. In this work, we propose a
novel learning algorithm with a Viterbi-based loss that allows for online and
incremental learning of weakly annotated video data. We moreover show that
explicit context and length modeling leads to huge improvements in video
segmentation and labeling tasks and include these models in our framework. On
several action segmentation benchmarks, we obtain an improvement of up to 10%
compared to current state-of-the-art methods.
Comment: CVPR 201
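The Viterbi decoding at the heart of such weakly supervised pipelines is a short dynamic program: given per-frame label scores from a network and a transition model, it finds the highest-scoring label path. The scores and the sticky transition matrix below are illustrative assumptions.

```python
import numpy as np

def viterbi(frame_log_probs, log_trans):
    """frame_log_probs: (T, L) per-frame scores; log_trans: (L, L) transitions."""
    T, L = frame_log_probs.shape
    dp = np.full((T, L), -np.inf)          # best score ending in each label
    back = np.zeros((T, L), dtype=int)     # backpointers
    dp[0] = frame_log_probs[0]
    for t in range(1, T):
        cand = dp[t - 1][:, None] + log_trans        # (prev label, cur label)
        back[t] = cand.argmax(axis=0)
        dp[t] = cand.max(axis=0) + frame_log_probs[t]
    path = [int(dp[-1].argmax())]
    for t in range(T - 1, 0, -1):          # follow backpointers
        path.append(int(back[t][path[-1]]))
    return path[::-1]

scores = np.log(np.array([[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]]))
trans = np.log(np.array([[0.9, 0.1], [0.1, 0.9]]))   # sticky transitions
print(viterbi(scores, trans))   # → [0, 0, 0]
```

The example shows the value of explicit context modeling: the sticky transitions keep label 0 even though the last frame's score favors label 1.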
Stochastic Nonconvex Optimization with Large Minibatches
We study stochastic optimization of nonconvex loss functions, which are
typical objectives for training neural networks. We propose stochastic
approximation algorithms which optimize a series of regularized, nonlinearized
losses on large minibatches of samples, using only first-order gradient
information. Our algorithms provably converge to an approximate critical point
of the expected objective with faster rates than minibatch stochastic gradient
descent, and facilitate better parallelization by allowing larger minibatches.
Comment: Accepted by the ALT 201
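The scheme described above can be sketched on a toy nonconvex problem (the loss, constants, and inner solver here are made-up illustrations): each outer step draws a large minibatch and approximately minimizes a regularized minibatch loss with several first-order steps, rather than taking a single SGD step per batch.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=5000)

def grad_f(w, batch):
    # Nonconvex toy loss per sample: 1 - cos(w - x); gradient: sin(w - x).
    return np.sin(w - batch).mean()

w, lam = 3.0, 1.0
for _ in range(30):                        # outer iterations
    batch = rng.choice(data, size=1000)    # large minibatch
    w_ref = w
    for _ in range(25):                    # inner first-order solver on the
        # regularized minibatch loss f_B(w) + (lam/2)(w - w_ref)^2
        w -= 0.2 * (grad_f(w, batch) + lam * (w - w_ref))
full_grad = abs(grad_f(w, data))
print(f"|grad| at solution: {full_grad:.3f}")
```

The inner loop is embarrassingly parallel over the minibatch, which is the parallelization benefit the abstract refers to.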
Algorithm Runtime Prediction: Methods & Evaluation
Perhaps surprisingly, it is possible to predict how long an algorithm will
take to run on a previously unseen input, using machine learning techniques to
build a model of the algorithm's runtime as a function of problem-specific
instance features. Such models have important applications to algorithm
analysis, portfolio-based algorithm selection, and the automatic configuration
of parameterized algorithms. Over the past decade, a wide variety of techniques
have been studied for building such models. Here, we describe extensions and
improvements of existing models, new families of models, and -- perhaps most
importantly -- a much more thorough treatment of algorithm parameters as model
inputs. We also comprehensively describe new and existing features for
predicting algorithm runtime for propositional satisfiability (SAT), travelling
salesperson (TSP) and mixed integer programming (MIP) problems. We evaluate
these innovations through the largest empirical analysis of its kind, comparing
to a wide range of runtime modelling techniques from the literature. Our
experiments consider 11 algorithms and 35 instance distributions; they also
span a very wide range of SAT, MIP, and TSP instances, with the least
structured having been generated uniformly at random and the most structured
having emerged from real industrial applications. Overall, we demonstrate that
our new models yield substantially better runtime predictions than previous
approaches in terms of their generalization to new problem instances, to new
algorithms from a parameterized space, and to both simultaneously.
Comment: 51 pages, 13 figures, 8 tables. Added references, feature cost, and
experiments with subsets of features; reworded Sections 1&
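The core idea can be illustrated with synthetic data (the features, the "true" runtime law, and the linear model below are all assumptions for demonstration, not the paper's models): learn a mapping from problem-instance features to log runtime, then predict runtime for unseen instances.

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test = 200, 50
# Hypothetical instance features, e.g. [n_variables, clause/variable ratio].
feats = rng.uniform([100, 3.0], [1000, 5.0], size=(n_train + n_test, 2))
# Synthetic "true" log runtime: a smooth function of the features plus noise.
log_rt = (1.5 * np.log(feats[:, 0]) + 0.8 * feats[:, 1]
          + rng.normal(0, 0.1, len(feats)))

# Fit a linear model of log runtime on (log n_vars, ratio, bias).
Phi = np.column_stack([np.log(feats[:, 0]), feats[:, 1], np.ones(len(feats))])
coef = np.linalg.lstsq(Phi[:n_train], log_rt[:n_train], rcond=None)[0]

pred = Phi[n_train:] @ coef
rmse = np.sqrt(np.mean((pred - log_rt[n_train:]) ** 2))
print(f"test RMSE in log runtime: {rmse:.3f}")
```

Modeling the logarithm of runtime, as here, is standard in this literature because runtimes span many orders of magnitude; the paper's contribution is richer model families and the treatment of algorithm parameters as additional inputs.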
Gradient Hyperalignment for multi-subject fMRI data alignment
Multi-subject fMRI data analysis is an interesting and challenging problem in
human brain decoding studies. The inherent anatomical and functional
variability across subjects makes it necessary to do both anatomical and
functional alignment before classification analysis. Besides, when it comes to
big data, time complexity becomes a problem that cannot be ignored. This paper
proposes Gradient Hyperalignment (Gradient-HA) as a gradient-based functional
alignment method that is suitable for multi-subject fMRI datasets with large
amounts of samples and voxels. The advantage of Gradient-HA is that it can
solve independence and high dimension problems by using Independent Component
Analysis (ICA) and Stochastic Gradient Ascent (SGA). Validation using
multi-classification tasks on big data demonstrates that the Gradient-HA
method has lower time complexity and better or comparable performance compared
with other state-of-the-art functional alignment methods.
Comment: 15th Pacific Rim International Conference on Artificial Intelligence
(PRICAI 2018), Nanjing, China, August 28-31, 201
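The functional-alignment core that hyperalignment methods build on can be sketched in a few lines: find an orthogonal transform mapping one subject's response matrix onto another's (orthogonal Procrustes, solved in closed form by an SVD). Gradient-HA scales this idea up with ICA and stochastic gradient ascent; the closed-form version below, on synthetic data, is only an illustration of the alignment step itself.

```python
import numpy as np

rng = np.random.default_rng(0)
shared = rng.normal(size=(100, 20))          # shared responses (samples x voxels)
R_true, _ = np.linalg.qr(rng.normal(size=(20, 20)))   # random orthogonal map
subj_a = shared
subj_b = shared @ R_true + 0.01 * rng.normal(size=shared.shape)

# Orthogonal Procrustes: R = U V^T from the SVD of B^T A minimizes ||B R - A||_F.
U, _, Vt = np.linalg.svd(subj_b.T @ subj_a)
R = U @ Vt

err_before = np.linalg.norm(subj_b - subj_a)
err_after = np.linalg.norm(subj_b @ R - subj_a)
print(f"misalignment: {err_before:.2f} -> {err_after:.2f}")
```

The closed-form SVD costs cubically in the voxel count, which is exactly the bottleneck a stochastic-gradient variant like Gradient-HA is designed to avoid on large fMRI datasets.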
The Full Spectrum of Deepnet Hessians at Scale: Dynamics with SGD Training and Sample Size
We apply state-of-the-art tools in modern high-dimensional numerical linear
algebra to approximate efficiently the spectrum of the Hessian of modern
deepnets, with tens of millions of parameters, trained on real data. Our
results corroborate previous findings, based on small-scale networks, that the
Hessian exhibits "spiked" behavior, with several outliers isolated from a
continuous bulk. We decompose the Hessian into different components and study
how each component evolves with training and with sample size.
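A compact numpy sketch of the kind of tool involved: the Lanczos iteration approximates extreme eigenvalues of a huge symmetric matrix using only matrix-vector products, which is how deepnet Hessians with millions of parameters become tractable. The small dense matrix with a planted spike below stands in for a Hessian-vector-product oracle; it is an illustration, not the paper's pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
A = rng.normal(size=(n, n)); A = (A + A.T) / 2   # symmetric "Hessian" bulk
v = rng.normal(size=n)
A += 0.5 * np.outer(v, v)                        # plant a spiked outlier

def hvp(x):                                      # matvec oracle only
    return A @ x

def lanczos_top_eig(hvp, n, k=40, seed=1):
    rng = np.random.default_rng(seed)
    q = rng.normal(size=n); q /= np.linalg.norm(q)
    Q, alphas, betas = [q], [], []
    beta, q_prev = 0.0, np.zeros(n)
    for _ in range(k):
        w = hvp(q) - beta * q_prev               # three-term recurrence
        alpha = q @ w
        w -= alpha * q
        w -= np.array(Q).T @ (np.array(Q) @ w)   # full reorthogonalization
        beta = np.linalg.norm(w)
        alphas.append(alpha); betas.append(beta)
        if beta < 1e-12:
            break
        q_prev, q = q, w / beta
        Q.append(q)
    # Eigenvalues of the small tridiagonal T approximate extreme eigenvalues of A.
    T = np.diag(alphas) + np.diag(betas[:-1], 1) + np.diag(betas[:-1], -1)
    return np.linalg.eigvalsh(T).max()

est = lanczos_top_eig(hvp, n)
exact = np.linalg.eigvalsh(A).max()
print(f"Lanczos estimate: {est:.2f}, exact: {exact:.2f}")
```

The planted rank-one spike sits well outside the random bulk, mimicking the "spiked" Hessian spectra the abstract describes; Lanczos locks onto such isolated outliers in very few iterations.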
Automated Synthesis of Safe Digital Controllers for Sampled-Data Stochastic Nonlinear Systems
We present a new method for the automated synthesis of digital controllers
with formal safety guarantees for systems with nonlinear dynamics, noisy output
measurements, and stochastic disturbances. Our method derives digital
controllers such that the corresponding closed-loop system, modeled as a
sampled-data stochastic control system, satisfies a safety specification with
probability above a given threshold. The proposed synthesis method alternates
between two steps: generation of a candidate controller pc, and verification of
the candidate. pc is found by maximizing a Monte Carlo estimate of the safety
probability, and by using a non-validated ODE solver for simulating the system.
Such a candidate is therefore sub-optimal but can be generated very rapidly. To
rule out unstable candidate controllers, we prove and utilize Lyapunov's
indirect method for instability of sampled-data nonlinear systems. In the
subsequent verification step, we use a validated solver based on SMT
(Satisfiability Modulo Theories) to compute a numerically and statistically
valid confidence interval for the safety probability of pc. If the probability
so obtained is not above the threshold, we expand the search space for
candidates by increasing the controller degree. We evaluate our technique on
three case studies: an artificial pancreas model, a powertrain control model,
and a quadruple-tank process.
Comment: 12 pages, 4 figures, 4 tables
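The statistical core of the candidate-generation step can be illustrated in a few lines: estimate a closed-loop safety probability by Monte Carlo simulation and attach a confidence interval. The one-dimensional noisy plant, the controller gain, and the normal-approximation interval below are stand-ins for exposition, not the paper's case studies or its SMT-validated interval.

```python
import numpy as np

rng = np.random.default_rng(0)

def run_episode(k, rng, n_steps=50, dt=0.1):
    x = 1.0
    for _ in range(n_steps):
        x_meas = x + rng.normal(0, 0.05)          # noisy output measurement
        u = -k * x_meas                           # candidate digital controller
        x = x + dt * (0.5 * x + u) + rng.normal(0, 0.02)  # stochastic dynamics
        if abs(x) > 2.0:                          # safety specification violated
            return False
    return True

n = 2000
safe = sum(run_episode(1.5, rng) for _ in range(n))
p_hat = safe / n
half = 1.96 * np.sqrt(p_hat * (1 - p_hat) / n)   # normal-approx 95% CI
print(f"estimated safety probability: {p_hat:.3f} +/- {half:.3f}")
```

In the paper this fast, non-validated estimate only proposes candidates; the verification step recomputes a numerically and statistically valid interval with a validated SMT-based solver before accepting a controller.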
Proximal Backpropagation
We propose proximal backpropagation (ProxProp) as a novel algorithm that
takes implicit instead of explicit gradient steps to update the network
parameters during neural network training. Our algorithm is motivated by the
step size limitation of explicit gradient descent, which poses an impediment
for optimization. ProxProp is developed from a general point of view on the
backpropagation algorithm, currently the most common technique to train neural
networks via stochastic gradient descent and variants thereof. Specifically, we
show that backpropagation of a prediction error is equivalent to sequential
gradient descent steps on a quadratic penalty energy, which comprises the
network activations as variables of the optimization. We further analyze
theoretical properties of ProxProp and in particular prove that the algorithm
yields a descent direction in parameter space and can therefore be combined
with a wide variety of convergent algorithms. Finally, we devise an efficient
numerical implementation that integrates well with popular deep learning
frameworks. We conclude by demonstrating promising numerical results and show
that ProxProp can be effectively combined with common first order optimizers
such as Adam.
Comment: Published as a conference paper at ICLR 201
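The step-size limitation that motivates ProxProp can be illustrated on a toy quadratic energy 0.5 x^T A x (this numpy example is a sketch of the implicit-versus-explicit distinction, not the ProxProp algorithm itself): the explicit step x - tau*A@x diverges once tau exceeds 2/L, while the implicit (proximal) step solves (I + tau*A) x_new = x and is stable for every tau > 0.

```python
import numpy as np

A = np.diag([100.0, 1.0])          # ill-conditioned quadratic, L = 100
tau = 0.1                          # far above the explicit limit 2/L = 0.02
I = np.eye(2)

x_exp = np.array([1.0, 1.0])
x_imp = np.array([1.0, 1.0])
for _ in range(100):
    x_exp = x_exp - tau * A @ x_exp              # explicit gradient step
    x_imp = np.linalg.solve(I + tau * A, x_imp)  # implicit (proximal) step

print(np.linalg.norm(x_exp) > 1e6)   # True: explicit iteration blows up
print(np.linalg.norm(x_imp) < 1e-3)  # True: implicit iteration converges
```

Taking implicit steps on the layer-wise penalty terms is what lets ProxProp use step sizes that would destabilize plain backpropagation, while still yielding a descent direction that composes with optimizers such as Adam.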