AutoLoss: Learning Discrete Schedules for Alternate Optimization
Many machine learning problems involve iteratively and alternately optimizing
different task objectives with respect to different sets of parameters.
Appropriately scheduling the optimization of a task objective or a set of
parameters is usually crucial to the quality of convergence. In this paper, we
present AutoLoss, a meta-learning framework that automatically learns and
determines the optimization schedule. AutoLoss provides a generic way to
represent and learn the discrete optimization schedule from metadata, allowing
for a dynamic and data-driven schedule in ML problems that involve alternating
updates of different parameter sets or of different loss objectives. We apply
AutoLoss on four ML tasks: d-ary quadratic regression, classification using a
multi-layer perceptron (MLP), image generation using GANs, and multi-task
neural machine translation (NMT). We show that the AutoLoss controller is able
to capture the distribution of better optimization schedules that result in
higher quality of convergence on all four tasks. The trained AutoLoss
controller is generalizable -- it can guide and improve the learning of a new
task model with different specifications, or on different datasets.
Comment: 19-page manuscript. The first two authors contributed equally.
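As an illustration of the idea, here is a minimal sketch of a schedule
controller: a softmax policy that picks which objective to update next and is
trained with a REINFORCE-style update. The class name, features, and reward
choices below are illustrative assumptions, not the paper's implementation.

    import numpy as np

    class ScheduleController:
        # Toy stand-in for a learned optimization schedule: a softmax
        # policy over K objectives, updated with a REINFORCE-style rule.
        def __init__(self, n_features, n_objectives, lr=0.01):
            self.W = np.zeros((n_objectives, n_features))
            self.lr = lr

        def probs(self, features):
            logits = self.W @ features
            e = np.exp(logits - logits.max())
            return e / e.sum()

        def sample(self, features):
            p = self.probs(features)
            return np.random.choice(len(p), p=p)

        def update(self, features, action, reward):
            # Gradient of log pi(action | features), scaled by the reward.
            p = self.probs(features)
            grad = -np.outer(p, features)
            grad[action] += features
            self.W += self.lr * reward * grad

    # Usage: features might be recent loss values; the reward, the resulting
    # improvement in validation loss. For a GAN, action 0 could mean "update
    # the generator" and action 1 "update the discriminator".
    ctrl = ScheduleController(n_features=2, n_objectives=2)
    action = ctrl.sample(np.array([0.5, 1.2]))
    ctrl.update(np.array([0.5, 1.2]), action, reward=0.1)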
Approaches for MATLAB Applications Acceleration Using High Performance Reconfigurable Computers
A lot of raw computing power is needed in many scientific computing applications and simulations. MATLAB® is one of the popular choices as a language for technical computing. Presented here are approaches for accelerating MATLAB-based applications using High Performance Reconfigurable Computing (HPRC) machines. Typically, these are clusters of von Neumann architecture based systems with zero or more FPGA reconfigurable boards. As a case study, an Image Correlation Algorithm has been ported to this architecture platform. As a second case study, the recursive training process used to realize an optimum Artificial Neural Network (ANN) has been accelerated by porting it to HPC systems. The approaches taken are analyzed with respect to target scenarios, the end user's perspective, programming efficiency, and performance. Disclaimer: Some material in this text has been used and reproduced with appropriate references and permissions where required. MATLAB® is a registered trademark of The MathWorks, Inc.
Deterministic Non-Autoregressive Neural Sequence Modeling by Iterative Refinement
We propose a conditional non-autoregressive neural sequence model based on
iterative refinement. The proposed model is designed based on the principles of
latent variable models and denoising autoencoders, and is generally applicable
to any sequence generation task. We extensively evaluate the proposed model on
machine translation (En-De and En-Ro) and image caption generation, and observe
that it significantly speeds up decoding while maintaining generation quality
comparable to the autoregressive counterpart.
Comment: Accepted to EMNLP'18
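The decoding loop behind this approach is easy to sketch: emit an entire draft
translation in parallel, then repeatedly re-predict all tokens at once until
the output stops changing. The interfaces below (initial_guess, refine) are
assumptions for illustration, not the paper's actual API.

    import torch

    def iterative_refinement_decode(model, src, max_iters=10):
        # First parallel draft: predict every target token at once.
        tgt = model.initial_guess(src).argmax(-1)
        for _ in range(max_iters):
            # Denoise the current draft, again with no left-to-right
            # dependency, so each pass costs one parallel forward step.
            new_tgt = model.refine(src, tgt).argmax(-1)
            if torch.equal(new_tgt, tgt):   # stop at a fixed point
                break
            tgt = new_tgt
        return tgt

Each refinement pass is a single parallel forward computation, which is where
the speed-up over token-by-token autoregressive decoding comes from.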
Similarity Analysis of Contextual Word Representation Models
This paper investigates contextual word representation models from the lens
of similarity analysis. Given a collection of trained models, we measure the
similarity of their internal representations and attention. Critically, these
models come from vastly different architectures. We use existing and novel
similarity measures that aim to gauge the level of localization of information
in the deep models, and facilitate the investigation of which design factors
affect model similarity, without requiring any external linguistic annotation.
The analysis reveals that models within the same family are more similar to one
another, as may be expected. Surprisingly, different architectures have rather
similar representations, but different individual neurons. We also observed
differences in information localization in lower and higher layers and found
that higher layers are more affected by fine-tuning on downstream tasks.
Comment: Accepted to ACL 2020
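For concreteness, one widely used measure of representation similarity is
linear centered kernel alignment (CKA); the paper draws on several existing
and novel measures, so treat this as an illustrative example rather than its
exact method.

    import numpy as np

    def linear_cka(X, Y):
        # X, Y: (n_examples, n_features) activations from two models on
        # the same inputs. Returns a similarity score in [0, 1].
        X = X - X.mean(axis=0)        # center each feature
        Y = Y - Y.mean(axis=0)
        hsic = np.linalg.norm(X.T @ Y, 'fro') ** 2
        return hsic / (np.linalg.norm(X.T @ X, 'fro')
                       * np.linalg.norm(Y.T @ Y, 'fro'))

    # e.g. sim = linear_cka(acts_model_a_layer6, acts_model_b_layer6)

Because CKA compares Gram-matrix structure rather than individual coordinates,
it can report high similarity between architectures even when no single neuron
matches, consistent with the finding above.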
Evolving Culture vs Local Minima
We propose a theory that relates difficulty of learning in deep architectures
to culture and language. It is articulated around the following hypotheses: (1)
learning in an individual human brain is hampered by the presence of effective
local minima; (2) this optimization difficulty is particularly important when
it comes to learning higher-level abstractions, i.e., concepts that cover a
vast and highly-nonlinear span of sensory configurations; (3) such high-level
abstractions are best represented in brains by the composition of many levels
of representation, i.e., by deep architectures; (4) a human brain can learn
such high-level abstractions if guided by the signals produced by other humans,
which act as hints or indirect supervision for these high-level abstractions;
and (5), language and the recombination and optimization of mental concepts
provide an efficient evolutionary recombination operator, and this gives rise
to rapid search in the space of communicable ideas that help humans build up
better high-level internal representations of their world. These hypotheses put
together imply that human culture and the evolution of ideas have been crucial
to counter an optimization difficulty: this optimization difficulty would
otherwise make it very difficult for human brains to capture high-level
knowledge of the world. The theory is grounded in experimental observations of
the difficulties of training deep artificial neural networks. Plausible
consequences of this theory for the efficiency of cultural evolution are
sketched.
Predict Responsibly: Improving Fairness and Accuracy by Learning to Defer
In many machine learning applications, there are multiple decision-makers
involved, both automated and human. The interaction between these agents often
goes unaddressed in algorithmic development. In this work, we explore a simple
version of this interaction with a two-stage framework containing an automated
model and an external decision-maker. The model can choose to say "Pass", and
pass the decision downstream, as explored in rejection learning. We extend this
concept by proposing "learning to defer", which generalizes rejection learning
by considering the effect of other agents in the decision-making process. We
propose a learning algorithm which accounts for potential biases held by
external decision-makers in a system. Experiments demonstrate that learning to
defer can make systems not only more accurate but also less biased. Even when
working with inconsistent or biased users, we show that deferring models still
greatly improve the accuracy and/or fairness of the entire system.
Comment: Accepted as a conference paper at Neural Information Processing Systems 2018
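A minimal sketch of the two-stage idea: the model emits class scores plus a
defer probability, and training minimizes the expected loss of the whole
pipeline, mixing the model's own error with the external decision-maker's
error plus a small cost for deferring. This is an illustrative objective under
assumed interfaces, not the paper's exact loss.

    import torch
    import torch.nn.functional as F

    def defer_loss(model_logits, defer_logit, dm_preds, labels,
                   defer_cost=0.05):
        # P(defer) per example; when the model defers, the system incurs
        # whatever error the downstream decision-maker (DM) makes.
        p_defer = torch.sigmoid(defer_logit)
        model_loss = F.cross_entropy(model_logits, labels,
                                     reduction='none')
        dm_loss = (dm_preds != labels).float()   # DM's 0/1 error
        return ((1 - p_defer) * model_loss
                + p_defer * (dm_loss + defer_cost)).mean()

Because the decision-maker's possibly biased behaviour enters the objective,
the model learns to defer precisely where the human is reliable, which is what
distinguishes deferring from plain rejection learning.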
Learning Simple Algorithms from Examples
We present an approach for learning simple algorithms such as copying,
multi-digit addition and single digit multiplication directly from examples.
Our framework consists of a set of interfaces, accessed by a controller.
Typical interfaces are 1-D tapes or 2-D grids that hold the input and output
data. For the controller, we explore a range of neural network-based models
which vary in their ability to abstract the underlying algorithm from training
instances and generalize to test examples with many thousands of digits. The
controller is trained using Q-learning with several enhancements, and we show
that the bottleneck is in the capabilities of the controller rather than in the
search incurred by Q-learning.
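For reference, the standard tabular Q-learning update that such controller
training builds on (the paper's enhancements are omitted here):

    import random
    from collections import defaultdict

    Q = defaultdict(float)                 # Q[(state, action)] -> value
    alpha, gamma, eps = 0.1, 0.99, 0.1

    def q_update(s, a, reward, s_next, actions):
        # Move Q(s, a) toward reward + gamma * max over a' of Q(s', a').
        best_next = max(Q[(s_next, a2)] for a2 in actions)
        Q[(s, a)] += alpha * (reward + gamma * best_next - Q[(s, a)])

    def eps_greedy(s, actions):
        # Explore with probability eps, otherwise act greedily.
        if random.random() < eps:
            return random.choice(actions)
        return max(actions, key=lambda a: Q[(s, a)])

In this setting, a state might summarize what the controller reads from its
interfaces (e.g. the symbols under the tape heads), and actions would include
head moves and output symbols.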
GCN-RL Circuit Designer: Transferable Transistor Sizing with Graph Neural Networks and Reinforcement Learning
Automatic transistor sizing is a challenging problem in circuit design due to
the large design space, complex performance trade-offs, and fast technological
advancements. Although there has been plenty of work on transistor sizing
targeting a single circuit, limited research has been done on transferring the
knowledge from one circuit to another to reduce the re-design overhead. In this
paper, we present GCN-RL Circuit Designer, leveraging reinforcement learning
(RL) to transfer the knowledge between different technology nodes and
topologies. Moreover, inspired by the simple fact that a circuit is a graph, we
learn on the circuit topology representation with graph convolutional neural
networks (GCN). The GCN-RL agent extracts features of the topology graph, whose
vertices are transistors and whose edges are wires. Our learning-based optimization
consistently achieves the highest Figures of Merit (FoM) on four different
circuits compared with conventional black-box optimization methods (Bayesian
Optimization, Evolutionary Algorithms), random search, and human expert
designs. Experiments on transfer learning between five technology nodes and two
circuit topologies demonstrate that RL with transfer learning can achieve much
higher FoMs than methods without knowledge transfer. Our transferable
optimization method makes transistor sizing and design porting more effective
and efficient.
Comment: Accepted to the 57th Design Automation Conference (DAC 2020); 6 pages, 8 figures
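A minimal sketch of the graph-convolution step such an agent could use to
embed the transistor graph (assumed shapes and feature choices; not the
paper's exact architecture):

    import torch

    def gcn_layer(A, H, W):
        # A: (n, n) adjacency -- transistors as nodes, wires as edges.
        # H: (n, d_in) node features, e.g. device type and current sizes.
        # W: (d_in, d_out) learnable weights.
        A_hat = A + torch.eye(A.size(0))            # add self-loops
        d = A_hat.sum(dim=1)
        D_inv_sqrt = torch.diag(d.pow(-0.5))        # symmetric normalization
        return torch.relu(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W)

Because the layer's weights are shared across nodes and graphs, the same
policy can be reused on a new technology node or topology, which is what makes
the learned knowledge transferable.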
A PAC-Bayesian Analysis of Randomized Learning with Application to Stochastic Gradient Descent
We study the generalization error of randomized learning algorithms --
focusing on stochastic gradient descent (SGD) -- using a novel combination of
PAC-Bayes and algorithmic stability. Importantly, our generalization bounds
hold for all posterior distributions on an algorithm's random hyperparameters,
including distributions that depend on the training data. This inspires an
adaptive sampling algorithm for SGD that optimizes the posterior at runtime. We
analyze this algorithm in the context of our generalization bounds and evaluate
it on a benchmark dataset. Our experiments demonstrate that adaptive sampling
can reduce empirical risk faster than uniform sampling while also improving
out-of-sample accuracy.
Comment: In Neural Information Processing Systems (NIPS) 2017. The latest
version specifies that the references to Kuzborskij & Lampert (2017) are for
v2 of their manuscript, which was posted to arXiv in March 2017.
Importantly, Theorem 3 therein (a stability bound for convex losses) has a
different form than the final version.
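To make the adaptive-sampling idea concrete, here is a sketch in which the
distribution over training examples is re-weighted at runtime from recent
per-example losses. The softmax weighting and interfaces are assumptions for
illustration, not the paper's algorithm.

    import numpy as np

    def adaptive_sgd(grad_fn, recent_losses, params, n_steps,
                     lr=0.01, tau=1.0):
        # recent_losses[i]: latest observed loss of example i (maintained
        # elsewhere); higher-loss examples are sampled more often.
        n = len(recent_losses)
        for _ in range(n_steps):
            p = np.exp(np.asarray(recent_losses) / tau)
            p /= p.sum()                      # data-dependent distribution
            i = np.random.choice(n, p=p)      # adaptive sample
            params = params - lr * grad_fn(params, i)
        return params

The paper's PAC-Bayes bounds cover exactly this kind of data-dependent
posterior over the algorithm's random choices, which is why optimizing the
sampling distribution at runtime is admissible.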
Low-pass Recurrent Neural Networks - A memory architecture for longer-term correlation discovery
Reinforcement learning (RL) agents performing complex tasks must be able to
remember observations and actions across sizable time intervals. This is
especially true during the initial learning stages, when exploratory behaviour
can increase the delay between specific actions and their effects. Many new or
popular approaches for learning these distant correlations employ
backpropagation through time (BPTT), but this technique requires storing
observation traces long enough to span the interval between cause and effect.
Besides memory demands, learning dynamics like vanishing gradients and slow
convergence due to infrequent weight updates can reduce BPTT's practicality;
meanwhile, although online recurrent network learning is a developing topic,
most approaches are not efficient enough to use as replacements. We propose a
simple, effective memory strategy that can extend the window over which BPTT
can learn without requiring longer traces. We explore this approach empirically
on a few tasks and discuss its implications.
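The abstract does not spell out the mechanism, but the title suggests a
low-pass (smoothing) filter over past activity. A plausible minimal form,
stated purely as an assumption: keep exponential moving averages of
observations at several timescales and feed them to the RNN, so a short BPTT
window still sees slow, long-range structure.

    import torch

    class LowPassMemory:
        # EMAs of the observation at several time constants; slower
        # channels summarize events far outside the BPTT window.
        def __init__(self, obs_dim, alphas=(0.5, 0.9, 0.99)):
            self.alphas = alphas
            self.state = [torch.zeros(obs_dim) for _ in alphas]

        def step(self, obs):
            self.state = [a * s + (1 - a) * obs
                          for a, s in zip(self.alphas, self.state)]
            # Concatenate raw and smoothed views for the RNN core.
            return torch.cat([obs] + self.state)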