AutoLoss: Learning Discrete Schedules for Alternate Optimization
Many machine learning problems involve iteratively and alternately optimizing
different task objectives with respect to different sets of parameters.
Appropriately scheduling the optimization of a task objective or a set of
parameters is usually crucial to the quality of convergence. In this paper, we
present AutoLoss, a meta-learning framework that automatically learns and
determines the optimization schedule. AutoLoss provides a generic way to
represent and learn the discrete optimization schedule from metadata, allowing
for a dynamic and data-driven schedule in ML problems that involve alternating
updates of different parameter sets or of different loss objectives. We apply
AutoLoss on four ML tasks: d-ary quadratic regression, classification using a
multi-layer perceptron (MLP), image generation using GANs, and multi-task
neural machine translation (NMT). We show that the AutoLoss controller is able
to capture the distribution of better optimization schedules that result in
higher quality of convergence on all four tasks. The trained AutoLoss
controller is generalizable -- it can guide and improve the learning of a new
task model with different specifications, or on different datasets.
Comment: 19-page manuscript. The first two authors contributed equally.
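As an illustration of the idea, here is a minimal sketch of a schedule
controller: a softmax policy that picks which objective to update next and is
trained with a REINFORCE-style update. The class name, features, and reward
choices below are illustrative assumptions, not the paper's implementation.

    import numpy as np

    class ScheduleController:
        # Toy stand-in for a learned optimization schedule: a softmax
        # policy over K objectives, updated with a REINFORCE-style rule.
        def __init__(self, n_features, n_objectives, lr=0.01):
            self.W = np.zeros((n_objectives, n_features))
            self.lr = lr

        def probs(self, features):
            logits = self.W @ features
            e = np.exp(logits - logits.max())
            return e / e.sum()

        def sample(self, features):
            p = self.probs(features)
            return np.random.choice(len(p), p=p)

        def update(self, features, action, reward):
            # Gradient of log pi(action | features), scaled by the reward.
            p = self.probs(features)
            grad = -np.outer(p, features)
            grad[action] += features
            self.W += self.lr * reward * grad

    # Usage: features might be recent loss values; the reward, the resulting
    # improvement in validation loss. For a GAN, action 0 could mean "update
    # the generator" and action 1 "update the discriminator".
    ctrl = ScheduleController(n_features=2, n_objectives=2)
    action = ctrl.sample(np.array([0.5, 1.2]))
    ctrl.update(np.array([0.5, 1.2]), action, reward=0.1)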
Approaches for MATLAB Applications Acceleration Using High Performance Reconfigurable Computers
A lot of raw computing power is needed in many scientific computing applications and simulations. MATLAB® is one of the popular choices as a language for technical computing. Presented here are approaches for accelerating MATLAB-based applications using High Performance Reconfigurable Computing (HPRC) machines. Typically, these are clusters of von Neumann architecture based systems with zero or more FPGA reconfigurable boards. As a case study, an Image Correlation Algorithm has been ported to this architecture platform. As a second case study, the recursive training process used to realize an optimum Artificial Neural Network (ANN) has been accelerated by porting it to HPC systems. The approaches taken are analyzed with respect to target scenarios, the end user's perspective, programming efficiency, and performance. Disclaimer: Some material in this text has been used and reproduced with appropriate references and permissions where required. MATLAB® is a registered trademark of The MathWorks, Inc.
Deterministic Non-Autoregressive Neural Sequence Modeling by Iterative Refinement
We propose a conditional non-autoregressive neural sequence model based on
iterative refinement. The proposed model is designed based on the principles of
latent variable models and denoising autoencoders, and is generally applicable
to any sequence generation task. We extensively evaluate the proposed model on
machine translation (En-De and En-Ro) and image caption generation, and observe
that it significantly speeds up decoding while maintaining generation quality
comparable to the autoregressive counterpart.
Comment: Accepted to EMNLP'18
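The decoding loop behind this approach is easy to sketch: emit an entire draft
translation in parallel, then repeatedly re-predict all tokens at once until
the output stops changing. The interfaces below (initial_guess, refine) are
assumptions for illustration, not the paper's actual API.

    import torch

    def iterative_refinement_decode(model, src, max_iters=10):
        # First parallel draft: predict every target token at once.
        tgt = model.initial_guess(src).argmax(-1)
        for _ in range(max_iters):
            # Denoise the current draft, again with no left-to-right
            # dependency, so each pass costs one parallel forward step.
            new_tgt = model.refine(src, tgt).argmax(-1)
            if torch.equal(new_tgt, tgt):   # stop at a fixed point
                break
            tgt = new_tgt
        return tgt

Each refinement pass is a single parallel forward computation, which is where
the speed-up over token-by-token autoregressive decoding comes from.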
Similarity Analysis of Contextual Word Representation Models
This paper investigates contextual word representation models from the lens
of similarity analysis. Given a collection of trained models, we measure the
similarity of their internal representations and attention. Critically, these
models come from vastly different architectures. We use existing and novel
similarity measures that aim to gauge the level of localization of information
in the deep models, and facilitate the investigation of which design factors
affect model similarity, without requiring any external linguistic annotation.
The analysis reveals that models within the same family are more similar to one
another, as may be expected. Surprisingly, different architectures have rather
similar representations, but different individual neurons. We also observed
differences in information localization in lower and higher layers and found
that higher layers are more affected by fine-tuning on downstream tasks.
Comment: Accepted to ACL 2020
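For concreteness, one widely used measure of representation similarity is
linear centered kernel alignment (CKA); the paper draws on several existing
and novel measures, so treat this as an illustrative example rather than its
exact method.

    import numpy as np

    def linear_cka(X, Y):
        # X, Y: (n_examples, n_features) activations from two models on
        # the same inputs. Returns a similarity score in [0, 1].
        X = X - X.mean(axis=0)        # center each feature
        Y = Y - Y.mean(axis=0)
        hsic = np.linalg.norm(X.T @ Y, 'fro') ** 2
        return hsic / (np.linalg.norm(X.T @ X, 'fro')
                       * np.linalg.norm(Y.T @ Y, 'fro'))

    # e.g. sim = linear_cka(acts_model_a_layer6, acts_model_b_layer6)

Because CKA compares Gram-matrix structure rather than individual coordinates,
it can report high similarity between architectures even when no single neuron
matches, consistent with the finding above.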
Evolving Culture vs Local Minima
We propose a theory that relates difficulty of learning in deep architectures
to culture and language. It is articulated around the following hypotheses: (1)
learning in an individual human brain is hampered by the presence of effective
local minima; (2) this optimization difficulty is particularly important when
it comes to learning higher-level abstractions, i.e., concepts that cover a
vast and highly-nonlinear span of sensory configurations; (3) such high-level
abstractions are best represented in brains by the composition of many levels
of representation, i.e., by deep architectures; (4) a human brain can learn
such high-level abstractions if guided by the signals produced by other humans,
which act as hints or indirect supervision for these high-level abstractions;
and (5), language and the recombination and optimization of mental concepts
provide an efficient evolutionary recombination operator, and this gives rise
to rapid search in the space of communicable ideas that help humans build up
better high-level internal representations of their world. These hypotheses put
together imply that human culture and the evolution of ideas have been crucial
to counter an optimization difficulty: this optimization difficulty would
otherwise make it very difficult for human brains to capture high-level
knowledge of the world. The theory is grounded in experimental observations of
the difficulties of training deep artificial neural networks. Plausible
consequences of this theory for the efficiency of cultural evolution are
sketched.
Predict Responsibly: Improving Fairness and Accuracy by Learning to Defer
In many machine learning applications, there are multiple decision-makers
involved, both automated and human. The interaction between these agents often
goes unaddressed in algorithmic development. In this work, we explore a simple
version of this interaction with a two-stage framework containing an automated
model and an external decision-maker. The model can choose to say "Pass", and
pass the decision downstream, as explored in rejection learning. We extend this
concept by proposing "learning to defer", which generalizes rejection learning
by considering the effect of other agents in the decision-making process. We
propose a learning algorithm which accounts for potential biases held by
external decision-makers in a system. Experiments demonstrate that learning to
defer can make systems not only more accurate but also less biased. Even when
working with inconsistent or biased users, we show that deferring models still
greatly improve the accuracy and/or fairness of the entire system.
Comment: Accepted as a conference paper at Neural Information Processing Systems 2018
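A minimal sketch of the two-stage idea: the model emits class scores plus a
defer probability, and training minimizes the expected loss of the whole
pipeline, mixing the model's own error with the external decision-maker's
error plus a small cost for deferring. This is an illustrative objective under
assumed interfaces, not the paper's exact loss.

    import torch
    import torch.nn.functional as F

    def defer_loss(model_logits, defer_logit, dm_preds, labels,
                   defer_cost=0.05):
        # P(defer) per example; when the model defers, the system incurs
        # whatever error the downstream decision-maker (DM) makes.
        p_defer = torch.sigmoid(defer_logit)
        model_loss = F.cross_entropy(model_logits, labels,
                                     reduction='none')
        dm_loss = (dm_preds != labels).float()   # DM's 0/1 error
        return ((1 - p_defer) * model_loss
                + p_defer * (dm_loss + defer_cost)).mean()

Because the decision-maker's possibly biased behaviour enters the objective,
the model learns to defer precisely where the human is reliable, which is what
distinguishes deferring from plain rejection learning.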
Learning Simple Algorithms from Examples
We present an approach for learning simple algorithms such as copying,
multi-digit addition and single digit multiplication directly from examples.
Our framework consists of a set of interfaces, accessed by a controller.
Typical interfaces are 1-D tapes or 2-D grids that hold the input and output
data. For the controller, we explore a range of neural network-based models
which vary in their ability to abstract the underlying algorithm from training
instances and generalize to test examples with many thousands of digits. The
controller is trained using Q-learning with several enhancements, and we show
that the bottleneck is in the capabilities of the controller rather than in the
search incurred by Q-learning.
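For reference, the standard tabular Q-learning update that such controller
training builds on (the paper's enhancements are omitted here):

    import random
    from collections import defaultdict

    Q = defaultdict(float)                 # Q[(state, action)] -> value
    alpha, gamma, eps = 0.1, 0.99, 0.1

    def q_update(s, a, reward, s_next, actions):
        # Move Q(s, a) toward reward + gamma * max over a' of Q(s', a').
        best_next = max(Q[(s_next, a2)] for a2 in actions)
        Q[(s, a)] += alpha * (reward + gamma * best_next - Q[(s, a)])

    def eps_greedy(s, actions):
        # Explore with probability eps, otherwise act greedily.
        if random.random() < eps:
            return random.choice(actions)
        return max(actions, key=lambda a: Q[(s, a)])

In this setting, a state might summarize what the controller reads from its
interfaces (e.g. the symbols under the tape heads), and actions would include
head moves and output symbols.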
GCN-RL Circuit Designer: Transferable Transistor Sizing with Graph Neural Networks and Reinforcement Learning
Automatic transistor sizing is a challenging problem in circuit design due to
the large design space, complex performance trade-offs, and fast technological
advancements. Although there has been plenty of work on transistor sizing
targeting a single circuit, limited research has been done on transferring the
knowledge from one circuit to another to reduce the re-design overhead. In this
paper, we present GCN-RL Circuit Designer, leveraging reinforcement learning
(RL) to transfer the knowledge between different technology nodes and
topologies. Moreover, inspired by the simple fact that a circuit is a graph, we
learn on the circuit topology representation with graph convolutional neural
networks (GCN). The GCN-RL agent extracts features of the topology graph, whose
vertices are transistors and whose edges are wires. Our learning-based optimization
consistently achieves the highest Figures of Merit (FoM) on four different
circuits compared with conventional black-box optimization methods (Bayesian
Optimization, Evolutionary Algorithms), random search, and human expert
designs. Experiments on transfer learning between five technology nodes and two
circuit topologies demonstrate that RL with transfer learning can achieve much
higher FoMs than methods without knowledge transfer. Our transferable
optimization method makes transistor sizing and design porting more effective
and efficient.
Comment: Accepted to the 57th Design Automation Conference (DAC 2020); 6 pages, 8 figures
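A minimal sketch of the graph-convolution step such an agent could use to
embed the transistor graph (assumed shapes and feature choices; not the
paper's exact architecture):

    import torch

    def gcn_layer(A, H, W):
        # A: (n, n) adjacency -- transistors as nodes, wires as edges.
        # H: (n, d_in) node features, e.g. device type and current sizes.
        # W: (d_in, d_out) learnable weights.
        A_hat = A + torch.eye(A.size(0))            # add self-loops
        d = A_hat.sum(dim=1)
        D_inv_sqrt = torch.diag(d.pow(-0.5))        # symmetric normalization
        return torch.relu(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W)

Because the layer's weights are shared across nodes and graphs, the same
policy can be reused on a new technology node or topology, which is what makes
the learned knowledge transferable.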
A PAC-Bayesian Analysis of Randomized Learning with Application to Stochastic Gradient Descent
We study the generalization error of randomized learning algorithms --
focusing on stochastic gradient descent (SGD) -- using a novel combination of
PAC-Bayes and algorithmic stability. Importantly, our generalization bounds
hold for all posterior distributions on an algorithm's random hyperparameters,
including distributions that depend on the training data. This inspires an
adaptive sampling algorithm for SGD that optimizes the posterior at runtime. We
analyze this algorithm in the context of our generalization bounds and evaluate
it on a benchmark dataset. Our experiments demonstrate that adaptive sampling
can reduce empirical risk faster than uniform sampling while also improving
out-of-sample accuracy.
Comment: In Neural Information Processing Systems (NIPS) 2017. The latest
version specifies that the references to Kuzborskij & Lampert (2017) are for
v2 of their manuscript, which was posted to arXiv in March 2017.
Importantly, Theorem 3 therein (a stability bound for convex losses) has a
different form than the final version.
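To make the adaptive-sampling idea concrete, here is a sketch in which the
distribution over training examples is re-weighted at runtime from recent
per-example losses. The softmax weighting and interfaces are assumptions for
illustration, not the paper's algorithm.

    import numpy as np

    def adaptive_sgd(grad_fn, recent_losses, params, n_steps,
                     lr=0.01, tau=1.0):
        # recent_losses[i]: latest observed loss of example i (maintained
        # elsewhere); higher-loss examples are sampled more often.
        n = len(recent_losses)
        for _ in range(n_steps):
            p = np.exp(np.asarray(recent_losses) / tau)
            p /= p.sum()                      # data-dependent distribution
            i = np.random.choice(n, p=p)      # adaptive sample
            params = params - lr * grad_fn(params, i)
        return params

The paper's PAC-Bayes bounds cover exactly this kind of data-dependent
posterior over the algorithm's random choices, which is why optimizing the
sampling distribution at runtime is admissible.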
Low-pass Recurrent Neural Networks - A memory architecture for longer-term correlation discovery
Reinforcement learning (RL) agents performing complex tasks must be able to
remember observations and actions across sizable time intervals. This is
especially true during the initial learning stages, when exploratory behaviour
can increase the delay between specific actions and their effects. Many new or
popular approaches for learning these distant correlations employ
backpropagation through time (BPTT), but this technique requires storing
observation traces long enough to span the interval between cause and effect.
Besides memory demands, learning dynamics like vanishing gradients and slow
convergence due to infrequent weight updates can reduce BPTT's practicality;
meanwhile, although online recurrent network learning is a developing topic,
most approaches are not efficient enough to use as replacements. We propose a
simple, effective memory strategy that can extend the window over which BPTT
can learn without requiring longer traces. We explore this approach empirically
on a few tasks and discuss its implications.
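The abstract does not spell out the mechanism, but the title suggests a
low-pass (smoothing) filter over past activity. A plausible minimal form,
stated purely as an assumption: keep exponential moving averages of
observations at several timescales and feed them to the RNN, so a short BPTT
window still sees slow, long-range structure.

    import torch

    class LowPassMemory:
        # EMAs of the observation at several time constants; slower
        # channels summarize events far outside the BPTT window.
        def __init__(self, obs_dim, alphas=(0.5, 0.9, 0.99)):
            self.alphas = alphas
            self.state = [torch.zeros(obs_dim) for _ in alphas]

        def step(self, obs):
            self.state = [a * s + (1 - a) * obs
                          for a, s in zip(self.alphas, self.state)]
            # Concatenate raw and smoothed views for the RNN core.
            return torch.cat([obs] + self.state)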