AutoLoss: Learning Discrete Schedules for Alternate Optimization
Many machine learning problems involve iteratively and alternately optimizing
different task objectives with respect to different sets of parameters.
Appropriately scheduling the optimization of a task objective or a set of
parameters is usually crucial to the quality of convergence. In this paper, we
present AutoLoss, a meta-learning framework that automatically learns and
determines the optimization schedule. AutoLoss provides a generic way to
represent and learn the discrete optimization schedule from metadata, allowing
for dynamic, data-driven schedules in ML problems that involve alternating
updates of different parameters or of different loss objectives. We apply
AutoLoss on four ML tasks: d-ary quadratic regression, classification using a
multi-layer perceptron (MLP), image generation using GANs, and multi-task
neural machine translation (NMT). We show that the AutoLoss controller is able
to capture the distribution of better optimization schedules that result in
higher quality of convergence on all four tasks. The trained AutoLoss
controller is generalizable -- it can guide and improve the learning of a new
task model with different specifications, or on different datasets.
Comment: 19-page manuscript. The first two authors contributed equally.
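
To make the idea concrete, here is a minimal, hypothetical sketch of an AutoLoss-style controller in Python: a softmax policy over the task objectives samples which objective to optimize at each step, conditioned on metadata features, and is updated with REINFORCE on a scalar convergence-quality reward. The class, feature, and reward names are ours, not the paper's exact parameterization.

    import numpy as np

    class ScheduleController:
        """Softmax policy over task objectives, trained with REINFORCE."""
        def __init__(self, n_objectives, n_features, lr=0.01):
            self.theta = np.zeros((n_objectives, n_features))
            self.lr = lr

        def probs(self, meta_features):
            logits = self.theta @ meta_features
            e = np.exp(logits - logits.max())
            return e / e.sum()

        def sample(self, meta_features):
            p = self.probs(meta_features)
            return np.random.choice(len(p), p=p), p

        def update(self, trajectory, reward, baseline=0.0):
            # REINFORCE: raise log-probability of the sampled schedule
            # decisions in proportion to (reward - baseline).
            for x, action, p in trajectory:
                grad = -np.outer(p, x)   # d log softmax / d theta, all rows
                grad[action] += x        # plus the chosen row's feature term
                self.theta += self.lr * (reward - baseline) * grad

At each task-model training step, the controller would observe metadata (e.g., recent loss values), sample which objective to update, and, at the end of an episode, receive a reward reflecting convergence quality.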
On Ensuring that Intelligent Machines Are Well-Behaved
Machine learning algorithms are everywhere, ranging from simple data analysis
and pattern recognition tools used across the sciences to complex systems that
achieve super-human performance on various tasks. Ensuring that they are
well-behaved---that they do not, for example, cause harm to humans or act in a
racist or sexist way---is therefore not a hypothetical problem to be dealt with
in the future, but a pressing one that we address here. We propose a new
framework for designing machine learning algorithms that simplifies the problem
of specifying and regulating undesirable behaviors. To show the viability of
this new framework, we use it to create new machine learning algorithms that
preclude the sexist and harmful behaviors exhibited by standard machine
learning algorithms in our experiments. Our framework for designing machine
learning algorithms simplifies the safe and responsible application of machine
learning.
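
For intuition, one plausible skeleton of the kind of framework the abstract describes is a candidate-selection plus safety-test split: train a candidate model on one data partition, then release it only if a high-confidence bound certifies that an undesirable-behavior measure g is non-positive on held-out safety data. The function hooks and the t-based bound below are illustrative assumptions, not the paper's exact construction.

    import numpy as np
    from scipy import stats

    def ttest_upper_bound(g_samples, delta):
        # One-sided (1 - delta) Student-t upper confidence bound on E[g].
        n = len(g_samples)
        return np.mean(g_samples) + stats.sem(g_samples) * stats.t.ppf(1 - delta, df=n - 1)

    def constrained_fit(train_candidate, g_per_example, data, delta=0.05):
        half = len(data) // 2
        candidate = train_candidate(data[:half])    # hypothetical trainer hook
        g = g_per_example(candidate, data[half:])   # per-example behavior measure
        if ttest_upper_bound(g, delta) <= 0.0:
            return candidate  # undesirable behavior bounded with prob. >= 1 - delta
        return None           # refuse to return an uncertifiable model

Refusing to return a model that cannot be certified is what makes the behavioral guarantee explicit rather than hypothetical.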
Multiobjective Reinforcement Learning for Reconfigurable Adaptive Optimal Control of Manufacturing Processes
In industrial applications of adaptive optimal control, multiple conflicting
objectives often have to be considered. The weights (relative importance) of
the objectives are often not known during the design of the control and can
change with changing production conditions and requirements. In this work, a
novel model-free multiobjective reinforcement learning approach for adaptive
optimal control of manufacturing processes is proposed. The approach enables
sample-efficient learning in sequences of control configurations, given by
particular objective weights.
Comment: Conference preprint. 978-1-5386-5925-0/18/$31.00 © 2018 IEEE.
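
As a rough illustration (not the paper's algorithm), one common model-free tactic for this setting is to learn a vector-valued Q function, one component per objective, and act greedily with respect to the current scalarization weights; because the per-objective estimates are reused, changing the weight configuration does not require learning from scratch.

    import numpy as np

    class VectorQLearner:
        """Tabular Q-learning with one value component per objective."""
        def __init__(self, n_states, n_actions, n_objectives, alpha=0.1, gamma=0.95):
            self.Q = np.zeros((n_states, n_actions, n_objectives))
            self.alpha, self.gamma = alpha, gamma

        def act(self, s, weights, eps=0.1):
            if np.random.rand() < eps:
                return np.random.randint(self.Q.shape[1])
            return int(np.argmax(self.Q[s] @ weights))  # scalarized greedy action

        def update(self, s, a, reward_vec, s_next, weights):
            a_next = int(np.argmax(self.Q[s_next] @ weights))
            target = reward_vec + self.gamma * self.Q[s_next, a_next]
            self.Q[s, a] += self.alpha * (target - self.Q[s, a])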
Some Considerations on Learning to Explore via Meta-Reinforcement Learning
We consider the problem of exploration in meta reinforcement learning. Two
new meta reinforcement learning algorithms are suggested: E-MAML and E-RL².
Results are presented on a novel environment we call 'Krazy World' and a set
of maze environments. We show that E-MAML and E-RL² deliver better performance
on tasks where exploration is important.
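
The exploration issue E-MAML targets can be summarized in one gradient identity (our notation, following the published derivation): applying the likelihood-ratio trick to both the pre-update and post-update trajectory distributions gives

    \nabla_\theta \, \mathbb{E}_{\tau \sim \pi_\theta} \, \mathbb{E}_{\tau' \sim \pi_{U(\theta,\tau)}} \big[ R(\tau') \big]
        \;=\; \mathbb{E}_{\tau,\tau'} \Big[ R(\tau') \big( \nabla_\theta \log \pi_{U(\theta,\tau)}(\tau')
        \;+\; \nabla_\theta \log \pi_\theta(\tau) \big) \Big],

where U(\theta, \tau) is the inner-loop update computed from the pre-update trajectory \tau. The second score-function term, which vanilla MAML drops, explicitly credits pre-update (exploratory) behavior for post-update returns.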
CLIC: Curriculum Learning and Imitation for object Control in non-rewarding environments
In this paper, we study a new reinforcement learning setting where the
environment is non-rewarding, contains several possibly related objects of
varying controllability, and where an apt agent, Bob, acts independently, with
non-observable intentions. We argue that this setting defines a realistic
scenario and we present a generic discrete-state discrete-action model of such
environments. To learn in this environment, we propose an unsupervised
reinforcement learning agent called CLIC for Curriculum Learning and Imitation
for Control. CLIC learns to control individual objects in its environment, and
imitates Bob's interactions with these objects. It selects objects to focus on
when training and imitating by maximizing its learning progress. We show that
CLIC is an effective baseline in our new setting. It can observe
Bob to gain control of objects faster, even if Bob is not explicitly teaching.
It can also follow Bob when he acts as a mentor and provides ordered
demonstrations. Finally, when Bob controls objects that the agent cannot, or in
the presence of a hierarchy between objects in the environment, we show that
CLIC ignores non-reproducible and already-mastered interactions with objects,
resulting in a greater benefit from imitation.
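
A minimal sketch of the learning-progress selection rule the abstract alludes to (window size, competence estimates, and the softmax are our illustrative choices): the agent tracks recent competence per object and preferentially samples objects whose competence is changing fastest, so mastered and uncontrollable objects fade out.

    import numpy as np

    def learning_progress(history, window=10):
        # Absolute change between recent and older mean competence.
        h = history[-2 * window:]
        if len(h) < 2 * window:
            return 1.0  # optimistic default: try unfamiliar objects
        return abs(np.mean(h[window:]) - np.mean(h[:window]))

    def select_object(histories, temperature=0.1):
        lp = np.array([learning_progress(h) for h in histories])
        p = np.exp(lp / temperature)
        return int(np.random.choice(len(p), p=p / p.sum()))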
LIFT: Reinforcement Learning in Computer Systems by Learning From Demonstrations
Reinforcement learning approaches have long appealed to the data management
community due to their ability to learn to control dynamic behavior from raw
system performance. Recent successes in combining deep neural networks with
reinforcement learning have sparked significant new interest in this domain.
However, practical solutions remain elusive due to large training data
requirements, algorithmic instability, and lack of standard tools. In this
work, we introduce LIFT, an end-to-end software stack for applying deep
reinforcement learning to data management tasks. While prior work has
frequently explored applications in simulations, LIFT centers on utilizing
human expertise to learn from demonstrations, thus lowering online training
times. We further introduce TensorForce, a TensorFlow library for applied deep
reinforcement learning exposing a unified declarative interface to common RL
algorithms, thus providing a backend to LIFT. We demonstrate the utility of
LIFT in two case studies: database compound indexing and resource management
in stream processing. Results show that LIFT controllers initialized from
demonstrations can outperform human baselines and heuristics across latency
metrics and space usage by up to 70%.
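
The demonstration-bootstrapping pattern LIFT relies on can be sketched in the spirit of DQfD (a named stand-in; LIFT's exact losses may differ): before online interaction, fit the Q-function on logged expert transitions with a TD loss plus a large-margin term that pushes the demonstrated action above alternatives. q_net here is a hypothetical callable returning per-action values.

    import numpy as np

    def margin_loss(q_values, demo_action, margin=0.8):
        # max_a [Q(s,a) + margin * 1(a != a_demo)] - Q(s, a_demo)
        penalized = q_values + margin * (np.arange(len(q_values)) != demo_action)
        return penalized.max() - q_values[demo_action]

    def pretrain_step(q_net, transition, gamma=0.99, lam=1.0):
        s, a, r, s_next = transition
        q = q_net(s)
        td_target = r + gamma * q_net(s_next).max()
        return (td_target - q[a]) ** 2 + lam * margin_loss(q, a)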
Towards Informed Exploration for Deep Reinforcement Learning
In this thesis, we discuss various techniques for improving exploration for deep reinforcement learning. We begin with a brief review of reinforcement learning (RL) and the fundamental exploration vs. exploitation trade-off. We then review how deep RL has improved upon classical RL and summarize six categories of recent exploration methods for deep RL, in order of increasing use of prior information. We then examine representative works in three of these categories and discuss their strengths and weaknesses. The first category, represented by Soft Q-learning, uses regularization to encourage exploration. The second category, represented by count-based exploration via hashing, maps states to hash codes for counting and assigns higher exploration bonuses to less-encountered states. The third category utilizes hierarchy and is represented by a modular architecture for RL agents playing StarCraft II. Finally, we conclude that exploration guided by prior knowledge is a promising research direction and suggest potentially impactful topics for future work.
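
The second category's core mechanism is simple enough to state in a few lines; the following sketch follows the SimHash-based "#Exploration" scheme (hyperparameters illustrative): random hyperplanes map a continuous state to a short binary code, visits per code are counted, and the agent receives a bonus proportional to 1/sqrt(count).

    import numpy as np

    class SimHashBonus:
        def __init__(self, state_dim, n_bits=16, beta=0.01, seed=0):
            rng = np.random.default_rng(seed)
            self.A = rng.standard_normal((n_bits, state_dim))  # random projections
            self.counts, self.beta = {}, beta

        def bonus(self, state):
            code = tuple((self.A @ state > 0).astype(int))  # SimHash of the state
            self.counts[code] = self.counts.get(code, 0) + 1
            return self.beta / np.sqrt(self.counts[code])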
Deep Reinforcement Learning for Unsupervised Video Summarization with Diversity-Representativeness Reward
Video summarization aims to facilitate large-scale video browsing by
producing short, concise summaries that are diverse and representative of
original videos. In this paper, we formulate video summarization as a
sequential decision-making process and develop a deep summarization network
(DSN) to summarize videos. DSN predicts a probability for each video frame,
indicating how likely that frame is to be selected, and then takes actions
based on these probability distributions to select frames, forming video summaries. To
train our DSN, we propose an end-to-end, reinforcement learning-based
framework, where we design a novel reward function that jointly accounts for
diversity and representativeness of generated summaries and does not rely on
labels or user interactions at all. During training, the reward function judges
how diverse and representative the generated summaries are, while DSN strives
to earn higher rewards by learning to produce more diverse and more
representative summaries. Since labels are not required, our method can be
fully unsupervised. Extensive experiments on two benchmark datasets show that
our unsupervised method not only outperforms other state-of-the-art
unsupervised methods, but is also comparable to, or even superior to, most
published supervised approaches.
Comment: AAAI 2018.
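
A hedged sketch of a diversity-representativeness reward of the kind described (the paper's exact distance measures and normalizations may differ): diversity is the mean pairwise dissimilarity among selected frame features; representativeness rewards selections that keep every frame close to some selected frame.

    import numpy as np

    def summary_reward(features, selected):
        F = features / np.linalg.norm(features, axis=1, keepdims=True)
        S = F[selected]
        k = len(selected)
        # Diversity: mean pairwise (1 - cosine similarity) among selections.
        sim = S @ S.T
        r_div = (1 - sim)[~np.eye(k, dtype=bool)].mean() if k > 1 else 0.0
        # Representativeness: exp(-mean distance to the nearest selection).
        d = np.linalg.norm(F[:, None, :] - S[None, :, :], axis=2).min(axis=1)
        return r_div + np.exp(-d.mean())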
Deep Generative Models with Learnable Knowledge Constraints
The broad set of deep generative models (DGMs) has achieved remarkable
advances. However, it is often difficult to incorporate rich structured domain
knowledge into end-to-end DGMs. Posterior regularization (PR) offers a
principled framework to impose structured constraints on probabilistic models,
but has limited applicability to the diverse DGMs that can lack a Bayesian
formulation or even explicit density evaluation. PR also requires constraints
to be fully specified a priori, which is impractical or suboptimal for complex
knowledge with learnable uncertain parts. In this paper, we establish a
mathematical correspondence between PR and reinforcement learning (RL), and,
based on this connection, extend PR to learn constraints as the extrinsic
reward in RL. The resulting algorithm is model-agnostic, applying to any DGM,
and can flexibly adapt arbitrary constraints jointly with the model.
Experiments on human image generation and templated sentence generation show
that models with knowledge constraints learned by our algorithm greatly
improve over the base generative models.
Comment: Neural Information Processing Systems (NeurIPS) 2018.
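
The correspondence can be compressed into one objective (notation and constants adapted): with a learnable constraint f_\phi, the PR problem

    \min_{\theta,\, q} \; \mathrm{KL}\big( q(\mathbf{x}) \,\|\, p_\theta(\mathbf{x}) \big)
        \;-\; \alpha \, \mathbb{E}_{q}\big[ f_\phi(\mathbf{x}) \big]

reads, with q as a policy and f_\phi as a reward, like entropy-regularized policy optimization; this is what licenses updating \phi with reward-learning (inverse-RL-style) machinery instead of fixing the constraint a priori.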
Transferable Cost-Aware Security Policy Implementation for Malware Detection Using Deep Reinforcement Learning
Malware detection is an ever-present challenge for all organizational
gatekeepers, who must maintain high detection rates while minimizing
interruptions to the organization's workflow. To improve detection rates,
organizations often deploy an ensemble of detectors. While effective, this
approach is computationally expensive, since every file - even clear-cut cases
- needs to be analyzed by all detectors. Moreover, with an ever-increasing
number of files to process, the use of ensembles may incur unacceptable
processing times and costs (e.g., cloud resources). In this study, we propose
SPIREL, a reinforcement learning-based method for cost-effective malware
detection. Our method enables organizations to directly associate costs with
correct/incorrect classifications, computing resources, and run-time, and then
dynamically establishes a security policy. This security policy is then
implemented, and for each inspected file, a different set of detectors is
assigned and a different detection threshold is set. Our evaluation on two
malware domains - Portable Executable (PE) and Android Application Package
(APK) files - shows that SPIREL is both accurate and extremely
resource-efficient: the proposed method either outperforms the best-performing
baselines while achieving a modest improvement in efficiency, or reduces the
required running time by ~80% while decreasing the accuracy and F1-score by
only 0.5%. We also show that our approach is both highly transferable across
different datasets and adaptable to changes in individual detector performance
- …
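
The cost-aware decision process can be illustrated as follows (detector names, costs, and payoffs are invented knobs, not SPIREL's values): at each step the policy either queries one more detector, paying its compute/time cost, or commits to a verdict, earning the classification payoff, with missed malware penalized most heavily.

    # Illustrative reward shaping for sequential, cost-aware detection.
    DETECTOR_COST = {"static_pe": -0.01, "dynamic_sandbox": -0.20}
    PAYOFF = {("malicious", True): 1.0, ("benign", False): 1.0,  # correct verdicts
              ("malicious", False): -1.0,                        # false alarm
              ("benign", True): -5.0}                            # missed malware

    def step_reward(action, is_malware):
        if action in DETECTOR_COST:                 # query another detector
            return DETECTOR_COST[action], False     # pay its cost, continue episode
        return PAYOFF[(action, is_malware)], True   # commit; episode ends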