Solving Continual Combinatorial Selection via Deep Reinforcement Learning
We consider the Markov Decision Process (MDP) of selecting a subset of items
at each step, termed the Select-MDP (S-MDP). The large state and action spaces
of S-MDPs make them intractable to solve with typical reinforcement learning
(RL) algorithms especially when the number of items is huge. In this paper, we
present a deep RL algorithm to solve this issue by adopting the following key
ideas. First, we convert the original S-MDP into an Iterative Select-MDP
(IS-MDP), which is equivalent to the S-MDP in terms of optimal actions. IS-MDP
decomposes a joint action of selecting K items simultaneously into K iterative
selections, shrinking the action space at the expense of an exponential
increase in the state space. Second, we overcome this state-space explosion by
exploiting a special symmetry in IS-MDPs with novel weight-shared Q-networks,
which provably maintain sufficient expressive power. Various
experiments demonstrate that our approach works well even when the item space
is large and that it scales to environments with item spaces different from
those used in training.
Comment: Accepted to IJCAI 2019, 14 pages, 8 figures
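The S-MDP to IS-MDP conversion described above trades action-space size for state-space size: choosing K items at once is replaced by K single-item choices. A minimal sketch of that trade-off (function names are illustrative, not from the paper):

```python
from itertools import combinations

def joint_action_space(n_items, k):
    """All joint actions of an S-MDP: every size-k subset chosen at once."""
    return list(combinations(range(n_items), k))

def iterative_action_space(n_items):
    """Per-step actions of the IS-MDP: pick one item (at most n_items choices),
    repeated k times instead of a single joint choice."""
    return list(range(n_items))

n, k = 20, 5
print(len(joint_action_space(n, k)))   # C(20, 5) = 15504 joint actions
print(len(iterative_action_space(n)))  # at most 20 actions per iterative step
```

The exponential growth in states comes from tracking which items have already been selected within the K-step episode, which the weight-shared Q-networks then exploit via symmetry.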
Sparse Training Theory for Scalable and Efficient Agents
A fundamental task for artificial intelligence is learning. Deep Neural
Networks have proven capable across all major learning paradigms, i.e.,
supervised, unsupervised, and reinforcement learning. Nevertheless, traditional
deep learning approaches rely on cloud computing facilities and do not scale
well to autonomous agents with low computational resources. Even in the cloud,
they suffer from computational and memory limitations and cannot adequately
model large physical worlds for agents that would require networks with
billions of neurons. In the last few years, these issues have been addressed by
the emerging topic of sparse training, which trains sparse networks from
scratch. This paper discusses the state of the art in sparse training, along
with its challenges and limitations, and introduces a couple of new theoretical
research directions that have the potential to alleviate those limitations and
push deep learning scalability well beyond its current boundaries. Finally, the
impact of these theoretical advances on complex multi-agent settings is
discussed from a real-world perspective, using the smart grid as a case study.
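Sparse-training methods in this line of work keep a fixed sparsity budget and periodically rewire the network during training instead of starting dense and pruning afterwards. A minimal NumPy sketch of one prune-and-regrow step in the style of sparse evolutionary training (the function name, the random-regrow rule, and all parameters are illustrative assumptions, not this paper's algorithm):

```python
import numpy as np

def prune_and_regrow(weights, mask, frac=0.3, rng=None):
    """One sparse-training rewiring step: drop the smallest-magnitude active
    weights and regrow the same number at random inactive positions, keeping
    the total number of active connections (the sparsity budget) constant."""
    rng = rng or np.random.default_rng(0)
    active = np.flatnonzero(mask)
    n_drop = int(frac * active.size)
    # prune: deactivate the n_drop active weights with smallest magnitude
    drop = active[np.argsort(np.abs(weights[active]))[:n_drop]]
    mask[drop] = False
    weights[drop] = 0.0
    # regrow: activate n_drop random currently-inactive positions
    inactive = np.flatnonzero(~mask)
    grow = rng.choice(inactive, size=n_drop, replace=False)
    mask[grow] = True
    weights[grow] = rng.normal(0.0, 0.01, size=n_drop)
    return weights, mask

rng = np.random.default_rng(1)
w = rng.normal(size=100)
m = np.zeros(100, dtype=bool)
m[rng.choice(100, 20, replace=False)] = True
w[~m] = 0.0
w2, m2 = prune_and_regrow(w, m, frac=0.3, rng=rng)
print(int(m2.sum()))  # sparsity budget unchanged: still 20 active weights
```

Because only the active weights are ever stored and updated, memory and compute scale with the budget rather than the dense layer size, which is what makes the approach attractive for low-resource agents.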
Parameterizing Branch-and-Bound Search Trees to Learn Branching Policies
Branch and Bound (B&B) is the exact tree search method typically used to
solve Mixed-Integer Linear Programming problems (MILPs). Learning branching
policies for MILP has become an active research area, with most works proposing
to imitate the strong branching rule and specialize it to distinct classes of
problems. We aim instead at learning a policy that generalizes across
heterogeneous MILPs: our main hypothesis is that parameterizing the state of
the B&B search tree can aid this type of generalization. We propose a novel
imitation learning framework, and introduce new input features and
architectures to represent branching. Experiments on MILP benchmark instances
clearly show the advantages of incorporating an explicit parameterization of
the state of the search tree to modulate the branching decisions, in terms of
both higher accuracy and smaller B&B trees. The resulting policies
significantly outperform the current state-of-the-art method for "learning to
branch" by effectively allowing generalization to generic unseen instances.Comment: AAAI 2021 camera-ready version with supplementary materials, improved
readability of figures in main article. Code, data and trained models are
available at https://github.com/ds4dm/branch-search-tree
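For readers unfamiliar with the B&B skeleton that such branching policies plug into, it can be sketched on a toy 0/1 knapsack problem. The paper targets general MILPs and learns which variable to branch on; this sketch hard-codes a greedy value-density order and a fractional relaxation bound, so it is purely illustrative and not the authors' method:

```python
def knapsack_bnb(values, weights, capacity):
    """Tiny best-first branch and bound for 0/1 knapsack: at each node we
    branch on one item (take it or skip it) and prune subtrees whose
    relaxation bound cannot beat the incumbent solution."""
    # fixed branching order by value density; a learned policy would replace this
    items = sorted(range(len(values)),
                   key=lambda i: values[i] / weights[i], reverse=True)

    def bound(idx, val, cap):
        # fractional (LP-relaxation) upper bound over the remaining items
        for i in items[idx:]:
            if weights[i] <= cap:
                cap -= weights[i]
                val += values[i]
            else:
                return val + values[i] * cap / weights[i]
        return val

    best = 0
    stack = [(0, 0, capacity)]  # (next branching depth, value so far, capacity left)
    while stack:
        idx, val, cap = stack.pop()
        if val > best:
            best = val
        if idx == len(items) or bound(idx, val, cap) <= best:
            continue  # prune: the relaxation bound cannot beat the incumbent
        i = items[idx]
        if weights[i] <= cap:  # branch 1: take item i
            stack.append((idx + 1, val + values[i], cap - weights[i]))
        stack.append((idx + 1, val, cap))  # branch 2: skip item i
    return best

print(knapsack_bnb([60, 100, 120], [10, 20, 30], 50))  # → 220
```

A better branching choice shrinks the tree without changing the optimum, which is exactly the quantity (tree size) the paper's experiments measure alongside imitation accuracy.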