Macro action selection with deep reinforcement learning in StarCraft
StarCraft (SC) is one of the most popular and successful real-time strategy
(RTS) games. In recent years, SC has also been widely accepted as a challenging
testbed for AI research because of its enormous state space, partially observed
information, and multi-agent collaboration, among other factors. With the help
of the annual AIIDE and CIG competitions, a growing number of SC bots have been
proposed and continuously improved. However, a large gap remains between the
top-level bots and professional human players. One vital reason is that current SC bots
mainly rely on predefined rules to select macro actions during their games.
These rules are not scalable and efficient enough to cope with the enormous yet
partially observed state space in the game. In this paper, we propose a deep
reinforcement learning (DRL) framework to improve the selection of macro
actions. Our framework combines Ape-X DQN with Long Short-Term Memory (LSTM).
We use this framework to build our bot, named LastOrder. Our evaluation, based
on training against all bots from the AIIDE 2017 StarCraft AI competition set,
shows that LastOrder achieves an 83% win rate, outperforming 26 of the 28
entrants.
Master-slave Deep Architecture for Top-K Multi-armed Bandits with Non-linear Bandit Feedback and Diversity Constraints
We propose a novel master-slave architecture to solve the top-$K$
combinatorial multi-armed bandits problem with non-linear bandit feedback and
diversity constraints, which, to the best of our knowledge, is the first
combinatorial bandits setting considering diversity constraints under bandit
feedback. Specifically, to efficiently explore the combinatorial and
constrained action space, we introduce six slave models with distinct
merits that generate diversified samples balancing rewards, constraints,
and efficiency. Moreover, we propose teacher-learning-based optimization
and a policy co-training technique to boost the performance of the multiple
slave models. The master model then collects the elite samples provided by the
slave models and selects the best sample estimated by a neural contextual
UCB-based network to make a decision with a trade-off between exploration and
exploitation. Thanks to the elaborate design of slave models, the co-training
mechanism among slave models, and the novel interactions between the master and
slave models, our approach significantly surpasses existing state-of-the-art
algorithms on both synthetic and real datasets for recommendation tasks. The
code is available at:
https://github.com/huanghanchi/Master-slave-Algorithm-for-Top-K-Bandits
Comment: IEEE Transactions on Neural Networks and Learning Systems
Macro action selection with deep reinforcement learning in StarCraft
StarCraft (SC) is one of the most popular and successful Real Time Strategy
(RTS) games. In recent years, SC has also been considered a testbed for AI
research, due to its enormous state space, hidden information, multi-agent
collaboration, and so on. Thanks to the annual AIIDE and CIG competitions, a
growing number of bots have been proposed and continuously improved. However, a
big gap still remains between the top bots and professional human players.
One vital reason is that current bots mainly rely on predefined rules to
perform macro actions. These rules are not scalable or efficient enough to
cope with the large but partially observed macro state space in SC. In this
paper, we propose a deep reinforcement learning (DRL) based framework for macro
action selection. Our framework combines the reinforcement learning approach
Ape-X DQN with Long Short-Term Memory (LSTM) to improve macro action selection
in the bot. We evaluate our bot, named LastOrder, on the AIIDE 2017 StarCraft AI
competition bot set. Our bot achieves an overall 83% win rate, outperforming 26
of the 28 entrants.