Learning Reward Machines in Cooperative Multi-Agent Tasks
This paper presents a novel approach to Multi-Agent Reinforcement Learning
(MARL) that combines cooperative task decomposition with the learning of reward
machines (RMs) encoding the structure of the sub-tasks. The proposed method
helps deal with the non-Markovian nature of the rewards in partially observable
environments and improves the interpretability of the learnt policies required
to complete the cooperative task. The RMs associated with each sub-task are
learnt in a decentralised manner and then used to guide the behaviour of each
agent. By doing so, the complexity of a cooperative multi-agent problem is
reduced, allowing for more effective learning. The results suggest that our
approach is a promising direction for future research in MARL, especially in
complex environments with large state spaces and multiple agents.
Comment: Neuro-symbolic AI for Agent and Multi-Agent Systems Workshop at AAMAS'2
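The central object in this work, a reward machine, can be pictured as a small finite-state machine whose transitions fire on high-level events and emit rewards. The sketch below is illustrative only: the state names, events, and the button-then-goal sub-task are hypothetical, not taken from the paper.

```python
class RewardMachine:
    """Minimal reward machine sketch: a finite-state machine whose
    transitions fire on observed high-level events and emit rewards.
    The states and events used below are hypothetical."""

    def __init__(self, transitions, initial):
        # transitions: {(state, event): (next_state, reward)}
        self.transitions = transitions
        self.state = initial

    def step(self, event):
        """Advance on an observed event; events with no matching
        transition leave the machine in place and yield zero reward."""
        key = (self.state, event)
        if key in self.transitions:
            self.state, reward = self.transitions[key]
            return reward
        return 0.0

# Hypothetical sub-task: press a button, then reach the goal.
rm = RewardMachine(
    {("u0", "button"): ("u1", 0.0),   # button pressed: progress, no reward yet
     ("u1", "goal"):   ("u2", 1.0)},  # goal after button: sub-task complete
    initial="u0",
)
```

Because reward depends on the machine state (i.e. on event history), the agent's effective reward is non-Markovian in the raw observation, which is exactly the situation the paper targets.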
Population-Based Reinforcement Learning for Combinatorial Optimization
Applying reinforcement learning (RL) to combinatorial optimization problems
is attractive as it removes the need for expert knowledge or pre-solved
instances. However, it is unrealistic to expect an agent to solve these (often
NP-)hard problems in a single shot at inference due to their inherent
complexity. Thus, leading approaches often implement additional search
strategies, from stochastic sampling and beam-search to explicit fine-tuning.
In this paper, we argue for the benefits of learning a population of
complementary policies, which can be simultaneously rolled out at inference. To
this end, we introduce Poppy, a simple theoretically grounded training
procedure for populations. Instead of relying on a predefined or hand-crafted
notion of diversity, Poppy induces an unsupervised specialization targeted
solely at maximizing the performance of the population. We show that Poppy
produces a set of complementary policies, and obtains state-of-the-art RL
results on three popular NP-hard problems: the traveling salesman (TSP), the
capacitated vehicle routing (CVRP), and 0-1 knapsack (KP) problems. On TSP
specifically, Poppy outperforms the previous state-of-the-art, dividing the
optimality gap by 5 while reducing the inference time by more than an order of
magnitude.
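The inference-time benefit of a population can be illustrated with a toy sketch: several deliberately different policies are rolled out on the same instance and only the best solution is kept. The nearest-neighbour "policies" below, parameterised by their start city, are hypothetical stand-ins for Poppy's learned neural policies.

```python
def tour_length(tour, dist):
    """Total length of a cyclic tour under a distance matrix."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def greedy_policy(dist, start):
    """A simple nearest-neighbour 'policy', differentiated only by its
    start city (a stand-in for a trained policy in the population)."""
    n = len(dist)
    tour, visited = [start], {start}
    while len(tour) < n:
        last = tour[-1]
        nxt = min((c for c in range(n) if c not in visited),
                  key=lambda c: dist[last][c])
        tour.append(nxt)
        visited.add(nxt)
    return tour

def population_rollout(dist, starts):
    """Roll out every policy in the population on the same instance and
    keep the best tour, mirroring the 'max over the population' objective."""
    tours = [greedy_policy(dist, s) for s in starts]
    return min(tours, key=lambda t: tour_length(t, dist))

# A tiny symmetric TSP instance (illustrative data).
dist = [[0, 2, 9, 10],
        [2, 0, 6, 4],
        [9, 6, 0, 8],
        [10, 4, 8, 0]]
```

By construction, the population's best tour is never worse than any single member's, which is the guarantee Poppy's training objective exploits.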
Induction of Subgoal Automata for Reinforcement Learning
In this work we present ISA, a novel approach for learning and exploiting
subgoals in reinforcement learning (RL). Our method relies on inducing an
automaton whose transitions are subgoals expressed as propositional formulas
over a set of observable events. A state-of-the-art inductive logic programming
system is used to learn the automaton from observation traces perceived by the
RL agent. The reinforcement learning and automaton learning processes are
interleaved: a new refined automaton is learned whenever the RL agent generates
a trace not recognized by the current automaton. We evaluate ISA in several
gridworld problems and show that it performs similarly to a method for which
automata are given in advance. We also show that the learned automata can be
exploited to speed up convergence through reward shaping and transfer learning
across multiple tasks. Finally, we analyze the running time and the number of
traces that ISA needs to learn an automaton, and the impact that the number of
observable events has on the learner's performance.
Comment: Preprint accepted for publication to the 34th AAAI Conference on Artificial Intelligence (AAAI-20)
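The interleaving of RL and automaton learning described above can be sketched as a counterexample-driven loop: whenever an episode produces a trace the current automaton does not recognize, a refined automaton is re-learned from all traces seen so far. The `TraceSet` class below is a toy stand-in for the ILP-induced subgoal automaton, and the episode interface is hypothetical.

```python
class TraceSet:
    """Toy stand-in for a learned automaton: accepts exactly the traces
    it was built from (the real ISA induces a subgoal automaton via an
    inductive logic programming system)."""
    def __init__(self, traces=()):
        self.known = {tuple(t) for t in traces}

    def accepts(self, trace):
        return tuple(trace) in self.known

def interleaved_learning(run_episode, episodes):
    """Sketch of ISA's loop: RL episodes are guided by the current
    automaton; an unrecognized trace triggers re-learning."""
    automaton, traces, relearns = TraceSet(), [], 0
    for _ in range(episodes):
        trace = run_episode(automaton)      # RL episode guided by automaton
        traces.append(trace)
        if not automaton.accepts(trace):    # counterexample found
            automaton = TraceSet(traces)    # re-learn from all traces so far
            relearns += 1
    return automaton, relearns
```

With a fixed environment that always yields the same trace, the automaton is re-learned exactly once and then recognizes all subsequent episodes.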
Collective adaptation through concurrent planning: the case of sustainable urban mobility
In this paper we address the challenges that impede collective adaptation in smart mobility systems by proposing a notion of ensembles. Ensembles enable systems with collective adaptability to be built as emergent aggregations of autonomous and self-adaptive agents. Adaptation in these systems is triggered by a run-time occurrence known as an issue. The novel aspect of our approach is that it allows agents affected by an issue in a smart mobility scenario to adapt collaboratively, with minimal impact on their own preferences, through an issue resolution process based on concurrent planning algorithms.
Learning and Generalization in Atari Games
Undergraduate thesis in computer science. Tutor: Anders Jonsson.
This thesis describes the design of agents that learn to play Atari games using the Arcade Learning Environment (ALE) framework to interact with them. The application of machine learning in video games, given its high complexity, is considered a bridge towards real-world domains such as robotics. The goal in Atari games is to achieve the highest possible score. To solve this task, reinforcement learning and search techniques are used. These algorithms outperform humans in 30 of the 61 games supported by ALE. Since humans are very good at making generalizations between games, special emphasis is given to evaluating how well an agent learns from multiple games simultaneously. These experiments usually result in a higher score for specific pairs of games. Moreover, some games tend to increase their own score when played alongside other games, while others help the games they are paired with perform better.
Resolution of concurrent planning problems using classical planning
Master's thesis, Master in Intelligent Interactive Systems. Tutor: Anders Jonsson.
In this work, we present new approaches for solving multiagent planning and temporal
planning problems. These planning forms are two types of concurrent planning,
where actions occur in parallel. The methods we propose rely on a compilation to
classical planning problems that can be solved using an off-the-shelf classical planner.
Then, the solutions can be converted back into multiagent or temporal solutions.
Our compilation for multiagent planning is able to generate concurrent actions that
satisfy a set of concurrency constraints. Furthermore, it avoids the exponential
blowup associated with concurrent actions, a problem faced by many current
multiagent planners. Incorporating similar ideas in temporal planning enables
us to generate temporal plans with simultaneous events, which most state-of-the-art
temporal planners cannot do.
In experiments, we compare our approaches to state-of-the-art planners. We show
that the methods based on transformations to classical planning obtain better
results on complex problems. In contrast, we also highlight some of the
drawbacks that these methods have for both multiagent and temporal planning.
We also illustrate how these methods can be applied to real world domains like the
smart mobility domain. In this domain, a group of vehicles and passengers must
self-adapt in order to reach their target positions. The adaptation process
consists of running a concurrent planning algorithm, whose behavior is then
evaluated.
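The intuition behind avoiding the exponential blowup can be sketched as follows: instead of materializing every joint action up front, agents commit to actions one at a time while pairwise concurrency constraints are checked incrementally. This is an illustrative toy, not the thesis's actual compilation to classical planning, and the greedy per-agent commitment below does not backtrack, which a complete encoding would need to handle.

```python
from itertools import product

def joint_actions_naive(agent_actions):
    """Exponential baseline: materialize every combination of
    per-agent actions (|A|^n joint actions for n agents)."""
    return list(product(*agent_actions))

def select_joint_action(agent_actions, compatible):
    """Sketch of the compilation idea: each agent commits to an action
    in turn, and pairwise concurrency constraints are checked
    incrementally, never enumerating the full joint action space.
    Greedy and without backtracking, so it is illustrative only."""
    chosen = []
    for actions in agent_actions:
        for a in actions:
            if all(compatible(a, b) for b in chosen):
                chosen.append(a)
                break
        else:
            return None  # no action of this agent is compatible
    return chosen
```

For example, with a hypothetical constraint that two agents cannot both "lift" at once, the second agent is forced to "wait" without ever building the four-element joint action set.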
Solving multiagent planning problems with concurrent conditional effects
Paper presented at the 33rd AAAI Conference on Artificial Intelligence (AAAI 2019), the 31st Innovative Applications of Artificial Intelligence Conference (IAAI 2019), and the 9th AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI 2020), held January 27 to February 1, 2019, in Palo Alto, USA.
In this work we present a novel approach to solving concurrent multiagent planning problems in which several agents act in parallel. Our approach relies on a compilation from concurrent multiagent planning to classical planning, allowing us to use an off-the-shelf classical planner to solve the original multiagent problem. The solution can be directly interpreted as a concurrent plan that satisfies a given set of concurrency constraints, while avoiding the exponential blowup associated with concurrent actions. Our planner is the first to handle action effects that are conditional on what other agents are doing. Theoretically, we show that the compilation is sound and complete. Empirically, we show that our compilation can solve challenging multiagent planning problems that require concurrent actions.
This work has been supported by the Maria de Maeztu Units of Excellence Programme (MDM-2015-0502). Anders Jonsson is partially supported by the grants TIN2015-67959 and PCIN-2017-082 of the Spanish Ministry of Science.
Solving concurrent multiagent planning using classical planning
Paper presented at the 6th Workshop on Distributed and Multi-Agent Planning (DMAP 2018), held during the 28th International Conference on Automated Planning and Scheduling, June 24 to 29, 2018, in Delft, the Netherlands.
In this work we present a novel approach to solving concurrent multiagent planning problems in which several agents act in parallel. Our approach relies on a compilation from concurrent multiagent planning to classical planning, allowing us to use an off-the-shelf classical planner to solve the original multiagent problem. The solution can be directly interpreted as a concurrent plan that satisfies a given set of concurrency constraints, while avoiding the exponential blowup associated with concurrent actions. Theoretically, we show that the compilation is sound and complete. Empirically, we show that our compilation can solve challenging multiagent planning problems that require concurrent actions.
This work has been supported by the Maria de Maeztu Units of Excellence Programme (MDM-2015-0502)
CARPooL: Collective Adaptation using concuRrent PLanning
Paper presented at the 17th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2018), held in Stockholm, July 10 to 15, 2018.
In this paper we present the CARPooL demonstrator, an implementation of a Collective Adaptation Engine (CAE) that addresses the challenge of collective adaptation in the smart mobility domain. CARPooL resolves adaptation issues via concurrent planning techniques. It also allows users to interact with the provided solutions by adding new issues or analyzing the actions performed by each agent.
This work has been partially supported by the Maria de Maeztu Units of Excellence Programme (MDM-2015-0502)