A Reinforcement Learning Framework for Sequencing Multi-Robot Behaviors
Given a list of behaviors and associated parameterized controllers for
solving different individual tasks, we study the problem of selecting an
optimal sequence of coordinated behaviors in multi-robot systems for completing
a given mission, which could not be handled by any single behavior. In
addition, we are interested in the case where partial information about the
underlying mission is unknown, so the robots must cooperatively learn
this information through their course of actions. Such a problem can be
formulated as an optimal decision problem in multi-robot systems; however, it
is in general intractable due to modeling imperfections and the curse of
dimensionality of the decision variables. To circumvent these issues, we first
consider an alternative formulation of the original problem by introducing a
sequence of switching times between behaviors. Our main contribution is then to
propose a novel reinforcement-learning-based method that combines Q-learning
and online gradient descent to solve this reformulated problem. In
particular, the optimal sequence of the robots' behaviors is found by using
Q-learning while the optimal parameters of the associated controllers are
obtained through an online gradient descent method. Finally, to illustrate the
effectiveness of our proposed method, we implement it on a team of
differential-drive robots to solve two different missions, namely convoy
protection and object manipulation.
Comment: 6 pages
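The split described in this abstract, Q-learning for the behavior sequence and online gradient descent for the controller parameters, can be illustrated with a toy sketch. It is not the paper's algorithm: the fixed mission stages, the `stage_penalty` table, the scalar parameter per behavior, the quadratic cost, and the name `learn_sequence` are all assumptions made for illustration, and the switching times are reduced to fixed stage boundaries.

```python
import random


def learn_sequence(stage_penalty, best_param, episodes=500,
                   alpha=0.5, eps=0.2, lr=0.1, seed=0):
    """Toy sketch: tabular Q-learning picks one behavior per mission stage,
    while gradient descent tunes one scalar parameter per behavior.

    stage_penalty[k][b]: hypothetical cost of running behavior b in stage k.
    best_param[b]: hypothetical best controller parameter for behavior b
                   (used only to define the toy quadratic cost).
    """
    rng = random.Random(seed)
    num_stages = len(stage_penalty)
    num_behaviors = len(best_param)
    theta = [0.0] * num_behaviors                        # controller parameters
    q = [[0.0] * num_behaviors for _ in range(num_stages)]

    for _ in range(episodes):
        for k in range(num_stages):
            # Epsilon-greedy choice of the behavior for stage k.
            if rng.random() < eps:
                b = rng.randrange(num_behaviors)
            else:
                b = max(range(num_behaviors), key=lambda a: q[k][a])

            # Stage cost: fixed penalty plus squared parameter error.
            error = theta[b] - best_param[b]
            reward = -(stage_penalty[k][b] + error ** 2)

            # Undiscounted Q-learning update; the last stage is terminal.
            future = max(q[k + 1]) if k + 1 < num_stages else 0.0
            q[k][b] += alpha * (reward + future - q[k][b])

            # Online gradient step on the parameter of the chosen behavior.
            theta[b] -= lr * 2.0 * error

    # Greedy behavior sequence read off the learned Q-table.
    sequence = [max(range(num_behaviors), key=lambda a: q[k][a])
                for k in range(num_stages)]
    return sequence, theta
```

Calling `learn_sequence([[0.0, 1.0], [1.0, 0.0], [0.0, 1.0]], [2.0, -1.0])` returns the greedy behavior for each stage together with the tuned parameters. The two updates are interleaved in the same loop, so the Q-table is learned while the parameters are still changing.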
Resilient Monitoring in Heterogeneous Multi-robot Systems through Network Reconfiguration
We propose a framework for resilience in a networked heterogeneous
multi-robot team subject to resource failures. Each robot in the team is
equipped with resources that it shares with its neighbors. Additionally, each
robot in the team executes a task, whose performance depends on the resources
to which it has access. When a resource on a particular robot becomes
unavailable (e.g., a camera ceases to function), the team optimally reconfigures
its communication network so that the robots affected by the failure can
continue their tasks. We focus on a monitoring task, where robots individually
estimate the state of an exogenous process. We encode the end-to-end effect of
a robot's resource loss on the monitoring performance of the team by defining a
new, stronger notion of observability: one-hop observability. By
abstracting the impact that low-level individual resources have on the task
performance through the notion of one-hop observability, our framework leads to
the principled reconfiguration of information flow in the team to effectively
replace the lost resource on one robot with information from another, as long
as certain conditions are met. Network reconfiguration is converted to the
problem of selecting edges to be modified in the system's communication graph
after a resource failure has occurred. A controller based on finite-time
convergence control barrier functions drives each robot to a spatial location
that enables the communication links of the modified graph. We validate the
effectiveness of our framework by deploying it on a team of differential-drive
robots estimating the position of a group of quadrotors.
Comment: 12 pages, 5 figures
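The edge-selection step mentioned in this abstract can be sketched as follows. The abstract describes an optimal reconfiguration based on one-hop observability; the sketch below swaps in a simple greedy heuristic over named resources, so it should be read as an illustration of the idea only. The function name `reconfigure`, the dictionary-based data layout, and the lowest-index donor rule are all hypothetical.

```python
def reconfigure(resources, required, edges, failed):
    """Greedy sketch: after a resource failure, add communication links so that
    every robot can reach each resource it needs on itself or a one-hop neighbor.

    resources: dict robot -> set of resource names the robot carries
    required:  dict robot -> set of resource names the robot's task needs
    edges:     iterable of (robot, robot) pairs, the current undirected links
    failed:    (robot, resource) pair describing the failure
    Returns the list of added links as sorted (robot, robot) tuples.
    """
    robot, lost = failed
    resources = {r: set(s) for r, s in resources.items()}  # work on a copy
    resources[robot].discard(lost)
    links = {frozenset(e) for e in edges}

    def accessible(i):
        # Resources on robot i itself plus those on its one-hop neighbors.
        acc = set(resources[i])
        for link in links:
            if i in link:
                (j,) = link - {i}
                acc |= resources[j]
        return acc

    added = []
    for i in sorted(required):
        for need in sorted(required[i]):
            if need in accessible(i):
                continue
            # Hypothetical rule: link to the lowest-index robot with the resource.
            donors = [j for j in sorted(resources)
                      if j != i and need in resources[j]]
            if not donors:
                raise ValueError(f"no robot can supply {need!r} to robot {i}")
            link = frozenset((i, donors[0]))
            links.add(link)
            added.append(tuple(sorted(link)))
    return added
```

For example, with robots 0 and 2 each carrying a camera, robot 1 carrying a lidar, a single link between robots 0 and 1, and a camera failure on robot 0, the sketch links both robot 0 and robot 1 to robot 2. A faithful implementation would instead choose the edges by optimizing over the communication graph and would then drive the robots to positions where those links are feasible.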