10 research outputs found

    Graph Exploration for Effective Multi-agent Q-Learning

    This paper proposes an exploration technique for multi-agent reinforcement learning (MARL) with graph-based communication among agents. We assume the individual rewards received by the agents are independent of the actions of the other agents, while their policies are coupled. In the proposed framework, neighbouring agents collaborate to estimate the uncertainty about the state-action space in order to execute more efficient explorative behaviour. Unlike existing works, the proposed algorithm does not require counting mechanisms and can be applied to continuous-state environments without requiring complex conversion techniques. Moreover, the proposed scheme allows agents to communicate in a fully decentralized manner with minimal information exchange. For continuous-state scenarios, each agent needs to exchange only a single parameter vector. The performance of the algorithm is verified with theoretical results for discrete-state scenarios and with experiments for continuous ones.

    Joint Intrinsic Motivation for Coordinated Exploration in Multi-Agent Deep Reinforcement Learning

    Multi-agent deep reinforcement learning (MADRL) problems often encounter the challenge of sparse rewards. This challenge becomes even more pronounced when coordination among agents is necessary. As performance depends not on one agent's behavior alone but on the joint behavior of multiple agents, finding an adequate solution becomes significantly harder. In this context, a group of agents can benefit from actively exploring different joint strategies in order to determine the most efficient one. In this paper, we propose an approach for rewarding strategies where agents collectively exhibit novel behaviors. We present JIM (Joint Intrinsic Motivation), a multi-agent intrinsic motivation method that follows the centralized learning with decentralized execution paradigm. JIM rewards joint trajectories based on a centralized measure of novelty designed to function in continuous environments. We demonstrate the strengths of this approach both in a synthetic environment designed to reveal shortcomings of state-of-the-art MADRL methods, and in simulated robotic tasks. Results show that joint exploration is crucial for solving tasks where the optimal strategy requires a high level of coordination. Comment: 13 pages, 13 figures. Published as an extended abstract at AAMAS 202

    Decentralized Unknown Building Exploration by Frontier Incentivization and Voronoi Segmentation in a Communication Restricted Domain

    Exploring unknown environments using multiple robots poses a complex challenge, particularly in situations where communication between robots is either impossible or limited. Existing exploration techniques exhibit research gaps due to unrealistic communication assumptions or the computational complexities associated with exploration strategies in unfamiliar domains. In our investigation of multi-robot exploration in unknown areas, we employed various exploration and coordination techniques, evaluating their performance in terms of robustness and efficiency across different levels of environmental complexity. Our research is centered on optimizing the exploration process through strategic agent distribution. We initially address the challenge of city roadway coverage, aiming to minimize the travel distance of each agent in a scenario involving multiple agents to enhance overall system efficiency. To achieve this, we partition the city into subregions and utilize Voronoi relaxation to optimize the size of postman distances for these subregions. This technique highlights the essential elements of an efficient city exploration. Expanding our exploration techniques to unknown buildings, we develop strategies tailored to this specific domain. After a careful evaluation of various exploration techniques, we introduce another goal selection strategy, Unknown Closest. This strategy combines the advantages of a greedy approach with the improved dispersal of agents, achieved through the randomization effect of a larger goal set. We further assess the exploration techniques in environments with restricted communication, presenting upper coordination mechanisms such as frontier incentivization and area segmentation. These methods enhance exploration performance by promoting independence and implicit coordination among agents. Our simulations demonstrate the successful application of these techniques in interiors of varying complexity.
In summary, this dissertation offers solutions for multi-robot exploration in unknown domains, paving the way for more efficient, cost-effective, and adaptable exploration strategies. Our findings have significant implications for various fields, ranging from autonomous city-wide monitoring to the exploration of hazardous interiors, where time-efficient exploration is crucial.
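
The partition-then-relax idea mentioned in the abstract above can be illustrated with a minimal sketch of discrete Voronoi (Lloyd) relaxation. This is not the dissertation's method: it assumes plain Euclidean distance on 2D points, whereas the abstract optimizes postman distances over road networks, which this sketch does not model. The names `assign_to_nearest` and `lloyd_relaxation` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def assign_to_nearest(points: Sequence[Point], seeds: Sequence[Point]) -> Dict[int, List[Point]]:
    """Discrete Voronoi partition: each point joins the region of its nearest seed."""
    regions: Dict[int, List[Point]] = {i: [] for i in range(len(seeds))}
    for p in points:
        nearest = min(
            range(len(seeds)),
            key=lambda k: (p[0] - seeds[k][0]) ** 2 + (p[1] - seeds[k][1]) ** 2,
        )
        regions[nearest].append(p)
    return regions


def lloyd_relaxation(points: Sequence[Point], seeds: Sequence[Point], iterations: int = 10):
    """Repeatedly move each seed to the centroid of its region (assumed Euclidean metric)."""
    current = list(seeds)
    for _ in range(iterations):
        regions = assign_to_nearest(points, current)
        updated = []
        for i, seed in enumerate(current):
            members = regions[i]
            if members:
                cx = sum(x for x, _ in members) / len(members)
                cy = sum(y for _, y in members) / len(members)
                updated.append((cx, cy))
            else:
                updated.append(seed)  # an empty region keeps its seed in place
        current = updated
    return current, assign_to_nearest(points, current)


# Two clusters of sample locations and two badly placed initial seeds.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
seeds, regions = lloyd_relaxation(points, [(1.0, 0.0), (2.0, 0.0)])
```

In this toy case the two seeds, which both start near the left cluster, separate so that each agent's region covers one cluster. A road-network version would need a graph metric in place of the squared Euclidean distance.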

    Multi-Robot Coverage Path Planning for Inspection of Offshore Wind Farms: A Review

    Offshore wind turbine (OWT) inspection research is receiving increasing interest as the sector grows worldwide. Wind farms are far from emergency services and experience extreme weather and winds. This hazardous environment lends itself to unmanned approaches, reducing human exposure to risk. Increasing automation in inspections can reduce human effort and financial costs. Despite the benefits, research on automating inspection is sparse. This work proposes that OWT inspection can be described as a multi-robot coverage path planning problem. Reviews of multi-robot coverage exist, but to the best of our knowledge, none captures the domain-specific aspects of an OWT inspection. In this paper, we present a review of the current state of the art of multi-robot coverage to identify gaps in research relating to coverage for OWT inspection. To perform a qualitative study, the PICo (population, intervention, and context) framework was used. The retrieved works are analysed according to three aspects of coverage approaches: environmental modelling, decision making, and coordination. Based on the reviewed studies and the conducted analysis, candidate approaches are proposed for the structural coverage of an OWT. Future research should involve the adaptation of voxel-based ray-tracing pose generation to UAVs and exploration, applying semantic labels to tasks to facilitate heterogeneous coverage and semantic online task decomposition to identify the coverage target during the run time.

    Recent Advances in Multi Robot Systems

    Designing a team of robots that can perform given tasks is a major concern for many members of the robotics community. Many problems remain to be solved before a fully functional robot team is possible, and the robotics community is working hard on them (navigation, task allocation, communication, adaptation, control, ...). This book presents contributions from top researchers in this field and will serve as a valuable tool for professionals in this interdisciplinary field. It focuses on the challenging issues of team architectures, vehicle learning and adaptation, heterogeneous group control and cooperation, task selection, dynamic autonomy, mixed initiative, and human and robot team interaction. The book consists of 16 chapters introducing both basic research and advanced developments. Topics covered include kinematics, dynamic analysis, accuracy, optimization design, modelling, simulation and control of multi robot systems.

    Towards coordinated multi-agent exploration problem via segmentation and reinforcement learning

    Exploring an unknown environment with multiple autonomous robots is a major challenge in the robotics domain. The robot or agent needs to incrementally construct a model or a map representation of the environment while performing its domain tasks such as surveillance, search and rescue, and cleaning. What the robot should do or where it should go next can only be determined after the map is constructed at least partially. The typical approach is to take a frontier point, located on the boundary between a known area and an unknown region, as the target location to visit. This point is selected from the frontiers revealed whenever the robots observe the environment. However, when multiple robots are involved, the task becomes more challenging, as they have to explore the unknown environment as efficiently and quickly as possible while avoiding conflicts or interference among the robots that can reduce efficiency. One efficient way to coordinate a team of autonomous robots exploring an unknown environment is to partition the map of the environment into separate regions or segments that serve as the targets allocated to the robots. The partitioning must be performed continually and incrementally. There is a trade-off: generating many small segments can provide more details of the environment, but may lose the representation of larger areas that are useful and relevant to the exploration task at hand. A Hierarchical Adaptive Clustering (HAC) segmentation of the indoor environment is introduced in this thesis that can strike a balance between fine-grained clustering and generalized segmentation during the exploration. With the HAC approach, an effective multi-agent task allocation approach is developed, wherein the partitioning and allocation processes can be performed continually and incrementally in real-time.
Experimental results show that the HAC-based exploration method is comparable with other state-of-the-art approaches including Frontier-based allocation and Voronoi-based exploration. The model outperforms the others in terms of meaningful topological clusters and efficient exploration. However, non-learning based methods usually employ a fixed strategy to allocate the robots or agents to explore selected locations, which sometimes cannot handle unpredictable and dynamic situations well. These methods can be effective in a single robot case, but assigning multiple robots to explore different locations is challenging since individual robots may interfere with others, making the overall tasks less efficient. A learning-based approach is proposed in this thesis to solve those issues. The algorithm is called CNN-based Multi-agent Proximal Policy Optimization (CMAPPO), which allocates multiple robots to explore different environments while improving their strategies over time to allocate the tasks more efficiently and flexibly. This algorithm combines a CNN to process multi-channel visual inputs from the observed environment, curriculum learning for improving learning efficiency, and the PPO algorithm for motivation-based reinforcement learning. Based on the evaluation, CMAPPO can learn a more efficient strategy for multiple robots (the robot is named agent in the rest of this thesis) to explore the environment than the conventional frontier-based method. This thesis introduces a novel indoor space segmentation-based exploration method which is based on topological clusters of an enclosed environment to perform multi-agent exploration. Considering the dynamic situations in the environment, this thesis further develops a new end-to-end deep reinforcement learning architecture for multi-agent exploration strategy by using Convolutional Neural Network (CNN) and Proximal Policy Optimization (PPO). Master of Engineering
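
The frontier-based baseline described in the abstract above (pick a target on the boundary between known and unknown space) can be sketched on an occupancy grid as follows. This is a minimal illustration under stated assumptions, not the thesis's HAC or CMAPPO method: the cell labels `UNKNOWN`, `FREE`, `OCCUPIED`, the 4-connected neighbourhood, the Manhattan-distance goal choice, and the function names are all hypothetical.

```python
from typing import List, Set, Tuple

# Hypothetical cell labels for an occupancy grid.
UNKNOWN, FREE, OCCUPIED = -1, 0, 1

Grid = List[List[int]]
Cell = Tuple[int, int]


def find_frontiers(grid: Grid) -> Set[Cell]:
    """Return free cells with at least one unknown 4-connected neighbour."""
    rows, cols = len(grid), len(grid[0])
    frontiers: Set[Cell] = set()
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != FREE:
                continue
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == UNKNOWN:
                    frontiers.add((r, c))
                    break
    return frontiers


def nearest_frontier(robot: Cell, frontiers: Set[Cell]) -> Cell:
    """Greedy goal selection: closest frontier by Manhattan distance, ties broken by cell order."""
    return min(
        frontiers,
        key=lambda f: (abs(f[0] - robot[0]) + abs(f[1] - robot[1]), f),
    )


# A 3x3 map whose right-hand column is mostly unexplored.
grid = [
    [FREE, FREE, UNKNOWN],
    [FREE, OCCUPIED, UNKNOWN],
    [FREE, FREE, FREE],
]
frontiers = find_frontiers(grid)
goal_a = nearest_frontier((2, 0), frontiers)
goal_b = nearest_frontier((0, 0), frontiers)
```

Here two robots starting in different corners are drawn to different frontier cells. With several robots near the same frontier, this greedy rule sends them to the same goal, which is the kind of interference that segmentation-based allocation is meant to reduce.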