37 research outputs found

    A study of FMQ heuristic in cooperative multi-agent games.

    The article focuses on decentralized reinforcement learning (RL) in cooperative multi-agent games, where a team of independent learning agents (ILs) try to coordinate their individual actions to reach an optimal joint action. Within this framework, several algorithms based on Q-learning have been proposed in recent works. In particular, we are interested in Distributed Q-learning, which finds optimal policies in deterministic games, and in the Frequency Maximum Q value (FMQ) heuristic, which is able, in partially stochastic matrix games, to distinguish whether a poor reward received for the same action is due to miscoordination or to the noisy reward function. Making this distinction is one of the main difficulties in solving stochastic games. Our objective is to find an algorithm able to switch between updates according to the detected cause of noise. In this paper, a modified version of the FMQ heuristic is proposed which achieves this detection and the update adaptation. Moreover, this modified FMQ version is more robust and very easy to set.
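    As an illustration of the kind of heuristic the abstract refers to, here is a minimal sketch of a stateless independent learner. It assumes the commonly cited form of the original FMQ heuristic, EV(a) = Q(a) + c * freq(maxR(a)) * maxR(a), with Boltzmann action selection. It is not the modified version proposed in the paper, and the class name, parameter names and default values are hypothetical.

```python
import math
import random


class FMQAgent:
    """Stateless independent learner using an FMQ-style heuristic (sketch)."""

    def __init__(self, n_actions, alpha=0.1, c=10.0, temperature=1.0, seed=None):
        self.n_actions = n_actions
        self.alpha = alpha              # learning rate (assumed value)
        self.c = c                      # weight of the FMQ bonus (assumed value)
        self.temperature = temperature  # Boltzmann temperature (assumed value)
        self.rng = random.Random(seed)
        self.q = [0.0] * n_actions            # Q-value per individual action
        self.max_r = [-math.inf] * n_actions  # best reward seen per action
        self.count = [0] * n_actions          # times each action was played
        self.count_max = [0] * n_actions      # times max_r was received

    def evaluation(self, a):
        """EV(a) = Q(a) + c * freq(maxR(a)) * maxR(a)."""
        if self.count[a] == 0:
            return self.q[a]
        freq = self.count_max[a] / self.count[a]
        return self.q[a] + self.c * freq * self.max_r[a]

    def select_action(self):
        """Sample an action from a Boltzmann distribution over EV values."""
        evs = [self.evaluation(a) for a in range(self.n_actions)]
        top = max(evs)  # subtract the maximum for numerical stability
        weights = [math.exp((ev - top) / self.temperature) for ev in evs]
        return self.rng.choices(range(self.n_actions), weights=weights)[0]

    def update(self, a, reward):
        """Update the counters and the Q-value after playing action a."""
        self.count[a] += 1
        if reward > self.max_r[a]:
            self.max_r[a] = reward
            self.count_max[a] = 1
        elif reward == self.max_r[a]:
            self.count_max[a] += 1
        self.q[a] += self.alpha * (reward - self.q[a])
```

    The frequency term is what lets such a learner favor actions whose best reward recurs often, instead of reacting only to the running average of possibly noisy rewards.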

    Towards Collaborative Plan Acquisition through Theory of Mind Modeling in Situated Dialogue

    Collaborative tasks often begin with partial task knowledge and incomplete initial plans from each partner. To complete these tasks, agents need to engage in situated communication with their partners and coordinate their partial plans towards a complete plan to achieve a joint task goal. While such collaboration seems effortless in a human-human team, it is highly challenging for human-AI collaboration. To address this limitation, this paper takes a step towards collaborative plan acquisition, where humans and agents strive to learn and communicate with each other to acquire a complete plan for joint tasks. Specifically, we formulate a novel problem for agents to predict the missing task knowledge for themselves and for their partners based on rich perceptual and dialogue history. We extend a situated dialogue benchmark for symmetric collaborative tasks in a 3D blocks world and investigate computational strategies for plan acquisition. Our empirical results suggest that predicting the partner's missing knowledge is a more viable approach than predicting one's own. We show that explicit modeling of the partner's dialogue moves and mental states produces better and more stable results than omitting it. These results provide insight for future AI agents that can predict what knowledge their partner is missing and, therefore, can proactively communicate such information to help their partner acquire such missing knowledge toward a common understanding of joint tasks.

    Distributed reinforcement learning for self-reconfiguring modular robots

    Thesis (Ph.D.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2007. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Includes bibliographical references (p. 101-106).
    In this thesis, we study distributed reinforcement learning in the context of automating the design of decentralized control for groups of cooperating, coupled robots. Specifically, we develop a framework and algorithms for automatically generating distributed controllers for self-reconfiguring modular robots using reinforcement learning. The promise of self-reconfiguring modular robots is that of robustness, adaptability and versatility. Yet most state-of-the-art distributed controllers are laboriously handcrafted and task-specific, due to the inherent complexities of distributed, local-only control. In this thesis, we propose and develop a framework for using reinforcement learning for automatic generation of such controllers. The approach is profitable because reinforcement learning methods search for good behaviors during the lifetime of the learning agent, and are therefore applicable to online adaptation as well as automatic controller design. However, we must overcome the challenges due to the fundamental partial observability inherent in a distributed system such as a self-reconfiguring modular robot. We use a family of policy search methods that we adapt to our distributed problem. The outcome of a local search is always influenced by the search space dimensionality, its starting point, and the amount and quality of available exploration through experience. We undertake a systematic study of the effects that certain robot and task parameters, such as the number of modules, presence of exploration constraints, availability of nearest-neighbor communications, and partial behavioral knowledge from previous experience, have on the speed and reliability of learning through policy search in self-reconfiguring modular robots. In the process, we develop novel algorithmic variations and compact search space representations for learning in our domain, which we test experimentally on a number of tasks. This thesis is an empirical study of reinforcement learning in a simulated lattice-based self-reconfiguring modular robot domain. However, our results contribute to the broader understanding of automatic generation of group control and design of distributed reinforcement learning algorithms. By Paulina Varshavskaya. Ph.D.

    White Paper 11: Artificial intelligence, robotics & data science

    198 p.; 17 cm. The CSIC white paper on Artificial Intelligence, Robotics and Data Science sketches a preliminary roadmap for addressing current R&D challenges associated with automated and autonomous machines. More than 50 research challenges investigated all over Spain by more than 150 experts within CSIC are presented in eight chapters. Chapter One introduces key concepts and tackles the issue of the integration of knowledge (representation), reasoning and learning in the design of artificial entities. Chapter Two analyses challenges associated with the development of theories, and supporting technologies, for modelling the behaviour of autonomous agents. Specifically, it pays attention to the interplay between elements at the micro level (individual autonomous agent interactions) and the macro world (the properties we seek in large and complex societies). Chapter Three discusses the variety of data science applications currently used in all fields of science, paying particular attention to Machine Learning (ML) techniques, while Chapter Four presents current developments in various areas of robotics. Chapter Five explores the challenges associated with computational cognitive models. Chapter Six pays attention to the ethical, legal, economic and social challenges that come with the development of smart systems. Chapter Seven engages with the problem of the environmental sustainability of deploying intelligent systems at large scale. Finally, Chapter Eight deals with the complexity of ensuring the security, safety, resilience and privacy-protection of smart systems against cyber threats.
    Contents: Executive Summary. Artificial Intelligence, Robotics and Data Science: Topic Coordinators Sara Degli Esposti (IPP-CCHS, CSIC) and Carles Sierra (IIIA, CSIC).
    Challenge 1, Integrating Knowledge, Reasoning and Learning: Challenge Coordinators Felip Manyà (IIIA, CSIC) and Adrià Colomé (IRI, CSIC – UPC).
    Challenge 2, Multiagent Systems: Challenge Coordinators N. Osman (IIIA, CSIC) and D. López (IFS, CSIC).
    Challenge 3, Machine Learning and Data Science: Challenge Coordinators J. J. Ramasco Sukia (IFISC) and L. Lloret Iglesias (IFCA, CSIC).
    Challenge 4, Intelligent Robotics: Topic Coordinators G. Alenyà (IRI, CSIC – UPC) and J. Villagra (CAR, CSIC).
    Challenge 5, Computational Cognitive Models: Challenge Coordinators M. D. del Castillo (CAR, CSIC) and M. Schorlemmer (IIIA, CSIC).
    Challenge 6, Ethical, Legal, Economic, and Social Implications: Challenge Coordinators P. Noriega (IIIA, CSIC) and T. Ausín (IFS, CSIC).
    Challenge 7, Low-Power Sustainable Hardware for AI: Challenge Coordinators T. Serrano (IMSE-CNM, CSIC – US) and A. Oyanguren (IFIC, CSIC – UV).
    Challenge 8, Smart Cybersecurity: Challenge Coordinators D. Arroyo Guardeño (ITEFI, CSIC) and P. Brox Jiménez (IMSE-CNM, CSIC – US).
    Peer reviewed.

    Metaphor-based negotiation and its application in AGV movement planning

    The theme of this thesis is "metaphor-based negotiation". By metaphor-based negotiation I mean a category of approaches for problem-solving in Distributed Artificial Intelligence (DAI) that mimic some aspects of human negotiation behaviour. The research in this dissertation is divided into two closely related parts. Cooperative interaction among agents in a multiagent system (MAS) is discussed in general, and the discussion leads to a formal definition of metaphor-based negotiation. Then, as a specific application, a "spring-based" computational model for metaphor-based negotiation is developed as an approach to solving movement planning, specifically the AGV scheduling problem (AGVSP): determining the timings of the activities of automated guided vehicles (AGVs) in a factory. By formally addressing the multi-agent cooperative interaction problem and assuming that agents in a MAS are rational, benevolent and fully informed, an initial strategy set of cooperative interaction can be reduced to a smaller strategy set by eliminating strategies that are irrational in a group sense. However, it is proved in this dissertation that, in the remaining strategy set, no unique strategy can be found that is acceptable to all agents according to their individual preferences. More specifically, in this smaller strategy set, if one agent moves from one strategy to another in an attempt to better its individual goal achievement, then there is at least one agent whose goal achievement will be negatively affected by such a move. So, the cooperative interaction problem can only be partially solved if no further knowledge is given to those agents. The idea of a common sense principle is introduced in this dissertation to overcome the deficiencies of the assumptions of rationality, benevolence and full-informedness. In reality, the assumption of full-informedness of agents may not be practical. 
Communication is needed for agents to (1) exchange their local problem solving information, and (2) exchange proposals for global problem solving, when their views are in conflict. Based on the discussion of cooperative interaction, a formal definition of metaphor-based negotiation is proposed to formally indicate what a proposal is and what the condition is for accepting a proposal from another agent. In this definition, the common sense principle is one of the most important features, not found in definitions of negotiation available so far in the literature, which guides agents to find an agreement when negotiation is running into difficulties. The AGVSP involves timing activities for each AGV in an AGV-based factory. The AGVSP is naturally distributed: the whole problem can be easily divided into several subproblems, each of which involves timing the activities of one AGV. Therefore, it is intuitively straightforward for us to seek DAI approaches to solving the AGVSP. Inspired by Kwa's Iterative Negotiation Model [Kwa 88b] [Kwa 88a] for the AGVSP, we developed a spring-based (metaphor-based) negotiation model for the AGVSP to overcome some vital problems in Kwa's model. The idea of the spring-based negotiation model is described below. The AGVSP can be regarded as a Distributed Constraint Satisfaction Problem (DCSP) and solved in a MAS. Each agent in the MAS is designed to solve a subproblem: a local scheduling problem which is a small Constraint Satisfaction Problem (CSP). Conflicts exist when intra-agent constraints or inter-agent constraints are violated. These constraints can be classified into hard constraints, those that cannot be relaxed at the agent level unless the system designer permits (e.g., by providing an arbitrator), and soft constraints, those that can be relaxed at the agent level when necessary. 
When agents are in conflict, i.e., when some inter-agent constraints are violated (that is, when one agent's timings of its activities overlap those of some other agents), the agents involved will resolve the conflicts through a (metaphor-based) negotiation procedure in which conflicts are gradually resolved by each agent's relaxation of its intra-agent constraints, i.e., by yielding some amount of its initially allocated resources to other agents or by shifting its initially allocated resources. The negotiation can be viewed as a process of exchanging proposals (of cooperative strategies) between conflicting agents, where a cooperative strategy is a possible resolution to a conflict from the viewpoint of the proposing agent. However, since agents are designed to be rational, each agent that is involved in the conflicts will try hard to relax its intra-agent constraints as little as possible. Further, it is reasonable to assume that the more an intra-agent constraint has been relaxed, the less the respective agent is willing to relax it further. This feature can be modeled by a spring: the more it has been compressed, the harder it is to compress it further. Based on this inspiration, a spring-based computational model of metaphor-based negotiation is proposed: each agent's local schedule is represented by a local spring network in which each spring element represents a soft intra-agent constraint. Relaxation of an intra-agent constraint is likened to a spring being compressed by external forces from other agents. As a consequence, the compressed spring will also exert a reacting force upon those compressing agents. An agreement will be reached when those forces and reacting forces are balanced. This is the common sense principle in the spring-based negotiation. The model solves some key issues, e.g., how to select negotiation techniques and skills during the process of negotiation, that have not been solved by Kwa's iterative negotiation model. 
Some experimental evidence of the value of this model is presented.
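    The force-balance idea in this abstract can be illustrated with a toy calculation. The sketch below is not the thesis's model: it assumes exactly two agents, each with a single linear (Hooke's law) spring, and a fixed scheduling overlap that must be absorbed jointly. The function name and the stiffness parameters are hypothetical.

```python
def balance_two_springs(overlap, k_a, k_b):
    """Split a scheduling overlap between two agents modeled as linear springs.

    Agent A yields x_a and agent B yields x_b, with x_a + x_b == overlap.
    Agreement is taken to be the point where the reacting forces balance,
    i.e. k_a * x_a == k_b * x_b (toy assumption: one Hooke spring per agent).
    """
    if overlap < 0 or k_a <= 0 or k_b <= 0:
        raise ValueError("overlap must be >= 0 and stiffnesses must be > 0")
    total = k_a + k_b
    x_a = overlap * k_b / total  # the stiffer agent yields less
    x_b = overlap * k_a / total
    return x_a, x_b


# Agent B is twice as stiff (less willing to relax), so it yields half as much.
x_a, x_b = balance_two_springs(overlap=6.0, k_a=1.0, k_b=2.0)
print(x_a, x_b)
```

    A higher stiffness stands in for a constraint the agent is less willing to relax, so the balanced outcome shifts more of the concession to the more flexible agent.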

    Eyes in the sky: multi-drones surveillance technology

    In this project, we will develop a security network based on cooperation between several UAVs. Since UAVs can vary in autonomy, flight speed, stability and many other factors, a study will be carried out in which we try to exploit the characteristics best suited to the security network being developed. In parallel with this study, distribution control algorithms will be applied to the agents so that coverage of the area is maximized. The expected final result is a mini-program capable of communicating with several patrol agents, receiving their locations, calculating their ideal positions or, if they cannot cover the whole area, calculating a patrol route, and sending them the calculated information. We also hope that this program can be used in simulation and, if possible, in the field.

    Analysis of radiofrequency-based methods for position and velocity determination of autonomous robots in lunar surface exploration missions

    The use of distributed systems has been disruptive in almost every industrial sector, from manufacturing to processing plants, from environmental monitoring to vehicle control, and many more. It is therefore natural to assess the benefits that such an advantageous engineering paradigm could bring to space exploration. In recent years, we have witnessed the emergence of concepts such as fractionated satellite systems, formation flying, megaconstellations, and femtoswarms. Most of these space missions have evolved from the idea of decentralizing processes that were formerly performed on platforms conceived as monolithic systems. The application of this concept to robotic systems is not new, and a great many scientific contributions on multi-robot systems exist, focusing on different aspects such as cooperative robotics, behavioural or reactive control, distributed artificial intelligence, swarm multi-agent systems, etc. The intrinsic advantages of distribution (improved reliability and efficiency, higher robustness, etc.) have been boosted by the exponential growth of computational power density and a simultaneous miniaturization of technology, leading to smaller and more powerful robotic platforms, which could make a distributed robotic system, made of small robotic agents, a powerful substitute for classical large robotic platforms. This thesis proposes, in the framework of multi-robot systems, a localization method for robotic agents in planetary surface exploration scenarios based on RF range and Doppler frequency shift analysis. The relevance of spatial localization awareness in agents belonging to a distributed robotic system is defined in the context of the advantages of robotic exploration. Different range determination techniques and, specifically, the advantages of including the Doppler effect in the determination of the relative position within the deployed robotic system are considered, and their strengths and weaknesses are analysed accordingly. 
Special attention is devoted to the noise sources present in the lunar environment and those related to a practical (i.e. non-ideal) implementation architecture, and to their influence on the system performance. From this point of view, we develop a theoretical model for localization accuracy estimation, generated from power spectrum characteristics, in accordance with the proposed system architecture, and consolidated with numerical simulations and a parametric assessment on a set of real reference components that play a key role in the overall performance. The selected system architecture is then implemented in a representative set-up and tested under laboratory conditions. Algorithms used for carrier frequency generation and frequency measurement are developed, applied and tested in the hardware-in-the-loop breadboard. The results show that the Doppler frequency component can be measured with the proposed architecture, yielding a high sensitivity in the determination of relative speed even at standard communication frequencies (UHF), and improving significantly at higher bands (S, C, etc.). This makes it possible to add relative speed to relative position determination via sensor fusion techniques, improving the response time and accuracy during navigation through the exploration scenario.
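    The link between carrier band and speed sensitivity mentioned above follows from the standard one-way, non-relativistic Doppler relation, shift = carrier * v / c. The sketch below illustrates only that relation, not the thesis's measurement architecture. The function names and the two example carrier frequencies (400 MHz for UHF, 2.2 GHz for S band) are assumptions chosen for illustration.

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s


def doppler_shift(carrier_hz, radial_speed_mps):
    """One-way, non-relativistic Doppler shift in Hz (positive when closing)."""
    return carrier_hz * radial_speed_mps / SPEED_OF_LIGHT


def radial_speed(carrier_hz, shift_hz):
    """Invert the relation: relative radial speed in m/s from a measured shift."""
    return SPEED_OF_LIGHT * shift_hz / carrier_hz


# Assumed example carriers; a rover closing at 1 m/s.
for name, carrier in [("UHF", 400e6), ("S band", 2.2e9)]:
    print(name, doppler_shift(carrier, 1.0), "Hz")
```

    For a given speed the shift scales linearly with the carrier frequency, so a frequency counter with fixed resolution resolves smaller speed changes at higher bands.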

    Hierarchical reinforcement learning for trading agents

    Autonomous software agents, the use of which has increased due to the recent growth in computer power, have considerably improved electronic commerce processes by facilitating automated trading actions between the market participants (sellers, brokers and buyers). The rapidly changing market environments pose challenges to the performance of such agents, which are generally developed for specific market settings. To this end, this thesis is concerned with designing agents that can gradually adapt to variable, dynamic and uncertain markets and that are able to reuse the acquired trading skills in new markets. This thesis proposes the use of reinforcement learning techniques to develop adaptive trading agents and puts forward a novel software architecture based on the semi-Markov decision process and on an innovative knowledge transfer framework. To evaluate my approach, the developed trading agents are tested in internationally well-known market simulations and their behaviours when buying and/or selling in the retail and wholesale markets are analysed. The proposed approach has been shown to improve the adaptation of the trading agent in a specific market as well as to enable the portability of its knowledge to new markets.

    Deep Learning for Safe Autonomous Driving: Current Challenges and Future Directions

    Advances in information and signal processing technologies have a significant impact on autonomous driving (AD), improving driving safety while minimizing the efforts of human drivers with the help of advanced artificial intelligence (AI) techniques. Recently, deep learning (DL) approaches have solved several complex real-world problems. However, their strengths in terms of control processes for AD have not yet been deeply investigated and highlighted. This survey highlights the power of DL architectures in terms of reliability and efficient real-time performance and overviews state-of-the-art strategies for safe AD, with their major achievements and limitations. Furthermore, it covers major embodiments of DL along the AD pipeline including measurement, analysis, and execution, with a focus on road, lane, vehicle, pedestrian, and drowsiness detection, collision avoidance, and traffic sign detection through sensing and vision-based DL methods. In addition, we discuss the performance of several reviewed methods using different evaluation metrics, with a critique of their pros and cons. 
Finally, this survey highlights the current issues of safe DL-based AD and offers recommendations for future research, rounding out a reference for newcomers and researchers willing to join this vibrant area of Intelligent Transportation Systems. This work was supported by Institute of Information & Communications Technology Planning & Evaluation (IITP) Grant funded by the Korea Government (MSIT) (2019-0-00136, Development of AI-Convergence Technologies for Smart City Industry Productivity Innovation); the work of Javier Del Ser was supported by the Basque Government through the EMAITEK and ELKARTEK Programs, as well as by the Department of Education of this institution (Consolidated Research Group MATHMODE, IT1294-19); VHCA received support from the Brazilian National Council for Research and Development (CNPq, Grant #304315/2017-6 and #430274/2018-1). Muhammad, K.; Ullah, A.; Lloret, J.; Del Ser, J.; De Albuquerque, VHC. (2021). Deep Learning for Safe Autonomous Driving: Current Challenges and Future Directions. IEEE Transactions on Intelligent Transportation Systems. 22(7):4316-4336. https://doi.org/10.1109/TITS.2020.3032227