A study of FMQ heuristic in cooperative multi-agent games.
The article focuses on decentralized reinforcement learning (RL) in cooperative multi-agent games, where a team of independent learning agents (ILs) tries to coordinate their individual actions to reach an optimal joint action. Within this framework, several algorithms based on Q-learning have been proposed in recent work. In particular, we are interested in Distributed Q-learning, which finds optimal policies in deterministic games, and in the Frequency Maximum Q value (FMQ) heuristic, which, in partially stochastic matrix games, is able to distinguish whether a poor reward received for the same action is due to miscoordination or to a noisy reward function. Making this distinction is one of the main difficulties in solving stochastic games. Our objective is to find an algorithm able to switch between update rules according to a detection of the cause of the noise. In this paper, a modified version of the FMQ heuristic is proposed that achieves this detection and the corresponding update adaptation. Moreover, this modified FMQ version is more robust and very easy to set up.
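The FMQ idea can be illustrated with a short sketch of the original heuristic of Kapetanakis and Kudenko, not the modified version this article proposes. Each independent learner keeps, per action, the best reward seen so far and how often that reward occurred, and biases Boltzmann exploration toward actions whose best reward keeps recurring. The payoff matrix is the standard climbing game of Claus and Boutilier; `alpha`, `c` and the temperature schedule are assumptions chosen for illustration.

```python
import math
import random


class FMQAgent:
    """Independent learner in a repeated single-stage matrix game.

    Sketch of the original FMQ heuristic (not the article's modified version).
    alpha and c are illustrative assumptions.
    """

    def __init__(self, n_actions, alpha=0.1, c=10.0):
        self.n = n_actions
        self.alpha = alpha
        self.c = c                            # weight of the FMQ term
        self.q = [0.0] * n_actions            # stateless Q-values
        self.max_r = [-math.inf] * n_actions  # best reward seen per action
        self.count = [0] * n_actions          # times each action was played
        self.count_max = [0] * n_actions      # times max_r[a] was received

    def ev(self, a):
        """FMQ evaluation: EV(a) = Q(a) + c * freq(maxR(a)) * maxR(a)."""
        if self.count[a] == 0:
            return self.q[a]
        freq = self.count_max[a] / self.count[a]
        return self.q[a] + self.c * freq * self.max_r[a]

    def act(self, temperature, rng):
        """Boltzmann (softmax) action selection over the EV values."""
        logits = [self.ev(a) / temperature for a in range(self.n)]
        top = max(logits)
        weights = [math.exp(x - top) for x in logits]  # numerically stable
        return rng.choices(range(self.n), weights=weights)[0]

    def update(self, a, r):
        self.count[a] += 1
        if r > self.max_r[a]:
            self.max_r[a], self.count_max[a] = r, 1
        elif r == self.max_r[a]:
            self.count_max[a] += 1
        self.q[a] += self.alpha * (r - self.q[a])


# Climbing game (Claus and Boutilier): the optimal joint action is (0, 0).
CLIMBING = [[11, -30, 0],
            [-30, 7, 6],
            [0, 0, 5]]


def train(episodes=5000, seed=0):
    rng = random.Random(seed)
    p1, p2 = FMQAgent(3), FMQAgent(3)
    for t in range(episodes):
        # Assumed exponentially decaying temperature with a small floor.
        temperature = max(0.05, 500.0 * math.exp(-0.006 * t))
        a1, a2 = p1.act(temperature, rng), p2.act(temperature, rng)
        r = CLIMBING[a1][a2]  # both agents receive the same team reward
        p1.update(a1, r)
        p2.update(a2, r)
    # Greedy action of each agent with respect to its FMQ values.
    return tuple(max(range(3), key=ag.ev) for ag in (p1, p2))
```

Calling `train()` returns each agent's greedy action after learning. With a suitable temperature schedule FMQ tends to settle on the optimal joint action (0, 0), although this is not guaranteed for every seed or schedule.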
Towards Collaborative Plan Acquisition through Theory of Mind Modeling in Situated Dialogue
Collaborative tasks often begin with partial task knowledge and incomplete
initial plans from each partner. To complete these tasks, agents need to engage
in situated communication with their partners and coordinate their partial
plans towards a complete plan to achieve a joint task goal. While such
collaboration seems effortless in a human-human team, it is highly challenging
for human-AI collaboration. To address this limitation, this paper takes a step
towards collaborative plan acquisition, where humans and agents strive to learn
and communicate with each other to acquire a complete plan for joint tasks.
Specifically, we formulate a novel problem for agents to predict the missing
task knowledge for themselves and for their partners based on rich perceptual
and dialogue history. We extend a situated dialogue benchmark for symmetric
collaborative tasks in a 3D blocks world and investigate computational
strategies for plan acquisition. Our empirical results suggest that predicting
the partner's missing knowledge is a more viable approach than predicting one's
own. We show that explicit modeling of the partner's dialogue moves and mental
states produces better and more stable results than modeling without them. These results
provide insight for future AI agents that can predict what knowledge their
partner is missing and, therefore, can proactively communicate such information
to help their partner acquire such missing knowledge toward a common
understanding of joint tasks.
Distributed reinforcement learning for self-reconfiguring modular robots
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2007. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Includes bibliographical references (p. 101-106).
In this thesis, we study distributed reinforcement learning in the context of automating the design of decentralized control for groups of cooperating, coupled robots. Specifically, we develop a framework and algorithms for automatically generating distributed controllers for self-reconfiguring modular robots using reinforcement learning. The promise of self-reconfiguring modular robots is that of robustness, adaptability and versatility. Yet most state-of-the-art distributed controllers are laboriously handcrafted and task-specific, due to the inherent complexities of distributed, local-only control. In this thesis, we propose and develop a framework for using reinforcement learning to generate such controllers automatically. The approach is attractive because reinforcement learning methods search for good behaviors during the lifetime of the learning agent and are therefore applicable to online adaptation as well as to automatic controller design. However, we must overcome the challenges posed by the fundamental partial observability inherent in a distributed system such as a self-reconfiguring modular robot. We use a family of policy search methods that we adapt to our distributed problem.
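The abstract does not spell out the update rule, so the sketch below is an assumption: a plain REINFORCE-style likelihood-ratio gradient, one member of the policy-search family, in which every module runs the same softmax policy on its own local observation and all modules share the robot's global reward. The observation names, actions and toy reward in `demo` are hypothetical.

```python
import math
import random


def softmax(xs):
    top = max(xs)
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_shared(theta, episodes, lr=0.1):
    """Shared-parameter REINFORCE update for identical modules.

    theta[obs][action] holds the logits of one policy that every module runs
    on its local observation. Each episode is (trajectory, return): the
    trajectory lists (obs, action) pairs from all modules, and the return is
    the global reward shared by the whole robot.
    """
    grad = {o: [0.0] * len(row) for o, row in theta.items()}
    for trajectory, ret in episodes:
        for obs, action in trajectory:
            probs = softmax(theta[obs])
            for a, p in enumerate(probs):
                # Gradient of log pi(action | obs) for a softmax policy.
                grad[obs][a] += ret * ((1.0 if a == action else 0.0) - p)
    for obs in theta:
        for a in range(len(theta[obs])):
            theta[obs][a] += lr * grad[obs][a] / max(1, len(episodes))
    return theta


def demo(iters=200, seed=0):
    """Toy problem: 4 modules, each sees 'free' or 'blocked'; action 1 = move."""
    rng = random.Random(seed)
    theta = {"free": [0.0, 0.0], "blocked": [0.0, 0.0]}
    for _ in range(iters):
        batch = []
        for _ in range(10):
            traj, ret = [], 0.0
            for _module in range(4):
                obs = rng.choice(["free", "blocked"])
                act = rng.choices([0, 1], weights=softmax(theta[obs]))[0]
                traj.append((obs, act))
                if act == 1:  # moving into free space helps, into a blocked cell hurts
                    ret += 1.0 if obs == "free" else -1.0
            batch.append((traj, ret))
        reinforce_shared(theta, batch)
    # Probability of moving, given each local observation.
    return softmax(theta["free"])[1], softmax(theta["blocked"])[1]
```

Sharing one parameter set across modules is what keeps the search space small as the number of modules grows; the price is that each module sees only a noisy, global credit signal.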
The outcome of a local search is always influenced by the search space dimensionality, its starting point, and the amount and quality of available exploration through experience. We undertake a systematic study of the effects that certain robot and task parameters, such as the number of modules, the presence of exploration constraints, the availability of nearest-neighbor communication, and partial behavioral knowledge from previous experience, have on the speed and reliability of learning through policy search in self-reconfiguring modular robots. In the process, we develop novel algorithmic variations and compact search space representations for learning in our domain, which we test experimentally on a number of tasks. This thesis is an empirical study of reinforcement learning in a simulated lattice-based self-reconfiguring modular robot domain. However, our results contribute to the broader understanding of automatic generation of group control and design of distributed reinforcement learning algorithms.
by Paulina Varshavskaya. Ph.D.
White Paper 11: Artificial intelligence, robotics & data science
198 p. : 17 cm. The CSIC white paper on Artificial Intelligence, Robotics and Data Science sketches a preliminary roadmap for addressing current R&D challenges associated with automated and autonomous machines. More than 50 research challenges investigated all over Spain by more than 150 experts within CSIC are presented in eight chapters. Chapter One introduces key concepts and tackles the integration of knowledge (representation), reasoning and learning in the design of artificial entities. Chapter Two analyses challenges associated with the development of theories, and supporting technologies, for modelling the behaviour of autonomous agents. Specifically, it pays attention to the interplay between elements at the micro level (individual autonomous agent interactions) and the macro world (the properties we seek in large and complex societies). Chapter Three discusses the variety of data science applications currently used in all fields of science, paying particular attention to Machine Learning (ML) techniques, while Chapter Four presents current developments in various areas of robotics. Chapter Five explores the challenges associated with computational cognitive models. Chapter Six pays attention to the ethical, legal, economic and social challenges that accompany the development of smart systems. Chapter Seven engages with the problem of the environmental sustainability of deploying intelligent systems at large scale. Finally, Chapter Eight deals with the complexity of ensuring the security, safety, resilience and privacy protection of smart systems against cyber threats.
Contents:
Executive summary: Artificial Intelligence, Robotics and Data Science. Topic coordinators: Sara Degli Esposti (IPP-CCHS, CSIC) and Carles Sierra (IIIA, CSIC)
Challenge 1: Integrating knowledge, reasoning and learning. Challenge coordinators: Felip Manyà (IIIA, CSIC) and Adrià Colomé (IRI, CSIC – UPC)
Challenge 2: Multiagent systems. Challenge coordinators: N. Osman (IIIA, CSIC) and D. López (IFS, CSIC)
Challenge 3: Machine learning and data science. Challenge coordinators: J. J. Ramasco Sukia (IFISC) and L. Lloret Iglesias (IFCA, CSIC)
Challenge 4: Intelligent robotics. Topic coordinators: G. Alenyà (IRI, CSIC – UPC) and J. Villagra (CAR, CSIC)
Challenge 5: Computational cognitive models. Challenge coordinators: M. D. del Castillo (CAR, CSIC) and M. Schorlemmer (IIIA, CSIC)
Challenge 6: Ethical, legal, economic, and social implications. Challenge coordinators: P. Noriega (IIIA, CSIC) and T. Ausín (IFS, CSIC)
Challenge 7: Low-power sustainable hardware for AI. Challenge coordinators: T. Serrano (IMSE-CNM, CSIC – US) and A. Oyanguren (IFIC, CSIC - UV)
Challenge 8: Smart cybersecurity. Challenge coordinators: D. Arroyo Guardeño (ITEFI, CSIC) and P. Brox Jiménez (IMSE-CNM, CSIC – US)
Peer reviewed
Metaphor-based negotiation and its application in AGV movement planning
The theme of this thesis is "metaphor-based negotiation". By metaphor-based negotiation I mean a category of approaches to problem solving in Distributed Artificial Intelligence (DAI) that mimic some aspects of human negotiation behaviour. The research in this dissertation is divided into two closely related parts. First, cooperative interaction among agents in a multiagent system (MAS) is discussed in general, and the discussion leads to a formal definition of metaphor-based negotiation. Then, as a specific application, a "spring-based" computational model for metaphor-based negotiation is developed as an approach to movement planning, specifically the AGV scheduling problem (AGVSP): determining the timings of the activities of automated guided vehicles (AGVs) in a factory.
By formally addressing the multi-agent cooperative interaction problem and assuming that agents in a MAS are rational, benevolent and fully informed, an initial set of cooperative interaction strategies can be reduced to a smaller strategy set by eliminating strategies that are irrational in a group sense. However, this dissertation proves that, in the remaining strategy set, no unique strategy can be found that is acceptable to all agents according to their individual preferences. More specifically, in this smaller strategy set, if one agent moves from one strategy to another in an attempt to improve its individual goal achievement, then at least one other agent's goal achievement will be negatively affected by that move. Hence the cooperative interaction problem can only be partially solved if no further knowledge is given to the agents. The idea of a common sense principle is introduced in this dissertation to overcome the deficiencies of the assumptions of rationality, benevolence and full-informedness.
In reality, the assumption that agents are fully informed may not be practical. Communication is needed for agents to (1) exchange their local problem-solving information, and (2) exchange proposals for global problem solving when their views are in conflict.
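The strategy-set reduction described above can be made concrete under one assumption: that "irrational in a group sense" means Pareto-dominated. After discarding dominated joint strategies, every remaining switch that helps one agent hurts another, which is why rationality alone cannot single out one strategy. The joint strategies and payoffs below are hypothetical.

```python
from typing import Dict, Tuple

Payoff = Tuple[float, ...]  # goal achievement of each agent under a joint strategy


def dominates(x: Payoff, y: Payoff) -> bool:
    """x is no worse than y for every agent and strictly better for at least one."""
    return all(a >= b for a, b in zip(x, y)) and any(a > b for a, b in zip(x, y))


def group_rational(strategies: Dict[str, Payoff]) -> Dict[str, Payoff]:
    """Eliminate every joint strategy that some other strategy dominates."""
    return {
        s: v for s, v in strategies.items()
        if not any(dominates(w, v) for t, w in strategies.items() if t != s)
    }


def every_move_hurts_someone(strategies: Dict[str, Payoff]) -> bool:
    """True if every switch that helps some agent leaves another agent worse off."""
    for v in strategies.values():
        for w in strategies.values():
            helps = any(b > a for a, b in zip(v, w))
            hurts = any(b < a for a, b in zip(v, w))
            if helps and not hurts:
                return False
    return True


joint = {"A": (3, 1), "B": (2, 2), "C": (1, 3), "D": (1, 1)}  # hypothetical payoffs
front = group_rational(joint)
print(sorted(front), every_move_hurts_someone(front))  # prints ['A', 'B', 'C'] True
```

Strategy D is removed because B is better for both agents; among A, B and C, no choice is preferred by everyone, which is the gap the common sense principle is meant to close.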
Based on the discussion of cooperative interaction, a formal definition of metaphor-based negotiation is proposed that states precisely what a proposal is and under what condition a proposal from another agent is accepted. In this definition, the common sense principle is one of the most important features, not found in the definitions of negotiation available so far in the literature; it guides agents toward an agreement when negotiation runs into difficulties.
The AGVSP involves timing the activities of each AGV in an AGV-based factory. The AGVSP is naturally distributed: the whole problem can easily be divided into several subproblems, each of which involves timing the activities of one AGV. It is therefore natural to seek DAI approaches to solving the AGVSP. Inspired by Kwa's Iterative Negotiation Model [Kwa 88b] [Kwa 88a] for the AGVSP, we developed a spring-based (metaphor-based) negotiation model for the AGVSP that overcomes some vital problems in Kwa's model. The idea of the spring-based negotiation model is as follows.
The AGVSP can be regarded as a Distributed Constraint Satisfaction Problem (DCSP) and solved in a MAS. Each agent in the MAS is designed to solve a subproblem, a local scheduling problem that is itself a small Constraint Satisfaction Problem (CSP). Conflicts exist when intra-agent or inter-agent constraints are violated. These constraints can be classified into hard constraints, which cannot be relaxed at the agent level unless the system designer permits it (e.g., by providing an arbitrator), and soft constraints, which can be relaxed at the agent level when necessary. When agents are in conflict, i.e., when some inter-agent constraints are violated (for instance, when one agent's activity timings overlap those of another agent), the agents involved resolve the conflicts through a (metaphor-based) negotiation procedure in which conflicts are gradually resolved by each agent relaxing its intra-agent constraints, i.e., by yielding some of its initially allocated resources to other agents or by shifting its initially allocated resources. The negotiation can be viewed as a process of exchanging proposals (of cooperative strategies) between conflicting agents, where a cooperative strategy is a possible resolution of a conflict from the viewpoint of the proposing agent. However, since agents are designed to be rational, each agent involved in the conflicts will try to relax its intra-agent constraints as little as possible. Further, it is reasonable to assume that the more an intra-agent constraint has been relaxed, the less willing the respective agent is to relax it further. This feature can be modeled by a spring: the more it has been compressed, the harder it is to compress it further. Based on this inspiration, a spring-based computational model of metaphor-based negotiation is proposed: each agent's local schedule is represented by a local spring network in which each spring element represents a soft intra-agent constraint. Relaxation of an intra-agent constraint is likened to a spring being compressed by external forces from other agents. As a consequence, the compressed spring exerts a reacting force on the compressing agents. An agreement is reached when these forces and reacting forces are balanced. This is the common sense principle in the spring-based negotiation. The model solves some key issues not solved by Kwa's iterative negotiation model, e.g., how to select negotiation techniques and skills during the process of negotiation. Some experimental evidence of the value of this model is presented.
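The force-balance idea can be sketched under simplifying assumptions: each soft constraint behaves like a linear (Hooke) spring of stiffness k_i that may already have been compressed by earlier concessions, and a single time overlap must be shared among the conflicting agents. The common force F is found by bisection so that the yields add up to the overlap. Hard constraints, spring networks and proposal exchange are not modeled, and `balance` is a hypothetical helper, not the thesis's model.

```python
from typing import List


def balance(overlap: float, stiffness: List[float], prior: List[float],
            tol: float = 1e-9) -> List[float]:
    """Split an overlap among conflicting agents so that spring forces balance.

    Agent i's soft constraint is a linear spring with stiffness stiffness[i]
    that has already been compressed by prior[i]; yielding x_i more makes it
    push back with force stiffness[i] * (prior[i] + x_i). At equilibrium all
    yielding agents push back with the same force F, and the yields sum to
    the overlap.
    """
    def yields(force: float) -> List[float]:
        # An agent only yields once F exceeds the force it already resists.
        return [max(0.0, force / k - p) for k, p in zip(stiffness, prior)]

    # sum(yields(F)) is non-decreasing in F; at hi every agent yields >= overlap.
    lo, hi = 0.0, max(k * (p + overlap) for k, p in zip(stiffness, prior))
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if sum(yields(mid)) < overlap:
            lo = mid
        else:
            hi = mid
    return yields(hi)


# Hypothetical conflict: two AGVs overlap by 3 time units.
print(balance(3.0, stiffness=[1.0, 2.0], prior=[0.0, 0.0]))
```

Here the stiffer agent yields less (about 2.0 versus 1.0 units), and an agent that has already conceded (non-zero `prior`) starts with a larger reacting force, so it yields less in the next conflict, which mirrors the "harder to compress further" behaviour.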
Eyes in the sky: multi-drones surveillance technology
In this project, we aim to develop a security network based on cooperative work among several UAVs. Since UAVs can vary in autonomy, flight speed, stability and many other factors, a study will be carried out in which we try to leverage the best characteristics for the security network to be developed. In parallel with this study, distribution control algorithms will be applied to the agents so that area coverage is maximized. The expected final result of this project is a mini-program able to communicate with several patrol agents, receive their locations, compute their ideal positions or, if they cannot fully cover the area with their sensor range, compute a patrol route, and send them the computed information. We also hope that this program can be used in simulation and, if possible, in the field.
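The abstract does not name its distribution-control algorithm. One common choice for maximizing coverage is Lloyd's algorithm (centroidal Voronoi partitioning), sketched here as an assumption on a unit-square area discretized into grid cells: each UAV repeatedly moves to the centroid of the cells closest to it, which never increases the summed squared distance from the area to the nearest UAV. The area, starting positions and iteration count are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def grid(n: int) -> List[Point]:
    """Cell centres of an n x n discretisation of the unit-square patrol area."""
    return [((i + 0.5) / n, (j + 0.5) / n) for i in range(n) for j in range(n)]


def sq_dist(p: Point, q: Point) -> float:
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def coverage_cost(agents: List[Point], points: List[Point]) -> float:
    """Sum of squared distances from each cell to its closest UAV (lower is better)."""
    return sum(min(sq_dist(p, a) for a in agents) for p in points)


def lloyd_step(agents: List[Point], points: List[Point]) -> List[Point]:
    """Move every UAV to the centroid of the cells closest to it (its Voronoi cell)."""
    sums = [[0.0, 0.0, 0] for _ in agents]
    for p in points:
        k = min(range(len(agents)), key=lambda i: sq_dist(p, agents[i]))
        sums[k][0] += p[0]
        sums[k][1] += p[1]
        sums[k][2] += 1
    # A UAV whose Voronoi cell is empty keeps its current position.
    return [(sx / n, sy / n) if n else a for a, (sx, sy, n) in zip(agents, sums)]


if __name__ == "__main__":
    area = grid(40)
    uavs = [(0.05, 0.05), (0.10, 0.05), (0.05, 0.10), (0.95, 0.95)]  # hypothetical start
    for _ in range(30):
        uavs = lloyd_step(uavs, area)
    print([(round(x, 2), round(y, 2)) for x, y in uavs], coverage_cost(uavs, area))
```

Lloyd's algorithm only finds a local optimum, and it assumes each UAV can sense its own cell; the patrol-route fallback for areas larger than the combined sensor range would need a separate planner.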
Analysis of radiofrequency-based methods for position and velocity determination of autonomous robots in lunar surface exploration missions
The use of distributed systems has been disruptive in almost every industrial sector, from manufacturing to processing plants, from environmental monitoring to vehicle control, and many more. It is therefore natural to assess the benefits that such an advantageous engineering paradigm could bring to space exploration. In recent years, we have witnessed the emergence of concepts such as fractionated satellite systems, formation flying, megaconstellations, and femtoswarms. Most of these space missions have evolved from the idea of decentralizing processes that were formerly performed on platforms conceived as monolithic systems.
The application of this concept to robotic systems is not new, and a great deal of scientific work on multi-robot systems exists, focusing on different aspects such as cooperative robotics, behavioural or reactive control, distributed artificial intelligence, and swarm multi-agent systems. The intrinsic advantages of distribution (improved reliability and efficiency, higher robustness, etc.) have been boosted by the exponential growth of computational power density and the simultaneous miniaturization of technology, leading to smaller and more powerful robotic platforms. A distributed robotic system made of small robotic agents could thus become a powerful substitute for classical large robotic platforms.
This thesis proposes, in the framework of multi-robot systems, a localization method for robotic agents in planetary surface exploration scenarios based on RF range and Doppler frequency shift analysis. The relevance of spatial localization awareness for agents belonging to a distributed robotic system is established in the context of the advantages of robotic exploration. Different range determination techniques are considered, together with the specific advantages of including the Doppler effect in determining relative position within the deployed robotic system, and their strengths and weaknesses are analysed accordingly. Special attention is devoted to the noise sources present in the lunar environment, related to a practical (i.e. non-ideal) implementation architecture, and to their influence on system performance. From this point of view, we develop a theoretical model for estimating localization accuracy, derived from power spectrum characteristics in accordance with the proposed system architecture, and consolidated with numerical simulations and a parametric assessment on a set of real reference components that play a key role in overall performance.
The selected system architecture is then implemented in a representative set-up and tested under laboratory conditions. Algorithms for carrier frequency generation and frequency measurement are developed, applied and tested on the hardware-in-the-loop breadboard. The results show that the Doppler frequency component can be measured with the proposed architecture, yielding high sensitivity in the determination of relative speed even at standard communication frequencies (UHF), with significant improvement at higher bands (S, C, etc.). This makes it possible to add relative speed to relative position determination via sensor fusion techniques, improving response time and accuracy during navigation through the exploration scenario.
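The band dependence follows from the one-way Doppler relation f_d = v_r f_c / c: for a fixed frequency-measurement resolution, the smallest detectable change in relative radial speed is inversely proportional to the carrier frequency. The sketch below assumes a non-relativistic one-way link; the carrier frequencies are illustrative choices for each band, not the values used in the thesis.

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s


def doppler_shift(radial_speed: float, carrier_hz: float) -> float:
    """One-way, non-relativistic Doppler shift f_d = v_r * f_c / c.

    radial_speed is positive when the two agents are closing in on each other.
    """
    return radial_speed * carrier_hz / SPEED_OF_LIGHT


def radial_speed(shift_hz: float, carrier_hz: float) -> float:
    """Invert the shift: v_r = f_d * c / f_c."""
    return shift_hz * SPEED_OF_LIGHT / carrier_hz


# Radial speed that produces a 1 Hz shift at a few assumed carrier frequencies.
for band, carrier in [("UHF", 435e6), ("S", 2.2e9), ("C", 5.8e9)]:
    print(f"{band:>3} band at {carrier / 1e9:.3f} GHz: "
          f"{radial_speed(1.0, carrier):.3f} m/s per Hz")
```

With a 1 Hz measurement resolution this gives roughly 0.69 m/s at 435 MHz but about 0.05 m/s at 5.8 GHz. A two-way (transponder) link doubles the shift and therefore halves these figures.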
Hierarchical reinforcement learning for trading agents
Autonomous software agents, whose use has increased with the recent growth in computing power, have considerably improved electronic commerce processes by facilitating automated trading actions between market participants (sellers, brokers and buyers). Rapidly changing market environments challenge the performance of such agents, which are generally developed for specific market settings. To address this, this thesis is concerned with designing agents that can gradually adapt to variable, dynamic and uncertain markets and that are able to reuse acquired trading skills in new markets. This thesis proposes the use of reinforcement learning techniques to develop adaptive trading agents and puts forward a novel software architecture based on the semi-Markov decision process and on an innovative knowledge transfer framework. To evaluate my approach, the developed trading agents are tested in internationally well-known market simulations, and their behaviours when buying and/or selling in the retail and wholesale markets are analysed. The proposed approach has been shown to improve the adaptation of the trading agent in a specific market as well as to enable the portability of its knowledge to new markets.
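In a semi-Markov decision process, an action (here, a multi-step trading skill) may last several time steps, so its value update discounts both the reward collected during execution and the next state's value by the elapsed duration. The sketch shows the standard SMDP Q-learning update under that assumption; it is not the thesis's architecture or its knowledge-transfer framework, and the state and option names are hypothetical.

```python
from collections import defaultdict


def smdp_q_update(q, state, option, rewards, next_state, options,
                  alpha=0.1, gamma=0.95):
    """One SMDP Q-learning step after `option` ran for len(rewards) time steps.

    Q(s, o) += alpha * (R + gamma**tau * max_o' Q(s', o') - Q(s, o)),
    where R = sum_k gamma**k * r_k is the reward collected while `option` ran
    and tau is its duration in primitive steps.
    """
    tau = len(rewards)
    collected = sum(gamma ** k * r for k, r in enumerate(rewards))
    best_next = max(q[(next_state, o)] for o in options)
    target = collected + gamma ** tau * best_next
    q[(state, option)] += alpha * (target - q[(state, option)])
    return q[(state, option)]


# Hypothetical usage: options are high-level trading skills lasting several steps.
q_table = defaultdict(float)
skills = ["buy_wholesale", "sell_retail"]
smdp_q_update(q_table, "low_stock", "buy_wholesale", [-2.0, 0.0, 0.0],
              "high_stock", skills)
smdp_q_update(q_table, "high_stock", "sell_retail", [1.0, 1.5], "low_stock", skills)
```

Treating each skill as a single decision keeps the learning problem at the level of market strategy rather than individual bids, which is also what makes the learned skills candidates for reuse in a new market.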
Deep Learning for Safe Autonomous Driving: Current Challenges and Future Directions
Advances in information and signal processing technologies have a significant impact on autonomous driving (AD), improving driving safety while minimizing the efforts of human drivers with the help of advanced artificial intelligence (AI) techniques. Recently, deep learning (DL) approaches have solved several real-world problems of a complex nature. However, their strengths in terms of control processes for AD have not yet been deeply investigated and highlighted. This survey highlights the power of DL architectures in terms of reliability and efficient real-time performance and gives an overview of state-of-the-art strategies for safe AD, with their major achievements and limitations. Furthermore, it covers major embodiments of DL along the AD pipeline, including measurement, analysis, and execution, with a focus on road, lane, vehicle, pedestrian and drowsiness detection, collision avoidance, and traffic sign detection through sensing and vision-based DL methods. In addition, we discuss the performance of several reviewed methods using different evaluation metrics, with a critique of their pros and cons.
Finally, this survey highlights the current issues of safe DL-based AD and offers recommendations for future research, providing reference material for newcomers and researchers willing to join this vibrant area of Intelligent Transportation Systems.
This work was supported by Institute of Information & Communications Technology Planning & Evaluation (IITP) Grant funded by the Korea Government (MSIT) (2019-0-00136, Development of AI-Convergence Technologies for Smart City Industry Productivity Innovation); the work of Javier Del Ser was supported by the Basque Government through the EMAITEK and ELKARTEK Programs, as well as by the Department of Education of this institution (Consolidated Research Group MATHMODE, IT1294-19); VHCA received support from the Brazilian National Council for Research and Development (CNPq, Grant #304315/2017-6 and #430274/2018-1).
Muhammad, K.; Ullah, A.; Lloret, J.; Del Ser, J.; De Albuquerque, VHC. (2021). Deep Learning for Safe Autonomous Driving: Current Challenges and Future Directions. IEEE Transactions on Intelligent Transportation Systems. 22(7):4316-4336. https://doi.org/10.1109/TITS.2020.3032227