127 research outputs found

    Pyrus Base: An Open Source Python Framework for the RoboCup 2D Soccer Simulation

    Soccer, also known as football in some parts of the world, involves two teams of eleven players whose objective is to score more goals than the opposing team. To simulate this game and attract scientists from all over the world to conduct research and participate in an annual computer-based soccer world cup, Soccer Simulation 2D (SS2D) was one of the leagues initiated in the RoboCup competition. In every SS2D game, two teams of 11 players and one coach connect to the RoboCup Soccer Simulation Server and compete against each other. Over the past few years, several C++ base codes have been employed to control agents' behavior and their communication with the server. Although C++ base codes have laid the foundation for the SS2D, developing them requires an advanced level of C++ programming. C++ language complexity is a limiting disadvantage of C++ base codes for all users, especially for beginners. To conquer the challenges of C++ base codes and provide a powerful baseline for developing machine learning concepts, we introduce Pyrus, the first Python base code for SS2D. Pyrus is developed to encourage researchers to efficiently develop their ideas and integrate machine learning algorithms into their teams. Pyrus base is open-source code, and it is publicly available under MIT License on GitHub

    Expert-Free Online Transfer Learning in Multi-Agent Reinforcement Learning

    Transfer learning in Reinforcement Learning (RL) has been widely studied to overcome training issues of Deep-RL, i.e., exploration cost, data availability and convergence time, by introducing a way to enhance the training phase with external knowledge. Generally, knowledge is transferred from expert agents to novices. While this fixes the issue for a novice agent, the expert agent must have a good understanding of the task for such transfer to be effective. As an alternative, in this paper we propose Expert-Free Online Transfer Learning (EF-OnTL), an algorithm that enables expert-free real-time dynamic transfer learning in multi-agent systems. No dedicated expert exists, and the transfer source agent and the knowledge to be transferred are dynamically selected at each transfer step based on agents' performance and uncertainty. To improve uncertainty estimation, we also propose State Action Reward Next-State Random Network Distillation (sars-RND), an extension of RND that estimates uncertainty from RL agent-environment interaction. We demonstrate EF-OnTL's effectiveness against a no-transfer scenario and advice-based baselines, with and without expert agents, in three benchmark tasks: Cart-Pole, a grid-based Multi-Team Predator-Prey (mt-pp) and Half Field Offense (HFO). Our results show that EF-OnTL achieves overall comparable performance when compared against advice-based baselines while not requiring any external input or threshold tuning. EF-OnTL outperforms no-transfer with an improvement related to the complexity of the task addressed
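
    The abstract above says only that the transfer source and the transferred knowledge are chosen from agents' performance and uncertainty. The sketch below is one possible reading of that idea, not the paper's algorithm: the ranking rule, the `AgentStats` fields and the "uncertainty gap" filter are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class AgentStats:
    name: str
    mean_return: float       # recent average episode return (higher is better)
    mean_uncertainty: float  # recent average uncertainty estimate (lower is better)


def select_source(agents):
    """Pick a transfer source: lowest uncertainty, ties broken by higher return.

    This ranking rule is an assumption, not taken from the paper.
    """
    return min(agents, key=lambda a: (a.mean_uncertainty, -a.mean_return))


def select_experiences(buffer, target_uncertainty, budget):
    """Choose up to `budget` experiences the target is less sure about than the source.

    buffer: list of (experience, source_uncertainty) pairs.
    target_uncertainty: callable mapping an experience to the target's uncertainty.
    """
    scored = [(target_uncertainty(exp) - src_u, exp) for exp, src_u in buffer]
    # Keep only experiences where the source is more certain than the target.
    useful = [item for item in scored if item[0] > 0]
    # Largest uncertainty gap first.
    useful.sort(key=lambda item: item[0], reverse=True)
    return [exp for _, exp in useful[:budget]]
```

    In this reading, no agent is a fixed expert: whichever agent currently ranks best becomes the source for that transfer step, and the roles can change at the next step.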

    Context Awareness in Swarm Systems

    Recent swarms of Uncrewed Systems (UxS) require substantial human input to support their operation. The little 'intelligence' on these platforms limits their potential value and increases their overall cost. Artificial Intelligence (AI) solutions are needed to allow a single human to guide swarms of larger sizes. Shepherding is a bio-inspired swarm guidance approach with one or a few sheepdogs guiding a larger number of sheep. By designing AI-agents playing the role of sheepdogs, humans can guide the swarm by using these AI agents in the same manner that a farmer uses biological sheepdogs to muster sheep. A context-aware AI-sheepdog offers human operators a smarter command and control system. It overcomes the current limiting assumption in the literature of swarm homogeneity to manage heterogeneous swarms and allows the AI agents to better team with human operators. This thesis aims to demonstrate the use of an ontology-guided architecture to deliver enhanced contextual awareness for swarm control agents. The proposed architecture increases the contextual awareness of AI-sheepdogs to improve swarm guidance and control, enabling individual and collective UxS to characterise and respond to ambiguous swarm behavioural patterns. The architecture, associated methods, and algorithms advance the swarm literature by allowing improved contextual awareness to guide heterogeneous swarms. Metrics and methods are developed to identify the sources of influence in the swarm, recognise and discriminate the behavioural traits of heterogeneous influencing agents, and design AI algorithms to recognise activities and behaviours. The proposed contributions will enable the next generation of UxS with higher levels of autonomy to generate more effective Human-Swarm Teams (HSTs)

    Efficient Learning with Subgoals and Gaussian Process

    This thesis demonstrates how data efficiency in reinforcement learning can be improved through the use of subgoals and Gaussian process. Data efficiency is extremely important in a range of problems in which gathering additional data is expensive. This tends to be the case in most problems that involve actual interactions with the physical world, such as a robot kicking a ball, an autonomous vehicle driving or a drone manoeuvring. State of the art data efficiency is achieved on several well researched problems. The systems that achieve this learn Gaussian process state transition models of the problem. The model based learner system uses the state transition model to learn the action to take in each state. The subgoal planner makes use of the state transition model to build an explicit plan to solve the problem. The subgoal planner is improved through the use of learned subgoals to aid navigation of the problem space. The resource managed learner balances the costs of computation against the value of selecting better experiments in order to improve data efficiency. An active learning system is used to estimate the value of the experiments in terms of how much they may improve the current solution. This is compared to an estimate of how much better an experiment found by expending additional computation will be along with the costs of performing that computation. A theoretical framework around the use of subgoals in problem solving is presented. This framework provides insights into when and why subgoals are effective, along with avenues for future research. This includes a detailed proposal for a system built off the subgoal theory framework intended to make full use of subgoals to create an effective reinforcement learning system

    Agents and Robots for Reliable Engineered Autonomy

    This book contains the contributions of the Special Issue entitled "Agents and Robots for Reliable Engineered Autonomy". The Special Issue was based on the successful first edition of the "Workshop on Agents and Robots for reliable Engineered Autonomy" (AREA 2020), co-located with the 24th European Conference on Artificial Intelligence (ECAI 2020). The aim was to bring together researchers from autonomous agents, as well as software engineering and robotics communities, as combining knowledge from these three research areas may lead to innovative approaches that solve complex problems related to the verification and validation of autonomous robotic systems

    Control-Theoretical Perspective in Feedback-Based Systems Testing

    Self-Adaptive Systems (SAS) and Cyber-Physical Systems (CPS) have received significant attention in recent computer engineering research. This is due to their ability to improve the level of autonomy of engineering artefacts. In both cases, this autonomy increase is achieved through feedback. Feedback is the iteration of sensing and actuation to respectively acquire knowledge about the current state of said artefacts and steer them toward a desired state or behaviour. In this thesis we discuss the challenges that the introduction of feedback poses on the verification and validation process for such systems, more specifically, on their testing. We highlight three types of new challenges with respect to traditional software testing: alteration of testing input and output definition, and intertwining of components with different nature. Said challenges affect the ways we can define different elements of the testing process: coverage criteria, testing set-ups, test-case generation strategies, and oracles in the testing process. This thesis consists of a collection of three papers and contributes to the definition of each of the mentioned testing elements. In terms of coverage criteria for SAS, Paper I proposes casting the testing problem as a semi-infinite optimisation problem. This allows us to leverage the Scenario Theory from the field of robust control, and provide a worst-case probabilistic bound on a given performance metric of the system under test. Concerning the definition of testing set-ups for control-based CPS, Paper II investigates the implications of the use of different abstractions (i.e., the use of implemented or emulated components) on the significance of the testing. The paper provides evidence that confutes the common assumption present in previous literature on the existence of a hierarchy among commonly used testing set-ups. Finally, regarding the test-case generation and oracle definition, Paper III defines the problem of stress testing control-based CPS software. We contribute to the generation and identification of stress test cases for such software by proposing a novel test case parametrisation. Leveraging the proposed parametrisation we define metamorphic relations on the expected behaviour of the system under test. We use said relations for the development of a stress testing approach and sanity checks on the testing results

    Reservoir Computing with Dynamical Systems

    Homeostatic action selection for simultaneous multi-tasking

    Mobile robots are rapidly developing and gaining in competence, but the potential of available hardware still far outstrips our ability to harness it. Domain-specific applications are most successful due to customised programming tailored to a narrow area of application. Resulting systems lack extensibility and autonomy, leading to increased cost of development. This thesis investigates the possibility of designing and implementing a general framework capable of simultaneously coordinating multiple tasks that can be added or removed in a plug and play manner. A homeostatic mechanism is proposed for resolving the contentions inevitably arising between tasks competing for the use of the same robot actuators. In order to evaluate the developed system, demonstrator tasks are constructed to reach a goal location, prevent collision, follow a contour around obstacles and balance a ball within a spherical bowl atop the robot. Experiments show preliminary success with the homeostatic coordination mechanism but a restriction to local search causes issues that preclude conclusive evaluation. Future work identifies avenues for further research and suggests switching to a planner with sufficient foresight to continue evaluation. "This work was supported by the Engineering and Physical Sciences Research Council [grant number EP/K503162/1]." -- Acknowledgement

    Distributed, decentralised and compensational mechanisms for platoon formation

    Traffic problems have been on the rise corresponding with the increase in worldwide urbanisation and the number of vehicles per capita. Platoons, which are a formation of vehicles travelling close together, present themselves as a possible solution, as existing research indicates that they can contribute to better road usage, reduce fuel consumption and emissions and decongest bottlenecks faster. There are many aspects to be explored pertaining to the topic of platooning: safety, stability, communication, controllers and operations, all of which are necessary to bring platoons closer to use in everyday traffic. While extensive research has already made substantial strides in all the aforementioned fields, there is so far little work on the logical grouping of vehicles in platoons. Therefore, this work addresses the platoon formation problem, which has not been heavily researched, with existing examples being focused on large, freight vehicles travelling on highways. These cases find themselves on the strategic and tactical level of planning since they benefit from a large time horizon and the grouping can be optimised accordingly. The approaches presented here, however, are on the operational level, grouping and routing vehicles spontaneously and organically thanks to their distributed and decentralised nature. This creates so-called opportunistic platoons which could provide a promising premise for all networks given their flexibility. To this extent, this thesis presents two novel platoon forming algorithms: a distributed approach derived from classical routing problems, and a supplementary decentralised compensational approach. The latter uses automated negotiation to facilitate vehicles organising themselves in a platoon based on monetary exchanges. Considering that all traffic participants have a set of preferences, limitations and goals, the proposed system must ensure that any solution provided is acceptable and beneficial for the individual vehicles, outweighing any potential effort, cost and sacrifices. This is achieved by offering platooning vehicles some form of incentivisation, either cost reductions or traffic light prioritisation. To test the proposed algorithms, a traffic simulation was developed using real networks with realistic traffic demand. The traffic participants were transformed into agents and given the necessary functionality to build platoons and operate within them. The applicability and suitability of both approaches were investigated along with several other aspects pertaining to platoon operations such as size, traffic state, network positioning and incentivisation methods. The results indicate that the mechanisms proposed allow for spontaneous platoons to be created. Moreover, with the distributed optimisation-based approach and using cost-reducing incentives, participating vehicles benefited regardless of the platoon size, traffic state and positioning, with utility improvements ranging from 20% to over 50% compared to the studied baseline. For time-based incentives the results were mixed, with the utility of some vehicles improving, some seeing no change and for others, deteriorating. Therefore, the usage of such incentives would not be recommended due to their lack of Pareto-efficiency. The compensational and completely decentralised approach shows some benefits, but the resulting improvement was overall negligible. The presented mechanisms are a novel approach to platoon formation and provide meaningful insight into the mechanics and applicability of platoons. This sets the stage for future expansions into planning, designing and implementing more effective infrastructures and traffic systems

    Reinforcement Learning-based Optimization of Multiple Access in Wireless Networks

    In this thesis, we study the problem of Multiple Access (MA) in wireless networks and design adaptive solutions based on Reinforcement Learning (RL). We analyze the importance of MA in the current communications scenery, where bandwidth-hungry applications emerge due to the co-evolution of technological progress and societal needs, and explain that improvements brought by new standards cannot overcome the problem of resource scarcity. We focus on resource-constrained networks, where devices have restricted hardware-capabilities, there is no centralized point of control and coordination is prohibited or limited. The protocols that we optimize follow a Random Access (RA) approach, where sensing the common medium prior to transmission is not possible. We begin with the study of time access and provide two reinforcement learning algorithms for optimizing Irregular Repetition Slotted ALOHA (IRSA), a state-of-the-art RA protocol. First, we focus on ensuring low complexity and propose a Q-learning variant where learners act independently and converge quickly. We, then, design an algorithm in the area of coordinated learning and focus on deriving convergence guarantees for learning while minimizing the complexity of coordination. We provide simulations that showcase how coordination can help achieve a fine balance, in terms of complexity and performance, between fully decentralized and centralized solutions. In addition to time access, we study channel access, a problem that has recently attracted significant attention in cognitive radio. We design learning algorithms in the framework of Multi-player Multi-armed Bandits (MMABs), both for static and dynamic settings, where devices arrive at different time steps. Our focus is on deriving theoretical guarantees and ensuring that performance scales well with the size of the network. 
Our works constitute an important step towards addressing the challenges that the properties of decentralization and partial observability, inherent in resource-constrained networks, pose for RL algorithms
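
    To make the setting in the abstract above concrete, here is a minimal sketch of independent learners sharing a set of channels without sensing the medium: each device learns only from whether its own transmission succeeded or collided. It is a generic stateless Q-learning toy, not the thesis's IRSA or MMAB algorithms; the reward model, `epsilon`, `alpha` and all function names are assumptions made for illustration.

```python
import random


def collision_rewards(choices):
    """Reward 1.0 for a device alone on its channel, 0.0 if it collides."""
    counts = {}
    for channel in choices:
        counts[channel] = counts.get(channel, 0) + 1
    return [1.0 if counts[channel] == 1 else 0.0 for channel in choices]


def q_update(q_value, reward, alpha=0.1):
    """Stateless Q-learning step: move the estimate toward the observed reward."""
    return q_value + alpha * (reward - q_value)


def choose(q_row, epsilon, rng):
    """Epsilon-greedy choice over one device's per-channel estimates."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=q_row.__getitem__)


def run(num_devices=3, num_channels=3, steps=2000, epsilon=0.1, seed=0):
    """Let every device learn independently, with no coordination or sensing."""
    rng = random.Random(seed)
    q = [[0.0] * num_channels for _ in range(num_devices)]
    for _ in range(steps):
        choices = [choose(row, epsilon, rng) for row in q]
        rewards = collision_rewards(choices)
        for device, (channel, reward) in enumerate(zip(choices, rewards)):
            q[device][channel] = q_update(q[device][channel], reward)
    return q
```

    Each device keeps only one estimate per channel and never observes the others' choices, which mirrors the decentralised, partially observable conditions the abstract describes.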