4 research outputs found

    Learning Automata as a Basis for Multi Agent Reinforcement Learning

    Learning Automata (LA) are adaptive decision-making devices suited for operation in unknown environments [12]. Originally developed in the area of mathematical psychology and used for modeling observed behavior, LA in their current form are closely related to Reinforcement Learning (RL) approaches and are most popular in the area of engineering. LA combine fast and accurate convergence with low computational complexity, and have been applied to a broad range of modeling and control problems. Moreover, the intuitive yet analytically tractable concept of learning automata also makes them very suitable as a theoretical framework for Multi-Agent Reinforcement Learning (MARL). RL is already an established and profound theoretical framework for learning in stand-alone or single-agent systems. Yet extending RL to multi-agent systems (MAS) does not guarantee the same theoretical grounding. As long as the environment an agent experiences is Markov, and the agent can experiment sufficiently, RL guarantees convergence to the optimal strategy. In a MAS, however, the reinforcement an agent receives may depend on the actions taken by the other agents acting in the same environment. Hence the Markov property no longer holds, and guarantees of convergence are lost. In the light of the above problem …
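The low computational cost mentioned in the abstract can be seen in the classical linear reward-inaction (L_R-I) scheme, one of the standard LA update rules: each step touches only the action-probability vector. The two-action environment, reward rates, and learning rate below are illustrative assumptions, not taken from the paper.

```python
import random

def update_lri(probs, action, reward, lr=0.1):
    """One linear reward-inaction (L_R-I) step: on reward = 1, shift
    probability mass toward the chosen action; on reward = 0, leave the
    probability vector unchanged (the 'inaction' part)."""
    if reward:
        for a in range(len(probs)):
            if a == action:
                probs[a] += lr * (1.0 - probs[a])
            else:
                probs[a] -= lr * probs[a]
    return probs

# Illustrative two-action automaton in a stationary random environment
# where action 0 is rewarded 80% of the time and action 1 only 20%.
random.seed(0)
probs = [0.5, 0.5]
reward_rates = [0.8, 0.2]
for _ in range(2000):
    action = 0 if random.random() < probs[0] else 1
    reward = 1 if random.random() < reward_rates[action] else 0
    probs = update_lri(probs, action, reward)
```

In a stationary environment the scheme is absorbing: probability mass drifts toward the better-rewarded action. In a MAS, as the abstract notes, the other agents make the effective reward rates non-stationary, which is exactly where such single-automaton guarantees break down.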

    Rule-based interactive assisted reinforcement learning

    Reinforcement Learning (RL) has seen increasing interest over the past few years, partially owing to breakthroughs in the digestion and application of external information. The use of external information results in improved learning speeds and solutions to more complex domains. This thesis, a collection of five key contributions, demonstrates that performance gains comparable to existing Interactive Reinforcement Learning methods can be achieved using less data, sourced during operation, and without prior verification and validation of the information's integrity. First, this thesis introduces Assisted Reinforcement Learning (ARL), a collective term referring to RL methods that utilise external information to leverage the learning process, and provides a non-exhaustive review of current ARL methods. Second, two advice delivery methods common in ARL, evaluative and informative, are compared through human trials. The comparison highlights how human engagement, accuracy of advice, agent performance, and advice utility differ between the two methods. Third, this thesis introduces simulated users as a methodology for testing and comparing ARL methods. Simulated users enable testing and comparing of ARL systems without costly and time-consuming human trials. While not a replacement for well-designed human trials, simulated users offer a cheap and robust approach to ARL design and comparison. Fourth, the concept of persistence is introduced to Interactive Reinforcement Learning. The retention and reuse of advice maximises utility and can lead to improved performance and reduced human demand. Finally, this thesis presents rule-based interactive RL, an iterative method for providing advice to an agent. Existing interactive RL methods rely on constant human supervision and evaluation, requiring a substantial commitment from the advice-giver. Rule-based advice can be provided proactively and be generalised over the state-space while remaining flexible enough to handle potentially inaccurate or irrelevant information. Ultimately, the thesis contributions are validated empirically and clearly show that rule-based advice significantly reduces human guidance requirements while improving agent performance. Doctor of Philosophy
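One way to picture proactive, state-generalised advice is a rule that maps states to suggested actions and overrides the agent's own choice when it fires, while value updates continue normally so inaccurate advice can eventually be unlearned. The sketch below is a minimal illustration of that idea on a toy corridor task; the rule, environment, and all parameter values are hypothetical, not the thesis's actual method.

```python
import random

# Hypothetical toy task: walk right along a corridor from state 0 to GOAL.
GOAL = 5
ACTIONS = ["left", "right"]
Q = {(s, a): 0.0 for s in range(GOAL + 1) for a in ACTIONS}

def rule_advice(state):
    """Hypothetical rule, generalised over the whole state space:
    in any state left of the goal, suggest moving right."""
    return "right" if state < GOAL else None

def choose_action(state, epsilon=0.1):
    advice = rule_advice(state)
    if advice is not None:
        return advice                      # proactive advice, no human in the loop
    if random.random() < epsilon:
        return random.choice(ACTIONS)      # ordinary exploration otherwise
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def step(state, action):
    next_state = min(GOAL, state + 1) if action == "right" else max(0, state - 1)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward

def train(episodes=200, alpha=0.5, gamma=0.9):
    """Plain Q-learning; the advice only biases action selection,
    so the learned values still reflect actual experience."""
    for _ in range(episodes):
        s = 0
        while s != GOAL:
            a = choose_action(s)
            s2, r = step(s, a)
            best_next = max(Q[(s2, b)] for b in ACTIONS)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s2

train()
```

Because the advice shapes behaviour rather than the value function, a persistent but wrong rule costs exploration time without permanently corrupting what the agent learns, which is one plausible reading of the robustness the abstract claims.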

    The impact of decentral dispatching strategies on the performance of intralogistics transport systems

    This thesis focuses on control strategies for intralogistics transport systems. It evaluates how switching from central to decentral dispatching approaches influences the performance of these systems. Many ideas and prototypes for implementing decentral control have been suggested by the scientific community, but usually only the qualitative advantages of this new paradigm are stated. The impact on performance is not quantified and analyzed. Additionally, decentral control is often confused with distributed algorithms, or relies on the aggregation of local into global information; in the latter case, the technological limitations due to the communication overhead are not considered. The decentral prototypes usually focus only on routing. This thesis takes a step back and provides a generic simulation environment which can be used by other researchers to test and compare control strategies in the future. The test environment is used for developing four truly decentral dispatching strategies which work only based on local information. These strategies are compared to a central approach for controlling transportation systems. Input data from two real-world applications is used for a series of simulation experiments with three different layout complexities. Based on the simulation studies, neither the central nor the decentral dispatching strategies show a universally superior performance. The results depend on the combination of input data set and layout scenario. The expected efficiency loss for the decentral approaches can be confirmed for stable input patterns. Regardless of the layout complexity, the decentral strategies always need more vehicles to reach the performance level of the central control rule when these input characteristics are present. In the case of varying input data and high throughput, the decentral strategies outperform the central approach in simple layouts. They require fewer vehicles and less vehicle movement to achieve the central performance. Layout simplicity makes the central dispatching strategy prone to undesired effects, and the simple-minded decentral decision rules can achieve a better performance in this kind of environment. But only complex layouts are a relevant benchmark scenario for transferring decentral ideas to real-world applications. In such a scenario the decentral performance deteriorates while the layout-dependent influences on the central strategy become less relevant. This is true for both analyzed input data sets. Consequently, the decentral strategies require at least 36% to 53% more vehicles and 20% to 42% more vehicle movement to achieve the lowest central performance level. Therefore their usage can currently not be justified based on investment and operating costs. The characteristics of decentral systems limit their own performance: the restriction to local information leads to poor dispatching decisions, which in turn induce self-reinforcing inefficiencies. In addition, the application of decentral strategies requires bigger storage location capacity. In several disturbance scenarios the decentral strategies perform fairly well and show their ability to adapt to changed environmental conditions. However, their performance after the disturbance remains in some cases unpredictable and relates to the properties of self-organizing complex systems. Their real-world applicability has to be called into question.
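The restriction to local information, and the poor dispatching decisions it can produce, can be illustrated with a minimal decentral rule: each vehicle sees only the transport requests within a fixed sensing radius and greedily claims the nearest unclaimed one. This is a hypothetical sketch for intuition, not one of the four strategies developed in the thesis; positions, radius, and the 1-D layout are all assumptions.

```python
SENSING_RADIUS = 3.0  # assumed local-information horizon per vehicle

def dispatch_locally(vehicles, requests, radius=SENSING_RADIUS):
    """Greedy local claiming: each vehicle (id -> position on a 1-D line)
    claims the nearest unclaimed request (id -> position) within its
    sensing radius. No global view, no central coordinator."""
    assignments = {}
    claimed = set()
    for vid, vpos in vehicles.items():
        visible = [
            (abs(rpos - vpos), rid)
            for rid, rpos in requests.items()
            if rid not in claimed and abs(rpos - vpos) <= radius
        ]
        if visible:                        # otherwise the vehicle idles
            _, rid = min(visible)          # nearest visible request wins
            assignments[vid] = rid
            claimed.add(rid)
    return assignments

# Toy 1-D layout: two vehicles, three open transport requests.
vehicles = {"v1": 0.0, "v2": 10.0}
requests = {"r1": 2.0, "r2": 9.0, "r3": 20.0}
plan = dispatch_locally(vehicles, requests)
# v1 claims r1, v2 claims r2; r3 lies outside every sensing radius and
# stays unserved until a vehicle happens to wander close enough.
```

The stranded request r3 illustrates the failure mode the abstract describes: decisions that are locally reasonable but globally poor, which is why the decentral strategies need extra vehicles to match the central performance level.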