The design of absorbing Bayesian pursuit algorithms and the formal analyses of their ε-optimality

Author: Granmo Ole-Christoffer
Oommen John
Zhang Xuan
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016
Field of study

The fundamental phenomenon that has been used to enhance the convergence speed of learning automata (LA) is that of incorporating the running maximum likelihood (ML) estimates of the action reward probabilities into the probability updating rules for selecting the actions. The frontiers of this field have recently been expanded by replacing the ML estimates with their corresponding Bayesian counterparts that incorporate the properties of the conjugate priors. These constitute the Bayesian pursuit algorithm (BPA) and the discretized Bayesian pursuit algorithm. Although these algorithms have been designed and efficiently implemented, and are, arguably, the fastest and most accurate LA reported in the literature, the proofs of their ϵ-optimal convergence have remained unsolved. Resolving this is precisely the intent of this paper. We present a single unifying analysis by which the ϵ-optimality of both the continuous and discretized schemes is proven. We emphasize that unlike the ML-based pursuit schemes, the Bayesian schemes have to consider not only the estimates themselves but also the distributional forms of their conjugate posteriors and their higher order moments, all of which render the proofs particularly challenging. As far as we know, apart from the results themselves, the methodologies of this proof have been unreported in the literature; they are both pioneering and novel.
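
As an illustration of the difference between the ML estimates and their Bayesian counterparts mentioned in the abstract above, the following worked formulas assume Bernoulli rewards with a Beta(a, b) conjugate prior. The symbols W_i (rewards observed for action i), Z_i (penalties observed for action i), a and b are notation introduced here for the sketch, not taken from the paper.

```latex
% ML estimate of the reward probability of action i
\hat{d}_i^{\mathrm{ML}} = \frac{W_i}{W_i + Z_i}

% Conjugate update: Beta(a, b) prior, W_i rewards and Z_i penalties observed
d_i \mid W_i, Z_i \sim \mathrm{Beta}(a + W_i,\; b + Z_i)

% Posterior mean and variance (first and second moments of the posterior)
\mathbb{E}[d_i \mid W_i, Z_i] = \frac{a + W_i}{a + b + W_i + Z_i},
\qquad
\mathrm{Var}[d_i \mid W_i, Z_i] =
  \frac{(a + W_i)(b + Z_i)}{(a + b + W_i + Z_i)^2 \,(a + b + W_i + Z_i + 1)}
```

The posterior is a full distribution, not a single number, which is why an analysis of the Bayesian schemes has to account for its shape and moments as well as a point estimate.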

Agder University Research Archive

Solving Two-Person Zero-Sum Stochastic Games With Incomplete Information Using Learning Automata With Artificial Barriers

Author: Oommen John
Silvestre Daniel
Yazidi Anis
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2021
Field of study

Learning automata (LA) with artificially absorbing barriers was a completely new horizon of research in the 1980s (Oommen, 1986). These new machines yielded properties that were previously unknown. More recently, absorbing barriers have been introduced in continuous estimator algorithms so that the proofs could follow a martingale property, as opposed to monotonicity (Zhang et al., 2014), (Zhang et al., 2015). However, the applications of LA with artificial barriers are almost nonexistent. In that regard, this article is pioneering in that it provides effective and accurate solutions to an extremely complex application domain, namely that of solving two-person zero-sum stochastic games that are provided with incomplete information. LA have been previously used (Sastry et al., 1994) to design algorithms capable of converging to the game's Nash equilibrium under limited information. Those algorithms have focused on the case where the saddle point of the game exists in a pure strategy. However, the majority of the LA algorithms used for games are absorbing in the probability simplex space, and thus, they converge to an exclusive choice of a single action. These LA are thus unable to converge to other mixed Nash equilibria when the game possesses no saddle point for a pure strategy. The pioneering contribution of this article is that we propose an LA solution that is able to converge to an optimal mixed Nash equilibrium even though there may be no saddle point when a pure strategy is invoked. The scheme, being of the linear reward-inaction ($L_{R-I}$) paradigm, is in and of itself absorbing. However, by incorporating artificial barriers, we prevent it from being "stuck" or getting absorbed in pure strategies. Unlike the linear reward-ε-penalty ($L_{R-\epsilon P}$) scheme proposed by Lakshmivarahan and Narendra almost four decades ago, our new scheme achieves the same goal with much less parameter tuning and in a more elegant manner. This article includes the nontrivial proofs of the theoretical results characterizing our scheme and also contains experimental verification that confirms our theoretical findings.
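
To make the absorbing behaviour of the linear reward-inaction paradigm mentioned in the abstract above concrete, here is a minimal sketch of the classical update rule. It is not the barrier-based scheme of the article, whose details are not given in the abstract. The function name `lri_update` and the learning parameter `lam` are hypothetical names chosen for illustration.

```python
def lri_update(probs, chosen, rewarded, lam=0.1):
    """One classical linear reward-inaction step (illustrative sketch).

    probs    -- current action probability vector (sums to 1)
    chosen   -- index of the action that was selected
    rewarded -- True if the environment rewarded the action
    lam      -- learning parameter in (0, 1); an assumed name
    """
    if not rewarded:
        # "Inaction": a penalty leaves the probabilities untouched.
        return list(probs)
    # On reward, shrink every probability by (1 - lam) ...
    updated = [(1.0 - lam) * p for p in probs]
    # ... and give the freed mass to the chosen action.
    updated[chosen] = probs[chosen] + lam * (1.0 - probs[chosen])
    return updated


# Repeated rewards for action 0 drive its probability towards 1,
# which is the absorbing behaviour the abstract refers to.
p = [0.5, 0.5]
for _ in range(200):
    p = lri_update(p, chosen=0, rewarded=True)
```

Because the update only ever moves mass towards a rewarded action, the vector drifts to a corner of the probability simplex, so a plain scheme of this kind cannot settle on a mixed strategy.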

Agder University Research Archive

The Hierarchical Discrete Learning Automaton Suitable for Environments with Many Actions and High Accuracy Requirements

Author: Jiao Lei
Omslandseter Rebekka Olsson
Oommen John
Yazidi Anis
Zhang Xuan
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2022
Field of study

Author's accepted manuscript.
Since its early beginnings, the paradigm of Learning Automata (LA) has attracted much interest. Over the last decades, new concepts and various improvements have been introduced to increase the LA’s speed and accuracy, including employing probability updating functions, discretizing the probability space, and implementing the “Pursuit” concept. The concept of incorporating “structure” into the ordering of the LA’s actions is one of the latest advancements in the field, leading to the ϵ-optimal Hierarchical Continuous Pursuit LA (HCPA), which outperforms other LA variants when the number of actions is large. Although the previously proposed HCPA is powerful, its speed is handicapped when the required probability of an action approaches unity. The reason for this slow convergence is that the learning parameter operates in a multiplicative manner within the probability space, making the increment of the action probability smaller as the probability gets close to unity. Therefore, we propose the novel Hierarchical Discrete Learning Automata (HDPA) in this paper, which does not possess the same impediment as the HCPA. The proposed machine infuses the principle of discretization into the action probability vector’s updating functionality, where this type of updating is invoked recursively at every depth within a hierarchical tree structure, and we pursue the best estimated action in all iterations by utilizing the Estimator phenomenon. The proposed machine is ϵ-optimal, and our experimental results demonstrate that the number of iterations required before convergence is significantly reduced for the HDPA when compared with the HCPA.
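
The contrast drawn in the abstract above, between a multiplicative update whose increments shrink near unity and a discretized update with fixed increments, can be sketched for a single action probability as follows. This is a simplified, non-hierarchical illustration: the function names, the learning parameter `lam` and the integer `resolution` are assumptions introduced here, not the HCPA or HDPA themselves.

```python
def continuous_increment(p, lam):
    """Increment of a multiplicative update p <- p + lam * (1 - p)."""
    return lam * (1.0 - p)


def steps_continuous(p, target, lam):
    """Count multiplicative updates needed to lift p to at least target."""
    steps = 0
    while p < target:
        p += continuous_increment(p, lam)
        steps += 1
    return steps


def steps_discretized(level, target_level, resolution):
    """Count fixed-size updates of 1/resolution, tracked as integer levels.

    The probability is level / resolution, so integer arithmetic avoids
    floating-point drift.
    """
    steps = 0
    while level < min(target_level, resolution):
        level += 1  # every step adds the same amount, 1 / resolution
        steps += 1
    return steps


# The multiplicative increment is much smaller near unity than at 0.5,
# while the discretized increment stays at 1 / resolution throughout.
near_half = continuous_increment(0.5, lam=0.05)
near_one = continuous_increment(0.99, lam=0.05)
```

With these assumed parameters the multiplicative step near unity is a small fraction of the step at 0.5, which is the slow-down that discretization is meant to avoid.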

Agder University Research Archive

The Hierarchical Discrete Pursuit Learning Automaton: A Novel Scheme With Fast Convergence and Epsilon-Optimality

Author: Jiao Lei
Omslandseter Rebekka Olsson
Oommen John
Yazidi Anis
Zhang Xuan
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2022
Field of study

Author's accepted manuscript. © 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Since the early 1960s, the paradigm of learning automata (LA) has experienced abundant interest. Arguably, it has also served as the foundation for the phenomenon and field of reinforcement learning (RL). Over the decades, new concepts and fundamental principles have been introduced to increase the LA’s speed and accuracy. These include using probability updating functions, discretizing the probability space, and using the “Pursuit” concept. Very recently, the concept of incorporating “structure” into the ordering of the LA’s actions has improved both the speed and accuracy of the corresponding hierarchical machines when the number of actions is large. This has led to the ϵ-optimal hierarchical continuous pursuit LA (HCPA). This article pioneers the inclusion of all the above-mentioned phenomena into a new single LA, leading to the novel hierarchical discretized pursuit LA (HDPA). Indeed, although the previously proposed HCPA is powerful, its speed has an impediment when any action probability is close to unity, because the updates of the components of the probability vector become correspondingly smaller as any action probability approaches unity. We propose here the novel HDPA, in which we infuse the phenomenon of discretization into the action probability vector’s updating functionality, which is invoked recursively at every stage of the machine’s hierarchical structure. This discretized functionality does not possess the same impediment, because discretization prohibits it. We demonstrate the HDPA’s robustness and validity by formally proving its ϵ-optimality using the moderation property. We also invoke the submartingale characteristic at every level to prove that the action probability of the optimal action converges to unity as time goes to infinity. Apart from the new machine being ϵ-optimal, the numerical results demonstrate that the number of iterations required for convergence is significantly reduce...

Agder University Research Archive

Utilising policy types for effective ad hoc coordination in multiagent systems

Author: Albrecht Stefano Vittorino
Publication venue: The University of Edinburgh
Publication date: 26/11/2015
Field of study

This thesis is concerned with the ad hoc coordination problem. Therein, the goal is to design an autonomous agent which can achieve high flexibility and efficiency in a multiagent system that admits no prior coordination between the designed agent and the other agents. Flexibility describes the agent’s ability to solve its task with a variety of other agents in the system; efficiency is the relation between the agent’s payoffs and time needed to solve the task; and no prior coordination means that the agent does not a priori know how the other agents behave. This problem is relevant for a number of practical applications, including human-machine interaction tasks, such as adaptive user interfaces, robotic elderly care, and automated trading agents. Motivated by this problem, the central idea studied in this thesis is to utilise a set of policies, or types, to characterise the behaviour of other agents. Specifically, the idea is to reduce the complexity of the interaction problem by assuming that the other agents draw their latent type from some known or hypothesised space of types, and that the assignment of types is governed by an unknown distribution. Based on the current interaction history, we can form posterior beliefs about the relative likelihood of types. These beliefs, combined with the future predictions of the types, can then be used in a planning procedure to compute optimal responses. The aim of this thesis is to study the potential and limitations of this idea in the context of ad hoc coordination. We formulate the ad hoc coordination problem using a game-theoretic model called the stochastic Bayesian game. Based on this model, we derive a canonical algorithmic description of the idea outlined above, called Harsanyi-Bellman Ad Hoc Coordination (HBA). The practical potential of HBA is demonstrated in two case studies, including a human-machine experiment and a simulated logistics domain. We formulate basic ways to incorporate evidence (i.e. observed actions) into posterior beliefs and analyse the conditions under which the posterior beliefs converge to the true distribution of types. Furthermore, we study the impact of prior beliefs over types (that is, before any actions are observed) on the long-term performance of HBA, and show empirically that automatic methods can compute prior beliefs with consistent performance effects. For hypothesised (i.e. “guessed”) type spaces, we analyse the relations between hypothesised and true type spaces under which HBA is still guaranteed to solve its task, despite inaccuracies in hypothesised types. Finally, we show how HBA can perform an automatic statistical analysis to decide whether to reject its behavioural hypothesis, i.e. the combination of posterior beliefs and types.
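
The step of forming posterior beliefs over types from observed actions, described in the abstract above, can be sketched with a plain product-form Bayesian update. This is one simple way to incorporate evidence, not necessarily the formulation used in the thesis. The sketch also assumes each type is a fixed distribution over actions, whereas types in the thesis are policies that may depend on the interaction history. The function name `posterior_over_types` and the example types are hypothetical.

```python
def posterior_over_types(prior, likelihoods, observed_actions):
    """Product-form posterior over hypothesised types (illustrative sketch).

    prior            -- dict mapping type name -> prior probability
    likelihoods      -- dict mapping type name -> {action: probability}
                        (assumes history-independent types for simplicity)
    observed_actions -- sequence of actions observed from the other agent
    """
    weights = {}
    for type_name, prior_prob in prior.items():
        weight = prior_prob
        for action in observed_actions:
            # Multiply in the probability this type assigns to the action.
            weight *= likelihoods[type_name].get(action, 0.0)
        weights[type_name] = weight
    total = sum(weights.values())
    if total == 0.0:
        # No hypothesised type can explain the observations.
        raise ValueError("observations have zero probability under all types")
    return {type_name: w / total for type_name, w in weights.items()}


# Two hypothetical types that favour different actions.
beliefs = posterior_over_types(
    prior={"A": 0.5, "B": 0.5},
    likelihoods={"A": {"x": 0.8, "y": 0.2}, "B": {"x": 0.2, "y": 0.8}},
    observed_actions=["x", "x"],
)
```

Raising an error when every type assigns zero probability to the history is a design choice of this sketch; it loosely mirrors the idea of rejecting a behavioural hypothesis that does not fit the observations.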

Edinburgh Research Archive

Strategic analysis of complex security scenarios.

Author
Publication venue: 'Office of Scientific and Technical Information (OSTI)'
Publication date
Field of study

Crossref

Reinforcement Learning

Author
Publication venue: 'IntechOpen'
Publication date: 20/04/2021
Field of study

Brains rule the world, and brain-like computation is increasingly used in computers and electronic devices. Brain-like computation is about processing and interpreting data or directly putting forward and performing actions. Learning is a very important aspect. This book is on reinforcement learning, which involves performing actions to achieve a goal. The first 11 chapters of this book describe and extend the scope of reinforcement learning. The remaining 11 chapters show that there is already wide usage in numerous fields. Reinforcement learning can tackle control tasks that are too complex for traditional, hand-designed, non-learning controllers. As learning computers can deal with technical complexities, the task that remains for human operators is to specify goals on increasingly higher levels. This book shows that reinforcement learning is a very dynamic area in terms of theory and applications, and it shall stimulate and encourage new research in this field.

Directory of Open Access Books (DOAB)

Multiobjective in-core fuel management optimisation for nuclear research reactors

Author: Schlunz Evert Barend
Publication venue: Stellenbosch : Stellenbosch University
Publication date: 01/12/2016
Field of study

Thesis (PhD)--Stellenbosch University, 2016.
ENGLISH SUMMARY: The efficiency and effectiveness of fuel usage in a typical nuclear reactor are influenced by the specific arrangement of available fuel assemblies in the reactor core positions. This arrangement of assemblies is referred to as a fuel reload configuration and usually has to be determined anew for each operational cycle of a reactor. Very often, multiple objectives are pursued simultaneously when designing a reload configuration, especially in the context of nuclear research reactors. In the multiobjective in-core fuel management optimization (MICFMO) problem, the aim is to identify a Pareto optimal set of compromise or trade-off reload configurations. Such a set may then be presented to a decision maker (i.e. a nuclear reactor operator) for consideration so as to select a preferred configuration. In the first part of this dissertation, a scalarization-based methodology for MICFMO is proposed in order to address several shortcomings associated with the popular weighting method often employed in the literature for solving the MICFMO problem. The proposed methodology has been implemented in a reactor simulation code, called the OSCAR-4 system. In order to demonstrate its practical applicability, the methodology is applied to solve several MICFMO problem instances in the context of two research reactors. In the second part of the dissertation, an extensive investigation is conducted into the suitability of several multiobjective optimization algorithms for solving the constrained MICFMO problem. The computation time required to perform the investigation is reduced through the usage of several artificial neural networks constructed in the dissertation for objective and constraint function evaluations. Eight multiobjective metaheuristics are compared in the context of a test suite of several MICFMO problem instances, based on the SAFARI-1 research reactor in South Africa. The investigation reveals that the NSGA-II, the P-ACO algorithm and the MOOCEM are generally the best-performing metaheuristics across the problem instances in the test suite, while the MOVNS algorithm also performs well in the context of bi-objective problem instances. As part of this investigation, a multiplicative penalty function (MPF) constraint handling technique is also proposed and compared to an existing constraint handling technique, called constrained-domination. The comparison reveals that the MPF technique is a competitive alternative to constrained-domination. In an attempt to raise the level of generality at which MICFMO may be performed and potentially improve the quality of optimization results, a multiobjective hyperheuristic, called the AMALGAM method, is also considered in this dissertation. This hyperheuristic incorporates multiple metaheuristic sub-algorithms simultaneously for optimization. Testing reveals that the AMALGAM method yields superior results in the majority of problem instances in the test suite, thus achieving the dual goal of raising the level of generality and of yielding improved optimization results. The method has also been implemented in the OSCAR-4 system and is applied to solve several MICFMO case study problem instances, based on two research reactors, in order to demonstrate its practical applicability. Finally, in the third part of this dissertation, a conceptual framework is proposed for an optimization-based personal decision support system, dedicated to MICFM. This framework may serve as the basis for developing a computerized tool to aid nuclear reactor operators in designing suitable reload configurations.
AFRIKAANS SUMMARY (translated into English): The efficiency and effectiveness of fuel consumption in a typical nuclear reactor are influenced by the specific arrangement of available fuel elements in the loading positions of the reactor. This arrangement is known as a fuel reload configuration and is usually determined anew for each operational cycle of a reactor. The simultaneous optimization of multiple objectives is often pursued when designing a reload configuration, especially in the context of research reactors. The aim of multiobjective in-core fuel management optimization (MICFMO) is to identify a Pareto optimal set of reload configuration trade-offs. Such a set may then be presented for consideration (for example, to a nuclear reactor operator) so that a preferred configuration can be selected. In the first part of this dissertation, a scalarization-based methodology for MICFMO is proposed to address several shortcomings of the popular weighting method. The latter method is regularly used in the literature to solve the MICFMO problem. The proposed methodology has been implemented in a reactor simulation system known as the OSCAR-4 system. To demonstrate its practical applicability, the methodology is used to solve a number of MICFMO problem instances in the context of two research reactors. In the second part of the dissertation, an extensive investigation is conducted to determine the suitability of several multiobjective optimization algorithms for solving the constrained MICFMO problem. The computation time required for the investigation is reduced through the use of artificial neural networks, constructed in the dissertation, to evaluate objective functions and constraints. Eight multiobjective metaheuristics are compared in the context of several MICFMO test problem instances based on the SAFARI-1 research reactor in South Africa. Tests indicate that the NSGA-II, the P-ACO algorithm and the MOOCEM generally perform best across all the test problem instances. The MOVNS algorithm also performs well in the context of bi-objective problem instances. A multiplicative penalty function (MPF) constraint handling technique is also proposed and compared with an existing technique known as constrained-domination. It is found that the MPF technique is a competitive alternative to constrained-domination. An attempt is made to raise the level of generality at which MICFMO is performed, and potentially to improve the quality of the optimization results. A multiobjective hyperheuristic, known as the AMALGAM method, is considered in pursuit of these two goals. The method functions through the simultaneous inclusion of a number of metaheuristic sub-algorithms. Tests indicate that the AMALGAM method delivers better results for the majority of test problems, and the two goals above are thus achieved. The method has also been implemented in the OSCAR-4 system and is used to solve a number of MICFMO case study problem instances (in the context of two research reactors). In this way the practical applicability of the method is demonstrated. Finally, in the third part of the dissertation, a conceptual framework is proposed for an optimization-based personal decision support system aimed at MICFM. This framework may serve as the basis for developing a computerized tool for nuclear reactor operators to design acceptable reload configurations.
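
The notion of a Pareto optimal set of trade-off configurations used in the summary above can be sketched with a generic dominance filter. This is a textbook illustration under the assumption that every objective is minimised; it is not the dissertation's methodology or any of its metaheuristics. The function names and the sample objective vectors are hypothetical.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimised).

    a dominates b when it is no worse in every objective and strictly
    better in at least one.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points):
    """Return the points that are not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical objective vectors for four candidate configurations.
candidates = [(1, 5), (2, 2), (3, 3), (5, 1)]
front = pareto_front(candidates)
```

In this example (3, 3) is dominated by (2, 2), so it drops out, while the other three candidates are mutually non-dominated and would all be offered to the decision maker.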

Stellenbosch University SUNScholar Repository