
    On incorporating the paradigms of discretization and Bayesian estimation to create a new family of pursuit learning automata

    There are currently two fundamental paradigms that have been used to enhance the convergence speed of Learning Automata (LA). The first involves utilizing estimates of the reward probabilities, while the second involves discretizing the probability space in which the LA operates. This paper demonstrates how both of these can be utilized simultaneously, in particular by using the family of Bayesian estimates that have been proven to have distinct advantages over their maximum likelihood counterparts. The success of LA-based estimator algorithms over the classical, Linear Reward-Inaction (LRI)-like schemes can be explained by their ability to pursue the actions with the highest reward probability estimates. Without access to reward probability estimates, it makes sense for schemes like the LRI to first make large exploring steps, and then to gradually turn exploration into exploitation by making progressively smaller learning steps. However, this behavior becomes counter-intuitive when pursuing actions based on their estimated reward probabilities. Learning should then ideally proceed in progressively larger steps, as the reward probability estimates become more accurate. This paper introduces a new estimator algorithm, the Discretized Bayesian Pursuit Algorithm (DBPA), that achieves this by incorporating both of the above paradigms. The DBPA is implemented by linearly discretizing the action probability space of the Bayesian Pursuit Algorithm (BPA) (Zhang et al. in IEA-AIE 2011, Springer, New York, pp. 608-620, 2011). The key innovation of this paper is that the linear discrete updating rules mitigate the counter-intuitive behavior of the corresponding linear continuous updating rules, by augmenting them with the reward probability estimates. Extensive experimental results show the superiority of the DBPA over previous estimator algorithms. Indeed, the DBPA is probably the fastest reported LA to date. Apart from the rigorous experimental demonstration of the strength of the DBPA, the paper also briefly records the proofs of why the BPA and the DBPA are ε-optimal in stationary environments.
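    As a concrete illustration of how the two paradigms combine, the sketch below couples a conjugate Beta posterior per action with a reward-inaction, fixed-step pursuit update. It is only a minimal sketch: the Beta(1,1) priors, the use of the posterior mean as the Bayesian estimate (the published BPA/DBPA work with the posterior itself rather than a point summary), and the step size 1/(r*N) are assumptions made here for illustration, not the paper's exact pseudocode.

```python
import random

def dbpa_sketch(reward_probs, n_steps=20000, resolution=100, seed=0):
    """Minimal, illustrative sketch of a discretized Bayesian pursuit update."""
    rng = random.Random(seed)
    r = len(reward_probs)
    delta = 1.0 / (r * resolution)      # smallest admissible probability step
    p = [1.0 / r] * r                   # action selection probabilities
    alpha = [1] * r                     # Beta posterior parameters (rewards + 1)
    beta = [1] * r                      # Beta posterior parameters (penalties + 1)

    for _ in range(n_steps):
        i = rng.choices(range(r), weights=p)[0]       # sample an action
        rewarded = rng.random() < reward_probs[i]     # Bernoulli feedback
        if rewarded:
            alpha[i] += 1                             # conjugate Bayesian update
        else:
            beta[i] += 1
        estimates = [a / (a + b) for a, b in zip(alpha, beta)]   # posterior means
        m = max(range(r), key=lambda j: estimates[j])            # currently best action
        if rewarded:                                  # reward-inaction: move only on reward
            for j in range(r):
                if j != m:
                    p[j] = max(p[j] - delta, 0.0)     # discrete step away from the rest
            p[m] = 1.0 - sum(p[j] for j in range(r) if j != m)   # pursue the best estimate
    return p, estimates
```

    Run with, say, dbpa_sketch([0.8, 0.6, 0.4]), the probability mass typically concentrates on the first action; the resolution parameter controls the granularity of the discretized probability space.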

    A Stochastic Search on the Line-Based Solution to Discretized Estimation

    Recently, Oommen and Rueda [11] presented a strategy by which the parameters of a binomial/multinomial distribution can be estimated when the underlying distribution is nonstationary. The method has been referred to as the Stochastic Learning Weak Estimator (SLWE), and is based on the principles of continuous stochastic Learning Automata (LA). In this paper, we consider a new family of stochastic discretized weak estimators pertinent to tracking time-varying binomial distributions. As opposed to the SLWE, our proposed estimator is discretized, i.e., the estimate can assume only a finite number of values. It is well known in the field of LA that discretized schemes achieve faster convergence speed than their corresponding continuous counterparts. By virtue of discretization, our estimator realizes extremely fast adjustments of the running estimates by jumps, and it is thus able to robustly, and very quickly, track changes in the parameters of the distribution after a switch has occurred in the environment. The design principle of our strategy is based on a solution, pioneered by Oommen [7], for the Stochastic Search on the Line (SSL) problem. The SSL solution proposed in [7] assumes the existence of an Oracle which informs the LA whether to go “right” or “left”. In our application domain, in order to achieve efficient estimation, we first have to infer (or rather simulate) such an Oracle. To overcome this difficulty, we instead construct an “Artificial Oracle” that suggests whether we are to increase the current estimate or to decrease it. The paper briefly reports conclusive experimental results that demonstrate the ability of the proposed estimator to cope with non-stationary environments with a high adaptation rate, and with an accuracy that depends on its resolution. The results which we present are, to the best of our knowledge, the first reported results that resolve the problem of discretized weak estimation using an SSL-based solution.
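    The following sketch conveys the flavour of a discretized weak estimator operating on the grid {0, 1/N, ..., 1}. The "Artificial Oracle" is reduced here to its simplest conceivable form (an observed 1 suggests moving the estimate right, a 0 suggests moving it left); the paper's actual oracle construction and step rule may differ, so this is a hedged illustration rather than the proposed scheme itself.

```python
import random

def discretized_weak_estimator(samples, resolution=64):
    """Track a time-varying Bernoulli parameter on the grid {0, 1/N, ..., 1}.

    The 'artificial oracle' below is deliberately simplistic: each observed 1
    suggests a step to the right, each observed 0 a step to the left.
    """
    N = resolution
    k = N // 2                            # current grid index; the estimate is k / N
    estimates = []
    for x in samples:
        if x == 1 and k < N:              # oracle says "increase the estimate"
            k += 1
        elif x == 0 and k > 0:            # oracle says "decrease the estimate"
            k -= 1
        estimates.append(k / N)
    return estimates

# The parameter switches from 0.2 to 0.8 halfway through the stream; the
# fixed-step estimator re-converges after the switch.
rng = random.Random(1)
stream = [int(rng.random() < 0.2) for _ in range(500)] + \
         [int(rng.random() < 0.8) for _ in range(500)]
trace = discretized_weak_estimator(stream)
```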

    Generalized pursuit learning schemes: new families of continuous and discretized learning automata


    On Novel Variants of the Hierarchical Stochastic Searching on the Line

    Master's thesis, Mechatronics MAS500, University of Agder, 2012. Confidential until 01.07.201

    Solving Two-Person Zero-Sum Stochastic Games With Incomplete Information Using Learning Automata With Artificial Barriers

    Learning automata (LA) with artificially absorbing barriers opened a completely new horizon of research in the 1980s (Oommen, 1986). These new machines yielded properties that were previously unknown. More recently, absorbing barriers have been introduced in continuous estimator algorithms so that the proofs could follow a martingale property, as opposed to monotonicity (Zhang et al., 2014), (Zhang et al., 2015). However, the applications of LA with artificial barriers are almost nonexistent. In that regard, this article is pioneering in that it provides effective and accurate solutions to an extremely complex application domain, namely that of solving two-person zero-sum stochastic games that are provided with incomplete information. LA have previously been used (Sastry et al., 1994) to design algorithms capable of converging to the game's Nash equilibrium under limited information. Those algorithms have focused on the case where the saddle point of the game exists in a pure strategy. However, the majority of the LA algorithms used for games are absorbing in the probability simplex space, and thus, they converge to an exclusive choice of a single action. These LA are thus unable to converge to other mixed Nash equilibria when the game possesses no saddle point for a pure strategy. The pioneering contribution of this article is that we propose an LA solution that is able to converge to an optimal mixed Nash equilibrium even though there may be no saddle point when a pure strategy is invoked. The scheme, being of the linear reward-inaction (L_{R-I}) paradigm, is, in and of itself, absorbing. However, by incorporating artificial barriers, we prevent it from being "stuck" or getting absorbed in pure strategies. Unlike the linear reward-ε-penalty (L_{R-εP}) scheme proposed by Lakshmivarahan and Narendra almost four decades ago, our new scheme achieves the same goal with much less parameter tuning and in a more elegant manner. This article includes the nontrivial proofs of the theoretical results characterizing our scheme and also contains experimental verification that confirms our theoretical findings.
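    A rough impression of the mechanism can be given in a few lines: a standard L_{R-I} update, followed by an artificial barrier that keeps every action probability strictly below 1, so the automaton cannot be absorbed in a pure strategy. The capping-and-renormalizing barrier used below is an assumption made purely for illustration; the article's barrier construction and the full two-player game loop are more involved.

```python
def lri_with_barrier(p, chosen, rewarded, lam=0.05, barrier=0.02):
    """One linear reward-inaction (L_{R-I}) step with a crude artificial barrier.

    On reward, probability mass moves towards the chosen action; on penalty,
    nothing changes.  The barrier then caps every component at 1 - barrier and
    renormalizes, so the walk cannot be absorbed in a pure strategy.
    """
    if rewarded:
        p = [(1.0 - lam) * pj for pj in p]
        p[chosen] += lam                       # p_chosen <- p_chosen + lam * (1 - p_chosen)
    p = [min(pj, 1.0 - barrier) for pj in p]   # artificial (non-absorbing) barrier
    total = sum(p)
    return [pj / total for pj in p]
```

    In a two-person zero-sum setting, each player would run such an automaton on its own mixed strategy, with the stochastic outcome of the game supplying the reward signal to both.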

    Achieving Fair Load Balancing by Invoking a Learning Automata-based Two Time Scale Separation Paradigm

    In this article, we consider the problem of load balancing (LB), but, unlike the approaches that have been proposed earlier, we attempt to resolve the problem in a fair manner (or rather, in an ε-fair manner because, although the LB can probably never be totally fair, we achieve this by being "as close to fair as possible"). The solution that we propose invokes a novel stochastic learning automaton (LA) scheme, so as to attain a distribution of the load to a number of nodes, where the performance level at the different nodes is approximately equal and each user experiences approximately the same Quality of Service (QoS) irrespective of which node he/she is connected to. Since the load is dynamically varying, static resource allocation schemes are doomed to underperform. This is particularly relevant in cloud environments, where we need dynamic approaches because the available resources are unpredictable (or rather, uncertain) by virtue of the shared nature of the resource pool. Furthermore, we prove here that there is a coupling involving the LA's probabilities and the dynamics of the rewards themselves, which renders the environments to be nonstationary. This leads to the emergence of the so-called property of "stochastic diminishing rewards." Our newly proposed LA algorithm ε-optimally solves the problem, and this is done by resorting to a two-time-scale-based stochastic learning paradigm. As far as we know, the results presented here are of a pioneering sort, and we are unaware of any comparable results.
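    The two-time-scale idea can be caricatured as follows: per-node QoS estimates are tracked with a comparatively large step size (the fast scale), while the routing probabilities drift with a much smaller step size (the slow scale) towards the node whose estimated QoS is currently best. The callback loads_fn, the specific step sizes, and the L_{R-I}-like drift are all assumptions introduced for this sketch; they are not the article's exact updating equations.

```python
import random

def two_time_scale_lb(loads_fn, n_nodes=3, n_steps=10000,
                      fast_step=0.1, slow_step=0.001, seed=0):
    """Illustrative two-time-scale load balancer (not the article's exact scheme).

    loads_fn(i, p) is a hypothetical callback returning the observed QoS of
    node i given the current routing probabilities p; because the QoS depends
    on p itself, the environment seen by the automaton is nonstationary.
    """
    rng = random.Random(seed)
    p = [1.0 / n_nodes] * n_nodes            # routing probabilities (slow time scale)
    qos = [0.0] * n_nodes                    # running QoS estimates (fast time scale)
    for _ in range(n_steps):
        i = rng.choices(range(n_nodes), weights=p)[0]
        q = loads_fn(i, p)                   # QoS feedback from the chosen node
        qos[i] += fast_step * (q - qos[i])   # fast scale: track the reward dynamics
        best = max(range(n_nodes), key=lambda j: qos[j])
        p = [(1.0 - slow_step) * pj for pj in p]
        p[best] += slow_step                 # slow scale: drift towards the best node
    return p, qos
```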

    A novel strategy for solving the stochastic point location problem using a hierarchical searching scheme

    Stochastic point location (SPL) deals with the problem of a learning mechanism (LM) determining the optimal point on the line when the only inputs it receives are stochastic signals about the direction in which it should move. One can differentiate the SPL from the traditional class of optimization problems by the fact that the former considers the case where the directional information, for example as inferred from an Oracle (which possibly computes the derivatives), suffices to achieve the optimization, without actually explicitly computing any derivatives. The SPL can be described in terms of an LM (algorithm) attempting to locate a point on a line. The LM interacts with a random environment which essentially informs it, possibly erroneously, whether the unknown parameter is on the left or the right of a given point. Given a current estimate of the optimal solution, all the reported solutions to this problem effectively move along the line to yield updated estimates which are in the neighborhood of the current solution. This paper proposes a dramatically distinct strategy, namely, that of partitioning the line in a hierarchical tree-like manner, and of moving to relatively distant points, as characterized by those along the path of the tree. We are thus attempting to merge the rich fields of stochastic optimization and data structures. Indeed, as in the original discretized solution to the SPL, in one sense, our solution utilizes the concept of discretization and operates a uni-dimensional controlled random walk (RW) in the discretized space to locate the unknown parameter. However, by moving to nonneighbor points in the space, our newly proposed hierarchical stochastic searching on the line (HSSL) solution performs such a controlled RW on the discretized space structured on a superimposed binary tree. We demonstrate that the HSSL solution is orders of magnitude faster than the original SPL solution proposed by Oommen. By a rigorous analysis, the HSSL is shown to be optimal if the effectiveness (or credibility) of the environment, given by p, is greater than the golden ratio conjugate. The solution has been both analytically solved and simulated, and the results obtained are extremely fascinating, as this is the first reported use of time reversibility in the analysis of stochastic learning. The learning automata extensions of the scheme are currently being investigated.
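    To make the tree-structured random walk idea concrete, the sketch below performs a controlled walk on a binary tree superimposed on [0, 1], querying an environment that answers correctly with probability p. The particular transition rule (climb to the parent when the answers at the interval's endpoints place the target outside the current node, otherwise descend towards the half indicated at the midpoint) is a simplified stand-in written for illustration, not a reproduction of the HSSL rule analysed in the paper.

```python
import random

def hssl_sketch(target, p=0.8, max_depth=12, n_steps=3000, seed=0):
    """Illustrative controlled random walk on a binary tree over [0, 1]."""
    rng = random.Random(seed)

    def ask(x):
        """Noisy oracle: which side of x is the target on?  Truthful w.p. p."""
        truth = "left" if target < x else "right"
        lie = "right" if truth == "left" else "left"
        return truth if rng.random() < p else lie

    lo, hi, path = 0.0, 1.0, []          # current node's interval and path to the root
    for _ in range(n_steps):
        if path and (ask(lo) == "left" or ask(hi) == "right"):
            lo, hi = path.pop()          # evidence the target lies outside: climb up
        elif len(path) < max_depth:
            mid = (lo + hi) / 2.0
            path.append((lo, hi))
            if ask(mid) == "left":
                hi = mid                 # descend into the left child
            else:
                lo = mid                 # descend into the right child
    return (lo + hi) / 2.0
```

    Because a single climb can undo an entire erroneous descent, the walk reaches points far from its current neighborhood in one move, which is the intuition behind the reported speed-up over the original SPL solution.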

    Reinforcement Learning: A Survey

    This paper surveys the field of reinforcement learning from a computer-science perspective. It is written to be accessible to researchers familiar with machine learning. Both the historical basis of the field and a broad selection of current work are summarized. Reinforcement learning is the problem faced by an agent that learns behavior through trial-and-error interactions with a dynamic environment. The work described here has a resemblance to work in psychology, but differs considerably in the details and in the use of the word "reinforcement." The paper discusses central issues of reinforcement learning, including trading off exploration and exploitation, establishing the foundations of the field via Markov decision theory, learning from delayed reinforcement, constructing empirical models to accelerate learning, making use of generalization and hierarchy, and coping with hidden state. It concludes with a survey of some implemented systems and an assessment of the practical utility of current methods for reinforcement learning.

    The design of absorbing Bayesian pursuit algorithms and the formal analyses of their ε-optimality

    The fundamental phenomenon that has been used to enhance the convergence speed of learning automata (LA) is that of incorporating the running maximum likelihood (ML) estimates of the action reward probabilities into the probability updating rules for selecting the actions. The frontiers of this field have recently been expanded by replacing the ML estimates with their corresponding Bayesian counterparts that incorporate the properties of the conjugate priors. These constitute the Bayesian pursuit algorithm (BPA) and the discretized Bayesian pursuit algorithm (DBPA). Although these algorithms have been designed and efficiently implemented, and are, arguably, the fastest and most accurate LA reported in the literature, the proofs of their ε-optimal convergence have remained open. This is precisely the intent of this paper: we present a single unifying analysis by which both the continuous and discretized schemes are proven to be ε-optimal. We emphasize that, unlike the ML-based pursuit schemes, the Bayesian schemes have to consider not only the estimates themselves but also the distributional forms of their conjugate posteriors and their higher order moments, all of which render the proofs particularly challenging. As far as we know, apart from the results themselves, the methodologies of this proof have been unreported in the literature; they are both pioneering and novel.
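    The difference between the two kinds of estimates is easy to state: with Bernoulli feedback and a Beta(alpha0, beta0) conjugate prior, the posterior after s rewards and f penalties is Beta(alpha0 + s, beta0 + f). The sketch below contrasts the ML estimate with that posterior's mean; this only illustrates the point estimates, whereas the BPA/DBPA, and the proofs discussed in the paper, work with the full posterior, its distributional form, and its higher-order moments.

```python
def reward_estimates(successes, failures, alpha0=1.0, beta0=1.0):
    """Contrast the ML reward estimate with a conjugate (Beta) Bayesian one.

    Sketch only: a Beta(alpha0, beta0) prior combined with Bernoulli feedback
    yields a Beta(alpha0 + s, beta0 + f) posterior; its mean is returned here
    as the Bayesian estimate.
    """
    s, f = successes, failures
    ml = s / (s + f) if (s + f) > 0 else 0.5          # maximum likelihood estimate
    bayes = (alpha0 + s) / (alpha0 + beta0 + s + f)   # posterior mean under the Beta prior
    return ml, bayes

# With little data the two can differ noticeably: 2 rewards out of 2 trials
# gives ML = 1.0 but a Beta(1,1)-posterior mean of 0.75.
print(reward_estimates(2, 0))
```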