4 research outputs found

    On incorporating the paradigms of discretization and Bayesian estimation to create a new family of pursuit learning automata

    There are currently two fundamental paradigms that have been used to enhance the convergence speed of Learning Automata (LA). The first involves the concept of utilizing the estimates of the reward probabilities, while the second involves discretizing the probability space in which the LA operates. This paper demonstrates how both of these can be simultaneously utilized, and in particular, by using the family of Bayesian estimates that have been proven to have distinct advantages over their maximum likelihood counterparts. The success of LA-based estimator algorithms over the classical, Linear Reward-Inaction (LRI)-like schemes, can be explained by their ability to pursue the actions with the highest reward probability estimates. Without access to reward probability estimates, it makes sense for schemes like the LRI to first make large exploring steps, and then to gradually turn exploration into exploitation by making progressively smaller learning steps. However, this behavior becomes counter-intuitive when pursuing actions based on their estimated reward probabilities. Learning should then ideally proceed in progressively larger steps, as the reward probability estimates turn more accurate. This paper introduces a new estimator algorithm, the Discretized Bayesian Pursuit Algorithm (DBPA), that achieves this by incorporating both the above paradigms. The DBPA is implemented by linearly discretizing the action probability space of the Bayesian Pursuit Algorithm (BPA) (Zhang et al. in IEA-AIE 2011, Springer, New York, pp. 608-620, 2011). The key innovation of this paper is that the linear discrete updating rules mitigate the counter-intuitive behavior of the corresponding linear continuous updating rules, by augmenting them with the reward probability estimates. Extensive experimental results show the superiority of DBPA over previous estimator algorithms. Indeed, the DBPA is probably the fastest reported LA to date. 
    Apart from the rigorous experimental demonstration of the strength of the DBPA, the paper also briefly records the proofs of why the BPA and the DBPA are ε-optimal in stationary environments.
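    The discretized pursuit idea described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `discretized_pursuit` and its parameters are hypothetical, rewards are assumed to be Bernoulli, each action starts from a uniform Beta(1, 1) prior, and actions are ranked by their posterior mean (the abstract does not say which Bayesian statistic the DBPA ranks by). Action probabilities are stored as integer multiples of a fixed step, which is the discretization.

```python
import random


def discretized_pursuit(reward_probs, resolution=100, max_steps=100_000, seed=0):
    """Illustrative discretized pursuit automaton (hypothetical helper).

    Action i has probability units[i] / (r * resolution). On a reward, every
    action except the one with the best estimate loses one unit, and the best
    action gains those units. On a penalty, only the estimates change.
    """
    rng = random.Random(seed)
    r = len(reward_probs)
    total = r * resolution
    units = [resolution] * r   # uniform start: each action has probability 1/r
    wins = [1] * r             # Beta(1, 1) prior (an assumption)
    losses = [1] * r

    for step in range(1, max_steps + 1):
        # Sample an action according to the current action probabilities.
        i = rng.choices(range(r), weights=units)[0]
        rewarded = rng.random() < reward_probs[i]
        if not rewarded:
            losses[i] += 1
            continue           # inaction on penalty
        wins[i] += 1

        # Pursue the action with the highest posterior-mean estimate.
        best = max(range(r), key=lambda j: wins[j] / (wins[j] + losses[j]))
        for j in range(r):
            if j != best and units[j] > 0:
                units[j] -= 1
                units[best] += 1

        if units[best] == total:   # absorbing state: probability 1
            return best, step

    return units.index(max(units)), max_steps
```

    For example, `discretized_pursuit([0.9, 0.6])` returns the index of the action the automaton settled on together with the number of steps taken. A larger `resolution` gives smaller steps, which trades speed for accuracy.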

    A novel learning automata game with local feedback for parallel optimization of hydropower production

    Master's thesis Information- and communication technology IKT590 - University of Agder 2017. Hydropower optimization for multi-reservoir systems is classified as a combinatorial optimization problem with a large state space that is particularly difficult to solve. There exists no gold standard for solving such problems, and many proposed algorithms are domain specific. The literature describes several different techniques, among which linear programming approaches are extensively discussed, but these tend to succumb to the curse of dimensionality as the state vector dimensions increase. This thesis introduces LA LCS, a novel learning automata algorithm that utilizes a parallel form of local feedback. This enables each individual automaton to receive direct feedback, resulting in faster convergence. In addition, the algorithm is implemented using a parallel architecture on a CUDA-enabled GPU, along with exhaustive and random search. LA LCS has been verified through several scenarios. Experiments show that the algorithm is able to quickly adapt and find optimal production strategies for problems of variable complexity. The algorithm is empirically verified and shown to hold great promise for solving optimization problems, including hydropower production strategies.

    The design of absorbing Bayesian pursuit algorithms and the formal analyses of their ε-optimality

    The fundamental phenomenon that has been used to enhance the convergence speed of learning automata (LA) is that of incorporating the running maximum likelihood (ML) estimates of the action reward probabilities into the probability updating rules for selecting the actions. The frontiers of this field have been recently expanded by replacing the ML estimates with their corresponding Bayesian counterparts that incorporate the properties of the conjugate priors. These constitute the Bayesian pursuit algorithm (BPA), and the discretized Bayesian pursuit algorithm. Although these algorithms have been designed and efficiently implemented, and are, arguably, the fastest and most accurate LA reported in the literature, the proofs of their ε-optimal convergence have remained unsolved. Providing them is precisely the intent of this paper. We present a single unifying analysis that proves the ε-optimality of both the continuous and discretized schemes. We emphasize that unlike the ML-based pursuit schemes, the Bayesian schemes have to consider not only the estimates themselves but also the distributional forms of their conjugate posteriors and their higher order moments, all of which make the proofs particularly challenging. As far as we know, apart from the results themselves, the methodologies of this proof have been unreported in the literature; they are both pioneering and novel.
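    As a worked illustration of the difference between the ML and Bayesian estimates mentioned above, assume (this is not stated in the abstract) that rewards are Bernoulli and that the unknown reward probability d_i of action i is given a Beta(a, b) prior. After action i has been selected n_i times and rewarded s_i times:

```latex
\hat{d}_i^{\mathrm{ML}} = \frac{s_i}{n_i},
\qquad
d_i \mid \text{data} \sim \mathrm{Beta}\bigl(a + s_i,\; b + n_i - s_i\bigr),
\qquad
\mathbb{E}\bigl[d_i \mid \text{data}\bigr] = \frac{a + s_i}{a + b + n_i}.
```

    With a = b = 1 and a single rewarded selection (n_i = s_i = 1), the ML estimate is already 1, while the posterior mean is 2/3. The Bayesian estimate is therefore less extreme when few observations are available.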

    Using stochastic AI techniques to achieve unbounded resolution in finite player Goore Games and its applications

    The Goore Game (GG) introduced by M. L. Tsetlin in 1973 has the fascinating property that it can be resolved in a completely distributed manner with no intercommunication between the players. The game has recently found applications in many domains, including the field of sensor networks and Quality-of-Service (QoS) routing. In actual implementations of the solution, the players are typically replaced by Learning Automata (LA). The problem with the existing reported approaches is that the accuracy of the solution achieved is intricately related to the number of players participating in the game, which, in turn, determines the resolution. In other words, an arbitrary accuracy can be obtained only if the game has an infinite number of players. In this paper, we show how we can attain an unbounded accuracy for the GG by utilizing no more than three stochastic learning machines, and by recursively pruning the solution space to guarantee that the retained domain contains the solution to the game with a probability as close to unity as desired. The paper also conjectures on how the solution can be applied to some of the application domains.