5,811 research outputs found

    A Learning Automata Based Solution to Service Selection in Stochastic Environments

    With the abundance of services available in today’s world, identifying those of high quality is becoming increasingly difficult. Reputation systems can offer generic recommendations by aggregating user-provided opinions about service quality, but they are prone to ballot stuffing and badmouthing. In general, unfair ratings may degrade the trustworthiness of reputation systems, and changes in service quality over time render previous ratings unreliable. In this paper, we provide a novel solution to the above problems based on Learning Automata (LA), which can learn the optimal action when operating in unknown stochastic environments. Furthermore, they combine rapid and accurate convergence with low computational complexity. Beyond its computational simplicity, and unlike most reported approaches, our scheme does not require prior knowledge of the degree of any of the above-mentioned problems with reputation systems. Instead, it gradually learns which users provide fair ratings and which provide unfair ratings, even when users unintentionally make mistakes. Comprehensive empirical results show that our LA-based scheme efficiently handles any degree of unfair ratings (as long as ratings are binary). Furthermore, if the quality of services and/or the trustworthiness of users changes, our scheme is able to robustly track such changes over time. Finally, the scheme is ideal for decentralized processing. Accordingly, we believe that our LA-based scheme forms a promising basis for improving the performance of reputation systems in general.

    Achieving Fair Load Balancing by Invoking a Learning Automata-based Two Time Scale Separation Paradigm

    Author's accepted manuscript. © 2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. In this article, we consider the problem of load balancing (LB), but, unlike the approaches that have been proposed earlier, we attempt to resolve the problem in a fair manner (or rather, in an ε-fair manner: although the LB can probably never be totally fair, we aim to be "as close to fair as possible"). The solution that we propose invokes a novel stochastic learning automaton (LA) scheme to distribute the load over a number of nodes, such that the performance level at the different nodes is approximately equal and each user experiences approximately the same Quality of Service (QoS) irrespective of the node to which he/she is connected. Since the load varies dynamically, static resource allocation schemes are doomed to underperform. This is all the more relevant in cloud environments, where we need dynamic approaches because the available resources are unpredictable (or rather, uncertain) by virtue of the shared nature of the resource pool. Furthermore, we prove here that there is a coupling between the LA's probabilities and the dynamics of the rewards themselves, which renders the environments nonstationary. This leads to the emergence of the so-called property of "stochastic diminishing rewards." Our newly proposed LA algorithm ε-optimally solves the problem, and this is done by resorting to a two-time-scale-based stochastic learning paradigm.
As far as we know, the results presented here are of a pioneering sort, and we are unaware of any comparable results.

    Towards Thompson Sampling for Complex Bayesian Reasoning

    Papers III, IV, and VI are not available as a part of the dissertation due to copyright. Thompson Sampling (TS) is a state-of-the-art algorithm for bandit problems set in a Bayesian framework. Both the theoretical foundation and the empirical efficiency of TS are well explored for plain bandit problems. However, the Bayesian underpinning of TS means that TS could potentially be applied to other, more complex problems beyond the bandit problem, if suitable Bayesian structures can be found. The objective of this thesis is the development and analysis of TS-based schemes for more complex optimization problems, founded on Bayesian reasoning. We address several complex optimization problems where the previous state of the art relies on a relatively myopic perspective on the problem. These include stochastic searching on the line, the Goore game, the knapsack problem, travel time estimation, and equipartitioning. Instead of employing Bayesian reasoning to obtain a solution, the existing approaches rely on carefully engineered rules. In brief, we recast each of these optimization problems in a Bayesian framework, introducing dedicated TS-based solution schemes. For all of the addressed problems, the results show that, besides being more effective, the TS-based approaches we introduce are also capable of solving more adverse versions of the problems, such as dealing with stochastic liars.
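
    For readers unfamiliar with the "plain bandit" setting this abstract starts from, the following is a minimal sketch of Thompson Sampling for a Bernoulli bandit with Beta(1, 1) priors. It is a textbook illustration only, not one of the thesis's schemes; the function name, the reward probabilities, and the fixed seed are assumptions made for the example.

```python
import random


def thompson_sampling(true_probs, rounds, seed=0):
    """Run Beta-Bernoulli Thompson Sampling; return how often each arm was pulled.

    true_probs is a hypothetical list of reward probabilities, unknown to the learner.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    n_arms = len(true_probs)
    successes = [0] * n_arms
    failures = [0] * n_arms
    pulls = [0] * n_arms

    for _ in range(rounds):
        # Draw one sample per arm from its Beta posterior (uniform Beta(1, 1) prior).
        samples = [
            rng.betavariate(successes[i] + 1, failures[i] + 1)
            for i in range(n_arms)
        ]
        # Play the arm whose sampled reward probability is largest.
        arm = max(range(n_arms), key=samples.__getitem__)

        # Simulated environment: Bernoulli reward with the arm's true probability.
        if rng.random() < true_probs[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
        pulls[arm] += 1

    return pulls


if __name__ == "__main__":
    print(thompson_sampling([0.2, 0.5, 0.8], rounds=2000))
```

    Sampling from the posterior, rather than acting on the posterior mean, is what lets the learner keep exploring arms whose estimates are still uncertain.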

    On incorporating the paradigms of discretization and Bayesian estimation to create a new family of pursuit learning automata

    There are currently two fundamental paradigms that have been used to enhance the convergence speed of Learning Automata (LA). The first involves utilizing estimates of the reward probabilities, while the second involves discretizing the probability space in which the LA operates. This paper demonstrates how both of these can be utilized simultaneously, in particular by using the family of Bayesian estimates that have been proven to have distinct advantages over their maximum likelihood counterparts. The success of LA-based estimator algorithms over the classical Linear Reward-Inaction (LRI)-like schemes can be explained by their ability to pursue the actions with the highest reward probability estimates. Without access to reward probability estimates, it makes sense for schemes like the LRI to first make large exploring steps, and then to gradually turn exploration into exploitation by making progressively smaller learning steps. However, this behavior becomes counter-intuitive when pursuing actions based on their estimated reward probabilities. Learning should then ideally proceed in progressively larger steps, as the reward probability estimates become more accurate. This paper introduces a new estimator algorithm, the Discretized Bayesian Pursuit Algorithm (DBPA), that achieves this by incorporating both of the above paradigms. The DBPA is implemented by linearly discretizing the action probability space of the Bayesian Pursuit Algorithm (BPA) (Zhang et al. in IEA-AIE 2011, Springer, New York, pp. 608-620, 2011). The key innovation of this paper is that the linear discrete updating rules mitigate the counter-intuitive behavior of the corresponding linear continuous updating rules, by augmenting them with the reward probability estimates. Extensive experimental results show the superiority of DBPA over previous estimator algorithms. Indeed, the DBPA is probably the fastest reported LA to date.
Apart from the rigorous experimental demonstration of the strength of the DBPA, the paper also briefly records the proofs of why the BPA and the DBPA are ε-optimal in stationary environments.
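
    To make the contrast between continuous and discretized updating concrete, the sketch below shows two textbook update rules: the classical Linear Reward-Inaction step, and a discretized pursuit step that moves probability mass in fixed-size increments towards the action with the highest reward estimate. This is not the DBPA itself, whose exact update rule the abstract does not give; the estimates are simply passed in by the caller (they could be maximum likelihood or Bayesian), and the function names, `rate`, and `resolution` are assumptions made for the example.

```python
def lri_update(probs, chosen, rewarded, rate=0.1):
    """Linear Reward-Inaction: shift mass to the chosen action on reward only."""
    if not rewarded:
        return list(probs)  # "inaction": a penalty leaves the vector unchanged
    # Shrink every probability by the factor (1 - rate) ...
    updated = [(1.0 - rate) * p for p in probs]
    # ... and give the freed mass to the rewarded action.
    updated[chosen] = probs[chosen] + rate * (1.0 - probs[chosen])
    return updated


def discretized_pursuit_update(probs, estimates, rewarded, resolution=10):
    """Discretized pursuit (reward-inaction form) with caller-supplied estimates.

    Every action other than the currently best-estimated one loses a fixed
    step of 1 / (number_of_actions * resolution), floored at zero; the
    best-estimated action receives whatever mass remains.
    """
    if not rewarded:
        return list(probs)
    step = 1.0 / (len(probs) * resolution)
    best = max(range(len(probs)), key=estimates.__getitem__)
    updated = [max(p - step, 0.0) for p in probs]
    updated[best] = 0.0
    updated[best] = 1.0 - sum(updated)  # keep the vector a valid distribution
    return updated


if __name__ == "__main__":
    print(lri_update([0.5, 0.5], chosen=0, rewarded=True))
    print(discretized_pursuit_update([0.5, 0.5], [0.3, 0.7], rewarded=True))
```

    Note that the continuous rule takes steps proportional to the remaining probability mass, so its steps shrink as it converges, whereas the discretized rule takes steps of constant size until a probability reaches zero.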