    Adaptive Channel Recommendation For Opportunistic Spectrum Access

    We propose a dynamic spectrum access scheme in which secondary users recommend "good" channels to each other and access channels accordingly. We formulate the problem as an average-reward Markov decision process. We show the existence of the optimal stationary spectrum access policy and explore its structural properties in two asymptotic cases. Since the action space of the Markov decision process is continuous, it is difficult to find the optimal policy by simply discretizing the action space and using policy iteration, value iteration, or Q-learning. Instead, we propose a new algorithm based on the Model Reference Adaptive Search method and prove its convergence to the optimal policy. Numerical results show that the proposed algorithm achieves up to 18% and 100% performance improvement over the static channel recommendation scheme in homogeneous and heterogeneous channel environments, respectively, and is more robust to channel dynamics.

    Equilibria, Fixed Points, and Complexity Classes

    Many models from a variety of areas involve the computation of an equilibrium or fixed point of some kind. Examples include Nash equilibria in games; market equilibria; computing optimal strategies and the values of competitive games (stochastic and other games); stable configurations of neural networks; analysing basic stochastic models for evolution, like branching processes, and for language, like stochastic context-free grammars; and models that incorporate the basic primitives of probability and recursion, like recursive Markov chains. It is not known whether these problems can be solved in polynomial time. There are certain common computational principles underlying different types of equilibria, which are captured by the complexity classes PLS, PPAD, and FIXP. Representative complete problems for these classes are, respectively, pure Nash equilibria in games where they are guaranteed to exist, (mixed) Nash equilibria in 2-player normal form games, and (mixed) Nash equilibria in normal form games with 3 (or more) players. This paper reviews the underlying computational principles and the corresponding classes.

    Pandora's Box Problem with Order Constraints

    The Pandora's Box Problem, originally formalized by Weitzman in 1979, models selection from a set of random, alternative options when evaluation is costly. This includes, for example, the problem of hiring a skilled worker, where only one hire can be made but the evaluation of each candidate is an expensive procedure. Weitzman showed that the Pandora's Box Problem admits an elegant, simple solution, where the options are considered in decreasing order of reservation value, i.e., the value that reduces to zero the expected marginal gain for opening the box. We study for the first time this problem when order (or precedence) constraints are imposed between the boxes. We show that, despite the difficulty of defining reservation values for the boxes which take into account both in-depth and in-breadth exploration of the various options, greedy optimal strategies exist and can be efficiently computed for tree-like order constraints. We also prove that finding approximately optimal adaptive search strategies is NP-hard when certain matroid constraints are used to further restrict the set of boxes which may be opened, or when the order constraints are given as reachability constraints on a DAG. We complement the above result by giving approximate adaptive search strategies based on a connection between optimal adaptive strategies and non-adaptive strategies with bounded adaptivity gap for a carefully relaxed version of the problem.
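
The reservation value in Weitzman's unconstrained setting can be illustrated with a small sketch. It assumes each box has a discrete value distribution and a positive opening cost, and it finds the value sigma at which the expected gain E[max(X - sigma, 0)] equals the cost. The function names, the bisection approach, and the example boxes are illustrative assumptions, not taken from the paper, and the sketch does not handle the order constraints the paper studies.

```python
def expected_gain(values, probs, sigma):
    """E[max(X - sigma, 0)] for a discrete random variable X."""
    return sum(p * max(v - sigma, 0.0) for v, p in zip(values, probs))


def reservation_value(values, probs, cost, tol=1e-9):
    """Solve E[max(X - sigma, 0)] = cost for sigma by bisection.

    Assumes cost > 0. The expected gain is continuous and non-increasing
    in sigma, so the root is bracketed by [min(values) - cost, max(values)].
    """
    lo = min(values) - cost   # here the expected gain is at least cost
    hi = max(values)          # here the expected gain is zero
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if expected_gain(values, probs, mid) > cost:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def opening_order(boxes):
    """Box names sorted by decreasing reservation value (hypothetical helper).

    `boxes` maps a name to a (values, probs, cost) tuple.
    """
    sigma = {name: reservation_value(*spec) for name, spec in boxes.items()}
    return sorted(sigma, key=sigma.get, reverse=True)
```

For a box that pays 0 or 10 with equal probability and costs 1 to open, the equation 0.5 * (10 - sigma) = 1 gives a reservation value of 8, so this box would be opened before one that pays 5 for certain at the same cost (reservation value 4).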

    Evolutionary Multiagent Transfer Learning With Model-Based Opponent Behavior Prediction

    This article embarks on a study of multiagent transfer learning (TL) to address the specific challenges that arise in complex multiagent systems where agents have different or even competing objectives. Specifically, beyond the essential backbone of a state-of-the-art evolutionary TL framework (eTL), this article presents a novel TL framework with prediction (eTL-P) as an upgrade over the existing eTL, endowing agents with the ability to interact with their opponents effectively by building candidate models and predicting their behavioral strategies accordingly. To reduce the complexity of candidate models, eTL-P constructs a monotone submodular function, which facilitates the selection of Top-K models from all available candidate models based on their representativeness in terms of behavioral coverage as well as reward diversity. eTL-P also integrates social selection mechanisms for agents to identify their better-performing partners, thus improving their learning performance and reducing the complexity of behavior prediction by reusing useful knowledge with respect to their partners' mind universes. Experiments based on a partner-opponent minefield navigation task (PO-MNT) show that eTL-P achieves higher learning capability and efficiency for multiple agents than state-of-the-art multiagent TL approaches.
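
Top-K selection under a monotone submodular objective is commonly done with a greedy rule, and the sketch below shows that general technique. The abstract does not specify eTL-P's actual function, so a plain set-coverage objective stands in for it here: each candidate model is assumed to "cover" a set of behavior identifiers, and reward diversity is ignored. The names `greedy_top_k` and `candidates` are hypothetical.

```python
def greedy_top_k(candidates, k):
    """Greedily pick up to k candidates maximizing the number of covered items.

    `candidates` maps a model name to the set of behaviors it covers.
    Set coverage is monotone and submodular, so this greedy rule is the
    standard (1 - 1/e)-approximation for the cardinality-constrained problem.
    This is a stand-in objective, not the function used by eTL-P.
    """
    remaining = dict(candidates)
    selected, covered = [], set()
    for _ in range(min(k, len(remaining))):
        # Pick the candidate with the largest marginal coverage gain.
        best = max(remaining, key=lambda name: len(remaining[name] - covered))
        if not remaining[best] - covered:
            break  # no candidate adds anything new
        selected.append(best)
        covered |= remaining.pop(best)
    return selected, covered
```

With candidates `{"m1": {1, 2, 3}, "m2": {3, 4}, "m3": {4, 5, 6, 7}, "m4": {1, 2}}` and `k = 2`, the rule first takes `m3` (four new behaviors) and then `m1` (three new behaviors), covering all seven.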