Adaptive Channel Recommendation For Opportunistic Spectrum Access
We propose a dynamic spectrum access scheme where secondary users recommend
"good" channels to each other and access accordingly. We formulate the problem
as an average reward based Markov decision process. We show the existence of
the optimal stationary spectrum access policy, and explore its structural
properties in two asymptotic cases. Since the action space of the Markov
decision process is continuous, it is difficult to find the optimal policy by
simply discretizing the action space and using policy iteration, value
iteration, or Q-learning methods. Instead, we propose a new algorithm based on
the Model Reference Adaptive Search method, and prove its convergence to the
optimal policy. Numerical results show that the proposed algorithm achieves up
to 18% and 100% performance improvement over the static channel recommendation
scheme in homogeneous and heterogeneous channel environments, respectively, and
is more robust to channel dynamics.
Equilibria, Fixed Points, and Complexity Classes
Many models from a variety of areas involve the computation of an equilibrium
or fixed point of some kind. Examples include Nash equilibria in games; market
equilibria; computing optimal strategies and the values of competitive games
(stochastic and other games); stable configurations of neural networks;
analysing basic stochastic models for evolution like branching processes and
for language like stochastic context-free grammars; and models that incorporate
the basic primitives of probability and recursion like recursive Markov chains.
It is not known whether these problems can be solved in polynomial time. There
are certain common computational principles underlying different types of
equilibria, which are captured by the complexity classes PLS, PPAD, and FIXP.
Representative complete problems for these classes are, respectively, pure Nash
equilibria in games where they are guaranteed to exist, (mixed) Nash equilibria
in 2-player normal form games, and (mixed) Nash equilibria in normal form games
with 3 (or more) players. This paper reviews the underlying computational
principles and the corresponding classes.
Pandora's Box Problem with Order Constraints
The Pandora's Box Problem, originally formalized by Weitzman in 1979, models
selection from a set of random, alternative options, when evaluation is costly.
This includes, for example, the problem of hiring a skilled worker, where only
one hire can be made, but the evaluation of each candidate is an expensive
procedure. Weitzman showed that the Pandora's Box Problem admits an elegant,
simple solution, where the options are considered in decreasing order of
reservation value, i.e., the value that reduces to zero the expected marginal
gain for opening the box. We study for the first time this problem when order
(or precedence) constraints are imposed between the boxes. We show that,
despite the difficulty of defining reservation values for the boxes which take
into account both in-depth and in-breadth exploration of the various options,
greedy optimal strategies exist and can be efficiently computed for tree-like
order constraints. We also prove that finding approximately optimal adaptive
search strategies is NP-hard when certain matroid constraints are used to
further restrict the set of boxes which may be opened, or when the order
constraints are given as reachability constraints on a DAG. We complement the
above result by giving approximate adaptive search strategies based on a
connection between optimal adaptive strategies and non-adaptive strategies with
bounded adaptivity gap for a carefully relaxed version of the problem.
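As an illustration of the reservation value mentioned in the abstract above, the following minimal sketch covers only the classical, unconstrained Weitzman rule, not the order-constrained algorithms of the paper. It assumes each box has a positive opening cost and a finite discrete value distribution; the function names and the (cost, values, probs) representation are hypothetical choices made for this example. The reservation value sigma is the solution of E[max(X - sigma, 0)] = cost, found here by bisection.

```python
from typing import List, Sequence, Tuple

# Hypothetical representation of a box: (opening cost, possible values, probabilities).
Box = Tuple[float, Sequence[float], Sequence[float]]


def expected_excess(values: Sequence[float], probs: Sequence[float], sigma: float) -> float:
    """E[max(X - sigma, 0)] for a discrete random value X."""
    return sum(p * max(v - sigma, 0.0) for v, p in zip(values, probs))


def reservation_value(cost: float, values: Sequence[float], probs: Sequence[float],
                      tol: float = 1e-9) -> float:
    """Solve E[max(X - sigma, 0)] = cost for sigma by bisection (assumes cost > 0)."""
    lo = min(values) - cost  # here the expected excess is at least cost
    hi = max(values)         # here the expected excess is zero
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if expected_excess(values, probs, mid) > cost:
            lo = mid  # excess still too large, so sigma lies to the right
        else:
            hi = mid
    return (lo + hi) / 2.0


def weitzman_order(boxes: List[Box]) -> List[int]:
    """Indices of the boxes in decreasing order of reservation value."""
    sigmas = [reservation_value(c, v, p) for c, v, p in boxes]
    return sorted(range(len(boxes)), key=lambda i: -sigmas[i])


if __name__ == "__main__":
    boxes = [
        (0.5, [4.0, 6.0], [0.5, 0.5]),
        (1.0, [0.0, 10.0], [0.5, 0.5]),
        (2.0, [0.0, 10.0], [0.5, 0.5]),
    ]
    print(weitzman_order(boxes))
```

In the unconstrained setting, boxes are opened in this order and the search stops once the best value seen so far exceeds the reservation value of every unopened box.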
Evolutionary Multiagent Transfer Learning With Model-Based Opponent Behavior Prediction
This article embarks on a study of multiagent transfer learning (TL) to address the specific challenges that arise in complex multiagent systems where agents have different or even competing objectives. Specifically, building on the essential backbone of a state-of-the-art evolutionary TL framework (eTL), this article presents a novel TL framework with prediction (eTL-P) as an upgrade over existing eTL, endowing agents with the ability to interact effectively with their opponents by building candidate models and predicting their behavioral strategies accordingly. To reduce the complexity of candidate models, eTL-P constructs a monotone submodular function, which facilitates selecting the top-K models from all available candidate models based on their representativeness in terms of behavioral coverage as well as reward diversity. eTL-P also integrates social selection mechanisms that let agents identify their better-performing partners, thus improving their learning performance and reducing the complexity of behavior prediction by reusing useful knowledge with respect to their partners' mind universes. Experiments based on a partner-opponent minefield navigation task (PO-MNT) show that eTL-P achieves higher learning capability and efficiency across multiple agents than state-of-the-art multiagent TL approaches.
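The top-K selection step described above can be illustrated with the standard greedy heuristic for maximizing a monotone submodular function under a cardinality constraint. This is only a sketch under stated assumptions: plain set coverage stands in for the paper's objective, which also accounts for reward diversity and is not specified in the abstract, and the model names, the `coverage` mapping, and the function `greedy_top_k` are hypothetical.

```python
from typing import Dict, List, Set


def greedy_top_k(coverage: Dict[str, Set[int]], k: int) -> List[str]:
    """Greedily pick up to k models, each time taking the largest marginal coverage gain.

    `coverage` maps a (hypothetical) candidate model name to the set of
    behaviors it covers; set coverage is a monotone submodular function.
    """
    chosen: List[str] = []
    covered: Set[int] = set()
    for _ in range(min(k, len(coverage))):
        remaining = [m for m in coverage if m not in chosen]
        # Marginal gain of a model = number of behaviors it adds to the covered set.
        best = max(remaining, key=lambda m: len(coverage[m] - covered))
        if not coverage[best] - covered:
            break  # no remaining model adds anything new
        chosen.append(best)
        covered |= coverage[best]
    return chosen


if __name__ == "__main__":
    models = {
        "a": {1, 2, 3},
        "b": {3, 4, 6},
        "c": {1, 2},
        "d": {5},
    }
    print(greedy_top_k(models, 2))
```

For monotone submodular objectives with a cardinality constraint, this greedy rule is known to achieve at least a (1 - 1/e) fraction of the optimal value, which is one reason such a formulation makes top-K selection tractable.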