
    Deterministic Sequencing of Exploration and Exploitation for Multi-Armed Bandit Problems

    In the Multi-Armed Bandit (MAB) problem, there is a given set of arms with unknown reward models. At each time, a player selects one arm to play, aiming to maximize the total expected reward over a horizon of length T. An approach based on a Deterministic Sequencing of Exploration and Exploitation (DSEE) is developed for constructing sequential arm selection policies. It is shown that for all light-tailed reward distributions, DSEE achieves the optimal logarithmic order of the regret, where regret is defined as the total expected reward loss against the ideal case with known reward models. For heavy-tailed reward distributions, DSEE achieves O(T^(1/p)) regret when the moments of the reward distributions exist up to the pth order for 1<p<=2, and O(T^(1/(1+p/2))) regret for p>2. With knowledge of an upper bound on a finite moment of the heavy-tailed reward distributions, DSEE offers the optimal logarithmic regret order. The proposed DSEE approach complements existing work on MAB by providing corresponding results for general reward distributions. Furthermore, with a clearly defined tunable parameter, the cardinality of the exploration sequence, the DSEE approach is easily extendable to variations of MAB, including MAB with various objectives, decentralized MAB with multiple players and incomplete reward observations under collisions, MAB with unknown Markov dynamics, and combinatorial MAB with dependent arms that often arise in network optimization problems such as the shortest path, the minimum spanning tree, and the dominating set problems under unknown random weights.
    Comment: 22 pages, 2 figures
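The idea in the abstract above can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes an exploration sequence whose cardinality grows as `n * ceil(w * log(t + 1))` for `n` arms, round-robin play during exploration, and exploitation of the arm with the largest plain sample mean over all past observations. The function name `dsee`, the parameter `w`, and the arm interface (a callable that takes a random generator and returns a reward) are hypothetical choices made for this example.

```python
import math
import random


def dsee(arms, horizon, w=10.0, seed=0):
    """Sketch of a DSEE-style policy (assumed form, see lead-in).

    arms    : list of callables, each taking a random.Random and returning a reward
    horizon : number of rounds T
    w       : assumed constant scaling the logarithmic exploration budget
    Returns (pull counts per arm, total collected reward).
    """
    rng = random.Random(seed)
    n = len(arms)
    counts = [0] * n      # number of plays of each arm
    sums = [0.0] * n      # sum of rewards observed from each arm
    explored = 0          # exploration slots used so far
    total = 0.0

    for t in range(1, horizon + 1):
        # Deterministic rule: explore while the exploration sequence is
        # shorter than its logarithmic budget at time t.
        budget = n * math.ceil(w * math.log(t + 1))
        if explored < budget:
            arm = explored % n          # round-robin over arms
            explored += 1
        else:
            # Exploit: the budget is at least n, so every arm has been played.
            arm = max(range(n), key=lambda i: sums[i] / counts[i])
        reward = arms[arm](rng)
        counts[arm] += 1
        sums[arm] += reward
        total += reward

    return counts, total


if __name__ == "__main__":
    # Two hypothetical Bernoulli arms with success probabilities 0.2 and 0.8.
    bernoulli = lambda p: (lambda rng: 1.0 if rng.random() < p else 0.0)
    print(dsee([bernoulli(0.2), bernoulli(0.8)], horizon=5000))
```

Because the exploration times depend only on `t` and not on observed rewards, the amount of exploration is controlled by a single knob (`w` here), which mirrors the tunable exploration-sequence cardinality the abstract mentions. Handling heavy-tailed rewards would require a different budget or a truncated sample mean, which this sketch does not attempt.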

    Power Allocation in Multiuser Parallel Gaussian Broadcast Channels With Common and Confidential Messages

    We consider broadcast communication over parallel channels, where the transmitter sends K+1 messages: one common message to all users, and K confidential messages, one for each user, each of which must be kept secret from all unintended users. We assume partial channel state information at the transmitter, stemming from noisy channel estimation. Our main goal is to design a power allocation algorithm that maximizes the weighted sum rate of common and confidential messages under a total power constraint. The resulting problem for joint encoding across channels is formulated as the cascade of two problems, the inner min problem being discrete and the outer max problem being convex. Efficient algorithms for this kind of optimization program can therefore be used to solve our power allocation problem. For the special case K=2, we provide an almost closed-form solution, where only two single variables must be optimized, e.g., through dichotomic searches. To reduce computational complexity, we propose three new algorithms that maximize the weighted sum rate achievable by two suboptimal schemes performing per-user and per-channel encoding. Through numerical results, we assess the performance of all proposed algorithms as a function of different system parameters.
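To make the "dichotomic search" step mentioned in the abstract above concrete, here is a generic sketch of a dichotomic search that maximizes a unimodal function of one variable. It is not the paper's algorithm. The toy objective `weighted_rate`, which splits a total power `P` between two Gaussian channels with hypothetical gains and weights, is an assumption made only for illustration and ignores the common and confidential message structure.

```python
import math


def dichotomic_max(f, lo, hi, tol=1e-6):
    """Return an approximate maximizer of a unimodal function f on [lo, hi].

    Each step probes two points just left and right of the midpoint and
    discards the half of the interval that cannot contain the maximum.
    """
    eps = tol / 4
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid - eps) < f(mid + eps):
            lo = mid - eps   # f is still increasing: maximum lies to the right
        else:
            hi = mid + eps   # f is flat or decreasing: maximum lies to the left
    return (lo + hi) / 2


def weighted_rate(p, total_power=2.0, g1=1.0, g2=1.0, w1=1.0, w2=1.0):
    """Hypothetical toy objective: weighted sum rate of two Gaussian channels
    when power p goes to channel 1 and total_power - p to channel 2."""
    return (w1 * math.log2(1.0 + g1 * p)
            + w2 * math.log2(1.0 + g2 * (total_power - p)))


if __name__ == "__main__":
    best_p = dichotomic_max(weighted_rate, 0.0, 2.0)
    print(best_p)  # with equal gains and weights, close to an even split
```

The search needs only function evaluations and a bracketing interval, which is why it suits problems that reduce to optimizing one scalar at a time. When two scalars must be optimized, as in the K=2 case, one option is to nest or alternate such searches, although the abstract does not say which arrangement is used.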