Deterministic Sequencing of Exploration and Exploitation for Multi-Armed Bandit Problems
In the Multi-Armed Bandit (MAB) problem, there is a given set of arms with
unknown reward models. At each time, a player selects one arm to play, aiming
to maximize the total expected reward over a horizon of length T. An approach
based on a Deterministic Sequencing of Exploration and Exploitation (DSEE) is
developed for constructing sequential arm selection policies. It is shown that
for all light-tailed reward distributions, DSEE achieves the optimal
logarithmic order of the regret, where regret is defined as the total expected
reward loss against the ideal case with known reward models. For heavy-tailed
reward distributions, DSEE achieves O(T^(1/p)) regret when the moments of the
reward distributions exist up to the pth order for 1<p<=2 and O(T^(1/(1+p/2)))
for p>2. With the knowledge of an upper bound on a finite moment of the
heavy-tailed reward distributions, DSEE offers the optimal logarithmic regret
order. The proposed DSEE approach complements existing work on MAB by providing
corresponding results for general reward distributions. Furthermore, with a
clearly defined tunable parameter, the cardinality of the exploration sequence,
the DSEE approach is easily extendable to variations of MAB, including MAB with
various objectives, decentralized MAB with multiple players and incomplete
reward observations under collisions, MAB with unknown Markov dynamics, and
combinatorial MAB with dependent arms that often arise in network optimization
problems such as the shortest path, the minimum spanning tree, and the dominating
set problems under unknown random weights. Comment: 22 pages, 2 figures
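The exploration/exploitation split described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `dsee`, the callback `pull`, and the constant `w` are hypothetical, and the sketch assumes an exploration sequence whose cardinality grows like n_arms * w * log(t), round-robin play during exploration, and sample means estimated from exploration-phase rewards only.

```python
import math
import random


def dsee(pull, n_arms, horizon, w=2.0):
    """Simplified DSEE-style policy (illustrative sketch, names are hypothetical).

    pull(arm) returns a random reward for the chosen arm.
    w scales the cardinality of the exploration sequence (assumed tunable).
    """
    counts = [0] * n_arms   # exploration pulls per arm
    sums = [0.0] * n_arms   # summed exploration rewards per arm
    n_explore = 0           # cardinality of the exploration sequence so far
    choices = []
    for t in range(1, horizon + 1):
        # Target cardinality at time t; at least one round-robin pass
        # so every arm has a sample before the first exploitation step.
        budget = max(n_arms, math.ceil(n_arms * w * math.log(t)))
        if n_explore < budget:
            # Exploration: play the arms in round-robin order.
            arm = n_explore % n_arms
            reward = pull(arm)
            counts[arm] += 1
            sums[arm] += reward
            n_explore += 1
        else:
            # Exploitation: play the arm with the largest sample mean.
            arm = max(range(n_arms), key=lambda a: sums[a] / counts[a])
            pull(arm)  # reward is collected but not used for estimation here
        choices.append(arm)
    return choices, n_explore


if __name__ == "__main__":
    random.seed(0)
    means = [0.2, 0.8]  # hypothetical arm means, unknown to the policy
    choices, n_explore = dsee(lambda a: random.gauss(means[a], 0.1), 2, 5000)
    print("exploration steps:", n_explore)
    print("fraction of plays on arm 1:", choices.count(1) / len(choices))
```

Because the number of exploration steps is capped by a logarithmic budget, the time spent on suboptimal arms during exploration grows only logarithmically in the horizon, which is the intuition behind the logarithmic regret order stated in the abstract.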
Power Allocation in Multiuser Parallel Gaussian Broadcast Channels With Common and Confidential Messages
We consider broadcast communication over parallel channels, where the transmitter sends K+1 messages: one common message to all users, and K confidential messages, one to each user, each of which must be kept secret from all unintended users. We assume partial channel state information at the transmitter, stemming from noisy channel estimation. Our main goal is to design a power allocation algorithm that maximizes the weighted sum rate of common and confidential messages under a total power constraint. The resulting problem for joint encoding across channels is formulated as the cascade of two problems, the inner min problem being discrete and the outer max problem being convex. Efficient algorithms for this kind of optimization program can therefore be used to solve our power allocation problem. For the special case K=2, we provide an almost closed-form solution, where only two scalar variables must be optimized, e.g., through dichotomic searches. To reduce computational complexity, we propose three new algorithms that maximize the weighted sum rate achievable by two suboptimal schemes performing per-user and per-channel encoding. Using numerical results, we assess the performance of all proposed algorithms as a function of different system parameters.
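As a rough illustration of the dichotomic search mentioned for the K=2 case, the sketch below maximizes a unimodal function of one scalar variable. The objective `toy_sum_rate` is a hypothetical stand-in (the sum rate of two parallel Gaussian channels under a total power budget), not the weighted secrecy objective of the paper, and the gains and power values are assumed for the example.

```python
import math


def dichotomous_max(f, lo, hi, tol=1e-6):
    """Locate the maximizer of a unimodal function f on [lo, hi]."""
    delta = tol / 4  # half-distance between the two probe points
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid - delta) < f(mid + delta):
            lo = mid - delta  # f is still increasing: maximizer lies to the right
        else:
            hi = mid + delta  # f is no longer increasing: maximizer lies to the left
    return (lo + hi) / 2


def toy_sum_rate(p, total_power=4.0, g1=2.0, g2=1.0):
    """Hypothetical concave objective: power p on channel 1, the rest on channel 2."""
    return math.log2(1 + g1 * p) + math.log2(1 + g2 * (total_power - p))


if __name__ == "__main__":
    p_star = dichotomous_max(toy_sum_rate, 0.0, 4.0)
    print("power on channel 1:", p_star)
```

Each iteration roughly halves the search interval, so the number of objective evaluations grows only logarithmically in the required precision.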