87 research outputs found
A survey on multi-player bandits
Works released after June 2022 are not considered in this survey. Due mostly to their application to cognitive radio networks, multiplayer bandits have gained a lot of interest in the last decade. Considerable progress has been made on the theoretical side. However, the current algorithms are far from applicable, and many obstacles remain between these theoretical results and a possible implementation of multiplayer bandits algorithms in real cognitive radio networks. This survey contextualizes and organizes the rich multiplayer bandits literature. In light of the existing works, some clear directions for future research appear. We believe that further study of these different directions might lead to theoretical algorithms adapted to real-world situations.
Multi-Flow Transmission in Wireless Interference Networks: A Convergent Graph Learning Approach
We consider the problem of multi-flow transmission in wireless networks,
where data signals from different flows can interfere with each other due to
mutual interference between links along their routes, resulting in reduced link
capacities. The objective is to develop a multi-flow transmission strategy that
routes flows across the wireless interference network to maximize the network
utility. However, obtaining an optimal solution is computationally expensive
due to the large state and action spaces involved. To tackle this challenge, we
introduce a novel algorithm called Dual-stage Interference-Aware Multi-flow
Optimization of Network Data-signals (DIAMOND). The design of DIAMOND allows
for a hybrid centralized-distributed implementation, which is a characteristic
of 5G and beyond technologies with centralized unit deployments. A centralized
stage computes the multi-flow transmission strategy using a novel design of
graph neural network (GNN) reinforcement learning (RL) routing agent. Then, a
distributed stage improves the performance based on a novel design of
distributed learning updates. We provide a theoretical analysis of DIAMOND and
prove that it converges to the optimal multi-flow transmission strategy as time
increases. We also present extensive simulation results over various network
topologies (random deployment, NSFNET, GEANT2), demonstrating the superior
performance of DIAMOND compared to existing methods.
Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems
Multi-armed bandit problems are the most basic examples of sequential
decision problems with an exploration-exploitation trade-off. This is the
balance between staying with the option that gave highest payoffs in the past
and exploring new options that might give higher payoffs in the future.
Although the study of bandit problems dates back to the Thirties,
exploration-exploitation trade-offs arise in several modern applications, such
as ad placement, website optimization, and packet routing. Mathematically, a
multi-armed bandit is defined by the payoff process associated with each
option. In this survey, we focus on two extreme cases in which the analysis of
regret is particularly simple and elegant: i.i.d. payoffs and adversarial
payoffs. Besides the basic setting of finitely many actions, we also analyze
some of the most important variants and extensions, such as the contextual
bandit model.
Comment: To appear in Foundations and Trends in Machine Learning
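To illustrate the exploration-exploitation trade-off in the i.i.d. case described above, here is a minimal sketch of UCB1, one standard index policy for stochastic bandits. The Bernoulli arms, their means, the horizon, and the function name `ucb1` are assumptions chosen for the example and are not taken from the survey.

```python
import math
import random


def ucb1(arm_means, horizon, seed=0):
    """Run UCB1 on simulated Bernoulli arms (illustrative assumption).

    Returns the number of pulls per arm and the cumulative pseudo-regret,
    i.e. the sum over rounds of (best mean - mean of the pulled arm).
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms      # how often each arm was pulled
    sums = [0.0] * n_arms      # total reward collected per arm
    best_mean = max(arm_means)
    regret = 0.0

    for t in range(1, horizon + 1):
        if t <= n_arms:
            # Initialization: pull every arm once.
            arm = t - 1
        else:
            # Empirical mean (exploitation) plus confidence bonus (exploration).
            arm = max(
                range(n_arms),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += best_mean - arm_means[arm]

    return counts, regret


if __name__ == "__main__":
    counts, regret = ucb1([0.2, 0.5, 0.8], horizon=5000)
    print("pulls per arm:", counts)
    print("pseudo-regret:", round(regret, 1))
```

The confidence bonus shrinks as an arm is pulled more often, so rarely tried arms are revisited occasionally while most pulls concentrate on the arm with the highest empirical payoff.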
A Gentle Introduction to Reinforcement Learning and its Application in Different Fields
Due to the recent progress in Deep Neural Networks, Reinforcement Learning (RL) has become one of the most important and useful technologies. It is a learning method where a software agent interacts with an unknown environment, selects actions, and progressively discovers the environment dynamics. RL has been effectively applied in many important areas of real life. This article intends to provide an in-depth introduction to the Markov Decision Process, RL, and its algorithms. Moreover, we present a literature review of the application of RL to a variety of fields, including robotics and autonomous control, communication and networking, natural language processing, games and self-organized systems, scheduling management and configuration of resources, and computer vision.
Energy efficient resource allocation for future wireless communication systems
The next generation of wireless communication systems envisions a massive number of connected battery-powered wireless devices. Replacing the batteries of such devices is expensive or infeasible. To this end, energy harvesting (EH) is a promising technique to prolong the lifetime of such devices. Because of the randomness in the amount and availability of the harvested energy, existing communication techniques require revisions to address the issues specific to EH systems. In this thesis, we aim at revisiting fundamental wireless communication problems and addressing the future perspective on service-based applications with the specific characteristics of EH in mind. In the first part of the thesis, we address three fundamental problems that exist in wireless communication systems, namely: multiple access strategy, overcoming the wireless channel, and providing reliability. Since the wireless channel is a shared medium, concurrent transmissions of multiple devices cause interference, which results in collisions and eventual loss of the transmitted data. Multiple access protocols aim at providing a coordination mechanism between multiple transmissions so as to enable a collision-free medium. We revisit the random access protocol for its distributed and low-energy characteristics while incorporating the statistical correlation of the EH processes across two transmitters. We design a simple threshold-based policy which only allows transmission if the battery state is above a certain threshold. By optimizing the threshold values, we show that by carefully addressing the correlation information, the randomness can be turned into an opportunity, in some cases providing optimal coordination between transmitters without any collisions. Upon accessing the channel, a wireless transmitter is faced with a transmission medium that exhibits random and time-varying properties.
A transmitter can adapt its transmission strategy to the specific state of the channel for an efficient transmission of information. This requires a process known as channel sensing to acquire the channel state, which is costly in terms of time and energy. The contribution of the channel sensing operation to the energy consumption in EH wireless transmitters is not negligible and requires proper optimization. We developed an intelligent channel sensing strategy for an EH transmitter communicating over a time-correlated wireless channel. Our results demonstrate that, despite the associated time and energy cost, sensing the channel intelligently to track the channel state improves the achievable long-term throughput significantly compared to protocols lacking this ability as well as the one that always senses the channel. Next, we study an EH receiver employing Hybrid Automatic Repeat reQuest (HARQ) to ensure reliable end-to-end communications. In inherently error-prone wireless communications systems, re-transmissions triggered by decoding errors have a major impact on the energy consumption of wireless devices. We take into account the energy consumption induced by HARQ to develop simple-to-implement optimal algorithms that minimize the number of retransmissions required to successfully decode the packet. The large number of connected edge devices envisioned in future wireless technologies enables a wide range of resources with significant sensing capabilities. The ability to collect various data from the sensors has enabled many exciting smart applications. Providing data at a certain quality greatly improves the performance of many such applications. However, providing high quality is demanding for energy-limited sensors. Thus, in the second part of the thesis, we optimize the sensing resolution of an EH wireless sensor in order to efficiently utilize the harvested energy to maximize an application-dependent utility.
New Directions in Online Learning: Boosting, Partial Information, and Non-Stationarity
Online learning, where a learning algorithm fits a model on-the-fly with streaming data, has become an important research area in machine learning. Batch learning, where the entire data set has to be available to the learning algorithm, is not always a suitable paradigm for the big data era. It is increasingly common in many practical situations, such as online ads prediction or control of self-driving cars, that data instances naturally arrive in a sequential manner. In these situations, researchers want to update their model in an online fashion. This dissertation pursues several topics at the frontier of online learning research.
In Chapter 2 and Chapter 3, the journey starts with online boosting. Online boosting studies how to combine multiple online weak learners to get a stronger learner. Chapter 2 considers online multi-class classification problems. Chapter 3 focuses on the more challenging multi-label ranking problem where there are multiple correct labels and the learner outputs a ranking of labels based on their relevance. In both chapters, an optimal algorithm and an adaptive algorithm are proposed. The optimal algorithms require a minimal number of weak learners to attain the desired accuracy. The adaptive algorithms are practically more useful since they do not require a priori knowledge about the strength of weak learners and are more computationally efficient. The adaptive algorithms are not statistically optimal but they still come with reasonable performance guarantees. The empirical results on real data sets support the theoretical findings and the proposed boosting algorithms outperformed existing competitors on benchmark data sets.
Chapter 4 considers the partial information setting, where the learner does not receive the true labels. Partial feedback is common in practice as obtaining complete feedback can be costly.
The chapter revisits the boosting algorithms that are presented in Chapter 2 and Chapter 3 and extends them to work with partial information feedback. Despite the learner receiving much less information, comparable performance guarantees can be made.
Later in Chapter 5 and Chapter 6, we move on to another interesting area in online learning called restless bandit problems. Unlike the classical (stochastic) multi-armed bandit problems where the reward distributions are unknown but stationary, in restless bandit problems the distributions can change over time. This extra layer of complexity allows us to study more complicated models, but the analysis becomes even more difficult. In restless bandit problems, it is assumed that each arm has a state that evolves according to an unknown Markov process, and the reward distribution depends on the arm's current state. This setting can be thought of as a sub-class of reinforcement learning and the partial observability inherent in this problem makes the analysis very challenging. The well known Thompson Sampling algorithm is analyzed and a Bayesian regret bound for it is derived. Chapter 5 considers the episodic case where the system periodically resets. Chapter 6 extends the analysis to the more challenging non-episodic (i.e., infinite time horizon) case. In both settings, Thompson Sampling algorithms (with slight modifications) enjoy sub-linear regret bounds, and the empirical results on simulated data support this fact. The experiments also suggest the possibility that the algorithm can be used in the frequentist setting even though the theoretical bounds are only shown for the Bayesian regret.
PhD, Statistics, University of Michigan, Horace H. Rackham School of Graduate Studies
https://deepblue.lib.umich.edu/bitstream/2027.42/155110/1/yhjung_1.pd
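For readers unfamiliar with Thompson Sampling, the sketch below shows the classical version for stationary Bernoulli arms with uniform Beta(1, 1) priors. It is not the modified algorithm the dissertation analyzes for restless bandits, where arm states evolve according to a Markov process; the arm means, horizon, and function name `thompson_sampling` are assumptions made for illustration.

```python
import random


def thompson_sampling(arm_means, horizon, seed=0):
    """Classical Thompson Sampling on simulated stationary Bernoulli arms.

    Each arm keeps a Beta posterior over its unknown mean, starting from a
    uniform Beta(1, 1) prior. Returns the number of pulls per arm.
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    successes = [0] * n_arms
    failures = [0] * n_arms

    for _ in range(horizon):
        # Draw one plausible mean per arm from its current posterior.
        samples = [
            rng.betavariate(successes[i] + 1, failures[i] + 1)
            for i in range(n_arms)
        ]
        # Act greedily with respect to the sampled means.
        arm = max(range(n_arms), key=samples.__getitem__)
        # Observe a simulated Bernoulli reward and update the posterior.
        if rng.random() < arm_means[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1

    return [s + f for s, f in zip(successes, failures)]


if __name__ == "__main__":
    print("pulls per arm:", thompson_sampling([0.2, 0.5, 0.8], horizon=3000))
```

Exploration here comes from posterior randomness: an arm with few observations has a wide posterior, so it is occasionally sampled high enough to be tried again.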