5 research outputs found
On Optimality of Myopic Sensing Policy with Imperfect Sensing in Multi-channel Opportunistic Access
We consider the channel access problem under imperfect sensing of channel
state in a multi-channel opportunistic communication system, where the state of
each channel evolves as an independent and identically distributed Markov
process. The considered problem can be cast into a restless multi-armed bandit
(RMAB) problem that is of fundamental importance in decision theory. It is
well known that solving the RMAB problem is PSPACE-hard, with the optimal
policy usually intractable due to the exponential computational complexity. A
natural alternative is to consider the easily implementable myopic policy that
maximizes the immediate reward but ignores the impact of the current strategy
on the future reward. In this paper, we perform an analytical study on the
optimality of the myopic policy under imperfect sensing for the considered RMAB
problem. Specifically, for a family of generic and practically important
utility functions, we establish the closed-form conditions under which the
myopic policy is guaranteed to be optimal even under imperfect sensing. Despite
our focus on opportunistic channel access, the obtained results are generic
in nature and apply to a wide range of engineering domains.
Comment: 21 pages regular pape
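
The myopic policy described in this abstract can be illustrated with a minimal sketch. It assumes a two-state (good/bad) Markov channel model and keeps, for each channel, a belief that the channel is currently good. The names p11, p01 and false_alarm, and the simple failure model in update_after_failure, are assumptions made for illustration; they are not taken from the paper.

```python
def propagate(belief, p11, p01):
    """One-step Markov prediction of the probability that a channel is good.

    p11: assumed probability of staying good; p01: assumed probability bad -> good.
    """
    return belief * p11 + (1.0 - belief) * p01


def myopic_select(beliefs, k):
    """Return the indices of the k channels with the highest current belief.

    This is greedy in the immediate reward only; ties go to the lower index.
    """
    order = sorted(range(len(beliefs)), key=lambda i: (-beliefs[i], i))
    return sorted(order[:k])


def update_after_failure(belief, false_alarm):
    """Bayes posterior that a sensed channel was good, given no successful use.

    Assumed model: a good channel is wrongly sensed as bad with probability
    false_alarm, and a bad channel never yields a success.
    """
    num = false_alarm * belief
    den = num + (1.0 - belief)
    return num / den if den > 0 else belief


# Example: choose 2 of 3 channels, then predict the next-slot beliefs.
beliefs = [0.2, 0.9, 0.5]
chosen = myopic_select(beliefs, 2)
next_beliefs = [propagate(b, p11=0.8, p01=0.2) for b in beliefs]
```

The sketch shows only the selection rule and the belief bookkeeping; it says nothing about when this rule is optimal, which is the question the paper addresses.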
Reinforcement Learning in Education: A Multi-Armed Bandit Approach
Advances in reinforcement learning research have demonstrated the ways in
which different agent-based models can learn how to optimally perform a task
within a given environment. Reinforcement learning solves unsupervised problems
where agents move through a state-action-reward loop to maximize the overall
reward for the agent, which in turn optimizes the solving of a specific problem
in a given environment. However, these algorithms are designed based on our
understanding of actions that should be taken in a real-world environment to
solve a specific problem. One such problem is the ability to identify,
recommend and execute an action within a system where the users are the
subject, such as in education. In recent years, the use of blended learning
approaches integrating face-to-face learning with online learning in the
education context has increased. Additionally, online platforms used for
education require the automation of certain functions such as the
identification, recommendation or execution of actions that can benefit the
user, in this case the student or learner. As promising as these scientific
advances are, there is still a need to conduct research in a variety of
different areas to ensure the successful deployment of these agents within
education systems. Therefore, the aim of this study was to contextualise and
simulate the cumulative reward within an environment for an intervention
recommendation problem in the education context.
Comment: 17 pages, 6 figures, 1 table, EAI AFRICATEK 2022 Conferenc
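
Simulating cumulative reward for a multi-armed bandit can be sketched as follows. The abstract does not say which bandit algorithm the study used, so this example assumes an epsilon-greedy agent with Bernoulli rewards; the arm success probabilities stand in for hypothetical interventions and are invented for illustration.

```python
import random


def simulate_epsilon_greedy(success_probs, steps, epsilon=0.1, seed=0):
    """Simulate an epsilon-greedy agent and return the cumulative reward per step.

    success_probs: assumed probability that each arm (intervention) pays reward 1.
    """
    rng = random.Random(seed)
    n = len(success_probs)
    counts = [0] * n      # number of pulls per arm
    means = [0.0] * n     # running mean reward per arm
    total = 0
    cumulative = []
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n)                        # explore
        else:
            arm = max(range(n), key=lambda i: means[i])   # exploit
        reward = 1 if rng.random() < success_probs[arm] else 0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean
        total += reward
        cumulative.append(total)
    return cumulative


# Three hypothetical interventions with assumed success rates.
curve = simulate_epsilon_greedy([0.2, 0.5, 0.7], steps=1000)
```

Plotting curve against the step index gives the cumulative-reward trajectory for one run; averaging over several seeds would smooth out the randomness of any single run.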