Search CORE

20 research outputs found

Whittle index based Q-learning for restless bandits with average reward

Authors: Avrachenkov Konstantin, Borkar Vivek
Publication venue: 'Elsevier BV'
Publication date: 01/05/2022

A novel reinforcement learning algorithm is introduced for multi-armed restless bandits with average reward, using the paradigms of Q-learning and the Whittle index. Specifically, we leverage the structure of the Whittle index policy to reduce the search space of Q-learning, resulting in major computational gains. Rigorous convergence analysis is provided, supported by numerical experiments that show excellent empirical performance of the proposed scheme.
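As a rough illustration of the policy structure the abstract refers to, the sketch below applies the standard Whittle index rule: at each step, activate the arms whose current states carry the largest indices, up to a fixed budget. The function name, the `index_table` mapping, and the tie-breaking by arm number are assumptions made for this example; the paper's Q-learning scheme for learning the indices is not shown.

```python
def whittle_index_policy(arm_states, index_table, budget):
    """Return the arms to activate under a Whittle index rule.

    arm_states  -- current state of each arm (hypothetical encoding)
    index_table -- maps a state to its Whittle index (assumed already known)
    budget      -- number of arms that may be activated per step
    """
    # Rank arms by decreasing index; ties are broken by the lower arm number.
    ranked = sorted(
        range(len(arm_states)),
        key=lambda arm: (-index_table[arm_states[arm]], arm),
    )
    # Activate the top `budget` arms and report them in ascending order.
    return sorted(ranked[:budget])


if __name__ == "__main__":
    table = {0: 0.1, 1: 0.7, 2: 0.4}  # illustrative index values only
    print(whittle_index_policy([2, 0, 1, 1], table, budget=2))
```

Because the rule only needs one index per state of a single arm, a learner can work with per-arm quantities instead of the joint state of all arms, which is one way to read the abstract's remark about a reduced search space.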

INRIA a CCSD electronic archive server

Restless Bandits with Average Reward: Breaking the Uniform Global Attractor Assumption

Authors: Chen Yudong, Hong Yige, Wang Weina, Xie Qiaomin
Publication date: 31/05/2023

We study the infinite-horizon restless bandit problem with the average reward criterion, under both discrete-time and continuous-time settings. A fundamental question is how to design computationally efficient policies that achieve a diminishing optimality gap as the number of arms, $N$, grows large. Existing results on asymptotic optimality all rely on the uniform global attractor property (UGAP), a complex and challenging-to-verify assumption. In this paper, we propose a general, simulation-based framework that converts any single-armed policy into a policy for the original $N$-armed problem. This is accomplished by simulating the single-armed policy on each arm and carefully steering the real state towards the simulated state. Our framework can be instantiated to produce a policy with an $O(1/\sqrt{N})$ optimality gap. In the discrete-time setting, our result holds under a simpler synchronization assumption, which covers some problem instances that do not satisfy UGAP. More notably, in the continuous-time setting, our result does not require any additional assumptions beyond the standard unichain condition. In both settings, we establish the first asymptotic optimality result that does not require UGAP.

Comment: 29 pages, 4 figures

arXiv.org e-Print Archive

Online algorithms for estimating change rates of web pages

Authors: Avrachenkov Konstantin, Patil Kishor, Thoppe Gugan
Publication venue: 'Elsevier BV'
Publication date: 04/11/2021

A search engine maintains local copies of different web pages to provide quick search results. This local cache is kept up-to-date by a web crawler that frequently visits these pages to track changes in them. Ideally, the local copy should be updated as soon as a page changes on the web. However, finite bandwidth availability and server restrictions limit how frequently different pages can be crawled. This brings forth the following optimization problem: maximize the freshness of the local cache subject to the crawling frequencies being within prescribed bounds. While tractable algorithms do exist to solve this problem, these either assume knowledge of the exact page change rates or use inefficient methods such as maximum likelihood estimation (MLE) to estimate them. We address this issue here. We provide three novel schemes for online estimation of page change rates, all of which have extremely low running times per iteration. The first is based on the law of large numbers and the second on stochastic approximation. The third is an extension of the second and includes a heavy-ball momentum term. All these schemes need only partial information about the page change process, i.e., they only need to know whether the page has changed since the last crawl. Our main theoretical results concern asymptotic convergence and convergence rates of these three schemes. In fact, our work is the first to show convergence of the original stochastic heavy-ball method when neither the gradient nor the noise variance is uniformly bounded. We also provide some numerical experiments (based on real and synthetic data) to demonstrate the superiority of our proposed estimators over existing ones such as MLE. We emphasize that our algorithms are also readily applicable to the synchronization of databases and network inventory management.
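To make the "changed or not since the last crawl" setting concrete, here is a minimal sketch under assumptions that the abstract does not state: page changes are taken to follow a Poisson process with rate `delta`, and crawls a Poisson process with known rate `p`. Under those assumptions, the probability that a crawl finds the page changed is delta / (delta + p), so delta can be recovered from the observed fraction of "changed" crawls. The function names, the step-size exponent, and the exact update forms are illustrative choices and may differ from the estimators analysed in the paper.

```python
import random


def simulate_crawl_flags(delta, p, n, seed=0):
    """Simulate n crawls and return 0/1 flags: 1 if the page changed
    since the previous crawl. Assumes Poisson changes (rate delta)
    and Poisson crawls (rate p)."""
    rng = random.Random(seed)
    flags = []
    for _ in range(n):
        gap = rng.expovariate(p)               # time since the previous crawl
        first_change = rng.expovariate(delta)  # time until the next change
        flags.append(1 if first_change < gap else 0)
    return flags


def lln_estimate(flags, p):
    """Law-of-large-numbers style estimate: invert q = delta / (delta + p),
    where q is estimated by the fraction of crawls that saw a change."""
    changed = sum(flags)
    unchanged = len(flags) - changed
    if unchanged == 0:
        return float("inf")  # every crawl saw a change; rate not identifiable
    return p * changed / unchanged


def sa_estimate(flags, p, y0=1.0, exponent=0.75):
    """Stochastic-approximation style estimate. The mean update
    q * (y + p) - y vanishes exactly at y = delta."""
    y = y0
    for k, flag in enumerate(flags):
        step = (k + 1) ** -exponent  # assumed decaying step size
        y += step * (flag * (y + p) - y)
    return y


if __name__ == "__main__":
    flags = simulate_crawl_flags(delta=2.0, p=1.0, n=100_000, seed=1)
    print(lln_estimate(flags, p=1.0), sa_estimate(flags, p=1.0))
```

Both estimators process one binary observation at a time with constant work per crawl, which matches the abstract's emphasis on low per-iteration cost. The heavy-ball momentum variant mentioned in the abstract is not sketched here.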

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

HAL Descartes

Maintenance Management of Wind Turbines

Publication venue: 'MDPI AG'
Publication date: 01/05/2021

“Maintenance Management of Wind Turbines” considers the main concepts and the state of the art, as well as advances and case studies on this topic. Maintenance is a critical variable for competitiveness in industry and, together with operations, the most important variable in the wind energy industry. The correct management of corrective, predictive and preventive policies in any wind turbine is therefore required. The content also includes original research works that are complementary to other sub-disciplines, such as economics, finance, marketing, decision and risk analysis, and engineering, in the maintenance management of wind turbines. This book focuses on real case studies. These case studies concern topics such as failure detection and diagnosis, fault trees and related sub-disciplines (e.g., FMECA and FMEA). Most of them link these topics with finances, schedules, resources and downtimes in order to increase productivity, profitability, maintainability, reliability, safety and availability, and to reduce costs and downtime, in a wind turbine. Advances in mathematics, models, computational techniques and dynamic analysis are employed in the maintenance management analytics in this book. Finally, the book considers computational techniques, dynamic analysis, probabilistic methods, and mathematical optimization techniques that are expertly blended to support the analysis of multi-criteria decision-making problems with defined constraints and requirements.

Directory of Open Access Books (DOAB)