232 research outputs found

    Trend Detection based Regret Minimization for Bandit Problems

    Full text link
    We study a variation of the classical multi-armed bandits problem. In this problem, the learner has to make a sequence of decisions, picking from a fixed set of choices. In each round, she receives as feedback only the loss incurred from the chosen action. Conventionally, this problem has been studied when losses of the actions are drawn from an unknown distribution or when they are adversarial. In this paper, we study this problem when the losses of the actions also satisfy certain structural properties, and especially, do show a trend structure. When this is true, we show that using \textit{trend detection}, we can achieve regret of order O~(NTK)\tilde{O} (N \sqrt{TK}) with respect to a switching strategy for the version of the problem where a single action is chosen in each round and O~(NmTK)\tilde{O} (Nm \sqrt{TK}) when mm actions are chosen each round. This guarantee is a significant improvement over the conventional benchmark. Our approach can, as a framework, be applied in combination with various well-known bandit algorithms, like Exp3. For both versions of the problem, we give regret guarantees also for the \textit{anytime} setting, i.e. when the length of the choice-sequence is not known in advance. Finally, we pinpoint the advantages of our method by comparing it to some well-known other strategies

    Sequential Design for Ranking Response Surfaces

    Full text link
    We propose and analyze sequential design methods for the problem of ranking several response surfaces. Namely, given L≥2L \ge 2 response surfaces over a continuous input space X\cal X, the aim is to efficiently find the index of the minimal response across the entire X\cal X. The response surfaces are not known and have to be noisily sampled one-at-a-time. This setting is motivated by stochastic control applications and requires joint experimental design both in space and response-index dimensions. To generate sequential design heuristics we investigate stepwise uncertainty reduction approaches, as well as sampling based on posterior classification complexity. We also make connections between our continuous-input formulation and the discrete framework of pure regret in multi-armed bandits. To model the response surfaces we utilize kriging surrogates. Several numerical examples using both synthetic data and an epidemics control problem are provided to illustrate our approach and the efficacy of respective adaptive designs.Comment: 26 pages, 7 figures (updated several sections and figures

    A Bayesian multi-armed bandit algorithm for dynamic end-to-end routing in SDN-based networks with piecewise-stationary rewards

    Get PDF
    To handle the exponential growth of data-intensive network edge services and automatically solve new challenges in routing management, machine learning is steadily being incorporated into software-defined networking solutions. In this line, the article presents the design of a piecewise-stationary Bayesian multi-armed bandit approach for the online optimum end-to-end dynamic routing of data flows in the context of programmable networking systems. This learning-based approach has been analyzed with simulated and emulated data, showing the proposal’s ability to sequentially and proactively self-discover the end-to-end routing path with minimal delay among a considerable number of alternatives, even when facing abrupt changes in transmission delay distributions due to both variable congestion levels on path network devices and dynamic delays to transmission links.info:eu-repo/semantics/publishedVersio

    EMM: Energy-Aware Mobility Management for Mobile Edge Computing in Ultra Dense Networks

    Full text link
    Merging mobile edge computing (MEC) functionality with the dense deployment of base stations (BSs) provides enormous benefits such as a real proximity, low latency access to computing resources. However, the envisioned integration creates many new challenges, among which mobility management (MM) is a critical one. Simply applying existing radio access oriented MM schemes leads to poor performance mainly due to the co-provisioning of radio access and computing services of the MEC-enabled BSs. In this paper, we develop a novel user-centric energy-aware mobility management (EMM) scheme, in order to optimize the delay due to both radio access and computation, under the long-term energy consumption constraint of the user. Based on Lyapunov optimization and multi-armed bandit theories, EMM works in an online fashion without future system state information, and effectively handles the imperfect system state information. Theoretical analysis explicitly takes radio handover and computation migration cost into consideration and proves a bounded deviation on both the delay performance and energy consumption compared to the oracle solution with exact and complete future system information. The proposed algorithm also effectively handles the scenario in which candidate BSs randomly switch on/off during the offloading process of a task. Simulations show that the proposed algorithms can achieve close-to-optimal delay performance while satisfying the user energy consumption constraint.Comment: 14 pages, 6 figures, an extended version of the paper submitted to IEEE JSA
    • …
    corecore