Upper-Confidence-Bound Algorithms for Active Learning in Multi-Armed Bandits
In this paper, we study the problem of estimating the mean values of all the arms uniformly well in the multi-armed bandit setting. If the variances of the arms were known, one could design an optimal sampling strategy by pulling the arms proportionally to their variances. However, since the distributions are not known in advance, we need to design adaptive sampling strategies that select an arm at each round based on the previously observed samples. We describe two strategies based on pulling the arms proportionally to an upper bound on their variances, and derive regret bounds, on the excess estimation error compared to the optimal allocation, for these strategies. We show that the performance of these allocation strategies depends not only on the variances of the arms but also on the full shape of their distributions.
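The allocation idea above can be sketched in a few lines: pull the arm whose optimistic (upper-bounded) variance estimate is least covered by its current sample count. This is an illustrative sketch only; the constants and the exact confidence bound differ from the paper's strategies.

```python
import numpy as np

def ucb_allocation(arms, budget):
    """Adaptive allocation for uniformly-good mean estimation.

    `arms` is a list of zero-argument callables returning noisy rewards.
    Each round, pull the arm maximizing (optimistic variance) / (pull count),
    which drives pull counts toward being proportional to the variances.
    """
    K = len(arms)
    samples = [[arm(), arm()] for arm in arms]  # two initial pulls per arm
    for t in range(2 * K, budget):
        counts = np.array([len(s) for s in samples], dtype=float)
        variances = np.array([np.var(s, ddof=1) for s in samples])
        # optimistic variance: empirical variance plus a confidence width
        ucb_var = variances + np.sqrt(2.0 * np.log(t + 1) / counts)
        k = int(np.argmax(ucb_var / counts))
        samples[k].append(arms[k]())
    return [np.mean(s) for s in samples], [len(s) for s in samples]
```

With one low-variance and one high-variance Gaussian arm, the high-variance arm ends up with the larger share of the budget, mirroring the variance-proportional optimal allocation.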
Parallel and Multi-Objective Falsification with Scenic and VerifAI
Falsification has emerged as an important tool for simulation-based
verification of autonomous systems. In this paper, we present extensions to the
Scenic scenario specification language and VerifAI toolkit that improve the
scalability of sampling-based falsification methods by using parallelism and
extend falsification to multi-objective specifications. We first present a
parallelized framework that is interfaced with both the simulation and sampling
capabilities of Scenic and the falsification capabilities of VerifAI, reducing
the execution time bottleneck inherently present in simulation-based testing.
We then present an extension of VerifAI's falsification algorithms to support
multi-objective optimization during sampling, using the concept of rulebooks to
specify a preference ordering over multiple metrics that can be used to guide
the counterexample search process. Lastly, we evaluate the benefits of these
extensions with a comprehensive set of benchmarks written in the Scenic
language.
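The rulebook idea, a preference ordering over multiple metrics that ranks candidate counterexamples lexicographically, can be sketched as follows. This is a simplified illustration of the concept, not VerifAI's actual API; the metric names are hypothetical.

```python
from functools import cmp_to_key

def rulebook_compare(a, b, rules):
    """Compare two candidate test outcomes under a rulebook: an ordered
    list of metric functions, highest priority first. Lower metric values
    indicate stronger specification violations, i.e. better counterexamples.
    """
    for metric in rules:
        ma, mb = metric(a), metric(b)
        if ma != mb:
            return -1 if ma < mb else 1  # a is the stronger violation
    return 0  # tied on every metric

def best_counterexample(candidates, rules):
    """Return the candidate preferred by the rulebook ordering."""
    return min(candidates, key=cmp_to_key(lambda a, b: rulebook_compare(a, b, rules)))
```

A falsifier can use such an ordering to guide its sampler toward scenarios that violate the highest-priority rule first, breaking ties on lower-priority metrics.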
Parallel Gaussian Process Optimization with Upper Confidence Bound and Pure Exploration
In this paper, we consider the challenge of maximizing an unknown function f
for which evaluations are noisy and costly to acquire. An iterative
procedure uses the previous measurements to actively select the next evaluation
of f that is predicted to be the most useful. We focus on the case where the
function can be evaluated in parallel with batches of fixed size and analyze
the benefit compared to the purely sequential procedure in terms of cumulative
regret. We introduce the Gaussian Process Upper Confidence Bound and Pure
Exploration algorithm (GP-UCB-PE) which combines the UCB strategy and Pure
Exploration in the same batch of evaluations along the parallel iterations. We
prove theoretical upper bounds on the regret with batches of size K for this
procedure, showing an improvement of order √K for fixed
iteration cost over purely sequential versions. Moreover, the multiplicative
constants involved have the property of being dimension-free. We also confirm
empirically the efficiency of GP-UCB-PE on real and synthetic problems compared
to state-of-the-art competitors.
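The batch-selection rule can be sketched with a minimal numpy-only Gaussian process: the first point of each batch is chosen by UCB, and the remaining K−1 points by pure exploration, greedily maximizing posterior variance after conditioning on the pending points. This is a simplified sketch; hyperparameter choices and the paper's restriction of pure exploration to a "relevant region" are omitted.

```python
import numpy as np

def rbf(A, B, ls=0.3):
    """Squared-exponential kernel between row-stacked points."""
    d = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d / ls**2)

def gp_posterior(Xtr, ytr, Xte, noise=1e-4, ls=0.3):
    """Posterior mean and variance of a zero-mean GP at test points."""
    K = rbf(Xtr, Xtr, ls) + noise * np.eye(len(Xtr))
    Ks = rbf(Xtr, Xte, ls)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, ytr))
    mu = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = np.diag(rbf(Xte, Xte, ls)) - (v**2).sum(0)
    return mu, np.maximum(var, 1e-12)

def gp_ucb_pe_batch(Xtr, ytr, Xcand, K_batch, beta=2.0):
    """Select a batch of K_batch candidate indices: one UCB point, then
    K_batch-1 pure-exploration points (max posterior variance), using the
    posterior mean as a 'fantasy' observation for pending points."""
    X, y = Xtr.copy(), ytr.copy()
    mu, var = gp_posterior(X, y, Xcand)
    batch = [int(np.argmax(mu + beta * np.sqrt(var)))]  # UCB point
    for _ in range(K_batch - 1):
        X = np.vstack([X, Xcand[batch[-1]:batch[-1] + 1]])
        y = np.append(y, mu[batch[-1]])          # fantasize pending point
        mu, var = gp_posterior(X, y, Xcand)
        var[batch] = -np.inf                     # never re-select pending points
        batch.append(int(np.argmax(var)))
    return batch
```

Note that the posterior variance does not depend on the fantasized y-values at all, so the pure-exploration steps are exact regardless of the fantasy choice.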
EdgeAISim: A Toolkit for Simulation and Modelling of AI Models in Edge Computing Environments
To meet next-generation Internet of Things (IoT) application demands, edge computing moves processing power and storage closer to the network edge to minimize latency and bandwidth utilization. These benefits are making edge computing increasingly popular, but it comes with challenges such as managing resources efficiently. Researchers are utilising Artificial Intelligence (AI) models to solve the challenge of resource management in edge computing systems. However, existing simulation tools are only concerned with typical resource management policies, not with the adoption and implementation of AI models for resource management. Consequently, researchers continue to face significant challenges, making it hard and time-consuming to use AI models when designing novel resource management policies for edge computing with existing simulation tools. To overcome these issues, we propose a lightweight Python-based toolkit called EdgeAISim for the simulation and modelling of AI models for designing resource management policies in edge computing environments. In EdgeAISim, we extended the basic components of the EdgeSimPy framework and developed new AI-based simulation models for task scheduling, energy management, service migration, network flow scheduling, and mobility support in edge computing environments. We have utilized advanced AI models such as Multi-Armed Bandit with Upper Confidence Bound, Deep Q-Networks, Deep Q-Networks with Graph Neural Networks, and Actor-Critic Networks to optimize power usage while efficiently managing task migration within the edge computing environment. The performance of these proposed models is compared with a baseline that uses a worst-fit-algorithm-based resource management policy in different settings.
Experimental results indicate that EdgeAISim exhibits a substantial reduction in power consumption, highlighting the compelling success of power optimization strategies in EdgeAISim. The development of EdgeAISim represents a promising step towards sustainable edge computing, providing eco-friendly and energy-efficient solutions that facilitate efficient task management in edge environments for different large-scale scenarios.
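One of the policies named above, Multi-Armed Bandit with Upper Confidence Bound, can be sketched for task scheduling by treating each edge server as an arm. The reward signal (here, negative normalized power draw) is an illustrative assumption; this is a generic UCB1 sketch, not EdgeAISim's actual API.

```python
import math

class UCBScheduler:
    """UCB1 over edge servers: each server is an arm, and assigning a task
    yields a reward (e.g. negative normalized power draw of that server)."""

    def __init__(self, n_servers):
        self.counts = [0] * n_servers      # tasks assigned per server
        self.values = [0.0] * n_servers    # running mean reward per server
        self.t = 0

    def select(self):
        """Pick the server with the highest upper confidence bound."""
        self.t += 1
        for s, c in enumerate(self.counts):
            if c == 0:
                return s                   # try every server once first
        ucb = [v + math.sqrt(2 * math.log(self.t) / c)
               for v, c in zip(self.values, self.counts)]
        return max(range(len(ucb)), key=ucb.__getitem__)

    def update(self, server, reward):
        """Incrementally update the chosen server's mean reward."""
        self.counts[server] += 1
        self.values[server] += (reward - self.values[server]) / self.counts[server]
```

Over many task arrivals, the scheduler concentrates assignments on the most power-efficient server while still occasionally probing the others.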
Sequential Design for Ranking Response Surfaces
We propose and analyze sequential design methods for the problem of ranking
several response surfaces. Namely, given a collection of response surfaces over a
continuous input space, the aim is to efficiently find the index of
the minimal response across the entire input space. The response surfaces are not
known and have to be noisily sampled one-at-a-time. This setting is motivated
by stochastic control applications and requires joint experimental design both
in space and response-index dimensions. To generate sequential design
heuristics we investigate stepwise uncertainty reduction approaches, as well as
sampling based on posterior classification complexity. We also make connections
between our continuous-input formulation and the discrete framework of pure
regret in multi-armed bandits. To model the response surfaces we utilize
kriging surrogates. Several numerical examples using both synthetic data and an
epidemics control problem are provided to illustrate our approach and the
efficacy of the respective adaptive designs.
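The sampling-by-posterior-classification-complexity idea can be sketched in one dimension with two surfaces and minimal kriging (GP) surrogates: query where the posterior is most ambiguous about which surface is minimal. This is a simplified sketch under assumed kernel hyperparameters, not the paper's full design heuristics.

```python
import numpy as np

def rbf(A, B, ls=0.2):
    """Squared-exponential kernel on 1-D inputs."""
    return np.exp(-0.5 * ((A[:, None] - B[None, :]) ** 2) / ls**2)

def gp(Xtr, ytr, Xte, noise=1e-3):
    """Posterior mean and standard deviation of a zero-mean, unit-variance GP."""
    K = rbf(Xtr, Xtr) + noise * np.eye(len(Xtr))
    Ks = rbf(Xtr, Xte)
    mu = Ks.T @ np.linalg.solve(K, ytr)
    var = 1.0 - np.einsum('ij,ij->j', Ks, np.linalg.solve(K, Ks))
    return mu, np.sqrt(np.maximum(var, 1e-9))

def next_design_point(designs, grid):
    """Pick the next (x, surface-index) query for ranking two surfaces:
    sample where classifying 'which surface is minimal' is most ambiguous,
    on whichever surface is more uncertain there."""
    (X1, y1), (X2, y2) = designs
    mu1, s1 = gp(X1, y1, grid)
    mu2, s2 = gp(X2, y2, grid)
    # small |mean gap| relative to uncertainty = hard-to-classify point
    ambiguity = np.abs(mu1 - mu2) / np.sqrt(s1**2 + s2**2)
    idx = int(np.argmin(ambiguity))
    surface = 0 if s1[idx] >= s2[idx] else 1
    return grid[idx], surface
```

For two surfaces that cross, the rule sends the next sample toward the crossing, which is exactly where the identity of the minimal surface changes.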
Online Decision Mediation
Consider learning a decision support assistant to serve as an intermediary
between (oracle) expert behavior and (imperfect) human behavior: At each time,
the algorithm observes an action chosen by a fallible agent, and decides
whether to *accept* that agent's decision, *intervene* with an alternative, or
*request* the expert's opinion. For instance, in clinical diagnosis,
fully-autonomous machine behavior is often beyond ethical affordances, thus
real-world decision support is often limited to monitoring and forecasting.
Instead, such an intermediary would strike a prudent balance between the former
(purely prescriptive) and latter (purely descriptive) approaches, while
providing an efficient interface between human mistakes and expert feedback. In
this work, we first formalize the sequential problem of *online decision
mediation* -- that is, of simultaneously learning and evaluating mediator
policies from scratch with *abstentive feedback*: In each round, deferring to
the oracle obviates the risk of error, but incurs an upfront penalty, and
reveals the otherwise hidden expert action as a new training data point.
Second, we motivate and propose a solution that seeks to trade off (immediate)
loss terms against (future) improvements in generalization error; in doing so,
we identify why conventional bandit algorithms may fail. Finally, through
experiments and sensitivities on a variety of datasets, we illustrate
consistent gains over applicable benchmarks on performance measures with
respect to the mediator policy, the learned model, and the decision-making
system as a whole.
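The accept/intervene/defer interface described above can be sketched as a simple three-way decision rule. The thresholds, costs, and the confidence-based logic here are illustrative assumptions, not the paper's actual mediator policy, which is learned online.

```python
def mediate(agent_action, model_action, model_confidence,
            intervene_threshold=0.8, defer_cost=1.0, error_cost=3.0):
    """Three-way mediator decision (illustrative sketch):
    - accept when the mediator's model agrees with the fallible agent,
    - intervene when the model confidently disagrees,
    - defer (request the expert's label, paying an upfront penalty) when the
      expected cost of letting a possible error through exceeds that penalty.
    """
    if model_action == agent_action:
        return ("accept", agent_action)
    if model_confidence >= intervene_threshold:
        return ("intervene", model_action)
    expected_error_cost = model_confidence * error_cost
    if expected_error_cost > defer_cost:
        return ("defer", None)  # reveals the expert action as training data
    return ("accept", agent_action)
```

The deferral branch captures the abstentive-feedback trade-off from the abstract: deferring avoids the risk of error and yields a new labeled example, but incurs the upfront penalty.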