Learning Contextual Bandits in a Non-stationary Environment
Multi-armed bandit algorithms have become a reference solution for handling
the explore/exploit dilemma in recommender systems and in many other important
real-world problems, such as display advertising. However, such algorithms
usually assume a stationary reward distribution, which rarely holds in practice
because users' preferences are dynamic. This mismatch inevitably leads to
consistently suboptimal performance for a recommender system. In this paper, we
consider the situation where the underlying reward distribution remains
unchanged over (possibly short) epochs and shifts at unknown time instants. In
response, we propose a contextual bandit algorithm that detects possible
changes in the environment based on its reward-estimation confidence and
updates its arm-selection strategy accordingly. A rigorous upper regret bound
analysis of the proposed algorithm demonstrates its learning effectiveness in
such a non-trivial environment.
Extensive empirical evaluations on both synthetic and real-world recommendation
datasets confirm its practical utility in a changing environment.
Comment: 10 pages, 13 figures. To appear at ACM Special Interest Group on
Information Retrieval (SIGIR) 201
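The change-detection idea this abstract describes can be illustrated with a simplified, non-contextual sketch (the paper's actual algorithm is contextual and LinUCB-style, which is not reproduced here): keep UCB1 statistics per arm, and reset an arm whenever a sliding window of its recent rewards drifts outside a Hoeffding-style confidence radius around its long-run estimate. All names and the window/threshold choices below are illustrative assumptions.

```python
import math
import random

class ChangeDetectingUCB:
    """UCB1 arm selection with a simple change detector: if a window of
    recent rewards deviates from the arm's long-run estimate by more than
    a Hoeffding-style confidence radius, the arm's statistics are restarted
    from the window. A simplified sketch, not the paper's contextual
    algorithm."""

    def __init__(self, n_arms, window=50):
        self.n_arms = n_arms
        self.window = window
        self.counts = [0] * n_arms
        self.means = [0.0] * n_arms
        self.recent = [[] for _ in range(n_arms)]  # sliding reward windows
        self.t = 0

    def select(self):
        self.t += 1
        for a in range(self.n_arms):       # play each arm once first
            if self.counts[a] == 0:
                return a
        return max(range(self.n_arms),
                   key=lambda a: self.means[a]
                   + math.sqrt(2 * math.log(self.t) / self.counts[a]))

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]
        w = self.recent[arm]
        w.append(reward)
        if len(w) > self.window:
            w.pop(0)
        # Change test: window mean vs. long-run mean, Hoeffding radius.
        if len(w) == self.window:
            radius = math.sqrt(math.log(self.t) / (2 * self.window))
            if abs(sum(w) / len(w) - self.means[arm]) > radius:
                self.counts[arm] = len(w)  # restart from the window
                self.means[arm] = sum(w) / len(w)
                self.recent[arm] = []
```

On stationary stretches the detector rarely fires, so the sketch behaves like plain UCB1; after an abrupt shift, the stale long-run mean and the fresh window disagree and the arm restarts.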
Adaptive Grey-Box Fuzz-Testing with Thompson Sampling
Fuzz testing, or "fuzzing," refers to a widely deployed class of techniques
for testing programs by generating a set of inputs for the express purpose of
finding bugs and identifying security flaws. Grey-box fuzzing, the most popular
fuzzing strategy, combines light program instrumentation with a data-driven
process to generate new program inputs. In this work, we present a machine
learning approach that builds on AFL, the preeminent grey-box fuzzer, by
adaptively learning a probability distribution over its mutation operators on a
program-specific basis. These operators, which are selected uniformly at random
in AFL and mutational fuzzers in general, dictate how new inputs are generated,
a core part of the fuzzer's efficacy. Our main contributions are two-fold:
First, we show that a sampling distribution over mutation operators estimated
from training programs can significantly improve performance of AFL. Second, we
introduce a bandit-based optimization approach using Thompson Sampling that
fine-tunes the mutator distribution adaptively during the course of fuzzing an
individual program. A set of experiments across complex programs demonstrates
that tuning the mutation-operator distribution generates sets of inputs that
yield significantly higher code coverage and find more crashes, faster and more
reliably, than both baseline versions of AFL and other AFL-based learning
approaches.
Comment: Published as a workshop paper in the 11th ACM Workshop on Artificial
Intelligence and Security (AISec '18), co-located with the 25th ACM Conference
on Computer and Communications Security (CCS '18)
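The bandit formulation here can be sketched as Thompson Sampling over a set of mutation operators with Bernoulli rewards (reward 1 when a mutated input yields new coverage). This is a hedged illustration of the general idea, not AFL's actual mutator loop; the operator names and the coverage signal are assumptions.

```python
import random

class OperatorSampler:
    """Thompson Sampling over a fuzzer's mutation operators: keep a
    Beta(successes + 1, failures + 1) posterior per operator on the
    probability that it yields new coverage, draw one sample from each
    posterior, and apply the operator with the highest draw. A sketch of
    the bandit idea only."""

    def __init__(self, operators):
        self.operators = list(operators)
        self.success = {op: 0 for op in self.operators}
        self.failure = {op: 0 for op in self.operators}

    def choose(self):
        # One posterior draw per operator; pick the argmax.
        draws = {op: random.betavariate(self.success[op] + 1,
                                        self.failure[op] + 1)
                 for op in self.operators}
        return max(draws, key=draws.get)

    def record(self, op, gained_coverage):
        # Bernoulli feedback: did the mutated input reach new coverage?
        if gained_coverage:
            self.success[op] += 1
        else:
            self.failure[op] += 1
```

Because each choice is a posterior draw rather than a greedy argmax, rarely-used operators keep getting occasional trials, which is what lets the distribution adapt per program.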
Sequential Design for Ranking Response Surfaces
We propose and analyze sequential design methods for the problem of ranking
several response surfaces. Namely, given a collection of response surfaces over
a continuous input space, the aim is to efficiently find the index of the
minimal response across the entire input space. The response surfaces are not
known and have to be noisily sampled one-at-a-time. This setting is motivated
by stochastic control applications and requires joint experimental design both
in space and response-index dimensions. To generate sequential design
heuristics we investigate stepwise uncertainty reduction approaches, as well as
sampling based on posterior classification complexity. We also make connections
between our continuous-input formulation and the discrete framework of pure
regret in multi-armed bandits. To model the response surfaces we utilize
kriging surrogates. Several numerical examples using both synthetic data and an
epidemics control problem are provided to illustrate our approach and the
efficacy of the respective adaptive designs.
Comment: 26 pages, 7 figures (updated several sections and figures)
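A crude discrete analogue of this setting can be sketched as follows: approximate the continuous input space by a grid, take one-at-a-time noisy samples of the unknown surfaces, and direct the sampling budget to the grid point where the classification (which surface is minimal) is most ambiguous. The paper itself uses kriging surrogates and stepwise uncertainty reduction; everything below, including the ambiguity rule, is a simplified stand-in.

```python
import random

def rank_surfaces(surfaces, grid, budget, noise_sd=0.1, seed=0):
    """Estimate, at each grid point, the index of the minimal response
    among several surfaces observed only through noisy one-at-a-time
    samples. `surfaces` are callables standing in for the unknown
    responses; the allocation rule (resample the two closest contenders
    at the most ambiguous point) is an illustrative heuristic, not the
    paper's design criterion."""
    rng = random.Random(seed)
    k, m = len(surfaces), len(grid)
    sums = [[0.0] * m for _ in range(k)]
    counts = [[0] * m for _ in range(k)]
    # Initialize with one noisy sample of every surface at every point.
    for i in range(k):
        for j in range(m):
            sums[i][j] += surfaces[i](grid[j]) + rng.gauss(0, noise_sd)
            counts[i][j] += 1
    for _ in range(budget):
        # Most ambiguous point: smallest gap between the two lowest
        # estimated responses.
        def gap(j):
            vals = sorted(sums[i][j] / counts[i][j] for i in range(k))
            return vals[1] - vals[0]
        j = min(range(m), key=gap)
        order = sorted(range(k), key=lambda i: sums[i][j] / counts[i][j])
        for i in order[:2]:          # resample the two closest contenders
            sums[i][j] += surfaces[i](grid[j]) + rng.gauss(0, noise_sd)
            counts[i][j] += 1
    return [min(range(k), key=lambda i: sums[i][j] / counts[i][j])
            for j in range(m)]
```

The returned list is the estimated minimal-surface index at each grid point, i.e. the discrete version of the ranking classification the abstract describes.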