Enhancing Evolutionary Conversion Rate Optimization via Multi-armed Bandit Algorithms
Conversion rate optimization means designing web interfaces such that more
visitors perform a desired action (such as registering or purchasing) on the site.
One promising approach, implemented in Sentient Ascend, is to optimize the
design using evolutionary algorithms, evaluating each candidate design online
with actual visitors. Because such evaluations are costly and noisy, several
challenges emerge: How can available visitor traffic be used most efficiently?
How can good solutions be identified most reliably? How can a high conversion
rate be maintained during optimization? This paper proposes a new technique to
address these issues. Traffic is allocated to candidate solutions with a
multi-armed bandit algorithm, spending more traffic on the evaluations that are
most useful. In a best-arm identification mode, the best candidate can be
identified reliably at the end of evolution, and in a campaign mode, the
overall conversion rate can be optimized throughout the entire evolution
process. Multi-armed bandit algorithms thus improve performance and reliability
of machine discovery in noisy real-world environments.
Comment: The Thirty-First Innovative Applications of Artificial Intelligence Conference
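The bandit-based traffic allocation described above can be sketched with a standard UCB1 rule, treating each candidate design as an arm and each conversion as a reward. This is an illustrative stand-in, not Sentient Ascend's actual algorithm; the function, variable names, and simulated conversion rates are hypothetical:

```python
import math
import random

def ucb1_allocate(successes, trials, total_trials):
    """Pick the arm (candidate design) with the highest UCB1 score.

    Arms with no trials yet are tried first; otherwise the score is the
    empirical conversion rate plus an exploration bonus that shrinks as
    an arm accumulates traffic.
    """
    best_arm, best_score = None, -1.0
    for arm in range(len(trials)):
        if trials[arm] == 0:
            return arm  # evaluate every design at least once
        mean = successes[arm] / trials[arm]
        bonus = math.sqrt(2.0 * math.log(total_trials) / trials[arm])
        if mean + bonus > best_score:
            best_arm, best_score = arm, mean + bonus
    return best_arm

# Simulated campaign: three candidate designs with hidden conversion rates.
random.seed(0)
true_rates = [0.02, 0.05, 0.03]
successes = [0, 0, 0]
trials = [0, 0, 0]
for t in range(1, 5001):          # 5000 simulated visitors
    arm = ucb1_allocate(successes, trials, t)
    trials[arm] += 1
    if random.random() < true_rates[arm]:
        successes[arm] += 1

print(trials)  # traffic should skew toward the best design over time
```

In campaign mode this keeps the overall conversion rate high during evaluation, because weak designs stop receiving visitors once the bonus no longer compensates for their low empirical rate.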
Parameter estimation in softmax decision-making models with linear objective functions
With an eye towards human-centered automation, we contribute to the
development of a systematic means to infer features of human decision-making
from behavioral data. Motivated by the common use of softmax selection in
models of human decision-making, we study the maximum likelihood parameter
estimation problem for softmax decision-making models with linear objective
functions. We present conditions under which the likelihood function is convex.
These allow us to provide sufficient conditions for convergence of the
resulting maximum likelihood estimator and to construct its asymptotic
distribution. In the case of models with nonlinear objective functions, we show
how the estimator can be applied by linearizing about a nominal parameter
value. We apply the estimator to fit the stochastic UCL (Upper Credible Limit)
model of human decision-making to human subject data. We show statistically
significant differences in behavior across related, but distinct, tasks.
Comment: In press
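Under a linear objective, the softmax model assigns each option a utility theta · x_i and chooses it with probability proportional to exp(theta · x_i); the maximum likelihood estimator then maximizes the log-likelihood of the observed choices. A minimal sketch of those two quantities (function names are illustrative, and the optimization step itself is omitted):

```python
import math

def softmax_probs(features, theta):
    """Choice probabilities P(i) proportional to exp(theta . x_i),
    where x_i is the feature vector of option i (linear objective)."""
    utilities = [sum(t * x for t, x in zip(theta, f)) for f in features]
    m = max(utilities)                      # subtract max for numerical stability
    exps = [math.exp(u - m) for u in utilities]
    z = sum(exps)
    return [e / z for e in exps]

def log_likelihood(data, theta):
    """Log-likelihood of observed choices under the softmax model.
    `data` is a list of (features, chosen_index) pairs."""
    ll = 0.0
    for features, choice in data:
        ll += math.log(softmax_probs(features, choice)[choice]) if False else \
              math.log(softmax_probs(features, theta)[choice])
    return ll
```

With theta set to zero all options are equally likely, which is a convenient sanity check; fitting theta to behavioral data would maximize `log_likelihood` over theta, and the paper's convexity conditions concern exactly this objective.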
Multi-Armed Bandits for Intelligent Tutoring Systems
We present an approach to Intelligent Tutoring Systems which adaptively
personalizes sequences of learning activities to maximize skills acquired by
students, taking into account the limited time and motivational resources. At a
given point in time, the system proposes to the students the activity which
makes them progress faster. We introduce two algorithms that rely on empirical
estimation of learning progress: RiARiT, which uses information about the
difficulty of each exercise, and ZPDES, which requires much less knowledge
about the problem.
The system is based on the combination of three approaches. First, it
leverages recent models of intrinsically motivated learning by transposing them
to active teaching, relying on empirical estimation of learning progress
provided by specific activities to particular students. Second, it uses
state-of-the-art Multi-Armed Bandit (MAB) techniques to efficiently manage the
exploration/exploitation challenge of this optimization process. Third, it
leverages expert knowledge to constrain and bootstrap initial exploration of
the MAB, while requiring only coarse guidance information of the expert and
allowing the system to deal with didactic gaps in its knowledge. The system is
evaluated in a scenario where 7-8 year old schoolchildren learn how to
decompose numbers while manipulating money. Systematic experiments are
presented with simulated students, followed by results of a user study across a
population of 400 schoolchildren.
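The exploration/exploitation step can be illustrated with a toy progress-proportional sampler: activities are drawn in proportion to their recent empirical learning progress, mixed with uniform exploration so every activity keeps being probed. This is a hypothetical simplification, not the actual RiARiT or ZPDES algorithms; all names and the smoothing constant are assumptions:

```python
import random

def choose_activity(progress, gamma=0.2):
    """Sample an activity index in proportion to its learning progress,
    mixed with uniform exploration at rate `gamma`."""
    n = len(progress)
    total = sum(progress)
    if total == 0:
        return random.randrange(n)          # no signal yet: explore uniformly
    probs = [(1 - gamma) * p / total + gamma / n for p in progress]
    r, acc = random.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return n - 1

def update_progress(progress, recent, activity, success, alpha=0.3):
    """Estimate learning progress as the smoothed change in success rate:
    a student improving fast on an activity makes it more attractive."""
    old = recent[activity]
    recent[activity] = (1 - alpha) * old + alpha * success
    progress[activity] = abs(recent[activity] - old)
```

The key design choice mirrored here is that the bandit's "reward" is progress, not raw performance: activities the student has already mastered stop changing the success rate and therefore stop being selected.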
Interactive Restless Multi-armed Bandit Game and Swarm Intelligence Effect
We obtain the conditions for the emergence of the swarm intelligence effect
in an interactive game of restless multi-armed bandit (rMAB). A player competes
with multiple agents. Each bandit has a payoff that changes with some
probability per round. The agents and player choose one of three options: (1)
Exploit (a good bandit), (2) Innovate (asocial learning for a good bandit among
randomly chosen bandits), and (3) Observe (social learning for a good bandit).
Each agent's decision is specified by two parameters: (i) a threshold value for
Exploit, and (ii) a probability of choosing Observe when learning. The
parameters are uniformly distributed across agents. We determine the optimal
strategies for the player using complete knowledge about the rMAB. We show
whether social or asocial learning is more effective in each region of the
parameter space and define the swarm intelligence effect. We conduct a
laboratory experiment (67 subjects) and observe the swarm intelligence effect
only when the parameters are chosen so that social learning is far more
effective than asocial learning.
Comment: 18 pages, 4 figures
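The Exploit/Innovate/Observe decision rule can be sketched as a single agent step. The threshold and observation-probability arguments mirror the two agent parameters in the abstract; the function signature, data layout, and names are hypothetical illustrations, not the paper's implementation:

```python
import random

def agent_step(known, payoffs, threshold, p_observe, others_best):
    """One round of the Exploit / Innovate / Observe decision.

    `known` maps bandit index -> last payoff this agent observed there.
    If the best known payoff meets `threshold`, Exploit that bandit.
    Otherwise learn: Observe (copy another agent's best bandit) with
    probability `p_observe`, else Innovate (try a random bandit asocially).
    """
    if known and max(known.values()) >= threshold:
        return max(known, key=known.get)            # Exploit
    if random.random() < p_observe and others_best is not None:
        return others_best                          # Observe (social learning)
    return random.randrange(len(payoffs))           # Innovate (asocial learning)
```

Because bandit payoffs drift each round (the "restless" property), knowledge decays and agents must keep returning to the learning branch, which is what makes the social/asocial trade-off nontrivial.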