Beyond A/B Testing: Sequential Randomization for Developing Interventions in Scaled Digital Learning Environments
Randomized experiments ensure robust causal inference, which is critical to
effective learning analytics research and practice. However, traditional
randomized experiments, like A/B tests, are limiting in large-scale digital
learning environments. While traditional experiments can accurately compare two
treatment options, they are less able to inform how to adapt interventions to
continually meet learners' diverse needs. In this work, we introduce a trial
design for developing adaptive interventions in scaled digital learning
environments -- the sequential randomized trial (SRT). With the goal of
improving learner experience and developing interventions that benefit all
learners at all times, SRTs inform how to sequence, time, and personalize
interventions. In this paper, we provide an overview of SRTs, and we illustrate
the advantages they hold compared to traditional experiments. We describe a
novel SRT run in a large-scale data science MOOC. The trial results
contextualize how learner engagement can be addressed through inclusive
culturally targeted reminder emails. We also provide practical advice for
researchers who aim to run their own SRTs to develop adaptive interventions in
scaled digital learning environments.
Experimenting with sequential allocation procedures
In experiments that involve subjects, a crucial step is deciding which treatment to allocate to which subject, in other words, constructing the treatment allocation procedure. In a classical experiment, this procedure often simply consists of randomly assigning subjects to a number of different treatments. Subsequently, when all outcomes have been observed, the resulting data are used to conduct an analysis that is specified a priori. In practice, however, subjects often arrive at an experiment one by one. This allows the data-generating process to be viewed differently: instead of considering the subjects as a batch, intermediate data from previous interactions with other subjects can be used to influence treatment allocation decisions in future interactions. A heavily researched formalization that helps develop strategies for sequentially allocating subjects is the multi-armed bandit problem. In this thesis, several methods are developed to expedite the use of sequential allocation procedures by (social) scientists in field experiments. This is done by building upon the extensive literature on the multi-armed bandit problem. The thesis also introduces and presents many (empirical) examples of the usefulness and applicability of sequential allocation procedures in practice.
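To make the idea of sequential allocation concrete, here is a minimal sketch of one well-known multi-armed bandit strategy, Thompson sampling with Bernoulli outcomes. It is an illustration only: the abstract does not say which strategies the thesis uses, and the function name `thompson_allocate`, the success rates, and the simulated outcomes are all assumptions made for this example.

```python
import random


def thompson_allocate(true_rates, n_subjects, seed=0):
    """Simulate sequential treatment allocation with Thompson sampling.

    true_rates: hypothetical success probability of each treatment
                (unknown to the experimenter; used only to simulate outcomes).
    Returns per-treatment success and failure counts.
    """
    rng = random.Random(seed)
    k = len(true_rates)
    successes = [0] * k
    failures = [0] * k

    for _ in range(n_subjects):
        # Draw one plausible success rate per treatment from its
        # Beta(successes + 1, failures + 1) posterior (uniform prior).
        draws = [
            rng.betavariate(successes[a] + 1, failures[a] + 1)
            for a in range(k)
        ]
        # Allocate the arriving subject to the treatment with the best draw.
        arm = max(range(k), key=lambda a: draws[a])

        # Observe the (simulated) outcome and update that treatment's counts,
        # so that later subjects benefit from earlier observations.
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1

    return successes, failures


successes, failures = thompson_allocate([0.1, 0.9], n_subjects=500)
allocations = [s + f for s, f in zip(successes, failures)]
print("subjects per treatment:", allocations)
```

Unlike a classical experiment with a fixed assignment ratio, the allocation here shifts toward the treatment that looks better as outcomes accumulate, which is the trade-off between learning and earning that the bandit formalization captures.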
Causal Reinforcement Learning: A Survey
Reinforcement learning is an essential paradigm for solving sequential
decision problems under uncertainty. Despite many remarkable achievements in
recent decades, applying reinforcement learning methods in the real world
remains challenging. One of the main obstacles is that reinforcement learning
agents lack a fundamental understanding of the world and must therefore learn
from scratch through numerous trial-and-error interactions. They may also face
challenges in providing explanations for their decisions and generalizing the
acquired knowledge. Causality, however, offers a notable advantage as it can
formalize knowledge in a systematic manner and leverage invariance for
effective knowledge transfer. This has led to the emergence of causal
reinforcement learning, a subfield of reinforcement learning that seeks to
enhance existing algorithms by incorporating causal relationships into the
learning process. In this survey, we comprehensively review the literature on
causal reinforcement learning. We first introduce the basic concepts of
causality and reinforcement learning, and then explain how causality can
address core challenges in non-causal reinforcement learning. We categorize and
systematically review existing causal reinforcement learning approaches based
on their target problems and methodologies. Finally, we outline open issues and
future directions in this emerging field.
Comment: 48 pages, 10 figures
On (Normalised) Discounted Cumulative Gain as an Off-Policy Evaluation Metric for Top-n Recommendation
Approaches to recommendation are typically evaluated in one of two ways: (1)
via a (simulated) online experiment, often seen as the gold standard, or (2)
via some offline evaluation procedure, where the goal is to approximate the
outcome of an online experiment. Several offline evaluation metrics have been
adopted in the literature, inspired by ranking metrics prevalent in the field
of Information Retrieval. (Normalised) Discounted Cumulative Gain (nDCG) is one
such metric that has seen widespread adoption in empirical studies, and higher
(n)DCG values have been used to present new methods as the state-of-the-art in
top-n recommendation for many years.
Our work takes a critical look at this approach, and investigates when we can
expect such metrics to approximate the gold standard outcome of an online
experiment. We formally present the assumptions that are necessary to consider
DCG an unbiased estimator of online reward and provide a derivation for this
metric from first principles, highlighting where we deviate from its
traditional uses in IR. Importantly, we show that normalising the metric
renders it inconsistent, in that even when DCG is unbiased, ranking competing
methods by their normalised DCG can invert their relative order. Through a
correlation analysis between off- and on-line experiments conducted on a
large-scale recommendation platform, we show that our unbiased DCG estimates
strongly correlate with online reward, even when some of the metric's inherent
assumptions are violated. This statement no longer holds for its normalised
variant, suggesting that nDCG's practical utility may be limited.
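The claim that normalisation can invert the relative order of two methods can be illustrated with a small toy example. The sketch below uses the traditional IR form of DCG with binary relevance and per-user normalisation by the ideal DCG; it is not the off-policy estimator derived in the paper, and the two users, the two methods, and their ranked lists are hypothetical.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain: rel / log2(rank + 1), ranks from 1."""
    return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(relevances))


def ndcg(relevances, ideal_relevances):
    """DCG divided by the DCG of the best achievable ranking."""
    ideal = dcg(ideal_relevances)
    return dcg(relevances) / ideal if ideal > 0 else 0.0


def mean(values):
    return sum(values) / len(values)


# Hypothetical data: user A has two relevant items, user B has only one,
# so their ideal top-2 rankings (and ideal DCG values) differ.
ideal = {"A": [1, 1], "B": [1, 0]}

# Relevance of the top-2 items each hypothetical method shows to each user.
method_1 = {"A": [1, 1], "B": [0, 0]}
method_2 = {"A": [0, 1], "B": [0, 1]}


def mean_dcg(method):
    return mean([dcg(method[user]) for user in ideal])


def mean_ndcg(method):
    return mean([ndcg(method[user], ideal[user]) for user in ideal])


for name, method in [("method_1", method_1), ("method_2", method_2)]:
    print(name, "DCG:", round(mean_dcg(method), 3),
          "nDCG:", round(mean_ndcg(method), 3))
```

In this toy setting method_1 has the higher mean DCG, while method_2 has the higher mean nDCG, because per-user normalisation up-weights the user with fewer relevant items.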