Beyond A/B Testing: Sequential Randomization for Developing Interventions in Scaled Digital Learning Environments
Randomized experiments ensure robust causal inferences that are critical to
effective learning analytics research and practice. However, traditional
randomized experiments, like A/B tests, are limiting in large-scale digital
learning environments. While traditional experiments can accurately compare two
treatment options, they are less able to inform how to adapt interventions to
continually meet learners' diverse needs. In this work, we introduce a trial
design for developing adaptive interventions in scaled digital learning
environments -- the sequential randomized trial (SRT). With the goal of
improving learner experience and developing interventions that benefit all
learners at all times, SRTs inform how to sequence, time, and personalize
interventions. In this paper, we provide an overview of SRTs, and we illustrate
the advantages they hold compared to traditional experiments. We describe a
novel SRT run in a large-scale data science MOOC. The trial results
contextualize how learner engagement can be addressed through inclusive
culturally targeted reminder emails. We also provide practical advice for
researchers who aim to run their own SRTs to develop adaptive interventions in
scaled digital learning environments.
Sequential anomaly detection in the presence of noise and limited feedback
This paper describes a methodology for detecting anomalies from sequentially
observed and potentially noisy data. The proposed approach consists of two main
elements: (1) filtering, or assigning a belief or likelihood to each
successive measurement based upon our ability to predict it from previous noisy
observations, and (2) hedging, or flagging potential anomalies by
comparing the current belief against a time-varying and data-adaptive
threshold. The threshold is adjusted based on the available feedback from an
end user. Our algorithms, which combine universal prediction with recent work
on online convex programming, do not require computing posterior distributions
given all current observations and involve simple primal-dual parameter
updates. At the heart of the proposed approach lie exponential-family models
which can be used in a wide variety of contexts and applications, and which
yield methods that achieve sublinear per-round regret against both static and
slowly varying product distributions with marginals drawn from the same
exponential family. Moreover, the regret against static distributions coincides
with the minimax value of the corresponding online strongly convex game. We
also prove bounds on the number of mistakes made during the hedging step
relative to the best offline choice of the threshold with access to all
estimated beliefs and feedback signals. We validate the theory on synthetic
data drawn from a time-varying distribution over binary vectors of high
dimensionality, as well as on the Enron email dataset.
Comment: 19 pages, 12 pdf figures; final version to be published in IEEE
Transactions on Information Theor
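
The hedging step described in this abstract can be illustrated with a minimal sketch. It is not the paper's primal-dual update: the function name `hedge`, the fixed `step` size, and the simple raise-or-lower rule are assumptions made only for illustration. Beliefs are taken to be likelihoods in [0, 1] produced by some filtering step, and feedback may be missing (`None`) for some rounds.

```python
def hedge(beliefs, feedback, threshold=0.1, step=0.02):
    """Flag low-belief observations and adapt the threshold from feedback.

    beliefs:  likelihoods in [0, 1] assigned by a (hypothetical) filtering step
    feedback: True (anomaly), False (normal), or None (no feedback this round)
    """
    flags = []
    for belief, label in zip(beliefs, feedback):
        flagged = belief < threshold  # compare belief against current threshold
        flags.append(flagged)
        if label is not None and label != flagged:
            # Missed anomaly: raise the threshold. False alarm: lower it.
            threshold += step if label else -step
            threshold = min(max(threshold, 0.0), 1.0)  # keep it a valid level
    return flags, threshold
```

With this rule the threshold moves only on rounds where feedback arrives and disagrees with the flag, which mirrors the limited-feedback setting in spirit only.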
Universal Lossless Compression with Unknown Alphabets - The Average Case
Universal compression of patterns of sequences generated by independently
identically distributed (i.i.d.) sources with unknown, possibly large,
alphabets is investigated. A pattern is a sequence of indices that contains all
consecutive indices in increasing order of first occurrence. If the alphabet of
a source that generated a sequence is unknown, the inevitable cost of coding
the unknown alphabet symbols can be exploited to create the pattern of the
sequence. This pattern can in turn be compressed by itself. It is shown that if
the alphabet size k is essentially small, then the average minimax and
maximin redundancies as well as the redundancy of every code for almost every
source, when compressing a pattern, consist of at least 0.5 log(n/k^3) bits per
each unknown probability parameter, and if all alphabet letters are likely to
occur, there exist codes whose redundancy is at most 0.5 log(n/k^2) bits per
each unknown probability parameter, where n is the length of the data
sequences. Otherwise, if the alphabet is large, these redundancies are
essentially at least O(n^{-2/3}) bits per symbol, and there exist codes that
achieve redundancy of essentially O(n^{-1/2}) bits per symbol. Two sub-optimal
low-complexity sequential algorithms for compression of patterns are presented
and their description lengths analyzed, also pointing out that the pattern
average universal description length can decrease below the underlying i.i.d.
entropy for large enough alphabets.
Comment: Revised for IEEE Transactions on Information Theor
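
The definition of a pattern given in this abstract (indices assigned in increasing order of first occurrence) can be made concrete with a short sketch. The function name `pattern` and the choice to start indices at 1 are assumptions for illustration; the abstract does not fix either.

```python
def pattern(sequence):
    """Replace each symbol by the rank of its first occurrence (starting at 1)."""
    first_seen = {}  # symbol -> index assigned when the symbol first appeared
    indices = []
    for symbol in sequence:
        if symbol not in first_seen:
            first_seen[symbol] = len(first_seen) + 1
        indices.append(first_seen[symbol])
    return indices


print(pattern("abracadabra"))  # [1, 2, 3, 1, 4, 1, 5, 1, 2, 3, 1]
```

The pattern discards the identity of the symbols, so any two sequences that differ only by a relabeling of the alphabet share the same pattern.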
On Probability Estimation by Exponential Smoothing
Probability estimation is essential for every statistical data compression
algorithm. In practice, probability estimation should be adaptive: recent
observations should receive a higher weight than older observations. We present
a probability estimation method based on exponential smoothing that satisfies
this requirement and runs in constant time per letter. Our main contribution is
a theoretical analysis in case of a binary alphabet for various smoothing rate
sequences: We show that the redundancy w.r.t. a piecewise stationary model with
segments is for any bit sequence of length , an
improvement over redundancy of previous
approaches with similar time complexity
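
As a rough illustration of probability estimation by exponential smoothing for a binary alphabet, the sketch below updates the estimate in constant time per bit and accumulates the ideal code length. The fixed smoothing `rate`, the initial estimate of 0.5, the clamping constant `eps`, and the function name are all assumptions; the abstract analyzes sequences of smoothing rates, which this sketch does not reproduce.

```python
import math


def smoothed_code_length(bits, rate=0.05, p_one=0.5, eps=1e-12):
    """Ideal code length (in bits) of `bits` under an exponentially smoothed estimate."""
    total = 0.0
    for bit in bits:
        # Clamp so that the logarithm stays finite after long constant runs.
        p = min(max(p_one, eps), 1.0 - eps)
        total += -math.log2(p if bit == 1 else 1.0 - p)
        # Exponential smoothing: recent bits weigh more than older ones.
        p_one = (1.0 - rate) * p_one + rate * bit
    return total
```

A larger `rate` forgets old observations faster, which helps when the source statistics change but costs more on stationary stretches.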
- …