33,842 research outputs found

    Beyond A/B Testing: Sequential Randomization for Developing Interventions in Scaled Digital Learning Environments

    Full text link
    Randomized experiments ensure robust causal inference that are critical to effective learning analytics research and practice. However, traditional randomized experiments, like A/B tests, are limiting in large scale digital learning environments. While traditional experiments can accurately compare two treatment options, they are less able to inform how to adapt interventions to continually meet learners' diverse needs. In this work, we introduce a trial design for developing adaptive interventions in scaled digital learning environments -- the sequential randomized trial (SRT). With the goal of improving learner experience and developing interventions that benefit all learners at all times, SRTs inform how to sequence, time, and personalize interventions. In this paper, we provide an overview of SRTs, and we illustrate the advantages they hold compared to traditional experiments. We describe a novel SRT run in a large scale data science MOOC. The trial results contextualize how learner engagement can be addressed through inclusive culturally targeted reminder emails. We also provide practical advice for researchers who aim to run their own SRTs to develop adaptive interventions in scaled digital learning environments

    Sequential anomaly detection in the presence of noise and limited feedback

    Full text link
    This paper describes a methodology for detecting anomalies from sequentially observed and potentially noisy data. The proposed approach consists of two main elements: (1) {\em filtering}, or assigning a belief or likelihood to each successive measurement based upon our ability to predict it from previous noisy observations, and (2) {\em hedging}, or flagging potential anomalies by comparing the current belief against a time-varying and data-adaptive threshold. The threshold is adjusted based on the available feedback from an end user. Our algorithms, which combine universal prediction with recent work on online convex programming, do not require computing posterior distributions given all current observations and involve simple primal-dual parameter updates. At the heart of the proposed approach lie exponential-family models which can be used in a wide variety of contexts and applications, and which yield methods that achieve sublinear per-round regret against both static and slowly varying product distributions with marginals drawn from the same exponential family. Moreover, the regret against static distributions coincides with the minimax value of the corresponding online strongly convex game. We also prove bounds on the number of mistakes made during the hedging step relative to the best offline choice of the threshold with access to all estimated beliefs and feedback signals. We validate the theory on synthetic data drawn from a time-varying distribution over binary vectors of high dimensionality, as well as on the Enron email dataset.Comment: 19 pages, 12 pdf figures; final version to be published in IEEE Transactions on Information Theor

    Universal Lossless Compression with Unknown Alphabets - The Average Case

    Full text link
    Universal compression of patterns of sequences generated by independently identically distributed (i.i.d.) sources with unknown, possibly large, alphabets is investigated. A pattern is a sequence of indices that contains all consecutive indices in increasing order of first occurrence. If the alphabet of a source that generated a sequence is unknown, the inevitable cost of coding the unknown alphabet symbols can be exploited to create the pattern of the sequence. This pattern can in turn be compressed by itself. It is shown that if the alphabet size kk is essentially small, then the average minimax and maximin redundancies as well as the redundancy of every code for almost every source, when compressing a pattern, consist of at least 0.5 log(n/k^3) bits per each unknown probability parameter, and if all alphabet letters are likely to occur, there exist codes whose redundancy is at most 0.5 log(n/k^2) bits per each unknown probability parameter, where n is the length of the data sequences. Otherwise, if the alphabet is large, these redundancies are essentially at least O(n^{-2/3}) bits per symbol, and there exist codes that achieve redundancy of essentially O(n^{-1/2}) bits per symbol. Two sub-optimal low-complexity sequential algorithms for compression of patterns are presented and their description lengths analyzed, also pointing out that the pattern average universal description length can decrease below the underlying i.i.d.\ entropy for large enough alphabets.Comment: Revised for IEEE Transactions on Information Theor

    On Probability Estimation by Exponential Smoothing

    Full text link
    Probability estimation is essential for every statistical data compression algorithm. In practice probability estimation should be adaptive, recent observations should receive a higher weight than older observations. We present a probability estimation method based on exponential smoothing that satisfies this requirement and runs in constant time per letter. Our main contribution is a theoretical analysis in case of a binary alphabet for various smoothing rate sequences: We show that the redundancy w.r.t. a piecewise stationary model with ss segments is O(sn)O\left(s\sqrt n\right) for any bit sequence of length nn, an improvement over redundancy O(snlogn)O\left(s\sqrt{n\log n}\right) of previous approaches with similar time complexity
    corecore