Selective Sampling with Drift
Recently there has been much work on selective sampling, an online active
learning setting, in which algorithms work in rounds. On each round an
algorithm receives an input and makes a prediction. It can then decide whether
to query the label and, if so, update its model; otherwise the input is
discarded. Most of this work focuses on the stationary case, where it is
assumed that there is a fixed target model against which the algorithm's
performance is compared. However, in many real-world applications, such as
spam prediction, the best target function may drift gradually over time or
shift abruptly from time to time. We develop a novel selective sampling
algorithm for the drifting setting, analyze it under no assumptions on the
mechanism generating the sequence of instances, and derive new mistake bounds
that depend on the amount of drift in the problem. Simulations on synthetic and
real-world datasets demonstrate the advantage of our algorithm over existing
selective sampling methods in the drifting setting.
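The round structure described above can be sketched with a standard margin-based randomized query rule (query with probability b/(b+|margin|), in the style of classical selective sampling analyses). The function name and the perceptron-style update are illustrative choices, not the paper's drift-aware algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)

def selective_sampling(stream, d, b=1.0):
    """Run one pass of margin-based selective sampling.

    `stream` yields (x, get_label) pairs; get_label() is invoked only
    when the algorithm decides to query, so unqueried labels are never
    observed. Returns the predictions and the number of queried labels.
    """
    w = np.zeros(d)                      # current linear model
    preds, queries = [], 0
    for x, get_label in stream:
        margin = w @ x
        preds.append(1.0 if margin >= 0 else -1.0)
        # Randomized query rule: uncertain (small-margin) inputs are
        # queried with high probability, confident ones are discarded.
        if rng.random() < b / (b + abs(margin)):
            y = get_label()
            queries += 1
            if y * margin <= 0:          # update only on queried mistakes
                w = w + y * x            # perceptron-style step
    return np.array(preds), queries
```

In the drifting setting one would additionally discount or reset `w` so that old queried examples lose influence; the sketch above only illustrates the query-then-update round structure.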
Variance Estimation For Dynamic Regression via Spectrum Thresholding
We consider the dynamic linear regression problem, where the predictor vector
may vary with time. This problem can be modeled as a linear dynamical system,
where the parameters that need to be learned are the variance of both the
process noise and the observation noise. While variance estimation for dynamic
regression is a natural problem, with a variety of applications, existing
approaches to this problem either lack guarantees or only have asymptotic
guarantees without explicit rates. In addition, all existing approaches rely
strongly on Gaussianity of the noise. In this paper we study the global system
operator: the operator that maps the noise vectors to the output. In
particular, we obtain estimates on its spectrum, and as a result derive the
first known variance estimators with finite sample complexity guarantees.
Moreover, our results hold for arbitrary sub-Gaussian noise distributions. We
evaluate the approach on synthetic and real-world benchmarks.
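The dynamic regression model in question can be written as a linear dynamical system in which the regression vector follows a random walk; a minimal simulation sketch (the function name and zero initialization are illustrative assumptions), whose two unknown variances q2 and r2 are exactly the quantities the variance estimators above target:

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_dynamic_regression(n, d, q2, r2):
    """Simulate dynamic linear regression
        y_t = x_t . w_t + e_t,    w_{t+1} = w_t + u_t,
    where u_t ~ N(0, q2 * I) is the process noise driving the drift of
    the regression vector and e_t ~ N(0, r2) is the observation noise.
    (Gaussian noise is used here only for concreteness; the guarantees
    described above cover arbitrary sub-Gaussian noise.)
    """
    w = np.zeros(d)
    X = rng.normal(size=(n, d))          # predictor vectors
    ys = np.empty(n)
    for t in range(n):
        ys[t] = X[t] @ w + rng.normal(scale=np.sqrt(r2))
        w = w + rng.normal(scale=np.sqrt(q2), size=d)  # random-walk drift
    return X, ys
```

With q2 = 0 the model collapses to ordinary static regression, which gives a quick sanity check: the outputs are then pure observation noise of variance r2.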
Continual Learning in Linear Classification on Separable Data
We analyze continual learning on a sequence of separable linear
classification tasks with binary labels. We show theoretically that learning
with weak regularization reduces to solving a sequential max-margin problem,
corresponding to a special case of the Projection Onto Convex Sets (POCS)
framework. We then develop upper bounds on the forgetting and other quantities
of interest under various settings with recurring tasks, including cyclic and
random orderings of tasks. We discuss several practical implications to popular
training practices like regularization scheduling and weighting. We point out
several theoretical differences between our continual classification setting
and a recently studied continual regression setting
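The reduction above, from weakly regularized continual learning to a sequential max-margin problem in the POCS framework, can be sketched by moving the current weights into each task's margin-feasible set in turn. The helper names are illustrative, and cyclic half-space projections are used as a simple stand-in for the exact projection:

```python
import numpy as np

def project_halfspace(w, x, y, margin=1.0):
    """Euclidean projection of w onto the half-space {v : y * (v . x) >= margin}."""
    slack = margin - y * (w @ x)
    if slack <= 0:
        return w                          # constraint already satisfied
    return w + (slack / (x @ x)) * y * x  # closed-form projection

def sequential_max_margin(tasks, d, sweeps=200):
    """Process tasks one after another, moving the current weights into
    each task's feasible set {v : y_i * (v . x_i) >= 1} by cycling
    half-space projections (plain POCS; Dykstra's algorithm would give
    the exact Euclidean projection onto the intersection)."""
    w = np.zeros(d)
    history = []                          # weights after each task
    for X, ys in tasks:                   # tasks arrive sequentially
        for _ in range(sweeps):
            for x, y in zip(X, ys):
                w = project_halfspace(w, x, y)
        history.append(w.copy())
    return w, history
```

Under recurring (e.g. cyclic) task orderings, the drift of successive entries of `history`, and the margin violations of earlier tasks at the final weights, are simple proxies for the forgetting quantities the bounds concern.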