Learning to Crawl
Web crawling is the problem of keeping a cache of webpages fresh, i.e.,
having the most recent copy available when a page is requested. This problem is
usually coupled with the natural restriction that the bandwidth available to
the web crawler is limited. The corresponding optimization problem was solved
optimally by Azar et al. [2018] under the assumption that, for each webpage,
both the elapsed time between two changes and the elapsed time between two
requests follow a Poisson distribution with known parameters. In this paper, we
study the same control problem but under the assumption that the change rates
are unknown a priori, and thus we need to estimate them in an online fashion
using only partial observations (i.e., single-bit signals indicating whether
the page has changed since the last refresh). As a point of departure, we
characterise the conditions under which one can solve the problem with such
partial observability. Next, we propose a practical estimator and compute
confidence intervals for it in terms of the elapsed time between the
observations. Finally, we show that the explore-and-commit algorithm achieves
an regret with a carefully chosen exploration horizon.
Our simulation study shows that our online policy scales well and achieves
close to optimal performance for a wide range of the parameters.
Comment: Published at AAAI 2020
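To make the partial-observability setup concrete, below is a minimal sketch of the maximum-likelihood estimator one can derive from such single-bit signals: each refresh gap w contributes a Bernoulli observation that succeeds with probability 1 - exp(-rate * w). This is only an illustration of the estimation problem under that model; the function name change_rate_mle and the bisection solver are assumptions, and the paper's practical estimator and its confidence intervals may differ.

```python
import math
import random

def change_rate_mle(gaps, changed, lo=1e-8, hi=1e6, iters=100):
    """Bisection MLE of a Poisson change rate from single-bit signals.

    gaps[i]    : time elapsed between refresh i-1 and refresh i
    changed[i] : 1 if the page changed at least once during that gap
    Model      : P(changed[i] = 1) = 1 - exp(-rate * gaps[i])
    """
    if not any(changed):
        return 0.0           # no change ever observed: MLE is zero
    if all(changed):
        return float("inf")  # a change in every gap: MLE diverges

    def score(rate):
        # derivative of the log-likelihood; strictly decreasing in rate
        s = 0.0
        for w, x in zip(gaps, changed):
            t = rate * w
            if x:
                s += w * math.exp(-t) / -math.expm1(-t)  # w*e^-t / (1-e^-t)
            else:
                s -= w
        return s

    for _ in range(iters):   # bisection on the monotone score function
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# sanity check on synthetic data with true rate 0.5
random.seed(0)
rate = 0.5
gaps = [random.expovariate(1.0) for _ in range(5000)]
changed = [int(random.random() < -math.expm1(-rate * w)) for w in gaps]
print(change_rate_mle(gaps, changed))  # should be close to 0.5
```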
Easy Learning from Label Proportions
We consider the problem of Learning from Label Proportions (LLP), a weakly
supervised classification setup where instances are grouped into "bags", and
only the frequency of class labels at each bag is available. Nevertheless, the
objective of the learner is to achieve low task loss at the individual instance
level. Here we propose EasyLLP: a flexible and simple-to-implement debiasing
approach based on aggregate labels, which operates on arbitrary loss functions.
Our technique allows us to accurately estimate the expected loss of an
arbitrary model at an individual level. We showcase the flexibility of our
approach by applying it to popular learning frameworks, like Empirical Risk
Minimization (ERM) and Stochastic Gradient Descent (SGD) with provable
guarantees on instance level performance. More concretely, we exhibit a
variance reduction technique that makes the quality of LLP learning deteriorate
only by a factor of k (k being the bag size) in both ERM and SGD setups, as
compared to full supervision. Finally, we validate our theoretical results on
multiple datasets, demonstrating that our algorithm performs as well as or
better than previous LLP approaches in spite of its simplicity.
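To illustrate the flavor of such an aggregate-label debiasing estimator in the binary case, here is a minimal sketch assuming a known class prior p; the function name easyllp_loss, the known-prior assumption, and the synthetic sanity check are mine, and the exact EasyLLP construction and its variance-reduction analysis may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def easyllp_loss(scores, bag_alpha, k, p, loss):
    """Debiased per-instance loss estimate built only from a bag's label
    proportion (a sketch of the aggregate-label debiasing idea; p is the
    class prior E[y], assumed known here; loss is an arbitrary per-instance
    loss, vectorized over scores)."""
    l1, l0 = loss(scores, 1.0), loss(scores, 0.0)
    return k * (bag_alpha - p) * (l1 - l0) + p * l1 + (1.0 - p) * l0

# Monte-Carlo sanity check of unbiasedness with a squared loss.
sq = lambda s, y: (s - y) ** 2
k, n_bags = 8, 200_000
x = rng.normal(size=(n_bags, k))                      # features
y = (rng.random(x.shape) < sigmoid(x)).astype(float)  # labels depend on x
p = 0.5                           # E[sigmoid(x)] = 0.5 for symmetric x
scores = sigmoid(0.7 * x)         # some fixed model to be evaluated
alpha = y.mean(axis=1, keepdims=True)   # the only label info we may use
est = easyllp_loss(scores, alpha, k, p, sq).mean()
ref = sq(scores, y).mean()        # fully supervised loss, for checking only
print(est, ref)                   # the two means should agree closely
```

The check exploits that, conditionally on an instance, the other k-1 labels in its bag average out to the prior p, so the k*(alpha - p) correction term recovers the instance's own label contribution in expectation.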
Interactive Q-Learning with Ordinal Rewards and Unreliable Tutor
Conventional reinforcement learning (RL) requires the specification of a numeric reward function, which is often a difficult task. In this paper, we extend the Q-learning approach toward the handling of ordinal rewards. The method we propose is interactive in the sense of allowing the agent to query a tutor for comparing sequences of ordinal rewards. More specifically, this method can be seen as an extension of a recently proposed interactive value iteration (IVI) algorithm for Markov Decision Processes to the setting of reinforcement learning; in contrast to the original IVI algorithm, our method is tolerant toward unreliable and inconsistent tutor feedback.
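As a loose illustration of this interaction pattern (not the paper's algorithm), the sketch below keeps rewards purely ordinal, asks a simulated tutor to compare sequences of ordinal rewards, and tolerates unreliable answers by repeating each query and taking a majority vote; the dominance rule inside tutor_prefers, the flip probability FLIP, and all names are assumptions of the sketch.

```python
import random
from collections import defaultdict

FLIP = 0.2     # probability the tutor answers incorrectly
QUERIES = 9    # odd number of repeated queries per comparison

def tutor_prefers(seq_a, seq_b):
    """Unreliable tutor comparing two sequences of ordinal rewards.
    Ground truth here is a crude dominance check on the descending-sorted
    sequences (an assumption of this sketch); the answer is flipped with
    probability FLIP."""
    truth = sorted(seq_a, reverse=True) >= sorted(seq_b, reverse=True)
    return truth if random.random() > FLIP else not truth

def robust_prefers(seq_a, seq_b):
    """Tolerate unreliable feedback: repeat the query and majority-vote."""
    votes = sum(tutor_prefers(seq_a, seq_b) for _ in range(QUERIES))
    return votes > QUERIES // 2

history = defaultdict(list)  # (state, action) -> ordinal rewards seen so far

def greedy_action(state, actions):
    """Choose an action via pairwise tutor comparisons of reward histories."""
    best = actions[0]
    for a in actions[1:]:
        if history[(state, a)] and not robust_prefers(
                history[(state, best)], history[(state, a)]):
            best = a
    return best

# tiny demo: one state, three actions with different ordinal reward profiles
# (reward levels 0 (worst) .. 3 (best) are labels, never averaged as numbers)
random.seed(1)
ACTIONS = [0, 1, 2]
PROFILES = {0: [0, 1, 1, 2], 1: [1, 2, 2, 3], 2: [0, 0, 1, 3]}
for _ in range(300):
    s = "s0"
    a = random.choice(ACTIONS) if random.random() < 0.2 else greedy_action(s, ACTIONS)
    history[(s, a)].append(random.choice(PROFILES[a]))  # ordinal observation
print({a: len(history[("s0", a)]) for a in ACTIONS})    # action 1 tends to win
```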
Proceedings of the DA2PL'2016 EURO Mini Conference