Agnostic Active Learning Without Constraints
We present and analyze an agnostic active learning algorithm that works
without keeping a version space. This is unlike all previous approaches where a
restricted set of candidate hypotheses is maintained throughout learning, and
only hypotheses from this set are ever returned. By avoiding this version space
approach, our algorithm sheds the computational burden and brittleness
associated with maintaining version spaces, yet still allows for substantial
improvements over supervised learning for classification.
Double Clipping: Less-Biased Variance Reduction in Off-Policy Evaluation
"Clipping" (a.k.a. importance weight truncation) is a widely used
variance-reduction technique for counterfactual off-policy estimators. Like
other variance-reduction techniques, clipping reduces variance at the cost of
increased bias. However, unlike other techniques, the bias introduced by
clipping is always a downward bias (assuming non-negative rewards), yielding a
lower bound on the true expected reward. In this work we propose a simple
extension, called Double Clipping, which aims to compensate for this
downward bias and thus reduce the overall bias, while maintaining the variance
reduction properties of the original estimator.
Comment: Presented at the CONSEQUENCES '23 workshop at the RecSys 2023 conference in Singapore
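As a rough illustration of the clipping technique this abstract builds on (a minimal sketch, not the paper's Double Clipping estimator; all names are hypothetical), a clipped inverse-propensity estimator truncates large importance weights, trading variance for a downward bias when rewards are non-negative:

```python
import numpy as np

def clipped_ips(rewards, target_probs, logging_probs, clip_max=10.0):
    """Inverse-propensity estimate of the target policy's expected reward,
    with importance weights truncated at clip_max.

    Truncation lowers variance, but with non-negative rewards it can only
    pull the estimate down, so the result lower-bounds the true value.
    """
    weights = target_probs / logging_probs
    clipped = np.minimum(weights, clip_max)  # truncate large weights
    return float(np.mean(clipped * rewards))
```

With `clip_max=np.inf` this recovers the unbiased (but high-variance) IPS estimator; any finite threshold can only decrease the estimate for non-negative rewards.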
Prediction & Model Evaluation for Space-Time Data
Evaluation metrics for prediction error, model selection and model averaging
on space-time data are understudied and poorly understood. The absence of
independent replication makes prediction ambiguous as a concept and renders
evaluation procedures developed for independent data inappropriate for most
space-time prediction problems. Motivated by air pollution data collected
during California wildfires in 2008, this manuscript attempts a formalization
of the true prediction error associated with spatial interpolation. We
investigate a variety of cross-validation (CV) procedures employing both
simulations and case studies to provide insight into the nature of the estimand
targeted by alternative data partition strategies. Consistent with recent best
practice, we find that location-based cross-validation is appropriate for
estimating spatial interpolation error as in our analysis of the California
wildfire data. Interestingly, commonly held notions of bias-variance trade-off
of CV fold size do not trivially apply to dependent data, and we recommend
leave-one-location-out (LOLO) CV as the preferred prediction error metric for
spatial interpolation.
Comment: 15 pages, 5 figures
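The LOLO procedure recommended above can be sketched as follows (a minimal illustration, not the manuscript's code; the `predict` interface is a hypothetical stand-in for any spatial interpolator): hold out every observation at one location, fit on the remaining locations, and average the squared prediction error across held-out locations.

```python
import numpy as np

def lolo_cv_error(location_ids, values, predict):
    """Leave-one-location-out CV error for spatial interpolation.

    location_ids : 1-D array of location labels (repeats = replicate obs.)
    values       : 1-D array of observed responses
    predict      : callable (train_ids, train_vals, test_ids) -> predictions
    """
    locs = np.asarray(location_ids)
    vals = np.asarray(values)
    fold_errors = []
    for loc in np.unique(locs):
        test = locs == loc                      # all obs. at this location
        preds = predict(locs[~test], vals[~test], locs[test])
        fold_errors.append(np.mean((vals[test] - preds) ** 2))
    return float(np.mean(fold_errors))
```

Because entire locations are withheld, the estimand targeted is interpolation error at unobserved sites, rather than the record-level error that random-fold CV estimates on dependent data.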
A review of domain adaptation without target labels
Domain adaptation has become a prominent problem setting in machine learning
and related fields. This review asks the question: how can a classifier learn
from a source domain and generalize to a target domain? We present a
categorization of approaches, divided into what we refer to as sample-based,
feature-based and inference-based methods. Sample-based methods focus on
weighting individual observations during training based on their importance to
the target domain. Feature-based methods revolve around mapping, projecting,
and representing features such that a source classifier performs well on the
target domain. Inference-based methods incorporate adaptation into the
parameter estimation procedure, for instance through constraints on the
optimization procedure. Additionally, we review a number of conditions that
allow for formulating bounds on the cross-domain generalization error. Our
categorization highlights recurring ideas and raises questions important to
further research.
Comment: 20 pages, 5 figures
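The sample-based family described in this abstract can be illustrated with a crude density-ratio estimate (a sketch under simplifying assumptions, not any method from the review): weight each source observation by how common its feature value is under the target distribution, here estimated by histogram binning on a single 1-D feature.

```python
import numpy as np

def importance_weights(source_x, target_x, bins=10):
    """Estimate w(x) = p_target(x) / p_source(x) for 1-D features via
    shared-edge histograms; source samples whose feature values are common
    in the target domain receive larger training weights."""
    edges = np.histogram_bin_edges(
        np.concatenate([source_x, target_x]), bins=bins)
    src_density, _ = np.histogram(source_x, bins=edges, density=True)
    tgt_density, _ = np.histogram(target_x, bins=edges, density=True)
    # Map each source point to its bin, guarding the right edge.
    idx = np.clip(np.digitize(source_x, edges) - 1, 0, bins - 1)
    return tgt_density[idx] / np.maximum(src_density[idx], 1e-12)
```

These weights would then multiply the per-sample loss when training the source classifier, shifting emphasis toward the region the target domain occupies.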