2 research outputs found
Data Distillation: A Survey
The popularity of deep learning has led to the curation of a vast number of
massive and multifarious datasets. Despite having close-to-human performance on
individual tasks, training parameter-hungry models on large datasets poses
multi-faceted problems such as (a) high model-training time; (b) slow research
iteration; and (c) poor eco-sustainability. As an alternative, data
distillation approaches aim to synthesize terse data summaries, which can serve
as effective drop-in replacements of the original dataset for scenarios like
model training, inference, architecture search, etc. In this survey, we present
a formal framework for data distillation, along with providing a detailed
taxonomy of existing approaches. Additionally, we cover data distillation
approaches for different data modalities, namely images, graphs, and user-item
interactions (recommender systems), while also identifying current challenges
and future research directions.Comment: Accepted at TMLR '23. 21 pages, 4 figure
Off-Policy Evaluation for Large Action Spaces via Policy Convolution
Developing accurate off-policy estimators is crucial for both evaluating and
optimizing for new policies. The main challenge in off-policy estimation is the
distribution shift between the logging policy that generates data and the
target policy that we aim to evaluate. Typically, techniques for correcting
distribution shift involve some form of importance sampling. This approach
results in unbiased value estimation but often comes with the trade-off of high
variance, even in the simpler case of one-step contextual bandits. Furthermore,
importance sampling relies on the common support assumption, which becomes
impractical when the action space is large. To address these challenges, we
introduce the Policy Convolution (PC) family of estimators. These methods
leverage latent structure within actions -- made available through action
embeddings -- to strategically convolve the logging and target policies. This
convolution introduces a unique bias-variance trade-off, which can be
controlled by adjusting the amount of convolution. Our experiments on synthetic
and benchmark datasets demonstrate remarkable mean squared error (MSE)
improvements when using PC, especially when either the action space or policy
mismatch becomes large, with gains of up to 5 - 6 orders of magnitude over
existing estimators.Comment: Under review. 36 pages, 31 figure