Toward Understanding Privileged Features Distillation in Learning-to-Rank
In learning-to-rank problems, a privileged feature is one that is available
during model training, but not available at test time. Such features naturally
arise in merchandised recommendation systems; for instance, "user clicked this
item" as a feature is predictive of "user purchased this item" in the offline
data, but is clearly not available during online serving. Another source of
privileged features is features that are too expensive to compute online but
feasible to add offline. Privileged features distillation (PFD) refers to
a natural idea: train a "teacher" model using all features (including
privileged ones) and then use it to train a "student" model that does not use
the privileged features.
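
The teacher/student recipe described above can be illustrated with a minimal sketch. It is not the paper's implementation: it swaps the ranking models for ridge regression, and the function names, the mixing weight `alpha`, and the penalty `lam` are all hypothetical. It also assumes the teacher is fit on a separate set from the one used for distillation, because with linear least squares, distilling on the teacher's own training data would give essentially the same student as training on the labels directly.

```python
import numpy as np

def fit_ridge(X, y, lam=1e-3):
    """Ridge regression in closed form; returns the weight vector."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def train_teacher(X_regular, X_privileged, y, lam=1e-3):
    """Teacher: sees regular AND privileged features (offline only)."""
    return fit_ridge(np.hstack([X_regular, X_privileged]), y, lam)

def distill_student(w_teacher, X_regular, X_privileged, y, alpha=0.5, lam=1e-3):
    """Student: uses regular features only, trained on labels and teacher output.

    alpha is an assumed mixing weight: 0 means labels only, 1 means teacher only.
    For squared loss, mixing the two losses with weights (1 - alpha) and alpha
    has the same minimizer as regressing on the mixed target below.
    """
    teacher_pred = np.hstack([X_regular, X_privileged]) @ w_teacher
    target = (1.0 - alpha) * y + alpha * teacher_pred
    return fit_ridge(X_regular, target, lam)

def predict_student(w_student, X_regular):
    """Serving-time prediction: privileged features are not needed."""
    return X_regular @ w_student
```

At serving time only `predict_student` is called, so the privileged columns never have to be computed online; they influence the student only through the teacher's predictions during training.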
In this paper, we first study PFD empirically on three public ranking
datasets and an industrial-scale ranking problem derived from Amazon's logs. We
show that PFD outperforms several baselines (no-distillation,
pretraining-finetuning, self-distillation, and generalized distillation) on all
these datasets. Next, we analyze why and when PFD performs well via both
empirical ablation studies and theoretical analysis for linear models. Both
investigations uncover an interesting non-monotone behavior: as the predictive
power of a privileged feature increases, the performance of the resulting
student model initially increases but then decreases. We show the reason for
the later decreasing performance is that a very predictive privileged teacher
produces predictions with high variance, which lead to high-variance student
estimates and inferior testing performance.
Comment: Accepted by NeurIPS 202
PLAtE: A Large-scale Dataset for List Page Web Extraction
Recently, neural models have been leveraged to significantly improve the
performance of information extraction from semi-structured websites. However, a
barrier to continued progress is the small number of datasets large enough to
train these models. In this work, we introduce the PLAtE (Pages of Lists
Attribute Extraction) dataset as a challenging new web extraction task. PLAtE
focuses on shopping data, specifically extractions from product review pages
with multiple items. PLAtE encompasses two tasks: (1) finding
product-list segmentation boundaries and (2) extracting attributes for each
product. PLAtE is composed of 53,905 items from 6,810 pages, making it the
first large-scale list page web extraction dataset. We construct PLAtE by
collecting list pages from Common Crawl, then annotating them on Mechanical
Turk. Quantitative and qualitative analyses are performed to demonstrate PLAtE
has high-quality annotations. We establish strong baseline performance on PLAtE
with a SOTA model achieving an F1-score of 0.750 for attribute classification
and 0.915 for segmentation, indicating opportunities for future research
innovations in web extraction.