136 research outputs found
Generalized Delayed Feedback Model with Post-Click Information in Recommender Systems
Predicting conversion rate (e.g., the probability that a user will purchase
an item) is a fundamental problem in machine learning based recommender
systems. However, accurate conversion labels are revealed after a long delay,
which harms the timeliness of recommender systems. Previous literature
concentrates on utilizing early conversions to mitigate such a delayed feedback
problem. In this paper, we show that post-click user behaviors are also
informative to conversion rate prediction and can be used to improve
timeliness. We propose a generalized delayed feedback model (GDFM) that unifies
both post-click behaviors and early conversions as stochastic post-click
information, which could be utilized to train GDFM in a streaming manner
efficiently. Based on GDFM, we further establish a novel perspective that the
performance gap introduced by delayed feedback can be attributed to a temporal
gap and a sampling gap. Inspired by our analysis, we propose to measure the
quality of post-click information with a combination of temporal distance and
sample complexity. The training objective is re-weighted accordingly to
highlight informative and timely signals. We validate our analysis on public
datasets, and experimental performance confirms the effectiveness of our
method.
Comment: NeurIPS'2
Beyond Probability Partitions: Calibrating Neural Networks with Semantic Aware Grouping
Research has shown that deep networks tend to be overly optimistic about
their predictions, leading to an underestimation of prediction errors. Due to
the limited nature of data, existing studies have proposed various methods
based on model prediction probabilities to bin the data and evaluate
calibration error. We propose a more generalized definition of calibration
error called Partitioned Calibration Error (PCE), revealing that the key
difference among these calibration error metrics lies in how the data space is
partitioned. We put forth an intuitive proposition that an accurate model
should be calibrated across any partition, suggesting that the input space
partitioning can extend beyond just the partitioning of prediction
probabilities, and include partitions directly related to the input. Through
semantic-related partitioning functions, we demonstrate that the relationship
between model accuracy and calibration lies in the granularity of the
partitioning function. This highlights the importance of partitioning criteria
for training a calibrated and accurate model. To validate the aforementioned
analysis, we propose a method that involves jointly learning a semantic aware
grouping function based on deep model features and logits to partition the data
space into subsets. Subsequently, a separate calibration function is learned
for each subset. Experimental results demonstrate that our approach achieves
significant performance improvements across multiple datasets and network
architectures, thus highlighting the importance of the partitioning function
for calibration.
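The idea that calibration metrics differ mainly in how the data is partitioned can be illustrated with a minimal sketch. The abstract does not give the PCE formula, so the function below is an assumption: it takes the size-weighted mean of |accuracy - mean confidence| over the groups of any partition. The names `partitioned_calibration_error` and `confidence_bins` are hypothetical, and this is not the authors' implementation.

```python
import numpy as np

def partitioned_calibration_error(confidence, correct, group_ids):
    """Size-weighted mean of |accuracy - mean confidence| over a partition.

    Assumed form for illustration only; the abstract does not state the
    exact definition of PCE.
    """
    confidence = np.asarray(confidence, dtype=float)
    correct = np.asarray(correct, dtype=float)
    group_ids = np.asarray(group_ids)
    total = len(confidence)
    error = 0.0
    for g in np.unique(group_ids):
        mask = group_ids == g
        gap = abs(correct[mask].mean() - confidence[mask].mean())
        error += mask.sum() / total * gap
    return float(error)

def confidence_bins(confidence, n_bins=10):
    """Partition by predicted confidence using equal-width bins (ECE-style)."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]
    return np.digitize(np.asarray(confidence, dtype=float), edges)

# Four predictions, all with confidence 0.5; the first two are correct.
confidence = [0.5, 0.5, 0.5, 0.5]
correct = [1, 1, 0, 0]

# Partition by predicted probability: every sample falls in the same bin.
by_probability = partitioned_calibration_error(
    confidence, correct, confidence_bins(confidence))

# Partition by a (hypothetical) input-related grouping.
by_semantics = partitioned_calibration_error(confidence, correct, [0, 0, 1, 1])
```

In this toy case the probability-based partition reports no miscalibration, while the input-related partition exposes a gap in each group, which is the kind of difference between partitions that the abstract points to.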
The Capacity and Robustness Trade-off: Revisiting the Channel Independent Strategy for Multivariate Time Series Forecasting
Multivariate time series data comprises various channels of variables. The
multivariate forecasting models need to capture the relationship between the
channels to accurately predict future values. However, recently, there has been
an emergence of methods that employ the Channel Independent (CI) strategy.
These methods view multivariate time series data as separate univariate time
series and disregard the correlation between channels. Surprisingly, our
empirical results have shown that models trained with the CI strategy
outperform those trained with the Channel Dependent (CD) strategy, usually by a
significant margin. Nevertheless, the reasons behind this phenomenon have not
yet been thoroughly explored in the literature. This paper provides
comprehensive empirical and theoretical analyses of the characteristics of
multivariate time series datasets and the CI/CD strategy. Our results conclude
that the CD approach has higher capacity but often lacks robustness to
accurately predict distributionally drifted time series. In contrast, the CI
approach trades capacity for robust prediction. Practical measures inspired by
these analyses are proposed to address the capacity and robustness dilemma,
including a modified CD method called Predict Residuals with Regularization
(PRReg) that can surpass the CI strategy. We hope our findings can raise
awareness among researchers about the characteristics of multivariate time
series and inspire the construction of better forecasting models.
Comment: under review
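The difference between the two strategies can be sketched with plain least-squares forecasters. Using a linear model is an assumption made for brevity, and the function names are hypothetical; the abstract does not specify the architectures studied, and PRReg is not shown here.

```python
import numpy as np

def make_windows(series, lookback, horizon):
    """Slice a (time, channels) array into input and target windows."""
    n = series.shape[0] - lookback - horizon + 1
    X = np.stack([series[i:i + lookback] for i in range(n)])
    Y = np.stack([series[i + lookback:i + lookback + horizon] for i in range(n)])
    return X, Y  # shapes: (n, lookback, C) and (n, horizon, C)

def fit_cd(X, Y):
    """Channel Dependent: one map from all channels' history to all futures."""
    n = X.shape[0]
    W, *_ = np.linalg.lstsq(X.reshape(n, -1), Y.reshape(n, -1), rcond=None)
    return W  # shape: (lookback * C, horizon * C)

def fit_ci(X, Y):
    """Channel Independent: one shared univariate map; each channel is a sample."""
    n, lookback, C = X.shape
    horizon = Y.shape[1]
    Xc = X.transpose(0, 2, 1).reshape(n * C, lookback)
    Yc = Y.transpose(0, 2, 1).reshape(n * C, horizon)
    W, *_ = np.linalg.lstsq(Xc, Yc, rcond=None)
    return W  # shape: (lookback, horizon)

rng = np.random.default_rng(0)
series = rng.normal(size=(50, 3))           # synthetic data: 50 steps, 3 channels
X, Y = make_windows(series, lookback=8, horizon=4)
W_cd = fit_cd(X, Y)
W_ci = fit_ci(X, Y)
```

Under these assumptions the CD model has `lookback * C * horizon * C` weights while the CI model has only `lookback * horizon`, a rough picture of the capacity difference the abstract describes.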
Few-Shot Learning with a Strong Teacher
Few-shot learning (FSL) aims to train a strong classifier using limited
labeled examples. Many existing works take the meta-learning approach, sampling
few-shot tasks in turn and optimizing the few-shot learner's performance on
classifying the query examples. In this paper, we point out two potential
weaknesses of this approach. First, the sampled query examples may not provide
sufficient supervision for the few-shot learner. Second, the effectiveness of
meta-learning diminishes sharply with increasing shots (i.e., the number of
training examples per class). To resolve these issues, we propose a novel
objective to directly train the few-shot learner to perform like a strong
classifier. Concretely, we associate each sampled few-shot task with a strong
classifier, which is learned with ample labeled examples. The strong classifier
has a better generalization ability and we use it to supervise the few-shot
learner. We present an efficient way to construct the strong classifier, making
our proposed objective an easily plug-and-play term to existing meta-learning
based FSL methods. We validate our approach in combination with many
representative meta-learning methods. On several benchmark datasets including
miniImageNet and tieredImageNet, our approach leads to a notable improvement
across a variety of tasks. More importantly, with our approach, meta-learning
based FSL methods can consistently outperform non-meta-learning based ones,
even in a many-shot setting, greatly strengthening their applicability.
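Supervising a few-shot learner with a stronger classifier can be sketched as an extra loss term on the query examples. The abstract does not state the exact form of the objective, so the KL-divergence term, the `weight` mixing coefficient, and the function names below are assumptions in the style of standard knowledge distillation.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over the last axis."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def teacher_supervised_loss(student_logits, teacher_logits, labels,
                            weight=0.5, temperature=1.0):
    """Mix cross-entropy on query labels with KL(teacher || student).

    Assumed form for illustration; not taken from the paper.
    """
    eps = 1e-12
    labels = np.asarray(labels)
    p_student = softmax(student_logits, temperature)
    p_teacher = softmax(teacher_logits, temperature)
    kl = np.sum(p_teacher * (np.log(p_teacher + eps) - np.log(p_student + eps)),
                axis=-1)
    hard = softmax(student_logits)[np.arange(len(labels)), labels]
    ce = -np.log(hard + eps)
    return float(np.mean((1.0 - weight) * ce + weight * kl))

# Two query examples, two classes.
student = np.array([[2.0, 0.0], [0.0, 2.0]])
teacher = np.array([[0.0, 2.0], [2.0, 0.0]])
labels = [0, 1]

agree = teacher_supervised_loss(student, student, labels, weight=1.0)
disagree = teacher_supervised_loss(student, teacher, labels, weight=1.0)
```

Because the term is simply added to the per-task loss, it could in principle be attached to an existing meta-learning objective, which matches the plug-and-play framing in the abstract.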
Unlocking the Transferability of Tokens in Deep Models for Tabular Data
Fine-tuning a pre-trained deep neural network has become a successful
paradigm in various machine learning tasks. However, such a paradigm becomes
particularly challenging with tabular data when there are discrepancies between
the feature sets of pre-trained models and the target tasks. In this paper, we
propose TabToken, a method that aims to enhance the quality of feature tokens
(i.e., embeddings of tabular features). TabToken allows for the utilization of
pre-trained models when the upstream and downstream tasks share overlapping
features, facilitating model fine-tuning even with limited training examples.
Specifically, we introduce a contrastive objective that regularizes the tokens,
capturing the semantics within and across features. During the pre-training
stage, the tokens are learned jointly with top-layer deep models such as
a transformer. In the downstream task, tokens of the shared features are kept
fixed while TabToken efficiently fine-tunes the remaining parts of the model.
TabToken not only enables knowledge transfer from a pre-trained model to tasks
with heterogeneous features, but also enhances the discriminative ability of
deep tabular models in standard classification and regression tasks.
A Model or 603 Exemplars: Towards Memory-Efficient Class-Incremental Learning
Real-world applications require the classification model to adapt to new
classes without forgetting old ones. Correspondingly, Class-Incremental
Learning (CIL) aims to train a model with limited memory size to meet this
requirement. Typical CIL methods tend to save representative exemplars from
former classes to resist forgetting, while recent works find that storing
models from history can substantially boost the performance. However, the
stored models are not counted into the memory budget, which implicitly results
in unfair comparisons. We find that when counting the model size into the total
budget and comparing methods with aligned memory size, saving models does not
consistently work, especially for the case with limited memory budgets. As a
result, we need to holistically evaluate different CIL methods at different
memory scales and simultaneously consider accuracy and memory size for
measurement. On the other hand, we dive deeply into the construction of the
memory buffer for memory efficiency. By analyzing the effect of different
layers in the network, we find that shallow and deep layers have different
characteristics in CIL. Motivated by this, we propose a simple yet effective
baseline, denoted as MEMO for Memory-efficient Expandable MOdel. MEMO extends
specialized layers based on the shared generalized representations, efficiently
extracting diverse representations with modest cost and maintaining
representative exemplars. Extensive experiments on benchmark datasets validate
MEMO's competitive performance. Code is available at:
https://github.com/wangkiw/ICLR23-MEMO
Comment: Accepted to ICLR 2023 as a Spotlight Presentation.
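Counting a stored model against the memory budget amounts to a unit conversion from parameters to exemplars. The sketch below is illustrative only: the function name is hypothetical, and the backbone size (463,504 float32 parameters) and exemplar size (a 32x32 RGB image stored as one byte per value) are assumptions chosen for the example, not figures given in the abstract.

```python
def exemplar_equivalents(num_params, bytes_per_param, exemplar_bytes):
    """Number of whole exemplars that fit in the memory taken by one model."""
    return (num_params * bytes_per_param) // exemplar_bytes

# Assumed values for illustration (not stated in the abstract).
backbone_params = 463_504          # hypothetical small ResNet-style backbone
image_bytes = 3 * 32 * 32          # 32x32 RGB image, one byte per value

equivalent = exemplar_equivalents(backbone_params, 4, image_bytes)
```

Under these assumptions, one stored backbone costs about as much memory as 603 exemplars, so a method that keeps an extra model should be compared against a baseline that is given that many additional exemplars.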