Feature Selection Methods for Uplift Modeling
Uplift modeling is a predictive modeling technique that estimates the
user-level incremental effect of a treatment using machine learning models. It
is often used for targeting promotions and advertisements, as well as for the
personalization of product offerings. In these applications, there are often
hundreds of features available to build such models. Keeping all the features
in a model can be costly and inefficient. Feature selection is an essential
step in the modeling process for multiple reasons: improving the estimation
accuracy by eliminating irrelevant features, accelerating model training and
prediction speed, reducing the monitoring and maintenance workload for the
feature data pipeline, and providing better model interpretation and
diagnostics capabilities. However, feature selection methods for uplift
modeling have rarely been discussed in the literature. Although there are various feature
selection methods for standard machine learning models, we will demonstrate
that those methods are sub-optimal for solving the feature selection problem
for uplift modeling. To address this problem, we introduce a set of feature
selection methods designed specifically for uplift modeling, including both
filter methods and embedded methods. To evaluate the effectiveness of the
proposed feature selection methods, we use different uplift models and measure
the accuracy of each model with different numbers of selected features. We use
both synthetic and real data to conduct these experiments. We also implemented
the proposed filter methods in an open source Python package (CausalML).
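To illustrate why a filter method designed for uplift modeling differs from a standard one, here is a minimal sketch, not the CausalML implementation: the binning scheme, the scoring rule, and the function name are all assumptions. The idea is to score a feature by how much the estimated uplift varies across its value range, rather than by how well it predicts the outcome itself.

```python
import numpy as np

def uplift_filter_score(x, treatment, y, n_bins=10):
    """Illustrative uplift filter score (an assumption, not CausalML's API).

    Bins the feature into quantiles, computes the treatment-vs-control
    outcome gap (uplift) in each bin, and returns the population-weighted
    variance of the per-bin uplift around the overall uplift. Features
    whose uplift varies strongly across bins rank higher.
    """
    x = np.asarray(x, dtype=float)
    treatment = np.asarray(treatment).astype(bool)
    y = np.asarray(y, dtype=float)

    # Overall uplift: mean outcome difference between the two arms.
    overall = y[treatment].mean() - y[~treatment].mean()

    # Quantile bin edges; assign each sample to a bin index in [0, n_bins).
    edges = np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1))
    bins = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, n_bins - 1)

    score = 0.0
    for b in range(n_bins):
        mask = bins == b
        in_t, in_c = mask & treatment, mask & ~treatment
        if in_t.sum() == 0 or in_c.sum() == 0:
            continue  # skip bins missing one of the arms
        uplift_b = y[in_t].mean() - y[in_c].mean()
        score += mask.mean() * (uplift_b - overall) ** 2
    return score
```

A feature that strongly predicts the outcome but has a constant treatment effect scores near zero here, which is exactly the behavior a standard-model filter (e.g., one based on outcome correlation) would miss.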
Exploring uplift modeling with high class imbalance
Uplift modeling refers to individual-level causal inference. Existing research on the topic ignores one prevalent and important aspect: high class imbalance. For instance, in online environments uplift modeling is used to optimally target ads and discounts, but very few users ever end up clicking an ad or buying. One common approach to dealing with imbalance in classification is undersampling the dataset. In this work, we show how undersampling can be extended to uplift modeling. We propose four undersampling methods for uplift modeling. We compare the proposed methods empirically and show when some methods have a tendency to break down. One key observation is that accounting for the imbalance is particularly important for uplift random forests, which explains the poor performance of the model in earlier works. Undersampling is also crucial for class-variable-transformation-based models.
Peer reviewed
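One minimal way to sketch how undersampling can be extended to uplift data (this is an illustrative assumption, not one of the paper's four methods; the per-arm stratification, the `rate` parameter, and the function name are all mine): drop negatives at the same rate in the treatment and the control arm, so the two arms remain comparable after sampling.

```python
import numpy as np

def stratified_undersample(treatment, y, rate, rng=None):
    """Illustrative undersampling for uplift data (an assumption, not
    one of the paper's methods).

    Keeps every positive (y == 1) and a fraction `rate` of the
    negatives (y == 0), sampled separately within the treatment arm
    and the control arm so both arms are thinned identically.
    Returns the sorted row indices of the undersampled dataset.
    """
    rng = np.random.default_rng(rng)
    treatment = np.asarray(treatment).astype(bool)
    y = np.asarray(y)

    keep = []
    for arm in (treatment, ~treatment):
        pos = np.flatnonzero(arm & (y == 1))
        neg = np.flatnonzero(arm & (y == 0))
        n_keep = int(round(rate * len(neg)))
        keep.append(pos)
        keep.append(rng.choice(neg, size=n_keep, replace=False))
    return np.sort(np.concatenate(keep))
```

Since negatives are dropped, class probabilities fitted on the sampled data are biased upward; the standard undersampling correction recovers the original-scale probability as p = rate * p_s / (1 - p_s + rate * p_s), and for uplift this correction would be applied within each arm before taking the treatment-control difference.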