Click-through rate (CTR) prediction is one of the fundamental tasks for
online advertising and recommendation. While multi-layer perceptron (MLP)
serves as a core component in many deep CTR prediction models, it has been
widely recognized that applying a vanilla MLP network alone is inefficient at
learning multiplicative feature interactions. As such, many two-stream
interaction models (e.g., DeepFM and DCN) have been proposed by integrating an
MLP network with another dedicated network for enhanced CTR prediction. As the
MLP stream learns feature interactions implicitly, existing research focuses
mainly on enhancing explicit feature interactions in the complementary stream.
In contrast, our empirical study shows that a well-tuned two-stream MLP model
that simply combines two MLPs can achieve surprisingly strong performance,
a result not previously reported by existing work. Based on this
observation, we further propose feature gating and interaction aggregation
layers that can be easily plugged in to build an enhanced two-stream MLP model,
FinalMLP. In this way, the model not only enables differentiated feature inputs
but also effectively fuses stream-level interactions across the two streams. Our
evaluation results on four open benchmark datasets as well as an online A/B
test in our industrial system show that FinalMLP achieves better performance
than many sophisticated two-stream CTR models. Our source code will be
available at MindSpore/models.

Accepted by AAAI 2023. Code available at https://xpai.github.io/FinalML
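To make the core idea concrete, here is a minimal numpy sketch of a two-stream model that simply combines two independently parameterized MLPs. This is an illustrative assumption, not the paper's implementation: the layer sizes, ReLU activations, and fusion by summing the two streams' logits before a sigmoid are all choices made for this sketch.

```python
import numpy as np

def mlp_forward(x, weights):
    """Forward pass through an MLP; ReLU on hidden layers, linear output."""
    h = x
    for i, (W, b) in enumerate(weights):
        h = h @ W + b
        if i < len(weights) - 1:
            h = np.maximum(h, 0.0)  # ReLU on hidden layers only
    return h

def two_stream_mlp(x, weights1, weights2):
    """Combine two independent MLP streams by summing their logits,
    then apply a sigmoid to produce a CTR estimate in (0, 1)."""
    logit = mlp_forward(x, weights1) + mlp_forward(x, weights2)
    return 1.0 / (1.0 + np.exp(-logit))

# Hypothetical setup: random weights for two streams with different widths.
rng = np.random.default_rng(0)
d = 8  # assumed input feature dimension

def make_weights(dims):
    """Random (W, b) pairs for consecutive layer sizes in `dims`."""
    return [(rng.normal(size=(a, b)) * 0.1, np.zeros(b))
            for a, b in zip(dims[:-1], dims[1:])]

x = rng.normal(size=(4, d))  # a batch of 4 feature vectors
p = two_stream_mlp(x, make_weights([d, 16, 1]), make_weights([d, 32, 1]))
```

Each stream here sees the same input; the feature gating layer proposed in the paper would instead feed differentiated inputs to the two streams, and the interaction aggregation layer would replace the simple logit sum with a learned stream-level fusion.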