Pre-training of large models has become prevalent, driven by the ever-growing
volume of user-generated content across many machine learning applications. It
is widely recognized that learning contextual knowledge from datasets that
capture user-content interactions plays a vital role in downstream tasks.
Although several studies have attempted to learn contextual knowledge via
pre-training, finding an optimal training objective and strategy for this type
of task remains challenging. In this work, we contend that there are two
distinct aspects of contextual knowledge, namely the user-side and the
content-side, for datasets where user-content interactions can be represented as
a bipartite graph. To learn contextual knowledge, we propose a pre-training
method that learns a bi-directional mapping between the spaces of the user-side
and the content-side. We formulate the training objective as a contrastive learning
task and propose a dual-Transformer architecture to encode the contextual
knowledge. We evaluate the proposed method on the recommendation task.
Empirical studies demonstrate that it outperforms all baselines by significant
margins.
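
To make the training objective concrete, the following is a minimal sketch of a
bi-directional contrastive loss between the outputs of the two encoders,
assuming a symmetric InfoNCE-style formulation with in-batch negatives; the
function name, the temperature hyperparameter, and the use of in-batch
negatives are illustrative assumptions rather than details specified above.

```python
import torch
import torch.nn.functional as F

def bidirectional_contrastive_loss(user_emb, content_emb, temperature=0.1):
    """Symmetric InfoNCE-style loss over a batch of aligned pairs.

    user_emb, content_emb: (batch, dim) tensors produced by the two
    Transformer encoders; row i of each corresponds to the same observed
    user-content interaction, and the other rows in the batch serve as
    negatives. (Illustrative sketch; the temperature value is an assumption.)
    """
    user_emb = F.normalize(user_emb, dim=-1)
    content_emb = F.normalize(content_emb, dim=-1)
    # Pairwise cosine similarities scaled by temperature: (batch, batch)
    logits = user_emb @ content_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)
    # User -> content and content -> user directions of the mapping
    loss_u2c = F.cross_entropy(logits, targets)
    loss_c2u = F.cross_entropy(logits.t(), targets)
    return 0.5 * (loss_u2c + loss_c2u)
```

Under this sketch, minimizing the loss pulls each user representation toward
the representation of the content it interacted with and away from the other
content in the batch, and vice versa, realizing a bi-directional mapping
between the user-side and content-side spaces.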