MoCoUTRL: a momentum contrastive framework for unsupervised text representation learning

Abstract

This paper presents MoCoUTRL, a Momentum Contrastive Framework for Unsupervised Text Representation Learning. The model improves on the contrastive learning algorithms that have recently become popular in natural language processing (NLP) in two respects. First, MoCoUTRL employs contrastive learning objectives at multiple semantic granularities, enabling a more comprehensive understanding of the semantic features of samples. Second, MoCoUTRL uses a dynamic dictionary as an approximate ground-truth representation for each token, providing pseudo labels for token-level contrastive learning. MoCoUTRL can turn pre-trained language models (PLMs), and even large-scale language models (LLMs), into plug-and-play semantic feature extractors that fuel multiple downstream tasks. Experimental results on several publicly available datasets, together with further theoretical analysis, validate the effectiveness and interpretability of the proposed method.
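The abstract does not give the exact update rule or loss for the dynamic dictionary, so the following is only a minimal sketch of how a per-token dictionary could supply pseudo labels for token-level contrastive learning. It assumes that each dictionary entry is an exponential moving average of momentum-encoder outputs for one token id (MoCo itself applies momentum to encoder weights, not to stored entries) and that the loss is an InfoNCE objective in which the token's own entry is the positive and all other entries are negatives. The names `TokenDictionary`, `token_contrastive_loss`, `momentum` and `tau` are hypothetical and not taken from the paper.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale to unit length so that dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


class TokenDictionary:
    """Hypothetical dynamic dictionary: one slowly moving embedding per token id.

    Assumption: each entry is an exponential moving average of (normalized)
    momentum-encoder outputs for that token, re-normalized after every update.
    """

    def __init__(self, momentum: float = 0.99) -> None:
        self.momentum = momentum
        self.entries: Dict[int, Vector] = {}

    def update(self, token_id: int, key: Sequence[float]) -> None:
        key = l2_normalize(key)
        if token_id not in self.entries:
            self.entries[token_id] = key  # first sighting: store the key as-is
        else:
            old = self.entries[token_id]
            m = self.momentum
            # Keep most of the old entry so the pseudo label drifts slowly.
            self.entries[token_id] = l2_normalize(
                [m * o + (1.0 - m) * k for o, k in zip(old, key)]
            )


def token_contrastive_loss(
    query: Sequence[float], token_id: int, dictionary: TokenDictionary, tau: float = 0.07
) -> float:
    """InfoNCE over dictionary entries; the token's own entry is the pseudo label."""
    q = l2_normalize(query)
    ids = list(dictionary.entries)
    logits = [dot(q, dictionary.entries[i]) / tau for i in ids]
    pos = ids.index(token_id)  # raises ValueError if the token was never stored
    max_logit = max(logits)  # subtract the max for numerical stability
    log_denom = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
    return log_denom - logits[pos]  # -log softmax probability of the positive


if __name__ == "__main__":
    d = TokenDictionary(momentum=0.9)
    d.update(7, [1.0, 0.0])
    d.update(42, [0.0, 1.0])
    print(token_contrastive_loss([0.95, 0.05], 7, d))   # small: query matches token 7
    print(token_contrastive_loss([0.95, 0.05], 42, d))  # large: query far from token 42
```

The high momentum keeps each entry stable across updates, which is what lets it act as an approximate target rather than a noisy per-batch key; in a real model the vectors would come from a PLM's token outputs instead of hand-written lists.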
