The recent success of SimCSE has greatly advanced state-of-the-art sentence
representations. However, the original formulation of SimCSE does not fully
exploit the potential of hard negative samples in contrastive learning. This
study introduces an unsupervised contrastive learning framework that combines
SimCSE with hard negative mining, aiming to enhance the quality of sentence
embeddings. The proposed focal-InfoNCE function introduces self-paced
modulation terms in the contrastive objective, downweighting the loss
associated with easy negatives and encouraging the model to focus on hard
negatives. Experiments on various STS benchmarks show that our method
improves sentence embeddings in terms of Spearman's correlation as well as
representation alignment and uniformity.

Comment: Findings of EMNLP 202
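
The idea of downweighting easy negatives in a contrastive objective can be illustrated with a minimal sketch. The abstract does not give the exact focal-InfoNCE formula, so the modulation used here (`hardness_weight`, with a hypothetical exponent `gamma`) is an assumption chosen for illustration and not the authors' formulation; `tau` is the usual temperature.

```python
import math


def hardness_weight(sim, gamma=1.0):
    """Hypothetical modulation term (an assumption, not the paper's formula).

    Maps a cosine similarity in [-1, 1] to a weight in [0, 1], so that
    easy negatives (low similarity) receive a small weight and hard
    negatives (high similarity) receive a weight close to 1.
    """
    return ((sim + 1.0) / 2.0) ** gamma


def info_nce(sim_pos, sim_negs, tau=0.05, gamma=None):
    """InfoNCE loss for one anchor.

    sim_pos:  cosine similarity between the anchor and its positive.
    sim_negs: cosine similarities between the anchor and its negatives.
    gamma:    None gives plain InfoNCE; a number enables the illustrative
              modulation, which scales each negative's term by its weight.
    """
    pos = math.exp(sim_pos / tau)
    neg = 0.0
    for s in sim_negs:
        w = 1.0 if gamma is None else hardness_weight(s, gamma)
        neg += w * math.exp(s / tau)
    return -math.log(pos / (pos + neg))


# One hard negative (0.7) and one easy negative (-0.2).
plain = info_nce(0.8, [0.7, -0.2])
modulated = info_nce(0.8, [0.7, -0.2], gamma=1.0)
```

In this sketch each weight is at most 1, so the modulated loss never exceeds the plain InfoNCE loss, and the share contributed by a negative shrinks as its similarity to the anchor decreases.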