Sentence Representation Learning (SRL) is a fundamental task in Natural
Language Processing (NLP), with Contrastive learning of Sentence Embeddings
(CSE) as the mainstream technique due to its superior performance. An
intriguing phenomenon in CSE is the significant performance gap between
supervised and unsupervised methods, even when their sentence encoder and loss
function are the same. Previous works attribute this performance gap to
differences in two representation properties (alignment and uniformity).
However, alignment and uniformity only measure the results, which means they
cannot answer "What happens during the training process that leads to the
performance gap?" and "How can the performance gap be narrowed?". In this
paper, we conduct empirical experiments to answer these "What" and "How"
questions. We first answer the "What" question by thoroughly comparing the
behavior of supervised and unsupervised CSE during their respective training
processes. From the comparison, we observe a significant difference in fitting
difficulty. Thus, we introduce a metric called Fitting Difficulty Increment
(FDI) to measure the fitting difficulty gap between the evaluation dataset and
the held-out training dataset, and use this metric to answer the "What"
question. Then, based on the insights gained from the "What" question, we
tackle the "How" question by increasing the fitting difficulty of the training
dataset. We achieve this by leveraging the In-Context Learning (ICL) capability
of the Large Language Model (LLM) to generate data that simulates complex
patterns. By utilizing the hierarchical patterns in the LLM-generated data, we
effectively narrow the gap between supervised and unsupervised CSE.

Comment: work in progress
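As a loose illustration of the FDI idea sketched in this abstract (the paper body gives the precise definition): if "fitting difficulty" of a dataset is taken to be the average loss a trained encoder attains on it, then FDI can be read as the increment in difficulty from the held-out training split to the evaluation split. The function names and the loss-difference formula below are assumptions for illustration, not the paper's definition.

```python
# Hypothetical sketch of a Fitting Difficulty Increment (FDI)-style metric.
# Assumption: "fitting difficulty" of a dataset is proxied by the mean
# per-example loss a trained model attains on it, and FDI is the increment
# from the held-out training split to the evaluation split.

def avg_loss(losses):
    """Mean per-example loss on a dataset (proxy for fitting difficulty)."""
    return sum(losses) / len(losses)

def fitting_difficulty_increment(eval_losses, heldout_train_losses):
    """FDI > 0 means the evaluation set is harder to fit than the held-out
    training set; a larger value means a larger difficulty gap."""
    return avg_loss(eval_losses) - avg_loss(heldout_train_losses)

# Toy example: per-example losses from some trained sentence encoder.
heldout_train = [0.12, 0.10, 0.15, 0.11]
evaluation = [0.40, 0.35, 0.52, 0.45]
print(fitting_difficulty_increment(evaluation, heldout_train))  # positive gap
```

Under this reading, a large FDI would indicate that the patterns the encoder fit during training transfer poorly to the evaluation data, which is the gap the LLM-generated data is meant to shrink.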