Vector representations of natural language are ubiquitous in search
applications. Recently, various contrastive learning methods have been
proposed to learn textual representations from unlabelled data: they maximize
alignment between minimally perturbed embeddings of the same text while
encouraging embeddings to distribute uniformly across the broader corpus.
In contrast, we propose maximizing alignment between texts and a composition of
their phrasal constituents. We consider several realizations of this objective
and analyze their impact on the resulting representations. Experimental results
on semantic textual similarity tasks show improvements over baselines that are
comparable with state-of-the-art approaches. Moreover, this work is the first
to achieve such gains without incurring the cost of auxiliary training
objectives or additional network parameters.
Comment: ACL 202
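
To make the core idea of the abstract concrete, the sketch below shows one plausible (not the paper's exact) realization of the proposed objective: an InfoNCE-style contrastive loss in which each text embedding is aligned with a composition of its phrase embeddings, with other texts in the batch acting as negatives. The mean-pooling composition, temperature, and toy tensors are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def phrasal_composition_loss(text_emb, phrase_embs, temperature=0.05):
    """Contrastive alignment between texts and compositions of their phrases.

    text_emb:    (batch, dim)             embeddings of full texts
    phrase_embs: (batch, n_phrases, dim)  embeddings of each text's phrases
    """
    # Compose constituents by averaging (one possible realization; assumed here).
    composed = phrase_embs.mean(dim=1)                    # (batch, dim)

    # Cosine similarities between every text and every composition.
    text_emb = F.normalize(text_emb, dim=-1)
    composed = F.normalize(composed, dim=-1)
    sims = text_emb @ composed.T / temperature            # (batch, batch)

    # Diagonal entries are the positive (text, own composition) pairs;
    # off-diagonal entries serve as in-batch negatives.
    targets = torch.arange(text_emb.size(0), device=text_emb.device)
    return F.cross_entropy(sims, targets)

# Toy usage with random tensors standing in for an encoder's output.
texts = torch.randn(8, 256)
phrases = torch.randn(8, 3, 256)
print(phrasal_composition_loss(texts, phrases))
```

Because the positives are built from the text's own constituents rather than from perturbed copies or an auxiliary network, a loss of this shape adds no extra training objectives or parameters, which is consistent with the abstract's claim.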