In this study, we present an investigation into the anisotropy dynamics and
intrinsic dimension of embeddings in transformer architectures, focusing on the
dichotomy between encoders and decoders. Our findings reveal that the
anisotropy profile in transformer decoders exhibits a distinct bell-shaped
curve, with the highest anisotropy concentrations in the middle layers. This
pattern diverges from the more uniformly distributed anisotropy observed in
encoders. In addition, we found that the intrinsic dimension of embeddings
increases in the initial phases of training, indicating an expansion into
higher-dimensional space. Which is then followed by a compression phase towards
the end of training with dimensionality decrease, suggesting a refinement into
more compact representations. Our results provide fresh insights to the
understanding of encoders and decoders embedding properties.Comment: Submitted to EACL-202