The Shape of Learning: Anisotropy and Intrinsic Dimensions in Transformer-Based Models

Dimitrov, Denis; Goncharova, Elizaveta; Kuznetsov, Andrey; Mikhalchuk, Matvey; Oseledets, Ivan; Razzhigaev, Anton

Authors: Denis Dimitrov, Elizaveta Goncharova, Andrey Kuznetsov, Matvey Mikhalchuk, Ivan Oseledets, Anton Razzhigaev
Publication date: 10 November 2023

Abstract

In this study, we present an investigation into the anisotropy dynamics and intrinsic dimension of embeddings in transformer architectures, focusing on the dichotomy between encoders and decoders. Our findings reveal that the anisotropy profile in transformer decoders exhibits a distinct bell-shaped curve, with the highest anisotropy concentrations in the middle layers. This pattern diverges from the more uniformly distributed anisotropy observed in encoders. In addition, we found that the intrinsic dimension of embeddings increases in the initial phases of training, indicating an expansion into higher-dimensional space. This is followed by a compression phase towards the end of training, in which dimensionality decreases, suggesting a refinement into more compact representations. Our results provide fresh insights into the embedding properties of encoders and decoders.

Comment: Submitted to EACL-202
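
The abstract does not say which anisotropy measure or intrinsic-dimension estimator the authors use. As a rough illustration only, the sketch below assumes two common choices: anisotropy as the share of variance carried by the largest singular direction of the centered embeddings, and intrinsic dimension from a two-nearest-neighbour maximum-likelihood estimate. The function names `anisotropy` and `two_nn_intrinsic_dimension` are hypothetical and are not taken from the paper.

```python
import numpy as np


def anisotropy(embeddings):
    """Share of total variance along the top singular direction (assumed measure)."""
    X = np.asarray(embeddings, dtype=float)
    X = X - X.mean(axis=0, keepdims=True)      # center the point cloud
    s = np.linalg.svd(X, compute_uv=False)     # singular values, descending
    return float(s[0] ** 2 / np.sum(s ** 2))


def two_nn_intrinsic_dimension(embeddings):
    """Two-nearest-neighbour MLE of intrinsic dimension (assumed estimator)."""
    X = np.asarray(embeddings, dtype=float)
    sq = np.sum(X ** 2, axis=1)
    # Squared pairwise distances via the Gram matrix, clipped at zero.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    np.fill_diagonal(d2, np.inf)               # exclude self-distances
    nearest = np.sqrt(np.partition(d2, 1, axis=1)[:, :2])
    r1, r2 = nearest[:, 0], nearest[:, 1]
    keep = r1 > 0                              # skip exact duplicate points
    mu = r2[keep] / r1[keep]                   # ratio of 2nd to 1st neighbour distance
    return float(keep.sum() / np.sum(np.log(mu)))
```

Under these assumptions, applying both functions to the embeddings of each layer at successive training checkpoints would trace the kind of per-layer anisotropy profile and the rise-then-fall in intrinsic dimension that the abstract describes.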

Available Versions

arXiv.org e-Print Archive

oai:arXiv.org:2311.05928

Last time updated on 10/02/2024