The Shape of Learning: Anisotropy and Intrinsic Dimensions in Transformer-Based Models

Authors: Dimitrov Denis
Goncharova Elizaveta
Kuznetsov Andrey
Mikhalchuk Matvey
Oseledets Ivan
Razzhigaev Anton
Publication date: 10/11/2023

In this study, we investigate the anisotropy dynamics and intrinsic dimension of embeddings in transformer architectures, focusing on the dichotomy between encoders and decoders. Our findings reveal that the anisotropy profile of transformer decoders follows a distinct bell-shaped curve, with the highest anisotropy in the middle layers. This pattern diverges from the more uniformly distributed anisotropy observed in encoders. In addition, we found that the intrinsic dimension of embeddings increases in the initial phases of training, indicating an expansion into higher-dimensional space. This expansion is followed, toward the end of training, by a compression phase in which dimensionality decreases, suggesting a refinement into more compact representations. Our results provide fresh insights into the embedding properties of encoders and decoders.
Comment: Submitted to EACL-202
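The two quantities tracked in this abstract can be estimated from a matrix of token embeddings taken at one layer (one row per token). The sketch below is a minimal illustration under stated assumptions, not the authors' code: anisotropy is scored as the share of variance captured by the top singular direction of the mean-centered embeddings, and intrinsic dimension is estimated with the TwoNN maximum-likelihood estimator of Facco et al. (2017). The abstract does not confirm that either is the paper's exact metric, and the function names `anisotropy` and `two_nn_dimension` are hypothetical.

```python
import numpy as np


def anisotropy(embeddings: np.ndarray) -> float:
    """Share of variance along the top singular direction, in (0, 1].

    Assumed metric for illustration: sigma_1^2 / sum_i sigma_i^2 of the
    mean-centered embedding matrix (rows = tokens, columns = features).
    Values near 1 mean the embeddings are stretched along one direction.
    """
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    return float(singular_values[0] ** 2 / np.sum(singular_values ** 2))


def two_nn_dimension(points: np.ndarray) -> float:
    """TwoNN maximum-likelihood estimate of intrinsic dimension.

    For each point, mu = r2 / r1 is the ratio of its second- to first-nearest
    neighbour distance; under the TwoNN model mu is Pareto-distributed with
    shape d, so the MLE is d = N / sum(log mu). Assumes no duplicate points.
    """
    sq_norms = np.sum(points ** 2, axis=1)
    # Pairwise squared Euclidean distances; clip tiny negatives from round-off.
    d2 = np.maximum(sq_norms[:, None] + sq_norms[None, :] - 2.0 * points @ points.T, 0.0)
    np.fill_diagonal(d2, np.inf)  # exclude each point from its own neighbours
    nearest_two = np.sqrt(np.sort(d2, axis=1)[:, :2])
    mu = nearest_two[:, 1] / nearest_two[:, 0]
    return float(len(points) / np.sum(np.log(mu)))


# Usage on synthetic data (stand-ins for one layer's hidden states).
rng = np.random.default_rng(0)
isotropic = rng.normal(size=(2000, 50))
direction = rng.normal(size=(1, 50))
direction /= np.linalg.norm(direction)
stretched = (10.0 * rng.normal(size=(2000, 1))) @ direction + 0.1 * rng.normal(size=(2000, 50))
plane = np.zeros((1000, 10))
plane[:, :2] = rng.uniform(size=(1000, 2))  # a 2-D sheet embedded in 10-D space

print(f"anisotropy, isotropic cloud: {anisotropy(isotropic):.3f}")
print(f"anisotropy, stretched cloud: {anisotropy(stretched):.3f}")
print(f"TwoNN dimension of 2-D sheet: {two_nn_dimension(plane):.2f}")
```

Computing `anisotropy` on each layer's hidden states would give a layer-wise profile of the kind the abstract describes, and computing `two_nn_dimension` at successive training checkpoints would trace expansion and compression phases; both are illustrative uses, not the paper's protocol. The pairwise-distance matrix grows as N², so large token sets would need subsampling.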

arXiv.org e-Print Archive

MEKER: Memory Efficient Knowledge Embedding Representation for Link Prediction and Question Answering

Authors: Chekalina Viktoriia
Frolov Evgeny
Panchenko Alexander
Razzhigaev Anton
Sayapin Albert
Publication date: 24/05/2022

Knowledge Graphs (KGs) are symbolically structured stores of facts. KG embeddings provide concise representations for NLP tasks that require implicit information about the real world. However, the KGs that are useful in real NLP tasks are enormous, and building embeddings over them incurs high memory costs. We represent a KG as a 3rd-order binary tensor and move beyond the standard CP decomposition by using a data-specific generalized version of it. This generalization of the standard CP-ALS algorithm allows optimization gradients to be obtained without backpropagation, which reduces the memory needed in training while providing computational benefits. We propose MEKER, a memory-efficient KG embedding model, which yields SOTA-comparable performance on link prediction tasks and KG-based Question Answering.
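The tensor view in this abstract can be made concrete with a toy example: each known triple (subject, relation, object) sets one entry of a binary entities × relations × entities tensor to 1, and a rank-R CP decomposition assigns each entity (as subject and as object) and each relation an R-dimensional vector. The sketch below is a minimal illustration that assumes plain least-squares CP-ALS on a small dense tensor. It is the standard algorithm that the abstract says MEKER generalizes, not MEKER's data-specific generalized CP, and it makes no attempt at MEKER's memory savings; the triples and the names `cp_als` and `score` are hypothetical. Each factor update is a closed-form least-squares solve, so no backpropagation is involved.

```python
import numpy as np


def cp_als(X: np.ndarray, rank: int, n_iter: int = 50, seed: int = 0):
    """Standard CP-ALS for a 3rd-order tensor X ~ sum_r A[:, r] o B[:, r] o C[:, r].

    Returns the factors and the relative reconstruction error after each sweep.
    """
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A = rng.normal(size=(I, rank))  # subject-entity factors
    B = rng.normal(size=(J, rank))  # relation factors
    C = rng.normal(size=(K, rank))  # object-entity factors
    norm_x = np.linalg.norm(X)
    errors = []
    for _ in range(n_iter):
        # Each update solves the least-squares normal equations for one factor
        # with the other two fixed: MTTKRP @ pinv(Hadamard product of Grams).
        A = np.einsum("ijk,jr,kr->ir", X, B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = np.einsum("ijk,ir,kr->jr", X, A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = np.einsum("ijk,ir,jr->kr", X, A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
        approx = np.einsum("ir,jr,kr->ijk", A, B, C)
        errors.append(float(np.linalg.norm(X - approx) / norm_x))
    return A, B, C, errors


def score(A: np.ndarray, B: np.ndarray, C: np.ndarray, s: int, r: int, o: int) -> float:
    """CP score of triple (s, r, o): the reconstructed tensor entry."""
    return float(np.sum(A[s] * B[r] * C[o]))


# Hypothetical toy KG: 4 entities, 2 relations, triples as (subject, relation, object) ids.
triples = [(0, 0, 1), (1, 0, 2), (2, 1, 3), (0, 1, 3)]
X = np.zeros((4, 2, 4))
for s, r, o in triples:
    X[s, r, o] = 1.0

A, B, C, errors = cp_als(X, rank=3)
# Link prediction for (entity 0, relation 0, ?): rank candidate objects by score.
candidate_scores = np.einsum("r,r,kr->k", A[0], B[0], C)
print("relative error per sweep (first, last):", errors[0], errors[-1])
print("objects ranked for (0, 0, ?):", np.argsort(-candidate_scores))
```

A dense tensor like `X` needs memory proportional to entities² × relations, so realistic KGs would require a sparse representation of the triples.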

arXiv.org e-Print Archive