Representation Of Lexical Stylistic Features In Language Models' Embedding Space

Apidianaki, Marianna; Callison-Burch, Chris; Lyu, Qing

Authors: Marianna Apidianaki, Chris Callison-Burch, Qing Lyu
Publication date: 29 May 2023

Abstract

The representation space built by pretrained Language Models (LMs) encodes rich information about words and their relationships (e.g., similarity, hypernymy/hyponymy, polysemy) as well as abstract semantic notions (e.g., intensity). In this paper, we demonstrate that lexical stylistic notions such as complexity, formality, and figurativeness can also be identified in this space. We show that it is possible to derive a vector representation for each of these stylistic notions from only a small number of seed text pairs. Using these vectors, we can characterize new texts in terms of these dimensions using simple calculations in the corresponding embedding space. We perform experiments on five datasets and find that static embeddings encode these features more accurately at the level of words and phrases, whereas contextualized LMs perform better on longer texts. The lower performance of contextualized representations at the word level is partially attributable to the anisotropy of their vector space, which can be corrected through techniques like standardization to further improve performance.

Comment: Accepted at *SEM 202
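The abstract does not spell out the "simple calculations" it refers to, so the following is only a minimal sketch of one plausible reading: a style vector is taken as the mean difference between the embeddings of seed pairs, and a new text is scored by projecting its embedding onto that vector. The toy random embeddings, the function names (`standardize`, `style_vector`, `style_score`), and the choice of a projection score are all assumptions made for illustration, not the authors' implementation. The optional standardization step z-scores each dimension, which is one common way to reduce the effect of anisotropy.

```python
import numpy as np


def standardize(embeddings, mean, std):
    """Z-score each dimension using precomputed statistics (assumed setup)."""
    return (embeddings - mean) / std


def style_vector(high, low):
    """Mean difference between paired embeddings, e.g. complex minus simple."""
    return (high - low).mean(axis=0)


def style_score(x, direction):
    """Projection of an embedding onto the unit-length style direction."""
    return float(x @ direction / np.linalg.norm(direction))


# Toy data standing in for real LM embeddings (hypothetical, 8 dimensions).
rng = np.random.default_rng(0)
dim = 8
shift = np.zeros(dim)
shift[0] = 2.0  # pretend dimension 0 carries the stylistic feature

# Five seed pairs: each "complex" item is its "simple" partner plus the shift.
simple_seeds = rng.normal(size=(5, dim))
complex_seeds = simple_seeds + shift + 0.1 * rng.normal(size=(5, dim))

# Standardization statistics estimated from all seed embeddings.
all_seeds = np.vstack([simple_seeds, complex_seeds])
mean, std = all_seeds.mean(axis=0), all_seeds.std(axis=0)

# Derive the style vector in the standardized space.
direction = style_vector(
    standardize(complex_seeds, mean, std),
    standardize(simple_seeds, mean, std),
)

# Score two unseen items that differ only along the stylistic dimension.
new_simple = rng.normal(size=dim)
new_complex = new_simple + shift
score_simple = style_score(standardize(new_simple, mean, std), direction)
score_complex = style_score(standardize(new_complex, mean, std), direction)

print(f"simple: {score_simple:.3f}  complex: {score_complex:.3f}")
```

In this toy setup the item shifted along the stylistic dimension receives the higher score. With real models, the embeddings would come from a static or contextualized encoder, and the standardization statistics would typically be estimated on a larger sample than the seed pairs alone.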

Available Versions

arXiv.org e-Print Archive

oai:arXiv.org:2305.18657

Last time updated on 02/06/2023