Text classification with semantically enriched word embeddings

Abstract

The recent breakthroughs in deep neural architectures across multiple machine learning fields have led to the widespread use of deep neural models. These learners are often applied as black-box models that ignore or insufficiently utilize a wealth of preexisting semantic information. In this study, we focus on the text classification task, investigating methods for augmenting the input to deep neural networks (DNNs) with semantic information. We extract semantics for the words in the preprocessed text from the WordNet semantic graph, in the form of weighted concept terms that form a semantic frequency vector. Concepts are selected via a variety of semantic disambiguation techniques, including a basic, a part-of-speech-based, and a semantic embedding projection method. Additionally, we consider a weight propagation mechanism that exploits semantic relationships in the concept graph and conveys a spreading activation component. We enrich word2vec embeddings with the resulting semantic vector through concatenation or replacement and apply the semantically augmented word embeddings on the classification task via a DNN. Experimental results over established datasets demonstrate that our approach of semantic augmentation in the input space boosts classification performance significantly, with concatenation offering the best performance. We also note additional interesting findings produced by our approach regarding the behavior of term frequency - inverse document frequency normalization on semantic vectors, along with the radical dimensionality reduction potential with negligible performance loss. © 2021 Cambridge University Press. All rights reserved

    Similar works