Semantic neologisms (SN) are defined as words that acquire a new word meaning
while maintaining their form. Given the nature of this kind of neologisms, the
task of identifying these new word meanings is currently performed manually by
specialists at observatories of neology. To detect SN in a semi-automatic way,
we developed a system that implements a combination of the following
strategies: topic modeling, keyword extraction, and word sense disambiguation.
The role of topic modeling is to detect the themes that are treated in the
input text. Themes within a text give clues about the particular meaning of the
words that are used, for example: viral has one meaning in the context of
computer science (CS) and another when talking about health. To extract
keywords, we used TextRank with POS tag filtering. With this method, we can
obtain relevant words that are already part of the Spanish lexicon. We use a
deep learning model to determine if a given keyword could have a new meaning.
Embeddings that are different from all the known meanings (or topics) indicate
that a word might be a valid SN candidate. In this study, we examine the
following word embedding models: Word2Vec, Sense2Vec, and FastText. The models
were trained with equivalent parameters using Wikipedia in Spanish as corpora.
Then we used a list of words and their concordances (obtained from our database
of neologisms) to show the different embeddings that each model yields.
Finally, we present a comparison of these outcomes with the concordances of
each word to show how we can determine if a word could be a valid candidate for
SN.Comment: 16 pages, 3 figure