Creación de corpus de palabras embebidas de tweets generados en Argentina (Creation of a corpus of word embeddings from tweets generated in Argentina)

Amor, Matias Nicolas; Cardoso, Alejandra Carolina; Monge, Agustina; Talamé, Maria Lorena

Authors: Matias Nicolas Amor, Alejandra Carolina Cardoso, Agustina Monge, Maria Lorena Talamé
Publication date: 13 December 2021
Publisher: Universidad Católica de Salta

Abstract

Text processing of any kind is a task of great interest to the scientific community. Twitter is one of the social networks where people most often express themselves freely, which makes it one of the main sources of textual data. Before any analysis can be performed, the texts must first be represented in a form that an algorithm can use. This paper describes the creation of a corpus of word representations (word embeddings) obtained from Twitter using Word2Vec. Although the sets of tweets used are not massive, they are considered sufficient as a first step toward building such a corpus. An important contribution of this work is a trained model that captures the idioms and colloquial expressions of Argentina and includes emojis and hashtags within the vector space.
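
As an illustration of the representation step mentioned in the abstract, the following is a minimal sketch of a tweet tokenizer that keeps hashtags and emojis as tokens of their own, so that they can later receive vectors alongside ordinary words. It is not the authors' preprocessing: the pattern `TOKEN_RE`, the function `tokenize_tweet`, the lowercasing, the URL filtering, and the emoji ranges are all assumptions made for this example. The output is a list of tokens per tweet, which is the kind of input Word2Vec implementations typically expect.

```python
import re

# Hypothetical token pattern (an assumption, not the paper's method).
# Alternatives are tried left to right at each position.
TOKEN_RE = re.compile(
    r"https?://\S+"                           # URLs (matched so they can be dropped)
    r"|[#@]\w+"                               # hashtags and mentions as single tokens
    r"|\w+"                                   # words, including accented letters
    r"|[\U0001F300-\U0001FAFF\u2600-\u27BF]"  # common emoji ranges, one emoji per token
)


def tokenize_tweet(text):
    """Split a tweet into lowercase tokens, keeping hashtags and emojis."""
    tokens = []
    for tok in TOKEN_RE.findall(text.lower()):
        if tok.startswith(("http://", "https://")):
            continue  # URLs carry little lexical information, so skip them
        tokens.append(tok)
    return tokens


if __name__ == "__main__":
    # Made-up example tweets, used only to show the shape of the output.
    tweets = [
        "Che, qué lindo día \U0001F60D #BuenosAires https://t.co/abc",
        "Alta fiesta anoche @amigo \U0001F525\U0001F525",
    ]
    corpus = [tokenize_tweet(t) for t in tweets]
    for sentence in corpus:
        print(sentence)
```

The emoji ranges above cover only part of Unicode, so a production pipeline would likely rely on a dedicated emoji list instead. Keeping the `#` prefix means that a hashtag and the plain word with the same spelling end up as distinct vocabulary entries, which is a design choice rather than a requirement.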

Available Versions

Portal de revistas científicas UCASAL (Universidad Católica de Salta)

oai:http://revistas.ucasal.edu...

Last time updated on 05/05/2022