Belgrade: Institute of Molecular Genetics and Genetic Engineering
Abstract
In this study, we investigated Long COVID from a user perspective using Twitter data. Before
analysis, the data were preprocessed to obtain the raw text of each tweet. The analysis began with
basic descriptive statistics and was then extended to identify characteristic periods for the
phenotypes based on dynamic timelines. We also explored the relationships between phenotypes,
as well as the interdependence between phenotypes and geolocation.
We analyzed a collection of approximately 1.9 million tweets posted between March 2020 and
March 2022. To focus on word phrases, extraneous elements such as mentions, emoticons, links,
and hashtags were removed, and the remaining text was lemmatized. To reduce the number of
distinct phenotypes under investigation and to simplify the presentation of results, the data
were grouped into five overarching categories: Cardiovascular, Respiratory, Daily Living,
Neurological and Mental Health, and Other.
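The cleaning and grouping steps described above can be sketched roughly as follows. This is a minimal illustration and not the pipeline used in the study: the regular expressions, the function names `clean_tweet` and `groups_in`, and the term-to-group map `PHENOTYPE_GROUPS` are all assumptions made for the example, and lemmatization is left out because it would normally be delegated to an NLP library.

```python
import re

# Patterns for the elements the abstract lists as removed.
# The exact expressions are assumptions; the abstract does not specify them.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")
# Rough emoji coverage only, not an exhaustive list.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")
# A few common ASCII emoticons such as :) ;-( =D
EMOTICON_RE = re.compile(r"(?<!\w)[:;=][-']?[()DPpOo/\\|](?!\w)")

# Hypothetical mapping from phenotype terms to the five groups;
# the study's real assignment of terms to groups is not given in the abstract.
PHENOTYPE_GROUPS = {
    "palpitation": "Cardiovascular",
    "cough": "Respiratory",
    "fatigue": "Daily Living",
    "anxiety": "Neurological and Mental Health",
}


def clean_tweet(text):
    """Strip links, mentions, hashtags, emoji and emoticons, then lowercase."""
    for pattern in (URL_RE, MENTION_RE, HASHTAG_RE, EMOJI_RE, EMOTICON_RE):
        text = pattern.sub(" ", text)
    # Collapse the whitespace left behind by the removals.
    return " ".join(text.lower().split())


def groups_in(text):
    """Return the sorted phenotype groups whose terms occur in a tweet."""
    tokens = set(re.findall(r"[a-z']+", clean_tweet(text)))
    return sorted({PHENOTYPE_GROUPS[t] for t in tokens if t in PHENOTYPE_GROUPS})
```

For example, `clean_tweet("@who Still so tired :( #LongCovid https://t.co/abc")` yields `"still so tired"`. Terms that match none of the listed groups would fall into the Other category.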
Among the words most commonly used by individuals describing their experiences during the
Long COVID period, “Ampicillin” was tweeted 125,295 times, “Suffer” 125,113 times, “Death”
121,156 times, and “Vaccine” 108,968 times. Certain phenotypes show distinct patterns of
emergence during this period, particularly those related to quality of life. On August 1, 2020,
the term “quality of life” was mentioned in only 223 tweets, whereas one year later, during the
same month, this phenotype appeared in 1,663 tweets.
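A timeline of this kind can be built by counting, per calendar month, the tweets that mention a given term. The sketch below assumes each tweet is available as a `(date, text)` pair and uses a hypothetical helper `monthly_mentions`; it illustrates the idea and is not the study's actual method.

```python
from collections import Counter
from datetime import date


def monthly_mentions(tweets, phrase):
    """Count tweets containing `phrase`, keyed by (year, month).

    `tweets` is an iterable of (posted_on, text) pairs, where posted_on
    is a datetime.date. Matching is a case-insensitive substring test.
    """
    phrase = phrase.lower()
    counts = Counter()
    for posted_on, text in tweets:
        if phrase in text.lower():
            counts[(posted_on.year, posted_on.month)] += 1
    return counts


# Tiny made-up sample, for illustration only.
sample = [
    (date(2020, 8, 1), "My quality of life has dropped"),
    (date(2021, 8, 3), "Quality of life is still poor"),
    (date(2021, 8, 9), "No quality of life since spring"),
    (date(2021, 8, 9), "Feeling fine today"),
]
timeline = monthly_mentions(sample, "quality of life")
```

Comparing `timeline[(2020, 8)]` with `timeline[(2021, 8)]` gives the kind of year-over-year contrast reported for the “quality of life” phenotype.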
Our findings indicate that the occurrence of Long COVID phenotypes is influenced by both temporal
and geographical factors. Neurological symptoms, along with symptoms that impede individuals’
daily functioning, are the most prevalent, particularly during the latter half of the analyzed
period. This period corresponds to a time when a growing number of individuals had recovered
from COVID-19 and were reporting their experiences with Long COVID. Fatigue, depression,
stress, and anxiety emerge as the most prevalent phenotypes.
This investigation of the complex interactions between Long COVID phenotypes, mental health,
and the manifestation of diverse symptoms offers insights into the profound consequences
on individuals’ lives. These findings shed light on the significant burden posed by Long COVID and its
cascading effects on various aspects of individuals’ well-being and society at large.
Book of abstracts: 4th Belgrade Bioinformatics Conference, June 19-23, 202