Twitter and Research: A Systematic Literature Review Through Text Mining
Researchers have collected Twitter data to study a wide range of topics. This growing body of literature, however, has not yet been reviewed systematically. Existing literature reviews have been limited by the constraints of traditional methods, which manually select and analyze samples of topically related papers. The goals of this retrospective study are to identify the dominant topics of Twitter-based research, summarize their temporal trends, and interpret how they evolved over the last ten years. This study systematically mines a large number of Twitter-based studies to characterize the relevant literature efficiently and effectively. It collected relevant papers from three databases and applied text mining and trend analysis to detect semantic patterns and explore the yearly development of research themes across a decade. We found 38 topics in more than 18,000 manuscripts published between 2006 and 2019. By quantifying temporal trends, this study found that 23.7% of topics showed no significant trend (P ≥ 0.05), 21% had increasing trends, and 55.3% had decreasing trends. These hot and cold topics fall into three categories: application, methodology, and technology. The contributions of this paper can inform the growing field of Twitter-based research and benefit researchers, educators, and publishers.
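The abstract does not say which statistical test produced the increasing, decreasing, and no-trend categories. As a minimal sketch, assuming a Mann-Kendall trend test on each topic's yearly share of papers (our assumption; the paper may have used regression or another test), the classification at a 0.05 threshold could look like this. The topic names and yearly shares below are invented for illustration, and the variance formula omits the tie correction.

```python
import math
from statistics import NormalDist


def mann_kendall(series):
    """Mann-Kendall trend test without tie correction; returns (S, z, two-sided p)."""
    n = len(series)
    if n < 3:
        raise ValueError("need at least three yearly values")
    # S = (number of later values above an earlier one) - (number below).
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)
    var_s = n * (n - 1) * (2 * n + 5) / 18
    if s > 0:  # continuity correction moves S one unit toward zero
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return s, z, p


def classify_trend(series, alpha=0.05):
    """Label a topic's yearly series as increasing, decreasing, or no trend."""
    s, _, p = mann_kendall(series)
    if p >= alpha:
        return "no trend"
    return "increasing" if s > 0 else "decreasing"


# Invented yearly topic shares (hypothetical topics), purely for illustration.
topic_shares = {
    "hypothetical_topic_a": [0.010, 0.012, 0.015, 0.017, 0.020, 0.024, 0.027, 0.030],
    "hypothetical_topic_b": [0.050, 0.046, 0.041, 0.037, 0.033, 0.030, 0.026, 0.022],
    "hypothetical_topic_c": [0.010, 0.030, 0.010, 0.030, 0.010, 0.030, 0.010, 0.030],
}
for topic, shares in topic_shares.items():
    print(topic, classify_trend(shares))
```

With these invented series, topic a is labeled increasing, topic b decreasing, and the oscillating topic c shows no significant trend. A rank-based test such as Mann-Kendall is a common choice for short yearly series because it does not assume the shares change linearly.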
Sentiment analysis on Twitter data using machine learning
On social media, people respond readily to products and to events as they unfold. These responses take the form of raw, semi-structured textual data in many languages and registers; the data are noisy but also contain critical information that encourages analysts to discover knowledge and patterns in the available dataset. Such knowledge supports decision making and strategic planning for future markets. Natural Language Processing (NLP) and data mining are the techniques most commonly used in sentiment analysis research to uncover this information from linguistic data. The approach derived here analyzes Twitter data with machine learning techniques to detect the sentiment of people throughout the world. The dataset used in this research consists of tweets about the 2014 Soccer World Cup held in Brazil, during which many people shared their opinions, emotions, and attitudes about the games, their promotion, and the players. The data were filtered and analyzed with natural language processing techniques, and sentiment polarity was calculated from the emotion words detected in each tweet. The dataset is normalized for use by machine learning algorithms and prepared with natural language processing techniques such as word tokenization, stemming and lemmatization, part-of-speech (POS) tagging, named entity recognition (NER), and parsing to extract emotions from the text of each tweet. The approach is implemented in the Python programming language with the Natural Language Toolkit (NLTK), which is openly available for academic and research purposes. The derived algorithm uses WordNet and each word's part of speech to extract emotional words whose meaning fits the current context, and assigns them sentiment polarity using the SentiWordNet dictionary or a lexicon-based method. The resulting polarity is further analyzed with the Naïve Bayes and support vector machine (SVM) learning algorithms, and the data are visualized on the WEKA platform. Finally, the results of both implementations are compared to identify the better approach to sentiment analysis of semi-structured social media data.
Master of Science (MSc) in Computational Science
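The abstract describes two stages: a lexicon assigns each tweet a polarity, and that polarity is then analyzed with Naïve Bayes (and SVM). A minimal sketch of that flow, under loud assumptions, is below: `TOY_LEXICON` is a hypothetical six-word dictionary with invented scores (not SentiWordNet values), tokenization skips the stemming, lemmatization, POS tagging, and NER steps the thesis lists, and the multinomial Naïve Bayes is written from scratch rather than taken from NLTK or WEKA. Using lexicon labels as training labels is our reading of "further analyzed", not a confirmed detail of the thesis.

```python
import math
import re
from collections import Counter, defaultdict

# HYPOTHETICAL toy lexicon: scores are invented for illustration and are
# not taken from SentiWordNet (which scores word senses, not bare words).
TOY_LEXICON = {
    "great": 0.75, "win": 0.5, "love": 0.625,
    "awful": -0.75, "lose": -0.5, "boring": -0.625,
}


def tokenize(text):
    """Lowercase and keep alphabetic runs (no stemming, lemmatization, or POS tagging)."""
    return re.findall(r"[a-z]+", text.lower())


def lexicon_polarity(tokens, lexicon=TOY_LEXICON):
    """Label a tweet by the sign of its summed lexicon scores."""
    score = sum(lexicon.get(t, 0.0) for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


class MultinomialNaiveBayes:
    """Multinomial Naive Bayes with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.class_counts}
        return self

    def predict(self, tokens):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_class, best_log_prob = None, -math.inf
        for c in sorted(self.class_counts):
            # Log prior plus the smoothed log likelihood of each known token.
            log_prob = math.log(self.class_counts[c] / n_docs)
            for t in tokens:
                if t in self.vocab:  # words never seen in training are skipped
                    num = self.word_counts[c][t] + self.alpha
                    den = self.total_words[c] + self.alpha * v
                    log_prob += math.log(num / den)
            if log_prob > best_log_prob:
                best_class, best_log_prob = c, log_prob
        return best_class


# Usage on four invented tweets: lexicon labels become training labels.
tweets = [
    "What a great win for Brazil",
    "I love this team",
    "Awful referee, boring match",
    "We lose again, awful",
]
docs = [tokenize(t) for t in tweets]
labels = [lexicon_polarity(d) for d in docs]
model = MultinomialNaiveBayes().fit(docs, labels)
print(labels)
print(model.predict(tokenize("Great match, love it")))
```

The lexicon labels the first two invented tweets positive and the last two negative, and the trained model then labels "Great match, love it" positive even though "match" appeared only in a negative tweet, because "great" and "love" outweigh it. An SVM would replace the classifier stage while keeping the same labeling step.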
Incremental algorithm for Decision Rule generation in data stream contexts
Nowadays, data science is gaining considerable attention in many sectors. In industry specifically, many applications can be considered, and using data science techniques in the decision-making process is one such application that can add value. At the same time, growing data availability and the emergence of continuous data flows in the form of data streams raise new challenges when working with changing data. This work presents a novel algorithm, the Incremental Decision Rules Algorithm (IDRA), which incrementally generates and modifies decision rules in data stream contexts to incorporate changes that may appear over time. The method proposes new rule structures that aim to improve the decision-making process by providing a descriptive and transparent knowledge base that can be integrated into a decision tool. This work describes the logic underlying IDRA in all its versions and proposes a variety of experiments comparing them with a classical method (CREA) and an adaptive method (VFDR). Several real datasets, together with simulated scenarios with different error types and rates, are used to compare these algorithms. The study shows that IDRA, specifically its reactive version (RIDRA), improves on the accuracy of VFDR and CREA in all the studied scenarios, both real and simulated, at the cost of longer running time.
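The abstract does not specify IDRA's rule structure, so the following is not IDRA, RIDRA, CREA, or VFDR. It is a hypothetical minimal sketch of the general idea of generating and revising decision rules one stream instance at a time: each distinct combination of chosen categorical attribute values acts as a rule antecedent, its consequent is the class with the largest weight, and a `decay` parameter (our assumption, loosely mirroring a "reactive" behavior) discounts older evidence so a rule can change its conclusion after concept drift. The class name, attribute names, and stream are invented.

```python
from collections import Counter, defaultdict


class IncrementalRuleBase:
    """HYPOTHETICAL sketch of incremental decision rules for a data stream.

    Not IDRA/RIDRA: each distinct combination of the chosen categorical
    attribute values acts as a rule antecedent, and the rule's consequent is
    the class with the largest (decayed) weight among the instances it covered.
    """

    def __init__(self, attributes, decay=0.9):
        if not 0.0 < decay <= 1.0:
            raise ValueError("decay must be in (0, 1]")
        self.attributes = tuple(attributes)
        self.decay = decay  # 1.0 = never forget; smaller = react faster to drift
        self.weights = defaultdict(Counter)  # antecedent -> class -> weight

    def _antecedent(self, instance):
        return tuple((a, instance[a]) for a in self.attributes)

    def learn_one(self, instance, label):
        """Update (or create) the rule that covers this instance."""
        rule = self.weights[self._antecedent(instance)]
        for c in rule:
            rule[c] *= self.decay  # older evidence for this rule counts less
        rule[label] += 1.0

    def predict_one(self, instance, default=None):
        rule = self.weights.get(self._antecedent(instance))
        if not rule:
            return default  # no rule covers this instance yet
        return max(rule, key=rule.get)

    def rules(self):
        """Render the current knowledge base as readable IF-THEN rules."""
        lines = []
        for antecedent, rule in self.weights.items():
            conditions = " AND ".join(f"{a}={v}" for a, v in antecedent)
            lines.append(f"IF {conditions} THEN class={max(rule, key=rule.get)}")
        return lines


# Usage: a stream whose concept drifts from "yes" to "no" for the same conditions.
sunny_calm = {"outlook": "sunny", "windy": "no"}
reactive = IncrementalRuleBase(["outlook", "windy"], decay=0.5)
static = IncrementalRuleBase(["outlook", "windy"], decay=1.0)
for label in ["yes"] * 5 + ["no"] * 2:
    reactive.learn_one(sunny_calm, label)
    static.learn_one(sunny_calm, label)
print(reactive.predict_one(sunny_calm), static.predict_one(sunny_calm))
print(reactive.rules())
```

After five "yes" and two "no" instances, the decaying rule base already concludes "no" while the non-decaying one still says "yes". This illustrates the trade-off the abstract reports in general terms: reacting to change can improve accuracy on drifting streams, while the extra bookkeeping costs time. The `rules()` output shows why rule-based models suit a transparent knowledge base.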