3 research outputs found

    Twitter and Research: A Systematic Literature Review Through Text Mining

    Researchers have collected Twitter data to study a wide range of topics. This growing body of literature, however, has not yet been systematically reviewed and synthesized. Existing literature reviews have been limited by traditional methods that rely on manually selecting and analyzing samples of topically related papers. The goals of this retrospective study are to identify the dominant topics of Twitter-based research, summarize the temporal trends of those topics, and interpret their evolution within the last ten years. This study systematically mines a large number of Twitter-based studies to characterize the relevant literature efficiently and effectively. It collected relevant papers from three databases and applied text mining and trend analysis to detect semantic patterns and explore the yearly development of research themes across a decade. We found 38 topics in more than 18,000 manuscripts published between 2006 and 2019. By quantifying temporal trends, this study found that while 23.7% of topics did not show a significant trend (P > 0.05), 21% of topics had increasing trends and 55.3% had decreasing trends, and that these hot and cold topics represent three categories: application, methodology, and technology. The contributions of this paper can be used in the growing field of Twitter-based research and are beneficial to researchers, educators, and publishers.

    Sentiment analysis on Twitter data using machine learning

    On social media, people respond readily to products and to events as they occur. These responses take the form of raw, semi-structured text in different languages and vocabularies. The text contains noise as well as critical information, which encourages analysts to discover knowledge and patterns in the available data. Such knowledge is useful for decision making and for planning strategy in future markets. Natural Language Processing (NLP) and data mining are the most widely researched techniques for extracting this information from linguistic data through sentiment analysis. The approach derived here analyzes Twitter data with machine learning techniques to detect the sentiment of people throughout the world. The data set used for this research consists of tweets about the 2014 Soccer World Cup, held in Brazil. During this period, many people expressed their opinions, emotions, and attitudes about the games, the promotion, and the players. The data are filtered and analyzed using natural language processing techniques, and sentiment polarity is calculated from the emotion words detected in users' tweets. The data set is normalized for use by machine learning algorithms and prepared using natural language processing techniques such as word tokenization, stemming and lemmatization, a POS (part-of-speech) tagger, NER (named entity recognition), and a parser to extract emotions from the text of each tweet. The approach is implemented in the Python programming language with the Natural Language Toolkit (NLTK), which is openly available for academic and research purposes. The derived algorithm uses WordNet and POS tags to extract emotional words that carry meaning in the context of the sentence, and assigns each a sentiment polarity using the SentiWordNet dictionary or a lexicon-based method. The resulting polarity is further analyzed using the Naïve Bayes and SVM (support vector machine) machine learning algorithms, and the data are visualized on the WEKA platform. Finally, the goal is to compare the results of both implementations and identify the best approach to sentiment analysis of semi-structured social media data.
    Master of Science (MSc) in Computational Science
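The lexicon-based polarity step mentioned in the abstract above can be illustrated with a minimal sketch. It is not the thesis's implementation: `TOY_LEXICON`, `tokenize`, and `polarity` are hypothetical names, the tiny hand-written lexicon merely stands in for SentiWordNet scores, and a regular expression replaces the NLTK tokenizer so that the example has no dependencies.

```python
import re

# Hypothetical toy lexicon standing in for SentiWordNet scores (assumption).
TOY_LEXICON = {
    "great": 1.0, "love": 1.0, "win": 0.5,
    "bad": -1.0, "terrible": -1.0, "lose": -0.5,
}

def tokenize(tweet):
    """Lowercase the tweet and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", tweet.lower())

def polarity(tweet, lexicon=TOY_LEXICON):
    """Sum the lexicon scores of all tokens and map the total to a label."""
    score = sum(lexicon.get(token, 0.0) for token in tokenize(tweet))
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

if __name__ == "__main__":
    for text in ["Great match, love this team!", "Terrible defending, bad game"]:
        print(text, "->", polarity(text))
```

In a fuller pipeline, labels produced this way could serve as training targets for classifiers such as Naïve Bayes or an SVM, which is the comparison the abstract describes.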

    Incremental algorithm for Decision Rule generation in data stream contexts

    Data science is currently gaining a lot of attention in many different sectors. In industry specifically, many applications can be considered. Using data science techniques in the decision-making process is one application that can add value to industry. In addition, the growth of data availability and the appearance of continuous data flows in the form of data streams raise new challenges when dealing with changing data. This work presents a novel proposal, the Incremental Decision Rules Algorithm (IDRA), an algorithm that incrementally generates and modifies decision rules in data stream contexts to incorporate changes that may appear over time. The method proposes a new rule structure that aims to improve the decision-making process by providing a descriptive and transparent knowledge base that can be integrated into a decision tool. This thesis describes the logic underlying IDRA, in all its versions, and proposes a variety of experiments to compare them with a classical method (CREA) and an adaptive method (VFDR). Real datasets, together with simulated scenarios with different error types and rates, are used to compare these algorithms. The study shows that IDRA, specifically the reactive version of IDRA (RIDRA), improves on the accuracy of VFDR and CREA in all the studied scenarios, both real and simulated, in exchange for more time.