471 research outputs found

    Sentiment Analysis of Nigerian Students’ Tweets on Education: A Data Mining Approach

    This paper investigates data mining techniques by collecting tweets from Nigerian university students about how they feel about the current state of the Nigerian university system. The tweets, collected through the Twitter Application, were pre-processed and then converted from text to a vector representation using a feature extraction technique such as Bag-of-Words. The proposed sentiment analysis architecture was designed using UML, and a Naïve Bayes classifier (NBC), a simple but effective classifier, was applied to compute the class probabilities and label each tweet in the education dataset as positive or negative. After data cleaning, 4016 tweets were used. Positive sentiment accounted for 40.56% of the data and negative sentiment for 59.44%. The dataset was split 70:30 into training and test sets; the Naïve Bayes classifier was trained on the training set and evaluated on the test set. Because the models were trained on unbalanced data, we used evaluation metrics better suited to that setting, such as precision, recall, F1-score, and balanced accuracy. The classifier's prediction accuracy, misclassification error rate, recall, precision, and F1-score were 63%, 37%, 63%, 62%, and 62%, respectively. All analyses were carried out in Python with the Natural Language Toolkit (NLTK) packages. For each tweet, the predicted class is the one with the highest likelihood. The Nigerian Government can use these predictions to improve the educational system and help students receive a better education
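
    The evaluation metrics named in this abstract can be computed from the four confusion-matrix counts. The following is a minimal sketch, not the authors' code: the function name `binary_metrics`, the label strings, and the toy labels are assumptions made for illustration, and the metrics are computed for the positive class only (the abstract does not say how its figures were averaged).

```python
def binary_metrics(y_true, y_pred, positive="positive"):
    """Compute common binary-classification metrics from two label lists."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1

    def ratio(num, den):
        # Return 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    accuracy = ratio(tp + tn, tp + tn + fp + fn)
    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)          # true positive rate
    specificity = ratio(tn, tn + fp)     # true negative rate
    return {
        "accuracy": accuracy,
        "error_rate": 1.0 - accuracy,
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
        # Balanced accuracy averages the per-class recalls, so the
        # majority class cannot dominate the score.
        "balanced_accuracy": (recall + specificity) / 2,
    }


# Toy, made-up labels: 4 positive and 6 negative examples.
y_true = ["positive"] * 4 + ["negative"] * 6
y_pred = ["positive", "positive", "negative", "negative",
          "negative", "negative", "negative", "negative",
          "positive", "positive"]
metrics = binary_metrics(y_true, y_pred)
```

    Accuracy and the misclassification error rate always sum to 1, which matches the 63% and 37% pair reported in the abstract.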

    Cyberbullying Detection on Twitter Using Natural Language Processing and Machine Learning Techniques

    People use social media to engage with and debate topics ranging from entertainment to sports to politics and many others. The use of social media has also led to an increase in cyberbullying, which is occurring at an alarming pace. Many cyberbullying messages can be found in the comment sections of social media platforms, including Twitter, YouTube, and others. Cyberbullying can cause stress and mental distress, so such messages should be detected early and kept from being published on social media platforms. In this study, we present a system for detecting cyberbullying messages in English using natural language processing (NLP) and machine learning approaches. A total of 16851 tweets were gathered from Twitter. An NLP approach was applied to the dataset to find the most offensive terms associated with cyberbullying. Our NLP results made it clear that cyberbullying happens and must be addressed as soon as possible. The dataset was also used to train random forest (RF) and support vector machine (SVM) classifiers. Random forest attained an accuracy of 98.5%, surpassing the support vector machine, which attained 90.5%. This high accuracy was obtained through careful data preparation, in which missing and outlier values were dealt with beforehand. This method facilitates the analysis of the available data at the expense of the study's statistical power and ultimately the validity of its findings. Additionally, it aids in producing a significant bias in the outcomes and increases the effectiveness of the data. The root mean square error (RMSE) and mean square error (MSE) were used to analyse the results, and the random forest achieved a better error score than the support vector machine. Our findings may be used by agencies and groups to educate individuals about the proper use of social media in order to avoid cyberbullying
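
    As a rough illustration of the error measures mentioned in this abstract, the sketch below computes accuracy, MSE, and RMSE for a binary classifier. It assumes the two classes are coded as 0 and 1, which the abstract does not state; the function name and the toy labels are made up for illustration.

```python
import math


def classification_errors(y_true, y_pred):
    """Return accuracy, MSE and RMSE for labels coded as 0 and 1."""
    n = len(y_true)
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    mse = sum(squared_errors) / n
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    return {"accuracy": accuracy, "mse": mse, "rmse": math.sqrt(mse)}


# Toy, made-up labels: 1 = cyberbullying, 0 = not cyberbullying.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
errors = classification_errors(y_true, y_pred)
```

    With 0/1 labels every squared error is either 0 or 1, so the MSE equals the misclassification rate (1 minus accuracy), and a lower MSE or RMSE simply means fewer misclassified tweets.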

    Twitter Activity Of Urban And Rural Colleges: A Sentiment Analysis Using The Dialogic Loop

    The purpose of the present study is to ascertain whether colleges are achieving their ultimate communication goals of retaining and attracting students through their microblogging activity, which, according to Dialogic Loop Theory, is directly correlated with the use of positive and negative sentiment. The study focused on a cross-section of urban and rural community colleges within the United States to identify the sentiment score of their microblogging activity. The study included a content analysis of the Twitter activity of these colleges. A data-mining process was employed to collect a census of the tweets associated with these colleges. Further processing was then applied using linguistic data software that removed all irrelevant text, word abbreviations, emoticons, and other Twitter-specific classifiers. The resulting data set was then processed through a Multinomial Naive Bayes Classifier, which models the probability of word counts in a text. The classifier was trained using a data source of 1.5 million tweets, called Sentiment140, in which the tweets are labeled as positive or negative sentiment. The Multinomial Naive Bayes Classifier distinguished specific wording and phrases from the corpus, comparing the data to a specific database of sentiment word identifiers. The sentiment analysis process categorized the text as being positive or negative. Finally, statistical analysis was conducted on the outcome of the sentiment analysis. A significant contribution of the current work was extending Kent and Taylor's (1998) Dialogic Loop Theory, which was designed specifically for identifying the relationship-building capabilities of a Web site, to encompass the microblogging concept used in Twitter.
Specifically, Dialogic Loop Theory is applied and enhanced to develop a model for social media communication that augments relationship-building capabilities. The current study establishes this model as a new form for evaluating tweets, labeled in the current body of work as Microblog Dialogic Communication. The implication is that, by using Microblog Dialogic Communication, a college can address and correct its microblogging sentiment. The results showed that rural colleges posted more positive-sentiment tweets and fewer negative-sentiment tweets than urban colleges
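
    A multinomial Naive Bayes classifier of the kind described in this abstract scores each class by its prior probability multiplied by the smoothed probability of every word in the text. The sketch below is a minimal pure-Python illustration, not the study's implementation: the function names, the whitespace tokenizer, the add-one smoothing, and the four training tweets are all assumptions (the study trained on Sentiment140).

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Simplistic whitespace tokenizer, assumed for illustration.
    return text.lower().split()


def train_nb(docs, labels, alpha=1.0):
    """Collect per-class document and word counts for multinomial Naive Bayes."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = tokenize(doc)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return {"class_counts": class_counts, "word_counts": word_counts,
            "vocab": vocab, "alpha": alpha}


def predict_nb(model, doc):
    """Return the class with the highest log-posterior for the document."""
    total_docs = sum(model["class_counts"].values())
    vocab_size = len(model["vocab"])
    alpha = model["alpha"]
    best_label, best_score = None, -math.inf
    for label, n_docs in model["class_counts"].items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(model["word_counts"][label].values())
        for tok in tokenize(doc):
            if tok not in model["vocab"]:
                continue  # ignore words never seen in training
            count = model["word_counts"][label][tok]
            # Laplace (add-alpha) smoothing avoids zero probabilities.
            score += math.log((count + alpha) / (total_words + alpha * vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy, made-up training tweets.
train_docs = ["lecturers are on strike again", "no power in the hostel",
              "great lecture today", "love my new department"]
train_labels = ["negative", "negative", "positive", "positive"]
model = train_nb(train_docs, train_labels)
label_a = predict_nb(model, "strike again today")
label_b = predict_nb(model, "great lecture")
```

    Working in log space keeps the product of many small word probabilities from underflowing, which matters once tweets and vocabularies grow beyond this toy size.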

    Fame for sale: efficient detection of fake Twitter followers

    Fake followers are those Twitter accounts specifically created to inflate the number of followers of a target account. Fake followers are dangerous for the social platform and beyond, since they may alter concepts like popularity and influence in the Twittersphere - hence impacting on economy, politics, and society. In this paper, we contribute along different dimensions. First, we review some of the most relevant existing features and rules (proposed by Academia and Media) for anomalous Twitter accounts detection. Second, we create a baseline dataset of verified human and fake follower accounts. Such baseline dataset is publicly available to the scientific community. Then, we exploit the baseline dataset to train a set of machine-learning classifiers built over the reviewed rules and features. Our results show that most of the rules proposed by Media provide unsatisfactory performance in revealing fake followers, while features proposed in the past by Academia for spam detection provide good results. Building on the most promising features, we revise the classifiers both in terms of reduction of overfitting and cost for gathering the data needed to compute the features. The final result is a novel Class A classifier, general enough to thwart overfitting, lightweight thanks to the usage of the less costly features, and still able to correctly classify more than 95% of the accounts of the original training set. We ultimately perform an information fusion-based sensitivity analysis, to assess the global sensitivity of each of the features employed by the classifier. The findings reported in this paper, other than being supported by a thorough experimental methodology and interesting on their own, also pave the way for further investigation on the novel issue of fake Twitter followers

    Analyzing the drivers of customer satisfaction via social media

    Social media has become a major force of influence over the last decade. The population of active social media users has grown with the new generations, and data has consequently accumulated in tremendous amounts. Data accumulated through social media offers an opportunity to gain valuable insights and support business decisions. The aim of this project is to understand the drivers of customer satisfaction through public sentiment on Twitter towards a financial institution. Data was extracted from Twitter, the most popular microblogging platform, and sentiment analysis was performed. The unstructured data was classified by sentiment with a lexicon-based model and a machine-learning-based model. The study showed that the machine-learning-based model was less affected by the language-specific problems (arising from Turkish) that the lexicon-based model struggled with, and that it made better predictions. Further analysis was performed on the days with extreme daily average sentiment scores to match these days with prominent events. The results showed that public sentiment on Twitter is driven by three main themes: complaints related to services, advertisement campaigns, and influencers' impact
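
    The lexicon-based scoring and the search for days with extreme average sentiment can be sketched as follows. This is only an illustration under stated assumptions: the six-word English lexicon, the function names, and the sample tweets are hypothetical, and the study's own lexicon and data are not given in the abstract.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical polarity lexicon: +1 for positive words, -1 for negative ones.
LEXICON = {"great": 1, "thanks": 1, "love": 1,
           "slow": -1, "broken": -1, "worst": -1}


def lexicon_score(text):
    """Sum the polarity of every known word; unknown words count as 0."""
    return sum(LEXICON.get(tok, 0) for tok in text.lower().split())


def daily_average(tweets):
    """Average the lexicon score of all tweets posted on the same day."""
    by_day = defaultdict(list)
    for day, text in tweets:
        by_day[day].append(lexicon_score(text))
    return {day: mean(scores) for day, scores in by_day.items()}


def extreme_days(averages):
    """Return the days with the lowest and the highest average sentiment."""
    return min(averages, key=averages.get), max(averages, key=averages.get)


# Toy, made-up (date, tweet) pairs.
tweets = [
    ("2020-01-01", "great app thanks"),
    ("2020-01-01", "love the new card"),
    ("2020-01-02", "worst service and slow app"),
    ("2020-01-02", "atm broken again"),
    ("2020-01-03", "new branch opened"),
]
averages = daily_average(tweets)
worst_day, best_day = extreme_days(averages)
```

    A lookup like this fails on any word form missing from the lexicon, which is one way language-specific problems can arise for a lexicon-based model.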

    Classifying Crises-Information Relevancy with Semantics

    Social media platforms have become key portals for sharing and consuming information during crisis situations. However, humanitarian organisations and affected communities often struggle to sift through the large volumes of data typically shared on such platforms during crises to determine which posts are truly relevant to the crisis and which are not. Previous work on automatically classifying crisis information mostly focused on statistical features. However, such approaches tend to perform poorly on data about a type of crisis that the model was not trained on, for example when a classifier trained on floods, earthquakes, and typhoons has to process information about a train crash. In such cases, the model needs to be retrained, which is costly and time-consuming. In this paper, we explore the impact of semantics in classifying Twitter posts across the same, and different, types of crises. We experiment with 26 crisis events, using a hybrid system that combines statistical features with various semantic features extracted from external knowledge bases. We show that adding semantic features has no noticeable benefit over statistical features when classifying same-type crises, whereas it enhances the classifier performance by up to 7.2% when classifying information about a new type of crisis
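
    The idea of combining statistical and semantic features can be sketched as follows. This is a toy illustration, not the paper's system: the dictionary `TOY_KB` stands in for an external knowledge base, and its entries, the feature names, and the example posts are all hypothetical.

```python
from collections import Counter

# Hypothetical stand-in for an external knowledge base: term -> concept.
TOY_KB = {
    "flood": "NaturalDisaster",
    "earthquake": "NaturalDisaster",
    "crash": "TransportAccident",
    "derailment": "TransportAccident",
    "ambulance": "EmergencyService",
    "paramedics": "EmergencyService",
}


def statistical_features(text):
    """Bag-of-words counts, one feature per surface word."""
    return Counter(f"word={tok}" for tok in text.lower().split())


def semantic_features(text):
    """Counts of knowledge-base concepts linked to the words in the text."""
    return Counter(f"concept={TOY_KB[tok]}"
                   for tok in text.lower().split() if tok in TOY_KB)


def hybrid_features(text):
    """Concatenate both feature sets into one sparse feature vector."""
    return statistical_features(text) + semantic_features(text)


# A post from a crisis type seen in training and one from a new type.
seen = "ambulance sent after flood"
unseen = "paramedics at train crash"
shared_words = set(statistical_features(seen)) & set(statistical_features(unseen))
shared_hybrid = set(hybrid_features(seen)) & set(hybrid_features(unseen))
```

    In this toy case the two posts share no words, yet both activate the same concept feature, which hints at why semantic features could help when a classifier meets a crisis type it was not trained on.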