327 research outputs found

    Detection of Americans’ Behavior toward Islam on Facebook

    Social network websites have become a rich source for detecting and analyzing people's attitudes, perceptions, and feelings towards news, products, and other real-world issues. Facebook is a popular platform among different age groups and countries and is generally used to convey ideas about certain topics through likes, comments, and shares. In recent years, one of the most controversial topics has been Islamophobia and other ideas that people around the world have formed about Islam. This research studied the public opinion of American citizens about Islam during the presidency of Donald Trump, as that period was rich in diversity of opinion between his supporters and detractors. In this paper, sentiment analysis was used to analyze American citizens' behavior towards posts about Islam during Trump's presidency in various states across the United States. Sentiment analysis was performed on Facebook posts and comments extracted from American news channels from the year 2017. Several machine learning methods were used to detect the polarity in the dataset. The highest classification accuracy among the classifiers used in this research was achieved by a logistic regression classifier, reaching 84%.

    Identification and monitoring polarization from social network perspective

    Polarization is a new phenomenon that threatens the cohesion and social development of our society. The rise of social media is known to have contributed significantly to the emergence of this phenomenon, as can be seen from the multiplication of far-right and racist online communities and from ill-structured political discourse, for example in recent US and EU elections. Automatic identification of polarization from social media plays a key role in devising an appropriate defence strategy to tackle the issue and avoid escalation. This thesis implements several methods to identify polarization in Twitter data from the Trump-Clinton US election campaign, using metrics such as the Belief Polarization Index (BPI) and sentiment analysis. Furthermore, semantic role labelling and argument mining were applied to derive the argument structure of polarized discourse. In particular, we constructed thirteen topics of interest that were used as potential candidates for polarized discourse. For each topic, the cosine distance between the two candidates' frequencies of the topic over time was used to indicate polarization (the Belief Polarization Index). Statistical inference on sentiment scores was used to convey either a positive or a negative polarity, which was then examined further using argument structure. All the proposed approaches attempt to measure the polarization between two individuals from different perspectives, which may give some hints or references for future research.
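The cosine-distance step behind the Belief Polarization Index can be sketched as follows. This is a minimal illustration, assuming each candidate's activity on a topic is a vector of per-period mention counts. The function name `cosine_distance` and the sample counts are hypothetical, and the thesis may apply further normalization that the abstract does not state.

```python
import math

def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length frequency vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)

# Hypothetical weekly counts of tweets on one topic for each candidate.
candidate_a = [12, 30, 45, 10]
candidate_b = [40, 8, 5, 33]
bpi = cosine_distance(candidate_a, candidate_b)
```

For non-negative counts the value lies between 0 (identical temporal profile) and 1 (no overlap in when the topic is discussed).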

    Non-Query-Based Pattern Mining and Sentiment Analysis for Massive Microblogging Online Texts

    Pattern mining has been widely studied in the last decade given its great interest for research and its numerous real-world applications. This paper proposes a definition of query-based and non-query-based systems, highlighting the need for non-query-based systems in the era of Big Data. We propose a new non-query-based approach that combines association rules, generalized rules, and sentiment analysis in order to catalogue and discover opinion patterns on the social network Twitter. Association rules have previously been applied to sentiment analysis, but in most cases they are used once the sentiment analysis is finished, to see which tokens commonly appear in relation to a certain sentiment. They have also been used to discover patterns between sentiments. Our work differs in that it proposes a non-query-based system that combines both techniques, in a mixed proposal of sentiment analysis and association rules, to discover patterns and sentiment patterns in microblogging texts. The obtained rules generalize and summarize the sentiments obtained from a group of tweets about any character, brand, or product mentioned in them. To study the performance of the proposed system, an initial set of 1.7 million tweets has been employed to analyse the most salient sentiments during the American pre-election campaign. The analysis of the results supports the capability of the system to obtain association rules and patterns with great descriptive value in this use case. Parallels can be established between these patterns and real-life events.
    Funding: COPKIT Project, through the European Union's Horizon 2020 Research and Innovation Programme 786687; Spanish Ministry for Economy and Competitiveness TIN2015-64776-C3-1-R; Andalusian Government, through Data Analysis in Medicine: from Medical Records to Big Data Project P18-RT-2947; Spanish Ministry of Education, Culture, and Sport FPU18/00150; University of Granad
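To make the idea of a sentiment association rule concrete, the sketch below computes the two standard rule measures, support and confidence, over tweets represented as sets of tokens plus a sentiment label. The function names and the toy tweets are hypothetical. The paper's actual mining algorithm and its rule generalization step are not described in the abstract and are not reproduced here.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) as support(A and C) / support(A)."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    return support(transactions, set(antecedent) | set(consequent)) / base

# Hypothetical tweets reduced to token sets with a sentiment label attached.
tweets = [
    {"candidate_x", "debate", "negative"},
    {"candidate_x", "rally", "positive"},
    {"candidate_x", "debate", "negative"},
    {"candidate_y", "debate", "positive"},
]
rule_support = support(tweets, {"candidate_x", "debate", "negative"})
rule_confidence = confidence(tweets, {"candidate_x", "debate"}, {"negative"})
```

Here the rule {candidate_x, debate} → {negative} would be read as "tweets mentioning candidate_x and the debate tend to be negative", with its confidence estimating how often that holds.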

    Leveraging machine learning to analyze sentiment from COVID-19 tweets: A global perspective

    Since the advent of the worldwide COVID-19 pandemic, analyzing public sentiment has become one of the major concerns for policy and decision-makers. While the priority is to curb the spread of the virus, sentiment analysis of the mass population (users) is equally important. Although sentiment analysis using different state-of-the-art technologies has received attention during the COVID-19 pandemic, the reasons behind the variations in public sentiment are yet to be explored. Moreover, how user sentiment varies across countries during the pandemic has received less attention. Therefore, the objectives of this study are to identify the most effective machine learning (ML) technique for classifying public sentiment, to analyze the variations of public sentiment across the globe, and to find the critical factors contributing to sentiment variations. To attain these objectives, 12,000 tweets, 3000 each from the USA, UK, and Bangladesh, were rigorously annotated by three independent reviewers. Based on the labeled tweets, four boosting ML models, namely CatBoost, gradient boost, AdaBoost, and XGBoost, are investigated. Next, the top-performing ML model was used to predict the sentiment of 300,000 data items (100,000 from each country). Public perceptions were then analyzed based on the labeled data. First, the CatBoost model showed the highest F1-score (85.8%), followed by gradient boost (84.3%), XGBoost (83.1%), and AdaBoost (78.9%). Second, during the COVID-19 pandemic the sentiments of the people of the three countries were mainly negative, followed by positive and neutral. Finally, this study identified a few critical concerns that primarily drive varying public sentiment around the globe: lockdown, quarantine, hospital, mask, vaccine, and the like.

    Sentiment Analysis of Twitter Data

    The rapid expansion and acceptance of social media has opened doors into users' opinions and perceptions that were never as accessible as they are with today's prevalence of mobile technology. Harvested data, analyzed for opinions and sentiment, can provide powerful insight into a population. This research uses Twitter data, chosen for its widespread global use, to examine the sentiment associated with tweets. Twitter #hashtags and Latent Dirichlet Allocation topic modeling were used to differentiate between tweet topics. A lexicographical dictionary was then used to classify sentiment. This method provides a framework for an analyst to ingest Twitter data, conduct an analysis, and provide insight into the sentiment contained within the data.
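Dictionary-based sentiment classification of the kind mentioned above can be sketched in a few lines. The lexicon, the function names, and the tokenizing rule are all hypothetical stand-ins, since the abstract does not say which dictionary or scoring scheme the study used.

```python
import re

# Hypothetical toy lexicon; a real analysis would load a published sentiment dictionary.
LEXICON = {"good": 1, "great": 2, "love": 2, "bad": -1, "terrible": -2, "hate": -2}

def score_tweet(text, lexicon=LEXICON):
    """Sum the lexicon scores of the words in a tweet; unknown words count as 0."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(tok, 0) for tok in tokens)

def classify(text, lexicon=LEXICON):
    """Map the summed score to a polarity label."""
    score = score_tweet(text, lexicon)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

A simple sum like this ignores negation and sarcasm, which is one reason lexicon methods are often paired with topic modeling or a trained classifier.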

    Anti-Russia or anti-Ukraine: How do Twitter users feel about the ongoing conflict between August 2022 and February 2023? A sentiment analysis approach

    The research presented in this thesis aimed to investigate the shifting sentiment among Twitter users regarding the Ukraine-Russia conflict between August 2022 and February 2023. To understand this variation in sentiment and public opinion, we went back to 1991, the year of the Soviet Union's dissolution, and reviewed the literature to gain deeper insight into the Ukraine-Russia relationship. Employing a combination of descriptive analysis techniques, Sentiment Analysis, Topic Modelling, and Machine Learning algorithms such as Logistic Regression, Decision Tree, Naïve Bayes, AdaBoost, and XGBoost, we examined the evolving Anti-Ukraine and Anti-Russia sentiments expressed by Twitter users during the second semester of the conflict. Our findings revealed that, within our datasets, tweets expressing Anti-Ukraine sentiments were more prevalent than those expressing Anti-Russia sentiments. Notably, the XGBoost model showed the best performance metrics, achieving an accuracy of 90% on the dataset from August and September 2022 and 93% on the dataset from February 2023.

    Automatic information search for countering covid-19 misinformation through semantic similarity

    Master's thesis in Bioinformatics and Computational Biology. Information quality in social media is an increasingly important issue, and the misinformation problem has become even more critical in the current COVID-19 pandemic, leaving people exposed to false and potentially harmful claims and rumours. Civil society organizations, such as the World Health Organization, have called for global action to promote access to health information and mitigate harm from health misinformation. Consequently, this project aims to counter the spread of the COVID-19 infodemic and its potential health hazards. In this work, we give an overview of the models and methods that have been employed in the NLP field, from its foundations to the latest state-of-the-art approaches. Focusing on deep learning methods, we propose applying multilingual Transformer models based on siamese networks, also called bi-encoders, combined with ensemble and PCA dimensionality reduction techniques. The goal is to counter COVID-19 misinformation by analyzing the semantic similarity between a claim and tweets from a collection gathered from official fact-checkers verified by the International Fact-Checking Network of the Poynter Institute. The number of Internet users increases every year, and the language spoken determines access to information online. For this reason, we put special effort into applying multilingual models to tackle misinformation across the globe. Regarding semantic similarity, we first evaluate these multilingual ensemble models and improve on the results of monolingual and single models on the STS-Benchmark. Second, we enhance the interpretability of the models' performance through the SentEval toolkit. Lastly, we compare these models' performance against biomedical models in round 1 of the TREC-COVID task, using the BM25 Okapi ranking method as the baseline. Moreover, we are interested in understanding the ins and outs of misinformation. For that purpose, we extend interpretability using machine learning and deep learning approaches for sentiment analysis and topic modelling. Finally, we developed a dashboard to ease visualization of the results. In our view, the results obtained in this project constitute an excellent initial step toward incorporating multilingualism and will assist researchers and the public in countering COVID-19 misinformation.
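The BM25 Okapi baseline named above ranks documents by term overlap with a query. The sketch below illustrates only that lexical baseline, not the bi-encoder models the thesis proposes. The parameter defaults `k1=1.5` and `b=0.75`, the smoothed IDF variant, and the sample tweets are assumptions chosen for the example rather than the thesis's settings.

```python
import math
from collections import Counter

def bm25_scores(query, documents, k1=1.5, b=0.75):
    """Score each tokenized document against a tokenized query with Okapi BM25."""
    n_docs = len(documents)
    avg_len = sum(len(d) for d in documents) / n_docs
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for d in documents for term in set(d))
    scores = []
    for doc in documents:
        term_freq = Counter(doc)
        score = 0.0
        for term in query:
            if term not in term_freq:
                continue
            # Smoothed IDF variant that stays non-negative (an assumption here).
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1.0)
            tf = term_freq[term]
            # Length normalization: longer-than-average documents are penalized.
            norm = tf + k1 * (1.0 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1.0) / norm
        scores.append(score)
    return scores

# Hypothetical fact-checker tweets, already lowercased and tokenized.
tweets = [
    "masks reduce the spread of the virus".split(),
    "the vaccine does not alter human dna".split(),
    "drinking bleach does not cure covid".split(),
]
claim = "vaccine changes dna".split()
scores = bm25_scores(claim, tweets)
best = max(range(len(tweets)), key=scores.__getitem__)
```

Because BM25 matches exact tokens, "changes" and "alter" contribute nothing to each other, which is the gap that embedding-based semantic similarity is meant to close.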

    SentiMLBench: Benchmark Evaluation of Machine Learning Algorithms for Sentiment Analysis

    Sentiment analysis has been a topic of interest for researchers due to its increasing use by industry to measure end-user sentiment. However, there is no clear verdict on which algorithms are better in real-time scenarios. A rigorous benchmark evaluation of various algorithms, run across multiple datasets and different hardware architectures, is required to guide future researchers on potential advantages and limitations. This paper proposes SentiMLBench, a critical evaluation of key ML algorithms as standalone classifiers and of a novel cascade feature selection (CFS) based ensemble technique, in multiple benchmark environments, each using a different Twitter dataset and processing hardware. According to the experimental results, the best trained ensemble model with CFS enhancement surpasses current state-of-the-art models. In one study, although the ensemble model provides good accuracy, it falls short of neural-network accuracy by 2%. The accuracy of ML algorithms as standalone classifiers is poor across all three studies. The superiority of neural networks is further confirmed in study three, where they outperform the other algorithms in accuracy by over 10%. Graphics processing units provide speed and higher computational power at a fraction of the cost of a normal processor, which offers critical architectural insights for developing a robust expert system for sentiment analysis.

    Naïve Bayes Method for Text-Based Sentiment Analysis on Social Media

    Scientometrics is the study of the measurement and analysis of science, innovation, and technology through scientific publications. One form of measurement is the analysis of author networks. This study uses author network analysis as a measurement tool for scientific output. The purpose of this study was to observe the authorship network formed among professors at Bina Darma University, in order to determine which professors and departments are the most productive in producing yearbook or magazine articles. The method used in this study is graph degree centrality. The software used for visualization was Gephi 0.9.2. The data used in this study are publication data for the years 2015-2020. Based on the results of this study, it can be concluded that the actor with the highest degree centrality is EU, with a value of 28, and that EU is also the actor with the largest number of publications. Meanwhile, the actor who has the most influence or connections and most frequently collaborates on publications, with the highest Betweenness Centrality score, is AM, with a score of 61500.94.
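Degree centrality on a co-authorship network can be illustrated with a short sketch. The study itself used Gephi. The code below is an independent illustration with hypothetical author initials, and it reports the raw number of distinct co-authors by default, with an optional normalization by n - 1.

```python
from collections import defaultdict

def degree_centrality(edges, normalized=False):
    """Count distinct neighbors per node in an undirected graph given as edge pairs."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        neighbors[a].add(b)
        neighbors[b].add(a)
    degrees = {node: len(adj) for node, adj in neighbors.items()}
    n = len(degrees)
    if normalized and n > 1:
        # Divide by the maximum possible degree so values fall in [0, 1].
        return {node: d / (n - 1) for node, d in degrees.items()}
    return degrees

# Hypothetical co-authorship pairs; the initials are placeholders, not the study's data.
coauthorships = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
centrality = degree_centrality(coauthorships)
most_connected = max(centrality, key=centrality.get)
```

Betweenness centrality, the second measure in the abstract, requires shortest-path counting and is not covered by this sketch.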