557 research outputs found

    A survey on extremism analysis using natural language processing: definitions, literature review, trends and challenges

    Extremism has grown as a global problem for society in recent years, especially after the emergence of movements such as jihadism. These and other extremist groups have taken advantage of different approaches, such as the use of social media, to spread their ideology, promote their acts and recruit followers. The extremist discourse is therefore reflected in the language used by these groups. Natural language processing (NLP) provides a way of detecting this type of content, and several authors use it to describe and discriminate the discourse held by these groups, with the final objective of detecting and preventing its spread. Following this approach, this survey reviews the contributions of NLP to the field of extremism research, providing the reader with a comprehensive picture of the state of the art in this research area. The content includes a first conceptualization of the term extremism, the elements that compose an extremist discourse, and the differences from other terms. After that, a review, description and comparison of the frequently used NLP techniques is presented, including how they were applied, the insights they provided, the most frequently used NLP software tools, descriptive and classification applications, and the availability of datasets and data sources for research. Finally, research questions are approached and answered with highlights from the review, while future trends, challenges and directions derived from these highlights are suggested to stimulate further research in this area. Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature.

    Literature review on Real-time Location-Based Sentiment Analysis on Twitter

    Sentiment analysis mainly supports determining polarity and extracts valuable information from raw data on social media platforms. Many fields, such as health, business, and security, require real-time data analysis for instant decision-making. Since Twitter is a popular social media platform from which data can be collected easily, this paper considers analysis methods for Twitter data, in particular real-time Twitter data analysis based on geo-location. Twitter data classification and analysis can be done with diverse algorithms, and the most appropriate algorithm can be chosen by implementing and testing them. This paper discusses the main aspects of sentiment analysis, data collection methods, data pre-processing, feature extraction, and sentiment analysis methods related to Twitter data. Real-time analysis has emerged as a major method of analyzing data available online, and the real-time Twitter data analysis process is described throughout this paper. Several methods of classifying polarized Twitter data are discussed, and a proposed Twitter data analysis algorithm is presented. Location-based Twitter data analysis is another crucial aspect of sentiment analysis that enables data sorting according to geo-location, and this paper describes how to analyze Twitter data based on geo-location. Further, a comparison of several sentiment analysis algorithms used by previous researchers is reported, and finally a conclusion is provided.

    Sentiment Analysis and Classifying Hashtags in Social Media Using Data Mining Techniques

    Big data is an important topic that remains open to a wide range of applications for extracting useful information and knowledge to support organizations in planning and decision-making. Social media is an important source of data, especially because it has been widely used in recent years. The hashtag has recently become one of the most popular features provided by social media and is used by most social media users to express, share, and retrieve opinions and feelings regarding a specific theme. In recent years, hashtags have increasingly been used by the public to discuss and debate important current events. This paper sheds light on how businesses can use such sources of information and how the necessary technical processes can be implemented accordingly. The paper demonstrates sentiment analysis as a scenario for such an implementation. The main innovation in this paper is not limited to the technical method used, but rather lies in the idea of using hashtags as an information source in business, which is still rarely addressed in the literature. This paper provides a novel model based on text mining techniques to perform sentiment analysis for classifying business-related hashtags posted on social media by customers. The results are presented and verified through samples of positively and negatively classified comments extracted from the hashtags, supporting the organization in planning and decision-making to generate competitive advantages.

    Learning to Detect Human Emotions in Digital World by Integrating Ensemble Voting Classifiers

    Due to the expansion of the internet and the quick acceptance of social media platforms, information can now be exchanged in ways never previously imagined in the history of mankind. A social networking site like Twitter offers a forum where people may interact, discuss, and respond to specific issues via short entries, such as tweets of 140 characters or fewer. Users may engage by using the comment, like and share tabs on texts, videos, images and other content. Although social media platforms are now so extensively used, individuals are creating and sharing more information than ever before, some of which can be incorrect or unconnected to reality. It is difficult to automatically identify erroneous or inaccurate statements in textual content and to detect people's emotions. In this paper, we suggest an ensemble method for sentiment and emotion analysis. Different textual features of emotion and sentiment have been utilized. We used a publicly accessible Twitter sentiment analysis dataset that included a total of 48,247 authenticated tweets, of which 23,947 were positive texts labeled as binary 0s and 24,300 were negative texts labeled as binary 1s. To assess our approach, we used well-known machine learning (ML) techniques: Logistic Regression (LR), AdaBoost, Decision Tree (DT), SGD, XG-Boost and Naive Bayes. To get more accurate findings, we created a multi-model sentiment and emotion analysis system using the ensemble approach and the classifiers stated above. According to an experimental study, our recommended ensemble learner outperforms the individual learners.
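
    The ensemble idea in the abstract above can be illustrated with a minimal sketch. It assumes hard (majority) voting, which the abstract does not confirm, and it replaces the trained models (LR, AdaBoost, and so on) with hypothetical keyword-rule classifiers so that it runs with the standard library alone. The names `make_keyword_classifier` and `majority_vote` and the word lists are illustrative assumptions; only the label convention (0 = positive, 1 = negative) comes from the abstract.

```python
from collections import Counter
from typing import Callable, Sequence, Set

# A classifier maps a text to a binary label: 0 = positive, 1 = negative
# (the label convention reported in the abstract).
Classifier = Callable[[str], int]


def make_keyword_classifier(negative_words: Set[str]) -> Classifier:
    """Hypothetical stand-in for a trained model: predicts 1 (negative)
    if any token of the text appears in `negative_words`, else 0."""
    def classify(text: str) -> int:
        tokens = set(text.lower().split())
        return 1 if tokens & negative_words else 0
    return classify


def majority_vote(classifiers: Sequence[Classifier], text: str) -> int:
    """Hard voting: each base classifier casts one vote and the most
    frequent label wins. Ties are broken in favor of the smaller label."""
    counts = Counter(clf(text) for clf in classifiers)
    top = max(counts.values())
    return min(label for label, n in counts.items() if n == top)


# Three toy base learners with different (assumed) vocabularies.
base_learners = [
    make_keyword_classifier({"hate", "awful"}),
    make_keyword_classifier({"awful", "bad"}),
    make_keyword_classifier({"terrible"}),
]

if __name__ == "__main__":
    for sample in ["this is awful", "what a lovely day", "terrible"]:
        print(sample, "->", majority_vote(base_learners, sample))
```

    In this sketch a single dissenting learner is outvoted, which is the basic reason a voting ensemble can be more robust than any one of its members.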

    A Bibliometric Analysis of Online Extremism Detection

    The Internet has become an essential part of modern communication. People share ideas, thoughts, and beliefs easily using social media. This sharing of ideas has given rise to a major problem: the spread of radicalized extremist ideas. Various extremist organizations use social media as a propaganda tool. They actively radicalize and recruit youths by sharing inciting material on social media, and they use it to influence people to carry out lone-wolf attacks. Social media platforms employ various strategies to identify and remove extremist content. But due to the sheer amount of data and loopholes in detection strategies, extremism remains undetected for a significant time. Thus, there is a need for accurate detection of extremism on social media. This study provides a bibliometric analysis and systematic mapping of the existing literature on radicalisation and extremism detection. Machine learning and deep learning articles on extremism detection are considered. The analysis is performed using the SCOPUS database, with tools such as Sciencescape and VOSviewer. It is observed that the current literature on extremism detection is focused on a particular ideology. Although few researchers are working in the extremism detection area, it has gained interest among researchers in recent years.

    A Review of Deep Learning Models for Twitter Sentiment Analysis: Challenges and Opportunities

    The microblogging site Twitter (re-branded as X in July 2023) is one of the most influential online social media websites. It offers a platform for the masses to communicate, express their opinions, and share information on a wide range of subjects and products, resulting in the creation of a large amount of unstructured data. This has attracted significant attention from researchers who seek to understand and analyze the sentiments contained within this massive body of user-generated text. The task of sentiment analysis (SA) entails extracting and identifying user opinions from text, and various lexicon- and machine learning-based methods have been developed over the years to accomplish this. However, deep learning (DL)-based approaches have recently become dominant due to their superior performance. This study outlines standard preprocessing techniques and various word embeddings for data preparation. It then presents a taxonomy to provide a comprehensive summary of DL-based approaches. In addition, the work compiles popular benchmark datasets and highlights the evaluation metrics used to measure performance and the publicly available resources that aid SA tasks. Furthermore, the survey discusses domain-specific practical applications of SA. Finally, the study concludes with various research challenges and outlines directions for further investigation.

    Enhancing extremist data classification through textual analysis

    The high volume of extremist material on the Internet has created the need for intelligence gathering via the Web and real-time monitoring of potential websites for evidence of extremist activities. However, manual classification of such content is difficult and time-consuming. In response to this challenge, the work reported here developed several classification frameworks. Each framework provides a basis of text representation that is fed into a machine learning algorithm. The bases of text representation are Sentiment-rule, Posit textual analysis with word-level features, and an extension of Posit analysis, known as Extended-Posit, which adopts character-level as well as word-level data. Identifying gaps in these techniques created avenues for further improvement, especially in handling larger datasets with better classification accuracy. Consequently, a novel basis of text representation known as the Composite-based method was developed. This is a computational framework that explores the combination of both sentiment and syntactic features of the textual content of a Web page. These techniques are then applied to a dataset that had been subjected to a manual classification process, and the result is fed into a machine learning algorithm. This generates a measure of how well each page can be classified into its appropriate class. The classifiers considered are both neural networks (RNN and MLP) and machine learning classifiers (such as J48, Random Forest and KNN). In addition, feature selection and model optimisation were evaluated to determine the cost of creating a machine learning model. Considering the results obtained from each framework, composite features are preferable to solely syntactic or solely sentiment features, offering improved classification accuracy when used with machine learning algorithms. Furthermore, the extension of Posit analysis to include both word- and character-level data outperformed word-level features alone when applied to the assembled textual data. Moreover, the Random Forest classifier outperformed the other classifiers explored. Taking cost into account, feature selection improves classification accuracy and saves more time than hyperparameter tuning (model optimisation).