1,294 research outputs found
Comparative Studies of Detecting Abusive Language on Twitter
The context-dependent nature of online aggression makes annotating large
collections of data extremely difficult. Previously studied datasets in abusive
language detection have been insufficient in size to efficiently train deep
learning models. Recently, Hate and Abusive Speech on Twitter, a dataset much
greater in size and reliability, has been released. However, this dataset has
not been comprehensively studied to its potential. In this paper, we conduct
the first comparative study of various learning models on Hate and Abusive
Speech on Twitter, and discuss the possibility of using additional features and
context data for improvements. Experimental results show that bidirectional GRU
networks trained on word-level features, with Latent Topic Clustering modules,
are the most accurate models, scoring 0.805 F1.
Comment: ALW2: 2nd Workshop on Abusive Language Online to be held at EMNLP
2018 (Brussels, Belgium), October 31st, 201
Concept drift and machine learning model for detecting fraudulent transactions in streaming environment
In a streaming environment, data is continuously generated and processed in an ongoing manner, and it is necessary to detect fraudulent transactions quickly to prevent significant financial losses. Hence, this paper proposes a machine learning-based approach for detecting fraudulent transactions in a streaming environment, with a focus on addressing concept drift. The approach utilizes the extreme gradient boosting (XGBoost) algorithm. Additionally, the approach employs four algorithms for detecting continuous stream drift. To evaluate the effectiveness of the approach, two datasets are used: a credit card dataset and a Twitter dataset containing financial fraud-related social media data. The approach is evaluated using cross-validation and the results demonstrate that it outperforms traditional machine learning models in terms of accuracy, precision, and recall, and is more robust to concept drift. The proposed approach can be utilized as a real-time fraud detection system in various industries, including finance, insurance, and e-commerce
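The abstract does not name the four drift detectors it uses, so the following is only a minimal sketch of one well-known detector, the Page-Hinkley test, chosen here for illustration. It assumes the monitored signal is a per-transaction error indicator (1.0 when the classifier was wrong, 0.0 otherwise); the class name, `first_drift` helper, and parameter values are hypothetical and not taken from the paper.

```python
from typing import Iterable, Optional


class PageHinkley:
    """Page-Hinkley test for an upward shift in the mean of a stream."""

    def __init__(self, delta: float = 0.005, threshold: float = 5.0,
                 min_samples: int = 30) -> None:
        self.delta = delta              # tolerated change magnitude (assumed value)
        self.threshold = threshold      # alarm level (assumed value)
        self.min_samples = min_samples  # warm-up period before alarms are allowed
        self.n = 0
        self.mean = 0.0
        self.cum = 0.0       # cumulative deviation from the running mean
        self.cum_min = 0.0   # smallest cumulative deviation seen so far

    def update(self, x: float) -> bool:
        """Consume one observation; return True if drift is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.cum_min = min(self.cum_min, self.cum)
        return (self.n >= self.min_samples
                and self.cum - self.cum_min > self.threshold)


def first_drift(errors: Iterable[float]) -> Optional[int]:
    """Return the index of the first drift alarm, or None if there is none."""
    detector = PageHinkley()
    for i, e in enumerate(errors):
        if detector.update(e):
            return i
    return None


# Synthetic stream: the model is always right, then always wrong.
stable = [0.0] * 200
drifted = [1.0] * 200
alarm_at = first_drift(stable + drifted)
```

In a setup like the one described, an alarm would typically trigger retraining or refreshing the classifier on recent data; how the paper reacts to detected drift is not stated in the abstract.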
A Fake Profile Detection Model Using Multistage Stacked Ensemble Classification
Fake profile identification on social media platforms is essential for preserving a reliable online community. Previous studies have primarily used conventional classifiers for fake account identification on social networking sites, neglecting feature selection and class balancing to enhance performance. This study introduces a novel multistage stacked ensemble classification model to enhance fake profile detection accuracy, especially in imbalanced datasets. The model comprises three phases: feature selection, base learning, and meta-learning for classification. The novelty of the work lies in utilizing chi-squared feature-class association-based feature selection, combining stacked ensemble and cost-sensitive learning. The research findings indicate that the proposed model significantly enhances fake profile detection efficiency. Employing cost-sensitive learning enhances accuracy on the Facebook, Instagram, and Twitter spam datasets with 95%, 98.20%, and 81% precision, outperforming conventional and advanced classifiers. It is demonstrated that the proposed model has the potential to enhance the security and reliability of online social networks, compared with existing models
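As an illustration of the first phase only, the sketch below ranks categorical features by their chi-squared association with the class label and keeps the top k. It is a simplified stand-in under stated assumptions, not the authors' implementation: the feature names (`has_default_avatar`, `posts_on_even_days`) and the helper functions are hypothetical, and the stacked ensemble and cost-sensitive learning phases are not shown.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def chi2_score(feature: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Pearson chi-squared statistic of the feature-by-class contingency table."""
    n = len(labels)
    feature_counts = Counter(feature)
    class_counts = Counter(labels)
    joint_counts = Counter(zip(feature, labels))
    score = 0.0
    for f, fc in feature_counts.items():
        for c, cc in class_counts.items():
            expected = fc * cc / n            # count expected under independence
            observed = joint_counts.get((f, c), 0)
            score += (observed - expected) ** 2 / expected
    return score


def select_top_k(columns: Dict[str, Sequence[Hashable]],
                 labels: Sequence[Hashable], k: int) -> List[str]:
    """Return the names of the k features most associated with the class."""
    ranked = sorted(columns,
                    key=lambda name: chi2_score(columns[name], labels),
                    reverse=True)
    return ranked[:k]


# Toy data: 1 = fake profile, 0 = genuine profile (hypothetical features).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
columns = {
    "has_default_avatar": [1, 1, 1, 1, 0, 0, 0, 0],   # perfectly aligned with the class
    "posts_on_even_days": [1, 0, 1, 0, 1, 0, 1, 0],   # independent of the class
}
selected = select_top_k(columns, labels, k=1)
```

The selected features would then feed the base learners, whose predictions a meta-learner combines in the stacking phase.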
Detecting and Monitoring Hate Speech in Twitter
Social Media are sensors in the real world that can be used to measure the pulse of societies.
However, the massive and unfiltered feed of messages posted in social media is a phenomenon that
nowadays raises social alarms, especially when these messages contain hate speech targeted to a
specific individual or group. In this context, governments and non-governmental organizations
(NGOs) are concerned about the possible negative impact that these messages can have on individuals
or on the society. In this paper, we present HaterNet, an intelligent system currently being used by
the Spanish National Office Against Hate Crimes of the Spanish State Secretariat for Security that
identifies and monitors the evolution of hate speech in Twitter. The contributions of this research
are many-fold: (1) It introduces the first intelligent system that monitors and visualizes, using social
network analysis techniques, hate speech in Social Media. (2) It introduces a novel public dataset on
hate speech in Spanish consisting of 6000 expert-labeled tweets. (3) It compares several classification
approaches based on different document representation strategies and text classification models. (4)
The best approach is a combined LSTM+MLP neural network that takes as input the
embeddings of the tweet’s word, emoji, and expression tokens, enriched by tf-idf, and obtains an area
under the curve (AUC) of 0.828 on our dataset, outperforming previous methods presented in the
literature.
The work by Quijano-Sanchez was supported by the Spanish Ministry of Science and Innovation
grant FJCI-2016-28855. The research of Liberatore was supported by the Government of Spain, grant MTM2015-65803-R, and by the European Union’s Horizon 2020 Research and Innovation Programme, under the Marie Sklodowska-Curie grant agreement No. 691161 (GEOSAFE). All the financial support is gratefully acknowledged.
A systematic literature review on spam content detection and classification
The presence of spam content in social media is increasing rapidly, and therefore the detection of spam has become vital. Spam content increases as people make extensive use of social media, i.e., Facebook, Twitter, YouTube, and e-mail. The time people spend on social media keeps growing, especially during the pandemic. Users receive many text messages through social media and cannot recognize the spam content in these messages. Spam messages contain malicious links, apps, fake accounts, fake news, reviews, rumors, etc. To improve social media security, the detection and control of spam text are essential. This paper presents a detailed survey of the latest developments in spam text detection and classification in social media. It discusses the various techniques used for spam detection and classification, including Machine Learning, Deep Learning, and text-based approaches. We also present the challenges encountered in identifying spam, along with its control mechanisms and the datasets used in existing works on spam detection.
Artificial intelligence in the cyber domain: Offense and defense
Artificial intelligence techniques have grown rapidly in recent years, and their applications in practice can be seen in many fields, ranging from facial recognition to image analysis. In the cybersecurity domain, AI-based techniques can provide better cyber defense tools and help adversaries improve methods of attack. However, malicious actors are aware of the new prospects too and will probably attempt to use them for nefarious purposes. This survey paper aims at providing an overview of how artificial intelligence can be used in the context of cybersecurity in both offense and defense.