Bibliometric Survey on Incremental Learning in Text Classification Algorithms for False Information Detection
False information, or misinformation, on the web has severe effects on people, businesses, and society as a whole. Detecting misinformation has therefore become an active research topic. Detecting misinformation in textual articles is directly connected to the text classification problem. With the massive and dynamic generation of unstructured textual documents on the web, incremental learning in text classification has gained popularity. This survey explores recent advancements in incremental learning in text classification, reviews the research publications in the area from the Scopus, Web of Science, Google Scholar, and IEEE databases, and performs quantitative analysis using methods such as publication statistics, collaboration degree, research network analysis, and citation analysis. The study gives researchers insight into the current state of research on incremental learning in text classification through a literature survey, and helps them identify the applications and techniques recently used in the field.
Deep learning to filter SMS spam
The popularity of short message service (SMS) has been growing over the last decade. For businesses, these text messages are even more effective than emails: while 98% of mobile users read their SMS by the end of the day, about 80% of emails remain unopened. The popularity of SMS has also given rise to SMS Spam, which refers to any irrelevant text messages delivered using mobile networks. They are severely annoying to users. Most existing research that has attempted to filter SMS Spam has relied on manually identified features. Extending the current literature, this paper uses deep learning to classify Spam and Not-Spam text messages. Specifically, Convolutional Neural Network and Long Short-Term Memory models were employed. The proposed models were based on text data only, and self-extracted the feature set. On a benchmark dataset consisting of 747 Spam and 4,827 Not-Spam text messages, a remarkable accuracy of 99.44% was achieved.
A review of spam email detection: analysis of spammer strategies and the dataset shift problem
Spam emails have traditionally been seen as just annoying, unsolicited emails containing advertisements, but they increasingly include scams, malware or phishing. To ensure security and integrity for users, organisations and researchers aim to develop robust filters for spam email detection. Recently, most spam filters based on machine learning algorithms published in academic journals report very high performance, but users are still reporting a rising number of frauds and attacks via spam emails. Two main challenges can be found in this field: (a) it is a very dynamic environment prone to the dataset shift problem and (b) it suffers from the presence of an adversarial figure, i.e. the spammer. Unlike classical spam email reviews, this one focuses particularly on the problems that this constantly changing environment poses. Moreover, we analyse the different spammer strategies used for contaminating the emails, and we review the state-of-the-art techniques to develop filters based on machine learning. Finally, we empirically evaluate and present the consequences of ignoring the matter of dataset shift in this practical field. Experimental results show that this shift may lead to severe degradation in the estimated generalisation performance, with error rates reaching values up to 48.81%. Open-access publication funded by the Consorcio de Bibliotecas Universitarias de Castilla y León (BUCLE), charged to Operational Programme 2014ES16RFOP009 FEDER 2014-2020 de Castilla y León, Action: 20007-CL - Apoyo Consorcio BUCL
Social media bot detection with deep learning methods: a systematic review
Social bots are automated social media accounts governed by software and controlled by humans at the backend. Some bots have good purposes, such as automatically posting information about news and even providing help during emergencies. Nevertheless, bots have also been used for malicious purposes, such as posting fake news, spreading rumours, or manipulating political campaigns. There are existing mechanisms that allow for the automatic detection and removal of malicious bots. However, the bot landscape changes as bot creators use more sophisticated methods to avoid being detected. Therefore, new mechanisms for discerning between legitimate and bot accounts are much needed. Over the past few years, a few review studies have contributed to social media bot detection research by presenting comprehensive surveys of various detection methods, including cutting-edge solutions such as machine learning (ML) and deep learning (DL) techniques. This paper, to the best of our knowledge, is the first to highlight only the DL techniques and compare the motivation and effectiveness of these techniques among themselves and against other methods, especially the traditional ML ones. We present a refined taxonomy of the features used in DL studies and details of the associated pre-processing strategies required to prepare suitable training data for a DL model. We summarize the gaps addressed by the review papers that mention DL/ML studies to provide future directions in this field. Overall, DL techniques turn out to be computation- and time-efficient techniques for social bot detection, with performance better than or comparable to that of traditional ML techniques.
Deep Learning for Phishing Detection: Taxonomy, Current Challenges and Future Directions
This work was supported in part by the Ministry of Higher Education under the Fundamental Research Grant Scheme under Grant FRGS/1/2018/ICT04/UTM/01/1; and in part by the Faculty of Informatics and Management, University of Hradec Kralove, through SPEV project under Grant 2102/2022. Phishing has become an increasing concern and captured the attention of end-users as well
as security experts. Existing phishing detection techniques still suffer from the deficiency in performance
accuracy and inability to detect unknown attacks despite decades of development and improvement.
Motivated to solve these problems, many researchers in the cybersecurity domain have shifted their attention
to phishing detection that capitalizes on machine learning techniques. Deep learning has emerged as a branch
of machine learning that has become a promising solution for phishing detection in recent years. As a result,
this study proposes a taxonomy of deep learning algorithms for phishing detection by examining 81 selected
papers using a systematic literature review approach. The paper first introduces the concepts of phishing and
deep learning in the context of cybersecurity. Then, taxonomies of phishing detection and deep learning
algorithms are provided to classify the existing literature into various categories. Next, taking the proposed
taxonomy as a baseline, this study comprehensively reviews the state-of-the-art deep learning techniques
and analyzes their advantages as well as disadvantages. Subsequently, the paper discusses various issues
that deep learning faces in phishing detection and proposes future research directions to overcome these
challenges. Finally, an empirical analysis is conducted to evaluate the performance of various deep learning
techniques in a practical context, and to highlight the related issues that motivate researchers in their future
works. The results obtained from the empirical experiment showed that the common issues among most of
the state-of-the-art deep learning algorithms are manual parameter-tuning, long training time, and deficient
detection accuracy.
Automated design of the deep neural network pipeline
Deep neural networks have proven to be effective in various domains, especially in natural
language processing and image processing. However, one of the challenges associated with using
deep neural networks is the long design time and expertise needed to apply these neural
networks to a particular domain. The research presented in this paper investigates the automation of
the design of the deep neural network pipeline to overcome this challenge. The deep learning pipeline
includes identifying the preprocessing needed, the feature engineering technique, the neural network
to use and the parameters for the neural network. A selection perturbative hyper-heuristic (SPHH)
is used to automate the design pipeline. The study also examines the reusability of the generated
pipeline. The effectiveness of transfer learning on the generated designs is also investigated. The
proposed approach is evaluated for text processing—namely, sentiment analysis and spam detection—
and image processing—namely, maize disease detection and oral lesion detection. The study revealed
that the automated design of the deep neural network pipeline produces just as good, and in some
cases better, performance compared to the manual design, with the automated design requiring
less design time than the manual design. In the majority of instances, the design was not reusable;
however, transfer learning achieved positive transfer of designs, with the performance being just as
good or better than when transfer learning was not used. The National Research Foundation of South Africa. https://www.mdpi.com/journal/applsci