1,517 research outputs found

A Hybrid Spam Detection Method Based on Unstructured Datasets

Author: A Castiglione
B Al-Duwairi
C Zhang
D Kuropka
Eleana Asimakopoulou
G Fumera
K Kreutz-Delgado
M Aharon
M Zhu
Marcello Trovati
Nik Bessis
Olga Angelopoulou
Quan Shi
T Ahonen
V Serbanescu
Yeqin Shao
Publication venue: Springer Science and Business Media LLC
Publication date: 21/12/2015

This document is the accepted manuscript version of the following article: Shao, Y., Trovati, M., Shi, Q. et al. Soft Comput (2017) 21: 233. The final publication is available at Springer via http://dx.doi.org/10.1007/s00500-015-1959-z. © Springer-Verlag Berlin Heidelberg 2015.

The identification of non-genuine or malicious messages poses a variety of challenges due to the continuous changes in the techniques utilised by cyber-criminals. In this article, we propose a hybrid detection method based on a combination of image and text spam recognition techniques. In particular, the former is based on sparse representation-based classification, which focuses on the global and local image features, and a dictionary learning technique to achieve a spam and a ham sub-dictionary. On the other hand, the textual analysis is based on semantic properties of documents to assess the level of maliciousness. More specifically, we are able to distinguish between meta-spam and real spam. Experimental results show the accuracy and potential of our approach.

Peer reviewed. Final accepted version.

Crossref

Edge Hill University Research Information Repository

University of Hertfordshire Research Archive

SURVEY OF E-MAIL CLASSIFICATION: REVIEW AND OPEN ISSUES

Author: Abdulraheem Hayder Muqdad
Ghaleb Abdulkadhim Ekhlas
Publication venue: University of Information and Technology Communications
Publication date: 03/12/2020

Email is an economical means of communication whose importance is increasing despite the availability of alternatives such as electronic messaging, social networks, and phone applications. The business arena depends largely on email, which calls for proper email management in the face of disruptive factors such as spam, phishing emails, and multi-folder categorization. The present study reviews work on email published during 2016-2020, analysing the problem descriptions in terms of datasets, application areas, classification techniques, and feature sets. In addition, other areas involving email classification were identified and comprehensively reviewed. The results indicate four email application areas, and the open issues and research directions of email classification are outlined for further investigation.

Iraqi Journal for Computers and Informatics

On Identifying Disaster-Related Tweets: Matching-based or Learning-based?

Author: Agrawal Sumeet
Kim Seon Ho
Shahabi Cyrus
To Hien
Publication date: 04/05/2017

Social media such as tweets are emerging as platforms contributing to situational awareness during disasters. Information shared on Twitter by both the affected population (e.g., requesting assistance, warning) and those outside the impact zone (e.g., providing assistance) would help first responders, decision makers, and the public to understand the situation first-hand. Effective use of such information requires timely selection and analysis of tweets that are relevant to a particular disaster. Even though abundant tweets are promising as a data source, it is challenging to automatically identify relevant messages since tweets are short and unstructured, resulting in unsatisfactory classification performance of conventional learning-based approaches. Thus, we propose a simple yet effective algorithm to identify relevant messages based on matching keywords and hashtags, and provide a comparison between matching-based and learning-based approaches. To evaluate the two approaches, we put them into a framework specifically proposed for analyzing disaster-related tweets. Analysis results on eleven datasets with various disaster types show that our technique provides relevant tweets of higher quality and more interpretable results of sentiment analysis tasks when compared to the learning-based approach.
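The matching-based idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the keyword and hashtag lists, the tokenisation rule, and the function names `is_relevant` and `filter_relevant` are all assumptions made for illustration.

```python
import re

# Hypothetical keyword and hashtag lists for a single flood event.
# These are illustrative assumptions, not lists taken from the paper.
KEYWORDS = {"flood", "flooding", "evacuate", "rescue"}
HASHTAGS = {"#flood", "#floodrelief"}

# A token is a run of word characters, optionally preceded by '#'.
TOKEN_RE = re.compile(r"#?\w+")


def is_relevant(tweet, keywords=KEYWORDS, hashtags=HASHTAGS):
    """Return True if the tweet contains any listed keyword or hashtag."""
    tokens = {tok.lower() for tok in TOKEN_RE.findall(tweet)}
    return bool(tokens & keywords) or bool(tokens & hashtags)


def filter_relevant(tweets):
    """Keep only the tweets that match at least one keyword or hashtag."""
    return [t for t in tweets if is_relevant(t)]
```

Matching here is exact on lower-cased tokens, so "flooded" would not match "flood"; a practical matcher would probably add stemming or curated variants, which is one reason the result is easy to interpret but sensitive to how the lists are built.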

arXiv.org e-Print Archive

Crossref

A Review on Cybersecurity based on Machine Learning and Deep Learning Algorithms

Author: Jahwar Alan Fuad
Y. Ameen Siddeeq
Publication venue: Penerbit UTHM
Publication date: 24/10/2021

Machine Learning (ML) and Deep Learning (DL) techniques have been widely applied to areas such as image processing and speech recognition. Likewise, ML and DL play a critical role in detecting and preventing threats in the field of cybersecurity. In this review, we focus on recent ML and DL algorithms that have been proposed for cybersecurity, network intrusion detection, and malware detection. We also discuss the key elements of cybersecurity, the main principles of information security, and the most common methods used to threaten cybersecurity. Finally, concluding remarks are given, including possible research topics that could be considered to enhance various cybersecurity applications using DL and ML algorithms.

Journals of Universiti Tun Hussein Onn Malaysia (UTHM)

A Comparative Study of Classification Techniques for Fraud Detection

Author: Er. Monika, Er. Amarpreet Kaur
Publication venue: Auricle Global Society of Education and Research
Publication date: 31/05/2018

A large volume of data is generated each day, and handling it is cumbersome. The generated data is stored in large repositories and databases, from which it can be retrieved as the user requires. However, retrieving the important data from such large databases is a major concern. Numerous tools are available to help extract useful information from databases according to users' requirements. The mechanism through which data can be stored and extracted efficiently as required is known as data mining. This review paper studies classification techniques based on different types of algorithms, such as decision trees, Naïve Bayes, rule-based classifiers, k-nearest neighbours (k-NN), and artificial neural networks. It describes how various classification algorithms have been used over the past few years to develop predictive models, compared with respect to accuracy, in fields such as software fault prediction, credit card fraud analytics, intrusion detection, and medicine.
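Of the classifiers listed in this abstract, k-nearest neighbours is the simplest to show in a few lines. The sketch below is a generic illustration, not code from the paper: the function name `knn_predict`, the two-feature toy data in `TRAIN`, and the "legit"/"fraud" labels are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy data: (feature_vector, label) pairs invented for illustration.
TRAIN = [
    ((1.0, 1.0), "legit"),
    ((1.2, 0.9), "legit"),
    ((0.8, 1.1), "legit"),
    ((5.0, 5.0), "fraud"),
    ((5.2, 4.8), "fraud"),
    ((4.9, 5.1), "fraud"),
]


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Sort training points by Euclidean distance to the query and keep the k closest.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # Count the labels of those neighbours and return the most frequent one.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

Using an odd `k` with two classes avoids tied votes; real fraud data would also need feature scaling, since Euclidean distance is dominated by whichever feature has the largest range.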

International Journal on Future Revolution in Computer Science & Communication Engineering

Document-level sentiment analysis of email data

Author: Liu Sisi
Publication date: 01/01/2020

Sisi Liu investigated machine learning methods for email document sentiment analysis. She developed a systematic framework that has been shown, qualitatively and quantitatively, to be effective and efficient in identifying sentiment from massive amounts of email data. Analytical results obtained from the document-level email sentiment analysis framework are beneficial for better decision making in various business settings.

ResearchOnline@JCU

ResearchOnline at James Cook University

Optimal Feature Subset Selection Based on Combining Document Frequency and Term Frequency for Text Classification

Author: Karpagalingam Thirumoorthy
Karuppaiah Muneeswaran
Publication venue: Institute of Informatics, Slovak Academy of Sciences
Publication date: 25/03/2021

Feature selection plays a vital role in reducing the high dimensionality of the feature space in text document classification. Reducing the dimension of the feature space lowers the computation cost and improves the accuracy of the text classification system. Hence, a proper subset of the significant features of the text corpus must be identified in order to classify the data in less computational time and with higher accuracy. In this research, a novel feature selection method that combines document frequency and term frequency (FS-DFTF) is used to measure the significance of a term. The optimal feature subset selected by the proposed method is evaluated using Naive Bayes and Support Vector Machine classifiers on various popular benchmark text corpora. The experimental outcome confirms that the proposed method achieves better classification accuracy than other feature selection techniques.
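The abstract does not give the FS-DFTF formula, so the sketch below only illustrates the general idea of ranking terms by a score that uses both document frequency and term frequency. The score (document-frequency ratio multiplied by relative term frequency), the whitespace tokenisation, and the names `df_tf_scores` and `select_top_k` are assumptions, not the authors' method.

```python
from collections import Counter


def df_tf_scores(docs):
    """Score each term by (document-frequency ratio) * (relative term frequency).

    NOTE: this product is an illustrative assumption, not the FS-DFTF formula.
    """
    n_docs = len(docs)
    doc_freq = Counter()   # number of documents that contain the term
    term_freq = Counter()  # total occurrences of the term in the corpus
    for doc in docs:
        tokens = doc.lower().split()
        term_freq.update(tokens)
        doc_freq.update(set(tokens))
    total_tokens = sum(term_freq.values())
    return {
        term: (doc_freq[term] / n_docs) * (term_freq[term] / total_tokens)
        for term in term_freq
    }


def select_top_k(docs, k):
    """Return the k highest-scoring terms, with ties broken alphabetically."""
    scores = df_tf_scores(docs)
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]
```

This stand-in score ignores class labels and therefore favours terms that are simply common. A method aimed at classification would normally weigh how a term's frequencies differ between classes, and the selected subset would then be passed to a classifier such as Naive Bayes or a Support Vector Machine, as the abstract describes.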

Computing and Informatics (E-Journal - Institute of Informatics, SAS, Bratislava)