
1,568 research outputs found

Clustering and classification methods for spam analysis

Author: Smirnov Maksim
Publication date: 08/10/2018

Spam emails are a major tool for criminals to distribute malware, conduct fraudulent activity, sell counterfeit products, etc. Thus, security companies are interested in researching spam. Unfortunately, due to the spammers' detection-avoidance techniques, most of the existing tools for spam analysis are not able to provide accurate information about spam campaigns. Moreover, they are not able to link together campaigns initiated by the same sender. F-Secure, a cybersecurity company, collects vast amounts of spam for analysis. The threat intelligence collection from these messages currently involves a lot of manual work. In this thesis we apply state-of-the-art data-analysis techniques to increase the level of automation in the analysis process, thus enabling the human experts to focus on high-level information such as campaigns and actors. The thesis discusses a novel method of spam analysis in which email messages are clustered by different characteristics and the clusters are presented as a graph. The graph representation allows the analyst to see evolving campaigns and even connections between related messages which themselves have no features in common. This makes our analysis tool more powerful than previous methods that simply cluster emails into sets. We implemented a proof-of-concept version of the analysis tool to evaluate the usefulness of the approach. Experiments show that the graph representation and clustering by different features make it possible to link together large and complex spam campaigns that were previously not detected. The tool also found evidence that different campaigns were likely to be organized by the same spammer. The results indicate that the graph-based approach is able to extract new, useful information about spam campaigns.

Aaltodoc Publication Archive
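
The Smirnov abstract above says that clustering by different features and viewing the clusters as a graph can connect messages that share no feature directly. The following is only a minimal sketch of that general idea, not the thesis's implementation: the toy messages, the feature names (`subject`, `url_domain`) and the union-find merging are all assumptions chosen for illustration.

```python
from collections import defaultdict

# Hypothetical toy messages; feature names and values are assumptions.
emails = {
    "m1": {"subject": "Cheap watches", "url_domain": "shop-a.example"},
    "m2": {"subject": "Cheap watches", "url_domain": "shop-b.example"},
    "m3": {"subject": "Luxury deals", "url_domain": "shop-b.example"},
    "m4": {"subject": "Invoice attached", "url_domain": "pay.example"},
}


def cluster_by_feature(messages):
    """Group message ids by each (feature, value) pair they exhibit."""
    clusters = defaultdict(set)
    for msg_id, features in messages.items():
        for name, value in features.items():
            clusters[(name, value)].add(msg_id)
    return clusters


def linked_groups(messages):
    """Merge messages that share any cluster, transitively (union-find)."""
    parent = {m: m for m in messages}

    def find(x):
        # Follow parent links to the root, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for members in cluster_by_feature(messages).values():
        ordered = sorted(members)
        for other in ordered[1:]:
            parent[find(other)] = find(ordered[0])

    groups = defaultdict(set)
    for m in messages:
        groups[find(m)].add(m)
    return sorted(sorted(g) for g in groups.values())


groups = linked_groups(emails)
print(groups)
```

In this toy data, `m1` and `m3` have no feature value in common, yet they end up in the same group because `m2` shares a subject with one and a URL domain with the other.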

Web Spam Detection Using Fuzzy Clustering

Author: J. Shyam Jegadeesh, P. Libin Jacob, J. John Spencer, C. Stanly DevaKumar
Publication venue: Auricle Technologies, Pvt., Ltd.
Publication date: 31/12/2013

The Internet is the most widespread medium for expressing our views and ideas and a lucrative platform for delivering products. For this purpose, search engines play a key role. Information about web pages is stored in the search engine's index database for use in later queries. Web spam refers to a host of techniques that challenge the ranking algorithms of web search engines and cause them to rank certain web pages higher, or that serve some other beneficial purpose for the spammer. Web spam usually irritates web surfers and causes disruption. It ruins the quality of the web search engine. In this paper, we present an efficient clustering method to detect spam web pages effectively and accurately. We also employ various validation measures to validate our research work using the clustering methods. The comparisons between the obtained charts and the validation results clearly show that the presented work produces better results.

International Journal on Recent and Innovation Trends in Computing and Communication

Bibliometric Survey on Incremental Learning in Text Classification Algorithms for False Information Detection

Author: Barve Yashoda Narayanprasad, Mrs.
Mulay Preeti, Dr.
Publication venue: DigitalCommons@University of Nebraska - Lincoln
Publication date: 23/11/2020

False information, or misinformation, on the web has severe effects on people, business and society as a whole. Therefore, the detection of misinformation has become a topic of research among many researchers. Detecting misinformation in textual articles is directly connected to the text classification problem. With the massive and dynamic generation of unstructured textual documents on the web, incremental learning in text classification has gained popularity. This survey explores recent advancements in incremental learning in text classification, reviews the research publications of the area from the Scopus, Web of Science, Google Scholar, and IEEE databases, and performs quantitative analysis using methods such as publication statistics, collaboration degree, research network analysis, and citation analysis. The contribution of this study gives researchers insight into the latest status of the research through a literature survey, and helps them learn about the various applications and techniques used recently in the field.

DigitalCommons@University of Nebraska

Spam on the Internet: can it be eradicated or is it here to stay?

Author: de Freitas Sara
Levene Mark
Publication venue: JISC
Publication date: 01/06/2004

A discussion of the rise in unsolicited bulk e-mail, its effect on tertiary education, and some of the methods being used or developed to combat it. Includes an examination of block listing, protocol change, economic and computational solutions, e-mail aliasing, sender-warranted e-mail, collaborative filtering, rule-based and statistical solutions, and legislation.

Birkbeck Institutional Research Online

Transforming Message Detection

Author: Ermakova L.
Publication venue: St. Petersburg University Press
Publication date: 01/01/2011

The majority of existing spam filtering techniques suffer from several serious disadvantages. Some of them produce many false positives. Others are suitable only for email filtering and may not be used in IM and social networks. Therefore, content-based methods seem to be more efficient. One of them is based on signature retrieval. However, it is not change resistant. There are enhancements (e.g. checksums), but they are extremely time and resource consuming. That is why the main objective of this research is to develop a transforming message detection method. To this end we have compared spam in various languages, namely English, French, Russian and Italian. For each language, about 1000 messages were examined, including spam and non-spam. 135 quantitative features have been retrieved. Almost all of these features are language independent. They underlie the first step of the algorithm, which is based on a support vector machine. The next stage is to test the obtained results by applying an N-gram approach. Special attention is paid to word distortion and text alteration. The obtained results indicate the efficiency of the suggested approach.

Institutional repository of Ural Federal University named after the first President of Russia B.N.Yeltsin

MapReduce based RDF assisted distributed SVM for high throughput spam filtering

Author: Caruana Godwin
Publication venue: Brunel University School of Engineering and Design PhD Theses
Publication date: 01/01/2013

This thesis was submitted for the degree of Doctor of Philosophy and was awarded by Brunel University. Electronic mail has become cast and embedded in our everyday lives. Billions of legitimate emails are sent on a daily basis. The widely established underlying infrastructure, its widespread availability as well as its ease of use have all acted as catalysts to such pervasive proliferation. Unfortunately, the same can be alleged about unsolicited bulk email, or rather spam. Various methods, as well as enabling architectures, are available to try to mitigate spam permeation. In this respect, this dissertation complements existing survey work in this area by contributing an extensive literature review of traditional and emerging spam filtering approaches. Techniques, approaches and architectures employed for spam filtering are appraised, critically assessing respective strengths and weaknesses. Velocity, volume and variety are key characteristics of the spam challenge. MapReduce (M/R) has become increasingly popular as an Internet scale, data intensive processing platform. In the context of machine learning based spam filter training, support vector machine (SVM) based techniques have been proven effective. SVM training is, however, a computationally intensive process. In this dissertation, an M/R based distributed SVM algorithm for scalable spam filter training, designated MRSMO, is presented. By distributing and processing subsets of the training data across multiple participating computing nodes, the distributed SVM reduces spam filter training time significantly. To mitigate the accuracy degradation introduced by the adopted approach, a Resource Description Framework (RDF) based feedback loop is evaluated. Experimental results demonstrate that this improves the accuracy levels of the distributed SVM beyond the original sequential counterpart.
Effectively exploiting large scale, ‘Cloud’ based, heterogeneous processing capabilities for M/R in what can be considered a non-deterministic environment requires the consideration of a number of perspectives. In this work, gSched, a Hadoop M/R based, heterogeneous aware task to node matching and allocation scheme is designed. Using MRSMO as a baseline, experimental evaluation indicates that gSched improves on the performance of the out-of-the-box Hadoop counterpart in a typical Cloud based infrastructure. The focal contribution to knowledge is a scalable, heterogeneous infrastructure and machine learning based spam filtering scheme, able to capitalize on collaborative accuracy improvements through RDF based, end user feedback.

Brunel University Research Archive

Data Leak Detection As a Service: Challenges and Solutions

Author: Shu Xiaokui
Yao Danfeng (Daphne)
Publication date: 01/01/2012

We describe a network-based data-leak detection (DLD) technique, the main feature of which is that the detection does not require the data owner to reveal the content of the sensitive data. Instead, only a small number of specialized digests are needed. Our technique, referred to as the fuzzy fingerprint, can be used to detect accidental data leaks due to human errors or application flaws. The privacy-preserving feature of our algorithms minimizes the exposure of sensitive data and enables the data owner to safely delegate the detection to others. We describe how cloud providers can offer their customers data-leak detection as an add-on service with strong privacy guarantees. We perform extensive experimental evaluation on the privacy, efficiency, accuracy and noise tolerance of our techniques. Our evaluation results under various data-leak scenarios and setups show that our method can support accurate detection with a very small number of false alarms, even when the presentation of the data has been transformed. They also indicate that the detection accuracy does not degrade when partial digests are used. We further provide a quantifiable method to measure the privacy guarantee offered by our fuzzy fingerprint framework.

Computer Science Technical Reports @Virginia Tech

Tutorial and Critical Analysis of Phishing Websites Methods

Author: Fadi Thabtah
Lee McCluskey
Rami M. Mohammad
Publication venue: Elsevier BV
Publication date: 13/05/2015

The Internet has become an essential component of our everyday social and financial activities. It is important not only for individual users but also for organizations, because organizations that offer online trading can achieve a competitive edge by serving worldwide clients. The Internet facilitates reaching customers all over the globe without any marketplace restrictions and with effective use of e-commerce. As a result, the number of customers who rely on the Internet to perform procurements is increasing dramatically. Hundreds of millions of dollars are transferred through the Internet every day. This amount of money tempts fraudsters to carry out their fraudulent operations. Hence, Internet users may be vulnerable to different types of web threats, which may cause financial damage, identity theft, loss of private information, brand reputation damage and loss of customers’ confidence in e-commerce and online banking. Therefore, the suitability of the Internet for commercial transactions becomes doubtful. Phishing is a form of web threat defined as the art of impersonating the website of an honest enterprise in order to obtain a user’s confidential credentials such as usernames, passwords and social security numbers. In this article, the phishing phenomenon is discussed in detail. In addition, we present a survey of the state-of-the-art research on such attacks. Moreover, we aim to recognize the up-to-date developments in phishing and its precautionary measures and provide a comprehensive study and evaluation of this research to identify the gaps that still predominate in this area. This research focuses mostly on web-based phishing detection methods rather than email-based detection methods.

Crossref

University of Huddersfield Repository

Huddersfield Research Portal

Comparative Study of Gaussian and Nearest Mean Classifiers for Filtering Spam E-mails

Author: Dr. Upasna Attri
Harpreet Kaur
Publication venue: Global Journals Inc. (US)
Publication date: 15/01/2012

The development of data-mining applications such as classification and clustering has shown the need for machine learning algorithms to be applied to large scale data. The article gives an overview of some of the most popular machine learning methods (Gaussian and Nearest Mean) and of their applicability to the problem of spam e-mail filtering. The aim of this paper is to compare and investigate the effectiveness of classifiers for filtering spam e-mails using different matrices. Since spam is becoming increasingly difficult to detect, these automated techniques will help save a lot of time and resources required to handle e-mail messages.

Global Journal of Computer Science and Technology (GJCST)
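
The last abstract above names the nearest mean classifier without describing it. As a rough illustration of the general technique, and not of that paper's experimental setup, a nearest mean classifier stores one mean vector per class and assigns a new sample to the class whose mean is closest. The two features and the tiny training set below are hypothetical.

```python
def class_means(samples, labels):
    """Return the per-class mean vector of the training samples."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def predict(means, x):
    """Assign x to the class whose mean is nearest (squared Euclidean)."""
    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    return min(means, key=lambda y: sq_dist(means[y], x))


# Hypothetical features: [occurrences of "free", number of exclamation marks]
train_x = [[3, 5], [4, 6], [0, 1], [1, 0]]
train_y = ["spam", "spam", "ham", "ham"]

means = class_means(train_x, train_y)
print(predict(means, [3, 4]))
print(predict(means, [0, 0]))
```

Real spam filters would use far more features and data. Only the decision rule is shown here.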