
    Spam Email Detection on Data Mining: A Review

    Email is an effective communication tool. It is the fastest way to send information from one place to another, and it saves both time and cost. However, email is subject to attacks, including spam. Spam is unsolicited email sent in bulk that floods the internet with many copies of the same message, in an attempt to force it on people who would not otherwise choose to receive it. As spam on the internet has grown, interest in spam filtering has grown accordingly. In this paper we review various spam detection techniques. We apply these techniques both with and without a feature selection algorithm, using all the classifiers available in the data mining tools. We analyze the classifier algorithms in two different data mining tools, WEKA and TANAGRA. Data mining is the discovery of knowledge from large databases, that is, the technique of finding new patterns in huge data sets. Both tools provide classification algorithms such as K-Nearest Neighbor (K-NN), Naïve Bayes (NB) and others. Finally, the best classifier for spam email is identified based on each algorithm's accuracy in each data mining tool. Keywords: Classifier, Feature selection, Spam E-mail. DOI: 10.7176/JIEA/9-2-01 Publication date: April 30th 201
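    The review ranks classifiers such as K-NN by their accuracy. A minimal Python sketch of that idea follows, assuming each email has already been turned into a numeric feature vector. The two-dimensional toy features and the names `knn_predict` and `accuracy` are hypothetical, and this is not how WEKA or TANAGRA implement K-NN:

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Label x by majority vote among its k nearest training vectors."""
    # Rank training examples by Euclidean distance to x and keep the k closest.
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Toy feature vectors (hypothetical), e.g. counts of two spam-indicative tokens.
train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["ham", "ham", "ham", "spam", "spam", "spam"]
test_X = [(0.5, 0.5), (5.5, 5.5)]
test_y = ["ham", "spam"]

predictions = [knn_predict(train_X, train_y, x) for x in test_X]
print(predictions, accuracy(test_y, predictions))  # ['ham', 'spam'] 1.0
```

    To compare classifiers fairly, every algorithm would be scored with `accuracy` on the same held-out emails.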

    A Behavior-Based Approach To Securing Email Systems

    The Malicious Email Tracking (MET) system, reported in a prior publication, is a behavior-based security system for email services. The Email Mining Toolkit (EMT) presented in this paper is an offline data mining system for analyzing email archives. It is designed to help compute models of malicious email behavior for deployment in an online MET system. EMT includes a variety of behavior models for email attachments, user accounts and groups of accounts, and each computed model is used to detect anomalous and errant email behavior. We report on the set of features implemented in the current version of EMT, and describe tests of the system and our plans for extending the set of models.
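    EMT builds per-account behavior models and flags deviations from them. As a hedged illustration only, and not EMT's actual models, the sketch below profiles one account by its daily outbound email count. It flags a day whose count lies more than a chosen number of standard deviations from the historical mean. The threshold of 3.0, the function name `is_anomalous` and the sample counts are assumptions:

```python
import statistics

def is_anomalous(history, today, threshold=3.0):
    """Return True if `today` deviates from the account's history by more than
    `threshold` population standard deviations."""
    mean = statistics.mean(history)
    stdev = statistics.pstdev(history)
    if stdev == 0:
        # A perfectly constant history: any change at all counts as anomalous.
        return today != mean
    return abs(today - mean) / stdev > threshold

# Hypothetical daily counts of outbound emails for one account.
daily_sent = [10, 12, 11, 9, 13, 10, 12]
print(is_anomalous(daily_sent, 12))  # False
print(is_anomalous(daily_sent, 40))  # True
```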

    DATA MINING AND THE PROCESS OF TAKING DECISIONS IN EBUSINESS

    Data mining software allows users to analyze large databases to solve business decision problems. Data mining is, in some ways, an extension of statistics, with a few artificial intelligence and machine learning twists thrown in. Like statistics, data mining is not a business solution; it is just a technology. For example, consider a catalog retailer who needs to decide who should receive information about a new product. The data mining process operates on a historical database of previous interactions with customers and the features associated with those customers, such as age, zip code and their past responses. The data mining software uses this historical information to build a model of customer behavior that predicts which customers are likely to respond to the new product. Using this information, a marketing manager can select only the customers who are most likely to respond. The operational business software can then feed the results of the decision to the appropriate touch-point systems (call centers, direct mail, web servers, email systems, etc.) so that the right customers receive the right offers. Keywords: data mining, business decisions, data analysis, cluster analysis, decision strategy
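    The catalog example amounts to learning response rates from history and then ranking new customers by them. A deliberately simple Python sketch follows, assuming a customer segment is an (age band, zip prefix) pair. The segment scheme, the names `fit_response_rates` and `select_targets`, and all data are hypothetical, and real data mining software would typically fit a richer model:

```python
from collections import defaultdict

def fit_response_rates(history):
    """history: list of (segment, responded) pairs -> response rate per segment."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for segment, responded in history:
        totals[segment] += 1
        hits[segment] += int(responded)
    return {s: hits[s] / totals[s] for s in totals}

def select_targets(customers, rates, top_n):
    """Rank (customer_id, segment) pairs by their segment's historical
    response rate (unseen segments score 0.0) and keep the top_n IDs."""
    ranked = sorted(customers, key=lambda c: rates.get(c[1], 0.0), reverse=True)
    return [cid for cid, _ in ranked[:top_n]]

# Hypothetical past campaign: (age band, zip prefix) -> did the customer respond?
history = [
    (("18-30", "100"), True), (("18-30", "100"), True), (("18-30", "100"), False),
    (("31-50", "200"), False), (("31-50", "200"), True),
    (("51+", "300"), False),
]
rates = fit_response_rates(history)
customers = [("c1", ("51+", "300")), ("c2", ("18-30", "100")), ("c3", ("31-50", "200"))]
print(select_targets(customers, rates, top_n=2))  # ['c2', 'c3']
```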

    Task automation through email data analysis

    Many companies currently make no use of the information contained in their emails, even though email is an information-rich data set that could be very useful. This thesis focuses on email data analysis and task automation, particularly email-based process mining. The state-of-the-art section reviews existing research on extracting information from email content using techniques such as lexical analysis, language detection, semantic analysis and machine learning. It explores different areas of process mining, including process pattern discovery, anomaly discovery and process extraction from texts. The objectives of this research are to assess the feasibility of extracting candidate processes from emails, to develop human-understandable metrics for classifying processes, to propose a system that identifies automation opportunities in email templates, and to explore possibilities for automating email interactions. To do this, we carried out several steps: data preparation, chain detection, text representation, distance matrix calculation and grouping.
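    The last three steps (text representation, distance matrix, grouping) can be sketched in Python under simple assumptions. Here each email is represented as its set of lowercase words, Jaccard distance fills the matrix, and single-linkage grouping joins any two emails closer than a threshold. These choices, the threshold of 0.5 and all function names are illustrative, not necessarily the thesis's methods:

```python
def tokens(text):
    """Text representation: the set of lowercase whitespace-separated words."""
    return set(text.lower().split())

def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B|; two empty sets are treated as identical."""
    union = a | b
    return 0.0 if not union else 1 - len(a & b) / len(union)

def distance_matrix(docs):
    sets = [tokens(d) for d in docs]
    return [[jaccard_distance(s, t) for t in sets] for s in sets]

def group(dist, threshold):
    """Single-linkage grouping with union-find: emails closer than
    `threshold` (directly or through a chain) share a group."""
    parent = list(range(len(dist)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(dist)):
        for j in range(i + 1, len(dist)):
            if dist[i][j] < threshold:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(len(dist)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

emails = [
    "please approve the attached invoice",
    "please approve the invoice attached",
    "team lunch on friday",
    "lunch with the team on friday",
]
print(group(distance_matrix(emails), threshold=0.5))  # [[0, 1], [2, 3]]
```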

    Spam Email Detection Using Naive Bayes and Particle Swarm Optimization (PSO)

    Internet-based technology has become a primary need. According to a survey by the Central Statistics Agency in collaboration with APJII, sending and receiving email has overtaken social media, reaching 95.75%. Such intensive use of email has both positive and negative effects. Besides being a communication tool, email is not always used well, and it is often misused in ways that can harm others. This misused email is commonly known as spam or junk mail, and it contains advertisements, scams and even viruses. In this study, Gmail messages were processed with text mining and then tested with several data mining classification methods, namely Naïve Bayes, SVM and Random Forest, combined with Particle Swarm Optimization, to find the most accurate algorithm for predicting spam. Measuring the performance of the four algorithms with a confusion matrix and ROC analysis showed that Naïve Bayes with Particle Swarm Optimization (PSO) achieved the highest accuracy, 81.40%, with an AUC of 0.7
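    The two metrics reported above can be computed as follows. This Python sketch derives a confusion matrix and accuracy from predicted labels, and computes AUC as the probability that a randomly chosen spam email receives a higher score than a randomly chosen legitimate one, with ties counting half. The labels, scores and function names are hypothetical toy values, unrelated to the study's 81.40% result:

```python
def confusion_matrix(y_true, y_pred, positive="spam"):
    """Return (tp, fp, tn, fn) treating `positive` as the spam class."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn

def accuracy(tp, fp, tn, fn):
    """Share of all emails that were classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)

def auc(y_true, scores, positive="spam"):
    """Pairwise (Mann-Whitney) AUC: ties between a spam and a ham score count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y_true = ["spam", "spam", "ham", "ham", "spam"]
y_pred = ["spam", "ham", "ham", "spam", "spam"]
scores = [0.9, 0.4, 0.2, 0.7, 0.8]  # hypothetical classifier spam scores

cm = confusion_matrix(y_true, y_pred)
print(cm, accuracy(*cm))  # (2, 1, 1, 1) 0.6
print(auc(y_true, scores))  # 5 of the 6 spam/ham pairs are ranked correctly
```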

    Hot Zone Identification: Analyzing Effects of Data Sampling on SPAM Clustering

    Email is the most common and comparatively the most efficient means of exchanging information in today's world. However, given its widespread use in all sectors, email has been a target of spammers since the beginning. Spam filtering has now led to critical actions such as forensic activities based on mining spam email. The spam email data mine at the University of Alabama at Birmingham is considered one of the most prominent resources for mining and identifying spam sources, and it is a widely researched repository used by researchers from organizations around the world. The usual process of mining the spam data involves going through every email in the data mine and clustering the emails based on their attributes. However, given the size of the data mine, running the clustering mechanism each time takes an exceptionally long time. In this paper, we show that sampling is an efficient tool for data reduction that preserves the information within the clusters, allowing spam forensic experts to identify the ‘hot zone’ of spam campaigns quickly and effectively. We provide a detailed comparative analysis of the quality of the clusters after sampling, the overall distribution of clusters in the spam data, and timing measurements for our sampling approach. Additionally, we present different strategies that allowed us to optimize the sampling process using data preprocessing and the database engine's computational resources, thus improving the performance of the clustering process. Keywords: Clustering, Data mining, Monte-Carlo Sampler, Sampling, Spam, Step Sequence Sampler, Stepping Random Sampler, Hot Zone
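    Clustering a sample instead of the whole data mine requires a sampler first. The sketch below shows two generic samplers in Python: systematic sampling, which keeps every k-th record from a random start, and simple random sampling without replacement. These are textbook techniques assumed for illustration. The paper's Step Sequence, Stepping Random and Monte-Carlo samplers may be defined differently, and the integer message IDs are a hypothetical stand-in for the spam records:

```python
import random

def systematic_sample(records, step, seed=None):
    """Keep every `step`-th record, starting from a random offset in [0, step)."""
    rng = random.Random(seed)
    start = rng.randrange(step)
    return records[start::step]

def simple_random_sample(records, fraction, seed=None):
    """Draw round(len(records) * fraction) records uniformly without replacement."""
    rng = random.Random(seed)
    k = round(len(records) * fraction)
    return rng.sample(records, k)

# Hypothetical stand-in for spam message IDs in the data mine.
message_ids = list(range(1000))
step_sample = systematic_sample(message_ids, step=10, seed=42)
rand_sample = simple_random_sample(message_ids, fraction=0.1, seed=42)
print(len(step_sample), len(rand_sample))  # 100 100
```

    Either sample would then be passed to the clustering step in place of the full data set, trading some cluster fidelity for a roughly tenfold reduction in input size.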

    APPLICATION OF TEXT MINING IN A SPAM EMAIL CLASSIFICATION SYSTEM USING NAIVE BAYES

    Email (electronic mail) is an inexpensive and easy-to-use internet facility for transferring or distributing information, including files (mail attachments), among internet users. However, not all users use email properly. Ill-intentioned users exploit email to spread unwanted content such as viruses, company advertisements or promotions for particular business products. Such email is better known as spam. Spam is sent to many people without first obtaining permission from the owners of the recipient addresses. To address this problem, a study was carried out to develop a text mining application capable of classifying email. Text mining is the process of mining textual data, usually drawn from documents, with the goal of finding words that represent a document's content so that relationships between documents can be analyzed. The text mining process includes tokenization, stemming and filtering. Data were collected through a literature study. The application development stages comprised process design, table design, implementation and system testing, and the system was tested with black box testing and alpha testing. The research produced a software application that applies text mining to spam email classification using the Naive Bayes method. To classify an email, probability values are computed from the occurrence of words in the email data. The system's accuracy is reported as graphs of accuracy, false positive and false negative values. Test results show that the application is feasible and usable, with a system accuracy of 89.6%. Keywords: Text Mining, Classification, Spam Email, Naive Bayes
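    The abstract describes classifying an email by computing probabilities from word occurrences after tokenization and filtering. A minimal multinomial Naive Bayes sketch in Python follows, assuming add-one (Laplace) smoothing and a small hypothetical stopword list. Stemming is omitted for brevity, words never seen in training are ignored, and the names `preprocess` and `NaiveBayes` are illustrative rather than the authors' application:

```python
import math
import re
from collections import Counter

# Hypothetical minimal stopword list used for the filtering step.
STOPWORDS = {"the", "a", "an", "and", "to", "of", "is", "for", "you", "your"}

def preprocess(text):
    """Tokenize into lowercase alphabetic words and drop stopwords (no stemming)."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

class NaiveBayes:
    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.priors = {c: labels.count(c) / len(labels) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for doc, c in zip(docs, labels):
            self.counts[c].update(preprocess(doc))
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def log_score(self, doc, c):
        """log P(c) + sum of log P(word | c), with add-one smoothing."""
        score = math.log(self.priors[c])
        for w in preprocess(doc):
            if w in self.vocab:
                score += math.log((self.counts[c][w] + 1) / (self.totals[c] + len(self.vocab)))
        return score

    def predict(self, doc):
        return max(self.classes, key=lambda c: self.log_score(doc, c))

train_docs = ["win a free prize now", "claim your free money",
              "meeting agenda for monday", "project report attached"]
train_labels = ["spam", "spam", "ham", "ham"]
model = NaiveBayes().fit(train_docs, train_labels)
print(model.predict("free prize money"))       # spam
print(model.predict("monday meeting agenda"))  # ham
```

    Summing log probabilities rather than multiplying raw probabilities avoids numerical underflow on long emails.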

    Analysis and Implementation of Personal Spam Filtering Using the Evolving Fuzzy System Classifier Method (Case Study: ECML PKDD 2006 Discovery Challenge Data Mining Competition)

    ABSTRACT: People spend an increasing amount of time reading email and deciding whether it is spam or non-spam. Some users spend additional time labeling the spam they receive in order to train local spam filters running on their own computers. Email service providers, on the other hand, want to relieve users of this burden by installing server-based spam filters. Such server-based filters cannot use labeled email from individual users. Instead, they rely on publicly available sources, such as newsgroup messages or emails caught by spam traps. However, individual users differ in how they judge whether an email is spam or non-spam. In this final project, data mining is used to decide whether an email received by a user is spam or non-spam, using the ECML PKDD 2006 Discovery Challenge Data Mining Competition as a case study. The email data are encoded as bag-of-words vectors, so the original form of each email is not known. The Evolving Fuzzy Classifier method is used to classify the emails, with a genetic algorithm serving as the evolution mechanism for the existing fuzzy classifier. Keywords: data mining, classification, evolving fuzzy classifier, spam, genetic algorithm, non-spam, email, bag-of-word vector space, fuzzy classifier
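    The challenge data arrive as bag-of-words vectors, which is why the original emails cannot be recovered: only word counts survive, and word order is lost. A minimal Python sketch of such an encoding follows, assuming whitespace tokenization and a vocabulary sorted alphabetically to fix the column order. The function names and sample texts are hypothetical, and the competition's actual encoding may differ:

```python
from collections import Counter

def build_vocabulary(docs):
    """Assign each distinct lowercase token a fixed column index (sorted for determinism)."""
    words = sorted({w for d in docs for w in d.lower().split()})
    return {w: i for i, w in enumerate(words)}

def to_bow_vector(doc, vocab):
    """Count how often each vocabulary word occurs; out-of-vocabulary words are dropped."""
    vec = [0] * len(vocab)
    for w, n in Counter(doc.lower().split()).items():
        if w in vocab:
            vec[vocab[w]] = n
    return vec

docs = ["cheap pills cheap", "meeting at noon"]
vocab = build_vocabulary(docs)  # columns: at, cheap, meeting, noon, pills
print(to_bow_vector("cheap pills cheap", vocab))  # [0, 2, 0, 0, 1]
print(to_bow_vector("noon meeting", vocab))       # [0, 0, 1, 1, 0]
```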