1,485 research outputs found

Feature extraction and classification of spam emails

Authors: Hassan Muhammad Ali; Mtetwa Nhamo
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 02/05/2019

Crossref

ResearchOnline@GCU

Detecting spam e-mails using stop word TF-IDF and stemming algorithm with Naïve Bayes classifier on the multicore GPU

Authors: Das Sukriti; Jaiswal Manjit; Khushboo Khushboo
Publication venue: Institute of Advanced Engineering and Science
Publication date: 01/08/2021

A spam filter is a program that identifies unwanted emails and prevents those messages from reaching a user's mailbox. The study focused on how the algorithms can be applied to a set of e-mails containing both ham and spam. First, the working principle and implementation steps of stop-word removal, TF-IDF and a stemming algorithm on NVIDIA's Tesla P100 GPU are discussed, and the findings are verified by executing the Naïve Bayes algorithm. After training and testing the proposed method on a spam e-mail dataset taken from Kaggle, we obtained a training accuracy of 99.67% and a testing accuracy of about 99.03% on the multicore GPU, which also shortened training and testing time. These accuracies are about 0.22 and 0.18 percentage points higher, respectively, than those obtained by applying only Naïve Bayes (the conventional method) to the same dataset, where training and testing accuracy were 99.45% and 98.85%. We also found that training on the GPU took 1.361 seconds, about 1.49X faster than the 2.029 seconds taken on the CPU, and that testing on the GPU took 1.978 seconds, about 1.15X faster than the 2.280 seconds taken on the CPU.
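The pipeline named in this abstract (stop-word removal, stemming, TF-IDF weighting, then Naïve Bayes) can be sketched in plain Python. This is a minimal CPU-only illustration under stated assumptions, not the authors' GPU implementation: the stop-word list, the toy suffix stripper standing in for a real stemmer, the smoothed IDF formula, the four training messages, and all function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Assumption: a tiny illustrative stop-word list, not the one used in the paper.
STOP_WORDS = {"a", "an", "the", "is", "to", "you", "and", "of", "for", "at", "on", "your"}

def stem(token):
    """Toy suffix stripper standing in for a real stemming algorithm."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    """Lowercase, strip punctuation, drop stop words, then stem."""
    tokens = [t.strip(".,!?:;\"'()").lower() for t in text.split()]
    return [stem(t) for t in tokens if t and t not in STOP_WORDS]

def fit_idf(docs):
    """Smoothed inverse document frequency for every term seen in training."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}

def tfidf(doc, idf):
    """Term frequency times IDF; terms unseen in training are dropped."""
    return {t: c * idf[t] for t, c in Counter(doc).items() if t in idf}

class NaiveBayes:
    """Multinomial Naive Bayes over TF-IDF weights with add-one smoothing."""

    def fit(self, vectors, labels):
        self.vocab = {t for v in vectors for t in v}
        self.class_counts = Counter(labels)
        self.weights = defaultdict(lambda: defaultdict(float))
        for vector, label in zip(vectors, labels):
            for term, weight in vector.items():
                self.weights[label][term] += weight
        return self

    def predict(self, vector):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            total_weight = sum(self.weights[label].values())
            score = math.log(n_docs / total_docs)  # class prior
            for term, weight in vector.items():
                if term not in self.vocab:
                    continue
                likelihood = (self.weights[label].get(term, 0.0) + 1.0) / (
                    total_weight + len(self.vocab)
                )
                score += weight * math.log(likelihood)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Assumption: four made-up messages, only to make the sketch runnable.
TRAINING = [
    ("Win a free prize now, claim your winnings!", "spam"),
    ("Free money waiting, claim the prize today", "spam"),
    ("Meeting moved to Monday at noon", "ham"),
    ("Please review the attached project report", "ham"),
]

docs = [preprocess(text) for text, _ in TRAINING]
idf = fit_idf(docs)
model = NaiveBayes().fit([tfidf(d, idf) for d in docs], [label for _, label in TRAINING])

def classify(text):
    """Run the full preprocessing and classification pipeline on one message."""
    return model.predict(tfidf(preprocess(text), idf))
```

Calling `classify("Claim your free prize")` on this toy data returns `"spam"`, while `classify("Project meeting on Monday")` returns `"ham"`. The accuracy and timing figures in the abstract come from the authors' Kaggle dataset and GPU setup and cannot be reproduced with this sketch.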

ZENODO

Institute of Advanced Engineering and Science

Data Sets: Word Embeddings Learned from Tweets and General Data

Authors: Li Quanzhi; Liu Xiaomo; Nourbakhsh Armineh; Shah Sameena
Publication date: 03/05/2017

A word embedding is a low-dimensional, dense and real-valued vector representation of a word. Word embeddings have been used in many NLP tasks. They are usually generated from a large text corpus. The embedding of a word captures both its syntactic and semantic aspects. Tweets are short, noisy and have unique lexical and semantic features that are different from other types of text. Therefore, it is necessary to have word embeddings learned specifically from tweets. In this paper, we present ten word embedding data sets. In addition to the data sets learned from just tweet data, we also built embedding sets from the general data and the combination of tweets with the general data. The general data consist of news articles, Wikipedia data and other web data. These ten embedding models were learned from about 400 million tweets and 7 billion words from the general text. In this paper, we also present two experiments demonstrating how to use the data sets in some NLP tasks, such as tweet sentiment analysis and tweet topic classification tasks.
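One common way to use pretrained word embeddings for tweet-level tasks is to average the vectors of a tweet's words into a single feature vector for a classifier. The abstract does not say whether the authors' experiments do this, so the sketch below is an assumption; the function name `embed_tweet` and the two-dimensional toy vectors are hypothetical and not taken from the published data sets.

```python
from typing import Dict, List, Sequence

def embed_tweet(tokens: Sequence[str],
                embeddings: Dict[str, List[float]],
                dim: int) -> List[float]:
    """Average the embeddings of the tokens that have a vector.

    Out-of-vocabulary tokens are skipped; a tweet with no known
    tokens maps to the zero vector of length `dim`.
    """
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

# Assumption: made-up 2-dimensional vectors; real embeddings have far more dimensions.
toy_embeddings = {"good": [1.0, 0.0], "day": [0.0, 1.0]}
features = embed_tweet(["good", "day", "zzz"], toy_embeddings, dim=2)
```

The resulting fixed-length vector can be passed to any standard classifier for sentiment or topic labels. Averaging discards word order, which is a known limitation of this simple approach.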

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Camouflages and Token Manipulations-The Changing Faces of the Nigerian Fraudulent 419 Spammers

Authors: CHIEMEKE Stella Chinye; LONGE Folake Adunni; LONGE Olumide Babatope; ONIFADE Olufade F. Williams
Publication venue: University of Technology, Sydney (UTS)
Publication date: 14/01/2009

The inefficiencies of current spam filters against fraudulent (419) mails are related to spammers' use of good-word attacks, topic drifts and parasitic spamming, to the wrong categorization and recategorization of electronic mails by e-mail clients, and of course to the fuzzy factors of greed and gullibility on the part of recipients who respond to fraudulent spam mail offers. In this paper, we establish that mail token manipulations remain, above any other tactic, the most potent tool used by Nigerian scammers to fool statistical spam filters. While hoping that uncovering the evidence of these manipulations will prove useful in future antispam research, our findings also sensitize spam filter developers to the need to build into their antispam architecture robust modules that can deal with the identified camouflages.
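To illustrate why token manipulation can defeat a filter that matches exact tokens, the sketch below compares a lookup on raw tokens with one on normalized tokens. The abstract does not list the specific manipulations the authors found, so the character substitutions, the inserted separators, the watch list, and the function names here are all hypothetical examples.

```python
import re

# Assumption: a small illustrative map of look-alike characters.
SUBSTITUTIONS = str.maketrans(
    {"0": "o", "1": "l", "3": "e", "4": "a", "5": "s", "@": "a", "$": "s"}
)

def normalize_token(token):
    """Undo look-alike substitutions and remove non-letter separators."""
    lowered = token.lower().translate(SUBSTITUTIONS)
    return re.sub(r"[^a-z]", "", lowered)

def flagged_tokens(text, watch_list, normalize):
    """Return the tokens of `text` found in `watch_list`, in order."""
    tokens = text.split()
    if normalize:
        tokens = [normalize_token(t) for t in tokens]
    else:
        tokens = [t.lower() for t in tokens]
    return [t for t in tokens if t in watch_list]

# Assumption: a made-up watch list and message.
watch_list = {"money", "transfer"}
message = "Urgent m0ney tr.an.sfer needed"

raw_hits = flagged_tokens(message, watch_list, normalize=False)
normalized_hits = flagged_tokens(message, watch_list, normalize=True)
```

On this message the raw lookup finds nothing, while the normalized lookup recovers both watched words. Normalization of this kind also rewrites legitimate digits and punctuation, so a real filter would need to apply it more selectively.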

UTS ePress

Text Mining Agent Using Hybrid Neural Networks

Authors: Flaih Laith R.; Muhammed Saja A.
Publication venue: The International Institute for Science, Technology and Education (IISTE)
Publication date: 29/12/2013

With the development of internet services and electronic mail communication, the number of spam, advertising, or otherwise unwanted e-mails has grown dramatically. Tens of such annoying and time-consuming e-mails arrive in mailboxes every day, and the issue becomes more critical now that internet services are available to children. The system proposed in this paper tries to solve this problem using an intelligent software agent that checks each incoming and outgoing message and blocks unwanted messages or replaces the undesired words. The proposed system also allows a dynamic number of agents to be created according to the user's wishes, by creating a dynamic number of rules that satisfy the user's requirements and objectives. Keywords: Agent, Text mining, Intelligent Agent, Electronic Mai

International Institute for Science, Technology and Education (IISTE): E-Journals

Using Text Mining to Analyze Quality Aspects of Unstructured Data: A Case Study for “stock-touting” Spam Emails

Authors: Diaz David; Theodoulidis Babis; Zaki Mohamed
Publication venue: AIS Electronic Library (AISeL)
Publication date: 01/01/2010

The growth in the utilization of text mining tools and techniques in the last decade has been primarily driven by the increase in the sheer volume of unstructured texts and the need to extract useful and, more importantly, high-quality information from them. The impetus to analyse unstructured data efficiently and effectively as part of the decision-making processes within an organization has further motivated the need to better understand how to use text mining tools and techniques. This paper describes a case study of a stock spam e-mail architecture that demonstrates the process of refining linguistic resources to extract relevant, high-quality information, including stock profile, financial key words, stock and company news (positive/negative), and compound phrases, from stock spam e-mails. The context of such a study is to identify high-quality information patterns that can be used to support relevant authorities in detecting and analyzing fraudulent activities.

The University of Manchester - Institutional Repository

AIS Electronic Library (AISeL)