MapReduce based RDF assisted distributed SVM for high throughput spam filtering

Author: Caruana Godwin
Publication venue: Brunel University School of Engineering and Design PhD Theses
Publication date: 01/01/2013

This thesis was submitted for the degree of Doctor of Philosophy and was awarded by Brunel University. Electronic mail has become cast and embedded in our everyday lives. Billions of legitimate emails are sent on a daily basis. The widely established underlying infrastructure, its widespread availability, and its ease of use have all acted as catalysts to such pervasive proliferation. Unfortunately, the same can be said of unsolicited bulk email, or rather spam. Various methods, as well as enabling architectures, are available to try to mitigate spam permeation. In this respect, this dissertation complements existing survey work in this area by contributing an extensive literature review of traditional and emerging spam filtering approaches. Techniques, approaches and architectures employed for spam filtering are appraised, critically assessing their respective strengths and weaknesses. Velocity, volume and variety are key characteristics of the spam challenge. MapReduce (M/R) has become increasingly popular as an Internet scale, data intensive processing platform. In the context of machine learning based spam filter training, support vector machine (SVM) based techniques have been proven effective. SVM training is, however, a computationally intensive process. In this dissertation, an M/R based distributed SVM algorithm for scalable spam filter training, designated MRSMO, is presented. By distributing and processing subsets of the training data across multiple participating computing nodes, the distributed SVM reduces spam filter training time significantly. To mitigate the accuracy degradation introduced by the adopted approach, a Resource Description Framework (RDF) based feedback loop is evaluated. Experimental results demonstrate that this improves the accuracy levels of the distributed SVM beyond the original sequential counterpart.
Effectively exploiting large scale, ‘Cloud’ based, heterogeneous processing capabilities for M/R in what can be considered a non-deterministic environment requires the consideration of a number of perspectives. In this work, gSched, a Hadoop M/R based, heterogeneity-aware task-to-node matching and allocation scheme, is designed. Using MRSMO as a baseline, experimental evaluation indicates that gSched improves on the performance of the out-of-the-box Hadoop counterpart in a typical Cloud based infrastructure. The focal contribution to knowledge is a scalable, heterogeneous infrastructure and machine learning based spam filtering scheme, able to capitalize on collaborative accuracy improvements through RDF based, end user feedback.

Brunel University Research Archive

Towards an Effective Organization-Wide Bulk Email System

Author: Kong Ruoyan
Publication date: 17/08/2023

Bulk email is widely used in organizations to communicate messages to employees. It is an important tool in making employees aware of policies, events, leadership updates, etc. However, in large organizations, the problem of overwhelming communication is widespread. Ineffective organizational bulk emails waste employees' time and organizations' money, and cause a lack of awareness of or compliance with organizations' missions and priorities. This thesis focuses on improving organizational bulk email systems by 1) conducting qualitative research to understand different stakeholders; 2) conducting field studies to evaluate personalization's effects on getting employees to read bulk messages; 3) designing tools to support communicators in evaluating bulk emails. We performed these studies at the University of Minnesota, interviewing 25 employees (both senders and recipients), and including 317 participants in total. We found that the university's current bulk email system is ineffective, as only 22% of the information communicated was retained by employees. To encourage employees to read high-level information, we implemented a multi-stakeholder personalization framework that mixed important-to-organization messages with employee-preferred messages and improved the studied bulk email's recognition rate by 20%. On the sender side, we iteratively designed a prototype of a bulk email evaluation platform. In field evaluation, we found that bulk emails' message-level performance helped communicators in designing bulk emails. We collected eye-tracking data and developed a neural network technique to estimate how much time each message is read using only recipients' interactions with browsers, which improved the estimation accuracy to 73%. In summary, this work sheds light on how to design organizational bulk email systems that communicate effectively and respect different stakeholders' value. Comment: PhD Thesis

arXiv.org e-Print Archive