Search CORE

22 research outputs found

Not that Simple: {E}mail Delivery in the 21st Century

Author: Fiebig T.
Holzbauer F.
Lindorfer M.
Ullrich J.
Publication date: 01/01/2022

An ownership-base message admission control mechanism for curbing spam

Author: Geng Hongxing
Publication venue: 'University of Saskatchewan Library'

Unsolicited e-mail has brought much annoyance to users, making e-mail less reliable as a communication tool. This has happened because the current e-mail architecture has key limitations. For instance, while it allows senders to send as many messages as they want, it does not give recipients adequate capability to prevent unrestricted access to their mailbox. This research develops a new approach that equips recipients with the ability to control access to their mailbox. This thesis builds an ownership-based approach to controlling mailbox usage that employs the CyberOrgs model. CyberOrgs is a model that provides facilities to control resources in multi-agent systems. We consider a mailbox to be a precious resource of its owner. Any access to the resource requires its owner's permission. Thus, we give recipients the capability to manage their valuable resource, the mailbox. In our approach, message senders obtain permission to send messages through negotiation. In this negotiation, a sender makes a proposal and the intended recipient evaluates the proposal according to their own policies. A sender's desired outcome of a negotiation is a contract, which governs the subsequent communication between the sender and the recipient. Contracts help senders and recipients construct a long-term relationship. Besides allowing individuals to control their mailbox, we consider groups, which represent organizations in human society, in order to allow organizations to manage their resources, including mailboxes, message sending allowances, and contracts. A prototype based on our approach is implemented. In the prototype, policies are separated from the mechanisms. Examples of policies are presented and a public policy interface is exposed to allow programmers to develop custom policies. Experimental results demonstrate that the system performance is policy-dependent. In other words, as long as policies are carefully designed, communication involving negotiation has minimal overhead compared to communication in which senders deliver messages to recipients directly.
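
The negotiation flow in this abstract (a sender proposes, the recipient's policy evaluates, and a contract governs later delivery) can be illustrated with a minimal Python sketch. It is not the thesis prototype or the CyberOrgs API: the names `Proposal`, `Contract`, `Mailbox` and `capped_policy` are hypothetical, and the allowance-capping policy is only an assumed example of a recipient policy.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Proposal:
    sender: str
    messages_requested: int  # how many messages the sender asks to deliver


@dataclass
class Contract:
    sender: str
    allowance: int  # messages the sender may still deliver


# A policy maps a proposal to a granted allowance, or None to refuse it.
Policy = Callable[[Proposal], Optional[int]]


def capped_policy(known_senders: set, cap: int) -> Policy:
    """Hypothetical policy: known senders are granted at most `cap` messages."""
    def evaluate(proposal: Proposal) -> Optional[int]:
        if proposal.sender not in known_senders or proposal.messages_requested <= 0:
            return None
        return min(proposal.messages_requested, cap)
    return evaluate


class Mailbox:
    """The mailbox is a resource; the policy (not the mechanism) decides access."""

    def __init__(self, policy: Policy):
        self.policy = policy
        self.contracts = {}
        self.inbox = []

    def negotiate(self, proposal: Proposal) -> Optional[Contract]:
        granted = self.policy(proposal)
        if granted is None:
            return None  # negotiation failed, no contract
        contract = Contract(proposal.sender, granted)
        self.contracts[proposal.sender] = contract
        return contract

    def deliver(self, sender: str, body: str) -> bool:
        contract = self.contracts.get(sender)
        if contract is None or contract.allowance <= 0:
            return False  # no permission, or allowance used up
        contract.allowance -= 1
        self.inbox.append((sender, body))
        return True
```

Keeping the policy as a plain callable mirrors the separation of policy from mechanism that the abstract mentions: swapping in a different policy function changes who is admitted without touching `Mailbox`.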

eCommons@USASK

University of Saskatchewan Research Archive

An approach to preventing spam using Access Codes with a combination of anti-spam mechanisms

Author: Akhtar H. Khalil (7201979)
Publication date: 01/01/2009

Spam is becoming an increasingly severe problem for individuals, networks, organisations and businesses. The losses caused by spam amount to billions of dollars every year. Research shows that spam accounts for more than 80% of e-mails, with an increase in its growth rate every year. Spam is not limited to e-mail; it has started affecting other technologies such as VoIP, cellular and traditional telephony, and instant messaging services. None of the approaches (including legislative, collaborative, social awareness and technological), separately or in combination with other approaches, can prevent enough spam to be deemed a solution to the spam problem. The severity of the spam problem and the limitations of the state-of-the-art solutions create a strong need for an efficient anti-spam mechanism that can prevent significant volumes of spam without showing any false positives. This can be achieved by the proposed anti-spam mechanism known as "Spam Prevention using Access Codes" (SPAC). SPAC targets spam from two angles, i.e. to prevent/block spam and to discourage spammers by making the infrastructure environment very unpleasant for them. In addition to the idea of Access Codes, SPAC combines the ideas behind some of the key current technological anti-spam measures to increase effectiveness. The difference in this work is that SPAC uses those ideas effectively and combines them in a unique way, which enables SPAC to acquire the good features of a number of technological anti-spam approaches without showing any of the drawbacks of these approaches. Sybil attacks, dictionary attacks and address spoofing have no impact on the performance of SPAC. In fact, SPAC functions in a similar way (i.e. as for unknown persons) for these sorts of attacks. An application known as the "SPAC application" has been developed to test the performance of the SPAC mechanism. The results obtained from various tests on the SPAC application show that SPAC has a clear edge over the existing anti-spam technological approaches.

Loughborough University Institutional Repository

Evaluation of Email Spam Detection Techniques

Author: Guda Seshi Reddy
Publication venue: The Repository at St. Cloud State
Publication date: 01/05/2022

Email has become a vital form of communication among individuals and organizations in today’s world. However, it has simultaneously become a threat to many users in the form of spam emails, which are also referred to as junk or unsolicited emails. Most of the spam emails received by users are in the form of commercial advertising, which usually carry computer viruses without any notification. Today, 95% of the email messages across the world are believed to be spam; it is therefore essential to develop spam detection techniques. There are different techniques to detect and filter spam emails, and of late the developed techniques are being implemented successfully to minimize the threats. This paper describes how the current spam email detection approaches identify and evaluate the problem. There are different types of techniques, based on Reputation, Origin, Words, Multimedia, Textual, Community, Rules, Hybrid, Machine learning, Fingerprint, Social networks, Protocols, Traffic analysis, OCR techniques, Low-level features, and many others. All these filtering techniques are developed to detect and evaluate spam emails. Along with the classification of email messages into spam or ham, this paper also demonstrates the effectiveness and accuracy of the spam detection techniques.

St. Cloud State University

How to accelerate your internet : a practical guide to bandwidth management and optimisation using open source software

Author: Flickenger Rob (Ed.)
Publication venue: [S.l.] : INASP/ICTP, 2006.
Publication date: 01/01/2006

xiii, 298 p. : ill. ; 24 cm. Electronic book.
Access to sufficient Internet bandwidth enables worldwide electronic collaboration, access to informational resources, rapid and effective communication, and grants membership to a global community. Therefore, bandwidth is probably the single most critical resource at the disposal of a modern organisation. The goal of this book is to provide practical information on how to gain the largest possible benefit from your connection to the Internet. By applying the monitoring and optimisation techniques discussed here, the effectiveness of your network can be significantly improved.

Metabiblioteca-Biblioteca Digital Libros Abiertos

Personal Email Spam Filtering with Minimal User Interaction

Author: Mojdeh Mona
Publication venue: 'University of Waterloo'
Publication date: 01/01/2012

This thesis investigates ways to reduce or eliminate the necessity of user input to learning-based personal email spam filters. Personal spam filters have been shown in previous studies to yield superior effectiveness, at the cost of requiring extensive user training which may be burdensome or impossible. This work describes new approaches to solve the problem of building a personal spam filter that requires minimal user feedback. An initial study investigates how well a personal filter can learn from different sources of data, as opposed to the user’s own messages. Our initial studies show that inter-user training yields substantially inferior results to intra-user training using the best known methods. Moreover, contrary to previous literature, it is found that transfer learning degrades the performance of spam filters when the training and test sets belong to two different users or different times. We also adapt and modify a graph-based semi-supervised learning algorithm to build a filter that can classify an entire inbox trained on twenty or fewer user judgments. Our experiments show that this approach compares well with previous techniques when trained on as few as two training examples. We also present the toolkit we developed to perform privacy-preserving user studies on spam filters. This toolkit allows researchers to evaluate any spam filter that conforms to a standard interface defined by TREC, on real users’ email boxes. Researchers have access only to the TREC-style result file, and not to any content of a user’s email stream. To eliminate the necessity of feedback from the user, we build a personal autonomous filter that learns exclusively from the result of a global spam filter. Our laboratory experiments show that learning filters with no user input can substantially improve the results of open-source and industry-leading commercial filters that employ no user-specific training. We use our toolkit to validate the performance of the autonomous filter in a user study.

University of Waterloo's Institutional Repository

Efficient feature reduction and classification methods

Author: Janecek Andreas
Publication date: 01/01/2009

The growing amount of available data in a wide range of application areas significantly increases the cost of many data mining applications. High-dimensional data in particular (data described by many different attributes) can pose a major problem for many data mining applications. Besides longer runtimes, this can cause further complications for both supervised and unsupervised classification algorithms (e.g., poor classification accuracy, poor clustering properties, ...). This creates a need for effective and efficient methods for dimensionality reduction. Feature selection (the selection of a subset of the original attributes) and dimensionality reduction (the transformation of original attributes into (linear) combinations of the original attributes) are two important methods for reducing the dimension of data. Although many studies have addressed these methods in recent years, there are still many open questions in this research area. In addition, the ever-increasing number of available and used attributes and features continually gives rise to new problems in many application areas. The aim of this dissertation is to analyze several questions in this area in detail and to develop improvements. In general, the following requirements are placed on feature selection and dimensionality reduction methods: the methods should be efficient (in terms of computational cost), and the resulting feature sets should represent the original data as compactly as possible. Moreover, in many application areas it is important to preserve the interpretability of the original data. Finally, the dimensionality reduction process should have no negative effect on classification accuracy and should ideally even improve it.
Open problems in this area concern, among other things, the relationship between dimensionality reduction methods and the resulting classification accuracy, where both a representation of the data that is as compact as possible and a high classification accuracy are to be achieved. As already mentioned, the large amount of data also increases the computational cost, which is why fast and effective dimensionality reduction methods must be developed or existing methods improved. In addition, the computational cost of the classification methods used should of course also be as low as possible. Furthermore, feature sets remain interpretable when feature selection methods are used for dimensionality reduction, but in the case of dimensionality reduction the resulting feature sets are usually linear combinations of the original features. It is therefore difficult to check how much information individual original features contribute. This dissertation presents important contributions to the problems mentioned above: new, efficient initialization variants for the dimensionality reduction method nonnegative matrix factorization (NMF) were developed, which lead to a faster reduction of the approximation error compared to randomized initialization and to state-of-the-art initialization methods. These initialization variants can furthermore be combined with newly developed and very effective classification algorithms based on NMF. To further improve the runtime of NMF, different variants of NMF algorithms for multi-processor systems were presented, which support both task and data parallelism and lead to a considerable reduction of the runtime of NMF.
In addition, an effective improvement of the Matlab implementation of the ALS algorithm was presented. Furthermore, a technique from the field of information retrieval, latent semantic indexing, was successfully applied as a classification algorithm for email data. Finally, a detailed empirical study was presented on the relationship between different feature reduction methods (feature selection and dimensionality reduction) and the resulting classification accuracy of different learning algorithms. The strong influence of different dimensionality reduction methods on the resulting classification accuracy underlines that further investigation is necessary in order to analyze the complex interplay of dimensionality reduction and classification precisely.
The sheer volume of data today and its expected growth over the next years are some of the key challenges in data mining and knowledge discovery applications. Besides the huge number of data samples that are collected and processed, the high dimensional nature of data arising in many applications causes the need to develop effective and efficient techniques that are able to deal with this massive amount of data. In addition to the significant increase in the demand of computational resources, those large datasets might also influence the quality of several data mining applications (especially if the number of features is very high compared to the number of samples). As the dimensionality of data increases, many types of data analysis and classification problems become significantly harder. This can lead to problems for both supervised and unsupervised learning. Dimensionality reduction and feature (subset) selection methods are two types of techniques for reducing the attribute space. 
While in feature selection a subset of the original attributes is extracted, dimensionality reduction in general produces linear combinations of the original attribute set. In both approaches, the goal is to select a low dimensional subset of the attribute space that covers most of the information of the original data. During the last years, feature selection and dimensionality reduction techniques have become a real prerequisite for data mining applications. There are several open questions in this research field, and due to the often increasing number of candidate features for various application areas (e.g., email filtering or drug classification/molecular modeling) new questions arise. In this thesis, we focus on some open research questions in this context, such as the relationship between feature reduction techniques and the resulting classification accuracy and the relationship between the variability captured in the linear combinations of dimensionality reduction techniques (e.g., PCA, SVD) and the accuracy of machine learning algorithms operating on them. Another important goal is to better understand new techniques for dimensionality reduction, such as nonnegative matrix factorization (NMF), which can be applied for finding parts-based, linear representations of nonnegative data. This "sum-of-parts" representation is especially useful if the interpretability of the original data should be retained. Moreover, performance aspects of feature reduction algorithms are investigated. As data grow, implementations of feature selection and dimensionality reduction techniques for high-performance parallel and distributed computing environments become more and more important. In this thesis, we focus on two types of open research questions: methodological advances without any specific application context, and application-driven advances for a specific application context. 
Summarizing, new methodological contributions are the following: The utilization of nonnegative matrix factorization in the context of classification methods is investigated. In particular, it is of interest how the improved interpretability of NMF factors due to the non-negativity constraints (which is of central importance in various problem settings) can be exploited. Motivated by this problem context two new fast initialization techniques for NMF based on feature selection are introduced. It is shown how approximation accuracy can be increased and/or how computational effort can be reduced compared to standard randomized seeding of the NMF and to state-of-the-art initialization strategies suggested earlier. For example, for a given number of iterations and a required approximation error a speedup of 3.6 compared to standard initialization, and a speedup of 3.4 compared to state-of-the-art initialization strategies could be achieved. Beyond that, novel classification methods based on the NMF are proposed and investigated. We can show that they are not only competitive in terms of classification accuracy with state-of-the-art classifiers, but also provide important advantages in terms of computational effort (especially for low-rank approximations). Moreover, parallelization and distributed execution of NMF is investigated. Several algorithmic variants for efficiently computing NMF on multi-core systems are studied and compared to each other. In particular, several approaches for exploiting task and/or data-parallelism in NMF are studied. We show that for some scenarios new algorithmic variants clearly outperform existing implementations. Last, but not least, a computationally very efficient adaptation of the implementation of the ALS algorithm in Matlab 2009a is investigated. This variant reduces the runtime significantly (in some settings by a factor of 8) and also provides several possibilities to be executed concurrently. 
In addition to purely methodological questions, we also address questions arising in the adaptation of feature selection and classification methods to two specific application problems: email classification and in silico screening for drug discovery. Different research challenges arise in the contexts of these different application areas, such as the dynamic nature of data for email classification problems, or the imbalance in the number of available samples of different classes for drug discovery problems. Application-driven advances of this thesis comprise the adaptation and application of latent semantic indexing (LSI) to the task of email filtering. Experimental results show that LSI achieves significantly better classification results than the widespread de-facto standard method for this special application context. In the context of drug discovery problems, several groups of well discriminating descriptors could be identified by utilizing the "sum-of-parts" representation of NMF. The number of important descriptors could be further increased when applying sparseness constraints on the NMF factors.
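
For readers unfamiliar with NMF, the sketch below shows the baseline the abstract compares against: randomly seeded NMF, here with the standard Lee-Seung multiplicative updates for the Frobenius error. It is a dependency-free Python illustration only. It does not implement the thesis's feature-selection-based initialization, its ALS variant, or its parallel algorithms, and the helper names (`matmul`, `transpose`, `frobenius_error`, `nmf`) are hypothetical.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def frobenius_error(a, w, h):
    """Frobenius norm of A - WH, the approximation error."""
    wh = matmul(w, h)
    return sum((a[i][j] - wh[i][j]) ** 2
               for i in range(len(a)) for j in range(len(a[0]))) ** 0.5


def nmf(a, rank, iterations=200, seed=0, eps=1e-9):
    """Factor nonnegative A (n x m) into W (n x rank) and H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(a), len(a[0])
    # Standard randomized seeding: strictly positive random factors.
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(rank)]
    errors = [frobenius_error(a, w, h)]
    for _ in range(iterations):
        # H <- H * (W^T A) / (W^T W H), elementwise
        wt = transpose(w)
        num = matmul(wt, a)
        den = matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # W <- W * (A H^T) / (W H H^T), elementwise
        ht = transpose(h)
        num = matmul(a, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
        errors.append(frobenius_error(a, w, h))
    return w, h, errors
```

Because the updates only multiply nonnegative quantities, `W` and `H` stay nonnegative, which is what gives the parts-based, interpretable factors the abstract refers to. The recorded `errors` list makes it easy to compare how quickly different seedings reduce the approximation error.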

OTHES

VoIP security - attacks and solutions

Author: Arkko J.
Baugher M.
Baumann R.
Enkh-Amgalan Baatarjav
Kent S.
Menezes A.
Radermacher T. A.
Ram Dantu
Ramsdell B.
Rohwer T.
Rosenberg J.
Santi Phithakkitnukoon
Schulzrinne H.
Schwartz D.
Sengar H.
Sterman B.
Thermos P.
Vinokurov D.
Publication venue: 'Informa UK Limited'
Publication date: 01/01/2008

Voice over IP (VoIP) technology is being extensively and rapidly deployed. Flexibility and cost efficiency are the key factors luring enterprises to transition to VoIP. Some security problems may surface with the widespread deployment of VoIP. This article presents an overview of VoIP systems and their security issues. First, we briefly describe the basic VoIP architecture and its fundamental differences compared to the PSTN. Next, the basic VoIP protocols used for signaling and media transport, as well as defense mechanisms, are described. Finally, current and potential VoIP attacks, along with the approaches that have been adopted to counter the attacks, are discussed.

CiteSeerX

Crossref

Open Research Online (The Open University)

MapReduce based RDF assisted distributed SVM for high throughput spam filtering

Author: Caruana Godwin
Publication venue: Brunel University School of Engineering and Design PhD Theses
Publication date: 01/01/2013

This thesis was submitted for the degree of Doctor of Philosophy and was awarded by Brunel University. Electronic mail has become cast and embedded in our everyday lives. Billions of legitimate emails are sent on a daily basis. The widely established underlying infrastructure, its widespread availability as well as its ease of use have all acted as catalysts to such pervasive proliferation. Unfortunately, the same can be alleged about unsolicited bulk email, or rather spam. Various methods, as well as enabling architectures, are available to try to mitigate spam permeation. In this respect, this dissertation complements existing survey work in this area by contributing an extensive literature review of traditional and emerging spam filtering approaches. Techniques, approaches and architectures employed for spam filtering are appraised, critically assessing respective strengths and weaknesses. Velocity, volume and variety are key characteristics of the spam challenge. MapReduce (M/R) has become increasingly popular as an Internet-scale, data-intensive processing platform. In the context of machine learning based spam filter training, support vector machine (SVM) based techniques have been proven effective. SVM training is, however, a computationally intensive process. In this dissertation, an M/R based distributed SVM algorithm for scalable spam filter training, designated MRSMO, is presented. By distributing and processing subsets of the training data across multiple participating computing nodes, the distributed SVM reduces spam filter training time significantly. To mitigate the accuracy degradation introduced by the adopted approach, a Resource Description Framework (RDF) based feedback loop is evaluated. Experimental results demonstrate that this improves the accuracy levels of the distributed SVM beyond the original sequential counterpart. 
Effectively exploiting large scale, ‘Cloud’ based, heterogeneous processing capabilities for M/R in what can be considered a non-deterministic environment requires the consideration of a number of perspectives. In this work, gSched, a Hadoop M/R based, heterogeneity-aware task-to-node matching and allocation scheme, is designed. Using MRSMO as a baseline, experimental evaluation indicates that gSched improves on the performance of the out-of-the-box Hadoop counterpart in a typical Cloud based infrastructure. The focal contribution to knowledge is a scalable, heterogeneous infrastructure and machine learning based spam filtering scheme, able to capitalize on collaborative accuracy improvements through RDF based, end user feedback.

Brunel University Research Archive

Scaling up VoIP: Transport Protocols and Controlling Unwanted Communication Requests

Author: Ono Kumiko
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2012

Millions of people worldwide use voice over IP (VoIP) services not only as cost-effective alternatives to long distance and international calls but also as unified communication tools, such as video conferencing. Owing to the low cost of new user accounts, each person can easily obtain multiple accounts for various purposes. Rich VoIP functions combined with the low cost of new accounts and connections attract many people, resulting in a dramatic increase in the number of active user accounts. Internet telephony service providers (ITSPs), therefore, need to deploy VoIP systems to accommodate this growing demand for VoIP user accounts. Those attracted also include bad actors who make calls that callees do not want. Once ITSPs openly connect with each other, unwanted bulk calls will be at least as serious a problem as email spam. This dissertation studies how we can reduce the load on both ITSPs and end users to ensure the continued success of VoIP services. From the ITSPs' perspective, the scalability of VoIP servers is of importance and concern. Scalability depends on the server implementation and the transport protocol for SIP, the VoIP signaling protocol. We conduct experiments to understand the impact of connection-oriented transport protocols, namely TCP and SCTP, because of the additional costs of handling connections. Contradicting the negative perception of connection-oriented transport protocols, our experimental results demonstrate that the TCP implementation in Linux can maintain comparable capacity to UDP, which is a lightweight connection-less transport protocol. The use of SCTP, on the other hand, requires improving the Linux implementation since the not-well-tested implementation makes a server less scalable. We establish the maximum number of concurrent TCP or SCTP connections as baseline data and suggest better server configurations to minimize the negative impact of handling a large number of connections. 
Thus, our experimental analysis will also contribute to the design of other servers with a very large number of TCP or SCTP connections. From the perspective of end users, controlling unwanted calls is vital to preserving the VoIP service utility and value. Prior work on preventing unwanted email or calls has mainly focused on detecting unwanted communication requests, leaving many messages or calls unlabeled since false positives during filtering are unacceptable. Unlike prior work, we explore approaches to identifying a "good" call based on signaling messages rather than content. This is because content-based filtering cannot prevent call spam from disturbing callees, since a ringing tone interrupts them before content is sent. Our first approach uses "cross-media relations." Calls are unlikely to be unwanted if the two parties have previously communicated with each other through other communication means. Specifically, we propose two mechanisms using cross-media relations. For the first mechanism, a potential caller offers her contact addresses, which might be used in future calls to the callee. For the second mechanism, a callee provides a potential caller with a weak secret for future use. When the caller makes a call, she conveys the information to be identified as someone the callee contacted before through other means. Our prototype illustrates how these mechanisms work in web-then-call and email-then-call scenarios. In addition, our user study of received email messages, calls, and SMS messages demonstrates the potential effectiveness of this idea. Another approach uses the caller's attributes, such as organizational affiliation, in the case where two parties have had no prior contact. We introduce a lightweight mechanism for validating user attributes with privacy-awareness and moderate security. 
Unlike existing mechanisms for asserting user attributes, our design allows the caller to claim her attributes to callees without needing to prove her identity or her public key. To strike the proper balance between ease of service deployment and security, our proposed mechanism relies on transitive trust, through an attribute validation server, established over transport layer security. This mechanism uses an attribute reference ID, which has a limited lifetime and is restricted to specific relying parties. Our prototype demonstrates the simplicity of our concept and the possibility of practical use.
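
The second cross-media mechanism in this abstract (the callee hands out a weak secret during an earlier contact, and the caller later presents it in call signaling) can be sketched in a few lines of Python. This is an illustration under assumptions rather than the dissertation's prototype: the `Callee` class, its method names, and the choice of an 8-character hexadecimal token are all hypothetical, and no SIP signaling is modelled.

```python
import hmac
import secrets


class Callee:
    """Hypothetical callee-side store of weak secrets issued in earlier contacts."""

    def __init__(self):
        self._issued = {}  # weak secret -> label describing the earlier contact

    def issue_secret(self, contact_label: str) -> str:
        # Deliberately short token: a "weak" secret that is easy to convey,
        # e.g. in a web form response or an email (assumed format).
        token = secrets.token_hex(4)
        self._issued[token] = contact_label
        return token

    def screen_call(self, presented: str):
        """Return the earlier contact's label if the secret matches, else None."""
        for token, label in self._issued.items():
            # Constant-time comparison avoids leaking matches through timing.
            if hmac.compare_digest(token, presented):
                return label
        return None
```

A call that presents a known secret can be labelled as coming from a prior contact before the phone rings, which is the point the abstract makes about screening on signaling rather than content. Calls without a matching secret are simply left unlabelled in this sketch.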

Columbia University Academic Commons