    Enforcing email addresses privacy using tokens

    We propose a system that allows users to monitor how their email addresses are used and how they spread over the Internet. This protects the privacy of the user and can reduce the spam phenomenon. Our solution does not require changes to the email infrastructure, can be set up by the end user on an individual basis, and is compatible with any email client as long as emails are centralized on a server (e.g. an IMAP server). Nevertheless, it requires that people use email messaging quite differently.
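
    A minimal sketch of the underlying idea may help: hand out a distinct, verifiable alias per correspondent, so that any address that later turns up in spam can be traced back to the party that leaked it. The HMAC construction, secret key, and alias format below are illustrative assumptions, not details taken from the paper.

```python
import hashlib
import hmac

# Illustrative only: the key, alias format, and domain are assumptions.
SECRET_KEY = b"user-private-key"

def make_alias(contact: str, domain: str = "example.org") -> str:
    """Derive a distinct tokenized address for one correspondent."""
    token = hmac.new(SECRET_KEY, contact.encode(), hashlib.sha256).hexdigest()[:8]
    return f"user.{token}@{domain}"

def trace_leak(alias: str, contacts: list) -> str:
    """Find which correspondent an alias was issued to (i.e., who leaked it)."""
    domain = alias.split("@", 1)[1]
    for contact in contacts:
        if make_alias(contact, domain) == alias:
            return contact
    return ""

alias = make_alias("shop@vendor.example")
print(alias)  # user.<8 hex chars>@example.org
print(trace_leak(alias, ["shop@vendor.example", "friend@mail.example"]))
```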

    Protection Against Spam Using Pre-Challenges

    Spam has become an increasingly serious problem for email users. A number of anti-spam schemes have been proposed and deployed, but the problem has yet to be well addressed. One such scheme is challenge-response, in which a challenge is imposed on an email sender. However, this scheme introduces new problems for users, e.g., delay of service and denial-of-service attacks. In this paper, we introduce a pre-challenge scheme that avoids these problems. Each user defines a challenge and associates it with his or her email address, in such a way that a sender can retrieve a new recipient's email address and challenge together before sending a first-contact email. Some new mechanisms are employed to reach a good balance between security against spam and convenience for email users.
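
    A sketch of the first-contact flow as described: the challenge is published next to the address, the sender solves it before sending, and the receiving side only verifies the answer, avoiding the post-hoc challenge round trip of classic challenge-response. The directory record and the X-Challenge-Answer header name are hypothetical, not from the paper.

```python
# Published together with the address (e.g., on a web page); the sender
# reads this record and solves the challenge before composing the message.
directory_entry = {
    "address": "alice@example.org",
    "challenge": "Which city is printed on my business card?",
}

def accept_first_contact(headers: dict, expected_answer: str) -> bool:
    """Receiver-side check: a first-contact message must already carry
    the answer to the user-defined challenge, or it is treated as spam."""
    answer = headers.get("X-Challenge-Answer", "")
    return answer.strip().lower() == expected_answer.strip().lower()

# No challenge is sent after the fact, so there is no delay of service
# and no denial-of-service amplification via bounced challenges.
headers = {"From": "bob@example.com", "X-Challenge-Answer": "Zurich"}
print(accept_first_contact(headers, "zurich"))  # True -> deliver
```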

    Towards secure message systems

    Message systems, which transfer information from sender to recipient via communication networks, are indispensable to our modern society. The enormous user base of message systems and their critical role in information delivery make securing them a top priority. This dissertation focuses on securing the two most representative and dominant message systems, e-mail and instant messaging (IM), from two complementary aspects: defending against unwanted messages and ensuring reliable delivery of wanted messages.

    To curtail unwanted messages and protect e-mail and instant messaging users, this dissertation proposes two mechanisms, DBSpam and HoneyIM, which can effectively thwart e-mail spam laundering and foil malicious instant message spreading, respectively. DBSpam exploits the distinct characteristics of connection correlation and packet symmetry embedded in the behavior of spam laundering, and uses a simple statistical method, the Sequential Probability Ratio Test, to detect and break spam laundering activities inside a customer network in a timely manner. The experimental results demonstrate that DBSpam quickly and accurately captures and suppresses e-mail spam laundering activities and can cope with high-speed network traffic. HoneyIM leverages the inherent spreading characteristic of IM malware and applies honeypot technology to the detection of malicious instant messages. More specifically, HoneyIM uses decoy accounts in normal users' contact lists as honeypots to capture malicious messages sent by IM malware, and suppresses the spread of malicious instant messages by performing network-wide blocking. The efficacy of HoneyIM has been validated through both simulations and real experiments.

    To improve e-mail reliability, that is, to prevent the loss of wanted e-mail, this dissertation proposes a collaboration-based autonomous e-mail reputation system called CARE. CARE introduces inter-domain collaboration without a central authority or third party, and enables each e-mail service provider to independently build its reputation database, covering both frequently contacted and unacquainted sending domains, based on the local e-mail history and the information exchanged with other collaborating domains. The effectiveness of CARE in improving e-mail reliability has been validated through a number of experiments, including a comparison of two large e-mail log traces from two universities, a real experiment of DNS snooping on more than 36,000 domains, and extensive simulation experiments in a large-scale environment.
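
    The statistical core of DBSpam, the Sequential Probability Ratio Test, is compact enough to sketch. Below is a generic Bernoulli SPRT; treating each observed connection-correlation/packet-symmetry event as one observation is our illustrative framing, and the probabilities p0/p1 and error rates are assumed values, not figures from the dissertation.

```python
import math

def sprt(observations, p0=0.2, p1=0.8, alpha=0.01, beta=0.01):
    """Bernoulli SPRT: return 'H1' (laundering), 'H0' (normal), or 'undecided'.
    p0/p1 are the event probabilities under H0/H1; alpha/beta the error rates."""
    upper = math.log((1 - beta) / alpha)   # accept H1 at or above this bound
    lower = math.log(beta / (1 - alpha))   # accept H0 at or below this bound
    llr = 0.0                              # cumulative log-likelihood ratio
    for x in observations:                 # x is 1 (event seen) or 0 (not seen)
        if x:
            llr += math.log(p1 / p0)
        else:
            llr += math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "H1"
        if llr <= lower:
            return "H0"
    return "undecided"

print(sprt([1, 1, 0, 1, 1, 1]))  # repeated symmetry events drive this to 'H1'
```

    The sequential formulation is what makes timely in-network detection plausible: the test stops as soon as the accumulated evidence crosses either bound, rather than after a fixed sample size.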

    A real-time system for abusive network traffic detection

    Abusive network traffic, including unsolicited e-mail, malware propagation, and denial-of-service attacks, remains a constant problem in the Internet. Despite extensive research in, and subsequent deployment of, abusive-traffic detection infrastructure, none of the available techniques addresses the problem effectively or completely. The fundamental failing of existing methods is that spammers and attack perpetrators rapidly adapt to and circumvent new mitigation techniques. Analyzing network traffic by exploiting transport-layer characteristics can help remedy this and provide effective detection of abusive traffic. Within this framework, we develop a real-time, online system that integrates transport-layer characteristics into the existing SpamAssassin tool for detecting unsolicited commercial e-mail (spam). Specifically, we implement the previously proposed, but undeveloped, SpamFlow technique. We determine appropriate algorithms based on classification performance, training required, adaptability, and computational load. We evaluate system performance in a virtual test bed and a live environment and present analytical results. Finally, we evaluate our system in the context of SpamAssassin's auto-learning mode, providing an effective method to train the system without explicit user interaction or feedback.
    http://archive.org/details/arealtimesystemf109455754
    Outstanding Thesis. Approved for public release; distribution is unlimited.
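
    A toy illustration of the SpamFlow idea of classifying on transport-layer behaviour rather than message content. The three features and the decision-tree classifier are assumptions chosen for demonstration (the thesis compares several algorithms); the numbers are fabricated toy values.

```python
# Classify a mail flow by its transport-layer behaviour, not its content.
from sklearn.tree import DecisionTreeClassifier

# [rtt_variance, retransmissions, min_recv_window] per observed SMTP flow;
# feature choice and values are illustrative assumptions.
X_train = [
    [0.002, 0, 65535],   # well-provisioned legitimate server
    [0.150, 6, 4096],    # congested, bot-like sender
    [0.004, 1, 32768],
    [0.210, 9, 2048],
]
y_train = [0, 1, 0, 1]   # 0 = ham, 1 = spam

clf = DecisionTreeClassifier().fit(X_train, y_train)

# Score a new flow; the verdict could feed SpamAssassin as one more rule.
print(clf.predict([[0.180, 7, 4096]]))  # -> [1]
```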

    Towards Least Privilege Principle: Limiting Unintended Accesses in Software Systems.

    Adhering to the least-privilege principle involves ensuring that only legitimate subjects have access rights to objects. Sometimes this is hard because of permission irrevocability, changing security requirements, infeasibility of access control mechanisms, and permission creep. If subjects turn rogue, the accesses can be abused. This thesis examines three scenarios where accesses are commonly abused and lead to security issues, and proposes three systems, SEAL, DeGap, and Expose, to detect and, where practical, eliminate unintended accesses. Firstly, we examine abuse of email addresses, whose leakage is irreversible; users can only hope that businesses requiring their email addresses for validating affiliations do not misuse them. SEAL uses semi-private aliases, which permit gradual and selective control while providing privacy for affiliation validations. Secondly, access control mechanisms may become ineffective as subject roles change and administrative oversights lead to permission gaps, which should be removed expeditiously. Identifying permission gaps can be hard, since a reference point other than the granted permissions is often unavailable. DeGap uses access logs to estimate the gaps, applying a common logic across various system services, and recommends configuration changes that reduce the gaps. Lastly, unintended software code re-use can lead to intellectual property theft and license violations. Determining whether an application uses a library can be difficult: compiler optimizations, function inlining, and a lack of symbols make syntactic methods a challenge, while pure semantic analysis is slow. Given a library and a set of applications, Expose combines syntactic and semantic analysis to efficiently help identify applications that re-use the library.
    PhD thesis, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies.
    http://deepblue.lib.umich.edu/bitstream/2027.42/99976/1/bengheng_1.pd
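
    The permission-gap idea behind DeGap can be illustrated in a few lines: the gap is the set of granted permissions that the access log never shows being exercised. The data layout below is an assumed toy format, not DeGap's actual log schema.

```python
# Granted permissions per subject (toy data).
granted = {
    "alice": {"read", "write", "admin"},
    "bob":   {"read", "write"},
}
# Access log: (subject, permission exercised) pairs.
access_log = [
    ("alice", "read"), ("alice", "read"), ("bob", "read"), ("bob", "write"),
]

# Collect the permissions each subject actually used.
used = {}
for subject, perm in access_log:
    used.setdefault(subject, set()).add(perm)

# The gap = granted but never exercised; candidates for revocation.
for subject, perms in granted.items():
    gap = perms - used.get(subject, set())
    if gap:
        print(f"{subject}: consider revoking {sorted(gap)}")
# alice: consider revoking ['admin', 'write']
```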

    Automated spamtrap network (Rede automatizada de spamtraps)

    Master's dissertation in Informatics Engineering. Spam is the term for unsolicited messages sent by electronic means. The email side of the phenomenon is a problem that has afflicted the Internet since its early days. Spammers adopt many sophisticated techniques to spread unsolicited messages. Even so, anti-spam mechanisms have been growing steadily more effective. One anti-spam technique that has become increasingly popular is the use of spamtraps: ordinary email addresses that are not used by humans and whose sole purpose is to capture unsolicited messages, which can therefore be classified as spam. These addresses are placed at strategic locations on the Internet so that spammers harvest them and use them to send spam. This work proposes a new way of fighting spam based on the automatic distribution of spamtraps across multiple locations. A software project based on this principle is responsible for receiving and forwarding the messages collected by the spamtraps. These spam messages will later be used to feed an existing spam-filtering system, so that it stays up to date with the latest content sent by spammers, as well as with a broad range of information about their senders.
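
    A minimal sketch of the collector's core rule: any message delivered to a known spamtrap address is by construction unsolicited, so it can be recorded straight into the filter's training corpus. The trap addresses, mbox-style storage, and function name are illustrative assumptions.

```python
import email

# Illustrative assumptions: the trap list and the corpus file name.
SPAMTRAPS = {"trap1@example.org", "trap2@example.org"}

def handle_incoming(raw_message: bytes) -> None:
    """Record any message addressed to a spamtrap as confirmed spam."""
    msg = email.message_from_bytes(raw_message)
    recipient = (msg.get("To") or "").strip().lower()
    if recipient in SPAMTRAPS:
        # Append to the corpus that feeds the existing spam filter.
        with open("spam_corpus.mbox", "ab") as corpus:
            corpus.write(b"From spamtrap\n" + raw_message + b"\n")

handle_incoming(b"To: trap1@example.org\nSubject: buy now\n\ncheap pills")
```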

    Cyber indicators of compromise: a domain ontology for security information and event management

    It has been said that cyber attackers attack at wire speed (very fast), while cyber defenders defend at human speed (very slow). Researchers have been working to improve this asymmetry by automating a greater portion of what has traditionally been very labor-intensive work. This work involves both the monitoring of live system events (to detect attacks) and the review of historical system events (to investigate attacks). One technology that is helping to automate this work is Security Information and Event Management (SIEM). In short, SIEM technology works by aggregating log information and then sifting through it for event correlations that are highly indicative of attack activity, for example: Administrator successful local logon and (concurrently) Administrator successful remote logon. Such correlations are sometimes referred to as indicators of compromise (IOCs). Though IOCs for network-based data (i.e., packet headers and payload) are fairly mature (e.g., Snort's large rule base), the field of end-device IOCs is still evolving and lacks a well-defined, widely accepted standard. This report addresses ontological issues pertaining to the development of end-device IOCs, including what they are, how they are defined, and what dominant early standards already exist.
    http://archive.org/details/cyberindicatorso1094553041
    Lieutenant, United States Navy. Approved for public release; distribution is unlimited.
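
    The example IOC from the abstract is easy to express as a correlation rule. The sketch below flags the same account logged on locally and remotely within a time window; the event schema and the 60-second window are assumptions, not anything prescribed by the report.

```python
from datetime import datetime, timedelta

# Toy normalized logon events, as a SIEM might aggregate them.
events = [
    {"t": datetime(2015, 3, 1, 9, 0, 5),  "user": "Administrator", "kind": "local"},
    {"t": datetime(2015, 3, 1, 9, 0, 40), "user": "Administrator", "kind": "remote"},
]

WINDOW = timedelta(seconds=60)  # "concurrently", assumed to mean within 60 s

def correlate(events):
    """Yield (local, remote) logon pairs by the same user within WINDOW."""
    for a in events:
        for b in events:
            if (a["user"] == b["user"] and a["kind"] == "local"
                    and b["kind"] == "remote"
                    and abs(a["t"] - b["t"]) <= WINDOW):
                yield a, b

for a, b in correlate(events):
    print(f"IOC: {a['user']} concurrent local+remote logon")
```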

    Intrusion detection and management over the world wide web

    As the Internet and society become ever more integrated, the number of Internet users continues to grow. Today there are 1.6 billion Internet users. They use its services to work from home, shop for gifts, socialise with friends, research the family holiday and manage their finances. Through generating both wealth and employment, the Internet and our economies have become interwoven. The growth of the Internet has attracted hackers and organised criminals. Users are targeted for financial gain through malware and social engineering attacks. Industry has responded to the growing threat by developing a range of defences: antivirus software, firewalls and intrusion detection systems are all readily available. Yet the Internet security problem continues to grow and Internet crime continues to thrive. Warnings about the latest application vulnerabilities, phishing scams and malware epidemics are announced regularly and serve to heighten user anxiety. Not only users but also businesses, corporations, public utilities and even states are targeted for attack. Implementing network security remains an error-prone task for the modern Internet user. In response, this thesis explores whether intrusion detection and management can be effectively offered as a web service in order to better protect users and heighten their awareness of the Internet security threat.

    Efficient feature reduction and classification methods

    The sheer volume of data today and its expected growth over the next years are among the key challenges in data mining and knowledge discovery applications. Besides the huge number of data samples that are collected and processed, the high-dimensional nature of data arising in many applications creates the need for effective and efficient techniques that can deal with this massive amount of data. In addition to the significant increase in the demand for computational resources, such large datasets may also affect the quality of several data mining applications (especially if the number of features is very high compared to the number of samples). As the dimensionality of data increases, many types of data analysis and classification problems become significantly harder; this can cause problems for both supervised and unsupervised learning.

    Dimensionality reduction and feature (subset) selection are two types of techniques for reducing the attribute space. While in feature selection a subset of the original attributes is extracted, dimensionality reduction in general produces linear combinations of the original attribute set. In both approaches, the goal is to select a low-dimensional subset of the attribute space that covers most of the information in the original data. In recent years, feature selection and dimensionality reduction techniques have become a real prerequisite for data mining applications. There are several open questions in this research field, and due to the often increasing number of candidate features in various application areas (e.g., email filtering or drug classification/molecular modeling), new questions keep arising.

    In this thesis, we focus on some open research questions in this context, such as the relationship between feature reduction techniques and the resulting classification accuracy, and the relationship between the variability captured in the linear combinations of dimensionality reduction techniques (e.g., PCA, SVD) and the accuracy of machine learning algorithms operating on them. Another important goal is to better understand new techniques for dimensionality reduction, such as nonnegative matrix factorization (NMF), which can be applied for finding parts-based, linear representations of nonnegative data. This "sum-of-parts" representation is especially useful if the interpretability of the original data should be retained. Moreover, performance aspects of feature reduction algorithms are investigated: as data grow, implementations of feature selection and dimensionality reduction techniques for high-performance parallel and distributed computing environments become more and more important.

    We distinguish two types of contributions: methodological advances without any specific application context, and application-driven advances for a specific application context. The new methodological contributions are the following. The utilization of nonnegative matrix factorization in the context of classification methods is investigated; in particular, it is of interest how the improved interpretability of NMF factors due to the non-negativity constraints (which is of central importance in various problem settings) can be exploited. Motivated by this problem context, two new fast initialization techniques for NMF based on feature selection are introduced. It is shown how approximation accuracy can be increased and/or computational effort reduced compared to standard randomized seeding of the NMF and to state-of-the-art initialization strategies suggested earlier: for example, for a given number of iterations and a required approximation error, a speedup of 3.6 over standard initialization and of 3.4 over state-of-the-art initialization strategies could be achieved. Beyond that, novel classification methods based on NMF are proposed and investigated; they are not only competitive in classification accuracy with state-of-the-art classifiers, but also provide important advantages in computational effort (especially for low-rank approximations). Moreover, parallelization and distributed execution of NMF are investigated: several algorithmic variants for efficiently computing NMF on multi-core systems, exploiting task and/or data parallelism, are studied and compared, and for some scenarios the new variants clearly outperform existing implementations. Last but not least, a computationally very efficient adaptation of the implementation of the ALS algorithm in Matlab 2009a is investigated; this variant reduces the runtime significantly (in some settings by a factor of 8) and also offers several possibilities for concurrent execution.

    In addition to purely methodological questions, we also address questions arising in the adaptation of feature selection and classification methods to two specific application problems: email classification and in silico screening for drug discovery. Different research challenges arise in these application areas, such as the dynamic nature of data for email classification, or the imbalance in the number of available samples from different classes for drug discovery. The application-driven advances of this thesis comprise the adaptation and application of latent semantic indexing (LSI) to the task of email filtering; experimental results show that LSI achieves significantly better classification results than the widespread de facto standard method for this application context. In the context of drug discovery, several groups of well-discriminating descriptors could be identified by utilizing the "sum-of-parts" representation of NMF, and the number of important descriptors could be further increased by applying sparseness constraints on the NMF factors.
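
    Since NMF is central to several of these contributions, a minimal factorization sketch may help. This uses the standard multiplicative update rules (Lee and Seung) with random seeding; the feature-selection-based initializations proposed in the thesis would replace the random start. Dimensions, rank, and iteration count are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
V = rng.random((20, 12))   # nonnegative data, e.g., a term-by-document matrix
k = 4                      # target rank of the approximation V ~ W @ H

# Random seeding; the thesis proposes faster, structured initializations.
W = rng.random((20, k))
H = rng.random((k, 12))
eps = 1e-9                 # guards against division by zero

# Standard multiplicative updates; each step keeps W and H nonnegative.
for _ in range(200):
    H *= (W.T @ V) / (W.T @ W @ H + eps)
    W *= (V @ H.T) / (W @ H @ H.T + eps)

# Frobenius approximation error; it decreases over the iterations, and a
# better initialization reaches a given error in fewer iterations.
print(np.linalg.norm(V - W @ H))
```

    The nonnegativity of W and H is what yields the "sum-of-parts" interpretability discussed above: each data column is built purely additively from the k basis columns of W.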