467 research outputs found

    Spam filtering using ML algorithms

    Full text link
    Spam is commonly defined as unsolicited email, and the goal of spam categorization is to distinguish between spam and legitimate email messages. Spam used to be considered a mere nuisance, but owing to the abundant amounts being sent today it has progressed from a nuisance to a major problem. Spam filtering is able to control the problem in a variety of ways. Much research in spam filtering has centred on the more sophisticated classifier-related issues, and machine learning for spam classification is currently an important research topic. Support Vector Machines (SVMs) are a comparatively new learning method that achieves substantial improvements over previously preferred methods and behaves robustly across a variety of learning tasks. Because they handle high-dimensional input, are little affected by irrelevant features, and achieve high accuracy, SVMs are of particular interest to researchers categorizing spam. This paper explores and identifies the use of different learning algorithms for classifying spam and legitimate e-mail messages. A comparative analysis of the filtering techniques is also presented.
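    A minimal sketch of the kind of SVM-based spam/ham classification this abstract describes, assuming scikit-learn and a toy corpus in place of a real email dataset; it is an illustration, not the paper's actual pipeline or data.

```python
# Sketch: linear SVM over TF-IDF features for spam vs. ham.
# Corpus and labels below are invented toy examples.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

emails = [
    "win a free prize now, click here",            # spam
    "cheap meds, limited offer, buy today",        # spam
    "meeting moved to 3pm, see agenda attached",   # ham
    "can you review the draft report by friday",   # ham
]
labels = [1, 1, 0, 0]  # 1 = spam, 0 = legitimate

# TF-IDF yields the high-dimensional sparse input that SVMs handle well.
clf = make_pipeline(TfidfVectorizer(), LinearSVC())
clf.fit(emails, labels)

print(clf.predict(["free prize offer"]))        # expected: [1] (spam)
print(clf.predict(["agenda for the meeting"]))  # expected: [0] (ham)
```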

    E-Mail Spam Filtering Solution For The Western Interstate Commission For Higher Education (WICHE)

    Get PDF
    WICHE staff, consultants and constituents who connect to WICHE's network are provided with internet access and e-mail accounts. In the past, email was not monitored or controlled, other than scanning e-mail attachments for viruses with Norton Antivirus for Exchange Server. This became a problem for several reasons, among them viruses, wasted staff time, inappropriate material stored on the network, and inappropriate use of network resources. The WICHE spam filtering solution includes monitoring and restriction of incoming and outgoing email, email attachment and storage limitations, and filtering of incoming email for spam. All internal, incoming and outgoing email is scanned on the Exchange server for inappropriate key words. Limits are placed on email accounts, such as mailbox size, message size, attachment size and maximum number of recipients. All incoming e-mail is scanned for several indicators of spam, including originating mail server, message subject, sender address, number of recipients and message content, based on key words. The messages are scanned and stopped at a front-end SMTP server before they enter WICHE's internal network.
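    A hypothetical sketch of the keyword-and-limit checks described above, as they might run at a front-end SMTP gateway; the keyword list, limits, and message fields are illustrative assumptions, not WICHE's actual configuration.

```python
# Sketch: rule-based spam indicators (keywords, recipient count,
# attachment size). All thresholds below are invented examples.
BLOCKED_KEYWORDS = {"lottery", "free money", "act now"}
MAX_RECIPIENTS = 50
MAX_ATTACHMENT_BYTES = 10 * 1024 * 1024  # 10 MB

def should_quarantine(message: dict) -> bool:
    """Return True if an incoming message trips any spam indicator."""
    text = (message["subject"] + " " + message["body"]).lower()
    if any(kw in text for kw in BLOCKED_KEYWORDS):
        return True
    if len(message["recipients"]) > MAX_RECIPIENTS:
        return True
    if message.get("attachment_bytes", 0) > MAX_ATTACHMENT_BYTES:
        return True
    return False

msg = {"subject": "You won the lottery!", "body": "claim your prize",
       "recipients": ["user@example.edu"], "attachment_bytes": 0}
print(should_quarantine(msg))  # True: subject matches a blocked keyword
```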

    Evaluation of Email Spam Detection Techniques

    Get PDF
    Email has become a vital form of communication among individuals and organizations in today’s world. At the same time, however, it has become a threat to many users in the form of spam emails, also referred to as junk or unsolicited emails. Most of the spam emails received by users are commercial advertising, and they often carry computer viruses without any notification. Today, 95% of the email messages across the world are believed to be spam, so it is essential to develop spam detection techniques. There are different techniques to detect and filter spam emails, and recently the developed techniques have been implemented successfully to minimize the threat. This paper describes how current spam email detection approaches identify and evaluate the problem. Techniques have been developed based on reputation, origin, words, multimedia, textual features, community, rules, hybrid approaches, machine learning, fingerprints, social networks, protocols, traffic analysis, OCR, low-level features, and many others; all of these filtering techniques are designed to detect and evaluate spam emails. Along with the classification of email messages into spam or ham, this paper also demonstrates the effectiveness and accuracy of the spam detection techniques.
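    A minimal sketch of one of the word-based techniques the survey lists (a Naive Bayes classifier over token counts), assuming scikit-learn; the corpus is a toy stand-in, not data from the paper.

```python
# Sketch: word-based spam/ham classification with Naive Bayes.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

emails = ["unsubscribe to stop these exclusive offers",  # spam
          "project update and next steps",                # ham
          "act now, exclusive deal inside",               # spam
          "minutes from yesterday's call"]                # ham
labels = ["spam", "ham", "spam", "ham"]

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(emails, labels)
print(model.predict(["exclusive offer, act now"]))  # expected: ['spam']
```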

    Support vector machines for image and electronic mail classification

    Get PDF
    Support Vector Machines (SVMs) have demonstrated accuracy and efficiency in a variety of binary classification applications, including indoor/outdoor scene categorization of consumer photographs and distinguishing unsolicited commercial electronic mail from legitimate personal communications. This thesis examines a parallel implementation of the Sequential Minimal Optimization (SMO) method of training SVMs, which yields multiprocessor speedup subject to a decrease in accuracy dependent on the data distribution and number of processors. Subsequently, the SVM classification system was applied to the image labeling and e-mail classification problems. A parallel implementation of the image classification system's color histogram, color coherence, and edge histogram feature extractors increased performance when using both noncaching and caching data distribution methods. The electronic mail classification application produced an accuracy of 96.69% with a user-generated dictionary. An implementation of the electronic mail classifier as a Microsoft Outlook add-in provides immediate mail filtering capabilities to the average desktop user. While the parallel implementation of the SVM trainer was not supported for the classification applications, the parallel feature extractor improved image classification performance.
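    A sketch of a color-histogram feature extractor of the kind named above, assuming NumPy and an RGB image array; the bin count and layout are illustrative, not the thesis's exact features.

```python
# Sketch: per-channel color histogram as an SVM feature vector.
import numpy as np

def color_histogram(image: np.ndarray, bins_per_channel: int = 8) -> np.ndarray:
    """Concatenate per-channel histograms into one normalized feature vector."""
    feats = []
    for c in range(3):  # R, G, B channels
        hist, _ = np.histogram(image[..., c], bins=bins_per_channel,
                               range=(0, 256))
        feats.append(hist)
    v = np.concatenate(feats).astype(float)
    return v / v.sum()  # normalize so images of different sizes are comparable

img = np.random.randint(0, 256, size=(64, 64, 3), dtype=np.uint8)
print(color_histogram(img).shape)  # (24,) -> 3 channels x 8 bins
```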

    Inferring malicious network events in commercial ISP networks using traffic summarisation

    Get PDF
    With the recent increases in bandwidth available to home users, traffic rates for commercial national networks have also been increasing rapidly. This presents a problem for any network monitoring tool, as the traffic rate it is expected to monitor rises on a monthly basis. Security within these networks is paramount, as they are now an accepted home of trade and commerce. Core networks have been demonstrably and repeatedly open to attack; these events have had significant material costs to high-profile targets. Network monitoring is an important part of network security, providing information about potential security breaches and aiding understanding of their impact. Monitoring at high data rates is a significant problem, both in terms of processing the information at line rates and in terms of presenting the relevant information to the appropriate persons or systems. This thesis suggests that the use of summary statistics, gathered over a number of packets, is a sensible and effective way of coping with high data rates. A methodology for discovering which metrics are appropriate for classifying significant network events using statistical summaries is presented. It is shown that the statistical measures found with this methodology can be used effectively as a metric for defining periods of significant anomaly, and further for classifying these anomalies as legitimate or otherwise. In a laboratory environment, these metrics were used to detect DoS traffic representing as little as 0.1% of the overall network traffic. The metrics discovered were then analysed to demonstrate that they are appropriate and rational metrics for the detection of network-level anomalies. These metrics were shown to have distinctive characteristics during DoS by the analysis of live network observations taken during DoS events. This work was implemented and operated within a live system, at multiple sites within the core of a commercial ISP network. The statistical summaries are generated at city-based points of presence and gathered centrally to allow for spatial and topological correlation of security events. The architecture chosen was shown to be flexible in its application. The system was used to detect the level of VoIP traffic present on the network through the implementation of packet size distribution analysis in a multi-gigabit environment. It was also used to detect unsolicited SMTP generators injecting messages into the core. Monitoring in a commercial network environment is subject to data protection legislation; accordingly, the system presented processed only network and transport layer headers, all other data being discarded at the capture interface. The system described in this thesis was operational for a period of 6 months, during which a set of over 140 network anomalies, both malicious and benign, were observed over a range of localities. The system design, example anomalies and metric analysis form the majority of this thesis.
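    A sketch of window-based traffic summarisation in the spirit of the thesis: per-window packet counts, mean size, and a packet-size-distribution entropy as candidate anomaly metrics. The field names, window contents, and the entropy metric itself are assumptions for illustration, not the thesis's published metrics.

```python
# Sketch: summary statistics over one observation window (headers only).
import math
from collections import Counter

def summarise(window_packet_sizes):
    """Compute summary statistics for one window of packet sizes."""
    n = len(window_packet_sizes)
    counts = Counter(window_packet_sizes)
    # Shannon entropy of the packet-size distribution: floods of
    # identical small packets (e.g. a SYN flood) push entropy toward zero.
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return {"packets": n,
            "mean_size": sum(window_packet_sizes) / n,
            "size_entropy": entropy}

normal = [1500, 64, 576, 1500, 40, 1200, 64, 1500]  # mixed sizes
flood = [40] * 8                                     # uniform tiny packets
print(summarise(normal)["size_entropy"] > summarise(flood)["size_entropy"])  # True
```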

    On email spam filtering using support vector machine

    Get PDF
    Electronic mail represents a major revolution over traditional communication systems due to its convenient, economical, fast, and easy-to-use nature. A major bottleneck in electronic communication is the enormous dissemination of unwanted, harmful emails known as "spam emails". A major concern is developing suitable filters that can adequately capture those emails and achieve a high performance rate. Machine learning (ML) researchers have developed many approaches to tackle this problem. Within the context of machine learning, support vector machines (SVM) have made a large contribution to the development of spam email filtering. Based on SVM, different schemes have been proposed through text classification (TC) approaches. A crucial problem when using SVM is the choice of kernel, as kernels directly affect the separation of emails in the feature space. We investigate the use of several distance-based kernels to specify spam filtering behaviors using SVM. However, most of the kernels used concern continuous data and neglect the structure of the text. In contrast to classical blind kernels, we propose the use of various string kernels for spam filtering, and we show how effectively string kernels suit the spam filtering problem. On the other hand, data preprocessing is a vital part of text classification, where the objective is to generate feature vectors usable by SVM kernels. We detail a feature mapping variant in TC that yields improved performance for the standard SVM in the filtering task. Furthermore, we propose an online active framework for spam filtering. We present empirical results from an extensive study of online, transductive, and online active methods for classifying spam emails in real time, and show that the active online method using string kernels achieves higher precision and recall rates.
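    A sketch of one simple member of the string-kernel family discussed above, the k-spectrum kernel, which counts shared length-k substrings; it is an illustration of the general idea, not the paper's exact kernel or preprocessing. The resulting Gram matrix could be passed to an SVM as a precomputed kernel.

```python
# Sketch: k-spectrum string kernel (shared k-mer counts).
from collections import Counter

def spectrum_kernel(s: str, t: str, k: int = 3) -> int:
    """Inner product of the two strings' k-mer count vectors."""
    def kmers(x: str) -> Counter:
        return Counter(x[i:i + k] for i in range(len(x) - k + 1))
    a, b = kmers(s), kmers(t)
    return sum(a[m] * b[m] for m in a)  # dot product over shared k-mers

print(spectrum_kernel("free offer now", "free offer today"))  # large overlap
print(spectrum_kernel("free offer now", "meeting agenda"))    # little overlap
```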

    Wireless Efficiency Versus Net Neutrality

    Get PDF
    Symposium: Rough Consensus and Running Code: Integrating Engineering Principles into Internet Policy Debates, held at the University of Pennsylvania's Center for Technology Innovation and Competition on May 6-7, 2010. This Article first addresses congestion and congestion control in the Internet. It shows how congestion control has always depended upon altruistic behavior by end users. Equipment failures, malicious acts, or abandonment of altruistic behavior can lead to severe congestion within the Internet. Consumers benefit when network operators are able to control such congestion. One tool for controlling such congestion is giving higher priority to some applications, such as telephone calls, and giving lower priority to, or blocking, other applications, such as file sharing. The Article then turns to wireless networks and shows that, in addition to easing congestion, priority routing in wireless can make available capacity that would otherwise go unused. Wireless systems that are aware of the application being carried in each packet can deliver more value to consumers than can dumb networks that treat all packets identically. Handsets are both complements to and substitutes for the network infrastructure of wireless networks, and any analysis of handset bundling should consider this complementarity. Next, the Article reviews analogous issues in electrical power and satellite communications and shows how various forms of priority are used to increase the total value delivered to consumers by these systems. Finally, the Article observes that regulations prohibiting priority routing of packets and flows on the Internet will create incentives to operate multiple networks.

    Distributed Denial of Service Attacks on Cloud Computing Environment

    Get PDF
    This paper identifies the various kinds of distributed denial of service (DDoS) attacks, their destructive capabilities, and, above all, how these issues can best be countered and resolved for the benefit of all stakeholders along the cloud continuum, preferably through permanent solutions. The various types of DDoS attack and their strike capabilities are compiled, and the key challenges facing an effective DDoS defence mechanism are also explored.

    Online Privacy, Vulnerabilities, and Threats: A Manager’s Perspective

    Get PDF
    There are many potential threats that come with conducting business in an online environment. Management must find a way to neutralize, or at least reduce, these threats if the organization is going to maintain viability. This chapter is designed to give managers an understanding of, and the vocabulary needed for, a working knowledge of online privacy, vulnerabilities, and threats. The chapter also highlights techniques that are commonly used to impede attacks and protect the privacy of the organization, its customers, and employees. With the advancements in computing technology, all conceivable steps should be taken to protect an organization’s data from outside and inside threats.

    Defending networked resources against floods of unwelcome requests

    Get PDF
    Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, February 2008. Includes bibliographical references (p. 172-189). The Internet is afflicted by "unwelcome requests", defined broadly as spurious claims on scarce resources. For example, the CPU and other resources at a server are targets of denial-of-service (DoS) attacks. Another example is spam (i.e., unsolicited bulk email); here, the resource is human attention. Absent any defense, a very small number of attackers can claim a very large fraction of the scarce resources. Traditional responses identify "bad" requests based on content (for example, spam filters analyze email text and embedded URLs). We argue that such approaches are inherently gameable because motivated attackers can make "bad" requests look "good". Instead, defenses should aim to allocate resources proportionally (so if 10% of the requesters are "bad", they should be limited to 10% of the scarce resources). To meet this goal, we present the design, implementation, analysis, and experimental evaluation of two systems. The first, speak-up, defends servers against application-level denial-of-service by encouraging all clients to automatically send more traffic. The "good" clients can thereby compete equally with the "bad" ones. Experiments with an implementation of speak-up indicate that it allocates a server's resources in rough proportion to clients' upload bandwidths, which is the intended result. The second system, DQE, controls spam with per-sender email quotas. Under DQE, senders attach stamps to emails. Receivers communicate with a well-known, untrusted enforcer to verify that stamps are fresh and to cancel stamps to prevent reuse. The enforcer is distributed over multiple hosts and is designed to tolerate arbitrary faults in these hosts, resist various attacks, and handle hundreds of billions of messages daily (two or three million stamp checks per second). Our experimental results suggest that our implementation can meet these goals with only a few thousand PCs. The enforcer occupies a novel design point: a set of hosts implement a simple storage abstraction but avoid neighbor maintenance, replica maintenance, and mutual trust. One connection between these systems is that DQE needs a DoS defense, and it can use speak-up. We reflect on this connection, on why we apply speak-up to DoS and DQE to spam, and, more generally, on what problems call for which solutions. By Michael Walfish (Ph.D.).
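    A hypothetical single-host sketch of DQE's stamp-checking idea: receivers ask an enforcer to test a stamp's freshness and cancel it so it cannot be reused. The class, method names, and in-memory set are illustrative assumptions; the actual enforcer is distributed, fault-tolerant, and far more involved.

```python
# Sketch: test-and-cancel semantics for per-sender email stamps.
class Enforcer:
    def __init__(self):
        # Stamps already seen; in DQE this is a replicated, fault-tolerant
        # storage abstraction spread over many untrusted hosts.
        self._cancelled = set()

    def test_and_cancel(self, stamp: str) -> bool:
        """Return True if the stamp is fresh, cancelling it against reuse."""
        if stamp in self._cancelled:
            return False  # reused stamp -> receiver treats message as spam
        self._cancelled.add(stamp)
        return True

enf = Enforcer()
print(enf.test_and_cancel("sender42:2008-02:17"))  # True: first use
print(enf.test_and_cancel("sender42:2008-02:17"))  # False: already spent
```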