509 research outputs found

    KISS: Stochastic Packet Inspection Classifier for UDP Traffic

    This paper proposes KISS, a novel Internet classification engine. Motivated by the expected rise of UDP traffic, which stems from the momentum of Peer-to-Peer (P2P) streaming applications, we propose a novel classification framework that leverages statistical characterization of payload. Statistical signatures are derived by means of a Chi-Square-like test, which extracts the protocol "format" but ignores the protocol "semantic" and "synchronization" rules. The signatures feed a decision process based either on the geometric distance among samples or on Support Vector Machines. KISS is very accurate, and its signatures are intrinsically robust to packet sampling, reordering, and flow asymmetry, so that it can be used on almost any network. KISS is tested in different scenarios, considering traditional client-server protocols, VoIP, and both traditional and new P2P Internet applications. Results are astonishing. The average True Positive percentage is 99.6%, with the worst case equal to 98.1%, while results are almost perfect when dealing with new P2P streaming applications.
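
The idea of a Chi-Square-like payload signature can be illustrated with a minimal sketch. It is not the authors' implementation: the choice of 4-bit chunks, the first 12 payload bytes, the uniform reference distribution, and the function names `nibbles` and `chi_square_signature` are all assumptions made for illustration. Each chunk position gets one statistic; a constant field scores high, a random-looking field scores near zero.

```python
from collections import Counter


def nibbles(payload, n_bytes):
    """Split the first n_bytes of a payload into 4-bit chunks."""
    chunks = []
    for byte in payload[:n_bytes]:
        chunks.append(byte >> 4)    # high nibble
        chunks.append(byte & 0x0F)  # low nibble
    return chunks


def chi_square_signature(payloads, n_bytes=12):
    """Return one chi-square statistic per chunk position.

    Observed chunk values are compared against a uniform distribution
    over the 16 possible values (an assumption of this sketch).
    """
    usable = [p for p in payloads if len(p) >= n_bytes]
    if not usable:
        raise ValueError("no payload is long enough")
    expected = len(usable) / 16
    signature = []
    # zip(*) turns per-packet chunk lists into per-position columns.
    for column in zip(*(nibbles(p, n_bytes) for p in usable)):
        counts = Counter(column)
        chi2 = sum((counts.get(v, 0) - expected) ** 2 / expected
                   for v in range(16))
        signature.append(chi2)
    return signature
```

The resulting vector could then be fed to a distance-based or SVM decision step, as the abstract describes; that step is not sketched here.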

    iTeleScope: Intelligent Video Telemetry and Classification in Real-Time using Software Defined Networking

    Video continues to dominate network traffic, yet operators today have poor visibility into the number, duration, and resolutions of the video streams traversing their domain. Current approaches are inaccurate, expensive, or unscalable, as they rely on statistical sampling, middle-box hardware, or packet inspection software. We present iTelescope, the first intelligent, inexpensive, and scalable SDN-based solution for identifying and classifying video flows in real-time. Our solution is novel in combining dynamic flow rules with telemetry and machine learning, and is built on commodity OpenFlow switches and open-source software. We develop a fully functional system, train it in the lab using multiple machine learning algorithms, and validate its performance to show over 95% accuracy in identifying and classifying video streams from many providers including YouTube and Netflix. Lastly, we conduct tests to demonstrate its scalability to tens of thousands of concurrent streams, and deploy it live on a campus network serving several hundred real users. Our system gives unprecedented fine-grained real-time visibility of video streaming performance to operators of enterprise and carrier networks at very low cost. Comment: 12 pages, 16 figures

    Application of advanced machine learning techniques to early network traffic classification

    The fast-paced evolution of the Internet is creating a complex context that imposes demanding requirements to assure end-to-end Quality of Service. The development of advanced intelligent approaches in networking envisions features that include autonomous resource allocation, fast reaction to unexpected network events, and so on. Internet Network Traffic Classification constitutes a crucial source of information for Network Management, being decisive in assisting the emerging network control paradigms. Monitoring traffic flowing through network devices supports tasks such as network orchestration, traffic prioritization, network arbitration, and cyberthreat detection, amongst others. The traditional traffic classifiers became obsolete owing to the rapid evolution of the Internet. Port-based classifiers suffer from significant accuracy losses due to port masking, while Deep Packet Inspection approaches have severe user-privacy limitations. The advent of Machine Learning has propelled the application of advanced algorithms in diverse research areas, and some learning approaches have proved to be an interesting alternative to the classic traffic classification approaches. Addressing Network Traffic Classification from a Machine Learning perspective implies numerous challenges demanding research efforts to achieve feasible classifiers. In this dissertation, we endeavor to formulate and solve important research questions in Machine-Learning-based Network Traffic Classification. As a result of numerous experiments, the knowledge provided in this research constitutes an engaging case study in which network traffic data from two different environments are successfully collected, processed, and modeled. First, we approached the Feature Extraction and Selection processes, providing our own contributions. A Feature Extractor was designed to create Machine-Learning-ready datasets from real traffic data, and a Feature Selection Filter based on fast correlation is proposed and tested on several classification datasets. Then, the original Network Traffic Classification datasets are reduced using our Selection Filter to provide efficient classification models. Many classification models based on CART Decision Trees were analyzed, exhibiting excellent outcomes in identifying various Internet applications. The experiments presented in this research comprise a comparison amongst ensemble learning schemes, an exploratory study on Class Imbalance and its solutions, and an analysis of IP-header predictors for early traffic classification. This thesis is presented in the form of a compendium of JCR-indexed scientific manuscripts; furthermore, one conference paper is included. In the present work we study a wide range of learning approaches employing the most advanced methodology in Machine Learning. As a result, we identify the strengths and weaknesses of these algorithms, providing our own solutions to overcome the observed limitations. In short, this thesis proves that Machine Learning offers interesting advanced techniques that open prominent prospects in Internet Network Traffic Classification. Departamento de Teoría de la Señal y Comunicaciones e Ingeniería Telemática. Doctorado en Tecnologías de la Información y las Telecomunicaciones
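
A correlation-based feature filter of the general kind mentioned in this abstract can be sketched as follows. This is not the dissertation's filter: it substitutes plain Pearson correlation for whatever measure the authors use, and the names `pearson`, `correlation_filter`, and `relevance_min` are hypothetical. A feature is kept if it correlates with the class label and is not more strongly correlated with an already-kept feature than with the label.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no information
    return cov / sqrt(var_x * var_y)


def correlation_filter(features, labels, relevance_min=0.1):
    """Select relevant, non-redundant features.

    features: dict mapping feature name -> list of numeric values
    labels:   list of numeric class labels (e.g. 0/1)
    """
    relevance = {name: abs(pearson(column, labels))
                 for name, column in features.items()}
    # Rank candidate features by relevance to the class, strongest first.
    ranked = sorted((n for n in relevance if relevance[n] >= relevance_min),
                    key=lambda n: relevance[n], reverse=True)
    selected = []
    for name in ranked:
        # Redundant if a kept feature explains it better than the class does.
        redundant = any(
            abs(pearson(features[name], features[kept])) >= relevance[name]
            for kept in selected)
        if not redundant:
            selected.append(name)
    return selected
```

The reduced feature set could then be passed to a classifier such as a decision tree; that training step is outside this sketch.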

    Reviewing Traffic Classification. Data Traffic Monitoring and Analysis

    Traffic classification has received increasing attention in recent years. It aims at offering the ability to automatically recognize the application that has generated a given stream of packets from the direct and passive observation of the individual packets, or stream of packets, flowing in the network. This ability is instrumental to a number of activities that are of extreme interest to carriers, Internet service providers, and network administrators in general. Indeed, traffic classification is the basic block that is required to enable any traffic management operation, from differentiating traffic pricing and treatment (e.g., policing, shaping, etc.) to security operations (e.g., firewalling, filtering, anomaly detection, etc.). Until a few years ago, almost every Internet application used well-known transport-layer protocol ports that easily allowed its identification. More recently, the number of applications using random or non-standard ports has dramatically increased (e.g., Skype, BitTorrent, VPNs, etc.). Moreover, network applications are often configured to use well-known protocol ports assigned to other applications (e.g., TCP port 80, originally reserved for Web traffic) in an attempt to disguise their presence. For these reasons, and because of the importance of correctly classifying traffic flows, novel approaches based respectively on packet inspection, statistical and machine learning techniques, and behavioral methods have been investigated and are becoming standard practice. In this chapter, we discuss the main trends in the field of traffic classification and we describe some of the main proposals of the research community. We complete this chapter by developing two examples of behavioral classifiers: both use supervised machine learning algorithms for classification, but each is based on different features to describe the traffic. After presenting them, we compare their performance using a large dataset, showing the benefits and drawbacks of each approach.

    Machine learning based botnet identification traffic

    The continued growth of the Internet has resulted in the increasing sophistication of toolkits and methods for conducting computer attacks and intrusions that are easy to use and publicly available to download, such as the Zeus botnet toolkit. Botnets are responsible for many cyber-attacks, such as spam, distributed denial-of-service (DDoS), identity theft, and phishing. Most existing botnet toolkits release updates for new features, development, and support. This presents challenges in the detection and prevention of bots. Current botnet detection approaches are mostly ineffective, as botnets change their Command and Control (C&C) server structures, whether centralized (e.g., IRC, HTTP) or distributed (e.g., P2P), and use encryption as a deterrent. In this paper, based on real-world data sets, we present our preliminary research on predicting new bots before they launch their attack. We propose a rich set of network traffic features using the Classification of Network Information Flow Analysis (CONIFA) framework to capture regularities in C&C communication channels and malicious traffic. We present a case study of applying the approach to a popular botnet toolkit, Zeus. The experimental evaluation suggests that it is possible to effectively detect botnets during the C&C communication generated by a newly updated Zeus botnet toolkit, before they launch their attacks, by using traffic behaviors to build a machine learning classifier from an earlier version. We also show that there is similarity in the C&C structures of various botnet toolkit versions and that the network characteristics of botnet C&C traffic are different from those of legitimate network traffic. Such methods could reduce the resources needed to identify C&C communication channels and malicious traffic.

    Encryption-agnostic classifiers of traffic originators and their application to anomaly detection

    This paper presents an approach that leverages classical machine learning techniques to identify, from sniffed packets, the tools that generated them, for both clear-text and encrypted traffic. This research aims to overcome the limitations to security monitoring systems posed by the widespread adoption of encrypted communications. By training three distinct classifiers, this paper shows that it is possible to detect, with excellent accuracy, the category of tools that generated the analyzed traffic (e.g., browsers vs. network stress tools), the actual tools (e.g., Firefox vs. Chrome vs. Edge), and the individual tool versions (e.g., Chrome 48 vs. Chrome 68). The paper provides hints that the classifiers are helpful for early detection of Distributed Denial of Service (DDoS) attacks, duplication of entire websites, and identification of sudden changes in users’ behavior, which might be the consequence of malware infection or data exfiltration.

    APIC: A method for automated pattern identification and classification

    Machine Learning (ML) is a transformative technology at the forefront of many modern research endeavours. The technology is generating a tremendous amount of attention from researchers and practitioners, providing new approaches to solving complex classification and regression tasks. While concepts such as Deep Learning have existed for many years, the computational power for realising the utility of these algorithms in real-world applications has only recently become available. This dissertation investigated the efficacy of a novel, general method for deploying ML in a variety of complex tasks, where best feature selection, data-set labelling, model definition and training processes were determined automatically. Models were developed in an iterative fashion, evaluated using both training and validation data sets. The proposed method was evaluated using three distinct case studies, describing complex classification tasks often requiring significant input from human experts. The results achieved demonstrate that the proposed method compares with, and often outperforms, less general, comparable methods designed specifically for each task. Feature selection, data-set annotation, model design and training processes were optimised by the method, where less complex, comparatively accurate classifiers with lower dependency on computational power and human expert intervention were produced. In chapter 4, the proposed method demonstrated improved efficacy over comparable systems, automatically identifying and classifying complex application protocols traversing IP networks. In chapter 5, the proposed method was able to discriminate between normal and anomalous traffic, maintaining accuracy in excess of 99%, while reducing false alarms to a mere 0.08%. Finally, in chapter 6, the proposed method discovered more optimal classifiers than those implemented by comparable methods, with classification scores rivalling those achieved by state-of-the-art systems. The findings of this research concluded that developing a fully automated, general method, exhibiting efficacy in a wide variety of complex classification tasks with minimal expert intervention, was possible. The method and various artefacts produced in each case study of this dissertation are thus significant contributions to the field of ML.