10 research outputs found

    An autonomic traffic analysis proposal using Machine Learning techniques

    Get PDF
    International audienceNetwork analysis has recently become in one of the most challenging tasks to handle due to the rapid growth of communication technologies. For network management, accurate identification and classification of network traffic is a key task. For example, identifying traffic from different applications is critical to manage bandwidth resources and to ensure Quality of Service objectives. Machine learning emerges as a suitable tool for traffic classification; however, it requires several steps that must be followed adequately in order to achieve the goals. In this paper, we proposed an architecture to perform traffic analysis based on Machine Learning techniques and autonomic computing. We analyze the procedures to perform Machine Learning over traffic network classification, and at the same time we give guidelines to introduce all these procedures into the architecture proposed. The main contribution of our proposal is the reconfiguration of the traffic classifier that will change according to the knowledge adquired from the traffic analysis process

    Applying machine learning to categorize distinct categories of network traffic

    Get PDF
    The recent rapid growth of the field of data science has made available to all fields opportunities to leverage machine learning. Computer network traffic classification has traditionally been performed using static, pre-written rules that are easily made ineffective if changes, legitimate or not, are made to the applications or protocols underlying a particular category of network traffic. This paper explores the problem of network traffic classification and analyzes the viability of having the process performed using a multitude of classical machine learning techniques against significant statistical similarities between classes of network traffic as opposed to traditional static traffic identifiers. To accomplish this, network data was captured, processed, and evaluated for 10 application labels under the categories of video conferencing, video streaming, video gaming, and web browsing as described later in Table 1. Flow-based statistical features for the dataset were derived from the network captures in accordance with the “Flow Data Feature Creation” section and were analyzed against a nearest centroid, k-nearest neighbors, Gaussian naïve Bayes, support vector machine, decision tree, random forest, and multi-layer perceptron classifier. Tools and techniques broadly available to organizations and enthusiasts were used. Observations were made on working with network data in a machine learning context, strengths and weaknesses of different models on such data, and the overall efficacy of the tested models. Ultimately, it was found that simple models freely available to anyone can achieve high accuracy, recall, and F1 scores in network traffic classification, with the best-performing model, random forest, having 89% accuracy, a macro average F1 score of .77, and a macro average recall of 76%, with the most common feature of successful classification being related to maximum packet sizes in a network flow

    Towards the Deployment of Machine Learning Solutions in Network Traffic Classification: A Systematic Survey

    Get PDF
    International audienceTraffic analysis is a compound of strategies intended to find relationships, patterns, anomalies, and misconfigurations, among others things, in Internet traffic. In particular, traffic classification is a subgroup of strategies in this field that aims at identifying the application's name or type of Internet traffic. Nowadays, traffic classification has become a challenging task due to the rise of new technologies, such as traffic encryption and encapsulation, which decrease the performance of classical traffic classification strategies. Machine Learning gains interest as a new direction in this field, showing signs of future success, such as knowledge extraction from encrypted traffic, and more accurate Quality of Service management. Machine Learning is fast becoming a key tool to build traffic classification solutions in real network traffic scenarios; in this sense, the purpose of this investigation is to explore the elements that allow this technique to work in the traffic classification field. Therefore, a systematic review is introduced based on the steps to achieve traffic classification by using Machine Learning techniques. The main aim is to understand and to identify the procedures followed by the existing works to achieve their goals. As a result, this survey paper finds a set of trends derived from the analysis performed on this domain; in this manner, the authors expect to outline future directions for Machine Learning based traffic classification

    Application of advanced machine learning techniques to early network traffic classification

    Get PDF
    The fast-paced evolution of the Internet is drawing a complex context which imposes demanding requirements to assure end-to-end Quality of Service. The development of advanced intelligent approaches in networking is envisioning features that include autonomous resource allocation, fast reaction against unexpected network events and so on. Internet Network Traffic Classification constitutes a crucial source of information for Network Management, being decisive in assisting the emerging network control paradigms. Monitoring traffic flowing through network devices support tasks such as: network orchestration, traffic prioritization, network arbitration and cyberthreats detection, amongst others. The traditional traffic classifiers became obsolete owing to the rapid Internet evolution. Port-based classifiers suffer from significant accuracy losses due to port masking, meanwhile Deep Packet Inspection approaches have severe user-privacy limitations. The advent of Machine Learning has propelled the application of advanced algorithms in diverse research areas, and some learning approaches have proved as an interesting alternative to the classic traffic classification approaches. Addressing Network Traffic Classification from a Machine Learning perspective implies numerous challenges demanding research efforts to achieve feasible classifiers. In this dissertation, we endeavor to formulate and solve important research questions in Machine-Learning-based Network Traffic Classification. As a result of numerous experiments, the knowledge provided in this research constitutes an engaging case of study in which network traffic data from two different environments are successfully collected, processed and modeled. Firstly, we approached the Feature Extraction and Selection processes providing our own contributions. A Feature Extractor was designed to create Machine-Learning ready datasets from real traffic data, and a Feature Selection Filter based on fast correlation is proposed and tested in several classification datasets. Then, the original Network Traffic Classification datasets are reduced using our Selection Filter to provide efficient classification models. Many classification models based on CART Decision Trees were analyzed exhibiting excellent outcomes in identifying various Internet applications. The experiments presented in this research comprise a comparison amongst ensemble learning schemes, an exploratory study on Class Imbalance and solutions; and an analysis of IP-header predictors for early traffic classification. This thesis is presented in the form of compendium of JCR-indexed scientific manuscripts and, furthermore, one conference paper is included. In the present work we study a wide number of learning approaches employing the most advance methodology in Machine Learning. As a result, we identify the strengths and weaknesses of these algorithms, providing our own solutions to overcome the observed limitations. Shortly, this thesis proves that Machine Learning offers interesting advanced techniques that open prominent prospects in Internet Network Traffic Classification.Departamento de Teoría de la Señal y Comunicaciones e Ingeniería TelemáticaDoctorado en Tecnologías de la Información y las Telecomunicacione

    USER PROFILING BASED ON NETWORK APPLICATION TRAFFIC MONITORING

    Get PDF
    There is increasing interest in identifying users and behaviour profiling from network traffic metadata for traffic engineering and security monitoring. However, user identification and behaviour profiling in real-time network management remains a challenge, as the activities and underlying interactions of network applications are constantly changing. User behaviour is also changing and adapting in parallel, due to changes in the online interaction environment. A major challenge is how to detect user activity among generic network traffic in terms of identifying the user and his/her changing behaviour over time. Another issue is that relying only on computer network information (Internet Protocol [IP] addresses) directly to identify individuals who generate such traffic is not reliable due to user mobility and IP mobility (resulting from the widespread use of the Dynamic Host Configuration Protocol [DHCP]) within a network. In this context, this project aims to identify and extract a set of features that may be adequate for use in identifying users based on their network application activity and timing resolution to describe user behaviour. The project also provides a procedure for traffic capturing and analysis to extract the required profiling parameters; the procedure includes capturing flow traffic and then performing statistical analysis to extract the required features. This will help network administrators and internet service providers to create user behaviour traffic profiles in order to make informed decisions about policing and traffic management and investigate various network security perspectives. The thesis explores the feasibility of user identification and behaviour profiling in order to be able to identify users independently of their IP address. In order to maintain privacy and overcome the issues associated with encryption (which exists on an increasing volume of network traffic), the proposed approach utilises data derived from generic flow network traffic (NetFlow information). A number of methods and techniques have been proposed in prior research for user identification and behaviour profiling from network traffic information, such as port-based monitoring and profiling, deep packet inspection (DPI) and statistical methods. However, the statistical methods proposed in this thesis are based on extracting relevant features from network traffic metadata, which are utilised by the research community to overcome the limitations that occur with port-based and DPI techniques. This research proposes a set of novel statistical timing features extracted by considering application-level flow sessions identified through Domain Name System (DNS) filtering criteria and timing resolution bins: one-hour time bins (0-23) and quarter- hour time bins (0-95). The novel time bin features are utilised to identify users by representing their 24-hour daily activities by analysing the application-level network traffic based on an automated technique. The raw network traffic is analysed based on the development of a features extraction process in terms of representing each user’s daily usage through a combination of timing features, including the flow session, timing and DNS filtering for the top 11 applications. In addition, media access control (MAC) and IP source mapping (in a truth table) is utilised to ensure that profiling is allocated to the correct host, even if the IP addresses change. The feature extraction process developed for this thesis focuses more on the user, rather than machine-to-machine traffic, and the research has sought to use this information to determine whether a behavioural profile could be developed to enable the identification of users. Network traffic was collected and processed using the aforementioned feature extraction process for 23 users for a period of 60 days (8 May-8 July 2018). The traffic was captured from the Centre for Cyber Security, Communications and Network Research (CSCAN) at the University of Plymouth. The results of identifying and profiling users from extracted timing features behaviour show that the system is capable of identifying users with an average true positive identification rate (TPIR) based on hourly time bin features for the whole population of ~86% and ~91% for individual users. Furthermore, the results show that the system has the ability to identify users based on quarter-hour time bin features, with an average TPIR of ~94% for the whole population and ~96% for the individual user.Royal Embassy of Saudi Arabia Cultural Burea

    Automatic network traffic classification

    Full text link
    The thesis addresses a number of critical problems in regard to fully automating the process of network traffic classification and protocol identification. Several effective solutions based on statistical analysis and machine learning techniques are proposed, which significantly reduce the requirements for human interventions in network traffic classification systems

    Optimizing Flow Routing Using Network Performance Analysis

    Get PDF
    Relevant conferences were attended at which work was often presented and several papers were published in the course of this project. • Muna Al-Saadi, Bogdan V Ghita, Stavros Shiaeles, Panagiotis Sarigiannidis. A novel approach for performance-based clustering and management of network traffic flows, IWCMC, ©2019 IEEE. • M. Al-Saadi, A. Khan, V. Kelefouras, D. J. Walker, and B. Al-Saadi: Unsupervised Machine Learning-Based Elephant and Mice Flow Identification, Computing Conference 2021. • M. Al-Saadi, A. Khan, V. Kelefouras, D. J. Walker, and B. Al-Saadi: SDN-Based Routing Framework for Elephant and Mice Flows Using Unsupervised Machine Learning, Network, 3(1), pp.218-238, 2023.The main task of a network is to hold and transfer data between its nodes. To achieve this task, the network needs to find the optimal route for data to travel by employing a particular routing system. This system has a specific job that examines each possible path for data and chooses the suitable one and transmit the data packets where it needs to go as fast as possible. In addition, it contributes to enhance the performance of network as optimal routing algorithm helps to run network efficiently. The clear performance advantage that provides by routing procedures is the faster data access. For example, the routing algorithm take a decision that determine the best route based on the location where the data is stored and the destination device that is asking for it. On the other hand, a network can handle many types of traffic simultaneously, but it cannot exceed the bandwidth allowed as the maximum data rate that the network can transmit. However, the overloading problem are real and still exist. To avoid this problem, the network chooses the route based on the available bandwidth space. One serious problem in the network is network link congestion and disparate load caused by elephant flows. Through forwarding elephant flows, network links will be congested with data packets causing transmission collision, congestion network, and delay in transmission. Consequently, there is not enough bandwidth for mice flows, which causes the problem of transmission delay. Traffic engineering (TE) is a network application that concerns with measuring and managing network traffic and designing feasible routing mechanisms to guide the traffic of the network for improving the utilization of network resources. The main function of traffic engineering is finding an obvious route to achieve the bandwidth requirements of the network consequently optimizing the network performance [1]. Routing optimization has a key role in traffic engineering by finding efficient routes to achieve the desired performance of the network [2]. Furthermore, routing optimization can be considered as one of the primary goals in the field of networks. In particular, this goal is directly related to traffic engineering, as it is based on one particular idea: to achieve that traffic is routed according to accurate traffic requirements [3]. Therefore, we can say that traffic engineering is one of the applications of multiple improvements to routing; routing can also be optimized based on other factors (not just on traffic requirements). In addition, these traffic requirements are variable depending on analyzed dataset that considered if it is data or traffic control. In this regard, the logical central view of the Software Defined Network (SDN) controller facilitates many aspects compared to traditional routing. The main challenge in all network types is performance optimization, but the situation is different in SDN because the technique is changed from distributed approach to a centralized one. The characteristics of SDN such as centralized control and programmability make the possibility of performing not only routing in traditional distributed manner but also routing in centralized manner. The first advantage of centralized routing using SDN is the existence of a path to exchange information between the controller and infrastructure devices. Consequently, the controller has the information for the entire network, flexible routing can be achieved. The second advantage is related to dynamical control of routing due to the capability of each device to change its configuration based on the controller commands [4]. This thesis begins with a wide review of the importance of network performance analysis and its role for understanding network behavior, and how it contributes to improve the performance of the network. Furthermore, it clarifies the existing solutions of network performance optimization using machine learning (ML) techniques in traditional networks and SDN environment. In addition, it highlights recent and ongoing studies of the problem of unfair use of network resources by a particular flow (elephant flow) and the possible solutions to solve this problem. Existing solutions are predominantly, flow routing-based and do not consider the relationship between network performance analysis and flow characterization and how to take advantage of it to optimize flow routing by finding the convenient path for each type of flow. Therefore, attention is given to find a method that may describe the flow based on network performance analysis and how to utilize this method for managing network performance efficiently and find the possible integration for the traffic controlling in SDN. To this purpose, characteristics of network flows is identified as a mechanism which may give insight into the diversity in flow features based on performance metrics and provide the possibility of traffic engineering enhancement using SDN environment. Two different feature sets with respect to network performance metrics are employed to characterize network traffic. Applying unsupervised machine learning techniques including Principal Component Analysis (PCA) and k-means cluster analysis to derive a traffic performance-based clustering model. Afterward, thresholding-based flow identification paradigm has been built using pre-defined parameters and thresholds. Finally, the resulting data clusters are integrated within a unified SDN architectural solution, which improves network management by finding the best flow routing based on the type of flow, to be evaluated against a number of traffic data sources and different performance experiments. The validation process of the novel framework performance has been done by making a performance comparison between SDN-Ryu controller and the proposed SDN-external application based on three factors: throughput, bandwidth,and data transfer rate by conducting two experiments. Furthermore, the proposed method has been validated by using different Data Centre Network (DCN) topologies to demonstrate the effectiveness of the network traffic management solution. The overall validation metrics shows real gains, the results show that 70% of the time, it has high performance with different flows. The proposed routing SDN traffic-engineering paradigm for a particular flow therefore, dynamically provisions network resources among different flow types

    Profiling and Identification of Web Applications in Computer Network

    Get PDF
    Characterising network traffic is a critical step for detecting network intrusion or misuse. The traditional way to identify the application associated with a set of traffic flows uses port number and DPI (Deep Packet Inspection), but it is affected by the use of dynamic ports and encryption. The research community proposed models for traffic classification that determined the most important requirements and recommendations for a successful approach. The suggested alternatives could be categorised into four techniques: port-based, packet payload based, host behavioural, and statistical-based. The traditional way to identifying traffic flows typically focuses on using IANA assigned port numbers and deep packet inspection (DPI). However, an increasing number of Internet applications nowadays that frequently use dynamic post assignments and encryption data traffic render these techniques in achieving real-time traffic identification. In recent years, two other techniques have been introduced, focusing on host behaviour and statistical methods, to avoid these limitations. The former technique is based on the idea that hosts generate different communication patterns at the transport layer; by extracting these behavioural patterns, activities and applications can be classified. However, it cannot correctly identify the application names, classifying both Yahoo and Gmail as email. Thereby, studies have focused on using statistical features approach for identifying traffic associated with applications based on machine learning algorithms. This method relies on characteristics of IP flows, minimising the overhead limitations associated with other schemes. Classification accuracy of statistical flow-based approaches, however, depends on the discrimination ability of the traffic features used. NetFlow represents the de-facto standard in monitoring and analysing network traffic, but the information it provides is not enough to describe the application behaviour. The primary challenge is to describe the activity within entirely and among network flows to understand application usage and user behaviour. This thesis proposes novel features to describe precisely a web application behaviour in order to segregate various user activities. Extracting the most discriminative features, which characterise web applications, is a key to gain higher accuracy without being biased by either users or network circumstances. This work investigates novel and superior features that characterize a behaviour of an application based on timing of arrival packets and flows. As part of describing the application behaviour, the research considered the on/off data transfer, defining characteristics for many typical applications, and the amount of data transferred or exchanged. Furthermore, the research considered timing and patterns for user events as part of a network application session. Using an extended set of traffic features output from traffic captures, a supervised machine learning classifier was developed. To this effect, the present work customised the popular tcptrace utility to generate classification features based on traffic burstiness and periods of inactivity for everyday Internet usage. A C5.0 decision tree classifier is applied using the proposed features for eleven different Internet applications, generated by ten users. Overall, the newly proposed features reported a significant level of accuracy (~98%) in classifying the respective applications. Afterwards, uncontrolled data collected from a real environment for a group of 20 users while accessing different applications was used to evaluate the proposed features. The evaluation tests indicated that the method has an accuracy of 87% in identifying the correct network application.Iraqi cultural Attach
    corecore