6 research outputs found

    On Internet Traffic Classification: A Two-Phased Machine Learning Approach

    Get PDF
    Traffic classification utilizing flow measurement enables operators to perform essential network management. Flow accounting methods such as NetFlow are, however, considered inadequate for classification requiring additional packet-level information, host behaviour analysis, and specialized hardware limiting their practical adoption. This paper aims to overcome these challenges by proposing two-phased machine learning classification mechanism with NetFlow as input. The individual flow classes are derived per application through k-means and are further used to train a C5.0 decision tree classifier. As part of validation, the initial unsupervised phase used flow records of fifteen popular Internet applications that were collected and independently subjected to k-means clustering to determine unique flow classes generated per application. The derived flow classes were afterwards used to train and test a supervised C5.0 based decision tree. The resulting classifier reported an average accuracy of 92.37% on approximately 3.4 million test cases increasing to 96.67% with adaptive boosting. The classifier specificity factor which accounted for differentiating content specific from supplementary flows ranged between 98.37% and 99.57%. Furthermore, the computational performance and accuracy of the proposed methodology in comparison with similar machine learning techniques lead us to recommend its extension to other applications in achieving highly granular real-time traffic classification

    Attention-based bidirectional GRU networks for efficient HTTPS traffic classification

    Get PDF
    This is the author accepted manuscript. The final version is available from the publisher via the DOI in this recordDistributed and pervasive web services have become a major platform for sharing information. However, the hypertext transfer protocol secure (HTTPS), which is a crucial web encryption technology for protecting the information security of users, creates a supervisory burden for network management (e.g., quality-of-service guarantees and traffic engineering). Identifying various types of encrypted traffic is crucial for cyber security and network management. In this paper, we propose a novel deep learning model called BGRUA to identify the web services running on HTTPS connections accurately. BGRUA utilizes a bidirectional gated recurrent unit (GRU) and attention mechanism to improve the accuracy of HTTPS traffic classification. The bidirectional GRU is used to extract the forward and backward features of the byte sequences in a session. The attention mechanism is adopted to assign weights to features according to their contributions to classification. Additionally, we investigate the effects of different hyperparameters on the performance of BGRUA and present a set of optimal values that can serve as a basis for future relevant studies. Comparisons to existing methods based on three typical datasets demonstrate that BGRUA outperforms state-of-the-art encrypted traffic classification approaches in terms of accuracy, precision, recall, and F1-score

    Modeling and Detection of Content and Packet Flow Anomalies at Enterprise Network Gateway

    Get PDF
    This dissertation investigates modeling techniques and computing algorithms for detection of anomalous contents and traffic flows of ingress Internet traffic at an enterprise network gateway. Anomalous contents refer to a large volume of ingress packets whose contents are not wanted by enterprise users, such as unsolicited electronic messages (UNE). UNE are often sent by Botnet farms for network resource exploitation, information stealing, and they incur high costs in bandwidth waste. Many products have been designed to block UNE, but most of them rely on signature database(s) for matching, and they cannot recognize unknown attacks. To address this limitation, in this dissertation I propose a Progressive E-Message Classifier (PEC) to timely classify message patterns that are commonly associated with UNE. On the basis of a scoring and aging engine, a real-time scoreboard keeps track of detected feature instances of the detection features until they are considered either as UNE or normal messages. A mathematical model has been designed to precisely depict system behaviors and then set detection parameters. The PEC performance is widely studied using different parameters based on several experiments. The objective of anomalous traffic flow detection is to detect selfish Transmission Control Protocol, TCP, flows which do not conform to one of the handful of congestion control protocols in adjusting their packet transmission rates in the face of network congestion. Given that none of the operational parameters in congestion control are carried in the transmitted packets, a gateway can only use packet arrival times to recover states of end to end congestion control rules, if any. We develop new techniques to estimate round trip time (RTT) using EWMA Lomb-Scargle periodogram, detect change of congestion windows by the CUSUM algorithm, and then finally predict detected congestion flow states using a prioritized decision chain. A high level finite state machine (FSM) takes the predictions as inputs to determine if a TCP flow follows a particular congestion control protocol. Multiple experiments show promising outcomes of classifying flows of different protocols based on the ratio of the aberrant transition count to normal transition count generated by FSM
    corecore