19 research outputs found

    Comparing P2PTV Traffic Classifiers

    Get PDF
    Peer-to-Peer IP Television (P2PTV) applications represent one of the fastest growing application classes on the Internet, both in terms of their popularity and in terms of the amount of traffic they generate. While network operators require monitoring tools that can effectively analyze the traffic produced by these systems, few techniques have been tested on these mostly closed-source, proprietary applications. In this paper we examine the properties of three traffic classifiers applied to the problem of identifying P2PTV traffic. We report on extensive experiments conducted on traffic traces with reliable ground truth information, highlighting the benefits and shortcomings of each approach. The results show that not only their performance in terms of accuracy can vary significantly, but also that their usability features suggest different effective aspects that can be integrate

    Reviewing Traffic ClassificationData Traffic Monitoring and Analysis

    Get PDF
    Traffic classification has received increasing attention in the last years. It aims at offering the ability to automatically recognize the application that has generated a given stream of packets from the direct and passive observation of the individual packets, or stream of packets, flowing in the network. This ability is instrumental to a number of activities that are of extreme interest to carriers, Internet service providers and network administrators in general. Indeed, traffic classification is the basic block that is required to enable any traffic management operations, from differentiating traffic pricing and treatment (e.g., policing, shaping, etc.), to security operations (e.g., firewalling, filtering, anomaly detection, etc.). Up to few years ago, almost any Internet application was using well-known transport layer protocol ports that easily allowed its identification. More recently, the number of applications using random or non-standard ports has dramatically increased (e.g. Skype, BitTorrent, VPNs, etc.). Moreover, often network applications are configured to use well-known protocol ports assigned to other applications (e.g. TCP port 80 originally reserved for Web traffic) attempting to disguise their presence. For these reasons, and for the importance of correctly classifying traffic flows, novel approaches based respectively on packet inspection, statistical and machine learning techniques, and behavioral methods have been investigated and are becoming standard practice. In this chapter, we discuss the main trend in the field of traffic classification and we describe some of the main proposals of the research community. We complete this chapter by developing two examples of behavioral classifiers: both use supervised machine learning algorithms for classifications, but each is based on different features to describe the traffic. After presenting them, we compare their performance using a large dataset, showing the benefits and drawback of each approac

    A Review on Features’ Robustness in High Diversity Mobile Traffic Classifications

    Get PDF
    Mobile traffics are becoming more dominant due to growing usage of mobile devices and proliferation of IoT. The influx of mobile traffics introduce some new challenges in traffic classifications; namely the diversity complexity and behavioral dynamism complexity. Existing traffic classifications methods are designed for classifying standard protocols and user applications with more deterministic behaviors in small diversity. Currently, flow statistics, payload signature and heuristic traffic attributes are some of the most effective features used to discriminate traffic classes. In this paper, we investigate the correlations of these features to the less-deterministic user application traffic classes based on corresponding classification accuracy. Then, we evaluate the impact of large-scale classification on feature's robustness based on sign of diminishing accuracy. Our experimental results consolidate the needs for unsupervised feature learning to address the dynamism of mobile application behavioral traits for accurate classification on rapidly growing mobile traffics

    GT: Picking up the Truth from the Ground for Internet Traffic

    Get PDF
    Much of Internet traffic modeling, firewall, and intrusion detection research requires traces where some ground truth regarding application and protocol is associated with each packet or flow. This paper presents the design, development and experimental evaluation of gt, an open source software toolset for associating ground truth information with Internet traffic traces. By probing the monitored host's kernel to obtain information on active Internet sessions, gt gathers ground truth at the application level. Preliminary exper- imental results show that gt's effectiveness comes at little cost in terms of overhead on the hosting machines. Furthermore, when coupled with other packet inspection mechanisms, gt can derive ground truth not only in terms of applications (e.g., e-mail), but also in terms of protocols (e.g., SMTP vs. POP3

    How to Address the Data Quality Issues in Regression Models: A Guided Process for Data Cleaning

    Get PDF
    Today, data availability has gone from scarce to superabundant. Technologies like IoT, trends in social media and the capabilities of smart-phones are producing and digitizing lots of data that was previously unavailable. This massive increase of data creates opportunities to gain new business models, but also demands new techniques and methods of data quality in knowledge discovery, especially when the data comes from different sources (e.g., sensors, social networks, cameras, etc.). The data quality process of the data set proposes conclusions about the information they contain. This is increasingly done with the aid of data cleaning approaches. Therefore, guaranteeing a high data quality is considered as the primary goal of the data scientist. In this paper, we propose a process for data cleaning in regression models (DC-RM). The proposed data cleaning process is evaluated through a real datasets coming from the UCI Repository of Machine Learning Databases. With the aim of assessing the data cleaning process, the dataset that is cleaned by DC-RM was used to train the same regression models proposed by the authors of UCI datasets. The results achieved by the trained models with the dataset produced by DC-RM are better than or equal to that presented by the datasets' authors.This work has been also supported by the Spanish Ministry of Economy, Industry and Competitiveness (Projects TRA2015-63708-R and TRA2016-78886-C3-1-R)

    On Internet Traffic Classification: A Two-Phased Machine Learning Approach

    Get PDF
    Traffic classification utilizing flow measurement enables operators to perform essential network management. Flow accounting methods such as NetFlow are, however, considered inadequate for classification requiring additional packet-level information, host behaviour analysis, and specialized hardware limiting their practical adoption. This paper aims to overcome these challenges by proposing two-phased machine learning classification mechanism with NetFlow as input. The individual flow classes are derived per application through k-means and are further used to train a C5.0 decision tree classifier. As part of validation, the initial unsupervised phase used flow records of fifteen popular Internet applications that were collected and independently subjected to k-means clustering to determine unique flow classes generated per application. The derived flow classes were afterwards used to train and test a supervised C5.0 based decision tree. The resulting classifier reported an average accuracy of 92.37% on approximately 3.4 million test cases increasing to 96.67% with adaptive boosting. The classifier specificity factor which accounted for differentiating content specific from supplementary flows ranged between 98.37% and 99.57%. Furthermore, the computational performance and accuracy of the proposed methodology in comparison with similar machine learning techniques lead us to recommend its extension to other applications in achieving highly granular real-time traffic classification

    Detection of encrypted traffic generated by peer-to-peer live streaming applications using deep packet inspection

    Get PDF
    The number of applications using the peer-to-peer (P2P) networking paradigm and their popularity has substantially grown over the last decade. They evolved from the le-sharing applications to media streaming ones. Nowadays these applications commonly encrypt the communication contents or employ protocol obfuscation techniques. In this dissertation, it was conducted an investigation to identify encrypted traf c ows generated by three of the most popular P2P live streaming applications: TVUPlayer, Livestation and GoalBit. For this work, a test-bed that could simulate a near real scenario was created, and traf c was captured from a great variety of applications. The method proposed resort to Deep Packet Inspection (DPI), so we needed to analyse the payload of the packets in order to nd repeated patterns, that later were used to create a set of SNORT rules that can be used to detect key network packets generated by these applications. The method was evaluated experimentally on the test-bed created for that purpose, being shown that its accuracy is of 97% for GoalBit.A popularidade e o número de aplicações que usam o paradigma de redes par-a-par (P2P) têm crescido substancialmente na última década. Estas aplicações deixaram de serem usadas simplesmente para partilha de ficheiros e são agora usadas também para distribuir conteúdo multimédia. Hoje em dia, estas aplicações têm meios de cifrar o conteúdo da comunicação ou empregar técnicas de ofuscação directamente no protocolo. Nesta dissertação, foi realizada uma investigação para identificar fluxos de tráfego encriptados, que foram gerados por três aplicações populares de distribuição de conteúdo multimédia em redes P2P: TVUPlayer, Livestation e GoalBit. Para este trabalho, foi criada uma plataforma de testes que pretendia simular um cenário quase real, e o tráfego que foi capturado, continha uma grande variedade de aplicações. O método proposto nesta dissertação recorre à técnica de Inspecção Profunda de Pacotes (DPI), e por isso, foi necessário 21nalisar o conteúdo dos pacotes a fim de encontrar padrões que se repetissem, e que iriam mais tarde ser usados para criar um conjunto de regras SNORT para detecção de pacotes chave· na rede, gerados por estas aplicações, afim de se poder correctamente classificar os fluxos de tráfego. Após descobrir que a aplicação Livestation deixou de funcionar com P2P, apenas as duas regras criadas até esse momento foram usadas. Quanto à aplicação TVUPlayer, foram criadas várias regras a partir do tráfego gerado por ela mesma e que tiveram uma boa taxa de precisão. Várias regras foram também criadas para a aplicação GoalBit em que foram usados quatro cenários: com e sem encriptação usando a opção de transmissão tracker, e com e sem encriptação usando a opção de transmissão sem necessidade de tracker (aqui foi usado o protocolo Kademlia). O método foi avaliado experimentalmente na plataforma de testes criada para o efeito, sendo demonstrado que a precisão do conjunto de regras para a aplicação GoallBit é de 97%.Fundação para a Ciência e a Tecnologia (FCT

    Attention-based bidirectional GRU networks for efficient HTTPS traffic classification

    Get PDF
    This is the author accepted manuscript. The final version is available from the publisher via the DOI in this recordDistributed and pervasive web services have become a major platform for sharing information. However, the hypertext transfer protocol secure (HTTPS), which is a crucial web encryption technology for protecting the information security of users, creates a supervisory burden for network management (e.g., quality-of-service guarantees and traffic engineering). Identifying various types of encrypted traffic is crucial for cyber security and network management. In this paper, we propose a novel deep learning model called BGRUA to identify the web services running on HTTPS connections accurately. BGRUA utilizes a bidirectional gated recurrent unit (GRU) and attention mechanism to improve the accuracy of HTTPS traffic classification. The bidirectional GRU is used to extract the forward and backward features of the byte sequences in a session. The attention mechanism is adopted to assign weights to features according to their contributions to classification. Additionally, we investigate the effects of different hyperparameters on the performance of BGRUA and present a set of optimal values that can serve as a basis for future relevant studies. Comparisons to existing methods based on three typical datasets demonstrate that BGRUA outperforms state-of-the-art encrypted traffic classification approaches in terms of accuracy, precision, recall, and F1-score

    Automatic network traffic classification

    Full text link
    The thesis addresses a number of critical problems in regard to fully automating the process of network traffic classification and protocol identification. Several effective solutions based on statistical analysis and machine learning techniques are proposed, which significantly reduce the requirements for human interventions in network traffic classification systems
    corecore