28 research outputs found

    USER PROFILING BASED ON NETWORK APPLICATION TRAFFIC MONITORING

    There is increasing interest in identifying users and profiling their behaviour from network traffic metadata for traffic engineering and security monitoring. However, user identification and behaviour profiling in real-time network management remain a challenge, as the activities and underlying interactions of network applications are constantly changing. User behaviour is also changing and adapting in parallel, due to changes in the online interaction environment. A major challenge is how to detect user activity within generic network traffic, in terms of identifying the user and his/her changing behaviour over time. Another issue is that relying directly on computer network information (Internet Protocol [IP] addresses) to identify the individuals who generate such traffic is not reliable, due to user mobility and IP mobility (resulting from the widespread use of the Dynamic Host Configuration Protocol [DHCP]) within a network. In this context, this project aims to identify and extract a set of features adequate for identifying users based on their network application activity, together with the timing resolution needed to describe user behaviour. The project also provides a procedure for traffic capturing and analysis to extract the required profiling parameters; the procedure captures flow traffic and then performs statistical analysis to extract the required features. This will help network administrators and Internet service providers to create user behaviour traffic profiles, in order to make informed decisions about policing and traffic management and to investigate various network security perspectives. The thesis explores the feasibility of user identification and behaviour profiling so that users can be identified independently of their IP address. In order to maintain privacy and overcome the issues associated with encryption (which covers an increasing volume of network traffic), the proposed approach utilises data derived from generic flow network traffic (NetFlow information). A number of methods and techniques have been proposed in prior research for user identification and behaviour profiling from network traffic information, such as port-based monitoring and profiling, deep packet inspection (DPI) and statistical methods. The statistical methods proposed in this thesis are based on extracting relevant features from network traffic metadata, an approach adopted by the research community to overcome the limitations of port-based and DPI techniques. This research proposes a set of novel statistical timing features extracted by considering application-level flow sessions, identified through Domain Name System (DNS) filtering criteria, and timing resolution bins: one-hour time bins (0-23) and quarter-hour time bins (0-95). The novel time-bin features are used to identify users by representing their 24-hour daily activities, analysing the application-level network traffic with an automated technique. The raw network traffic is analysed through a purpose-built feature extraction process that represents each user's daily usage as a combination of timing features, comprising flow sessions, timing and DNS filtering for the top 11 applications. In addition, a media access control (MAC) to IP source mapping (in a truth table) is used to ensure that profiles are allocated to the correct host, even if IP addresses change.
The feature extraction process developed for this thesis focuses on user rather than machine-to-machine traffic, and the research has sought to use this information to determine whether a behavioural profile could be developed to enable the identification of users. Network traffic was collected and processed using the aforementioned feature extraction process for 23 users over a period of 60 days (8 May to 8 July 2018). The traffic was captured at the Centre for Cyber Security, Communications and Network Research (CSCAN) at the University of Plymouth. The results of identifying and profiling users from the extracted timing features show that the system is capable of identifying users from hourly time-bin features with an average true positive identification rate (TPIR) of ~86% for the whole population and ~91% for individual users. Furthermore, the results show that the system can identify users from quarter-hour time-bin features, with an average TPIR of ~94% for the whole population and ~96% for individual users. Funded by the Royal Embassy of Saudi Arabia, Cultural Bureau.
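A minimal sketch of the time-bin representation described above, in Python: flow records are bucketed into 24 hourly bins (0-23) or 96 quarter-hour bins (0-95) per user and application. The record layout and application labels here are illustrative assumptions; the thesis derives application labels through DNS filtering of NetFlow data.

```python
from collections import defaultdict
from datetime import datetime

def time_bin_features(flows, quarter_hour=False):
    """Aggregate flow activity into 24 hourly bins or 96 quarter-hour bins."""
    n_bins = 96 if quarter_hour else 24
    profiles = defaultdict(lambda: [0] * n_bins)  # (user, app) -> activity vector
    for user, app, ts in flows:  # ts: POSIX timestamp of the flow start
        t = datetime.fromtimestamp(ts)
        b = t.hour * 4 + t.minute // 15 if quarter_hour else t.hour
        profiles[(user, app)][b] += 1  # count flow sessions per bin
    return dict(profiles)

# Two hypothetical flow records: (user, application, flow start time).
flows = [("host-a", "web", 1525766400.0), ("host-a", "mail", 1525770000.0)]
print(time_bin_features(flows, quarter_hour=True))
```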

    The Dynamics of Internet Traffic: Self-Similarity, Self-Organization, and Complex Phenomena

    The Internet is the most complex system ever created in human history. Unsurprisingly, its dynamics and traffic exhibit a rich variety of complex behaviour, self-organization, and other phenomena that have been researched for years. This paper is a review of the complex dynamics of Internet traffic. Departing from standard treatments, we take a view from both the network engineering and physics perspectives, showing the strengths, weaknesses, and insights of each. In addition, many less-covered phenomena such as traffic oscillations, large-scale effects of worm traffic, and comparisons of the Internet with biological models will be covered.
    Comment: 63 pages, 7 figures, 7 tables; submitted to Advances in Complex Systems
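Since self-similarity estimation recurs throughout this review, a hedged sketch of one standard estimator may help: the aggregated-variance method for the Hurst exponent H, which uses the scaling Var(X^(m)) ~ m^(2H-2) of block-averaged series. The synthetic input is illustrative; none of this code comes from the paper itself.

```python
import numpy as np

def hurst_aggregated_variance(x, block_sizes=(1, 2, 4, 8, 16, 32, 64)):
    """Estimate H from the slope of log Var(X^(m)) against log m."""
    logs_m, logs_v = [], []
    for m in block_sizes:
        n = len(x) // m
        if n < 2:
            continue
        agg = x[: n * m].reshape(n, m).mean(axis=1)  # block averages
        logs_m.append(np.log(m))
        logs_v.append(np.log(agg.var()))
    slope = np.polyfit(logs_m, logs_v, 1)[0]  # slope = 2H - 2
    return 1.0 + slope / 2.0

rng = np.random.default_rng(0)
print(round(hurst_aggregated_variance(rng.normal(size=2**14)), 2))  # ~0.5 for white noise
```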

    Computation of traffic time series for large populations of IoT devices

    In this work we study multi-class packet classification algorithms for extracting network traffic time series. The study targets scenarios with a large number of time series to extract, such as monitoring IoT network providers. We show that, using DStries-based techniques, large networks with tens of thousands of devices can be monitored in real time. This work was funded by the Spanish MINECO through project PIT (TEC2015-69417-C2-2-R).
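A minimal sketch of the classification step, not the paper's DStries implementation: packets are assigned to a time-series class by longest-prefix match on the source address, and byte counts are accumulated per class and time bin. The prefixes, class names, and bin width are illustrative assumptions.

```python
import ipaddress

RULES = {  # prefix -> time-series class (e.g., one class per IoT customer)
    ipaddress.ip_network("10.0.0.0/24"): "sensor-fleet-a",
    ipaddress.ip_network("10.0.0.0/16"): "provider-x",
}

def classify(src_ip):
    """Longest-prefix match over the rule table (a linear scan for clarity;
    trie-based structures make this fast enough for real-time monitoring)."""
    addr = ipaddress.ip_address(src_ip)
    matches = [n for n in RULES if addr in n]
    return RULES[max(matches, key=lambda n: n.prefixlen)] if matches else None

series = {}  # (class, time_bin) -> byte count

def account(src_ip, nbytes, ts, bin_seconds=60):
    cls = classify(src_ip)
    if cls is not None:
        key = (cls, int(ts) // bin_seconds)
        series[key] = series.get(key, 0) + nbytes

account("10.0.0.7", 1500, 1700000000)  # matches the more specific /24
print(series)
```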

    Investigating self-similarity and heavy tailed distributions on a large scale experimental facility

    After the seminal work by Taqqu et al. relating self-similarity to heavy-tailed distributions, a number of research articles verified that aggregated Internet traffic time series show self-similarity and that Internet attributes, like Web file sizes and flow lengths, were heavy-tailed. However, the validation of the theoretical prediction relating self-similarity and heavy tails remains unsatisfactorily addressed, being investigated either using numerical or network simulations, or from uncontrolled Web traffic data. Notably, this prediction has never been conclusively verified on real networks using controlled and stationary scenarios, prescribing specific heavy-tailed distributions, and estimating confidence intervals. In the present work, we use the potential and facilities offered by the large-scale, deeply reconfigurable and fully controllable experimental Grid5000 instrument to investigate the prediction's observability on real networks. To this end we organize a large number of controlled traffic circulation sessions on a nation-wide real network involving two hundred independent hosts. We use an FPGA-based measurement system to collect the corresponding traffic at packet level. We then estimate both the self-similarity exponent of the aggregated time series and the heavy-tail index of the flow size distributions, independently. Comparison of these two estimated parameters enables us to discuss the practical applicability conditions of the theoretical prediction.
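For reference, the theoretical prediction under test here is Taqqu et al.'s relation between the tail index of the contributing distributions and the self-similarity exponent of the aggregate: superposing many ON/OFF sources whose ON (or OFF) periods are heavy-tailed with index alpha in (1, 2) yields asymptotically self-similar traffic with

```latex
\[
  \mathbb{P}(X > x) \sim c\,x^{-\alpha},\quad 1 < \alpha < 2
  \qquad\Longrightarrow\qquad
  H = \frac{3 - \alpha}{2} \in \left(\tfrac{1}{2},\, 1\right),
\]
```

so comparing the independently estimated H and alpha against this line is precisely the comparison the experiment performs.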

    Investigating self-similarity and heavy-tailed distributions on a large scale experimental facility

    After the seminal work by Taqqu et al. relating self-similarity to heavy-tailed distributions, a number of research articles verified that aggregated Internet traffic time series show self-similarity and that Internet attributes, like Web file sizes and flow lengths, were heavy-tailed. However, the validation of the theoretical prediction relating self-similarity and heavy tails remains unsatisfactorily addressed, being investigated either using numerical or network simulations, or from uncontrolled Web traffic data. Notably, this prediction has never been conclusively verified on real networks using controlled and stationary scenarios, prescribing specific heavy-tailed distributions, and estimating confidence intervals. With this goal in mind, we use the potential and facilities offered by the large-scale, deeply reconfigurable and fully controllable experimental Grid5000 instrument to investigate the prediction's observability on real networks. To this end we organize a large number of controlled traffic circulation sessions on a nation-wide real network involving two hundred independent hosts. We use an FPGA-based measurement system to collect the corresponding traffic at packet level. We then estimate both the self-similarity exponent of the aggregated time series and the heavy-tail index of the flow size distributions, independently. On the one hand, our results complement and validate with striking accuracy some conclusions drawn from a series of pioneer studies. On the other hand, they bring new insights on the controversial role of certain components of real networks.

    High precision timing in passive measurements of data networks

    Understanding, predicting, and improving network behaviour under a wide range of conditions requires accurate models of protocols, network devices, and link properties. Accurate models of the component parts comprising complex networks allow the plausible simulation of networks in other configurations, or under different loads. These models must be constructed on a solid foundation of reliable and accurate data taken from measurements of relevant facets of actual network behaviour. As network link speeds increase, it is argued that traditional network measurement techniques based primarily on software time-stamping and capture of packets will not scale to the required performance levels. Problems examined include the difficulty of gaining access to high-speed network media to perform measurements, the insufficient resolution of time-stamping clocks for capturing fine detail in packet arrival times, the lack of synchronisation of clocks to global standards, the high and variable latency between packet arrival and time-stamping, and the occurrence of packet loss within the measurement system. A set of design requirements is developed to address these issues, especially in high-speed network measurement systems. A group at the University of Waikato including myself has developed a series of hardware-based passive network measurement systems called ‘Dags’. Dags use re-programmable hardware and embedded processors to provide globally synchronised, low-latency, reliable time-stamping of all packet arrivals on high-speed network links with sub-hundred-nanosecond resolution. Packet loss within the measurement system is minimised by providing sufficient bandwidth throughout for worst-case loads and buffering to allow for contention over shared resources. Any occurrence of packet loss despite these measures is reported, allowing the invalidation of portions of the dataset if necessary. I was responsible for writing both the interactive monitor and the network measurement code executed by the Dag’s embedded processor, for developing a Linux device driver including the software part of the ‘DUCK’ clock synchronisation system, and for other ancillary software. It is shown that the accuracy and reliability of the Dag measurement system allow confidence that rare, unusual or unexpected features found in its measurements are genuine and do not simply reflect artifacts of the measurement equipment. With the use of a global clock reference such as the Global Positioning System, synchronised multi-point passive measurements can be made over large geographical distances. Both of these features are exploited to perform calibration measurements of RIPE NCC’s Test Traffic Measurement System for one-way delay over the Internet between New Zealand and the Netherlands. Accurate single-point passive measurement is used to determine error distributions in round-trip times as measured by NLANR’s AMP project. The high resolution afforded by the Dag measurement system also allows the examination of the forwarding behaviour of individual network devices such as routers and firewalls at fine time-scales. The effects of load, queueing parameters, and pauses in packet forwarding can be measured, along with the impact on the network traffic itself.
This facility is demonstrated by instrumenting routing equipment and a firewall which provide Internet connectivity to the University of Auckland, providing passive measurements of forwarding delay through the equipment.
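A minimal sketch of what globally synchronised multi-point capture enables, assuming a toy record format of (timestamp, payload) rather than the actual Dag trace format: the same packet is matched at two vantage points by a hash over bytes that are invariant in transit, and the timestamp difference gives the one-way delay.

```python
import hashlib

def packet_key(payload: bytes) -> str:
    # Hash over bytes that do not change in transit (just the payload here).
    return hashlib.sha1(payload).hexdigest()

def one_way_delays(capture_a, capture_b):
    """capture_a/b: iterables of (timestamp_seconds, payload) records from two
    measurement points whose clocks are synchronised to a global reference."""
    seen = {packet_key(p): ts for ts, p in capture_a}
    return [ts_b - seen[k]
            for ts_b, p in capture_b
            if (k := packet_key(p)) in seen]

a = [(0.000000, b"pkt-1"), (0.000150, b"pkt-2")]
b = [(0.042311, b"pkt-1"), (0.042470, b"pkt-2")]
print(one_way_delays(a, b))  # ~42 ms for each packet
```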

    Big Data for Traffic Engineering in Software-Defined Networks

    Software-defined networking overcomes the limitations of traditional networks by splitting the control plane from the data plane. The logic of the network is moved to a component called the controller, which manages devices in the data plane. To implement this architecture, it has become the norm to use the OpenFlow (OF) protocol, which defines several counters maintained by network devices. These counters are the starting point for Traffic Engineering (TE) activities. TE monitors several network parameters, including network bandwidth utilization. A great challenge for TE is to collect and generate statistics about bandwidth utilization for monitoring and traffic analysis activities. This becomes even more challenging if fine-grained monitoring is required. Network management tasks such as network provisioning, capacity planning, load balancing, and anomaly detection can benefit from this fine-grained monitoring. Because the counters are updated for every packet that crosses the switch, they must be retrieved in a streaming fashion. This scenario suggests the use of Big Data streaming techniques to collect and process counter values. Therefore, this paper proposes a fine-grained Big Data monitoring method to collect and generate traffic statistics from counter values, which can significantly support TE. The approach can provide a more detailed view of network resource utilization because it delivers both individual and aggregated statistical analyses of bandwidth consumption. Experimental results show the effectiveness of the proposed method.
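A minimal sketch of the counter-based idea, with poll_port_bytes as a hypothetical stand-in for an OpenFlow port-stats request (the paper's Big Data streaming pipeline is not reproduced here): cumulative byte counters sampled periodically are differenced into bandwidth-utilization figures.

```python
import time

def utilisation_stream(poll_port_bytes, port, link_capacity_bps, interval=1.0):
    """Yield (timestamp, utilisation fraction) from successive counter reads."""
    prev_bytes, prev_t = poll_port_bytes(port), time.time()
    while True:
        time.sleep(interval)
        cur_bytes, cur_t = poll_port_bytes(port), time.time()
        bps = 8 * (cur_bytes - prev_bytes) / (cur_t - prev_t)
        yield cur_t, bps / link_capacity_bps
        prev_bytes, prev_t = cur_bytes, cur_t

# Example with a fake counter growing ~1250 bytes per 10 ms poll (~1 Mbit/s).
state = {"bytes": 0}
def fake_poll(port):
    state["bytes"] += 1250
    return state["bytes"]

stream = utilisation_stream(fake_poll, port=1, link_capacity_bps=10_000_000,
                            interval=0.01)
print(next(stream))  # roughly 0.1 on a 10 Mbit/s link
```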

    Improving the accuracy of spoofed traffic inference in inter-domain traffic

    Ascertaining that a network will forward spoofed traffic usually requires an active probing vantage point in that network, effectively preventing a comprehensive view of this global Internet vulnerability. We argue that broader visibility into the spoofing problem may lie in the capability to infer lack of Source Address Validation (SAV) compliance from large, heavily aggregated Internet traffic data, such as traffic observable at Internet Exchange Points (IXPs). The key idea is to use IXPs as observatories to detect spoofed packets, by leveraging Autonomous System (AS) topology knowledge extracted from Border Gateway Protocol (BGP) data to infer which source addresses should legitimately appear across parts of the IXP switch fabric. In this thesis, we demonstrate that the existing literature does not capture several fundamental challenges to this approach, including noise in BGP data sources, heuristic AS relationship inference, and idiosyncrasies in IXP interconnectivity fabrics. We propose Spoofer-IX, a novel methodology to navigate these challenges, leveraging Customer Cone semantics of AS relationships to guide precise classification of inter-domain traffic as In-cone, Out-of-cone (spoofed), Unverifiable, Bogon, and Unassigned. We apply our methodology in an extensive analysis of real traffic data from two distinct IXPs in Brazil, a mid-size and a large-size infrastructure. In the mid-size IXP, with more than 200 members, we find an upper-bound volume of Out-of-cone traffic more than an order of magnitude less than the previous method inferred on the same data, revealing the practical importance of Customer Cone semantics in such analysis. We also found no significant improvement in deployment of SAV in networks using the mid-size IXP between 2017 and 2019. In hopes that our methods and tools generalize to use by other IXPs that want to avoid use of their infrastructure for launching spoofed-source DoS attacks, we explore the feasibility of scaling the system to larger and more diverse IXP infrastructures. To promote this goal, and the broad replicability of our results, we make the source code of Spoofer-IX publicly available. This thesis illustrates the subtleties of scientific assessments of operational Internet infrastructure, and the need for a community focus on reproducing and repeating previous methods.
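A minimal sketch of the Customer Cone classification step under simplifying assumptions (the Unverifiable class, which requires knowledge of incomplete cone data, is omitted; the prefixes are illustrative): each source address seen behind a member's port is labelled against that member's cone.

```python
import ipaddress

BOGONS = [ipaddress.ip_network(p)
          for p in ("10.0.0.0/8", "192.168.0.0/16", "127.0.0.0/8")]

def classify_source(src_ip, cone_prefixes, announced_prefixes):
    """cone_prefixes: prefixes inside the member AS's customer cone.
    announced_prefixes: all prefixes seen in BGP (for the Unassigned check)."""
    addr = ipaddress.ip_address(src_ip)
    if any(addr in n for n in BOGONS):
        return "Bogon"
    if not any(addr in n for n in announced_prefixes):
        return "Unassigned"
    if any(addr in n for n in cone_prefixes):
        return "In-cone"
    return "Out-of-cone"  # candidate spoofed traffic

cone = [ipaddress.ip_network("198.51.100.0/24")]
announced = cone + [ipaddress.ip_network("203.0.113.0/24")]
print(classify_source("203.0.113.9", cone, announced))  # Out-of-cone
```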

    Internet traffic volumes characterization and forecasting

    Internet usage increases every year, and the need to estimate the growth of the generated traffic has become a major topic. Forecasting actual figures in advance is essential for bandwidth allocation, network design, and investment planning. In this thesis, novel mathematical equations are presented to model and predict long-term Internet traffic in terms of total aggregate volume, globally and more locally. Historical traffic data from consecutive years reveal hidden numerical patterns as the values progress year over year, and this trend can be well represented with appropriate mathematical relations. The proposed formulae have excellent fitting properties over long-history measurements and can indicate forthcoming traffic for the next years with an exceptionally low prediction error. In cases where pending traffic data have already become available, the suggested equations provide more successful results than the respective projections from leading research worldwide. The studies also imply that future traffic strongly depends on past activity and on the growth of Internet users, provided that a large and representative sample of pertinent data exists from large geographical areas. To the best of my knowledge, this work is the first to introduce effective prediction methods that rely exclusively on the static attributes and progression properties of historical values.
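The thesis's own equations are not reproduced in the abstract, so the sketch below substitutes a generic compound-growth least-squares fit purely to illustrate the idea of extrapolating yearly aggregate volumes from historical values; the data points are invented.

```python
import numpy as np

years = np.array([2010, 2011, 2012, 2013, 2014, 2015])
volumes = np.array([20.0, 27.5, 38.0, 51.2, 70.1, 96.0])  # e.g. exabytes/month

# Fit log V = a*t + b, i.e. V(t) = e^b * g^t with growth factor g = e^a.
t = years - years[0]
a, b = np.polyfit(t, np.log(volumes), 1)
growth = np.exp(a)  # implied year-over-year growth factor

def forecast(year):
    return np.exp(b) * growth ** (year - years[0])

print(f"growth ~ {growth:.2f}x/yr, 2017 forecast ~ {forecast(2017):.0f} EB/month")
```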