
    TiSEFE: Time Series Evolving Fuzzy Engine for Network Traffic Classification

    Monitoring and analyzing network traffic is crucial for detecting malicious attacks. As network traffic becomes big, heterogeneous, and very fast, traffic analysis can be considered a big data analytics task. Recent research in the big data analytics field has produced several novel large-scale data processing systems. However, there is still a need for a comprehensive data processing system that extracts valuable insights from network traffic big data and learns normal and attack network situations. This paper proposes a novel evolving fuzzy system that discriminates anomalies by inspecting network traffic. After capturing traffic data, the system analyzes it to establish a model of the normal network situation. The normal situation is represented as time series data: an ordered sequence of traffic-variable values at equally spaced time intervals. Performance has been analyzed through several experiments on a real-world traffic dataset and under the extremely difficult conditions of high-speed networks. The results demonstrate the suitability of the time series evolving fuzzy engine for network traffic classification.
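
    As a rough illustration of the kind of series the abstract describes (not the paper's fuzzy engine), raw packet arrival times can be binned into equally spaced intervals to obtain an ordered sequence of traffic-variable values. The function name and parameters below are illustrative assumptions:

```python
from collections import Counter

def to_time_series(packet_timestamps, interval=1.0):
    """Bin raw packet arrival times into equally spaced intervals,
    yielding an ordered sequence of per-interval packet counts
    (illustrative; the paper's actual traffic variables may differ)."""
    if not packet_timestamps:
        return []
    start = min(packet_timestamps)
    # Single pass: map each timestamp to its interval index and count.
    bins = Counter(int((t - start) // interval) for t in packet_timestamps)
    return [bins.get(i, 0) for i in range(max(bins) + 1)]

# Bursty arrivals show up as a count spike in one interval:
to_time_series([0.0, 0.2, 0.4, 1.5, 2.0, 2.1, 2.2, 2.3, 2.4])  # -> [3, 1, 5]
```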

    LENTA: Longitudinal Exploration for Network Traffic Analysis

    In this work, we present LENTA (Longitudinal Exploration for Network Traffic Analysis), a system that supports network analysts in easily identifying traffic generated by services and applications running on the web, whether benign or possibly malicious. First, LENTA simplifies the analysts' job by letting them observe a few hundred clusters instead of the original hundreds of thousands of individual URLs. Second, it implements a self-learning methodology in which a semi-supervised approach lets the system grow its knowledge, which is in turn used to automatically associate traffic with previously observed services and to identify new traffic generated by possibly suspicious applications. This lets analysts easily observe changes in the traffic, such as the birth of new services or unexpected activities. We follow a data-driven approach, running LENTA on real data. Traffic is analyzed in batches of 24 hours' worth of traffic. We show that LENTA allows analysts to easily understand which services are running on their network, and that it highlights malicious traffic and changes over time, greatly simplifying the view and understanding of the traffic.
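
    LENTA's clustering algorithm is not specified in the abstract, but the reduction it describes, from hundreds of thousands of URLs down to a few inspectable groups, can be sketched with a deliberately crude stand-in that groups URLs by hostname. All names below are illustrative assumptions, not LENTA's API:

```python
from urllib.parse import urlsplit
from collections import defaultdict

def cluster_urls_by_host(urls):
    """Crude stand-in for URL clustering: group URLs by hostname so an
    analyst inspects one group per service instead of individual URLs.
    LENTA's real clustering is more sophisticated; this only sketches
    the reduction in the number of items an analyst must review."""
    clusters = defaultdict(list)
    for url in urls:
        clusters[urlsplit(url).hostname or "unknown"].append(url)
    return dict(clusters)
```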

    S-RASTER: Contraction Clustering for Evolving Data Streams

    Contraction Clustering (RASTER) is a single-pass algorithm for density-based clustering of 2D data. It can process arbitrary amounts of data in linear time and in constant memory, quickly identifying approximate clusters. It also exhibits good scalability in the presence of multiple CPU cores. RASTER exhibits very competitive performance compared to standard clustering algorithms, but at the cost of decreased precision. Yet, RASTER is limited to batch processing and unable to identify clusters that only exist temporarily. In contrast, S-RASTER is an adaptation of RASTER to the stream processing paradigm that is able to identify clusters in evolving data streams. This algorithm retains the main benefits of its parent algorithm, i.e., single-pass linear time cost and constant memory requirements for each discrete time step within a sliding window. The sliding window is efficiently pruned, and clustering is still performed in linear time. Like RASTER, S-RASTER trades off an often negligible amount of precision for speed. Our evaluation shows that competing algorithms are at least 50% slower. Furthermore, S-RASTER shows good qualitative results, based on standard metrics. It is very well suited to real-world scenarios where clustering does not happen continually but only periodically.
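
    The batch RASTER idea the abstract describes, a single pass that projects 2D points onto grid tiles, keeps tiles above a density threshold, and joins adjacent significant tiles into clusters, can be sketched as follows. Parameter names and threshold values are illustrative, not the paper's:

```python
from collections import Counter, deque

def raster(points, precision=1.0, min_tile_points=3):
    """Minimal sketch of grid-based contraction clustering in the spirit
    of RASTER. Not the paper's implementation; parameters are assumed."""
    # Single pass: project each 2D point onto a grid tile and count.
    tiles = Counter((int(x // precision), int(y // precision)) for x, y in points)
    # Keep only "significant" tiles, i.e., those above a density threshold.
    significant = {t for t, c in tiles.items() if c >= min_tile_points}
    # Agglomerate: flood-fill over 8-connected neighboring tiles.
    clusters, seen = [], set()
    for tile in significant:
        if tile in seen:
            continue
        cluster, queue = set(), deque([tile])
        seen.add(tile)
        while queue:
            tx, ty = queue.popleft()
            cluster.add((tx, ty))
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    nb = (tx + dx, ty + dy)
                    if nb in significant and nb not in seen:
                        seen.add(nb)
                        queue.append(nb)
        clusters.append(cluster)
    return clusters
```

    Each cluster is returned as a set of tile coordinates rather than raw points, which is where the constant-memory, approximate nature of the approach comes from.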

    SeLINA: a Self-Learning Insightful Network Analyzer

    Understanding the behavior of a network from a large-scale traffic dataset is a challenging problem. Big data frameworks offer scalable algorithms to extract information from raw data, but they often require sophisticated fine-tuning and detailed knowledge of machine learning algorithms. To streamline this process, we propose SeLINA (Self-Learning Insightful Network Analyzer), a generic, self-tuning, simple tool to extract knowledge from network traffic measurements. SeLINA includes different data analytics techniques providing self-learning capabilities to state-of-the-art scalable approaches, jointly with parameter auto-selection to relieve the network expert of parameter tuning. We combine both unsupervised and supervised approaches to mine data in a scalable way. SeLINA embeds mechanisms to check whether new data fits the model, to detect possible changes in the traffic, and to trigger model rebuilding, possibly automatically. The result is a system that offers human-readable models of the data with minimal user intervention, supporting domain experts in extracting actionable knowledge and highlighting possibly meaningful interpretations. SeLINA's current implementation runs on Apache Spark. We tested it on large collections of real-world passive network measurements from a nationwide ISP, investigating YouTube and P2P traffic. The experimental results confirmed the ability of SeLINA to provide insights and detect changes in the data that suggest further analyses.
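
    SeLINA's model-fit check is not detailed in the abstract. As a hedged illustration of the general idea, triggering a rebuild when new data drifts away from the training baseline, one might use something like the following; the thresholds and names are assumptions, not SeLINA's mechanism:

```python
def needs_rebuild(baseline, recent, tolerance=3.0, max_outlier_frac=0.2):
    """Illustrative drift check: flag a model rebuild when too many
    recent measurements fall outside mean +/- tolerance * stddev of
    the training baseline. Thresholds here are assumed, not SeLINA's."""
    n = len(baseline)
    mean = sum(baseline) / n
    var = sum((x - mean) ** 2 for x in baseline) / n
    std = var ** 0.5 or 1.0  # guard against a zero-variance baseline
    outliers = sum(1 for x in recent if abs(x - mean) > tolerance * std)
    return outliers / len(recent) > max_outlier_frac
```

    A real system would monitor many features jointly and use a proper statistical test, but the control flow, check fit, detect change, trigger rebuild, matches the abstract's description.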


    Entorno para la gestión de sondas de red de bajo coste (Environment for managing low-cost network probes)

    This undergraduate thesis consists of a web environment for managing low-cost Ethernet probes. Both the probes and the protocol they use to send information were developed at the Universidad Autónoma de Madrid. The information gathered by the probes is sent to a collector, written in C, designed to classify packets according to the type of measurement and the techniques used. The project distinguishes two types of measurement: active and passive. Active measurement techniques inject traffic into the network to measure its characteristics, while passive measurement collects traffic from a network segment and processes it to estimate network parameters and analyze the measured network's performance and behavior. For active measurements the collector works with three techniques: file transfer, packet pair, and packet train. For passive measurements, traffic is collected through two approaches: flow-level monitoring and monitoring via the Multi Router Traffic Grapher (MRTG). Once each packet is classified, the collector stores the quality-of-service (QoS) parameters in a database. These parameters are displayed as tables in our web environment, which was built with the Django platform. The environment presents an overview of the project, the distributed architecture and software used, and a brief definition of each measurement type and method; for each measurement method it shows a table of the QoS parameters extracted from the network, with the option of generating charts for some of the table fields.
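
    The packet-pair technique mentioned above rests on a classic idea: two equal-size packets sent back to back are spread apart by the bottleneck link, so the gap observed at the receiver yields a capacity estimate, C = L / dispersion. The function below is an illustrative sketch of that principle, not the collector's actual implementation; its names and parameters are assumptions:

```python
def packet_pair_capacity(packet_size_bytes, t1_recv, t2_recv):
    """Classic packet-pair estimate: two back-to-back packets of equal
    size are serialized by the bottleneck link, so the receive-side gap
    (dispersion) reflects its capacity. Returns bits per second."""
    dispersion = t2_recv - t1_recv
    if dispersion <= 0:
        raise ValueError("second packet must arrive after the first")
    return packet_size_bytes * 8 / dispersion

# 1500-byte packets arriving 1.2 ms apart at the receiver
# suggest a bottleneck of 1500 * 8 / 0.0012 = 10 Mbit/s.
```

    Real tools send many pairs and filter the dispersions statistically, since cross traffic can compress or stretch individual gaps.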

    A Survey on Big Data for Network Traffic Monitoring and Analysis

    Network Traffic Monitoring and Analysis (NTMA) represents a key component of network management, especially to guarantee the correct operation of large-scale networks such as the Internet. As the complexity of Internet services and the volume of traffic continue to increase, it becomes difficult to design scalable NTMA applications. Applications such as traffic classification and policing require real-time and scalable approaches. Anomaly detection and security mechanisms must quickly identify and react to unpredictable events while processing millions of heterogeneous events. Finally, the system has to collect, store, and process massive sets of historical data for post-mortem analysis. These are precisely the challenges faced by general big data approaches: Volume, Velocity, Variety, and Veracity. This survey brings together NTMA and big data. We catalog previous NTMA work that adopts big data approaches to understand to what extent the potential of big data is being explored in NTMA. The survey mainly focuses on approaches and technologies for managing big NTMA data, and additionally briefly discusses big data analytics (e.g., machine learning) for the sake of NTMA. Finally, we provide guidelines for future work, discussing lessons learned and research directions.

    Large-scale network traffic monitoring with DBStream, a system for rolling big data analysis

    The complexity of the Internet has rapidly increased, making it more important and challenging to design scalable network monitoring tools. Network monitoring typically requires rolling data analysis, i.e., continuously and incrementally updating (rolling over) various reports and statistics over high-volume data streams. In this paper, we describe DBStream, an SQL-based system that explicitly supports incremental queries for rolling data analysis. We also present a performance comparison of DBStream with a parallel data processing engine (Spark), showing that, in some scenarios, a single DBStream node can outperform a cluster of ten Spark nodes on rolling network monitoring workloads. Although our performance evaluation is based on network monitoring data, our results can be generalized to other Big Data problems with high volume and velocity.
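
    DBStream's incremental SQL queries are not reproduced here, but the rolling-analysis pattern the abstract describes, updating per-interval aggregates incrementally and pruning intervals that leave a sliding window, can be sketched in a toy form. All names below are illustrative (DBStream itself expresses this in SQL):

```python
class RollingCounter:
    """Toy illustration of rolling, incremental analysis: per-interval
    aggregates are updated in place as batches arrive, and intervals
    that fall outside the sliding window are rolled off."""

    def __init__(self, window_size):
        self.window_size = window_size  # number of intervals kept
        self.intervals = {}             # interval_id -> byte count

    def add_batch(self, interval_id, records):
        # Incremental update: only the new batch is processed;
        # earlier intervals keep their already-computed aggregates.
        self.intervals[interval_id] = (
            self.intervals.get(interval_id, 0)
            + sum(r["bytes"] for r in records)
        )
        # Roll over: prune intervals that left the sliding window.
        oldest_kept = interval_id - self.window_size + 1
        for k in [k for k in self.intervals if k < oldest_kept]:
            del self.intervals[k]

    def total(self):
        return sum(self.intervals.values())
```

    The point of the incremental formulation is that each new batch costs work proportional to its own size, not to the whole window, which is what makes rolling reports over high-volume streams feasible.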