280 research outputs found

    Big Data Analytics for Flow-based Anomaly Detection in High-Speed Networks

    Get PDF
    The Cisco VNI Complete Forecast Highlights clearly states that the Internet traffic is growing in three different directions, Volume, Velocity, and Variety, bringing computer network into the big data era. At the same time, sophisticated network attacks are growing exponentially. Such growth making the existing signature-based security tools, like firewall and traditional intrusion detection systems, ineffective against new kind of attacks or variations of known attacks. In this dissertation, we propose an unsupervised method for network anomaly detection. This method is able to detect unknown and new malicious activities in high-speed network traffic. Our method uses an innovative detection algorithm able to identify the hosts responsible for anomalous flows by using a new statistical feature related to traffic flow. This feature is defined as the ratio between the number of flows generated by a host and the number of flows it receives. We evaluate our method with real backbone traffic traces from the Measurement and Analysis on the WIDE Internet (MAWI) archive. Furthermore, we compare the results of our method with MAWILab archive, a database that assists researchers to evaluate their traffic anomaly detection methods. The results point out that our method achieves an average positive prediction rate (i.e. Precision) of 90\% outperforming the four MAWILab detection methods in terms of false negative rate. We deploy three cluster configurations to evaluate the horizontal and vertical scalability performance of the proposed architecture and our method shows outstanding performance in terms of response time

    A Survey on Big Data for Network Traffic Monitoring and Analysis

    Get PDF
    Network Traffic Monitoring and Analysis (NTMA) represents a key component for network management, especially to guarantee the correct operation of large-scale networks such as the Internet. As the complexity of Internet services and the volume of traffic continue to increase, it becomes difficult to design scalable NTMA applications. Applications such as traffic classification and policing require real-time and scalable approaches. Anomaly detection and security mechanisms require to quickly identify and react to unpredictable events while processing millions of heterogeneous events. At last, the system has to collect, store, and process massive sets of historical data for post-mortem analysis. Those are precisely the challenges faced by general big data approaches: Volume, Velocity, Variety, and Veracity. This survey brings together NTMA and big data. We catalog previous work on NTMA that adopt big data approaches to understand to what extent the potential of big data is being explored in NTMA. This survey mainly focuses on approaches and technologies to manage the big NTMA data, additionally briefly discussing big data analytics (e.g., machine learning) for the sake of NTMA. Finally, we provide guidelines for future work, discussing lessons learned, and research directions

    Stream-Based IP Flow Analysis

    Get PDF
    As the complexity of Internet services, transmission speed, and data volume increases, current IP flow monitoring and analysis approaches cease to be sufficient, especially within high-speed and large-scale networks. Although IP flows consist only of selected network traffic features, their processing faces high computational demands, analysis delays, and large storage requirements. To address these challenges, we propose to improve the IP flow monitoring workflow by stream-based collection and analysis of IP flows utilizing a distributed data stream processing. This approach requires changing the paradigm of IP flow data monitoring and analysis, which is the main goal of our research. We analyze distributed stream processing systems, for which we design a novel performance benchmark to determine their suitability for stream-based processing of IP flow data. We define a stream-based workflow of IP flow collection and analysis based on the benchmark results, which we also implement as a publicly available and open-source framework Stream4Flow. Furthermore, we propose new analytical methods that leverage the stream-based IP flow data processing approach and extend network monitoring and threat detection capabilities

    A Performance Benchmark of NetFlow Data Analysis on Distributed Stream Processing Systems

    Get PDF
    Modern distributed stream processing systems can be potentially applied to real time network flow processing. However, differences in performance make some systems more suitable than others for being applied in this domain. We propose a novel performance benchmark which is based on a common security analysis algorithms of NetFlow data, to determine the suitability of distributed stream processing system. Three of the most used distributed stream processing systems are benchmarked and results are compared with identified NetFlow data processing challenges and requirements. Benchmark results show that each system reached a sufficient data processing speed using basic deployment scenario with little to no configuration tuning. Our benchmark, unlike any other, enables to determine a performance of processing small structured messages on any stream processing system

    A Big Data and machine learning approach for network monitoring and security

    Get PDF
    In the last decade the performances of 802.11 (Wi-Fi) devices skyrocketed. Today it is possible to realize gigabit wireless links spanning across kilometers at a fraction of the cost of the wired equivalent. In the same period, mesh network evolved from being experimental tools confined into university labs, to systems running in several real world scenarios. Mesh networks can now provide city-wide coverage and can compete on the market of Internet access. Yet, being wireless distributed networks, mesh networks are still hard to maintain and monitor. This paper explains how today we can perform monitoring, anomaly detection and root cause analysis in mesh networks using Big Data techniques. It first describes the architecture of a modern mesh network, it justifies the use of Big Data techniques and provides a design for the storage and analysis of Big Data produced by a large-scale mesh network. While proposing a generic infrastructure, we focus on its application in the security domain
    corecore