
    A Survey on Big Data for Network Traffic Monitoring and Analysis

    Network Traffic Monitoring and Analysis (NTMA) is a key component of network management, especially to guarantee the correct operation of large-scale networks such as the Internet. As the complexity of Internet services and the volume of traffic continue to increase, it becomes difficult to design scalable NTMA applications. Applications such as traffic classification and policing require real-time and scalable approaches. Anomaly detection and security mechanisms must quickly identify and react to unpredictable events while processing millions of heterogeneous events. Finally, the system has to collect, store, and process massive sets of historical data for post-mortem analysis. These are precisely the challenges faced by general big data approaches: Volume, Velocity, Variety, and Veracity. This survey brings together NTMA and big data. We catalog previous work on NTMA that adopts big data approaches to understand to what extent the potential of big data is being explored in NTMA. The survey focuses mainly on approaches and technologies for managing big NTMA data, and additionally gives a brief discussion of big data analytics (e.g., machine learning) for the sake of NTMA. Finally, we provide guidelines for future work, discussing lessons learned and research directions.

    Snooping Wikipedia Vandals with MapReduce

    In this paper, we present and validate an algorithm able to accurately identify anomalous behaviors on online collaborative social networks, based on users' interactions with their fellows. We focus on Wikipedia, where accurate ground truth for the classification of vandals can be reliably gathered by manual inspection of the page edit history. We develop distributed crawler and classifier tasks, both implemented in MapReduce, with which we are able to explore a very large dataset consisting of over 5 million articles collaboratively edited by 14 million authors, resulting in over 8 billion pairwise interactions. We represent Wikipedia as a signed network, where positive arcs imply constructive interaction between editors. We then isolate a set of high-reputation editors (i.e., nodes having many positive incoming links) and classify the remaining ones based on their interactions with high-reputation editors. We demonstrate our approach to be not only practically relevant (due to the size of our dataset), but also feasible (as it requires few MapReduce iterations) and accurate (over 95% true positive rate). At the same time, we are able to classify only about half of the dataset's editors (recall of 50%), a limitation for which we outline some solutions under study.
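    The classification scheme the abstract describes — isolate editors with many positive incoming links, then label the rest by the sign of their interactions with that high-reputation set — can be sketched as follows. This is a minimal single-machine illustration, not the paper's MapReduce implementation; the function name, thresholds, and labels are illustrative assumptions.

    ```python
    from collections import defaultdict

    def classify_editors(signed_edges, rep_threshold=5):
        """Classify editors on a signed interaction network.

        signed_edges: iterable of (src, dst, sign) tuples, where sign is
        +1 for a constructive interaction from src toward dst, -1 otherwise.
        rep_threshold: assumed cutoff on positive in-degree for "high reputation".
        """
        # Step 1: count positive incoming links per editor.
        pos_in = defaultdict(int)
        for src, dst, sign in signed_edges:
            if sign > 0:
                pos_in[dst] += 1

        editors = {e for edge in signed_edges for e in edge[:2]}
        high_rep = {e for e in editors if pos_in[e] >= rep_threshold}

        # Step 2: score remaining editors by the sign balance of their
        # interactions with high-reputation editors (in either direction).
        balance = defaultdict(int)
        for src, dst, sign in signed_edges:
            if dst in high_rep and src not in high_rep:
                balance[src] += sign
            if src in high_rep and dst not in high_rep:
                balance[dst] += sign

        labels = {}
        for e in editors - high_rep:
            if e in balance:
                labels[e] = "benign" if balance[e] > 0 else "vandal"
            else:
                # No link to any high-reputation editor: this is the
                # unclassifiable remainder behind the ~50% recall.
                labels[e] = "unclassified"
        return high_rep, labels
    ```

    In the distributed version described by the abstract, each of these steps maps naturally onto a MapReduce round (one to aggregate positive in-degrees, one to aggregate sign balances), which is consistent with the claim that only a few iterations are needed.
    
    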