1,247 research outputs found

    Detection and localization of change-points in high-dimensional network traffic data

    Full text link
    We propose a novel and efficient method, that we shall call TopRank in the following paper, for detecting change-points in high-dimensional data. This issue is of growing concern to the network security community since network anomalies such as Denial of Service (DoS) attacks lead to changes in Internet traffic. Our method consists of a data reduction stage based on record filtering, followed by a nonparametric change-point detection test based on UU-statistics. Using this approach, we can address massive data streams and perform anomaly detection and localization on the fly. We show how it applies to some real Internet traffic provided by France-T\'el\'ecom (a French Internet service provider) in the framework of the ANR-RNRT OSCAR project. This approach is very attractive since it benefits from a low computational load and is able to detect and localize several types of network anomalies. We also assess the performance of the TopRank algorithm using synthetic data and compare it with alternative approaches based on random aggregation.Comment: Published in at http://dx.doi.org/10.1214/08-AOAS232 the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org

    Privacy-Friendly Mobility Analytics using Aggregate Location Data

    Get PDF
    Location data can be extremely useful to study commuting patterns and disruptions, as well as to predict real-time traffic volumes. At the same time, however, the fine-grained collection of user locations raises serious privacy concerns, as this can reveal sensitive information about the users, such as, life style, political and religious inclinations, or even identities. In this paper, we study the feasibility of crowd-sourced mobility analytics over aggregate location information: users periodically report their location, using a privacy-preserving aggregation protocol, so that the server can only recover aggregates -- i.e., how many, but not which, users are in a region at a given time. We experiment with real-world mobility datasets obtained from the Transport For London authority and the San Francisco Cabs network, and present a novel methodology based on time series modeling that is geared to forecast traffic volumes in regions of interest and to detect mobility anomalies in them. In the presence of anomalies, we also make enhanced traffic volume predictions by feeding our model with additional information from correlated regions. Finally, we present and evaluate a mobile app prototype, called Mobility Data Donors (MDD), in terms of computation, communication, and energy overhead, demonstrating the real-world deployability of our techniques.Comment: Published at ACM SIGSPATIAL 201

    Anomaly Detection in Network Streams Through a Distributional Lens

    Get PDF
    Anomaly detection in computer networks yields valuable information on events relating to the components of a network, their states, the users in a network and their activities. This thesis provides a unified distribution-based methodology for online detection of anomalies in network traffic streams. The methodology is distribution-based in that it regards the traffic stream as a time series of distributions (histograms), and monitors metrics of distributions in the time series. The effectiveness of the methodology is demonstrated in three application scenarios. First, in 802.11 wireless traffic, we show the ability to detect certain classes of attacks using the methodology. Second, in information network update streams (specifically in Wikipedia) we show the ability to detect the activity of bots, flash events, and outages, as they occur. Third, in Voice over IP traffic streams, we show the ability to detect covert channels that exfiltrate confidential information out of the network. Our experiments show the high detection rate of the methodology when compared to other existing methods, while maintaining a low rate of false positives. Furthermore, we provide algorithmic results that enable efficient and scalable implementation of the above methodology, to accomodate the massive data rates observed in modern infomation streams on the Internet. Through these applications, we present an extensive study of several aspects of the methodology. We analyze the behavior of metrics we consider, providing justification of our choice of those metrics, and how they can be used to diagnose anomalies. We provide insight into the choice of parameters, like window length and threshold, used in anomaly detection

    Symmetry degree measurement and its applications to anomaly detection

    Get PDF
    IEEE Anomaly detection is an important technique used to identify patterns of unusual network behavior and keep the network under control. Today, network attacks are increasing in terms of both their number and sophistication. To avoid causing significant traffic patterns and being detected by existing techniques, many new attacks tend to involve gradual adjustment of behaviors, which always generate incomplete sessions due to their running mechanisms. Accordingly, in this work, we employ the behavior symmetry degree to profile the anomalies and further identify unusual behaviors. We first proposed a symmetry degree to identify the incomplete sessions generated by unusual behaviors; we then employ a sketch to calculate the symmetry degree of internal hosts to improve the identification efficiency for online applications. To reduce the memory cost and probability of collision, we divide the IP addresses into four segments that can be used as keys of the hash functions in the sketch. Moreover, to further improve detection accuracy, a threshold selection method is proposed for dynamic traffic pattern analysis. The hash functions in the sketch are then designed using Chinese remainder theory, which can analytically trace the IP addresses associated with the anomalies. We tested the proposed techniques based on traffic data collected from the northwest center of CERNET (China Education and Research Network); the results show that the proposed methods can effectively detect anomalies in large-scale networks

    Detecting Anomalies in VoIP traffic usign Principal Components Analysis

    Get PDF
    The idea of using a method based on Principal Components Analysis to detect anomalies in network's traffic was first introduced by A. Lakina, M. Crovella and C. Diot in an article published in 2004 called “Diagnosing Network­Wide Traffic Anomalies” [1]. They proposed a general method to diagnose traffic anomalies, using PCA to effectively separate the high­dimensional space occupied by a set of network traffic measurements into disjoint subspaces corresponding to normal and anomalous network conditions. This algorithm was tested in subsequent works, taking into consideration different characteristics of IP traffic over a network (such as byte counts, packet counts, IP­flow counts, etc...) [2]. The proposal of using entropy as a summarization tool inside the algorithm led to significant advances in terms or possibility of analyzing massive data sources [3]; but this type of AD method still lacked the possibility of recognizing the users responsible of the anomalies detected. This last step was obtained using random aggregations of the IP flows, by means of sketches [4], leading to better performances in the detection of anomalies and to the possibility of identifying the responsible IP flows. This version of the algorithm has been implemented by C. Callegari and L. Gazzarini, in Universitá di Pisa, in an AD software, described in [5], for analyzing IP traffic traces and detecting anomalies in them. Our work consisted in adapting this software (designed for working with IP traffic traces) for using it with VoIP Call Data Records, in order to test its applicability as an Anomaly Detection system for voice traffic. We then used our modified version of the software to scan a real VoIP traffic trace, obtained by a telephonic operator, in order to analyze the software's performances in a real environment situation. We used two different types of analysis on the same traffic trace, in order to understand software's features and limits, other than its possibility of application in AD problematics. As we discovered that the software's performances are heavily dependent on the input parameters used in the analysis, we concluded with several tests performed using artificially created anomalies, in order to understand the relationships between each input parameter's value and the software's capability of detecting different types of anomalies. The different analysis performed, in the ending, led us to some considerations upon the possibility of applying this PCA's based software as an Anomaly Detector in VoIP environments. At the best of our knowledge this is the first time a technique based on Principal Components Analysis is used to detect anomalous users in VoIP traffic; in more detail our contribution consisted in: • Creating a version of an AD software based on PCA that could be used on VoIP traffic traces • Testing the software's performances on a real traffic trace, obtained by a telephonic operator • From the first tests, analyzing the appropriate parameters' values that permitted us to obtain results that could be useful for detecting anomalous users in a VoIP environment Observing the types of users detected using the software on this trace and classify them, according to their behavior during the whole duration of the trace Analyzing how the parameters' choice impact the type of detections obtained from the analysis and testing which are the best choices for detecting each type of anomalous users Proposing a new kind of application of the software that avoids the biggest limitation of the first type of analysis (that we will see that is the impossibility of detecting more than one anomalous user per time­bin) Testing the software's performances with this new type of analysis, observing also how this different type of applications impacts the results' dependence from the input parameters Comparing the software's ability of detecting anomalous users with another type of AD software that works on the same type of trace (VoIP SEAL) Modifying the trace in order to obtain, from the real trace, a version cleaned from all the detectable anomalies, in order to add in that trace artificial anomalies Testing the software's performances in detecting different type of artificial anomalies Analyzing in more detail the software's sensibility from the input parameters, when used for detecting artificially created anomalies Comparing results and observations obtained from these different types of analysis to derive a global analysis of the characteristics of an Anomaly Detector based on Principal Components Analysis, its values and its lacks when applying it on a VoIP trace The structure of our work is the following: 1. We will start analyzing the PCA theory, describing the structure of the algorithm used in our software, his features and the type of data it needs to be used as an Anomaly Detection system for VoIP traffic. 2. Then, after shortly describing the type of trace we used to test our software, we will introduce the first type of analysis performed, the single round analysis, pointing out the results obtained and their dependence from the parameters' values. 3. In the following section we will focus on a different type of analysis, the multiple round analysis, that we introduced to test the software's performances, removing its biggest limitation (the impossibility of detecting more than one user per time­bin); we will describe the results obtained, comparing them with the ones obtained with the single round analysis, check their dependence from the parameters and compare the performances with the ones obtained using another type of AD software (VoIP SEAL) on the same trace. 4. We will then consider the results and observations obtained testing our software using artificial anomalies added on a “cleaned” version of our original trace (in which we removed all the anomalous users detectable with our software), comparing the software's performances in detecting different types of anomalies and analyzing in detail their dependence from the parameters' values. 5. At last we will describe our conclusions, derived using all the observations obtained with different types of analysis, about the applicability of a software based on PCA as an Anomaly Detector in a VoIP environment
    • …
    corecore