16 research outputs found

    Flow Fair Sampling Based on Multistage Bloom Filters

    Network traffic distribution is heavy-tailed: most flows are short and carry very few packets, while the number of large flows is small. Traditional random sampling tends to sample far more packets from large flows than from short ones, yet many applications depend on per-flow traffic statistics for all flows, not just the large ones. A flow-fair sampling algorithm based on multistage Bloom filters is proposed. The total measurement interval is divided into n sub-intervals. In each sub-interval, the multistage Bloom filters are queried to determine whether an incoming packet's flow already exists in the flow information table; if it does, the packet is sampled with a static sampling rate that is inversely proportional to the flow's traffic estimated up to the previous sub-interval. If the packet is the first packet of a new flow, its flow record is created and inserted into the multistage Bloom filters. The results show that the proposed algorithm is accurate, especially for short flows, and easy to extend.
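
    To make the sampling loop concrete, here is a minimal Python sketch of the procedure described above. The stage count, filter size, and the choice to always sample a flow's first packet are illustrative assumptions, not parameters from the paper.

    # A minimal sketch of flow-fair sampling with a multistage Bloom filter.
    # STAGES, BITS, and the always-sample-first-packet rule are assumptions.
    import hashlib
    import random

    STAGES = 4          # number of Bloom-filter stages (assumed)
    BITS = 1 << 16      # bits per stage (assumed)

    class MultistageBloomFilter:
        def __init__(self):
            self.stages = [bytearray(BITS // 8) for _ in range(STAGES)]

        def _positions(self, flow_id):
            # One independent hash per stage, derived by salting the flow ID.
            for s in range(STAGES):
                h = hashlib.sha256(f"{s}:{flow_id}".encode()).digest()
                yield s, int.from_bytes(h[:4], "big") % BITS

        def contains(self, flow_id):
            return all(self.stages[s][p // 8] & (1 << (p % 8))
                       for s, p in self._positions(flow_id))

        def insert(self, flow_id):
            for s, p in self._positions(flow_id):
                self.stages[s][p // 8] |= 1 << (p % 8)

    def fair_sample(packets, n_intervals):
        """packets: list of flow IDs, one per packet, in arrival order.
        Splits the trace into n sub-intervals; within each, samples a packet
        with a static rate inversely proportional to its flow's traffic
        estimated up to the previous sub-interval."""
        bloom = MultistageBloomFilter()
        flow_table = {}          # flow_id -> estimated packet count so far
        samples = []
        chunk = max(1, len(packets) // n_intervals)
        for i in range(0, len(packets), chunk):
            # Rates are frozen ("static") for the whole sub-interval.
            rates = {f: min(1.0, 1.0 / c) for f, c in flow_table.items()}
            for flow_id in packets[i:i + chunk]:
                if bloom.contains(flow_id) and flow_id in flow_table:
                    if random.random() < rates.get(flow_id, 1.0):
                        samples.append(flow_id)
                    flow_table[flow_id] += 1
                else:
                    # First packet of a new flow: create its record and
                    # (by assumption) always sample it.
                    bloom.insert(flow_id)
                    flow_table[flow_id] = 1
                    samples.append(flow_id)
        return samples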

    Unsupervised host behavior classification from connection patterns

    A novel host behavior classification approach is proposed as a preliminary step toward traffic classification and anomaly detection in network communication. Though many attempts described in the literature have been devoted to flow or application classification, these approaches are not always adaptable to the operational constraints of traffic monitoring (they are expected to work even without packet payload, without bidirectionality, on high-speed networks, or from flow reports only). Instead, the classification proposed here relies on the leading idea that traffic is relevantly analyzed in terms of typical host behaviors: typical connection patterns of both legitimate applications (data sharing, downloading, ...) and anomalous (possibly aggressive) behaviors are obtained by profiling traffic at the host level using unsupervised statistical classification. Classification at the host level is not reducible to flow or application classification, nor the contrary: they are different operations which might play complementary roles in network management. The proposed host classification is based on a nine-dimensional feature space evaluating host Internet connectivity, dispersion, and exchanged traffic content. A Minimum Spanning Tree (MST) clustering technique is developed that does not require any supervised learning step to produce a set of statistically established typical host behaviors. Not relying on a priori defined classes of known behaviors enables the procedure to discover new host behaviors that were potentially never observed before. The procedure is applied to traffic collected over the entire year 2008 on a transpacific (Japan/USA) link. Cross-validation of this unsupervised classification against a classical port-based inspection and a state-of-the-art method assesses the meaningfulness and relevance of the obtained classes of host behaviors.
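
    As a rough illustration of the clustering step, the following Python sketch builds an MST over host feature vectors with SciPy and cuts long edges to form behavioral classes. The nine features, the Euclidean metric, and the threshold-based edge-cut rule are assumptions for illustration, not the paper's exact criterion.

    # A minimal sketch of MST-based unsupervised clustering of hosts.
    # Feature definitions and the cut rule are illustrative assumptions.
    import numpy as np
    from scipy.sparse.csgraph import minimum_spanning_tree, connected_components
    from scipy.spatial.distance import pdist, squareform

    def mst_cluster(features, cut_threshold):
        """features: (n_hosts, 9) array of host descriptors.
        Returns one cluster label per host."""
        dist = squareform(pdist(features))           # pairwise Euclidean distances
        mst = minimum_spanning_tree(dist).toarray()  # n-1 edges of the tree
        mst[mst > cut_threshold] = 0                 # cut unusually long edges
        # Hosts still connected after the cuts form one behavioral class.
        n_clusters, labels = connected_components(mst, directed=False)
        return labels

    # Hypothetical usage: 100 hosts, 9 features each (connectivity,
    # dispersion, traffic-content descriptors), no labeled training data.
    hosts = np.random.rand(100, 9)
    print(mst_cluster(hosts, cut_threshold=0.8))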

    Sketch Guided Sampling — Using On-Line Estimates of Flow Size for Adaptive Data Collection

    Abstract — Monitoring the traffic in high-speed networks is a data-intensive problem. Uniform packet sampling is the most popular technique for reducing the amount of data that network monitoring hardware/software has to process. However, uniform sampling captures far less information than can potentially be obtained with the same overall sampling rate, because it (unnecessarily) draws the vast majority of samples from large flows and very few from small and medium flows. This information loss on small and medium flows significantly affects the accuracy of estimating various network statistics. In this work, we develop a new packet sampling methodology called "sketch-guided sampling" (SGS), which offers better statistics than obtainable from uniform sampling, given the same number of raw samples gathered. Its main idea is to make the probability with which an incoming packet is sampled a decreasing function of the size of the flow the packet belongs to. This way, our scheme is able to significantly increase the packet sampling rate of small and medium flows at slight expense to the large flows, resulting in much more accurate estimations of various network statistics. However, the exact sizes of all flows are available only if we keep per-flow state for every flow, which is prohibitively expensive on high-speed links. Our SGS scheme solves this problem by using a small (lossy) synopsis data structure called a counting sketch to encode the approximate sizes of all flows. Our evaluation on real-world Internet traffic traces shows that our sampling scheme, based on the approximate flow size estimates from the counting sketch, works almost as well as if we knew the exact sizes of the flows.
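
    The following Python sketch illustrates the SGS idea under stated assumptions: a count-min sketch stands in for the paper's counting sketch, 1/s is used as the decreasing sampling function, and the sketch dimensions are arbitrary; the paper's actual data structure and sampling function may differ.

    # A minimal sketch of sketch-guided sampling: estimate each flow's size
    # with a small counting sketch and sample with probability decreasing
    # in the estimated size. ROWS, COLS, and the 1/s function are assumed.
    import hashlib
    import random

    ROWS, COLS = 4, 4096   # counting-sketch dimensions (assumed)

    class CountMinSketch:
        def __init__(self):
            self.counters = [[0] * COLS for _ in range(ROWS)]

        def _cells(self, flow_id):
            for r in range(ROWS):
                h = hashlib.sha256(f"{r}:{flow_id}".encode()).digest()
                yield r, int.from_bytes(h[:4], "big") % COLS

        def update(self, flow_id):
            for r, c in self._cells(flow_id):
                self.counters[r][c] += 1

        def estimate(self, flow_id):
            # Min over rows bounds the overestimate from hash collisions.
            return min(self.counters[r][c] for r, c in self._cells(flow_id))

    def sketch_guided_sample(packets):
        """packets: list of flow IDs, one per packet, in arrival order."""
        sketch = CountMinSketch()
        samples = []
        for flow_id in packets:
            sketch.update(flow_id)
            size = sketch.estimate(flow_id)
            # Decreasing sampling function of estimated flow size: small
            # flows are sampled at rate ~1, large flows at roughly 1/s.
            if random.random() < 1.0 / size:
                samples.append(flow_id)
        return samples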