9 research outputs found

    Faster and More Accurate Measurement through Additive-Error Counters

    Full text link
    Counters are a fundamental building block for networking applications such as load balancing, traffic engineering, and intrusion detection, which require estimating flow sizes and identifying heavy-hitter flows. Existing works suggest replacing counters with shorter multiplicative-error estimators, which improve accuracy by fitting more of them within a given space. However, such estimators impose a computational overhead that degrades the measurement throughput. Instead, we propose additive-error estimators, which are simpler, faster, and more accurate when used for network measurement. Our solution is rigorously analyzed and empirically evaluated against several other measurement algorithms on real Internet traces. For a given error target, we improve the speed of the uncompressed solutions by 5×–30× and the space by up to 4×. Compared with existing state-of-the-art estimators, our solution is 9×–35× faster while being considerably more accurate. Comment: To appear in IEEE INFOCOM 202
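    To make the additive-error idea concrete, here is a minimal sketch, not the paper's exact construction: a counter is incremented only with a fixed probability p (an assumed parameter), and the estimate rescales the stored value by 1/p. The error then grows with the total stream length rather than with a fixed fraction of the flow size, while updates stay cheap and the stored counter can be short.

```python
import random

class AdditiveErrorCounter:
    """Sketch of a sampled counter with additive error.

    The counter increments only with fixed probability p, and the
    estimate scales the stored value by 1/p. The resulting error is
    additive in the total stream length (roughly on the order of
    sqrt(N/p)) rather than multiplicative in the flow size.
    """

    def __init__(self, p=0.01):
        self.p = p          # sampling probability (assumed parameter)
        self.counter = 0    # short integer counter

    def increment(self):
        # Count this packet only with probability p.
        if random.random() < self.p:
            self.counter += 1

    def estimate(self):
        # Unbiased estimate of the true number of increments.
        return self.counter / self.p


# Example: a 100,000-packet flow is estimated to within a few
# thousand packets, i.e. an additive rather than relative error.
c = AdditiveErrorCounter(p=0.01)
for _ in range(100_000):
    c.increment()
print(c.estimate())
```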

    Discount Counting for Fast Flow Statistics on Flow Size and Flow Volume

    Full text link

    Robust and Scalable Sampling Algorithms for Network Measurement

    Get PDF
    Recent growth of the Internet in both scale and complexity has imposed a number of difficult challenges on existing measurement techniques and approaches, which are essential for both network management and many ongoing research projects. For any measurement algorithm, achieving both accuracy and scalability is very challenging given hard resource constraints (e.g., bandwidth, delay, physical memory, and CPU speed). My dissertation research tackles this problem by first proposing a novel mechanism called residual sampling, which intentionally introduces a predetermined amount of bias into the measurement process. We show that such biased sampling can be extremely scalable; moreover, we develop residual estimation algorithms that can unbiasedly recover the original information from the sampled data. Utilizing these results, we further develop two versions of the residual sampling mechanism: a continuous version for characterizing the user lifetime distribution in large-scale peer-to-peer networks and a discrete version for monitoring flow statistics (including per-flow counts and the flow size distribution) in high-speed Internet routers. For the former application in P2P networks, this work presents two methods: ResIDual-based Estimator (RIDE), which takes single-point snapshots of the system and assumes systems with stationary arrivals, and Uniform RIDE (U-RIDE), which takes multiple snapshots and adapts to systems with arbitrary (including non-stationary) arrival processes. For the latter application in traffic monitoring, we introduce Discrete RIDE (D-RIDE), which allows one to sample each flow with a geometric random variable. Our numerous simulations and experiments with P2P networks and real Internet traces confirm that these algorithms produce accurate estimates of the monitored metrics while simultaneously meeting hard resource constraints. These results show that residual sampling indeed provides an ideal solution for balancing accuracy and scalability.
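    The following is a minimal sketch of the geometric-sampling idea behind per-flow monitoring of this kind; it is not the dissertation's D-RIDE estimator. It assumes a sampling probability p and uses the simple geometric mean (1 - p)/p as the bias correction for the skipped prefix of a flow.

```python
import random

def sample_flow(packets, p=0.2):
    """Sketch of geometric (residual) sampling of one flow.

    Packets of a flow are skipped until one is sampled with
    probability p; from that point on the flow is counted exactly.
    The skipped prefix is geometrically distributed with mean
    (1 - p) / p, so adding that mean back gives an approximately
    bias-corrected size estimate for flows that were sampled at all.
    """
    counted = 0
    tracking = False
    for _ in packets:
        if not tracking and random.random() < p:
            tracking = True          # flow enters the tracking table
        if tracking:
            counted += 1             # exact count from this packet on
    if not tracking:
        return None                  # flow was never sampled
    return counted + (1 - p) / p     # add back the expected missed prefix


# Example: estimate the size of a 1,000-packet flow.
print(sample_flow(range(1000), p=0.2))
```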

    Adaptive Sampling and Statistical Inference for Anomaly Detection

    Get PDF
    Given the rising threat of malware and the increasing inadequacy of signature-based solutions, online performance monitoring has emerged as a critical component of the security infrastructure of data centers and networked systems. Most of the systems that require monitoring are large-scale, highly dynamic, and time-evolving. These facts add to the complexity of both monitoring and the underlying techniques for anomaly detection. Furthermore, one cannot ignore the costs associated with monitoring and detection, which can interfere with the normal operation of a system and deplete the supply of resources available to it. Therefore, securing modern systems calls for efficient monitoring strategies and anomaly detection techniques that can deal with massive data with high efficiency and report unusual events effectively. This dissertation contributes new algorithms and implementation strategies toward a significant improvement in the effectiveness and efficiency of two components of security infrastructure: (1) system monitoring and (2) anomaly detection. For system monitoring purposes, we develop two techniques that reduce the cost associated with information collection: i) a non-sampling technique and ii) a sampling technique. The non-sampling technique is based on compression and employs the best basis algorithm to automatically select the basis for compressing the data according to the structure of the data. The sampling technique builds on compressive sampling, a recent signal processing technique for acquiring data at low cost, and employs it in an adaptive-rate model wherein the sampling rate is adaptively tuned to the data being sampled. Our simulation results on measurements collected from a data center show that these two data collection techniques achieve small information loss with reduced monitoring cost. The best basis algorithm can select the basis in which the data is most concisely represented, allowing a reduced sample size for monitoring. The adaptive-rate model for compressive sampling allows us to save 70% in sample size compared with the constant-rate model. For anomaly detection, this dissertation develops three techniques that allow efficient detection of anomalies. In the first technique, we exploit the properties maintained in the samples of compressive sampling and apply state-of-the-art anomaly detection techniques directly to the compressed measurements. Simulation results show that the detection rate of abrupt changes using the compressed measurements is greater than 95% when the measurements are only 18% of the original size. In our second approach, we characterize performance-related measurements as a stream of covariance matrices, one for each designated window of time, and then propose a new metric to quantify changes in the covariance matrices. The observed changes are then employed to infer anomalies in the system. In our third approach, anomalies in a system are detected using a low-complexity distributed algorithm when only streams of raw measurement vectors, one for each time window, are available and distributed among multiple locations. We apply our techniques to real network traffic data and show that the latter two techniques furnish existing methods with more details about the anomalous changes. Ph.D., Electrical Engineering -- Drexel University, 201
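    The covariance-stream idea can be sketched as follows. The dissertation proposes its own metric for quantifying changes between covariance matrices; as a stand-in, this sketch uses the Frobenius norm of the difference between consecutive per-window covariance matrices, so the specific metric and thresholding here are assumptions, not the dissertation's method.

```python
import numpy as np

def covariance_change_scores(windows):
    """Sketch of covariance-based change detection over time windows.

    Each window is a (samples x features) array of measurements.
    A covariance matrix is computed per window, and consecutive
    matrices are compared with the Frobenius norm of their difference
    (a stand-in for the dissertation's metric).
    """
    covs = [np.cov(w, rowvar=False) for w in windows]
    return [np.linalg.norm(covs[i + 1] - covs[i], ord="fro")
            for i in range(len(covs) - 1)]


# Example: the score spikes at the window where the correlation
# structure of the measurements changes.
rng = np.random.default_rng(0)
normal = [rng.normal(size=(200, 5)) for _ in range(4)]
shifted = [rng.normal(size=(200, 5)) @ np.diag([1, 1, 3, 3, 3])
           for _ in range(2)]
print(covariance_change_scores(normal + shifted))
```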