Faster and More Accurate Measurement through Additive-Error Counters
Counters are a fundamental building block for networking applications such as
load balancing, traffic engineering, and intrusion detection, which require
estimating flow sizes and identifying heavy hitter flows. Existing works
suggest replacing counters with shorter multiplicative error \emph{estimators}
that improve the accuracy by fitting more of them within a given space.
However, such estimators impose a computational overhead that degrades the
measurement throughput. Instead, we propose \emph{additive} error estimators,
which are simpler, faster, and more accurate when used for network measurement.
Our solution is rigorously analyzed and empirically evaluated against several
other measurement algorithms on real Internet traces. For a given error target,
we improve the speed of the uncompressed solutions by -, and
the space by up to . Compared with existing state-of-the-art
estimators, our solution is - faster while being
considerably more accurate.
Comment: To appear in IEEE INFOCOM 202
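The additive-error idea can be illustrated with a minimal sketch (this is an illustrative construction, not the paper's actual estimator): a sampled counter records each increment only with a fixed probability p and reports count/p. Each update costs one random draw and at most one increment, and the estimation error grows with the square root of the stream length, i.e., it is additive in the total traffic rather than multiplicative in the flow size. The class name and parameter choices below are assumptions for illustration.

```python
import random

class AdditiveCounter:
    """Illustrative sampled counter: each increment is recorded with
    probability p, and the estimate divides the stored count by p.
    The error is roughly O(sqrt(N/p)) for a stream of length N --
    additive in the stream length, not multiplicative in the count."""

    def __init__(self, p, rng):
        self.p = p        # sampling probability (illustrative choice)
        self.count = 0    # stored (shorter) counter
        self.rng = rng

    def increment(self):
        # One comparison and at most one increment per update.
        if self.rng.random() < self.p:
            self.count += 1

    def estimate(self):
        # Unbiased estimate of the true number of increments.
        return self.count / self.p

rng = random.Random(42)
c = AdditiveCounter(0.25, rng)
for _ in range(100_000):
    c.increment()
print(c.estimate())  # close to the true count of 100,000
```

Because the stored counter only grows on sampled updates, it needs fewer bits than an exact counter for the same stream, which is the space/accuracy trade the abstract refers to.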
Robust and Scalable Sampling Algorithms for Network Measurement
Recent growth of the Internet in both scale and complexity has imposed a number of difficult challenges on existing measurement techniques and approaches, which
are essential for both network management and many ongoing research projects. For
any measurement algorithm, achieving both accuracy and scalability is very challenging given hard resource constraints (e.g., bandwidth, delay, physical memory, and
CPU speed). My dissertation research tackles this problem by first proposing a novel
mechanism called residual sampling, which intentionally introduces a predetermined
amount of bias into the measurement process. We show that such biased sampling
can be extremely scalable; moreover, we develop residual estimation algorithms that
can unbiasedly recover the original information from the sampled data. Utilizing
these results, we further develop two versions of the residual sampling mechanism:
a continuous version for characterizing the user lifetime distribution in large-scale
peer-to-peer networks and a discrete version for monitoring flow statistics (including
per-flow counts and the flow size distribution) in high-speed Internet routers. For the
former application in P2P networks, this work presents two methods: ResIDual-based
Estimator (RIDE), which takes single-point snapshots of the system and assumes
systems with stationary arrivals, and Uniform RIDE (U-RIDE), which takes multiple snapshots and adapts to systems with arbitrary (including non-stationary) arrival
processes. For the latter application in traffic monitoring, we introduce Discrete
RIDE (D-RIDE), which samples each flow with a geometric random variable. Our simulations and experiments with P2P networks and real Internet traces confirm that these algorithms accurately estimate the monitored metrics while satisfying hard resource constraints. These results show that residual sampling provides an effective balance between accuracy and scalability.
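The discrete residual idea can be sketched as follows (a hedged illustration, not the dissertation's exact D-RIDE estimator): skip a geometric number of packets at the start of each flow, count the remainder, and correct the count by the expected length of the skipped prefix. The function names and the correction term (1 - p) / p are assumptions for illustration; the correction is nearly unbiased only when the flow is much larger than 1/p.

```python
import random

def sample_flow(n, p, rng):
    """Residual count for a flow of true size n: skip a Geometric(p)
    prefix (support {0, 1, 2, ...}) and count the remaining packets."""
    skip = 0
    while rng.random() >= p:  # failures before the first "success"
        skip += 1
    return max(0, n - skip)

def estimate(count, p):
    """Add back the expected skipped prefix, E[skip] = (1 - p) / p.
    Nearly unbiased when n >> 1/p (illustrative correction)."""
    return count + (1 - p) / p

rng = random.Random(7)
n, p = 1000, 0.1
trials = [estimate(sample_flow(n, p, rng), p) for _ in range(2000)]
avg = sum(trials) / len(trials)
print(round(avg, 1))  # averages close to the true flow size of 1000
```

The scalability argument is visible in the sketch: the counter stays idle through the geometric prefix, so small flows often consume no memory at all, while large flows are still recovered accurately by the correction.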
Adaptive Sampling and Statistical Inference for Anomaly Detection
Given the rising threat of malware and the increasing inadequacy of signature-based solutions, online performance monitoring has emerged as a critical component of the security infrastructure of data centers and networked systems. The systems that require monitoring are typically large-scale, highly dynamic, and time-evolving, which adds to the complexity of both monitoring and the underlying anomaly detection techniques. Furthermore, one cannot ignore the costs of monitoring and detection, which can interfere with the normal operation of a system and deplete the resources available to it. Securing modern systems therefore calls for efficient monitoring strategies and anomaly detection techniques that can process massive data efficiently and report unusual events effectively. This dissertation contributes new algorithms and implementation strategies toward a significant improvement in the effectiveness and efficiency of two components of security infrastructure: (1) system monitoring and (2) anomaly detection.

For system monitoring, we develop two techniques that reduce the cost of information collection: i) a non-sampling technique and ii) a sampling technique. The non-sampling technique is based on compression and employs the best basis algorithm to automatically select the basis for compressing the data according to its structure. The sampling technique improves upon compressive sampling, a recent signal processing technique for acquiring data at low cost, by employing it in an adaptive-rate model wherein the sampling rate is adaptively tuned to the data being sampled. Our simulation results on measurements collected from a data center show that these two data collection techniques achieve small information loss with reduced monitoring cost.
The best basis algorithm can select the basis in which the data is most concisely represented, allowing a reduced sample size for monitoring. The adaptive-rate model for compressive sampling saves 70% in sample size compared with the constant-rate model.

For anomaly detection, this dissertation develops three techniques. In the first, we exploit the properties preserved in the samples of compressive sampling and apply state-of-the-art anomaly detection techniques directly to the compressed measurements; simulation results show that the detection rate of abrupt changes using the compressed measurements is greater than 95% when the size of the measurements is only 18%. In the second, we characterize performance-related measurements as a stream of covariance matrices, one for each designated window of time, propose a new metric to quantify changes in the covariance matrices, and use the observed changes to infer anomalies in the system. In the third, anomalies in a system are detected using a low-complexity distributed algorithm when only streams of raw measurement vectors, one for each time window, are available and distributed among multiple locations. We apply our techniques to real network traffic data and show that these two techniques furnish existing methods with more details about the anomalous changes.

Ph.D., Electrical Engineering -- Drexel University, 201
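The covariance-stream approach (the second detection technique) can be sketched as follows. The dissertation's actual change metric is not given in the abstract, so this illustration substitutes the Frobenius norm of the difference between consecutive windowed covariance matrices; that choice, and all names and parameters below, are assumptions.

```python
import random

def covariance(window):
    """Sample covariance matrix of a list of equal-length vectors."""
    n, d = len(window), len(window[0])
    means = [sum(v[i] for v in window) / n for i in range(d)]
    return [[sum((v[i] - means[i]) * (v[j] - means[j]) for v in window) / (n - 1)
             for j in range(d)] for i in range(d)]

def frob_dist(a, b):
    """Frobenius norm of the entrywise difference of two square matrices
    (stand-in for the dissertation's change metric)."""
    return sum((a[i][j] - b[i][j]) ** 2
               for i in range(len(a)) for j in range(len(a))) ** 0.5

def make_window(rng, scale, n=200, d=3):
    """Synthetic window of n measurement vectors of dimension d."""
    return [[rng.gauss(0, scale) for _ in range(d)] for _ in range(n)]

rng = random.Random(0)
# Two quiet windows, then one with inflated variance standing in for an anomaly.
windows = [make_window(rng, 1), make_window(rng, 1), make_window(rng, 5)]
covs = [covariance(w) for w in windows]
dists = [frob_dist(covs[k], covs[k + 1]) for k in range(len(covs) - 1)]
print(dists)  # the second distance is far larger, flagging the covariance change
```

Thresholding the distance stream (e.g., against a running baseline) then turns the per-window metric into an anomaly alarm, which matches the abstract's description of inferring anomalies from observed covariance changes.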