2,466 research outputs found
From MAP to DIST: The Evolution of a Large-Scale WLAN Monitoring System
The edge of the Internet is increasingly becoming wireless. Monitoring the wireless edge is therefore important to understanding the security and performance aspects of the Internet experience. We have designed and implemented a large-scale WLAN monitoring system, the Distributed Internet Security Testbed (DIST), at Dartmouth College. It is equipped with distributed arrays of "sniffers" that cover 210 diverse campus locations and more than 5,000 users. In this paper, we describe our approach, designs, and solutions for addressing the technical challenges arising from efficiency, scalability, security, and management perspectives. We also present extensive evaluation results from a production network and summarize the lessons learned.
Large-scale Wireless Local-area Network Measurement and Privacy Analysis
The edge of the Internet is increasingly becoming wireless. Understanding the wireless edge is therefore important for understanding the performance and security aspects of the Internet experience. This need is especially pressing for enterprise-wide wireless local-area networks (WLANs), as organizations increasingly depend on WLANs for mission-critical tasks. Studying a live production WLAN, especially a large-scale network, is a difficult undertaking. Two fundamental difficulties involved are (1) building a scalable network measurement infrastructure to collect traces from a large-scale production WLAN, and (2) preserving user privacy while sharing these collected traces with the network research community. In this dissertation, we present our experience in designing and implementing one of the largest distributed WLAN measurement systems in the United States, the Dartmouth Internet Security Testbed (DIST), with a particular focus on our solutions to the challenges of efficiency, scalability, and security. We also present an extensive evaluation of the DIST system. To understand the severity of some potential trace-sharing risks for an enterprise-wide large-scale wireless network, we conduct a privacy analysis on one kind of wireless network trace, a user-association log, collected from a large-scale WLAN. We introduce a machine-learning-based approach that can extract and quantify sensitive information from a user-association log, even though it is sanitized. Finally, we present a case study that evaluates the tradeoff between utility and privacy in WLAN trace sanitization.
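A minimal sketch of why a sanitized user-association log can still leak information. The log format, the secret key, and HMAC-based pseudonymization are all illustrative assumptions, not the dissertation's actual sanitization scheme; the point is only that consistent pseudonyms preserve each user's association sequence, which is the structure a learning-based attack can fingerprint.

```python
import hashlib
import hmac

# Hypothetical user-association log: (timestamp, user MAC, access-point name).
log = [
    (100, "aa:bb:cc:00:00:01", "library-ap1"),
    (160, "aa:bb:cc:00:00:01", "dorm-ap3"),
    (200, "aa:bb:cc:00:00:02", "gym-ap2"),
    (260, "aa:bb:cc:00:00:01", "library-ap1"),
]

SECRET = b"site-secret"  # hypothetical sanitization key

def pseudonym(mac: str) -> str:
    """Replace a MAC address with a keyed-hash pseudonym (consistent per user)."""
    return hmac.new(SECRET, mac.encode(), hashlib.sha256).hexdigest()[:8]

sanitized = [(t, pseudonym(mac), ap) for t, mac, ap in log]

# The identifier is hidden, but each user's association *sequence* survives
# intact, so mobility patterns remain exposed to statistical analysis.
seq = [ap for _, user, ap in sanitized if user == pseudonym("aa:bb:cc:00:00:01")]
print(seq)  # ['library-ap1', 'dorm-ap3', 'library-ap1']
```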
Delay Parameter Selection in Permutation Entropy Using Topological Data Analysis
Permutation Entropy (PE) is a powerful tool for quantifying the
predictability of a sequence, including measuring the regularity of a time
series. Despite its successful application in a variety of scientific domains,
PE requires a judicious choice of the delay parameter τ. Another
parameter of interest in PE is the motif dimension n, but n is typically
selected from a small range of values that give optimal results for the
majority of systems. Therefore, in this work we focus solely on choosing the
delay parameter. Selecting τ is often accomplished by trial and error
guided by the expertise of domain scientists. In this paper, however, we show
that persistent homology, the flagship tool of the Topological Data Analysis
(TDA) toolset, provides an approach for the automatic selection of τ. We
evaluate the successful identification of a suitable τ by our TDA-based
approach by comparing our results to a variety of examples in the published
literature.
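For context, PE itself is straightforward to compute once τ and n are fixed; the difficulty the abstract addresses is choosing τ. A minimal implementation (the function name and the example signals are illustrative):

```python
import math
from collections import Counter

def permutation_entropy(x, n=3, tau=1):
    """Normalized permutation entropy of series x, given the motif dimension n
    and the delay parameter tau (both must be chosen by the user)."""
    # Ordinal pattern of each motif: the permutation that sorts its values.
    patterns = Counter(
        tuple(sorted(range(n), key=lambda k: x[i + k * tau]))
        for i in range(len(x) - (n - 1) * tau)
    )
    total = sum(patterns.values())
    h = -sum((c / total) * math.log(c / total) for c in patterns.values())
    return h / math.log(math.factorial(n))  # normalize to [0, 1]

# A monotone ramp has a single ordinal pattern (PE = 0); the chaotic logistic
# map produces many patterns and hence a much higher PE.
ramp = list(range(100))
chaotic = [0.4]
for _ in range(499):
    chaotic.append(4 * chaotic[-1] * (1 - chaotic[-1]))
print(permutation_entropy(ramp), permutation_entropy(chaotic))
```

A poorly chosen τ can blur exactly this contrast, which is why automating its selection matters.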
A Framework for Adversarially Robust Streaming Algorithms
We investigate the adversarial robustness of streaming algorithms. In this
context, an algorithm is considered robust if its performance guarantees hold
even if the stream is chosen adaptively by an adversary that observes the
outputs of the algorithm along the stream and can react in an online manner.
While deterministic streaming algorithms are inherently robust, many central
problems in the streaming literature do not admit sublinear-space deterministic
algorithms; on the other hand, classical space-efficient randomized algorithms
for these problems are generally not adversarially robust. This raises the
natural question of whether there exist efficient adversarially robust
(randomized) streaming algorithms for these problems.
In this work, we show that the answer is positive for various important
streaming problems in the insertion-only model, including distinct elements and
more generally F_p-estimation, F_p-heavy hitters, entropy estimation, and
others. For all of these problems, we develop adversarially robust
(1+ε)-approximation algorithms whose required space matches that of
the best known non-robust algorithms up to a poly(log n, 1/ε) multiplicative factor (and in some cases even up to a constant
factor). Towards this end, we develop several generic tools allowing one to
efficiently transform a non-robust streaming algorithm into a robust one in
various scenarios.
Comment: Conference version in PODS 2020. Version 3 addressing journal referees' comments; improved exposition of sketch switching.
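A toy illustration of the sketch-switching idea, not the paper's actual construction: run many independent copies of a non-robust estimator, publish a frozen output, and switch to a fresh copy only when the active copy's estimate grows by a (1+ε) factor, so an adaptive adversary learns little from each update. The KMV distinct-elements estimator and all parameters below are assumptions chosen for brevity.

```python
import hashlib

class KMV:
    """Non-robust distinct-elements estimator: keep the k smallest item hashes."""
    def __init__(self, k=64, seed=0):
        self.k, self.seed = k, seed
        self.mins = set()
    def _h(self, item):
        digest = hashlib.sha256(f"{self.seed}:{item}".encode()).hexdigest()
        return int(digest, 16) / 2**256  # hash mapped to [0, 1)
    def add(self, item):
        h = self._h(item)
        if h in self.mins:
            return
        self.mins.add(h)
        if len(self.mins) > self.k:
            self.mins.remove(max(self.mins))
    def estimate(self):
        if len(self.mins) < self.k:
            return float(len(self.mins))
        return (self.k - 1) / max(self.mins)

class RobustDistinct:
    """Toy sketch-switching wrapper.  Every copy ingests every update (O(copies)
    work per item, fine for a demo); the published estimate stays frozen until
    the active copy's estimate has grown by a (1 + eps) factor, at which point
    we advance to a fresh, so-far-unexposed copy.  In the insertion-only model
    the count is monotone, so only logarithmically many switches are needed."""
    def __init__(self, copies=40, eps=0.25):
        self.copies = [KMV(seed=i) for i in range(copies)]
        self.active, self.eps, self.out = 0, eps, 0.0
    def update(self, item):
        for c in self.copies:
            c.add(item)
        est = self.copies[self.active].estimate()
        if est > (1 + self.eps) * max(self.out, 1.0):
            self.out = est
            self.active = min(self.active + 1, len(self.copies) - 1)
        return self.out

tracker = RobustDistinct()
for i in range(1000):
    published = tracker.update(i)
print(published)  # within a small constant factor of 1000
```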
Anomaly Detection in Network Streams Through a Distributional Lens
Anomaly detection in computer networks yields valuable information on events relating to the components of a network, their states, the users in a network, and their activities. This thesis provides a unified distribution-based methodology for online detection of anomalies in network traffic streams. The methodology is distribution-based in that it regards the traffic stream as a time series of distributions (histograms), and monitors metrics of the distributions in the time series. The effectiveness of the methodology is demonstrated in three application scenarios. First, in 802.11 wireless traffic, we show the ability to detect certain classes of attacks using the methodology. Second, in information network update streams (specifically in Wikipedia), we show the ability to detect the activity of bots, flash events, and outages as they occur. Third, in Voice over IP traffic streams, we show the ability to detect covert channels that exfiltrate confidential information out of the network. Our experiments show the high detection rate of the methodology when compared to other existing methods, while maintaining a low rate of false positives. Furthermore, we provide algorithmic results that enable efficient and scalable implementation of the above methodology, to accommodate the massive data rates observed in modern information streams on the Internet. Through these applications, we present an extensive study of several aspects of the methodology. We analyze the behavior of the metrics we consider, providing justification for our choice of those metrics and showing how they can be used to diagnose anomalies. We also provide insight into the choice of parameters, such as window length and threshold, used in anomaly detection.
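The core loop of such a distribution-based detector can be sketched in a few lines. This is a minimal illustration, not the thesis's method: the L1 distance is just one possible metric of distributions, and the window contents and threshold below are made-up examples.

```python
from collections import Counter

def l1_distance(p, q):
    """L1 distance between two normalized histograms given as dicts."""
    keys = set(p) | set(q)
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

def normalize(counter):
    total = sum(counter.values())
    return {k: v / total for k, v in counter.items()}

def detect(windows, threshold=0.5):
    """Treat the stream as a time series of histograms, one per window, and
    flag any window whose distribution moves far from the previous one."""
    alarms, prev = [], None
    for i, window in enumerate(windows):
        hist = normalize(Counter(window))
        if prev is not None and l1_distance(prev, hist) > threshold:
            alarms.append(i)
        prev = hist
    return alarms

normal = [["a", "a", "b", "c"]] * 5
attack = [["z"] * 4]  # sudden shift, e.g. a flood of a single traffic type
alarms = detect(normal + attack + normal)
print(alarms)  # [5, 6] -- the onset of the shift and the return to normal
```

In practice the choice of metric, window length, and threshold governs the detection/false-positive tradeoff, which is exactly the parameter study the thesis describes.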
Continuous Monitoring of l_p Norms in Data Streams
In insertion-only streaming, one sees a sequence of indices a_1, a_2, ..., a_m in [n]. The stream defines a sequence of m frequency vectors x(1), ..., x(m), each in R^n, where x(t) is the frequency vector of items after seeing the first t indices in the stream. Much work in the streaming literature focuses on estimating some function f(x(m)). Many applications, though, require obtaining estimates at time t of f(x(t)), for every t in [m]. Naively, this guarantee is obtained by devising an algorithm with failure probability less than 1/m, then performing a union bound over all stream updates to guarantee that all m estimates are simultaneously accurate with good probability. When f(x) is some l_p norm of x, recent works have shown that this union bound is wasteful and better space complexity is possible for the continuous monitoring problem, with the strongest known results being for p=2. In this work, we improve the state of the art for all 0<p<2, which we obtain via a novel analysis of Indyk's p-stable sketch.
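For intuition, here is a didactic version of a p-stable sketch for p = 1: each counter holds a Cauchy-weighted linear combination of the frequency vector, and the median of the absolute counter values estimates the l_1 norm. Querying it after every update gives the continuous-monitoring usage pattern the abstract discusses. The class name and parameters are illustrative, and the Cauchy weights are memoized in a dict for brevity; a real sketch derives them from hash functions so the space stays sublinear.

```python
import math
import random
import statistics

class L1Sketch:
    """Indyk-style p-stable sketch specialized to p = 1 (Cauchy weights)."""
    def __init__(self, width=500, seed=0):
        self.rng = random.Random(seed)
        self.width = width
        self.counters = [0.0] * width
        self.weights = {}  # (item, j) -> standard Cauchy draw (demo shortcut)
    def _w(self, item, j):
        if (item, j) not in self.weights:
            # Inverse-CDF sampling of a standard Cauchy random variable.
            self.weights[item, j] = math.tan(math.pi * (self.rng.random() - 0.5))
        return self.weights[item, j]
    def update(self, item, delta=1.0):
        for j in range(self.width):
            self.counters[j] += delta * self._w(item, j)
    def estimate(self):
        # The median of |standard Cauchy| is 1, so no rescaling is needed.
        return statistics.median(abs(c) for c in self.counters)

sketch = L1Sketch()
for i in range(100):              # 100 unit insertions: ||x||_1 = 100
    sketch.update(f"item-{i}")
print(sketch.estimate())          # close to 100
```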
A Distributed Information Divergence Estimation over Data Streams
In this paper, we consider the setting of large-scale distributed systems, in which each node needs to quickly process a huge amount of data received in the form of a stream that may have been tampered with by an adversary. In this situation, a fundamental problem is how to detect and quantify the amount of work performed by the adversary. To address this issue, we propose a novel algorithm, AnKLe, for estimating the Kullback-Leibler divergence of an observed stream compared with the expected one. AnKLe combines sampling techniques and information-theoretic methods. It is very efficient, both in terms of space and time complexity, and requires only a single pass over the data stream. We show that AnKLe is an (ε, δ)-approximation algorithm with a space complexity of Õ(1/ε + 1/ε²) bits in "most" cases, and Õ(1/ε + (n − ε⁻¹)/ε²) bits otherwise, where n is the number of distinct data items in the stream. Moreover, we propose a distributed version of AnKLe that requires at most O(rl(log n + 1)) bits of communication between the l participating nodes, where r is the number of rounds of the algorithm. Experimental results show that the estimation provided by AnKLe remains accurate even under adversarial settings for which the quality of other methods dramatically decreases.
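To make the quantity concrete, here is a naive single-pass baseline, not AnKLe itself: count items exactly, then compare the empirical distribution of the observed stream against the expected one via KL divergence. A large divergence hints that the stream was tampered with. AnKLe's contribution is achieving this with sublinear space via sampling; the distributions and streams below are made-up examples.

```python
import math
from collections import Counter

def kl(p, q):
    """KL divergence D(p || q) for dict distributions; q must cover p's support."""
    return sum(pv * math.log(pv / q[k]) for k, pv in p.items() if pv > 0)

def stream_kl(stream, expected):
    """Single pass over the stream: build exact counts, then compare the
    empirical distribution against the expected one."""
    counts, n = Counter(), 0
    for item in stream:
        counts[item] += 1
        n += 1
    empirical = {k: v / n for k, v in counts.items()}
    return kl(empirical, expected)

expected = {"a": 0.5, "b": 0.3, "c": 0.2}
honest = ["a"] * 50 + ["b"] * 30 + ["c"] * 20
biased = ["a"] * 90 + ["b"] * 5 + ["c"] * 5   # adversary skews the stream
print(stream_kl(honest, expected), stream_kl(biased, expected))  # ~0.0 and ~0.37
```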