1,756 research outputs found

    Faster and More Accurate Measurement through Additive-Error Counters

    Full text link
    Counters are a fundamental building block for networking applications such as load balancing, traffic engineering, and intrusion detection, which require estimating flow sizes and identifying heavy hitter flows. Existing works suggest replacing counters with shorter multiplicative error \emph{estimators} that improve the accuracy by fitting more of them within a given space. However, such estimators impose a computational overhead that degrades the measurement throughput. Instead, we propose \emph{additive} error estimators, which are simpler, faster, and more accurate when used for network measurement. Our solution is rigorously analyzed and empirically evaluated against several other measurement algorithms on real Internet traces. For a given error target, we improve the speed of the uncompressed solutions by 5×5\times-30×30\times, and the space by up to 4×4\times. Compared with existing state-of-the-art estimators, our solution is 9× 9\times-35×35\times faster while being considerably more accurate.Comment: To appear in IEEE INFOCOM 202

    Space-efficient data sketching algorithms for network applications

    Get PDF
    Sketching techniques are widely adopted in network applications. Sketching algorithms “encode” data into succinct data structures that can later be accessed and “decoded” for various purposes, such as network measurement, accounting, anomaly detection and etc. Bloom filters and counter braids are two well-known representatives in this category. Those sketching algorithms usually need to strike a tradeoff between performance (how much information can be revealed and how fast) and cost (storage, transmission and computation). This dissertation is dedicated to the research and development of several sketching techniques including improved forms of stateful Bloom Filters, Statistical Counter Arrays and Error Estimating Codes. Bloom filter is a space-efficient randomized data structure for approximately representing a set in order to support membership queries. Bloom filter and its variants have found widespread use in many networking applications, where it is important to minimize the cost of storing and communicating network data. In this thesis, we propose a family of Bloom Filter variants augmented by rank-indexing method. We will show such augmentation can bring a significant reduction of space and also the number of memory accesses, especially when deletions of set elements from the Bloom Filter need to be supported. Exact active counter array is another important building block in many sketching algorithms, where storage cost of the array is of paramount concern. Previous approaches reduce the storage costs while either losing accuracy or supporting only passive measurements. In this thesis, we propose an exact statistics counter array architecture that can support active measurements (real-time read and write). It also leverages the aforementioned rank-indexing method and exploits statistical multiplexing to minimize the storage costs of the counter array. Error estimating coding (EEC) has recently been established as an important tool to estimate bit error rates in the transmission of packets over wireless links. In essence, the EEC problem is also a sketching problem, since the EEC codes can be viewed as a sketch of the packet sent, which is decoded by the receiver to estimate bit error rate. In this thesis, we will first investigate the asymptotic bound of error estimating coding by viewing the problem from two-party computation perspective and then investigate its coding/decoding efficiency using Fisher information analysis. Further, we develop several sketching techniques including Enhanced tug-of-war(EToW) sketch and the generalized EEC (gEEC)sketch family which can achieve around 70% reduction of sketch size with similar estimation accuracies. For all solutions proposed above, we will use theoretical tools such as information theory and communication complexity to investigate how far our proposed solutions are away from the theoretical optimal. We will show that the proposed techniques are asymptotically or empirically very close to the theoretical bounds.PhDCommittee Chair: Xu, Jun; Committee Member: Feamster, Nick; Committee Member: Li, Baochun; Committee Member: Romberg, Justin; Committee Member: Zegura, Ellen W

    No NAT'd User left Behind: Fingerprinting Users behind NAT from NetFlow Records alone

    Full text link
    It is generally recognized that the traffic generated by an individual connected to a network acts as his biometric signature. Several tools exploit this fact to fingerprint and monitor users. Often, though, these tools assume to access the entire traffic, including IP addresses and payloads. This is not feasible on the grounds that both performance and privacy would be negatively affected. In reality, most ISPs convert user traffic into NetFlow records for a concise representation that does not include, for instance, any payloads. More importantly, large and distributed networks are usually NAT'd, thus a few IP addresses may be associated to thousands of users. We devised a new fingerprinting framework that overcomes these hurdles. Our system is able to analyze a huge amount of network traffic represented as NetFlows, with the intent to track people. It does so by accurately inferring when users are connected to the network and which IP addresses they are using, even though thousands of users are hidden behind NAT. Our prototype implementation was deployed and tested within an existing large metropolitan WiFi network serving about 200,000 users, with an average load of more than 1,000 users simultaneously connected behind 2 NAT'd IP addresses only. Our solution turned out to be very effective, with an accuracy greater than 90%. We also devised new tools and refined existing ones that may be applied to other contexts related to NetFlow analysis

    File Detection on Network Traffic Using Approximate Matching

    Get PDF
    In recent years, Internet technologies changed enormously and allow faster Internet connections, higher data rates and mobile usage. Hence, it is possible to send huge amounts of data / files easily which is often used by insiders or attackers to steal intellectual property. As a consequence, data leakage prevention systems (DLPS) have been developed which analyze network traffic and alert in case of a data leak. Although the overall concepts of the detection techniques are known, the systems are mostly closed and commercial. Within this paper we present a new technique for network traffic analysis based on approximate matching (a.k.a fuzzy hashing) which is very common in digital forensics to correlate similar files. This paper demonstrates how to optimize and apply them on single network packets. Our contribution is a straightforward concept which does not need a comprehensive configuration: hash the file and store the digest in the database. Within our experiments we obtained false positive rates between 10−4 and 10−5 and an algorithm throughput of over 650 Mbit/s

    PerfWeb: How to Violate Web Privacy with Hardware Performance Events

    Full text link
    The browser history reveals highly sensitive information about users, such as financial status, health conditions, or political views. Private browsing modes and anonymity networks are consequently important tools to preserve the privacy not only of regular users but in particular of whistleblowers and dissidents. Yet, in this work we show how a malicious application can infer opened websites from Google Chrome in Incognito mode and from Tor Browser by exploiting hardware performance events (HPEs). In particular, we analyze the browsers' microarchitectural footprint with the help of advanced Machine Learning techniques: k-th Nearest Neighbors, Decision Trees, Support Vector Machines, and in contrast to previous literature also Convolutional Neural Networks. We profile 40 different websites, 30 of the top Alexa sites and 10 whistleblowing portals, on two machines featuring an Intel and an ARM processor. By monitoring retired instructions, cache accesses, and bus cycles for at most 5 seconds, we manage to classify the selected websites with a success rate of up to 86.3%. The results show that hardware performance events can clearly undermine the privacy of web users. We therefore propose mitigation strategies that impede our attacks and still allow legitimate use of HPEs
    corecore