    The Hunting of the Bump: On Maximizing Statistical Discrepancy

    Anomaly detection has important applications in biosurveillance and environmental monitoring. When comparing measured data to data drawn from a baseline distribution, merely finding clusters in the measured data may not reveal true anomalies: these clusters may simply reflect clusters in the baseline distribution. Hence, a discrepancy function is often used to measure how much the measured data differs from the baseline data within a region, and an anomalous region is defined as one with high discrepancy. In this paper, we present algorithms for maximizing statistical discrepancy functions over the space of axis-parallel rectangles. We give provable approximation guarantees, both additive and relative, and our methods apply to any convex discrepancy function. Our algorithms work by connecting statistical discrepancy to combinatorial discrepancy; roughly speaking, we show that in order to maximize a convex discrepancy function over a class of shapes, one need only maximize a linear discrepancy function over the same set of shapes. We derive general discrepancy functions for data generated from a one-parameter exponential family. This generalizes the widely used Kulldorff scan statistic for data from a Poisson distribution. For the Kulldorff scan statistic, we present an algorithm running in $O(\frac{1}{\epsilon} n^2 \log^2 n)$ time that computes the maximum discrepancy rectangle to within additive error $\epsilon$. Similar results hold for relative error and for discrepancy functions for data coming from Gaussian, Bernoulli, and gamma distributions. Prior to our work, the best known algorithms were exact and ran in $O(n^4)$ time.
    Comment: 11 pages. A short version of this paper will appear in SODA06. This full version contains an additional short appendi
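
    To make the objective concrete, the sketch below scores every axis-parallel rectangle of a small grid by brute force, assuming the common one-sided form of the Kulldorff statistic, $d_K(m, b) = m \ln\frac{m}{b} + (1-m)\ln\frac{1-m}{1-b}$ for $m > b$ and $0$ otherwise, where $m$ and $b$ are the fractions of measured and baseline mass inside the rectangle. It is a naive exhaustive scan, not the paper's approximation algorithm; the grid representation and all function names are illustrative assumptions, and rectangles with no baseline mass are skipped to avoid division by zero.

```python
import math

def _xlog(x, y):
    # x * ln(x / y), using the convention 0 * ln(0 / y) = 0
    return 0.0 if x == 0 else x * math.log(x / y)

def kulldorff(m, b):
    """One-sided Kulldorff discrepancy (assumed standard form).

    m: fraction of measured mass inside the region
    b: fraction of baseline mass inside the region (caller ensures b > 0)
    """
    if m <= b:
        return 0.0  # region is not over-represented
    return _xlog(m, b) + _xlog(1 - m, 1 - b)

def _prefix_sums(grid):
    rows, cols = len(grid), len(grid[0])
    p = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows):
        for j in range(cols):
            p[i + 1][j + 1] = grid[i][j] + p[i][j + 1] + p[i + 1][j] - p[i][j]
    return p

def _rect_sum(p, r0, r1, c0, c1):
    # Sum of grid cells in rows r0..r1 and columns c0..c1 (inclusive).
    return p[r1 + 1][c1 + 1] - p[r0][c1 + 1] - p[r1 + 1][c0] + p[r0][c0]

def max_discrepancy_rectangle(measured, baseline):
    """Exhaustively scan all axis-parallel sub-rectangles of a grid.

    Returns (best_score, (r0, r1, c0, c1)). There are O(rows^2 * cols^2)
    rectangles, each scored in O(1) time via prefix sums.
    """
    rows, cols = len(measured), len(measured[0])
    pm, pb = _prefix_sums(measured), _prefix_sums(baseline)
    total_m, total_b = pm[rows][cols], pb[rows][cols]
    best_score, best_rect = 0.0, None
    for r0 in range(rows):
        for r1 in range(r0, rows):
            for c0 in range(cols):
                for c1 in range(c0, cols):
                    b = _rect_sum(pb, r0, r1, c0, c1) / total_b
                    if b == 0:
                        continue  # discrepancy undefined without baseline mass
                    m = _rect_sum(pm, r0, r1, c0, c1) / total_m
                    score = kulldorff(m, b)
                    if score > best_score:
                        best_score, best_rect = score, (r0, r1, c0, c1)
    return best_score, best_rect

# Usage: uniform baseline, measured data with a bump in the center cell.
measured = [[1, 1, 1], [1, 10, 1], [1, 1, 1]]
baseline = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
score, rect = max_discrepancy_rectangle(measured, baseline)
print(rect, round(score, 3))  # (1, 1, 1, 1) 0.586
```

    Larger rectangles around the bump dilute $m$ faster than they grow $b$, so the single center cell wins here; the point of the paper is to avoid this exhaustive enumeration.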

    HH-IPG: Leveraging Inter-Packet Gap Metrics in P4 Hardware for Heavy Hitter Detection

    The research community has recently proposed several solutions based on modern programmable switches that detect, entirely in the data plane, the flows exceeding a predetermined threshold within a time window, i.e., Heavy Hitters (HH). This is commonly achieved by dividing the network stream into fixed time slots and analyzing each slot separately, without considering traffic trends from previous intervals. In this work, we show that relying on predefined time windows can lead to high inaccuracies. We make a case for rethinking how switches analyze incoming packets and propose leveraging per-flow Inter-Packet Gap (IPG) analytics instead of flow counters for HH detection. We propose an algorithm and present a P4 pipeline design built around this new metric. We implement our solution on P4 hardware and experimentally evaluate it against real traffic traces. We show that our results are up to 20% more accurate than related work while reducing the control channel overhead by up to two orders of magnitude. Finally, we showcase a QoS-oriented application of the proposed dataplane-only, IPG-based HH detection in a mobile network scenario.
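
    The abstract does not spell out the HH-IPG algorithm, so the following is only a hypothetical illustration of the underlying idea: a flow whose smoothed inter-packet gap stays below a threshold is sending faster than a corresponding rate threshold, and it can be flagged the moment that happens, without fixed time slots. The class name, the shift-based EWMA (chosen because it maps onto bit shifts available in switch ALUs), and the microsecond timestamps are all assumptions.

```python
class IpgHeavyHitterDetector:
    """Hypothetical per-flow IPG tracker (not the HH-IPG pipeline itself).

    A flow is flagged once its smoothed inter-packet gap drops below
    gap_threshold_us, i.e. its rate exceeds roughly 1 / gap_threshold_us.
    """

    def __init__(self, gap_threshold_us, shift=3):
        self.gap_threshold_us = gap_threshold_us
        self.shift = shift      # EWMA weight 1 / 2**shift, computed with shifts only
        self.last_ts = {}       # flow -> timestamp of its previous packet (us)
        self.avg_gap = {}       # flow -> smoothed IPG (integer us)
        self.reported = set()   # flows already flagged (notify only once)

    def on_packet(self, flow, ts_us):
        """Update state for one packet; return True when the flow is newly flagged."""
        prev = self.last_ts.get(flow)
        self.last_ts[flow] = ts_us
        if prev is None:
            return False        # two packets are needed to measure a gap
        gap = ts_us - prev
        avg = self.avg_gap.get(flow)
        if avg is None:
            avg = gap           # first measured gap seeds the average
        else:
            avg = avg - (avg >> self.shift) + (gap >> self.shift)
        self.avg_gap[flow] = avg
        if avg < self.gap_threshold_us and flow not in self.reported:
            self.reported.add(flow)
            return True
        return False

# Usage: flow "A" sends every 10 us, flow "B" every 1000 us.
detector = IpgHeavyHitterDetector(gap_threshold_us=100, shift=2)
events = sorted([(t, "A") for t in range(0, 200, 10)] +
                [(t, "B") for t in range(0, 5000, 1000)])
flagged = [flow for ts, flow in events if detector.on_packet(flow, ts)]
print(flagged)  # ['A']
```

    Because the decision is made per packet from a running average, a fast flow is reported as soon as its gaps shrink rather than at the end of a time slot.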

    Enabling event-triggered data plane monitoring

    We propose a push-based approach to network monitoring that detects traffic aggregates within the dataplane. Notifications from the switch to the controller are sent only when required, avoiding the transmission and processing of unnecessary data. Furthermore, the dataplane iteratively refines the responsible IP prefixes, allowing the controller to receive information at a flexible granularity. We implemented our solution, Elastic Trie, in P4 and for two different FPGA devices, and evaluated it with packet traces from an ISP backbone. Our approach can spot changes in traffic patterns and detect, with 95% accuracy, either hierarchical heavy hitters using less than 8 KB of memory or superspreaders using less than 300 KB. Additionally, it reduces controller-dataplane communication overhead by up to two orders of magnitude with respect to state-of-the-art solutions.
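
    Iterative prefix refinement with push-style notifications can be sketched as follows. This is a hypothetical simplification, not Elastic Trie's actual data structure: the monitored prefixes always partition the IPv4 space, any prefix whose per-epoch packet count reaches a threshold is split into its two children, and a notification is pushed only when a heavy prefix has reached a maximum length. The class name, the epoch model, and both parameters are assumptions.

```python
import ipaddress

class PrefixRefiner:
    """Hypothetical epoch-based prefix refinement for IPv4 source addresses."""

    def __init__(self, threshold, max_len=32):
        self.threshold = threshold
        self.max_len = max_len
        self.counts = {(0, 0): 0}   # (prefix value, prefix length) -> packets this epoch

    @staticmethod
    def _covers(prefix, addr):
        value, length = prefix
        return addr >> (32 - length) == value >> (32 - length)

    def on_packet(self, addr):
        # Monitored prefixes partition the address space, so exactly one matches.
        for prefix in self.counts:
            if self._covers(prefix, addr):
                self.counts[prefix] += 1
                return

    def end_epoch(self):
        """Refine heavy prefixes; return notifications for heavy prefixes at max_len."""
        notifications, next_counts = [], {}
        for (value, length), n in self.counts.items():
            if n >= self.threshold and length < self.max_len:
                bit = 1 << (31 - length)            # next bit below the prefix
                next_counts[(value, length + 1)] = 0
                next_counts[(value | bit, length + 1)] = 0
            else:
                if n >= self.threshold:
                    notifications.append((str(ipaddress.IPv4Address(value)), length))
                next_counts[(value, length)] = 0
        self.counts = next_counts
        return notifications

# Usage: one busy source; refinement stops at /8 for brevity.
refiner = PrefixRefiner(threshold=50, max_len=8)
addr = int(ipaddress.IPv4Address("10.0.0.1"))
reports = []
for _ in range(9):
    for _ in range(100):
        refiner.on_packet(addr)
    reports.append(refiner.end_epoch())
print(reports[-1])  # [('10.0.0.0', 8)]
```

    The controller hears nothing during the eight refinement epochs and receives a single notification once the granularity is fine enough.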

    Measurements As First-class Artifacts

    The emergence of programmable switches has sparked a significant amount of work on new techniques to perform more powerful measurement tasks, for instance, to obtain fine-grained traffic and performance statistics. Previous work has focused on the efficiency of these measurements alone and has neglected flexibility, resulting in solutions that are hard to reuse or repurpose and that often overlap in functionality or goals. In this paper, we propose the use of a set of reusable primitive building blocks that can be composed to express measurement tasks in a concise and simple way. We describe the rationale for the design of our primitives, which we have named MAFIA (Measurements As FIrst-class Artifacts), and, using several examples, we illustrate how they can be combined to realize a comprehensive range of network measurement tasks. Writing MAFIA code does not require expert knowledge of low-level switch architecture details. Using a prototype implementation of MAFIA, we demonstrate the applicability of our approach and show that our primitives compile to code that is comparable in size and resource usage to manually written specialized P4 code and can run on current hardware.
    Comment: Infocom 2019 extended versio
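
    The idea of composing reusable primitives into measurement tasks can be illustrated with a few Python generator stages. The primitive names below (`match`, `key_by`, `count`, `above`, `compose`) are hypothetical and are not MAFIA's actual primitives or syntax; the sketch only shows how two different tasks can be expressed by reusing the same building blocks.

```python
from collections import Counter

# Hypothetical primitives: each returns a stage mapping an iterator to an iterator.
def match(pred):
    """Keep only packets that satisfy pred."""
    return lambda stream: (p for p in stream if pred(p))

def key_by(field):
    """Project each packet onto the value of one header field."""
    return lambda stream: (p[field] for p in stream)

def count():
    """Aggregate keys into (key, occurrences) pairs."""
    return lambda stream: iter(Counter(stream).items())

def above(threshold):
    """Keep (key, count) pairs whose count exceeds threshold."""
    return lambda stream: ((k, v) for k, v in stream if v > threshold)

def compose(*stages):
    """Chain stages into a task; the final stage must yield (key, value) pairs."""
    def run(packets):
        stream = iter(packets)
        for stage in stages:
            stream = stage(stream)
        return dict(stream)
    return run

# Two different measurement tasks built from the same primitives.
syn_targets = compose(match(lambda p: p["syn"]), key_by("dst"), count(), above(2))
heavy_sources = compose(key_by("src"), count(), above(3))

packets = (
    [{"src": "10.0.0.1", "dst": "10.0.0.9", "syn": True}] * 3
    + [{"src": "10.0.0.2", "dst": "10.0.0.8", "syn": False}] * 4
    + [{"src": "10.0.0.3", "dst": "10.0.0.8", "syn": True}]
)
print(syn_targets(packets))    # {'10.0.0.9': 3}
print(heavy_sources(packets))  # {'10.0.0.2': 4}
```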

    An Elephant in the Room: Using Sampling for Detecting Heavy-Hitters in Programmable Switches

    The ability to detect elephant flows in the forwarding device itself, i.e., a switch, facilitates the deployment of new advanced applications such as load balancing and per-flow QoS management. Sketches and Space Saving summarization techniques are used for elephant flow detection. However, due to the scarce resources of current programmable switches, their memory and computing requirements force cooperation with an external controller. To overcome this limitation, we adapt Sketch and Space Saving elephant flow detection techniques to operate with instant notification and sampled traffic. We evaluate the performance of the resulting techniques with three real traffic traces. Sampling allows the identification of a large share of the total traffic corresponding to elephant flows with a low memory footprint, and reduces computing requirements by two orders of magnitude compared to unsampled versions. In turn, we observe a slight increase in the number of false positives and flow notifications.
    The work of Alberto García-Martínez and Marcelo Bagnulo was supported by the TRUE5G Project ('Evolución hacia redes y servicios auto-gestionados para el 5G del futuro', i.e., 'Evolution towards self-managed networks and services for the future 5G') by the Spanish National Research Agency under Grant PID2019-108713RB-C52/AEI/10.13039/501100011033
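
    Space Saving combined with packet sampling and instant notification might look roughly like this. The Space Saving update rule (increment a monitored flow, otherwise replace the minimum counter with min + 1) is the standard algorithm; the class name, the Bernoulli sampling with probability `sample_prob`, and the rule of scaling the counter by `1 / sample_prob` before comparing it to the threshold are assumptions for illustration, not the paper's exact design.

```python
import random

class SampledSpaceSaving:
    """Hypothetical sampled Space Saving with instant elephant notification."""

    def __init__(self, capacity, sample_prob, threshold, seed=None):
        self.capacity = capacity        # number of flow counters (switch memory)
        self.sample_prob = sample_prob  # probability that a packet is processed
        self.threshold = threshold      # elephant threshold in (estimated) packets
        self.rng = random.Random(seed)
        self.counters = {}              # flow -> Space Saving count of sampled packets
        self.notified = set()

    def on_packet(self, flow):
        """Return True when this packet makes `flow` a newly notified elephant."""
        if self.rng.random() >= self.sample_prob:
            return False                # packet not sampled: no state update
        if flow in self.counters:
            self.counters[flow] += 1
        elif len(self.counters) < self.capacity:
            self.counters[flow] = 1
        else:
            # Standard Space Saving: evict the minimum and inherit its count + 1.
            victim = min(self.counters, key=self.counters.get)
            self.counters[flow] = self.counters.pop(victim) + 1
        estimate = self.counters[flow] / self.sample_prob
        if estimate >= self.threshold and flow not in self.notified:
            self.notified.add(flow)     # instant notification, sent once per flow
            return True
        return False

# Usage: one elephant among 100 short flows, in shuffled arrival order.
traffic_rng = random.Random(1)
packets = ["heavy"] * 10000 + [f"light{i}" for i in range(100) for _ in range(10)]
traffic_rng.shuffle(packets)
detector = SampledSpaceSaving(capacity=16, sample_prob=0.1, threshold=3000, seed=7)
alerts = [flow for flow in packets if detector.on_packet(flow)]
print(alerts)
```

    Only about one packet in ten touches the counters, which is the kind of computing reduction sampling buys; the price is that estimates become noisier, consistent with the slight rise in false positives the abstract reports.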