The Hunting of the Bump: On Maximizing Statistical Discrepancy
Anomaly detection has important applications in biosurveillance and
environmental monitoring. When comparing measured data to data drawn from a
baseline distribution, merely finding clusters in the measured data may not
reveal true anomalies. These clusters may simply be the clusters of
the baseline distribution. Hence, a discrepancy function is often used to
examine how different measured data is from baseline data within a region. An
anomalous region is thus defined to be one with high discrepancy.
In this paper, we present algorithms for maximizing statistical discrepancy
functions over the space of axis-parallel rectangles. We give provable
approximation guarantees, both additive and relative, and our methods apply to
any convex discrepancy function. Our algorithms work by connecting statistical
discrepancy to combinatorial discrepancy; roughly speaking, we show that in
order to maximize a convex discrepancy function over a class of shapes, one
needs only maximize a linear discrepancy function over the same set of shapes.
We derive general discrepancy functions for data generated from a one-
parameter exponential family. This generalizes the widely-used Kulldorff scan
statistic for data from a Poisson distribution. We present an algorithm running
in that computes the maximum
discrepancy rectangle to within additive error , for the Kulldorff
scan statistic. Similar results hold for relative error and for discrepancy
functions for data coming from Gaussian, Bernoulli, and gamma distributions.
Prior to our work, the best known algorithms were exact and ran in time .
Comment: 11 pages. A short version of this paper will appear in SODA06. This
full version contains an additional short appendix.
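To make the notion of a maximum discrepancy rectangle concrete, here is a minimal sketch in Python. It assumes the commonly used form of the Kulldorff scan statistic, d(m, b) = m ln(m/b) + (1 - m) ln((1 - m)/(1 - b)) when m > b and 0 otherwise, where m and b are the fractions of measured and baseline mass inside a region. The search below is a plain exhaustive scan over all axis-parallel rectangles of a small grid, not the approximation algorithm of the paper; the function names and the grid representation are illustrative assumptions.

```python
import math


def kulldorff(m, b):
    """Kulldorff discrepancy for a region holding fraction m of the measured
    mass and fraction b of the baseline mass (both in [0, 1])."""
    # Simplification (assumption): regions with no excess, or with a
    # degenerate baseline fraction, are scored 0.
    if m <= b or b <= 0.0 or b >= 1.0:
        return 0.0
    d = m * math.log(m / b)
    if m < 1.0:
        d += (1.0 - m) * math.log((1.0 - m) / (1.0 - b))
    return d


def prefix_sums(grid):
    """2D prefix sums so that any rectangle sum costs O(1)."""
    rows, cols = len(grid), len(grid[0])
    ps = [[0.0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows):
        for j in range(cols):
            ps[i + 1][j + 1] = grid[i][j] + ps[i][j + 1] + ps[i + 1][j] - ps[i][j]
    return ps


def rect_sum(ps, r0, c0, r1, c1):
    """Sum over rows r0..r1 and columns c0..c1, both inclusive."""
    return ps[r1 + 1][c1 + 1] - ps[r0][c1 + 1] - ps[r1 + 1][c0] + ps[r0][c0]


def max_discrepancy_rectangle(measured, baseline):
    """Exhaustive search over all axis-parallel grid rectangles.
    Returns (best discrepancy, (r0, c0, r1, c1))."""
    rows, cols = len(measured), len(measured[0])
    pm, pb = prefix_sums(measured), prefix_sums(baseline)
    total_m, total_b = pm[rows][cols], pb[rows][cols]
    best = (0.0, None)
    for r0 in range(rows):
        for r1 in range(r0, rows):
            for c0 in range(cols):
                for c1 in range(c0, cols):
                    m = rect_sum(pm, r0, c0, r1, c1) / total_m
                    b = rect_sum(pb, r0, c0, r1, c1) / total_b
                    d = kulldorff(m, b)
                    if d > best[0]:
                        best = (d, (r0, c0, r1, c1))
    return best


# Toy data: a uniform baseline and a measured hotspot in the centre cell.
measured = [[1, 1, 1], [1, 10, 1], [1, 1, 1]]
baseline = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
best_value, best_rect = max_discrepancy_rectangle(measured, baseline)
```

On this toy input the search singles out the centre cell, since it holds most of the measured mass but only one ninth of the baseline mass.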
HH-IPG: Leveraging Inter-Packet Gap Metrics in P4 Hardware for Heavy Hitter Detection
The research community has recently proposed several solutions based on modern programmable switches to detect, entirely in the data plane, the flows exceeding a pre-determined threshold in a time window, i.e., Heavy Hitters (HH). This is commonly achieved by dividing the network stream into fixed time slots and analyzing each slot separately, without considering the traffic trends from previous intervals. In this work, we show that using fixed time windows can lead to high inaccuracies. We make a case for rethinking how switches analyze the incoming packets and propose to leverage per-flow Inter Packet Gap (IPG) analytics instead of flow counters for HH detection. We propose an algorithm and present a P4 pipeline design with this new metric in mind. We implement our solution on P4 hardware and experimentally evaluate it against real traffic traces. We show that our results are more accurate than related work by up to 20% while reducing the control channel overhead by up to two orders of magnitude. Finally, we showcase a QoS-oriented application of the proposed dataplane-only IPG-based HH detection in a mobile network scenario.
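The idea of classifying flows by inter-packet gap rather than by per-window counters can be illustrated with a small Python sketch. This is not the algorithm or the P4 pipeline from the paper, whose details the abstract does not give. It is a hypothetical software model that keeps, per flow, the last arrival time and an exponentially weighted moving average (EWMA) of the gap, and reports a flow as heavy when that average drops below a threshold. The class name, the parameters `ipg_threshold`, `alpha`, and `min_packets`, and the EWMA rule are all assumptions made for illustration.

```python
class IpgDetector:
    """Hypothetical IPG-based heavy-hitter detector (illustrative only)."""

    def __init__(self, ipg_threshold, alpha=0.5, min_packets=3):
        self.ipg_threshold = ipg_threshold  # flows with a smaller average gap are heavy
        self.alpha = alpha                  # EWMA weight given to the newest gap
        self.min_packets = min_packets      # packets required before judging a flow
        self.state = {}                     # flow_id -> (last_ts, ewma_ipg, packet_count)

    def observe(self, flow_id, ts):
        """Process one packet arrival; return True if the flow currently looks heavy."""
        if flow_id not in self.state:
            self.state[flow_id] = (ts, None, 1)
            return False
        last_ts, ewma, count = self.state[flow_id]
        gap = ts - last_ts
        if ewma is None:
            ewma = gap
        else:
            ewma = self.alpha * gap + (1 - self.alpha) * ewma
        count += 1
        self.state[flow_id] = (ts, ewma, count)
        return count >= self.min_packets and ewma < self.ipg_threshold


# Timestamps in microseconds: flow "a" sends every 1 ms, flow "b" every second.
det = IpgDetector(ipg_threshold=10_000)
flags_a = [det.observe("a", t) for t in (0, 1_000, 2_000)]
flags_b = [det.observe("b", t) for t in (0, 1_000_000, 2_000_000)]
```

Because the state carries over from packet to packet, no fixed time slot boundary is involved: a flow is flagged as soon as its smoothed gap is small enough, regardless of when the burst starts.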
Enabling event-triggered data plane monitoring
We propose a push-based approach to network monitoring that allows the detection, within the dataplane, of traffic aggregates. Notifications from the switch to the controller are sent only if required, avoiding the transmission or processing of unnecessary data. Furthermore, the dataplane iteratively refines the responsible IP prefixes, allowing the controller to receive information with a flexible granularity. We implemented our solution, Elastic Trie, in P4 and for two different FPGA devices. We evaluated it with packet traces from an ISP backbone. Our approach can spot changes in the traffic patterns and detect (with 95% accuracy) hierarchical heavy hitters with less than 8KB of memory or superspreaders with less than 300KB of memory. Additionally, it reduces controller-dataplane communication overheads by up to two orders of magnitude with respect to state-of-the-art solutions.
Measurements As First-class Artifacts
The emergence of programmable switches has sparked a significant amount of
work on new techniques to perform more powerful measurement tasks, for
instance, to obtain fine-grained traffic and performance statistics. Previous
work has focused on the efficiency of these measurements alone and has
neglected flexibility, resulting in solutions that are hard to reuse or
repurpose and that often overlap in functionality or goals.
In this paper, we propose the use of a set of reusable primitive building
blocks that can be composed to express measurement tasks in a concise and
simple way. We describe the rationale for the design of our primitives, which we
have named MAFIA (Measurements As FIrst-class Artifacts), and using several
examples we illustrate how they can be combined to realize a comprehensive
range of network measurement tasks. Writing MAFIA code does not require expert
knowledge of low-level switch architecture details. Using a prototype
implementation of MAFIA, we demonstrate the applicability of our approach and
show that the use of our primitives results in compiled code that is comparable
in size and resource usage with manually written specialized P4 code and can be
run on current hardware.
Comment: Infocom 2019 extended version
An Elephant in the Room: Using Sampling for Detecting Heavy-Hitters in Programmable Switches
The ability to detect elephant flows in the forwarding device itself, i.e., a switch, facilitates the deployment of new advanced applications such as load balancing and per-flow QoS management. Sketches and Space Saving summarization techniques are used for elephant flow detection. However, their memory and computing requirements force the cooperation of an external controller device, due to the scarce resources of current programmable switches. To overcome this limitation, we adapt Sketch and Space Saving elephant flow detection techniques to operate with instant notification and sampled traffic. We evaluate the performance of the resulting techniques with three real traffic traces. The use of sampling allows the identification of a large share of the total traffic corresponding to the elephant flows with a low memory footprint and a reduction of the computing requirements by two orders of magnitude compared to unsampled versions. In turn, we observe a slight increase in the number of false positives and the number of flow notifications.
The work of Alberto García-Martínez and Marcelo Bagnulo was supported by the TRUE5G Project ('Evolución hacia redes y servicios auto-gestionados para el 5G del futuro') by the Spanish National Research Agency under Grant PID2019-108713RB-C52/AEI/10.13039/501100011033.
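As a rough illustration of combining Space Saving with packet sampling, the Python sketch below implements the standard Space Saving counter replacement rule and applies it only to packets selected with probability `sample_prob`. It is a minimal software model under stated assumptions, not the switch implementation evaluated in the paper: the class name, the uniform random sampling, and the scaling of estimates by `1 / sample_prob` are illustrative choices, and the instant-notification mechanism is not modelled.

```python
import random


class SampledSpaceSaving:
    """Space Saving summary fed with randomly sampled packets (illustrative)."""

    def __init__(self, capacity, sample_prob=1.0, rng=None):
        self.capacity = capacity        # maximum number of monitored flows
        self.sample_prob = sample_prob  # probability that a packet is processed
        self.rng = rng if rng is not None else random.Random()
        self.counters = {}              # flow_id -> sampled packet count

    def update(self, flow_id):
        """Feed one packet of flow_id into the summary."""
        if self.rng.random() >= self.sample_prob:
            return  # packet not sampled, nothing to do
        if flow_id in self.counters:
            self.counters[flow_id] += 1
        elif len(self.counters) < self.capacity:
            self.counters[flow_id] = 1
        else:
            # Space Saving rule: evict the smallest counter and let the
            # new flow inherit its count plus one.
            victim = min(self.counters, key=self.counters.get)
            self.counters[flow_id] = self.counters.pop(victim) + 1

    def estimate(self, flow_id):
        """Estimated packet count, scaled up by the sampling rate."""
        return self.counters.get(flow_id, 0) / self.sample_prob

    def heavy_hitters(self, threshold):
        """Monitored flows whose estimated count reaches the threshold."""
        return {f for f in self.counters if self.estimate(f) >= threshold}


# Deterministic run without sampling: flow "a" dominates the stream.
ss = SampledSpaceSaving(capacity=2, sample_prob=1.0)
for flow in ["a"] * 5 + ["b", "c"]:
    ss.update(flow)
elephants = ss.heavy_hitters(threshold=4)
```

Note that Space Saving can overestimate: in the run above flow "c" inherits the evicted counter of "b" and is reported with a count of 2 although it sent one packet. With `sample_prob` below 1, fewer updates are performed per packet at the cost of noisier estimates, which is consistent with the trade-off the abstract describes.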