
Flowtree: Enabling Distributed Flow Summarization at Scale

Abstract

© ACM 2018. This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in ACM SIGCOMM 2018, http://dx.doi.org/10.1145/3234200.3234225.

NetFlow and IPFIX raw flow captures are insightful yet, due to their large volume, challenging to analyze and query in a timely manner. In particular, if these captures span long time periods or are collected at remote locations, storing or transferring them for analysis becomes increasingly expensive. Enabling efficient execution of a large range of queries over flow captures while reducing storage and transfer volume requires working with mergeable succinct summaries that capture the most essential features of flows dynamically. However, the problem of building such structures remains unsolved. In this work, we introduce a self-adjusting data structure of generalized flows, called Flowtree, that (1) reduces the storage requirements by more than 95% while providing highly accurate answers for popular hierarchical flows, (2) minimizes the transfer cost of flow summaries, and (3) supports several operators with distributed execution and summarization across time and multiple sites. The evaluation of our solution on different network traces confirms that Flowtree can accurately and promptly answer questions about flows using different feature sets.

EC/H2020/679158/EU/Resolving the Tussle in the Internet: Mapping, Architecture, and Policy Making/ResolutioNet
DFG, FE 570/4-1, Gottfried Wilhelm Leibniz-Preis 2011
BMBF, 01IS14013A, BBDC - Berliner Kompetenzzentrum für Big Dat
