Traffic analysis is important to the operation of IP networks. The
input to the analysis is raw data such as packet header traces or NetFlow
records, and the output is often the size of aggregates such as the traffic
generated by various applications or by individual customers. Storing the raw
data allows the flexibility of running arbitrary new analyses in the future,
but the sheer amount of raw data is often a challenge. Sampling-based
techniques such as smart sampling aim at reducing the amount of raw data while
preserving the ability of future analyses to accurately estimate the traffic of
any large aggregate. There are three important measures of the traffic of an
aggregate: the number of bytes, the number of packets and the number of flows.
Current data reduction solutions allow estimating only one of these measures.
In this paper we propose the idea of unified summaries that allow the analyses
to get unbiased estimates for all three measures. Our unified summary for
flow records is based on smart sampling, and the one for packet header
traces is based on sample and hold. The most important
contributions of this paper are the development of novel unbiased statistical
estimators for the number of flows, the development of methods for combining
summaries measuring bytes and packets using less memory than separate
summaries, and experimental evaluation of the proposed solutions based on
traces of traffic.

Pre-2018 CSE ID: CS2004-079
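
For readers unfamiliar with smart sampling, the following is a minimal sketch of the standard size-dependent (threshold) sampling rule on which it is based, covering only the byte measure. It is not the unified summary or the flow-count estimator proposed in this paper. The function names, the (key, bytes) record layout, and the parameter name z are assumptions made for illustration. A flow of x bytes is kept with probability min(1, x/z) and, if kept, is counted as max(x, z) bytes, which makes the byte estimate of any aggregate unbiased.

```python
import random


def smart_sample(flows, z, rng=None):
    """Threshold-sample flow records (illustrative sketch, bytes only).

    flows: iterable of (key, size_in_bytes) pairs (assumed layout)
    z:     sampling threshold in bytes (hypothetical parameter name)
    Returns a list of (key, estimated_bytes) for the kept flows.
    """
    rng = rng or random.Random()
    kept = []
    for key, size in flows:
        # Keep a flow with probability min(1, size / z).
        p = min(1.0, size / z)
        if rng.random() < p:
            # A kept flow stands for size / p bytes, i.e. max(size, z).
            kept.append((key, max(size, z)))
    return kept


def estimate_bytes(sample, predicate):
    """Estimate the bytes of the aggregate selected by predicate(key)."""
    return sum(est for key, est in sample if predicate(key))
```

Flows of at least z bytes are always kept and reported exactly, while smaller flows are thinned out and each survivor is weighted up to z bytes. This is why the estimate is accurate for large aggregates while far fewer records are stored.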