Pretrained models for network traffic can utilize large-scale raw data to
learn the essential characteristics of network traffic, and generate
distinguishable results for input traffic without considering specific
downstream tasks. Effective pretrained models can significantly optimize the
training efficiency and effectiveness of downstream tasks, such as traffic
classification, attack detection, resource scheduling, protocol analysis, and
traffic generation. Despite the great success of pretraining in natural
language processing, no comparable pretrained model yet exists in the network field. Considering the
diverse demands and characteristics of network traffic and network tasks, it is
non-trivial to build a pretrained model for network traffic and we face various
challenges, especially the heterogeneous headers and payloads of
multi-pattern network traffic and the different context dependencies of
diverse downstream network tasks.
To tackle these challenges, in this paper, we make the first attempt to
provide a generative pretrained model, NetGPT, for both traffic understanding and
generation tasks. We propose multi-pattern network traffic modeling to
construct unified text inputs and support both traffic understanding and
generation tasks. We further optimize the adaptation effect of the pretrained
model to diversified tasks by shuffling header fields, segmenting packets in
flows, and incorporating diverse task labels with prompts. Extensive
experiments demonstrate the effectiveness of NetGPT across a range of traffic
understanding and generation tasks, where it outperforms state-of-the-art
baselines by a wide margin.
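
To make the preprocessing ideas above concrete, the following is a minimal illustrative sketch, not the paper's actual pipeline: it assumes packets arrive as raw bytes plus parsed header fields, hex-encodes them into word-like text tokens, optionally shuffles header fields, segments long flows, and prepends a task-label prompt. The field names, token granularity, and segment length are all assumptions for illustration.

```python
import random
from typing import Dict, List

TOKEN_BYTES = 2           # assumed granularity: one token per 2 bytes (4 hex chars)
MAX_SEGMENT_TOKENS = 256  # assumed segment length for long flows


def hex_tokens(data: bytes) -> List[str]:
    """Encode raw bytes as fixed-width hexadecimal tokens."""
    hex_str = data.hex()
    width = TOKEN_BYTES * 2
    return [hex_str[i:i + width] for i in range(0, len(hex_str), width)]


def encode_packet(header_fields: Dict[str, bytes], payload: bytes,
                  shuffle_header: bool = True) -> List[str]:
    """Turn one packet into a token sequence; optionally shuffle header fields."""
    fields = list(header_fields.items())
    if shuffle_header:
        # Break the fixed field order so a model cannot rely on position alone.
        random.shuffle(fields)
    tokens: List[str] = []
    for _, value in fields:
        tokens.extend(hex_tokens(value))
    tokens.extend(hex_tokens(payload))
    return tokens


def segment_flow(packets: List[List[str]]) -> List[List[str]]:
    """Split a flow's concatenated token stream into model-sized segments."""
    stream = [tok for pkt in packets for tok in pkt]
    return [stream[i:i + MAX_SEGMENT_TOKENS]
            for i in range(0, len(stream), MAX_SEGMENT_TOKENS)]


if __name__ == "__main__":
    pkt = encode_packet(
        {"src_port": (443).to_bytes(2, "big"),
         "dst_port": (51324).to_bytes(2, "big"),
         "ttl": (64).to_bytes(1, "big")},
        payload=b"\x16\x03\x01\x02\x00",
    )
    segments = segment_flow([pkt])
    # Prepend a (hypothetical) task label as a prompt for a downstream task.
    prompt = ["<task:traffic_classification>"] + segments[0]
    print(prompt)
```

Under these assumptions, the same text-token representation can feed both understanding tasks (by prepending a task-label prompt) and generation tasks (by decoding further tokens), which is the intent of the unified input described above.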