Evaluating machine learning algorithms for automated network application identification

Abstract

The identification of network applications that create traffic flows is vital to the areas of network management and surveillance. Current popular methods such as port number and payload-based identification are inadequate and exhibit a number of shortfalls. A potential solution is the use of machine learning techniques to identify network applications based on payload independent statistical features. In this paper we evaluate and compare the efficiency and performance of different feature selection and machine learning techniques based on flow data obtained from a number of public traffic traces. We also provide insights into which flow features are the most useful. Furthermore, we investigate the influence of other factors such as flow timeout and size of the training data set. We find significant performance differences between different algorithms and identify several algorithms that provide accurate (up to 99% accuracy) and fast classification

    Similar works