Methodologies for Mobile and Encrypted Traffic Classification via Machine Learning Approaches

Abstract

The widespread use of handheld devices (e.g., smartphones) has led to a significant evolution in the way users connect to the Internet and access content or services. This entails a substantial change in the nature of network traffic. Traffic classification, the set of techniques suited to infer the applications generating network traffic, is currently the enabler for gathering valuable information for different stakeholders in the Internet traffic delivery supply chain. Its applications include network management (e.g., service differentiation/blocking and quality-of-service enforcement), network security, and user profiling. On top of that, traffic classification raises compelling privacy issues related to the sharing of this information in thorny scenarios (e.g., healthcare apps and enterprise environments). Nonetheless, the proliferation of encryption (e.g., anonymity tools) hinders the suitability of solutions based on cleartext traffic inspection and thus challenges current classifiers. Also, the moving-target nature of mobile traffic, due to the daily-expanding set of apps sharing common third-party services, accelerates the performance degradation of solutions based on standard machine learning approaches. As such, this thesis presents a set of novel methodologies for mobile traffic classification that can operate under the encrypted-traffic assumption, and it advances the state of the art from multiple viewpoints. In detail, the present dissertation devises innovative machine learning approaches based on multi-classification and hierarchical classification. Furthermore, it pioneers the adoption of the deep learning paradigm to design practical and effective mobile traffic classifiers through the automatic extraction of features reflecting complex data patterns. Then, to overcome the complexity of these solutions, a distributed deployment based on a big-data framework is investigated. This analysis highlights the non-transparent nature of the big-data accelerator when applied to the training phase of deep learning classifiers, shedding light on intrinsic trade-offs. Extensive experimental evaluations are conducted to assess the performance of the proposed approaches and compare them with the most closely related state-of-the-art solutions. This goal is achieved by defining a common benchmark encompassing public datasets. In this regard, a novel architecture is designed and implemented to capture and label our publicly released dataset.
