Discovery of Web Attacks by Inspecting HTTPS Network Traffic with Machine Learning and Similarity Search

Abstract

Master's thesis, Information Security, Universidade de Lisboa, Faculdade de Ciências, 2022

Web applications are the building blocks of many services, from social networks to banks. Network security threats have been a permanent concern since the advent of data communication. Nevertheless, security breaches remain a serious problem, because web applications hold both company information and private client data. Traditional Intrusion Detection Systems (IDS) inspect the payload of packets, looking for known intrusion signatures or deviations from normal behavior. However, this Deep Packet Inspection (DPI) approach cannot inspect the encrypted traffic of Hypertext Transfer Protocol Secure (HTTPS), a protocol that is now widely adopted to protect data communication. We are interested in web application attacks, and to detect them accurately we must access the payload. Network flows aggregate traffic with common properties, so they can be used to inspect large amounts of traffic. The main objective of this thesis is to develop a system that discovers anomalous HTTPS traffic and confirms that the payloads it carries contain web application attacks. We propose a new, reliable method and system to identify traffic that may include web application attacks by analysing HTTPS network flows (netflows) and discovering payload content similarities. We use unsupervised machine learning algorithms to cluster netflows and identify anomalous traffic, and Locality Sensitive Hashing (LSH) algorithms to create a Similarity Search Engine (SSE) capable of correctly identifying the presence of known web application attacks in this traffic. We embed the system in a continuous improvement process so that detection stays reliable as new web application attacks are discovered.
Our evaluation showed that the system could detect anomalous traffic, that the SSE was able to confirm the presence of web attacks in that anomalous traffic, and that the continuous improvement process was able to increase the accuracy of the SSE.
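
To make the idea of an LSH-based similarity search over payloads more concrete, the following is a minimal sketch and not the implementation used in the thesis. The abstract does not say which LSH family the SSE relies on; this sketch assumes MinHash over character shingles with banding, one common choice. All names (`SimilarityIndex`, `minhash_signature`, the labels and the sample payloads) are hypothetical and chosen only for illustration.

```python
import hashlib


def shingles(text, k=4):
    """Return the set of lowercase character k-grams of text."""
    text = text.lower()
    if len(text) < k:
        return {text}
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def _hash(seed, shingle):
    """Seeded 64-bit hash; each seed stands in for one random permutation."""
    data = f"{seed}:{shingle}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(text, num_perm=128, k=4):
    """MinHash signature: the minimum hash of the shingle set under each seed."""
    sh = shingles(text, k)
    return [min(_hash(seed, s) for s in sh) for seed in range(num_perm)]


def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching signature positions, an estimate of Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def band_keys(signature, bands=32):
    """Split a signature into bands; each band becomes one LSH bucket key."""
    rows = len(signature) // bands
    return {(b, tuple(signature[b * rows:(b + 1) * rows])) for b in range(bands)}


class SimilarityIndex:
    """Hypothetical index of known attack payloads (assumption, not the thesis SSE)."""

    def __init__(self, num_perm=128, bands=32):
        self.num_perm = num_perm
        self.bands = bands
        self.buckets = {}      # band key -> set of labels sharing that band
        self.signatures = {}   # label -> full MinHash signature

    def add(self, label, payload):
        sig = minhash_signature(payload, self.num_perm)
        self.signatures[label] = sig
        for key in band_keys(sig, self.bands):
            self.buckets.setdefault(key, set()).add(label)

    def query(self, payload, threshold=0.5):
        """Return (label, score) pairs for indexed payloads similar to payload."""
        sig = minhash_signature(payload, self.num_perm)
        candidates = set()
        for key in band_keys(sig, self.bands):
            candidates |= self.buckets.get(key, set())
        scored = [(c, estimated_jaccard(sig, self.signatures[c])) for c in candidates]
        hits = [(label, score) for label, score in scored if score >= threshold]
        return sorted(hits, key=lambda item: -item[1])


if __name__ == "__main__":
    index = SimilarityIndex()
    # Illustrative payloads only; not taken from the thesis data set.
    index.add("sqli-union", "id=1' UNION SELECT username, password FROM users--")
    index.add("xss-script", "q=<script>alert(document.cookie)</script>")
    print(index.query("id=1' UNION SELECT username, password FROM users--"))
    print(index.query("page=2&sort=asc&lang=en"))
```

In this sketch, only payloads that share at least one band with the query are compared in full, which is what keeps lookups cheap as the set of known attacks grows. Adding newly confirmed attack payloads through `add` loosely mirrors the continuous improvement process the abstract describes, under the assumptions stated above.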
