3 research outputs found

Classifying tor traffic using character analysis

Author: Pitpimon Choorod

Tor is a privacy-preserving network that enables users to browse the Internet anonymously. Although the prospect of such anonymity is welcomed in many quarters, Tor can also be used for malicious purposes, prompting the need to monitor Tor network connections. Because traffic is encrypted, most traffic classification methods depend on flow-based features. However, these features can be less reliable due to issues like asymmetric routing, and processing multiple packets can be time-intensive. In light of Tor’s sophisticated multilayered payload encryption compared with nonTor encryption, our research explored patterns in the encrypted data of both networks, challenging conventional encryption theory, which assumes that ciphertexts should not be distinguishable from random strings of equal length. Our novel approach leverages machine learning to differentiate Tor from nonTor traffic using only the encrypted payload. We focused on extracting statistical hex character-based features from the encrypted data. For consistent findings, we drew from two datasets: a public one, which was divided into eight application types for more granular insight, and a private one. Both datasets covered Tor and nonTor traffic. We developed a custom Python script called Charcount to extract relevant data and features accurately. To verify our results’ robustness, we utilized both Weka and scikit-learn for classification. In our first line of research, we conducted hex character analysis on the encrypted payloads of both Tor and nonTor traffic using statistical testing. Our investigation revealed a significant differentiation rate between Tor and nonTor traffic of 95.42% for the public dataset and 100% for the private dataset. The second phase of our study aimed to distinguish between Tor and nonTor traffic using machine learning, focusing on encrypted payload features that are independent of length.
In our evaluations, the public dataset yielded an average accuracy of 93.56% when classified with the Decision Tree (DT) algorithm in scikit-learn, and 95.65% with the J48 algorithm in Weka. For the private dataset, the accuracies were 95.23% and 97.12%, respectively. Additionally, we found that the combination of WrapperSubsetEval+BestFirst with the J48 classifier both enhanced accuracy and optimized processing efficiency. In conclusion, our study contributes to both demonstrating the distinction between Tor and nonTor traffic and achieving efficient classification of both types of traffic using features derived exclusively from a single encrypted payload packet. This work holds significant implications for cybersecurity and points towards further advancements in the field.
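
The abstract above does not show how the hex character features are computed, so the following is only a minimal sketch of one plausible length-independent feature set: the relative frequency of each of the 16 hex characters in a single payload. The function name `hex_char_features` is hypothetical and this is not the authors' Charcount script.

```python
from collections import Counter

HEX_DIGITS = "0123456789abcdef"


def hex_char_features(payload: bytes) -> dict[str, float]:
    """Return the relative frequency of each hex character in one payload.

    Assumption: dividing counts by the hex string length is one way to make
    the features independent of payload length; the abstract does not say
    which normalisation the authors used.
    """
    hex_text = payload.hex()          # e.g. b"\x00\xff" -> "00ff"
    total = len(hex_text)
    if total == 0:
        return {ch: 0.0 for ch in HEX_DIGITS}
    counts = Counter(hex_text)
    return {ch: counts.get(ch, 0) / total for ch in HEX_DIGITS}


if __name__ == "__main__":
    features = hex_char_features(bytes.fromhex("00ff"))
    print(features["0"], features["f"])
```

Each payload yields a fixed-length vector of 16 values, which could then be passed to a classifier such as scikit-learn's `DecisionTreeClassifier` or Weka's J48, the two tools named in the abstract.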

STAX (Strathclyde Repository)

Classifying Tor traffic encrypted payload using machine learning

Author: Pitpimon Choorod
Anil Fernando
George Weir
Publication date: 08/02/2024

Tor, a network offering Internet anonymity, has both beneficial and potentially malicious applications, leading to the need for efficient Tor traffic monitoring. While most current traffic classification methods rely on flow-based features, these can be unreliable due to factors like asymmetric routing, and the use of multiple packets for feature computation can lead to processing delays. Recognising the multi-layered encryption of Tor compared to nonTor encrypted payloads, our study explored distinct patterns in their encrypted data. We introduced a novel method using Deep Packet Inspection and machine learning to differentiate between Tor and nonTor traffic based solely on encrypted payload. In the first strand of our research, we investigated hex character analysis of the Tor and nonTor encrypted payloads through statistical testing across 8 groups of application types. Remarkably, our investigation revealed a significant differentiation rate of 94.53% between Tor and nonTor traffic. In the second strand of our research, we aimed to distinguish Tor and nonTor traffic using machine learning, based on encrypted payload features. This feature-based approach proved effective, as evidenced by our classification performance, which attained an average accuracy rate of 95.65% across these 8 groups of applications. This study thereby contributes to the efficient classification of Tor and nonTor traffic through features derived solely from a single encrypted payload packet, independent of its position in the traffic flow.
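
The abstract above does not name the statistical test applied to the hex characters. As an illustration only, the sketch below assumes a chi-square goodness-of-fit statistic that compares a payload's hex character counts with the uniform distribution expected of random-looking ciphertext. The function name `hex_chi_square` is hypothetical.

```python
from collections import Counter

HEX_DIGITS = "0123456789abcdef"


def hex_chi_square(payload: bytes) -> float:
    """Chi-square statistic of hex character counts against a uniform model.

    Assumption: under the "ciphertext looks random" hypothesis each of the
    16 hex characters is equally likely, so the expected count is n / 16.
    Larger values indicate a stronger departure from uniformity.
    """
    hex_text = payload.hex()
    n = len(hex_text)
    if n == 0:
        raise ValueError("payload must not be empty")
    expected = n / 16
    counts = Counter(hex_text)
    return sum(
        (counts.get(ch, 0) - expected) ** 2 / expected for ch in HEX_DIGITS
    )


if __name__ == "__main__":
    # Every hex character appears exactly once: perfectly uniform counts.
    print(hex_chi_square(bytes.fromhex("0123456789abcdef")))
    # Only the character "0" appears: maximally non-uniform counts.
    print(hex_chi_square(b"\x00" * 8))
```

With 16 categories the statistic has 15 degrees of freedom, so a value above roughly 25 would reject uniformity at the 5% level. In practice the payload should be long enough that the expected count per character is not tiny, otherwise the chi-square approximation is unreliable.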

University of Strathclyde Institutional Repository

Classifying Tor Traffic Encrypted Payload Using Machine Learning

Author: Anil Fernando
George Weir
Pitpimon Choorod
Publication venue: IEEE
Publication date: 01/01/2024

Directory of Open Access Journals