61 research outputs found

    PREDICTING THE UNKNOWN: MACHINE LEARNING TECHNIQUES FOR VIDEO FINGERPRINTING ATTACKS OVER TOR

    In recent years, anonymization services such as Tor have become a popular resource for terrorist organizations and violent extremist groups. These adversaries use Tor to access the Dark Web to distribute video media as a way to recruit, train, and incite violence and acts of terrorism worldwide. This research strives to address this issue by examining and analyzing the use and development of video fingerprinting attacks using deep learning models. These high-performing deep learning models, collectively called Deep Fingerprinting, are used to predict video patterns with high accuracy in a closed-world setting. We take the role of the adversary by passively observing raw network traffic as a user downloads a short video from YouTube. Based on traffic patterns, we can deduce what video the user was streaming with higher accuracy than previously obtained. In addition, our results include identifying the genre of the video. Our results suggest that an adversary may predict the video a user downloads over Tor with up to 83% accuracy, even when the user applies additional defenses to protect online privacy. By comparing different Deep Fingerprinting models with one another, we can better understand which models perform better from both the attacker’s and the user’s perspectives. Lieutenant, United States Navy. Approved for public release. Distribution is unlimited.

    Information Leakage Attacks and Countermeasures

    The scientific community has been consistently working on the pervasive problem of information leakage, uncovering numerous attack vectors, and proposing various countermeasures. Despite these efforts, leakage incidents remain prevalent, as the complexity of systems and protocols increases, and sophisticated modeling methods become more accessible to adversaries. This work studies how information leakages manifest in and impact interconnected systems and their users. We first focus on online communications and investigate leakages in the Transport Layer Security (TLS) protocol. Using modern machine learning models, we show that an eavesdropping adversary can efficiently exploit meta-information (e.g., packet size) not protected by TLS encryption to launch fingerprinting attacks at an unprecedented scale even under non-optimal conditions. We then turn our attention to ultrasonic communications, and discuss their security shortcomings and how adversaries could exploit them to compromise anonymity network users (even though they aim to offer a greater level of privacy compared to TLS). Following up on these, we delve into physical layer leakages that concern a wide array of (networked) systems such as servers, embedded nodes, Tor relays, and hardware cryptocurrency wallets. We revisit location-based side-channel attacks and develop an exploitation neural network. Our model demonstrates the capabilities of a modern adversary but also presents an inexpensive tool to be used by auditors for detecting such leakages early on during the development cycle. Subsequently, we investigate techniques that further minimize the impact of leakages found in production components. Our proposed system design distributes both the custody of secrets and the cryptographic operation execution across several components, thus making the exploitation of leaks difficult.

    Towards private and robust machine learning for information security

    Many problems in information security are pattern recognition problems. For example, determining if a digital communication can be trusted amounts to certifying that the communication does not carry malicious or secret content, which can be distilled into the problem of recognising the difference between benign and malicious content. At a high level, machine learning is the study of how patterns are formed within data, and how learning these patterns generalises beyond the potentially limited data pool at a practitioner’s disposal, and so has become a powerful tool in information security. In this work, we study the benefits machine learning can bring to two problems in information security. Firstly, we show that machine learning can be used to detect which websites are visited by an internet user over an encrypted connection. By analysing timing and packet size information of encrypted network traffic, we train a machine learning model that predicts the target website given a stream of encrypted network traffic, even if browsing is performed over an anonymous communication network. Secondly, in addition to studying how machine learning can be used to design attacks, we study how it can be used to solve the problem of hiding information within a cover medium, such as an image or an audio recording, which is commonly referred to as steganography. How well an algorithm can hide information within a cover medium amounts to how well the algorithm models and exploits areas of redundancy. This can again be reduced to a pattern recognition problem, and so we apply machine learning to design a steganographic algorithm that efficiently hides a secret message within an image. Following this, we proceed with discussions surrounding why machine learning is not a panacea for information security, and can be an attack vector in and of itself. We show that machine learning can leak private and sensitive information about the data it used to learn, and how malicious actors can exploit vulnerabilities in these learning algorithms to compel them to exhibit adversarial behaviours. Finally, we examine the problem of the disconnect between image recognition systems learned by humans and by machine learning models. While human classification of an image is relatively robust to noise, machine learning models do not possess this property. We show how an attacker can cause targeted misclassifications against an entire data distribution by exploiting this property, and go on to introduce a mitigation that ameliorates this undesirable trait of machine learning.

    Website Fingerprinting using Deep Learning

    Website fingerprinting (WF) enables a local eavesdropper to determine which websites a user is visiting over an encrypted connection. State-of-the-art WF attacks have been shown to be effective even against Tor. Recently, lightweight WF defenses for Tor have been proposed that substantially degrade existing attacks: WTF-PAD and Walkie-Talkie. In this work, we explore the impact of recent advances in deep learning on WF attacks and defenses. We first present Deep Fingerprinting (DF), a new WF attack based on deep learning, and we evaluate this attack against WTF-PAD and Walkie-Talkie. The DF attack attains over 98% accuracy on Tor traffic without defenses, making it the state-of-the-art WF attack at the time of publishing this work. DF is the only attack that is effective against WTF-PAD with over 90% accuracy, and against Walkie-Talkie, DF achieves a top-2 accuracy of 98%. In the more realistic open-world setting, our attack remains effective. These findings highlight the need for defenses that protect against attacks like DF that use advanced deep learning techniques. Since DF requires large amounts of training data that is regularly updated, some may argue that it is not practical for the weaker attacker model typically assumed in WF. Additionally, most WF attacks make strong assumptions that the testing and training data have similar distributions and are collected from the same type of network at about the same time. Thus, we next examine ways that an attacker could reduce the difficulty of performing an attack by leveraging N-shot learning, in which just a few training samples are needed to identify a given class. In particular, we propose a new WF attack called Triplet Fingerprinting (TF) that uses triplet networks for N-shot learning. We evaluate this attack in challenging settings such as where the training and testing data are from multiple years apart and collected on different networks, and we find that the TF attack remains effective in such settings with 85% accuracy or better. We also show that the TF attack is effective in the open world and outperforms transfer learning. Finally, in response to the DF and TF attacks, we propose the CAM-Pad defense: a novel WF defense utilizing the Grad-CAM visual explanation technique. Grad-CAM can be used to identify regions of particular sensitivity in the data and provide insight into the features that the model has learned, providing more understanding about how the DF attack makes its prediction. The defense is based on dynamic flow padding, making it practical for deployment in Tor. The defense can reduce the attacker's accuracy using the DF attack from 98% to 67%, which is much better than the WTF-PAD defense, with a packet overhead of approximately 80%.
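
    The N-shot idea mentioned in the abstract above can be illustrated with a small sketch. This is not the Triplet Fingerprinting implementation: it assumes traffic traces have already been mapped to embedding vectors by some trained network, uses one common form of the triplet loss (Euclidean distance with a margin), and classifies a query by its nearest class centroid. The names `triplet_loss`, `n_shot_classify`, and the toy embeddings are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """One common triplet loss: push the anchor closer to the positive
    (same class) than to the negative (other class) by at least `margin`."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def n_shot_classify(query, support):
    """Assign `query` to the label whose N support embeddings have the
    nearest centroid. `support` maps label -> list of N embeddings.
    (Assumption: nearest-centroid is used here only for illustration.)"""
    def centroid(vectors):
        n = len(vectors)
        return [sum(col) / n for col in zip(*vectors)]

    return min(support, key=lambda label: euclidean(query, centroid(support[label])))


# Toy usage with made-up 2-D embeddings and N = 2 samples per site.
support_set = {
    "site_a": [[1.0, 1.0], [1.2, 0.8]],
    "site_b": [[5.0, 5.0], [4.8, 5.2]],
}
predicted = n_shot_classify([0.9, 1.1], support_set)
```

    With a well-trained embedding, only the small `support_set` has to be refreshed when sites change, which is the practical appeal of N-shot learning for a weaker attacker.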

    Analytics over Encrypted Traffic and Defenses

    Encrypted traffic flows have been known to leak information about their underlying content through statistical properties such as packet lengths and timing. While traffic fingerprinting attacks exploit such information leaks and threaten user privacy by disclosing website visits, videos streamed, and user activity on messaging platforms, they can also be helpful in network management and intelligence services. The most recent and best-performing of these attacks are based on deep learning models. In this thesis, we identify multiple limitations in the currently available attacks and defenses against them. First, these deep learning models do not provide any insights into their decision-making process. Second, most attacks that have achieved very high accuracies are still limited by unrealistic assumptions that affect their practicality. For example, most attacks assume a closed-world setting and focus on traffic classification after event completion. Finally, current state-of-the-art defenses still incur high overheads to provide reasonable privacy, which limits their applicability in real-world applications. In order to address these limitations, we first propose an inline traffic fingerprinting attack based on variable-length sequence modeling to facilitate real-time analytics. Next, we attempt to understand the inner workings of deep learning-based attacks with the dual goals of further improving attacks and designing efficient defenses against such attacks. Then, based on the observations from this analysis, we propose two novel defenses against traffic fingerprinting attacks that provide privacy under more realistic constraints and at lower bandwidth overheads. Finally, we propose a robust framework for open set classification that targets network traffic, with the added advantage of being more suitable for deployment in resource-constrained in-network devices.

    TorSH: Obfuscating consumer Internet-of-Things traffic with a collaborative smart-home router network

    When consumers install Internet-connected smart devices in their homes, metadata arising from the communications between these devices and their cloud-based service providers enables adversaries privy to this traffic to profile users, even when adequate encryption is used. Internet service providers (ISPs) are one potential adversary privy to users’ incoming and outgoing Internet traffic and either currently use this insight to assemble and sell consumer advertising profiles or may in the future do so. With existing defenses against such profiling falling short of meeting user preferences and abilities, there is a need for a novel solution that empowers consumers to defend themselves against profiling by ISP-like actors and that is more in tune with their wishes. In this thesis, we present The Onion Router for Smart Homes (TorSH), a network of smart-home routers working collaboratively to defend smart-device traffic from analysis by ISP-like adversaries. We demonstrate that TorSH succeeds in deterring such profiling while preserving smart-device experiences and without encumbering latency-sensitive, non-smart-device experiences like web browsing.

    Towards More Effective Traffic Analysis in the Tor Network.

    University of Minnesota Ph.D. dissertation. February 2021. Major: Computer Science. Advisor: Nicholas Hopper. 1 computer file (PDF); xiii, 161 pages. Tor is perhaps the most well-known anonymous network, used by millions of daily users to hide their sensitive internet activities from servers, ISPs, and potentially, nation-state adversaries. Tor provides low-latency anonymity by routing traffic through a series of relays using layered encryption to prevent any single entity from learning the source and destination of a connection through the content alone. Nevertheless, in low-latency anonymity networks, the timing and volume of traffic sent between the network and end systems (clients and servers) can be used for traffic analysis. For example, recent work applying traffic analysis to Tor has focused on website fingerprinting, which can allow an attacker to identify which website a client has downloaded based on the traffic between the client and the entry relay. Along with website fingerprinting, end-to-end flow correlation attacks have been recognized as the core traffic analysis attacks on Tor. A flow correlation attack assumes that an adversary observes traffic flows entering the network (Tor flow) and leaving the network (exit flow) and attempts to correlate these flows by pairing each user with a likely destination. The research in this thesis explores the extent to which traffic analysis techniques can be applied to more sophisticated fingerprinting scenarios using state-of-the-art machine-learning algorithms and deep learning techniques. The thesis addresses four research problems. First, the applicability of machine-learning-based website fingerprinting to search query keyword fingerprinting is examined and improved by discovering new features. Second, a variety of fingerprinting applications are introduced using deep-learning-based website fingerprinting. Third, the work presents data-limited fingerprinting by leveraging a generative deep-learning technique called a generative adversarial network that can be optimized in scenarios with limited amounts of training data. Lastly, a novel deep-learning architecture and training strategy are proposed to extract features of highly correlated Tor and exit flow pairs, which will reduce the number of false positives between pairs of flows.
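
    The flow correlation setting described in the abstract above can be made concrete with a deliberately simple baseline. The dissertation proposes a deep-learning architecture; the sketch below instead uses a classical stand-in (cosine similarity over per-window packet counts) purely to show what "pairing each user with a likely destination" means. The function names, the one-second window, and the toy timestamps are all assumptions for illustration.

```python
import math


def bin_counts(timestamps, window=1.0, n_bins=10):
    """Turn packet timestamps (seconds) into packet counts per time window."""
    counts = [0] * n_bins
    for t in timestamps:
        idx = int(t // window)
        if 0 <= idx < n_bins:  # ignore packets outside the observed period
            counts[idx] += 1
    return counts


def cosine(u, v):
    """Cosine similarity of two count vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def correlate(entry_flows, exit_flows, window=1.0, n_bins=10):
    """Pair every entry-side flow with the most similar exit-side flow."""
    exit_vecs = {name: bin_counts(ts, window, n_bins) for name, ts in exit_flows.items()}
    pairs = {}
    for name, ts in entry_flows.items():
        vec = bin_counts(ts, window, n_bins)
        pairs[name] = max(exit_vecs, key=lambda e: cosine(vec, exit_vecs[e]))
    return pairs


# Toy usage: exit flows are slightly delayed copies of the entry flows.
entry = {"user_1": [0.1, 0.2, 0.3, 5.1], "user_2": [2.1, 2.2, 7.5, 7.6]}
exits = {"dest_x": [2.3, 2.4, 7.7, 7.8], "dest_y": [0.3, 0.4, 0.5, 5.3]}
pairs = correlate(entry, exits)
```

    A baseline like this always returns some best match for every flow, which is one source of the false positives that learned flow-pair features aim to reduce.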

    Last-Mile TLS Interception: Analysis and Observation of the Non-Public HTTPS Ecosystem

    Transport Layer Security (TLS) is one of the most widely deployed cryptographic protocols on the Internet that provides confidentiality, integrity, and a certain degree of authenticity of the communications between clients and servers. Following Snowden's revelations on US surveillance programs, the adoption of TLS has steadily increased. However, encrypted traffic prevents legitimate inspection. Therefore, security solutions such as personal antiviruses and enterprise firewalls may intercept encrypted connections in search of malicious or unauthorized content. These TLS proxies (a.k.a. middleboxes) thus break the end-to-end property of TLS for arguably laudable reasons, yet they may pose a security risk. While TLS clients and servers have been analyzed to some extent, such proxies have remained unexplored until recently. We propose a framework for analyzing client-end TLS proxies, and apply it to 14 consumer antivirus and parental control applications as they break end-to-end TLS connections. Overall, the security of TLS connections was systematically worsened compared to the guarantees provided by modern browsers. Next, we aim at exploring the non-public HTTPS ecosystem, composed of locally-trusted proxy-issued certificates, from the user's perspective and from several countries in residential and enterprise settings. We focus our analysis on the long tail of interception events. We characterize the customers of network appliances, ranging from small/medium businesses and institutes to hospitals, hotels, resorts, insurance companies, and government agencies. We also discover regional cases of traffic interception malware/adware that mostly rely on the same Software Development Kit (i.e., NetFilter). Our scanning and analysis techniques allow us to identify more middleboxes and intercepting apps than previously found from privileged server vantages looking at billions of connections. We further perform a longitudinal study over six years of the evolution of a prominent traffic-intercepting adware found in our dataset: Wajam. We expose the TLS interception techniques it has used and the weaknesses it has introduced on hundreds of millions of user devices. This study also (re)opens the neglected problem of privacy-invasive adware, by showing how adware sometimes evolves to be stronger than even advanced malware and poses significant detection and reverse-engineering challenges. Overall, whether beneficial or not, TLS interception often has detrimental impacts on security without the end-user being alerted.