212 research outputs found

    Towards Provable Network Traffic Measurement and Analysis via Semi-Labeled Trace Datasets

    Research in network traffic measurement and analysis is a long-lasting field with growing interest from both scientists and the industry. However, even after so many years, results replication, criticism, and review are still rare. We face not only a lack of research standards, but also inaccessibility of appropriate datasets that can be used for methods development and evaluation. Therefore, a lot of potentially high-quality research cannot be verified and is not adopted by the industry or the community. The aim of this paper is to overcome this controversy with a unique solution based on a combination of distinct approaches proposed by other research works. Unlike these studies, we focus on the whole issue covering all areas of data anonymization, authenticity, recency, publicity, and their usage for research provability. We believe that these challenges can be solved by utilization of semi-labeled datasets composed of real-world network traffic and annotated units with interest-related packet traces only. In this paper, we outline the basic ideas of the methodology from unit trace collection and semi-labeled dataset creation to its usage for research evaluation. We strive for this proposal to start a discussion of the approach and help to overcome some of the challenges the research faces today

    Trace-Share: Towards Provable Network Traffic Measurement and Analysis

    Research in network traffic measurement and analysis is a long-lasting field with growing interest from both scientists and the industry. However, even after so many years, results replication, criticism, and review are still rare. The aim of our research is to overcome the mentioned controversy with focus on the whole issue covering all areas of data anonymization, authenticity, recency, publicity, and their usage for research provability. We believe that these challenges can be solved by utilization of semi-labeled datasets composed of real-world network traffic and annotated units with interest-related packet traces only

    Cyber Situation Awareness via IP Flow Monitoring

    Cyber situation awareness has been recognized as a vital requirement for effective cyber defense. Cyber situation awareness allows cybersecurity operators to identify, understand, and anticipate incoming threats. Achieving and maintaining the cyber situation awareness is a challenging task given the continuous evolution of the computer networks, increasing volume and speeds of the data in a network, and rising number of threats to network security. Our work contributes to the continuous evolution of cyber situation awareness by the research of novel approaches to the perception and comprehension of a computer network. We concentrate our research efforts on the domain of IP flow network monitoring. We propose improvements to the IP flow monitoring techniques that enable the enhanced perception of a computer network. Further, we conduct detailed analyses of network traffic, which allows for an in-depth understanding of host behavior in a computer network. Last but not least, we propose a novel approach to IP flow network monitoring that enables real-time cyber situation awareness

    Stream-Based IP Flow Analysis

    As the complexity of Internet services, transmission speed, and data volume increases, current IP flow monitoring and analysis approaches cease to be sufficient, especially within high-speed and large-scale networks. Although IP flows consist only of selected network traffic features, their processing faces high computational demands, analysis delays, and large storage requirements. To address these challenges, we propose to improve the IP flow monitoring workflow by stream-based collection and analysis of IP flows utilizing a distributed data stream processing. This approach requires changing the paradigm of IP flow data monitoring and analysis, which is the main goal of our research. We analyze distributed stream processing systems, for which we design a novel performance benchmark to determine their suitability for stream-based processing of IP flow data. We define a stream-based workflow of IP flow collection and analysis based on the benchmark results, which we also implement as a publicly available and open-source framework Stream4Flow. Furthermore, we propose new analytical methods that leverage the stream-based IP flow data processing approach and extend network monitoring and threat detection capabilities

    Analytics over Encrypted Traffic and Defenses

    Encrypted traffic flows have been known to leak information about their underlying content through statistical properties such as packet lengths and timing. While traffic fingerprinting attacks exploit such information leaks and threaten user privacy by disclosing website visits, videos streamed, and user activity on messaging platforms, they can also be helpful in network management and intelligence services. The most recent and best-performing of these attacks are based on deep learning models. In this thesis, we identify multiple limitations in the currently available attacks and defenses against them. First, these deep learning models do not provide any insights into their decision-making process. Second, most attacks that have achieved very high accuracies are still limited by unrealistic assumptions that affect their practicality. For example, most attacks assume a closed world setting and focus on traffic classification after event completion. Finally, current state-of-the-art defenses still incur high overheads to provide reasonable privacy, which limits their applicability in real-world applications. In order to address these limitations, we first propose an inline traffic fingerprinting attack based on variable-length sequence modeling to facilitate real-time analytics. Next, we attempt to understand the inner workings of deep learning-based attacks with the dual goals of further improving attacks and designing efficient defenses against such attacks. Then, based on the observations from this analysis, we propose two novel defenses against traffic fingerprinting attacks that provide privacy under more realistic constraints and at lower bandwidth overheads. Finally, we propose a robust framework for open set classification that targets network traffic, with the added advantage of being more suitable for deployment in resource-constrained in-network devices

    Website Fingerprinting using Deep Learning

    Website fingerprinting (WF) enables a local eavesdropper to determine which websites a user is visiting over an encrypted connection. State-of-the-art WF attacks have been shown to be effective even against Tor. Recently, lightweight WF defenses for Tor have been proposed that substantially degrade existing attacks: WTF-PAD and Walkie-Talkie. In this work, we explore the impact of recent advances in deep learning on WF attacks and defenses. We first present Deep Fingerprinting (DF), a new WF attack based on deep learning, and we evaluate this attack against WTF-PAD and Walkie-Talkie. The DF attack attains over 98% accuracy on Tor traffic without defenses, making it the state-of-the-art WF attack at the time of publishing this work. DF is the only attack that is effective against WTF-PAD with over 90% accuracy, and against Walkie-Talkie, DF achieves a top-2 accuracy of 98%. In the more realistic open-world setting, our attack remains effective. These findings highlight the need for defenses that protect against attacks like DF that use advanced deep learning techniques. Since DF requires large amounts of training data that is regularly updated, some may argue that it is not practical for the weaker attacker model typically assumed in WF. Additionally, most WF attacks make strong assumptions, namely that the testing and training data have similar distributions and are collected from the same type of network at about the same time. Thus, we next examine ways that an attacker could reduce the difficulty of performing an attack by leveraging N-shot learning, in which just a few training samples are needed to identify a given class. In particular, we propose a new WF attack called Triplet Fingerprinting (TF) that uses triplet networks for N-shot learning. 
We evaluate this attack in challenging settings such as where the training and testing data are from multiple years apart and collected on different networks, and we find that the TF attack remains effective in such settings with 85% accuracy or better. We also show that the TF attack is effective in the open world and outperforms transfer learning. Finally, in response to the DF and TF attacks, we propose the CAM-Pad defense: a novel WF defense utilizing the Grad-CAM visual explanation technique. Grad-CAM can be used to identify regions of particular sensitivity in the data and provide insight into the features that the model has learned, providing more understanding about how the DF attack makes its prediction. The defense is based on a dynamic flow-padding defense, making it practical for deployment in Tor. The defense can reduce the attacker's accuracy using the DF attack from 98% to 67%, which is much better than the WTF-PAD defense, with a packet overhead of approximately 80%

    Information Leakage Attacks and Countermeasures

    The scientific community has been consistently working on the pervasive problem of information leakage, uncovering numerous attack vectors, and proposing various countermeasures. Despite these efforts, leakage incidents remain prevalent, as the complexity of systems and protocols increases, and sophisticated modeling methods become more accessible to adversaries. This work studies how information leakages manifest in and impact interconnected systems and their users. We first focus on online communications and investigate leakages in the Transport Layer Security protocol (TLS). Using modern machine learning models, we show that an eavesdropping adversary can efficiently exploit meta-information (e.g., packet size) not protected by the TLS’ encryption to launch fingerprinting attacks at an unprecedented scale even under non-optimal conditions. We then turn our attention to ultrasonic communications, and discuss their security shortcomings and how adversaries could exploit them to compromise anonymity network users (even though they aim to offer a greater level of privacy compared to TLS). Following up on these, we delve into physical layer leakages that concern a wide array of (networked) systems such as servers, embedded nodes, Tor relays, and hardware cryptocurrency wallets. We revisit location-based side-channel attacks and develop an exploitation neural network. Our model demonstrates the capabilities of a modern adversary but also presents an inexpensive tool to be used by auditors for detecting such leakages early on during the development cycle. Subsequently, we investigate techniques that further minimize the impact of leakages found in production components. Our proposed system design distributes both the custody of secrets and the cryptographic operation execution across several components, thus making the exploitation of leaks difficult

    Towards More Effective Traffic Analysis in the Tor Network.

    University of Minnesota Ph.D. dissertation. February 2021. Major: Computer Science. Advisor: Nicholas Hopper. 1 computer file (PDF); xiii, 161 pages. Tor is perhaps the most well-known anonymous network, used by millions of daily users to hide their sensitive internet activities from servers, ISPs, and potentially, nation-state adversaries. Tor provides low-latency anonymity by routing traffic through a series of relays using layered encryption to prevent any single entity from learning the source and destination of a connection through the content alone. Nevertheless, in low-latency anonymity networks, the timing and volume of traffic sent between the network and end systems (clients and servers) can be used for traffic analysis. For example, recent work applying traffic analysis to Tor has focused on website fingerprinting, which can allow an attacker to identify which website a client has downloaded based on the traffic between the client and the entry relay. Along with website fingerprinting, end-to-end flow correlation attacks have been recognized as the core traffic analysis in Tor. This attack assumes that an adversary observes traffic flows entering the network (Tor flow) and leaving the network (exit flow) and attempts to correlate these flows by pairing each user with a likely destination. The research in this thesis explores the extent to which the traffic analysis technique can be applied to more sophisticated fingerprinting scenarios using state-of-the-art machine-learning algorithms and deep learning techniques. The thesis addresses four research problems. First, the applicability of machine-learning-based website fingerprinting to search query keyword fingerprinting is examined and improved by discovering new features. Second, a variety of fingerprinting applications are introduced using deep-learning-based website fingerprinting. 
Third, the work presents data-limited fingerprinting by leveraging a generative deep-learning technique called a generative adversarial network that can be optimized in scenarios with limited amounts of training data. Lastly, a novel deep-learning architecture and training strategy are proposed to extract features of highly correlated Tor and exit flow pairs, which will reduce the number of false positives between pairs of flows