56 research outputs found

    Streaming Temporal Graphs: Subgraph Matching

    Full text link
    We investigate solutions to subgraph matching within a temporal stream of data. We present a high-level language for describing temporal subgraphs of interest, the Streaming Analytics Language (SAL). SAL programs are translated into C++ code that is run in parallel on a cluster. We call this implementation of SAL the Streaming Analytics Machine (SAM). SAL programs are succinct, requiring about 20 times fewer lines of code than using the SAM library directly, or writing an implementation using Apache Flink. To benchmark SAM we calculate finding temporal triangles within streaming netflow data. Also, we compare SAM to an implementation written for Flink. We find that SAM is able to scale to 128 nodes or 2560 cores, while Apache Flink has max throughput with 32 nodes and degrades thereafter. Apache Flink has an advantage when triangles are rare, with max aggregate throughput for Flink at 32 nodes greater than the max achievable rate of SAM. In our experiments, when triangle occurrence was faster than five per second per node, SAM performed better. Both frameworks may miss results due to latencies in network communication. SAM consistently reported an average of 93.7% of expected results while Flink decreases from 83.7% to 52.1% as we increase to the maximum size of the cluster. Overall, SAM can obtain rates of 91.8 billion netflows per day.Comment: Big Data 201

    Preserving Both Privacy and Utility in Network Trace Anonymization

    Full text link
    As network security monitoring grows more sophisticated, there is an increasing need for outsourcing such tasks to third-party analysts. However, organizations are usually reluctant to share their network traces due to privacy concerns over sensitive information, e.g., network and system configuration, which may potentially be exploited for attacks. In cases where data owners are convinced to share their network traces, the data are typically subjected to certain anonymization techniques, e.g., CryptoPAn, which replaces real IP addresses with prefix-preserving pseudonyms. However, most such techniques either are vulnerable to adversaries with prior knowledge about some network flows in the traces, or require heavy data sanitization or perturbation, both of which may result in a significant loss of data utility. In this paper, we aim to preserve both privacy and utility through shifting the trade-off from between privacy and utility to between privacy and computational cost. The key idea is for the analysts to generate and analyze multiple anonymized views of the original network traces; those views are designed to be sufficiently indistinguishable even to adversaries armed with prior knowledge, which preserves the privacy, whereas one of the views will yield true analysis results privately retrieved by the data owner, which preserves the utility. We present the general approach and instantiate it based on CryptoPAn. We formally analyze the privacy of our solution and experimentally evaluate it using real network traces provided by a major ISP. The results show that our approach can significantly reduce the level of information leakage (e.g., less than 1\% of the information leaked by CryptoPAn) with comparable utility

    Fingerprinting the Smart Home: Detection of Smart Assistants Based on Network Activity

    Get PDF
    As the concept of the Smart Home is being embraced globally, IoT devices such as the Amazon Echo, Google Home, and Nest Thermostat are becoming a part of more and more households. In the data-driven world we live in today, internet service providers (ISPs) and companies are collecting large amounts of data and using it to learn about their customers. As a result, it is becoming increasingly important to understand what information ISPs are capable of collecting. IoT devices in particular exhibit distinct behavior patterns and specific functionality which make them especially likely to reveal sensitive information. Collection of this data provides valuable information and can have some serious privacy implications. In this work I present an approach to fingerprinting IoT devices behind private networks while only examining last-mile internet traffic . Not only does this attack only rely on traffic that would be available to an ISP, it does not require changes to existing infrastructure. Further, it does not rely on packet contents, and therefore works despite encryption. Using a database of 64 million packets logged over 15 weeks I was able to train machine learning models to classify the Amazon Echo Dot, Amazon Echo Show, Eufy Genie, and Google Home consistently. This approach combines unsupervised and supervised learning and achieves a precision of 99.95\%, equating to one false positive per 2,000 predictions. Finally, I discuss the implication of identifying devices within a home

    Immunology Inspired Detection of Data Theft from Autonomous Network Activity

    Get PDF
    The threat of data theft posed by self-propagating, remotely controlled bot malware is increasing. Cyber criminals are motivated to steal sensitive data, such as user names, passwords, account numbers, and credit card numbers, because these items can be parlayed into cash. For anonymity and economy of scale, bot networks have become the cyber criminal’s weapon of choice. In 2010 a single botnet included over one million compromised host computers, and one of the largest botnets in 2011 was specifically designed to harvest financial data from its victims. Unfortunately, current intrusion detection methods are unable to effectively detect data extraction techniques employed by bot malware. The research described in this Dissertation Report addresses that problem. This work builds on a foundation of research regarding artificial immune systems (AIS) and botnet activity detection. This work is the first to isolate and assess features derived from human computer interaction in the detection of data theft by bot malware and is the first to report on a novel use of the HTTP protocol by a contemporary variant of the Zeus bot

    Cleaning Your House First: Shifting the Paradigm on How to Secure Networks

    Get PDF
    International audienceThe standard paradigm when securing networks is to filter ingress traffic to the domain to be protected. Even though many tools and techniques have been developed and employed over the recent years for this purpose, we are still far from having secure networks. In this work, we propose a paradigm shift on the way we secure networks, by investigating whether it would not be efficient to filter egress traffic as well. The main benefit of this approach is the possibility to mitigate malicious activities before they reach the Internet. To evaluate our proposal, we have developed a prototype and conducted experiments using NetFlow data from the University of Twente

    I Know Why You Went to the Clinic: Risks and Realization of HTTPS Traffic Analysis

    Full text link
    Revelations of large scale electronic surveillance and data mining by governments and corporations have fueled increased adoption of HTTPS. We present a traffic analysis attack against over 6000 webpages spanning the HTTPS deployments of 10 widely used, industry-leading websites in areas such as healthcare, finance, legal services and streaming video. Our attack identifies individual pages in the same website with 89% accuracy, exposing personal details including medical conditions, financial and legal affairs and sexual orientation. We examine evaluation methodology and reveal accuracy variations as large as 18% caused by assumptions affecting caching and cookies. We present a novel defense reducing attack accuracy to 27% with a 9% traffic increase, and demonstrate significantly increased effectiveness of prior defenses in our evaluation context, inclusive of enabled caching, user-specific cookies and pages within the same website
    • …
    corecore