56 research outputs found
Streaming Temporal Graphs: Subgraph Matching
We investigate solutions to subgraph matching within a temporal stream of
data. We present a high-level language for describing temporal subgraphs of
interest, the Streaming Analytics Language (SAL). SAL programs are translated
into C++ code that is run in parallel on a cluster. We call this implementation
of SAL the Streaming Analytics Machine (SAM). SAL programs are succinct,
requiring about 20 times fewer lines of code than using the SAM library
directly, or writing an implementation using Apache Flink. To benchmark SAM we
calculate finding temporal triangles within streaming netflow data. Also, we
compare SAM to an implementation written for Flink. We find that SAM is able to
scale to 128 nodes or 2560 cores, while Apache Flink has max throughput with 32
nodes and degrades thereafter. Apache Flink has an advantage when triangles are
rare, with max aggregate throughput for Flink at 32 nodes greater than the max
achievable rate of SAM. In our experiments, when triangle occurrence was faster
than five per second per node, SAM performed better. Both frameworks may miss
results due to latencies in network communication. SAM consistently reported an
average of 93.7% of expected results while Flink decreases from 83.7% to 52.1%
as we increase to the maximum size of the cluster. Overall, SAM can obtain
rates of 91.8 billion netflows per day.Comment: Big Data 201
Preserving Both Privacy and Utility in Network Trace Anonymization
As network security monitoring grows more sophisticated, there is an
increasing need for outsourcing such tasks to third-party analysts. However,
organizations are usually reluctant to share their network traces due to
privacy concerns over sensitive information, e.g., network and system
configuration, which may potentially be exploited for attacks. In cases where
data owners are convinced to share their network traces, the data are typically
subjected to certain anonymization techniques, e.g., CryptoPAn, which replaces
real IP addresses with prefix-preserving pseudonyms. However, most such
techniques either are vulnerable to adversaries with prior knowledge about some
network flows in the traces, or require heavy data sanitization or
perturbation, both of which may result in a significant loss of data utility.
In this paper, we aim to preserve both privacy and utility through shifting the
trade-off from between privacy and utility to between privacy and computational
cost. The key idea is for the analysts to generate and analyze multiple
anonymized views of the original network traces; those views are designed to be
sufficiently indistinguishable even to adversaries armed with prior knowledge,
which preserves the privacy, whereas one of the views will yield true analysis
results privately retrieved by the data owner, which preserves the utility. We
present the general approach and instantiate it based on CryptoPAn. We formally
analyze the privacy of our solution and experimentally evaluate it using real
network traces provided by a major ISP. The results show that our approach can
significantly reduce the level of information leakage (e.g., less than 1\% of
the information leaked by CryptoPAn) with comparable utility
Fingerprinting the Smart Home: Detection of Smart Assistants Based on Network Activity
As the concept of the Smart Home is being embraced globally, IoT devices such as the Amazon Echo, Google Home, and Nest Thermostat are becoming a part of more and more households. In the data-driven world we live in today, internet service providers (ISPs) and companies are collecting large amounts of data and using it to learn about their customers. As a result, it is becoming increasingly important to understand what information ISPs are capable of collecting. IoT devices in particular exhibit distinct behavior patterns and specific functionality which make them especially likely to reveal sensitive information. Collection of this data provides valuable information and can have some serious privacy implications. In this work I present an approach to fingerprinting IoT devices behind private networks while only examining last-mile internet traffic . Not only does this attack only rely on traffic that would be available to an ISP, it does not require changes to existing infrastructure. Further, it does not rely on packet contents, and therefore works despite encryption. Using a database of 64 million packets logged over 15 weeks I was able to train machine learning models to classify the Amazon Echo Dot, Amazon Echo Show, Eufy Genie, and Google Home consistently. This approach combines unsupervised and supervised learning and achieves a precision of 99.95\%, equating to one false positive per 2,000 predictions. Finally, I discuss the implication of identifying devices within a home
Immunology Inspired Detection of Data Theft from Autonomous Network Activity
The threat of data theft posed by self-propagating, remotely controlled bot malware is increasing. Cyber criminals are motivated to steal sensitive data, such as user names, passwords, account numbers, and credit card numbers, because these items can be parlayed into cash. For anonymity and economy of scale, bot networks have become the cyber criminal’s weapon of choice. In 2010 a single botnet included over one million compromised host computers, and one of the largest botnets in 2011 was specifically designed to harvest financial data from its victims. Unfortunately, current intrusion detection methods are unable to effectively detect data extraction techniques employed by bot malware. The research described in this Dissertation Report addresses that problem. This work builds on a foundation of research regarding artificial immune systems (AIS) and botnet activity detection. This work is the first to isolate and assess features derived from human computer interaction in the detection of data theft by bot malware and is the first to report on a novel use of the HTTP protocol by a contemporary variant of the Zeus bot
Cleaning Your House First: Shifting the Paradigm on How to Secure Networks
International audienceThe standard paradigm when securing networks is to filter ingress traffic to the domain to be protected. Even though many tools and techniques have been developed and employed over the recent years for this purpose, we are still far from having secure networks. In this work, we propose a paradigm shift on the way we secure networks, by investigating whether it would not be efficient to filter egress traffic as well. The main benefit of this approach is the possibility to mitigate malicious activities before they reach the Internet. To evaluate our proposal, we have developed a prototype and conducted experiments using NetFlow data from the University of Twente
I Know Why You Went to the Clinic: Risks and Realization of HTTPS Traffic Analysis
Revelations of large scale electronic surveillance and data mining by
governments and corporations have fueled increased adoption of HTTPS. We
present a traffic analysis attack against over 6000 webpages spanning the HTTPS
deployments of 10 widely used, industry-leading websites in areas such as
healthcare, finance, legal services and streaming video. Our attack identifies
individual pages in the same website with 89% accuracy, exposing personal
details including medical conditions, financial and legal affairs and sexual
orientation. We examine evaluation methodology and reveal accuracy variations
as large as 18% caused by assumptions affecting caching and cookies. We present
a novel defense reducing attack accuracy to 27% with a 9% traffic increase, and
demonstrate significantly increased effectiveness of prior defenses in our
evaluation context, inclusive of enabled caching, user-specific cookies and
pages within the same website
- …