7 research outputs found
The Challenges of Effectively Anonymizing Network Data
The availability of realistic network data plays a significant role in fostering collaboration and ensuring U.S. technical leadership in network security research. Unfortunately, a host of technical, legal, policy, and privacy issues limit the ability of operators to produce datasets for information security testing. In an effort to help overcome these limitations, several data collection efforts (e.g., CRAWDAD [14], PREDICT [34]) have been established in the past few years. The key principle used in all of these efforts to assure low-risk, high-value data is that of trace anonymization—the process of sanitizing data before release so that potentially sensitive information cannot be extracted.
Preserving Both Privacy and Utility in Network Trace Anonymization
As network security monitoring grows more sophisticated, there is an
increasing need for outsourcing such tasks to third-party analysts. However,
organizations are usually reluctant to share their network traces due to
privacy concerns over sensitive information, e.g., network and system
configuration, which may potentially be exploited for attacks. In cases where
data owners are convinced to share their network traces, the data are typically
subjected to certain anonymization techniques, e.g., CryptoPAn, which replaces
real IP addresses with prefix-preserving pseudonyms. However, most such
techniques either are vulnerable to adversaries with prior knowledge about some
network flows in the traces, or require heavy data sanitization or
perturbation, both of which may result in a significant loss of data utility.
In this paper, we aim to preserve both privacy and utility by shifting the
trade-off from one between privacy and utility to one between privacy and
computational cost. The key idea is for the analysts to generate and analyze multiple
anonymized views of the original network traces; those views are designed to be
sufficiently indistinguishable even to adversaries armed with prior knowledge,
which preserves the privacy, whereas one of the views will yield true analysis
results privately retrieved by the data owner, which preserves the utility. We
present the general approach and instantiate it based on CryptoPAn. We formally
analyze the privacy of our solution and experimentally evaluate it using real
network traces provided by a major ISP. The results show that our approach can
significantly reduce the level of information leakage (e.g., less than 1% of
the information leaked by CryptoPAn) with comparable utility.
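The prefix-preserving property that CryptoPAn provides can be illustrated with a toy sketch: if two addresses share their first k bits, their pseudonyms also share their first k bits. The sketch below uses HMAC as the per-prefix pseudorandom function purely for illustration; CryptoPAn's actual construction is AES-based, and the function name is ours, not from the paper.

```python
import hmac
import hashlib
import ipaddress

def prefix_preserving_anon(ip: str, key: bytes) -> str:
    """Toy prefix-preserving IPv4 pseudonymizer (CryptoPAn-style sketch).

    Output bit i = input bit i XOR PRF(first i input bits), so addresses
    sharing a k-bit prefix map to pseudonyms sharing a k-bit prefix.
    """
    bits = format(int(ipaddress.IPv4Address(ip)), "032b")
    out = []
    for i in range(32):
        prefix = bits[:i]
        # HMAC-SHA256 stands in for the keyed PRF (AES in real CryptoPAn)
        prf_bit = hmac.new(key, prefix.encode(), hashlib.sha256).digest()[0] & 1
        out.append(str(int(bits[i]) ^ prf_bit))
    return str(ipaddress.IPv4Address(int("".join(out), 2)))
```

Because each output bit depends only on the corresponding input prefix, subnet structure survives anonymization, which is exactly what makes the scheme useful for analysis and, as the paper notes, what adversaries with prior knowledge of some flows can exploit.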
Reference models for network trace anonymization
Network security research can benefit greatly from testing environments that are capable of generating realistic, repeatable and configurable background traffic. In order to conduct network security experiments on systems such as Intrusion Detection Systems and Intrusion Prevention Systems, researchers require isolated testbeds capable of recreating actual network environments, complete with infrastructure and traffic details. Unfortunately, due to privacy and flexibility concerns, actual network traffic is rarely shared by organizations, as sensitive information, such as IP addresses, device identity and behavioral information, can be inferred from the traffic. Trace data anonymization is one solution to this problem. The research community has responded to this sanitization problem with anonymization tools that aim to remove sensitive information from network traces, and attacks on anonymized traces that aim to evaluate the efficacy of the anonymization schemes. However, there is a continued lack of a comprehensive model that distills all elements of the sanitization problem into a functional reference model.

In this thesis we offer such a comprehensive functional reference model that identifies and binds together all the entities required to formulate the problem of network data anonymization. We build a new information flow model that illustrates the overly optimistic nature of inference attacks on anonymized traces. We also provide a probabilistic interpretation of the information model and develop a privacy metric for anonymized traces. Finally, we develop the architecture for a highly configurable, multi-layer network trace collection and sanitization tool. In addition to addressing privacy and flexibility concerns, our architecture allows for uniformity of anonymization and ease of data aggregation.
Building Cross-Cluster Security Models for Edge-Core Environments Involving Multiple Kubernetes Clusters
With the emergence of 5G networks and their large scale applications such as IoT and autonomous vehicles, telecom operators are increasingly offloading the computation closer to customers (i.e., on the edge). Such edge-core environments usually involve multiple Kubernetes clusters potentially owned by different providers. Confidentiality and privacy concerns could prevent those providers from sharing data freely with each other, which makes it challenging to perform common security tasks such as security verification and attack/anomaly detection across different clusters. In this work, we propose CCSM, a solution for building cross-cluster security models to enable various security analyses, while preserving confidentiality and privacy for each cluster. We design a six-step methodology to model both the cross-cluster communication and cross-cluster event dependency, and we apply those models to different security use cases. We implement our solution based on a 5G edge-core environment that involves multiple Kubernetes clusters, and our experimental results demonstrate its efficiency (e.g., less than 8 seconds of processing time for a model with 3,600 edges and nodes) and accuracy (e.g., more than 96% for cross-cluster event prediction).
Protecting Audit Data using Segmentation-Based Anonymization for Multi-Tenant Cloud Auditing (SEGGUARD)
With the rise of security concerns regarding cloud computing, the importance of security auditing, conducted either in-house or by a third party, has become evident more than ever. However, the input data required for auditing a multi-tenant cloud environment typically contains sensitive information, such as the topology of the underlying cloud infrastructure. Additionally, audit results intended for one tenant may unexpectedly reveal private information, such as unpatched security flaws, about other tenants. How to anonymize audit data and results in order to prevent such information leakage is a novel challenge that has received little attention. Directly applying most existing anonymization techniques to such a context would either lead to insufficient protection or render the data unsuitable for auditing. In this thesis, we propose SegGuard, a novel anonymization approach that protects the sensitive information in both the audit data and auditing results, while assuring the data utility for effective auditing. Specifically, SegGuard prevents cross-tenant information leakage through per-tenant encryption, and it prevents information leakage to auditors through an innovative way of applying property-preserving anonymization. We apply SegGuard on audit data collected from an OpenStack cloud, and evaluate its effectiveness and efficiency using both synthetic and real data. Our experimental results demonstrate that SegGuard can reduce information leakage to a negligible level (e.g., less than 1% for an adversary with 50% pre-knowledge) with a practical response time (e.g., 62 seconds to anonymize a cloud virtual infrastructure with 25,000 virtual machines).
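The per-tenant isolation idea can be sketched with a small keyed-hash scheme: derive an independent key per tenant so that pseudonyms produced for one tenant carry no information about another tenant's data. This is a hypothetical illustration of the general principle; SegGuard's actual property-preserving construction may differ, and all names below are ours.

```python
import hmac
import hashlib

def tenant_key(master_key: bytes, tenant_id: str) -> bytes:
    # Derive an independent key per tenant from a master key, so
    # pseudonyms from one tenant are unlinkable to another tenant's.
    return hmac.new(master_key, b"tenant|" + tenant_id.encode(),
                    hashlib.sha256).digest()

def pseudonymize_field(master_key: bytes, tenant_id: str, value: str) -> str:
    # Deterministic within a tenant (equality of identifiers is
    # preserved, which auditing needs), but the same identifier maps
    # to unrelated pseudonyms under different tenants.
    k = tenant_key(master_key, tenant_id)
    return hmac.new(k, value.encode(), hashlib.sha256).hexdigest()[:16]
```

Keeping the scheme deterministic per tenant is what lets an auditor check properties such as "VM X is attached to port Y" on anonymized data, while the per-tenant key derivation blocks the cross-tenant leakage described above.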
Anonymization of Event Logs for Network Security Monitoring
A managed security service provider (MSSP) must collect security event logs from
its customers' networks for monitoring and cybersecurity protection. These logs
need to be processed by the MSSP before being displayed to the security operation
center (SOC) analysts. Employees generate event logs during their working hours
at the customers' sites. One challenge is that the collected event logs contain
personally identifiable information (PII), visible in clear text to the SOC
analysts or any user with access to the SIEM platform.
We explore how pseudonymization can be applied to security event logs to help
protect individuals’ identities from the SOC analysts while preserving data utility
when possible. We compare the impact of using different pseudonymization functions
on sensitive information or PII. Non-deterministic methods provide a higher
level of privacy but reduce the utility of the data.
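The deterministic/non-deterministic trade-off can be made concrete with a minimal sketch. The helper names below are hypothetical, not the thesis's implementation: a keyed hash gives a deterministic pseudonym, while a fresh random token per occurrence gives a non-deterministic one.

```python
import hmac
import hashlib
import secrets

def pseudonymize_deterministic(value: str, key: bytes) -> str:
    # Same input always yields the same pseudonym: SOC analysts can
    # still correlate events by user, but frequency analysis of the
    # pseudonyms remains possible.
    return hmac.new(key, value.encode(), hashlib.sha256).hexdigest()[:16]

class NonDeterministicPseudonymizer:
    """Fresh random token per occurrence: higher privacy, lower utility,
    since events can no longer be correlated by pseudonym."""

    def __init__(self):
        # token -> original value, kept only by the data owner for
        # authorized re-identification
        self._mapping = {}

    def pseudonymize(self, value: str) -> str:
        token = secrets.token_hex(8)
        self._mapping[token] = value
        return token
```

In a SIEM pipeline, the deterministic variant would typically be applied to fields the analysts must join on (usernames, hostnames), and the non-deterministic one to fields that only need to be hidden.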
Our contribution in this thesis is threefold. First, we study available architectures
with different threat models, including their strengths and weaknesses. Second, we
study pseudonymization functions and their application to PII fields; we benchmark
them individually, as well as in our experimental platform. Last, we obtain valuable
feedback and lessons from SOC analysts based on their experience.
Existing works [39, 43, 44, 48] are generally restricted to the anonymization of
IP traces, which is only one part of the SOC analysts' investigation of PCAP
files. In one of the closest works [47], the authors provide useful, practical
anonymization methods for IP addresses, ports, and raw logs.
Novel Approaches to Preserving Utility in Privacy Enhancing Technologies
A significant amount of individual information is being collected and analyzed today through a wide variety of applications across different industries. While pursuing better utility by discovering
knowledge from the data, an individual’s privacy may be compromised during an analysis: corporate networks monitor their online behavior, advertising companies collect and share their private
information, and cybercriminals cause financial damages through security breaches. To this end,
the data typically undergoes certain anonymization techniques, e.g., CryptoPAn [Computer Networks'04], which replaces real IP addresses with prefix-preserving pseudonyms, or Differentially
Private (DP) [ICALP’06] techniques which modify the answer to a query by adding a zero-mean
noise distributed according to, e.g., a Laplace distribution. Unfortunately, most such techniques
either are vulnerable to adversaries with prior knowledge, e.g., some network flows in the data, or
require heavy data sanitization or perturbation, both of which may result in a significant loss of data
utility. Therefore, the fundamental trade-off between privacy and utility (i.e., analysis accuracy) has
attracted significant attention in various settings [ICALP’06, ACM CCS’14]. In line with this track
of research, in this dissertation we aim to build utility-maximized and privacy-preserving tools for
Internet communications. Such tools can be employed not only by dissidents and whistleblowers,
but also by ordinary Internet users on a daily basis. To this end, we combine the development of
practical systems with rigorous theoretical analysis, and incorporate techniques from various disciplines such as computer networking, cryptography, and statistical analysis. During the research,
we propose three different frameworks in some well-known settings, outlined in the following.
First, the Multi-view approach, which preserves both privacy and utility in network trace
anonymization; second, the R2DP approach, a novel technique for differentially private
mechanism design with maximized utility; and third, the DPOD approach, a novel framework
for privacy-preserving anomaly detection in the outsourcing setting.
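The differentially private mechanism referred to above (adding zero-mean Laplace noise to a query answer) can be sketched in a few lines. This is the textbook Laplace mechanism for illustration, not code from the dissertation; the function name is ours.

```python
import math
import random

def laplace_mechanism(true_answer: float, sensitivity: float,
                      epsilon: float) -> float:
    # epsilon-differentially private release of a numeric query answer:
    # add zero-mean Laplace noise with scale = sensitivity / epsilon.
    scale = sensitivity / epsilon
    # Sample Laplace noise as a symmetric exponential: magnitude is
    # exponential with mean `scale`, sign is a fair coin flip.
    # 1 - random() lies in (0, 1], which avoids log(0).
    magnitude = -scale * math.log(1.0 - random.random())
    sign = 1.0 if random.random() < 0.5 else -1.0
    return true_answer + sign * magnitude
```

A smaller epsilon means more noise and stronger privacy; the expected absolute error of each released answer is sensitivity/epsilon, which is precisely the privacy-utility trade-off the dissertation's R2DP work aims to optimize.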